Why AI Medical Scribes Fail Functional Medicine Practitioners (And What Actually Works)
Why AI medical scribes fail functional medicine practitioners — and how HANS handles DUTCH, OAT, and GI-MAP documentation the way FM actually works.
Meta description: AI medical scribes built for conventional medicine systematically fail FM practitioners. Here's exactly why — and what functional medicine note-taking actually requires.
You finish a 90-minute consult. Your patient has complex HPA axis dysfunction, a dysbiosis driving subclinical hypothyroidism for two years, and an OAT showing a mycotoxin pattern you've been piecing together across four visits. You spent the whole session connecting the gut to the thyroid to the cortisol rhythm to the fatigue she's been dismissing as "just stress."
You open your AI scribe's output.
"Discussed lab results. Patient reports fatigue. Plan: continue current supplements, follow up in 6 weeks."
If you've used a generic AI scribe for functional medicine documentation, you know this moment. The tool transcribed the room fine. It just didn't understand a word of what happened clinically.
The problem isn't bad technology. AI note-taking in functional medicine breaks down because generic scribes were built for a completely different kind of medicine. They're optimized for the 15-minute SOAP note. Your practice runs on 60-90 minute root-cause investigations, specialty lab panels that require pattern interpretation, and multi-system protocols that have to be documented as connected clinical reasoning, not bullet lists.
This is a practitioner-to-practitioner breakdown of exactly where generic AI scribes fail in FM documentation, what those failures cost you, and what FM-aware note-taking actually looks like.
Why Generic AI Scribes Fail Functional Medicine Practitioners
The short answer: they were built for something else.
Conventional medicine documentation is largely a capture problem. Write down what happened, what was found, what was ordered. The 15-minute office visit generates a predictable amount of information. AI scribes trained on hospital discharge summaries and standard office visits are good at that.
Functional medicine is a different information environment. Your initial consult runs 60-90 minutes and needs to capture a full timeline, environmental history, dietary patterns, prior labs across multiple specialties, relevant genetics if available, and a root-cause working hypothesis that connects them. That's not a SOAP note. That's a case narrative.
Then there's the lab problem. When you review a DUTCH test, you're reading a circadian cortisol rhythm, DHEA status, cortisol/DHEA ratio, and the relationship between metabolized vs free cortisol. When you review an OAT, you're reading mitochondrial markers, neurotransmitter metabolites, oxidative stress markers, detox markers, and microbial markers — 40 to 70 data points that only make clinical sense as a pattern.
Generic AI scribe training data doesn't include DUTCH tests. It doesn't include OATs. It has minimal exposure to GI-MAP or NMR LipoProfile interpretation. The result: the AI watches you interpret a mycotoxin pattern across six OAT markers and produces the note "OAT results reviewed."
This isn't a calibration problem. It's an architecture problem. No amount of fine-tuning a conventional AI scribe will teach it to recognize that elevated arabinose plus tartaric acid plus citramalic acid in a patient with a history of water-damaged building exposure constitutes a clinically actionable mycotoxin pattern. That knowledge has to be built in from the ground up.
Current literature on AI scribes in healthcare confirms heterogeneous performance by specialty, with errors of omission and hallucination as persistent concerns — and a consistent finding that clinician oversight remains essential (Leung TI et al., JMIR Med Inform. 2025; PMID 40749188). After-hours documentation burden is a well-established driver of physician burnout; physicians who minimize after-hours charting are significantly less likely to report burnout (Shanafelt TD et al., Mayo Clin Proc. 2016; PMID 26615890; Eschenroeder HC et al., J Am Med Inform Assoc. 2021; PMC10134123).
The Five Documentation Failures I See Consistently
1. The "Results Reviewed" Problem
You spend 20 minutes walking a patient through her OAT. You explain that the elevated arabinose, tartaric acid, and citramalic acid together indicate likely mycotoxin exposure. You connect it to the mold history she mentioned six months ago. You document the rationale for the antifungal protocol and the binder support you're adding.
The AI note: "OAT results reviewed. Some elevations noted. Further workup to be considered."
The clinical reasoning is gone. The pattern recognition is gone. The connection to the case history is gone. What remains is a note that's useless to any other clinician who reads the chart, and nearly useless to you at the next visit when you're trying to remember where you were in the case.
2. The Wrong Vocabulary Problem
Generic AI scribes default to conventional medicine vocabulary. That's not just a style issue — it changes the clinical meaning.
The note reads "TSH within normal limits" when TSH is 4.2 and you just spent 30 minutes explaining that a TSH in the upper quartile of the reference range, combined with her symptom picture (hair loss, cold intolerance, low-normal T3) and the gut dysbiosis impairing T4-to-T3 conversion, points toward a subclinical pattern worth treating. To be clear: some functional medicine practitioners interpret a TSH above 2.5 as suboptimal, a threshold that falls outside mainstream endocrinology consensus. But whatever position you take, that clinical reasoning needs to be in the note.
That note is actively misleading. Your next visit starts from "thyroid normal" instead of "subclinical hypothyroidism secondary to gut dysbiosis, current intervention: gut healing protocol with thyroid panel recheck in 90 days."
3. The Flat Protocol Problem
A dysbiosis protocol for a complex GI-MAP patient needs structure. Remove phase, Heal phase, Repopulate phase — each with clinical rationale, a titration schedule, die-off management, transition criteria, and monitoring parameters.
Generic AI produces a bulleted supplement list. No phases. No rationale. No titration. No die-off guidance.
That list is clinically inert. Three months from now, when you're deciding whether to transition phases, there's nothing in the chart telling you where you are in the protocol or why you built it this way. FM supplement protocols are structured clinical documents with a logic. A flat list is a grocery list.
4. The Standalone Visit Problem
FM cases are long-arc narratives. Interpreting today's thyroid panel requires knowing the gut history — T4-to-T3 conversion is significantly gut-dependent, with intestinal bacteria producing the sulfatases and glucuronidases required for iodothyronine deconjugation (Knezevic J et al., Nutrients. 2020; PMID 32545596) — the heavy metal burden (mercury disrupts thyroid receptor binding via avid binding to sulphydryl groups; Zhu X et al., Environ Health Prev Med. 2000; PMID 21432482), the HPA axis status (cortisol drives rT3 shunting), and the nutrient repletion history (selenium and zinc affect thyroid antibody levels).
Generic AI has no memory of prior visits. Every encounter is a fresh document. The clinical synthesis that makes FM documentation valuable — why this is improving and what it means for next steps — disappears. You're left with encounter-by-encounter transcripts instead of a coherent case narrative.
5. The Specialist Lab Interpretation Problem
"DUTCH test shows elevated cortisol metabolites" isn't a clinical note. Which pattern? High cortisol with low DHEA (early-stage stress)? Low cortisol with low DHEA (burnout)? High cortisol with normal DHEA (acute stress)? Elevated cortisol/DHEA ratio with signs of a catabolic state?
The pattern determines the downstream clinical meaning. A note that documents "elevated cortisol metabolites reviewed" isn't just incomplete — it's a liability.
"I tried three different AI scribes before I accepted the obvious: they weren't built for what I do. When my AI scribe described a patient's OAT results as 'within normal limits' after I'd just spent 20 minutes explaining why three elevated markers indicated mitochondrial dysfunction and early mold exposure, that's not a time-saver. That's a liability. I was spending more time fixing the note than it would have taken me to write it myself."
— Dr. Peter Kozlowski, MD, functional medicine physician and HANS advisor
AI scribe documentation quality limitations — including errors of omission, substitution, and LLM unpredictability — are documented in the literature and require careful physician proofreading as standard practice (Mess SA et al., Plast Reconstr Surg Glob Open. 2025; PMID 39823022).
What AI Note-Taking Looks Like When It Actually Understands FM
Done right, it looks like a note written by someone who was in the room and understood what was clinically significant.
Pattern-aware lab interpretation. When you review an OAT showing elevated arabinose and tartaric acid, an FM-aware AI flags the mycotoxin exposure pattern, cross-references the mold exposure history from the initial consult, and documents the clinical rationale for the antifungal protocol — without you dictating each connection.
Structured protocol documentation. The three-phase dysbiosis protocol gets documented as three phases, not a list. Remove, Heal, Repopulate, each with rationale and transition criteria. Die-off management documented. Monitoring parameters included. Six months from now, the chart tells the story.
Cortisol pattern capture. Not "DUTCH results reviewed." The specific pattern (morning peak, afternoon crash, low DHEA, elevated cortisol/DHEA ratio suggesting catabolic state), the clinical correlations (sleep fragmentation, muscle recovery problems, blood sugar instability), and the protocol logic (adaptogens targeting morning peak stabilization, DHEA support, lifestyle modifications).
Longitudinal threading. At the 6-month visit, the AI surfaces that the homocysteine decline correlates with the methylation support protocol from visit 2, that the OAT improvement in Krebs cycle markers aligns with mitochondrial support added at visit 3, and that today's T3 improvement tracks with the gut healing progress documented across the prior two visits.
"When the documentation actually captures the clinical reasoning, the full picture of why I made the decisions I made, the notes become a tool instead of a burden. I can walk into a follow-up visit and the chart tells me exactly where we are in the case. That's what good FM documentation is supposed to do."
— Dr. Peter Kozlowski, MD
Five Questions to Ask Before Choosing an AI Scribe
1. FM lab panel fluency. Does the tool understand what DUTCH, OAT, GI-MAP, and NMR LipoProfile actually measure? Ask the vendor for a sample note from a DUTCH or OAT review visit. The output will tell you everything.
2. Protocol documentation structure. Can it document a multi-phase supplement protocol with rationale, titration, and phase transition criteria? Ask for a sample gut healing protocol note.
3. Longitudinal context. Does the tool carry clinical context across visits? A tool that treats every visit as a standalone document is rebuilding the case from scratch every time you open a chart.
4. Root-cause vocabulary. Does the AI use FM-native language — HPA axis dysfunction, T4-to-T3 conversion impairment, mycotoxin burden, tight junction disruption — or does it default to conventional phrasing? Vocabulary determines whether the note reflects the actual clinical thinking.
5. Your real editing overhead. Track how much time you spend fixing AI output for two weeks. If you're editing more than 20% of any note, the tool isn't saving you time. The number to watch isn't "time saved generating the note." It's total documentation time including editing. Most FM practitioners using generic scribes find they've just relocated the work.
For a broader EMR evaluation framework, see: [[LINK NEEDED: best-emr-fm — "Best EMR for Functional Medicine"]]
Your Documentation Burden Isn't a You Problem
Most FM practitioners spend hours after clinic on documentation. Physicians who minimize after-hours charting have substantially lower burnout rates, a signal that documentation burden is a tools problem, not a willpower problem (Eschenroeder HC et al., J Am Med Inform Assoc. 2021; PMID 33880534; Shanafelt TD et al., Mayo Clin Proc. 2016; PMID 26615890).
Generic AI scribes solve the wrong problem. They're excellent at reducing 15-minute SOAP note documentation time. Your 90-minute root-cause investigation notes are a completely different challenge, and the tool you're using needs to be built for that challenge, not retrofitted from conventional medicine.
Documentation is clinical thinking made visible. When the notes are generic, the clinical thinking looks generic: to your patients who read visit summaries, to any clinician who reads the chart, and to yourself at the next visit when you're trying to remember where you were in the case.
FM practitioners deserve tools built for the medicine they actually practice.
HANS was built specifically for this. Trained on actual FM charts, fluent in FM lab panels, structured around root-cause methodology. If you're spending more time editing AI output than you saved generating it, that's the wrong AI scribe.
See how HANS handles DUTCH, OAT, and GI-MAP documentation: [[LINK NEEDED: hans.fm demo or features page — URL to be confirmed]]. Or start with the $1 / 7-day trial and run your next three consult notes through it. You'll know immediately whether it understands your medicine.
References
Leung TI, Coristine AJ, Benis A. "AI Scribes in Health Care: Balancing Transformative Potential With Responsible Integration." JMIR Med Inform. 2025;13:e80898. PMID 40749188. PMC12316405.
Mess SA et al. "Artificial Intelligence Scribe and Large Language Model Technology in Healthcare Documentation: Advantages, Limitations, and Recommendations." Plast Reconstr Surg Glob Open. 2025;13(1):e6450. PMID 39823022. PMC11737491.
Shanafelt TD et al. "Relationship Between Clerical Burden and Characteristics of the Electronic Environment With Physician Burnout and Professional Satisfaction." Mayo Clin Proc. 2016 Jul;91(7):836-48. PMID 26615890.
Eschenroeder HC et al. "Associations of physician burnout with organizational electronic health record support and after-hours charting." J Am Med Inform Assoc. 2021 May;28(5):960-966. PMID 33880534. PMC10134123.
Knezevic J et al. "Thyroid-Gut-Axis: How Does the Microbiota Influence Thyroid Function?" Nutrients. 2020 Jun;12(6):1769. PMID 32545596. PMC9433865.
Zhu X et al. "The endocrine disruptive effects of mercury." Environ Health Prev Med. 2000 Jan;4(4):174-183. PMID 21432482.
Internal links to confirm before publish:
- [[best-emr-fm]] — Best EMR for Functional Medicine (pillar) — confirm slug
- [[practice-efficiency-hub]] — Practice Efficiency Hub — confirm slug
- https://hans.fm/pricing — CTA destination (FIXED)
- [[LINK NEEDED: hans.fm demo or features page]] — URL not yet confirmed
