ARCHITECTURE INFOGRAPHIC · REV 2 · 2026-07-10 · @ fc47d49f · first cut 2026-04-25

somaCURA isn't an AI scribe. It's a clinical reasoning engine that happens to produce notes.

Stack: FastAPI · Python 3.12 async · SSE streaming · SQLite (WAL) · ChromaDB
Models: claude-opus-4-6 default · claude-sonnet-5 scoped tier · Anthropic + Google
Deployment: single-host · systemd · no Docker
Honest disposition: known gaps in § in flight

0 LLM · Calls during fragment routing

~80% · Deterministic accumulation

<5ms · Router latency target

2 calls · LLM hops per finalized note

3 gates · Physician decision points

39 lenses · Problem→metric maps

27 calc · Embedded calculator methods

204 labs · Canonical lab definitions

The clinical note is a reasoning artifact, not a transcription byproduct.

SCRIBE · what we are not

The ambient scribe hands the LLM the entire conversation and asks for a note; the doctor proofreads.

audio → transcript → LLM → prose → physician proofreads

somaCURA · what the architecture enforces

Physicians input structured fragments (text or voice). A deterministic ontology routes those fragments to the right problem with zero LLM calls; the design targets are ~80% coverage and <5ms per fragment. The model is invoked at exactly two points during finalization: once for narrative problem detection (typed list, not prose) and once per problem for the A&P section. Both calls receive only pre-routed evidence. The hallucination surface is one problem's prose, never the full note. The doctor approves the problem list, edits each A&P inline, and signs.

fragment → router → problems → evidence → scoped LLM → physician approves → sign

§ 01 · the canonical flow

What happens when a physician generates a progress note

POST /api/v1/encounters/{id}/progress orchestrates the largest pipeline in the system — generators/progress_generator.py at 8,957 lines, with ProgressNoteGenerator.stream_generate() as the entry point. It is also the clearest embodiment of the reasoning-artifact principle: deterministic accumulation, scoped LLM bursts, physician decision points at three stages. The green path below is the deterministic majority. The amber arrows are the only places an LLM runs.

[Figure: Progress note generation, request trace · stream_generate · generators/progress_generator.py:2398 · SSE]

Legend: deterministic · LLM (scoped) · physician decision · knowledge / RAG. Total LLM calls per note: 2 · both scoped, neither generates a full note
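
The SSE framing that the trace relies on can be sketched as follows. This is a minimal sketch, not the code in generators/progress_generator.py: the helper name sse_event and the payload fields are hypothetical, and only the event name proposed_change and the standard event/data wire format are taken as given.

```python
import json


def sse_event(event: str, payload: dict) -> str:
    """Format one Server-Sent Events frame: event name, JSON data, blank line."""
    # json.dumps emits no raw newlines by default, so the payload fits one data: line.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


# Hypothetical payload fields, for illustration only.
frame = sse_event("proposed_change", {"problem": "hyperkalemia", "kind": "diff"})
```

A streaming endpoint would yield frames like this one at a time, so the client can render each decision card as it arrives.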

§ 02 · the five pillars

What the philosophy looks like in code

The research position document — beyond-the-ai-scribe.html — defines five architectural pillars. Each one shows up in the codebase as a specific module. None is aspirational. Each cites the upstream research that motivated the design.

01 the structural shape

Problem-Oriented Clinical Course

An enumerated, evolving problem list is the chart's spine. Each problem carries status (active · improving · worsening · stable · resolved), a per-day A&P trail, and longitudinal evidence. Notes inherit prior-day A&P. The structure mirrors clinical reasoning, not conversation flow.

generators/progress_generator.py · services/problem_ap_sync_service.py · hyperdrive/problems.py

Rodman 2023 — POMR built as scientific framework; reasoning made visible. Kahn 2018 — structured templates improve accuracy and efficiency.

02 the routing layer · the load-bearing pillar

Deterministic Evidence Routing

If the architecture has a single load-bearing claim, it is this one. Clinical fragments are routed to problems via lab ontology, vital mapping, and keyword matching, with a <5ms latency target and zero LLM calls during accumulation. The design target is for ~80% of real-world fragments to be handled without any model invocation: the doctor types K 5.6, BUN 64 and the system files those values under the right problem with no token billed and no prose generated. Reproducible. Auditable. Free. Every other pillar (knowledge, scoped compilation, physician control) depends on this layer being deterministic enough to trust.

The doctrine sits in the file's own header comment: "Do NOT import any LLM services here. This file is deterministic-only."

Implementation · services/clinical_update_router.py:328 (839 lines, class ClinicalUpdateRouter) · clinical/domain/problem_lenses.py (39 lens entries · server lens mirrored to client at static/js/census/config/problem-lenses.js with parity test) · hyperdrive/lab_mapping.py (LAB_ALIASES, normalize_problem_name, PROBLEM_LAB_MAPPING) · services/lab_problem_ontology.py (LAB_PROBLEM_MAP, VITAL_PROBLEM_MAP).

Research grounding · Zhang 2025: F1=0.90 for RAG-based problem identification (n=5,118 utterances, Omaha System framework).
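
The routing idea can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the contents of services/clinical_update_router.py: the two dicts below are tiny illustrative stand-ins that reuse the names LAB_ALIASES and LAB_PROBLEM_MAP, the problem names they map to are invented for the example, and route_fragment is a hypothetical function.

```python
import re

# Illustrative stand-ins only. The real LAB_ALIASES and LAB_PROBLEM_MAP live in
# hyperdrive/lab_mapping.py and services/lab_problem_ontology.py; their actual
# contents are not reproduced here.
LAB_ALIASES = {"k": "potassium", "potassium": "potassium", "bun": "bun", "cr": "creatinine"}
LAB_PROBLEM_MAP = {
    "potassium": "hyperkalemia",
    "bun": "acute kidney injury",
    "creatinine": "acute kidney injury",
}

# Matches "<name> <number>" pairs such as "K 5.6" or "BUN 64".
_LAB_VALUE = re.compile(r"\b([A-Za-z][A-Za-z0-9]*)\s+(\d+(?:\.\d+)?)\b")


def route_fragment(text: str) -> list[dict]:
    """Route lab values in a fragment to problems using dict lookups only."""
    routed = []
    for name, value in _LAB_VALUE.findall(text):
        canonical = LAB_ALIASES.get(name.lower())
        if canonical is None:
            continue  # unknown token: leave it for a later tier, never guess
        routed.append({
            "lab": canonical,
            "value": float(value),
            "problem": LAB_PROBLEM_MAP.get(canonical),
        })
    return routed
```

Because the path is regex plus in-memory dict lookups, the same fragment always yields the same routing, which is what makes the layer auditable.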

03 the knowledge layer

Knowledge-Augmented Compilation

RAG retrieval from versioned guidelines + hospital protocols at compile time. Per-problem evidence graph links data to problems before any prose is generated. The LLM never sees the full note; it sees one problem's pre-scoped evidence and writes one A&P section.

rag.py (ChromaDB vector store) · services/note_compiler.py:549 · knowledge/{guidelines,templates,variable}/

Chen 2025 — knowledge graphs reduce hallucinations 44% (5 medical datasets). He 2025 — Graph-RAG outperforms naive RAG; specialists rate it higher.

04 the human in the loop

Physician as Reasoning Engine

The doctor approves the problem list, feeds clinical observations, directs the assessment. The system never generates unsupervised clinical judgments. Physician review is the workflow itself — three discrete decision points before a note is signed.

static/js/census/note-generation.js · NoteGenerationManager · diff review at generators/progress_generator.py:2749

Hack 2025 — hybrid physician-AI notes: 79% unedited approval vs 23% AI-only (n=20, 10 blinded reviewers).

05 the computation layer

Embedded Clinical Computation

Math is computed deterministically, not hallucinated. 27 named calculator methods (AKI staging, eGFR CKD-EPI 2021, SOFA, APACHE II, MELD-Na, CHA₂DS₂-VASc, HAS-BLED, Wells PE/DVT, CURB-65, CIWA-Ar, NIHSS, GCS, Glasgow-Blatchford, HEART) plus a full ABG interpreter, a 5-component acuity model, and a flag-gated renal cockpit (kEGFR · GPR · fluid correction, api/endpoints/renal.py). The LLM never re-derives a calculation the system can do exactly.

clinical/calculators/clinical_calculator.py 3,303 lines · acid_base_analyzer.py 906 lines · services/priority_scorer.py 1,407 lines · clinical/domain/reference_data.py 1,679 lines / 204 canonical labs.

Verified at rev 2 · fc47d49f (2026-07-10) — line counts measured by wc -l; calculator method count by grep -cE '^ def calculate_'.
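
As an illustration of math that is computed rather than generated, here is the published CKD-EPI 2021 creatinine equation as a standalone function. It is a sketch, not the body of calculate_egfr_ckd_epi_2021: the function name, signature, and input validation are assumptions; the coefficients are those of the 2021 race-free equation.

```python
def egfr_ckd_epi_2021(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """CKD-EPI 2021 creatinine equation, result in mL/min/1.73 m^2."""
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )
    return egfr * 1.012 if female else egfr
```

For a 50-year-old male with creatinine 0.9 mg/dL both ratio terms equal 1, so the result reduces to 142 × 0.9938^50, roughly 104.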

§ 03 · the route atlas

Every endpoint, clustered by clinical workflow

Routes mapped from api/endpoints/*.py + somaNotes.py at fc47d49f — 264 route registrations across 48 endpoint files. Clustered below by clinical workflow — the cluster shape, not the endpoint count, is the load-bearing observation. Every cluster corresponds to something a hospitalist actually does at the bedside.

somaCURA route atlas · 15 clinical clusters

Legend: GET · POST · PUT · DELETE

Note Generation · 8 endpoints

progress · H&P · discharge · transfer · cheatsheet · 2-liner

POST /generate/progress

POST /generate/handp

POST /generate/discharge

POST /generate/aandp

POST /generate/cheatsheet

POST /generate/2liner

POST /generate/transfer

POST /generate/hpi

Streaming Generation · SSE

generation_stream.py · 119,656 bytes · the active path

POST /api/generate/stream

POST /api/progress-analysis/stream

POST /api/handp-analysis/stream

POST /api/progress-oneshot

POST /api/quick-discharge

Census & Patients · 25+

patients.py · hyperdrive/router.py (5,685 lines)

GET /api/v1/patients

POST /api/v1/patients

POST /api/v1/patients/create-flexible

GET /api/v1/patients/{id}/chart

GET /api/v1/patients/{id}/labs

GET /api/v1/patients/{id}/medications

GET /api/v1/patients/{id}/orders

GET /api/v1/patients/{id}/notes

GET /api/v1/patients/{id}/encounters

GET /api/v1/hyperdrive/...

Clinical Computation · scores

scores.py · acid_base.py · clinical_charts.py

POST /api/calculate/{calculator_name}

GET /api/calculators

POST /acid-base

POST /acid-base/from-text

POST /api/clinical/calculator/contingency

POST /api/clinical/calculator/monitoring

POST /api/clinical/allergies/check

Voice & Audio · 21

Deepgram Nova-3 Medical · iPhone QR pairing · 21 voice routes

POST /api/audio/transcribe

POST /api/voice/edit

POST /api/voice-edit-v2

POST /api/voice/navigate

POST /api/voice/parse

POST /api/voice/pair

POST /api/voice/telemetry

POST /api/voice/admissions

Transfer Center · 135 rules

transfer.py · transfer_voice.py · transfer_twilio.py · 135 routing rules

POST /api/v1/transfer/workspace

GET /api/v1/transfer/workspace/{id}

POST /api/v1/transfer/voice

POST /api/v1/transfer/twilio/...

POST /api/v1/transfer/extract

Fragments & Diffs · workspace

fragments.py · diffs.py · editor.py · the compile pipeline

POST /api/v1/fragments

GET /api/v1/fragments/{encounter_id}

PUT /api/v1/fragments/{id}

DELETE /api/v1/fragments/{id}

POST /api/v1/diffs

POST /api/v1/editor/...

Reasoning Tree · v3

reasoning_tree.py · D3 visualization · admin toggle

POST /api/v1/reasoning-tree/generate

GET /api/v1/reasoning-tree/{id}

GET /api/v1/reasoning-tree/stats

Timeline & Events · trends

event_timeline.py · timeline.py · timeseries.py · uPlot charts

GET /api/v1/timeline/{encounter}

GET /api/v1/event-timeline

GET /api/v1/timeseries/{metric}

GET /api/v3/charts/...

Chat & Decision Support · peer

chat.py · cuti_decision.py · peer-attending

POST /api/v1/chat

POST /api/v1/chat/stream

POST /api/cuti/decision

Diagnostics & AI Dashboard · obs

diagnostics.py · ai_dashboard.py · analytics.py

GET /api/diagnostics/health

GET /api/diagnostics/settings/models

POST /api/diagnostics/settings/models

GET /api/ai-dashboard/...

GET /api/analytics/...

Auth & Admin · 10+

auth.py · users.py · admin_sessions.py

POST /login

POST /signup

POST /logout

GET /me

GET /admin/sessions/dashboard

GET /admin/metrics

Note Evolution · living note

notes.py · the Shift Workspace takeover · opt-in, progress-to-progress

POST /api/v1/note-evolution/propose

POST /api/v1/note-evolution/finalize

POST /api/v1/note-evolution/bhc-rollforward

GET /api/v1/notes/{id}/evolution-audit

Chart Import & Knowledge Cards · Epic P0

ccda_import.py · formulary.py · screen-side knowledge, never prompt-side

POST /api/v1/import/ccda

GET /api/v1/formulary/antimicrobial-coverage

GET /api/v3/charts/renal/{encounter_id}

Admissions & Workspace · flow

admissions.py (52KB) · workspace.py · IR pipeline

POST /admissions

POST /admissions/voice

POST /api/v1/workspace/...

GET /api/v1/notes/{type}/latest

§ 04 · the per-patient data model

What somaCURA knows about a patient

The patient is the unit of state. Everything below hangs off patient_id and encounter_id in a single WAL-mode SQLite database. Two canonical Problem models — core/models.py:Problem with field problem_title for the API surface, and hyperdrive/models.py:Problem with field name for HyperDrive storage — converted via hyperdrive/adapters.py.
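
The two-model arrangement can be pictured with a pair of dataclasses and an adapter. This is a sketch under assumptions: ApiProblem and StoredProblem are hypothetical stand-ins for core/models.py:Problem and hyperdrive/models.py:Problem, only the problem_title and name field names are taken from the text above, and the functions in hyperdrive/adapters.py may look different.

```python
from dataclasses import dataclass, field


@dataclass
class ApiProblem:
    """Stand-in for the API-surface model, keyed by problem_title."""
    problem_title: str
    status: str = "active"
    ap_history: list[str] = field(default_factory=list)


@dataclass
class StoredProblem:
    """Stand-in for the HyperDrive storage model, keyed by name."""
    name: str
    status: str = "active"
    ap_history: list[str] = field(default_factory=list)


def to_api(stored: StoredProblem) -> ApiProblem:
    # Copy the list so the two layers never share mutable history.
    return ApiProblem(stored.name, stored.status, list(stored.ap_history))


def to_stored(api: ApiProblem) -> StoredProblem:
    return StoredProblem(api.problem_title, api.status, list(api.ap_history))
```

A round trip through both adapters should return an equal object, which makes a cheap regression test for field drift between the two models.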

problem_list

core spine of the chart

problem_title: canonical
status: active · improving · worsening · stable · resolved
ap_history: per-day A&P · LIST
first_observed_day: hospital day
acuity: 5-component score
linked_evidence: graph IDs

evidence_graph

labs/vitals/meds → problems

EvidenceNode: prov_id stable
SourceType: LAB · VITAL · MED · MICRO · IMG · FRAGMENT
LinkQuality: ontology · keyword · LLM · physician
VITAL_PROBLEM_MAP: 39-lens
LAB_PROBLEM_MAP: 204-lab
build_graph(): graph builder

labs · canonical_labs

204 labs · 16 panel groups

CBC · BMP · CHEM · RENAL: core panels
LFT · COAGS · ABG · LIPIDS: extended
CARDIAC · ENDOCRINE: specialty
INFLAMMATORY · IRON: workup
UA · PANCREATIC · TOX · MICRO: aux
reference_range: low · high · critical

vitals

trend-aware

hr · sbp · dbp · rr · temp: core 5
spo2 · fio2 · weight: extended
urine_output: renal lens
trend_direction: up · down · stable
reference_range: per-vital

medications

indication-linked

name · dose · route: structured
indication: → problem
started_day: timeline
renal_adjustment: CrCl-aware
allergy_check: pre-order

cultures

organism + sensitivity

specimen_type: blood · urine · sputum · CSF
organism: identified
sensitivities: S · I · R
status: no growth · pending · positive

scores

27 calculators · auto-computed

aki_stage: KDIGO
egfr_ckd_epi_2021: renal
sofa · apache2 · curb65: acuity
meld_na · child_pugh: hepatic
cha2ds2_vasc · has_bled: afib
wells_pe · wells_dvt: VTE
heart · gblatchford: cardiac · GI
acid_base_full: 15-fn analyzer

fragments · Today's Update

durable journal · the living-note input

text: routed draft · immutable snapshot per round
created_at: ingestion time
clinical_effective_at: physician-asserted event time · never inferred
source_problem_key: routing anchor
status: draft · consumed · finalized

notes

version history · diff review

progress · handp · discharge: core types
transfer · cheatsheet: ops
2liner · hpi: brief
change_type: first · update · ehr_reconciliation
prior_versions: JSONL

§ 05 · how intelligence stays scoped

The LLM is invoked twice. Both calls are bounded. Neither generates a full note.

An ambient scribe feeds the entire conversation to an LLM and asks for a note. The hallucination surface is the entire output. The doctor becomes a proofreader. The privacy model is "microphone in the exam room." Determinism: zero.

somaCURA's pipeline never lets the LLM see the full note. The first call — ProblemDetectorService.analyze_clinical_update_stream at services/problem_detector_service.py:181 — receives narrative text and returns a typed list of problem candidates. The physician approves them. The second call — per-problem prose generation in services/note_compiler.py:_build_problem_prompt at line 279 — receives one problem's pre-routed evidence (the labs the ontology mapped, the vitals from its lens, the relevant prior A&P, the RAG-retrieved guidelines) and writes one A&P section. Each problem is its own attack surface, and each surface is small.

Provider routing in services/llm_service.py: default claude-opus-4-6; the scoped tier — reasoning tree, voice tier-2, IR extraction, the Note Evolution amendment loop, and the semantic fragment router — all ride claude-sonnet-5; Gemini supported through the same service for chat and BHC synthesis. Haiku is banned by user direction (2026-04-24). The living-note path (§ 09) adds calls but not scope: every one receives one problem's evidence, never the chart.
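
The scoping invariant is easiest to see when the prompt builder is given nothing but one problem's evidence. The following is a minimal sketch, assuming a hypothetical ProblemEvidence container and build_scoped_prompt function; it is not the real _build_problem_prompt, and the prompt wording is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemEvidence:
    """Hypothetical container for the evidence pre-routed to one problem."""
    problem: str
    labs: list[str] = field(default_factory=list)
    vitals: list[str] = field(default_factory=list)
    prior_ap: str = ""
    guidelines: list[str] = field(default_factory=list)


def build_scoped_prompt(evidence: ProblemEvidence) -> str:
    """Build a prompt from one problem's evidence; no other chart data is reachable."""
    def bullets(items: list[str]) -> str:
        return "\n".join(f"- {item}" for item in items) or "- (none)"

    return "\n\n".join([
        f"Problem: {evidence.problem}",
        f"Labs:\n{bullets(evidence.labs)}",
        f"Vitals:\n{bullets(evidence.vitals)}",
        f"Prior A&P:\n{evidence.prior_ap or '(none)'}",
        f"Guidelines:\n{bullets(evidence.guidelines)}",
        "Write the assessment and plan for this problem only.",
    ])
```

Since the function signature accepts a single problem's evidence, leaking another problem's data into the prompt would require changing the type, not just the call site.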

A · DET · fragment in → router maps to problem in <5ms (no LLM)

B · DET · HyperDrive NER extracts typed entities deterministically

C · LLM 1 · narrative → problem candidates (typed list, not prose)

D · DOC · physician accepts/rejects/merges candidates

E · DET · evidence graph + RAG per problem

F · LLM 2 · per-problem A&P prose · scoped context only

G · DOC · physician reviews each A&P · accept · edit · reject

H · DOC · sign & finalize

§ 06 · the diff review workflow

How the physician stays in control

Every progress note has three discrete physician decision points before a signature. Each one is a rendered diff against the previous state. The physician is never asked to proofread a wall of generated text — they are asked to accept, reject, or edit a small targeted change.

decision point #1 · problem detection

Approve the problem list

The LLM returns a typed list of problem candidates from narrative input. The physician sees each candidate, its evidence trail, and an inline action: accept · reject · merge into existing. No prose has been generated yet. The doctor is shaping the structural skeleton of the note before any sentence is written.

Implementation: analyze_clinical_update_stream SSE → frontend cards → NoteGenerationManager._handleProblemDetectionResult.

LLM call #1 · Latency ~2-4s · Output typed JSON

decision point #2 · per-problem A&P

Approve each assessment & plan

For each approved problem, the compiler produces an A&P diff (proposed text vs prior-day text). The physician reviews each problem's diff individually — accept the new prose, edit inline, or reject and keep yesterday's. The hallucination surface is one problem's prose, not the whole note.

Implementation: _stream_diff_proposals at generators/progress_generator.py:2749 → SSE proposed_change events → frontend per-problem diff cards.

LLM calls N (one per problem) · Latency ~1-2s/problem · Output diff against prior
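
A per-problem diff with the three physician decisions can be sketched with the standard library. This is an illustrative sketch, not _stream_diff_proposals: ap_diff and resolve are hypothetical names, and the real diff format on the wire is not specified in this document.

```python
import difflib
from typing import Optional


def ap_diff(problem: str, prior_text: str, proposed_text: str) -> list[str]:
    """Unified diff of the proposed A&P against the prior-day A&P for one problem."""
    return list(difflib.unified_diff(
        prior_text.splitlines(),
        proposed_text.splitlines(),
        fromfile=f"{problem} (prior day)",
        tofile=f"{problem} (proposed)",
        lineterm="",
    ))


def resolve(decision: str, prior_text: str, proposed_text: str,
            edited_text: Optional[str] = None) -> str:
    """Apply the physician's decision: accept, edit inline, or reject (keep prior)."""
    if decision == "accept":
        return proposed_text
    if decision == "edit":
        if edited_text is None:
            raise ValueError("edit requires the physician's edited text")
        return edited_text
    if decision == "reject":
        return prior_text
    raise ValueError(f"unknown decision: {decision}")
```

Rejecting is the safe default in this sketch: it returns yesterday's text unchanged, which matches the "reject and keep yesterday's" behavior described above.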

decision point #3 · sign & finalize

Lock the note & append A&P history

Once every section has been approved, the physician signs. PUT /api/v1/notes/{id} with change_type: "first" | "update" | "ehr_reconciliation". Pre-submit identity guard in EhrFinalModal.handleSubmit verifies this.patientId === CensusState.get('selectedPatientId') and bails on mismatch — patient-isolation contract enforced at the last possible second. On success, the note is locked, version history is preserved (JSONL), and ap_history is appended per problem so the next progress note inherits longitudinal context.

LLM calls 0 · Latency network only · Output locked note + appended history

§ 07 · behind the routes — services

The libraries every note generation step calls

Routes are thin. Most clinical reasoning lives in services/ and generators/. Each entry below is a real file at the line number cited, re-verified at fc47d49f (rev 2).

generators/progress_generator.py

8,957 lines · class ProgressNoteGenerator at line 308

The orchestration heart. Inherits ClinicalContextMixin and BaseNoteGenerator. stream_generate() at line 2398 is the SSE entry point. _stream_diff_proposals() at line 2749 produces per-problem diff events — and since rev 2 it is also the Note Evolution engine: the same method drives the per-problem amendment loop behind /api/v1/note-evolution/propose. Holds the longitudinal logic that no rewrite has yet improved on.

Entry stream_generate · SSE proposed_change · Inherits ClinicalContextMixin

services/problem_detector_service.py

868 lines · class ProblemDetectorService at line 29

Two-phase problem detection. analyze_clinical_update_stream() at line 182 streams the LLM's typed problem list as SSE. _suggest_consolidations() at line 610 deterministically merges duplicate-looking problems before the physician sees them. Phase 2 is not an LLM call; it is set logic over canonicalized names.

LLM Phase 1 only · Phase 2 deterministic · Output typed candidates
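
Set logic over canonicalized names might look like the sketch below. The canonicalization rule here (lowercase, strip punctuation, collapse whitespace) is an assumption chosen for illustration; the real _suggest_consolidations and normalize_problem_name may apply richer rules such as abbreviation expansion.

```python
import re
from collections import defaultdict


def canonicalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def suggest_consolidations(names: list[str]) -> list[list[str]]:
    """Group problem names that collapse to the same canonical form."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[canonicalize(name)].append(name)
    # Only groups with more than one member are merge suggestions.
    return [group for group in groups.values() if len(group) > 1]
```

The output is a suggestion list, not an applied merge, so the physician still makes the call at decision point #1.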

services/clinical_update_router.py

839 lines · class ClinicalUpdateRouter at line 328

The deterministic-only routing core. Header comment is the doctrine: "Do NOT import any LLM services here. This file is deterministic-only." Pulls LAB_PROBLEM_MAP + VITAL_PROBLEM_MAP from services/lab_problem_ontology.py and PROBLEM_LAB_MAPPING + LAB_ALIASES from hyperdrive/lab_mapping.py. Target latency <5ms.

LLM none · Latency <5ms (target) · Coverage ~80% of fragments (target)

services/evidence_graph.py

723 lines · class EvidenceGraphBuilder at line 397

Builds per-problem evidence graphs. EvidenceNode with SourceType (LAB · VITAL · MED · MICRO · IMG · FRAGMENT) and LinkQuality (ontology · keyword · LLM · physician). stable_prov_id() generates deterministic provenance IDs so the same lab→problem link gets the same ID across reruns — the chart's audit trail is reproducible.

Stable IDs ✓ · Sources 6 types · Quality 4 levels
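
One common way to get reproducible provenance IDs is to hash the link's identifying fields. The sketch below assumes that approach; the function name sketch_prov_id, its parameters, the prov_ prefix, and the 16-character truncation are all hypothetical and are not claimed to match stable_prov_id().

```python
import hashlib


def sketch_prov_id(source_type: str, source_key: str, problem_key: str) -> str:
    """Derive a deterministic ID from the fields that identify a link."""
    # The unit separator keeps ("ab", "c") and ("a", "bc") from colliding.
    payload = "\x1f".join([source_type, source_key, problem_key])
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return "prov_" + digest[:16]
```

No timestamp or random component enters the hash, so rerunning the graph build over the same data reproduces the same IDs.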

services/note_compiler.py

925 lines

compile_progress_note() at line 549 is the deterministic assembly path. compile_progress_note_with_prose() at line 682 is the LLM-augmented path; both share the same PatientStateSnapshot input and NoteDraft output. _build_problem_prompt() at line 279 enforces per-problem scoping — the LLM sees one problem's evidence, never the chart.

Snapshot hashed · Prose per-problem only · Coverage reported

services/llm_service.py

1,006 lines

Provider router + retry + circuit breaker. Singleton client pattern: get_anthropic_client() returns a 240s-timeout shared client. schedule_background() wraps fire-and-forget tasks with strong-ref + auto-discard + shutdown-awaited. Generators use the Anthropic client directly; Gemini routes through llm_service only (chat, BHC synthesis).

Default claude-opus-4-6 · Scoped tier claude-sonnet-5 · Haiku banned

clinical/calculators/clinical_calculator.py

3,303 lines · 27 calculate_* methods

Embedded clinical math. calculate_aki_stage · calculate_egfr_ckd_epi_2021 · calculate_sofa_score · calculate_apache2 · calculate_meld_na · calculate_cha2ds2_vasc · calculate_has_bled · calculate_wells_pe · calculate_wells_dvt · calculate_curb65 · calculate_ciwa_ar · calculate_nihss · calculate_gcs · calculate_glasgow_blatchford · calculate_heart_score · calculate_anion_gap · calculate_delta_gap · calculate_corrected_calcium · calculate_osmolar_gap · calculate_fena · trend analytics · severity scoring.

Pure deterministic · Trend-aware ✓ · Time-aware ✓

acid_base_analyzer.py

906 lines · 15 internal methods

Full ABG interpretation. analyze() resolves primary disorder, anion gap (with albumin correction), delta-delta, compensation rules, osmolal gap (when AG warrants), urine anion gap (in metabolic acidosis), corrected chloride. Generates differentials and recommendations from disorders, never from the LLM.

VBG correction ✓ · Compensation ✓ · Differentials rule-based
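
Several of the steps named above are textbook formulas, sketched here as standalone functions. These are not the analyzer's internal methods: the function names are hypothetical, and the reference values (normal anion gap 12, normal bicarbonate 24, normal albumin 4.0 g/dL) are conventional defaults assumed for the example.

```python
from typing import Optional


def anion_gap(na: float, cl: float, hco3: float) -> float:
    """Serum anion gap in mEq/L: Na - (Cl + HCO3)."""
    return na - (cl + hco3)


def albumin_corrected_gap(gap: float, albumin_g_dl: float) -> float:
    """Add 2.5 mEq/L for each 1 g/dL that albumin falls below 4.0."""
    return gap + 2.5 * (4.0 - albumin_g_dl)


def winters_expected_pco2(hco3: float) -> float:
    """Winter's formula: expected pCO2 in metabolic acidosis, +/- 2 mmHg."""
    return 1.5 * hco3 + 8.0


def delta_ratio(gap: float, hco3: float,
                normal_gap: float = 12.0, normal_hco3: float = 24.0) -> Optional[float]:
    """Delta-delta: rise in anion gap divided by fall in bicarbonate."""
    fall = normal_hco3 - hco3
    if fall == 0:
        return None  # undefined when bicarbonate is exactly normal
    return (gap - normal_gap) / fall
```

With Na 140, Cl 100, HCO3 10 and albumin 2.0, the gap is 30, the corrected gap is 35, and the expected pCO2 is 23 mmHg.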

rag.py + knowledge/

ChromaDB vector store · 72 knowledge files

Three knowledge corpora. knowledge/guidelines/ — clinical documentation standards. knowledge/templates/ — note structure templates. knowledge/variable/ — hospital-specific protocols (averaProtocols, averaUnitSpecificTx, conditions, medications, references, research). Retrieval is per-problem at compile time, never per-token during generation.

Indexer ChromaDB · Layered 3 corpora · Per-problem ✓

hyperdrive/ — clinical NER + ontology

18,565 lines · router 7,117 · ontology 2,215 · ner 940

The clinical data extraction subsystem. hyperdrive/ner.py parses lab values, vital signs, medications, timestamps. hyperdrive/ontology.py normalizes lab names, maps lab→problem, vital→problem. hyperdrive/consolidation.py deduplicates labs across systems. hyperdrive/router.py exposes 25+ census/patient endpoints. A file-backed cache stores parsed results across requests for sub-millisecond reads.

NER deterministic · Ontology 2,215 lines · Cache file-backed

clinical/domain/problem_lenses.py

659 lines · 39 lens entries

Maps problem names to clinically relevant metrics. Each lens defines hero (primary metric, lab or vital), labs[] (relevant lab keys, all canonical), vitals[] (relevant vital keys). Substring match against canonicalized problem name. Fallback "*" entry returns generic labs/vitals when no pattern matches. Mirrored in static/js/census/config/problem-lenses.js with parity test at tests/test_problem_lenses_parity.py.

Server lens Python · Client lens JS mirror · Parity tested
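
The lookup rule (substring match on the canonicalized name, with a "*" fallback) can be sketched as below. The three lens entries are invented for illustration and are not taken from problem_lenses.py; the list-of-dicts layout and the lens_for name are also assumptions.

```python
# Illustrative lens entries only; the real file defines 39.
LENSES = [
    {"pattern": "kidney", "hero": "creatinine",
     "labs": ["creatinine", "bun", "potassium"], "vitals": ["urine_output"]},
    {"pattern": "heart failure", "hero": "bnp",
     "labs": ["bnp", "sodium"], "vitals": ["weight", "sbp"]},
    {"pattern": "*", "hero": None,
     "labs": ["sodium", "potassium", "creatinine"], "vitals": ["hr", "sbp"]},
]


def lens_for(problem_name: str) -> dict:
    """Return the first lens whose pattern is a substring of the canonical name."""
    canonical = " ".join(problem_name.lower().split())
    for lens in LENSES:
        if lens["pattern"] != "*" and lens["pattern"] in canonical:
            return lens
    # Fallback entry: generic labs and vitals when nothing matches.
    return next(lens for lens in LENSES if lens["pattern"] == "*")
```

First match wins in this sketch, so entry order matters when two patterns could both match a name.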

services/semantic_fragment_router.py

232 lines · SemanticRoute · new in rev 2

The escalation tier the keyword router lacked. When the deterministic ontology cannot anchor a terse update ("doing well, discontinue steroids") to an existing problem, this service asks claude-sonnet-5 for a typed routing decision — problem key in, problem key out, no prose. Shipped 2026-06-30 to fix the documented core defect where unanchorable updates fell to a raw new-problem residual. ENABLE_SEMANTIC_FRAGMENT_ROUTING live in prod.

Output typed, never prose · Model sonnet-5 · Replaces lexical residual
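
"Problem key in, problem key out" implies the model's reply is validated against the keys that already exist. The sketch below shows one way to do that without calling any model: RouteDecision, parse_route, and the problem_key JSON field are hypothetical, and the real SemanticRoute type may carry different fields.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouteDecision:
    """Hypothetical typed result: a known problem key, or None with a reason."""
    problem_key: Optional[str]
    reason: str


def parse_route(raw: str, known_keys: set[str]) -> RouteDecision:
    """Accept the model's reply only if it names a problem key that already exists."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return RouteDecision(None, "invalid_json")
    key = payload.get("problem_key") if isinstance(payload, dict) else None
    if not isinstance(key, str) or key not in known_keys:
        return RouteDecision(None, "unknown_key")
    return RouteDecision(key, "matched")
```

Anything that is not valid JSON naming a known key is treated as no decision, so free-form prose from the model cannot reach the note through this path.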

services/priority_scorer.py

1,407 lines · 5-component acuity

Scores patient acuity for census sorting. Five components — clinical severity (vitals · scores · trajectory), evidence pressure (labs out of range), problem activity (new · worsening), workflow urgency (orders pending · consults), longitudinal drift (trend changes). Output is per-patient acuity used by the census view to surface the sickest patients first.

Components 5 · Used by census sort · Deterministic ✓

§ 08 · head-to-head

What changes in the architecture when you choose reasoning over transcription

The comparison from the research position document, with code references for every row. Each somaCURA cell points to the file that implements the contrasted behavior, verified at fc47d49f.

Dimension	Ambient AI Scribe	somaCURA
Input source	Conversation audio (ambient mic)	Structured fragments — text + voice. `services/clinical_update_router.py`
Problem awareness	None — generic note formatting	Full problem list with status & evolution. `core/models.py:Problem` + `ap_history`
Clinical context	Current conversation only	Longitudinal · prior A&P · lab trajectories · med indications. `generators/progress_generator.py`
Evidence provenance	None — opaque prose generation	Per-problem evidence graph with stable provenance IDs. `services/evidence_graph.py:108`
Doctor's role	Proofreader of generated text	Reasoning director · 3 decision points before signature. `NoteGenerationManager`
Hallucination surface	Full note (unscoped generation from conversation)	Per-problem prose only — pre-routed evidence in scope. `note_compiler.py:_format_evidence_by_source`
Determinism	0% — entirely LLM-dependent	~80% deterministic routing · 2 scoped LLM calls per note. `clinical_update_router.py`

§ 09 · the living note · new since rev 1

The note stopped being regenerated. It now evolves in place.

Everything above describes generating a note. The largest architectural change since the first cut of this document is that the daily progress note is no longer re-derived from scratch each morning — the Note Evolution takeover (the Shift Workspace, ENABLE_THREE_SUBSTRATE_TAKEOVER, live in prod) treats yesterday's signed note as substrate and applies targeted, physician-reviewed amendments to it. The doctor supplies the thinking — one terse interval line ("doing well, discontinue steroids") — and the machine does the compilation across the chart. Opt-in per generation, progress-to-progress within the same encounter only; the two-phase rail — the physician's daily driver for progress notes — remains untouched, as does Quick Gen, the separate one-shot path for fast whole-note generation.

the propose → review → finalize loop

Same invariant, second pipeline: every LLM call sees one problem, never the note.

POST /api/v1/note-evolution/propose captures the composer text as a durable routed draft fragment, snapshots the pending journal immutably, and hands it to the same engine that powers the rail — _stream_diff_proposals in generators/progress_generator.py. Fragments the ontology can't anchor escalate to the semantic fragment router (typed problem-key decision, claude-sonnet-5, no prose) instead of falling to a raw new-problem residual — the core defect this document listed as broken in rev 1, fixed 2026-06-30.

Each proposal returns to the physician as a per-problem decision band carrying recommendation semantics — a closed 4-way enum distinguishing Observed fact · Assessment · Already ordered / done · Suggested next, stamped by the backend, never guessed by the UI. Accepting a suggestion adopts it into the working plan; it never marks an order executed. Fragments carry clinical_effective_at — the physician-asserted clinical event time, kept distinct from ingestion time and never silently inferred.
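
A closed enum and a separate event-time field can be sketched as follows. The names RecommendationKind, Fragment, and badge_from_backend, along with the enum's string values, are hypothetical; only the four categories and the created_at / clinical_effective_at distinction come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class RecommendationKind(str, Enum):
    """Closed 4-way enum; any other value is rejected rather than guessed."""
    OBSERVED_FACT = "observed_fact"
    ASSESSMENT = "assessment"
    ALREADY_ORDERED = "already_ordered"
    SUGGESTED_NEXT = "suggested_next"


@dataclass
class Fragment:
    text: str
    created_at: datetime  # ingestion time, set by the system
    # Physician-asserted event time. Stays None unless stated; never inferred.
    clinical_effective_at: Optional[datetime] = None


def badge_from_backend(raw: str) -> RecommendationKind:
    """Parse a backend-stamped badge; raises ValueError for unknown values."""
    return RecommendationKind(raw)
```

Leaving clinical_effective_at as None when the physician says nothing keeps "unknown" distinguishable from "same as ingestion time".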

Finalize assembles an EvolutionPatch that replaces only the reviewed sections and writes the note. Accepted decisions project as provisional overlays onto Episode Tapestry — the course-lane view that replaced the legacy Problems column as the default census surface (ENABLE_TAPESTRY_PRIMARY, live) — until Finalize makes them real. Upstream, Epic chart export lands through POST /api/v1/import/ccda (P0 of the Epic@Avera integration, live) and feeds the same deterministic ingest the rest of the substrate uses.
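
The "replace only the reviewed sections" behavior can be sketched as a dictionary merge with a guard. This is an assumption-laden sketch: apply_patch is a hypothetical name, sections are modeled as a plain dict keyed by section name, and refusing unknown targets is an assumed policy rather than documented EvolutionPatch behavior.

```python
def apply_patch(note_sections: dict[str, str], reviewed: dict[str, str]) -> dict[str, str]:
    """Return a new note in which only physician-reviewed sections are replaced."""
    missing = [key for key in reviewed if key not in note_sections]
    if missing:
        # Assumption: a patch that targets a section the note lacks is refused.
        raise KeyError(f"patch targets unknown sections: {missing}")
    # Unreviewed sections are carried over unchanged.
    return {**note_sections, **reviewed}
```

The input note is not mutated, so the prior signed version remains available for version history.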

A · DOC · one terse interval line → durable routed draft fragment

B · DET · ontology anchors the fragment to an existing problem

C · LLM · unanchorable? semantic router → typed problem key, no prose

D · LLM · per-problem amendment loop · one problem's evidence per call

E · DOC · problem-band review · Observed · Assessment · Ordered · Suggested

F · DOC · BHC roll-forward · one note-level decision · Accept / Edit / Keep prior

G · DET · Finalize · EvolutionPatch replaces only reviewed sections

H · VIEW · Episode Tapestry course lanes absorb the accepted state

§ the decision rule

Knowledge belongs on the doctor's screen, not in LLM-generated prose. Every sentence in a finalized note must trace to an input — a lab, a vital, a medication, a clinical observation, a prior A&P. If a sentence cannot be traced, it shouldn't be there. When in doubt, generate less.

Audit yourself: does this change make the note more transcribed, or more reasoned? Only the second is legitimate. Adding more knowledge sources to the LLM prompt is the wrong instinct — more input means more material to hallucinate with. The right answer is almost always to surface the knowledge on the doctor's screen, not in the LLM output.

§ in flight · known gaps

Rev 2 was cut against fc47d49f on 2026-07-10 — 707 commits after rev 1. First, what rev 1 listed as broken that is now closed; then what a reader making architectural decisions today should know.

Closed since rev 1 — the CCE routing gap (terse updates falling to a raw new-problem residual) is fixed by the semantic fragment router, live since 2026-06-30. The "Problem Board is not the default view" gap was superseded rather than fixed: Episode Tapestry replaced the entire legacy Problems-column default (ENABLE_TAPESTRY_PRIMARY, live). Formulary has its first real screen-side surface (antimicrobial-coverage card, zero LLM calls, live). Overnight-data auto-load is built end-to-end (idempotent, deduped, output-capped). The organizational middle phase rev 1 called "zero code" became the Note Evolution takeover (§ 09) — a different, shipped answer to the same question.

Recommendation semantics + event-time are live but UAT-pending. The 4-way badge enum and clinical_effective_at shipped 2026-07-10 and are serving in prod, but the physician has not yet run a real propose round against them. Treat the semantics as code-verified, not practice-verified, until that UAT lands.
Epic@Avera P1 is gated on vendor review, not code. P0 — C-CDA chart upload → parse → ingest (POST /api/v1/import/ccda, ENABLE_CCDA_IMPORT) — is live. P1, SMART-on-FHIR standalone read-only access, is a 3-6 month horizon gated on Avera's security review, BAA, and vendor-vs-internal-tool classification. The long pole is institutional, not engineering. docs/plans/2026-06-02-000-EPIC-AVERA-fhir-chart-export.md.
Note Evolution matcher hardening is queued. The semantic router fixed the routing cliff, but the A&P-target matcher still has hardening work: composite headings (AFib + CHF + HTN) resolve as A&P-only bands without a Course lane, and missing/duplicate targets stay acceptance-blocked by design. Rail/takeover save-path unification (site 5) and the ontology sprint follow.
compile_course model["knots"] dead-code bug open. Known, documented, not yet fixed; the Tapestry weave renders from the live ledger path so the doctor-facing surface is unaffected.
Formulary breadth is one card. The antimicrobial-coverage card proves the screen-side knowledge pattern; the broader guidelines/calculators/preferences vision from the blueprint is not built. Product decision, not engineering gap — the data is already routable per-problem.

§ limits & counted-with

"~80% deterministic" is the design target stated in the file header of services/clinical_update_router.py — not yet measured in production telemetry. The router itself is deterministic-only; the proportion of fragments handled without an LLM in real practice is the open metric.
"<5ms router latency" is also a design target, not a measurement. The deterministic path has no I/O — match against in-memory ontology dicts — so the order of magnitude is right; the precise distribution is unmeasured.
Problem-detection F1=0.90 is upstream literature (Zhang 2025 · n=5,118 utterances · Omaha System framework), cited as the architectural target somaCURA's two-phase detection is designed against, not as a measurement of this codebase.
Hybrid-note approval rate 79% is upstream literature (Hack 2025 · n=20 · 10 blinded reviewers · 8-domain scoring). It motivates the physician-in-the-loop architecture, not internal data.
Calculator count: 27 named calculate_* methods in clinical/calculators/clinical_calculator.py (28 at rev 1; a consolidation pass merged one). The broader "30+" framing in the research doc includes the acid_base_analyzer.py internal methods, the 5-component priority_scorer.py, the flag-gated renal cockpit, and per-generator scoring helpers.
Problem lens count: 39, verified by the file's own header comment in clinical/domain/problem_lenses.py ("39 entries, mirrors problem-lenses.js exactly").
Canonical lab definitions: 204, counted via grep -cE '^\s+"[A-Z][A-Za-z0-9_]*":\s*_lab\(' clinical/domain/reference_data.py. LAB_REFERENCE_RANGES is a tighter subset; the research doc's "150+" is conservative.
Default model is claude-opus-4-6 (verified in core/settings.py:232); the scoped tier — reasoning tree, voice tier-2, IR extraction, Note Evolution amendments, semantic fragment routing — rides claude-sonnet-5; Haiku is banned by user direction (2026-04-24).
Feature-flag discipline is itself an architectural claim: 78 admin-panel flag overrides are active in production, re-applied at startup and winning over env defaults. Every rev-2 surface (takeover, Tapestry, semantic routing, C-CDA import, Formulary, TC/Admissions redesigns) is inert and byte-stable when its flag is off, so the rail daily driver never depends on new code.
Single-host deployment. No Docker, no Kubernetes, no orchestrator. One systemd unit, one process, one WAL-mode SQLite database, one ChromaDB on disk. Operational simplicity is part of the architectural argument — fewer moving parts means fewer places for the reasoning to drift.
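
The flag precedence described above (admin-panel overrides re-applied at startup and winning over env defaults) can be sketched roughly as follows. This is an illustration under stated assumptions, not somaCURA's implementation: the function names, the FLAG_ environment prefix, and the flag names are all hypothetical.

```python
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}


def env_defaults(environ: Mapping[str, str], prefix: str = "FLAG_") -> dict[str, bool]:
    """Parse boolean defaults from environment-style variables.

    The FLAG_ prefix is a hypothetical naming convention for this sketch.
    """
    return {
        key[len(prefix):].lower(): value.strip().lower() in TRUTHY
        for key, value in environ.items()
        if key.startswith(prefix)
    }


def resolve_flags(environ: Mapping[str, str], overrides: Mapping[str, bool]) -> dict[str, bool]:
    """Merge env defaults with stored overrides; overrides are applied last so they win."""
    flags = env_defaults(environ)
    flags.update(overrides)
    return flags


def is_enabled(flags: Mapping[str, bool], name: str) -> bool:
    """Unknown flags read as off, so an unflagged surface stays inert."""
    return flags.get(name, False)


# Illustrative values only; in practice environ would be os.environ and
# overrides would be loaded from wherever the admin panel persists them.
environ = {"FLAG_TAPESTRY": "true", "FLAG_SEMANTIC_ROUTING": "false", "PATH": "/usr/bin"}
overrides = {"tapestry": False, "ccda_import": True}

flags = resolve_flags(environ, overrides)
print(flags)
```

Running the merge once at startup keeps the precedence rule in a single place: a stored override always beats the env default, and a flag that appears in neither source reads as off.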

Scope

All @router.* + @app.* decorators in api/endpoints/*.py and somaNotes.py. Service files mapped from services/ · generators/ · clinical/ · hyperdrive/. Pillars cross-referenced against https://somacura.icu/static/infographics/beyond-the-ai-scribe.html.
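
A decorator census of this kind can be approximated in a few lines. The sketch below assumes that a route registration is any line beginning with @router.<name>( or @app.<name>(; the regex, the function names, and the sample source are illustrative assumptions, not the grep that was run at build time.

```python
import re
from pathlib import Path
from typing import Iterable

# Matches lines such as `@router.get(` or `@app.post(`. Note that it also
# matches non-route decorators like `@app.on_event(`, so treat the result
# as an upper bound unless those are filtered separately.
ROUTE_DECORATOR = re.compile(r"^\s*@(?:router|app)\.\w+\(", re.MULTILINE)


def count_routes_in_source(source: str) -> int:
    """Count decorator lines that look like route registrations."""
    return len(ROUTE_DECORATOR.findall(source))


def count_routes(paths: Iterable[Path]) -> dict[str, int]:
    """Map each file path to its decorator count."""
    return {
        str(path): count_routes_in_source(path.read_text(encoding="utf-8"))
        for path in paths
    }


SAMPLE = '''
from fastapi import APIRouter
router = APIRouter()

@router.get("/patients")
async def list_patients(): ...

@router.post("/patients")
async def create_patient(): ...

# @router.delete("/patients/{id}")  commented out, so not counted
def helper(): ...
'''

print(count_routes_in_source(SAMPLE))  # 2
```

Against a checkout, something like count_routes(Path("api/endpoints").glob("*.py")) would give a per-file breakdown to compare with the cluster table.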

Method

Codebase-grounded build. Every file path, line number, function name, and code snippet was verified by reading the file at the stated commit before being placed in the document. Rev 2 re-measured every count and re-resolved every line anchor against fc47d49f; drifted anchors (stream_generate 2096→2398, diff proposals 2598→2749, and others) were corrected rather than carried.

As-of · source

Rev 2 · 2026-07-10 · fc47d49f (production deploy line, 707 commits past rev 1's fab6a2c1) · 1,025 Python files · 237 census JS files · 72 knowledge files · 264 route registrations · 78 prod flag overrides · counts verified by live grep at build time.

Generated

Rev 1 by Claude Opus 4.7 following the infographic-gen skill; rev 2 update by Claude Fable 5, 2026-07-10. Single-file HTML · zero-build · zero dependencies · print-safe · reduced-motion aware. Drop on any web server.

Audit

compression: passed (v2 audit, 2026-04-25) · 14 of 15 redline items applied; deferred items logged in COMPRESSION DEFERRED block at end of source. Rev 2 adds § 09, two atlas clusters, one data silo, one service card; deletes nothing load-bearing.

somaCURA active routes by clinical domain
Cluster	Endpoints (count or route group)	Primary file	Role
Note Generation	8	generators/*.py	One endpoint per note type · all POST
Streaming Generation	5	api/endpoints/generation_stream.py	SSE · the active path
Census & Patients	25+	api/endpoints/patients.py + hyperdrive/router.py	Patient CRUD · chart · labs · meds · orders
Clinical Computation	scores · acid-base · contingency	scores.py · acid_base.py · clinical_charts.py	27 calculators wrapped as endpoints
Voice & Audio	21	voice_*.py · audio_transcription.py	Deepgram · iPhone QR · companion pairing
Transfer Center	workspace + voice + twilio	transfer*.py	135-rule routing advisor
Fragments & Diffs	workspace	fragments.py · diffs.py · editor.py	Fragment CRUD · evidence graph · compile
Reasoning Tree	3	reasoning_tree.py	D3 visualization · v3 generator
Timeline & Events	4	event_timeline.py · timeline.py · timeseries.py	uPlot trends · acuity heatmap
Chat & Decision Support	3	chat.py · cuti_decision.py	Peer-attending chat · CUTI
Diagnostics & AI Dashboard	15+	diagnostics.py · ai_dashboard.py · analytics.py	Health · model config · usage
Auth & Admin	10+	auth.py · users.py · admin_sessions.py	Login · session admin · metrics
Admissions & Workspace	flow	admissions.py · workspace.py	IR pipeline · note storage
Note Evolution	propose · finalize · bhc-rollforward	api/endpoints/notes.py	Living-note takeover · per-problem amendments
Chart Import & Knowledge Cards	ccda · formulary · renal	ccda_import.py · formulary.py · renal.py	Epic C-CDA ingest · screen-side knowledge
Total	264 active route registrations	48 endpoint files	Counted with grep at fc47d49f (rev 2)

somaCURA five pillars and their code references
Pillar	Name	Implementation	Research grounding
01	Problem-Oriented Clinical Course	generators/progress_generator.py · services/problem_ap_sync_service.py · hyperdrive/problems.py	Rodman 2023 (POMR as scientific framework) · Kahn 2018 (structured templates)
02	Deterministic Evidence Routing	services/clinical_update_router.py:328 · clinical/domain/problem_lenses.py (39 lenses) · hyperdrive/lab_mapping.py	Zhang 2025 (F1=0.90, n=5,118 utterances)
03	Knowledge-Augmented Compilation	rag.py (ChromaDB) · services/note_compiler.py:549 · knowledge/{guidelines,templates,variable}/	Chen 2025 (44% hallucination reduction) · He 2025 (Graph-RAG advantage)
04	Physician as Reasoning Engine	static/js/census/note-generation.js (NoteGenerationManager) · diff review at progress_generator.py:2749	Hack 2025 (79% hybrid vs 23% AI-only approval)
05	Embedded Clinical Computation	clinical/calculators/clinical_calculator.py (3,303 lines) · acid_base_analyzer.py (906 lines) · services/priority_scorer.py (1,407 lines) · clinical/domain/reference_data.py (1,679 lines)	Verified file/line counts at fc47d49f (rev 2)

Architecture-level comparison with code references
Dimension	Ambient AI Scribe	somaCURA	Implementation
Input source	Conversation audio	Structured fragments	services/clinical_update_router.py
When LLM runs	Continuously	At finalization, scoped per-problem	services/note_compiler.py:279 (_build_problem_prompt)
Problem awareness	None	Full problem list with status evolution	core/models.py:Problem + ap_history
Clinical context	Current conversation	Longitudinal · prior A&P · trajectories	generators/progress_generator.py
Evidence provenance	None	Per-problem evidence graph · stable IDs	services/evidence_graph.py:108 (stable_prov_id)
Knowledge base	LLM training data	Versioned guidelines + protocols (RAG)	rag.py · knowledge/{guidelines,variable}/
Clinical calculations	None (LLM may guess)	27 calculators + acid-base + 5-component acuity	clinical/calculators/clinical_calculator.py
Doctor's role	Proofreader	Reasoning director · 3 decision points	NoteGenerationManager + EhrFinalModal
Hallucination surface	Full note	Per-problem prose only	services/note_compiler.py (_format_evidence_by_source)
Privacy model	Ambient microphone	Intentional input · zero default audio	services/audio_transcription_service.py (opt-in)
Determinism	0%	~80% deterministic routing	services/clinical_update_router.py (target)
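
The "Determinism" row can be pictured with a toy router: a minimal sketch of dictionary-based fragment routing that makes no LLM or I/O calls, plus a helper that measures what share of fragments it handles. The ontology entries, the function names, and the matching rule are hypothetical and are not taken from services/clinical_update_router.py.

```python
import re
from typing import Optional, Sequence

# Hypothetical in-memory ontology: lowercase term -> problem label.
ONTOLOGY: dict[str, str] = {
    "creatinine": "Acute kidney injury",
    "urine output": "Acute kidney injury",
    "lactate": "Sepsis",
    "troponin": "NSTEMI",
}

# Compile once at import time; longer terms come first so that
# multi-word phrases are preferred over shorter overlapping terms.
_PATTERN = re.compile(
    "|".join(rf"\b{re.escape(term)}\b" for term in sorted(ONTOLOGY, key=len, reverse=True)),
    re.IGNORECASE,
)


def route_fragment(fragment: str) -> Optional[str]:
    """Return the problem label for the first ontology term found, or None."""
    match = _PATTERN.search(fragment)
    if match is None:
        return None  # unrouted: a candidate for a non-deterministic path
    return ONTOLOGY[match.group(0).lower()]


def deterministic_share(fragments: Sequence[str]) -> float:
    """Fraction of fragments routed without leaving the in-memory path."""
    if not fragments:
        return 0.0
    routed = sum(route_fragment(f) is not None for f in fragments)
    return routed / len(fragments)


fragments = [
    "Creatinine 2.1, up from 1.4",
    "Lactate cleared to 1.8",
    "Family meeting at 3pm",
    "Troponin flat",
    "Slept well",
]

print(route_fragment(fragments[0]))    # Acute kidney injury
print(deterministic_share(fragments))  # 0.6
```

Logging the routed and unrouted counts per encounter, as deterministic_share does over a list, is one way the stated target could eventually be compared with a measured figure.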