Neura: HIPAA-Compliant Mental Health Navigation Agent
The demo passes easily. The approval review is the harder bar. Neura was built to clear both. SaaStoAgent delivered a governed healthcare agent that holds its boundary across intake, matching, booking, payment, and follow-up — with safety holding structural override authority and every turn producing inspectable evidence.
The Problem
Mental-health navigation is the kind of workflow where a convincing demo and a passing approval review are two different bars. A patient in distress needs a system that asks the right questions, holds the right boundary, surfaces a credentialed match, books the appointment, takes payment, and follows up — without ever drifting into clinical interpretation, ignoring a crisis signal, or producing an output a reviewer cannot reconstruct.
What the person navigating care needs
- One coherent conversation that carries from intake through booking, payment, and follow-up — not five disconnected tools
- A provider match grounded in clinical fit, insurance, and availability, with a rationale they can read
- Immediate, structured response when distress signals appear, not delayed manual triage
- Confidence that the system stays inside its role and hands off to a person when the workflow demands it
What the governed platform owes
- A PHI boundary that survives the BAA review, not just the architecture diagram
- Safety authority that is structural — enforced by the workflow, not by prompt instruction
- Inspectable evidence for every turn: which gates fired, which tool ran, which state was written
- A release that can be expanded later because the first slice exercised the full control path
Compliance Envelope
Neura's first design decision was not an agent. It was a PHI boundary. Before any prompt was written, the team applied for a Business Associate Agreement covering the OpenAI API organization that would handle protected health information, and attached the Healthcare Addendum to it. The signed agreement set the constraints the rest of the system was built inside; every architecture choice in the sections below treats those clauses as inputs, not as legal formalities.
What the signed BAA + Healthcare Addendum bind
- Covered API organization: the BAA names the specific OpenAI org ID that may process PHI. Other organizations are out of scope and routed away at the boundary.
- Allowed model surfaces: only the API capabilities listed in the addendum are usable inside the PHI boundary. Logging, training opt-outs, and data-handling terms are pinned to those surfaces.
- Customer-side obligations: identity, consent, patient matching, clinical review, state-law fit, role design, and output validation remain Neura's responsibility, which is why the sections that follow exist.
- Evidence path: incident response, audit, and termination handling are framed against the same recorded state the architecture produces at runtime.
Delivery Model and Role Contract
A use case framed as a workflow is governable. A use case framed as a chat persona is a conversational surface with no operating boundary. Neura was scoped as a workflow from day one: navigate a person seeking therapy from intake through a confirmed booking, with safety and review as structural participants. The risk tier, role boundary, authority level, readiness gates, and escalation design were committed before any architecture was drawn.
Risk tier classification
| Dimension | Neura's classification |
|---|---|
| Data sensitivity | PHI — mental-health context, identifiers, scheduling, payment |
| Consequence of incorrect output | High — mis-routing a person in distress has real-world impact |
| Proximity to clinical judgment | Adjacent; the agent matches and routes, it does not diagnose or treat |
| Autonomy level | Bounded action: schedule, invoice, notify — within phase-gated tool authority |
| Human review | Required for high-risk and crisis tiers; available for any flagged turn |
Role boundary
The agent may
- Conduct supportive, structured intake conversations
- Extract a Client Signal from unstructured input
- Query the provider directory and produce a ranked, explainable shortlist
- Hold appointment slots, confirm bookings, and generate invoices via integrated tools
- Send pre-session and post-session reminders within consented channels
The agent must not
- Diagnose, interpret symptoms clinically, or recommend treatment
- Replace a clinician, crisis hotline, or emergency response path
- Continue normal flow when safety state indicates high risk or crisis
- Invoke a tool outside the current workflow phase or without its required gates
- Write to memory or state without producing a replayable audit record
Authority level, readiness, and escalation
- Authority ceiling: Recommends with review for provider matching; Performs bounded operational actions for booking, invoicing, and notifications. The ceiling was declared before tool integration so it could be enforced structurally rather than rhetorically.
- Readiness gates: matching does not run until intake produces a complete Client Signal; booking does not run until a provider is selected and safety state is clear; invoicing does not run until booking is confirmed.
- Escalation design: each trigger names the user-facing behavior, the system action (state write, route change, paused tool), the human owner, and the audit requirement. Escalation rules live in the workflow, not in the prompt.
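The three readiness gates listed above can be sketched as plain predicates over workflow state. This is a minimal illustration, not Neura's implementation: `WorkflowState`, its field names, and `REQUIRED_SIGNAL_FIELDS` are assumptions introduced here, since the source only says the Client Signal must be "complete".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical field names; the source does not define what "complete" means.
REQUIRED_SIGNAL_FIELDS = ("presenting_concerns", "preferences", "logistics")


@dataclass
class WorkflowState:
    client_signal: dict
    selected_provider: Optional[str] = None
    safety_clear: bool = False
    booking_confirmed: bool = False


def matching_ready(state: WorkflowState) -> bool:
    # Matching waits until every required Client Signal field is populated.
    return all(state.client_signal.get(f) for f in REQUIRED_SIGNAL_FIELDS)


def booking_ready(state: WorkflowState) -> bool:
    # Booking waits for a selected provider and a clear safety state.
    return state.selected_provider is not None and state.safety_clear


def invoicing_ready(state: WorkflowState) -> bool:
    # Invoicing waits for a confirmed booking.
    return state.booking_confirmed
```

Because each gate is a function of recorded state rather than of prompt text, it can be unit-tested and versioned on its own.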
First Care-Flow Slice
The first release was narrow but complete. A slice that exercises only part of the control system leaves governance gaps that become expensive to close after launch; a slice that walks end-to-end proves every control layer is real at the cheapest possible point. Neura's first slice covered the full path a person travels from "I need help" to a confirmed, paid session with reminders queued.
Conversational intake
Adaptive dialogue captures presenting concerns, preferences, logistics, and prior therapy context. The interface is supportive, not a form.
Signal extraction and safety evaluation
Intake is parsed into a structured Client Signal. Safety evaluation runs on every turn and sets the risk tier that governs what the rest of the workflow may do.
Provider matching with rationale
The matching layer combines Provider Genome, Client Signal, and Profile Context into a ranked shortlist. Each recommendation carries the reasons it ranked where it did.
Selection, booking, payment
Once a provider is chosen, phase-gated tools handle slot confirmation, booking, invoice generation, and payment via a third-party gateway.
Reminders and post-session follow-up
Firebase Cloud Messaging sends contextual pre- and post-session prompts, with rescheduling routed back through the booking phase, not improvised.
Recorded state, replay-ready
Every turn produces a snapshot: the inputs the agent saw, the gates that fired, the tools that ran, the state that was written. The slice was reviewable end-to-end before it shipped.
Four-Layer Architecture
Starting from prompts makes the prompt the de-facto architecture, with safety, state, and permissions scattered across instruction text. Neura was built on four distinct layers instead. Each has a single concern, a defined input and output, and can be tested and changed independently of the others.
Agent reasoning
Decides what the next action should be, given the assembled context. Has no direct authority to invoke tools or write state.
Workflow gating
Decides whether the proposed action is permitted in the current phase, against the current safety state, with the required readiness conditions met.
Bounded execution
Runs the permitted action through a typed tool contract. No side effects exist outside this layer.
State recording
Captures the per-turn snapshot — inputs, decisions, gate outcomes, tool results — for replay, review, and audit.
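One way to picture the four layers is a single turn function in which each layer is a separate callable. The sketch below is an assumption-laden illustration: `run_turn`, `Snapshot`, and the shape of the `reason`, `gate`, and `tools` arguments are hypothetical names, not part of Neura's codebase.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Snapshot:
    context: dict
    proposed_action: str
    gate_allowed: bool
    gate_reason: str
    tool_result: Any = None


def run_turn(context, reason, gate, tools, log):
    # Layer 1, agent reasoning: proposes an action, never calls a tool itself.
    action = reason(context)
    # Layer 2, workflow gating: decides whether the proposal is permitted.
    allowed, why = gate(action, context)
    # Layer 3, bounded execution: only permitted actions reach a tool.
    result = tools[action](context) if allowed else None
    # Layer 4, state recording: the turn is captured whether or not it ran.
    snap = Snapshot(context, action, allowed, why, result)
    log.append(snap)
    return snap
```

The point of the separation is that a blocked proposal still produces a snapshot, so a reviewer can see what the agent asked for and why the gate refused it.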
Governance Zones
Layers name the concerns. Governance zones name the owners and show how authority flows between them. In Neura, the core workflow zone sits inside a safety zone that holds structural override authority, a compliance zone that pins the PHI boundary, and a review zone that runs alongside the workflow rather than downstream of it.
Core workflow zone
Owns intake, matching, booking, payment, and follow-up. Operates under the constraints set by the surrounding zones.
Safety zone
Owns risk-tier evaluation and override authority. Can make matching and booking structurally unavailable above defined thresholds.
Compliance zone
Owns the PHI boundary, consent records, and data-handling clauses inherited from the BAA and Healthcare Addendum.
Review zone
Owns clinician-adjacent oversight: provider verification, escalation handling, and sign-off on behavior changes.
Seven Control Planes
Control planes are the design-time vocabulary the team uses to discuss what the system must enforce. Each plane names one category of constraint and maps to a runtime enforcement point. When a constraint is violated, the answer to "which plane caught it?" is concrete — not "the prompt."
Safety plane
Evaluates risk every turn; holds structural override authority over matching, booking, and routing.
Compliance plane
Pins the PHI boundary, consent capture, and data-handling rules inherited from the BAA and addendum.
Context plane
Governs which session, profile, and safety inputs are assembled into the structured context the agent reasons over.
Tool plane
Pins each tool to its phase, its gate requirements, and its audit shape. Out-of-phase tool calls are unavailable.
Review plane
Defines who handles escalation, who approves recommendations, and who signs off on behavior changes.
Audit plane
Owns the per-turn snapshot, state-recording schema, and replay hooks that make every turn inspectable.
Release plane
Owns versioned prompts, gate thresholds, tool contracts, and the evidence pack that gates a release.
Phase Boundaries and Runtime Gates
A prompt tells the agent what it should do; a phase boundary determines what it is allowed to do. Each Neura workflow phase exposes a different set of tools, requires a different set of gates, and produces a different shape of recorded state. The three phases below mark workflow transitions, not feature buckets.
Discovery phase — intake to shortlist
Tools available in this phase: directory query, eligibility filter, matching engine. Tools structurally unavailable: booking, invoicing, payment. Readiness gate: the Client Signal must be complete enough to score. Safety gate: risk tier must be standard or elevated; high-risk and crisis paths bypass matching and route into the escalation lane.
Conversational intake
Adaptive dialogue collects presenting concerns, preferences, logistics, and prior therapy context. Output: a structured Client Signal.
Per-turn safety evaluation
Risk tier is recomputed every turn against a versioned safety policy. The tier governs which downstream phases remain available.
Eligibility and matching
Provider directory is filtered by insurance, location, and credential. The Matching Orchestrator produces a ranked shortlist with rationale.
Shortlist with explainability
Each recommendation carries which factors contributed, how they were weighted, and why alternatives were deprioritized.
Selection and booking phase
Tools available: slot lookup, hold, confirm booking, telehealth link provisioning via VideoSDK. Gate requirements: provider selected from the shortlist, identity confirmed, consent recorded, safety state still at the standard or elevated tier.
Provider selection
The person chooses a provider from the explained shortlist. Real-time availability is checked against the booking system.
Slot confirmation
Available slots are presented across both schedules. Holding a slot is a tool call gated by the consent record.
Booking confirmation
The booking tool commits the session, provisions the VideoSDK link, and writes a confirmed-state record both parties can rely on.
Invoicing, payment, and follow-up phase
Tools available: invoice generation, payment-gateway capture, reminder dispatch via Firebase Cloud Messaging. Gate requirements: booking is confirmed; payment status is reconciled before reminders engage; rescheduling routes back through the booking phase rather than improvising a new path.
Invoice generation
Invoice is produced from session type, provider rate, and insurance applicability. The tool contract pins which fields are written and where.
Payment capture
Payment is processed via a third-party gateway. Retries, failures, and receipts are reconciled into recorded state, not held only in the gateway.
Reminders and follow-up
Pre- and post-session messages are dispatched through Firebase Cloud Messaging on consented channels, with rescheduling looped back to the booking phase.
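The three phases described in this section can be sketched as a phase-to-tool map plus a small transition table. This is a plain-Python illustration under stated assumptions: the tool identifiers and the event names (`shortlist_ready`, `booking_confirmed`, `reschedule_requested`) are hypothetical labels for the tools and transitions the text lists, and the sketch does not use LangGraph's API.

```python
from enum import Enum


class Phase(Enum):
    DISCOVERY = "discovery"
    BOOKING = "selection_and_booking"
    PAYMENT_FOLLOWUP = "invoicing_payment_followup"


# Hypothetical identifiers for the tools each phase exposes.
PHASE_TOOLS = {
    Phase.DISCOVERY: {"directory_query", "eligibility_filter", "matching_engine"},
    Phase.BOOKING: {"slot_lookup", "slot_hold", "confirm_booking", "provision_video_link"},
    Phase.PAYMENT_FOLLOWUP: {"generate_invoice", "capture_payment", "dispatch_reminder"},
}

# Hypothetical event names; rescheduling loops back to the booking phase.
TRANSITIONS = {
    (Phase.DISCOVERY, "shortlist_ready"): Phase.BOOKING,
    (Phase.BOOKING, "booking_confirmed"): Phase.PAYMENT_FOLLOWUP,
    (Phase.PAYMENT_FOLLOWUP, "reschedule_requested"): Phase.BOOKING,
}


def tool_available(phase: Phase, tool: str) -> bool:
    # A tool outside the current phase's set is simply not offered.
    return tool in PHASE_TOOLS[phase]


def next_phase(phase: Phase, event: str) -> Phase:
    # Unknown events leave the phase unchanged instead of improvising a path.
    return TRANSITIONS.get((phase, event), phase)
```

Keeping the map as data means a reviewer can read which tools exist in which phase without reading any prompt.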
Safety Override Authority and Risk Tiers
Safety evaluation runs every turn against a policy versioned independently of the navigator prompt. It holds authority the agent cannot override: when risk reaches the high or crisis threshold, matching, booking, and routing become structurally unavailable, regardless of what the prompt or the user input asks for.
Risk tiers and what each tier structurally permits
Crisis
Normal flow is halted. The person is routed into the escalation lane with crisis resources surfaced immediately and a human contact path engaged.
High risk
Matching is restricted to appropriately credentialed providers; booking requires reviewer-on-the-loop. Recorded state flags the turn for clinical sign-off.
Elevated
Full workflow remains available, with matching weights adjusted to prioritize providers with relevant experience. Safety-state writes are visible to review.
Standard
Full provider pool, standard matching weights, all phase-gated tools available subject to their own readiness and consent gates.
Why the authority is structural, not instructional
- Prompt-level constraints require the agent's cooperation to take effect; structural authority does not.
- Risk-tier policy is versioned, reviewed, and approved as its own artifact — not edited inside navigator prompts.
- Escalation triggers commit a state write, a route change, and a human owner; none of those are inferable from chat text alone.
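A structural override can be illustrated as a policy table that is consulted before any action, independent of prompt or user text. The sketch follows the per-tier descriptions above; `TierPolicy`, its field names, and `permitted` are assumptions made for illustration, and the real policy artifact may be shaped differently.

```python
from dataclasses import dataclass
from enum import IntEnum


class RiskTier(IntEnum):
    STANDARD = 0
    ELEVATED = 1
    HIGH = 2
    CRISIS = 3


@dataclass(frozen=True)
class TierPolicy:
    normal_flow: bool             # False routes the person to the escalation lane
    provider_pool: str            # "full", "credentialed_only", or "none"
    booking_needs_reviewer: bool
    prioritize_experience: bool   # adjusted matching weights


POLICY = {
    RiskTier.STANDARD: TierPolicy(True, "full", False, False),
    RiskTier.ELEVATED: TierPolicy(True, "full", False, True),
    RiskTier.HIGH: TierPolicy(True, "credentialed_only", True, True),
    RiskTier.CRISIS: TierPolicy(False, "none", True, False),
}


def permitted(tier: RiskTier, action: str, reviewer_present: bool = False) -> bool:
    policy = POLICY[tier]
    if not policy.normal_flow:
        # Crisis: nothing in the normal flow runs, whatever the input asks for.
        return False
    if action == "booking" and policy.booking_needs_reviewer:
        return reviewer_present
    return True
```

Note that `permitted` takes no prompt or message text as input, which is what makes the authority structural in this sketch.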
Matching Logic and Explainability
Provider matching is structured reasoning over a deliberately assembled context object, not keyword search and not free-form LLM output. The matching layer reads from three authority-classified inputs and produces a shortlist whose rationale can be read by the person navigating care and by a reviewer auditing the turn.
Structured matching inputs
- Provider Genome: specializations, modalities, experience, availability, populations served, session formats — durable profile data, baseline trust.
- Client Signal: extracted needs, severity indicators, preferences, constraints, prior-therapy history — session-derived signal.
- Patient Profile Context: longitudinal history when present, governed by consent and the compliance plane.
What the matching layer produces
- Therapeutic Fit Engine: scores compatibility across clinical, logistical, and preference dimensions.
- Matching Orchestrator: combines inputs in declared precedence and produces a ranked shortlist.
- Explainability record: every recommendation carries which factors contributed, how they were weighted, and why alternatives were deprioritized.
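A ranked shortlist with an explainability record might look like the weighted-sum sketch below. The weights, the three dimension names, and the `rank` and `explain` helpers are all hypothetical; the source does not publish how the Therapeutic Fit Engine actually scores, so this only illustrates the idea of carrying per-factor contributions alongside each score.

```python
from dataclasses import dataclass

# Hypothetical dimension weights; the source does not state the real values.
WEIGHTS = {"clinical": 0.5, "logistical": 0.3, "preference": 0.2}


@dataclass
class Recommendation:
    provider_id: str
    score: float
    contributions: dict  # dimension -> weighted contribution to the score


def rank(providers: dict) -> list:
    """providers maps provider_id -> {dimension: fit value in [0, 1]}."""
    recs = []
    for pid, fit in providers.items():
        contrib = {d: round(WEIGHTS[d] * fit.get(d, 0.0), 3) for d in WEIGHTS}
        recs.append(Recommendation(pid, round(sum(contrib.values()), 3), contrib))
    return sorted(recs, key=lambda r: r.score, reverse=True)


def explain(rec: Recommendation, top: Recommendation) -> str:
    # Name the dimension where this provider trails the top-ranked one most.
    gaps = {d: top.contributions[d] - rec.contributions[d] for d in WEIGHTS}
    weakest = max(gaps, key=gaps.get)
    return f"{rec.provider_id}: ranked below {top.provider_id} mainly on {weakest} fit"
```

Storing `contributions` with each recommendation is what lets both the person and a reviewer read why an alternative was deprioritized.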
Bounded Execution and Memory Scope
The action surface of Neura lives in typed tool contracts, not in agent reasoning. Each tool declares the phase that allows it, the gate conditions that must be true before it runs, and the audit shape it leaves behind. Memory is partitioned by authority class so that safety state cannot be displaced by session chatter, and durable profile data is not silently re-derived from one conversation.
Tool contracts pin three properties
- Phase boundary: the tool is only callable inside its declared phase; out-of-phase calls produce a defined blocked state.
- Gate requirements: safety state, readiness, consent, and identity conditions that must hold before execution.
- Audit shape: every execution leaves a typed record — inputs, outputs, side effects, and downstream state writes.
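The three pinned properties can be sketched as a small contract type and an `execute` wrapper that returns an audit record in every case. `ToolContract`, `execute`, the gate names, and the record fields are assumptions for illustration; they show one way an out-of-phase call could yield a defined blocked state rather than an exception.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolContract:
    name: str
    phase: str                   # phase boundary: the only phase that may call it
    gates: tuple                 # gate names that must all be true
    run: Callable[[dict], Any]   # the bounded side effect


def execute(contract: ToolContract, phase: str, gate_state: dict, args: dict) -> dict:
    """Return an audit record; blocked calls are recorded, not raised."""
    if phase != contract.phase:
        return {"tool": contract.name, "status": "blocked", "reason": "out_of_phase"}
    failed = [g for g in contract.gates if not gate_state.get(g, False)]
    if failed:
        return {"tool": contract.name, "status": "blocked",
                "reason": "gates_failed:" + ",".join(failed)}
    return {"tool": contract.name, "status": "executed",
            "inputs": args, "outputs": contract.run(args)}
```

A slot hold gated on consent and identity, for example, would be declared once as a `ToolContract` and could not run from the discovery phase regardless of what the agent proposed.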
Memory partitioned by authority class
- Session-derived signal accumulates per turn and is the lowest-trust input.
- Durable profile data loads from persistent store with baseline trust and is not overwritten from chat.
- Safety state holds override authority; the assembly gate integrates these classes in declared precedence.
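Declared precedence across the three authority classes can be illustrated with a merge in which higher-authority classes overwrite lower ones and every value keeps its provenance. The function name, the key names, and the exact ordering of the `PRECEDENCE` tuple are assumptions drawn from the description above, not a documented Neura interface.

```python
# Authority classes in ascending precedence; later classes win on conflict.
PRECEDENCE = ("session", "profile", "safety")


def assemble_context(session: dict, profile: dict, safety: dict) -> dict:
    sources = {"session": session, "profile": profile, "safety": safety}
    values, provenance = {}, {}
    for cls in PRECEDENCE:
        for key, value in sources[cls].items():
            # A higher-authority class overwrites anything set by a lower one.
            values[key] = value
            provenance[key] = cls
    return {"values": values, "provenance": provenance}
```

With this ordering, a risk tier mentioned in chat cannot displace the tier held in safety state, and a profile field cannot be silently replaced by a session-derived guess.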
Prompt Discipline and Versioned Behavior
Neura's prompts are behavioral specifications, not content. A prompt change is a behavior change, reviewed and versioned alongside gate thresholds, tool contracts, and the safety policy. Treating prompt edits as content edits is how systems drift invisibly between reviews; Neura's release plane refuses that path.
What the release plane versions together
- Navigator and supporting prompts
- Safety policy and risk-tier thresholds
- Phase definitions and tool contracts
- Context-assembly rules and authority precedence
- Escalation rules and named human owners
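One simple way to version these artifacts together is to hash them as a single bundle, so any edit to any member changes the release identifier. This is a sketch under assumptions: the bundle keys, the example values, and the use of a SHA-256 fingerprint are illustrative choices, not a description of how Neura's release plane is implemented.

```python
import hashlib
import json


def release_fingerprint(bundle: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so the same artifacts
    # always produce the same hash regardless of dictionary ordering.
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical bundle contents, one key per artifact group listed above.
bundle = {
    "prompts": {"navigator": "v3"},
    "safety_policy": {"version": "2"},
    "phases_and_tools": {"discovery": ["directory_query"]},
    "context_rules": {"precedence": ["session", "profile", "safety"]},
    "escalation_rules": {"crisis": {"owner": "on_call_reviewer"}},
}
release_id = release_fingerprint(bundle)
```

Under this scheme a prompt edit cannot ship as a quiet content change, because it yields a different `release_id` that must be reviewed like any other behavior change.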
State Recording and Replay
Recording is designed into the system before the first interaction is processed; it is not bolted on later. Every turn produces a snapshot the team and reviewers can replay end-to-end: the assembled context the agent saw, the gate decisions that fired, the tool that ran, and the state written. Without that record, quality assurance becomes argument from screenshots.
Replay-ready by design
- LangSmith traces LLM calls, token usage, and response quality against the snapshot.
- Decision lineage runs from intake input through provider recommendation to booking confirmation in a single replay.
- Audit records are shaped to support clinical review and incident response — not just engineering debugging.
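Decision lineage can be illustrated by reading per-turn snapshots back in order for one session. The sketch assumes snapshots are stored as JSON Lines with hypothetical `session_id`, `turn`, `tool`, and `gate` fields; it does not use LangSmith's API and says nothing about how Neura actually persists its records.

```python
import json


def lineage(snapshots_jsonl: str, session_id: str) -> list:
    """Return the ordered (turn, tool, gate outcome) trail for one session."""
    trail = []
    for line in snapshots_jsonl.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the log
        snap = json.loads(line)
        if snap["session_id"] == session_id:
            trail.append((snap["turn"], snap["tool"], snap["gate"]))
    # Sort by turn number so out-of-order writes still replay in sequence.
    return sorted(trail, key=lambda item: item[0])
```

A reviewer handed this trail can follow a session from intake to booking confirmation without asking engineering to reconstruct it from application logs.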
Persona QA and the Evidence Pack
The release gate for the first Neura slice was not a passing demo. It was an evidence pack: a set of realistic personas walked through the full workflow, with every turn's snapshot, gate outcome, tool call, and state write inspected against the role boundary. A slice was releasable because that pack produced clean evidence, not because a meeting went well.
How personas were designed
- Coverage across risk tiers: standard, elevated, high-risk, crisis — each persona pinned to the structural path it should take.
- Coverage across phase transitions: intake-only, intake-to-match, match-to-book, book-to-pay, post-session reschedule.
- Edge personas designed to probe role drift: requests for diagnosis, requests to bypass consent, requests to act outside phase.
What the evidence pack proves
- The agent stayed inside its boundary on every turn, or the system blocked it — with a recorded reason.
- Safety authority fired where it should and only where it should; risk-tier thresholds were neither over- nor under-tuned.
- Every tool call had its required gates, every state write had its audit record, every escalation reached its named owner.
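A persona check of this kind can be sketched as a function that compares recorded snapshots against the path a persona is pinned to. `Persona`, `check_persona`, the tool names, and the snapshot fields are hypothetical; the sketch only illustrates the three claims above as mechanical assertions over recorded evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str
    risk_tier: str
    expected_path: tuple   # tools expected to execute, in order
    forbidden: frozenset   # tools that must never execute for this persona


def check_persona(persona: Persona, snapshots: list) -> list:
    """Return a list of findings; an empty list means clean evidence."""
    findings = []
    executed = tuple(s["tool"] for s in snapshots if s["status"] == "executed")
    if executed != persona.expected_path:
        findings.append(f"path mismatch: {executed}")
    for s in snapshots:
        if s["status"] == "executed" and s["tool"] in persona.forbidden:
            findings.append(f"forbidden tool ran: {s['tool']}")
        if s["status"] == "blocked" and not s.get("reason"):
            findings.append(f"blocked without recorded reason: {s['tool']}")
    return findings
```

A crisis persona, for instance, would pin `expected_path` to the escalation tool alone and list matching and booking as forbidden, so a blocked matching attempt with a recorded reason counts as clean evidence.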
Technology Stack
Neura's stack was chosen for what it allows the team to enforce, not just for what it allows the agent to do. LangGraph carries the phase machine; LangSmith carries the replay surface; the rest of the stack is shaped to the same audit and recovery requirements.
Outcome
Neura cleared the harder bar: a release that survives an approval review, not just a demo. The first slice was narrow, end-to-end, and produced inspectable evidence on every turn — the conditions that let the team expand the workflow afterward without rebuilding the control system to do it.
For the person navigating care
- One coherent conversation from intake through booking, payment, and follow-up.
- Safer by structure: distress signals change what the system can do, not just what it says.
- Explained recommendations: every match comes with a readable reason.
For the governed product
- BAA-bounded from day one, with the PHI boundary encoded into architecture and tool contracts.
- Phase-gated, replay-ready: every turn is inspectable; clinical and compliance review work from the same record engineering does.
- Expandable without retrofit: the first slice exercised the full control path, so new workflows extend the system rather than rebuild it.
Build agents that pass the approval review, not just the demo.
SaaStoAgent delivers governed agentic systems: BAA-bounded, phase-gated, safety-overridden, and replay-ready — designed for regulated domains from the first slice.