Neura: HIPAA-Compliant Mental Health Navigation Agent

The Problem

Mental-health navigation is the kind of workflow where a convincing demo and a passing approval review are two different bars. A patient in distress needs a system that asks the right questions, holds the right boundary, surfaces a credentialed match, books the appointment, takes payment, and follows up — without ever drifting into clinical interpretation, ignoring a crisis signal, or producing an output a reviewer cannot reconstruct.

What the person navigating care needs

One coherent conversation that carries from intake through booking, payment, and follow-up — not five disconnected tools
A provider match grounded in clinical fit, insurance, and availability, with a rationale they can read
Immediate, structured response when distress signals appear, not delayed manual triage
Confidence that the system stays inside its role and hands off to a person when the workflow demands it

What the governed platform owes

A PHI boundary that survives the BAA review, not just the architecture diagram
Safety authority that is structural — enforced by the workflow, not by prompt instruction
Inspectable evidence for every turn: which gates fired, which tool ran, which state was written
A release that can be expanded later because the first slice exercised the full control path

Compliance Envelope

Neura's first design decision was not an agent. It was a PHI boundary. Before any prompt was written, the team applied for a Business Associate Agreement covering the OpenAI API organization that would handle protected health information, and attached the Healthcare Addendum to it. The signed agreement set the constraints the rest of the system was built inside; every architecture choice in the sections below treats those clauses as inputs, not as legal formalities.

What the signed BAA + Healthcare Addendum bind

Covered API organization: the BAA names the specific OpenAI org ID that may process PHI. Other organizations are out of scope and routed away at the boundary.
Allowed model surfaces: only the API capabilities listed in the addendum are usable inside the PHI boundary. Logging, training opt-outs, and data-handling terms are pinned to those surfaces.
Customer-side obligations: identity, consent, patient matching, clinical review, state-law fit, role design, and output validation remain Neura's responsibility, which is why the role contract, gating, safety, and review sections below exist.
Evidence path: incident response, audit, and termination handling are framed against the same recorded state the architecture produces at runtime.
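
As a minimal sketch of how the covered-organization and allowed-surface clauses could become a runtime check, the snippet below screens a PHI-bearing model call before it leaves the boundary. The org ID, the surface names, and the `boundary_decision` helper are placeholders invented for illustration; the real values live in the signed documents.

```python
from dataclasses import dataclass

# Placeholder values: the real org ID and surface list come from the signed documents.
COVERED_ORG_ID = "org-PLACEHOLDER"
ALLOWED_SURFACES = frozenset({"surface_a", "surface_b"})


@dataclass(frozen=True)
class OutboundCall:
    """A PHI-bearing request that is about to be sent to a model API."""
    org_id: str
    surface: str


def boundary_decision(call: OutboundCall) -> str:
    """Return 'allow' or a 'block:<reason>' string for a PHI-bearing call."""
    if call.org_id != COVERED_ORG_ID:
        return "block:org_not_covered"
    if call.surface not in ALLOWED_SURFACES:
        return "block:surface_not_listed"
    return "allow"
```

Returning a reason string rather than raising keeps the blocked case recordable in the same way as an allowed one.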

                    

Delivery Model and Role Contract

A use case framed as a workflow is governable. A use case framed as a chat persona is a conversational surface with no operating boundary. Neura was scoped as a workflow from day one: navigate a person seeking therapy from intake through a confirmed booking, with safety and review as structural participants. The risk tier, role boundary, authority level, readiness gates, and escalation design were committed before any architecture was drawn.

Risk tier classification

Dimension	Neura's classification
Data sensitivity	PHI — mental-health context, identifiers, scheduling, payment
Consequence of incorrect output	High — mis-routing a person in distress has real-world impact
Proximity to clinical judgment	Adjacent; the agent matches and routes, it does not diagnose or treat
Autonomy level	Bounded action: schedule, invoice, notify — within phase-gated tool authority
Human review	Required for high-risk and crisis tiers; available for any flagged turn

Role boundary (paired lists)

The agent may

Conduct supportive, structured intake conversations
Extract a Client Signal from unstructured input
Query the provider directory and produce a ranked, explainable shortlist
Hold appointment slots, confirm bookings, and generate invoices via integrated tools
Send pre-session and post-session reminders within consented channels

The agent must not

Diagnose, interpret symptoms clinically, or recommend treatment
Replace a clinician, crisis hotline, or emergency response path
Continue normal flow when safety state indicates high risk or crisis
Invoke a tool outside the current workflow phase or without its required gates
Write to memory or state without producing a recorded, replayable audit record

Authority level, readiness, and escalation

Authority ceiling: Recommends with review for provider matching; Performs bounded operational actions for booking, invoicing, and notifications. The ceiling was declared before tool integration so it could be enforced structurally rather than rhetorically.
Readiness gates: matching does not run until intake produces a complete Client Signal; booking does not run until a provider is selected and safety state is clear; invoicing does not run until booking is confirmed.
Escalation design: each trigger names the user-facing behavior, the system action (state write, route change, paused tool), the human owner, and the audit requirement. Escalation rules live in the workflow, not in the prompt.
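
The readiness gates above can be sketched as a pure function over workflow state. This is an illustration only: the field names, action names, and reason strings are assumptions, not Neura's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class WorkflowState:
    client_signal_complete: bool = False
    provider_selected: bool = False
    safety_clear: bool = True
    booking_confirmed: bool = False


def readiness(action: str, state: WorkflowState) -> Tuple[bool, str]:
    """Return (permitted, reason) for a proposed workflow action."""
    if action == "match":
        if not state.client_signal_complete:
            return False, "client_signal_incomplete"
    elif action == "book":
        if not state.provider_selected:
            return False, "no_provider_selected"
        if not state.safety_clear:
            return False, "safety_state_not_clear"
    elif action == "invoice":
        if not state.booking_confirmed:
            return False, "booking_not_confirmed"
    else:
        # Anything outside the declared actions is refused by default.
        return False, "unknown_action"
    return True, "ready"
```

Because the function reads only recorded state and never the prompt, the same inputs always produce the same decision, which is what makes a gate outcome replayable.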

First Care-Flow Slice

The first release was narrow but complete. A slice that exercises only part of the control system leaves governance gaps that become expensive to close after launch; a slice that walks end-to-end proves every control layer is real at the cheapest possible point. Neura's first slice covered the full path a person travels from "I need help" to a confirmed, paid session with reminders queued.

1. Conversational intake

Adaptive dialogue captures presenting concerns, preferences, logistics, and prior therapy context. The interface is supportive, not a form.

2. Signal extraction and safety evaluation

Intake is parsed into a structured Client Signal. Safety evaluation runs on every turn and sets the risk tier that governs what the rest of the workflow may do.

3. Provider matching with rationale

The matching layer combines Provider Genome, Client Signal, and Profile Context into a ranked shortlist. Each recommendation carries the reasons it ranked where it did.

4. Selection, booking, payment

Once a provider is chosen, phase-gated tools handle slot confirmation, booking, invoice generation, and payment via a third-party gateway.

5. Reminders and post-session follow-up

Firebase Cloud Messaging sends contextual pre- and post-session prompts, with rescheduling routed back through the booking phase, not improvised.

6. Recorded state, replay-ready

Every turn produces a snapshot: the inputs the agent saw, the gates that fired, the tools that ran, the state that was written. The slice was reviewable end-to-end before it shipped.

Four-Layer Architecture

Starting from prompts makes the prompt the de-facto architecture, with safety, state, and permissions scattered across instruction text. Neura was built on four distinct layers instead. Each has a single concern, a defined input and output, and can be tested and changed independently of the others.

Agent reasoning

Decides what the next action should be, given the assembled context. Has no direct authority to invoke tools or write state.

Workflow gating

Decides whether the proposed action is permitted in the current phase, against the current safety state, with the required readiness conditions met.

Bounded execution

Runs the permitted action through a typed tool contract. No side effects exist outside this layer.

State recording

Captures the per-turn snapshot — inputs, decisions, gate outcomes, tool results — for replay, review, and audit.
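
One way to picture the four layers is as four functions called in a fixed order, with only the third allowed to cause a side effect. The sketch below is hypothetical: `reason` is a stub standing in for the model call, and every name in it is an assumption made for illustration.

```python
from typing import Any, Callable, Dict, List, Set


def reason(context: Dict[str, Any]) -> Dict[str, Any]:
    """Agent reasoning (stubbed): proposes an action but cannot run it."""
    return {"tool": context["requested_tool"], "args": context.get("args", {})}


def gate(proposal: Dict[str, Any], permitted_tools: Set[str]) -> bool:
    """Workflow gating: is the proposed tool permitted right now?"""
    return proposal["tool"] in permitted_tools


def execute(proposal: Dict[str, Any], tools: Dict[str, Callable[..., Any]]) -> Any:
    """Bounded execution: the only place a side effect may happen."""
    return tools[proposal["tool"]](**proposal["args"])


def run_turn(context: Dict[str, Any], permitted_tools: Set[str],
             tools: Dict[str, Callable[..., Any]], log: List[dict]) -> Any:
    proposal = reason(context)
    permitted = gate(proposal, permitted_tools)
    result = execute(proposal, tools) if permitted else None
    # State recording: one snapshot per turn, whether or not anything ran.
    log.append({"context": context, "proposal": proposal,
                "permitted": permitted, "result": result})
    return result
```

A blocked proposal still produces a log entry, so a reviewer can see what the agent asked for as well as what the system allowed.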

Governance Zones

Layers name the concerns. Governance zones name the owners and show how authority flows between them. In Neura, the core workflow zone operates inside three surrounding zones: a safety zone that holds structural override authority, a compliance zone that pins the PHI boundary, and a review zone that runs alongside the workflow rather than downstream of it.

Core workflow zone

Owns intake, matching, booking, payment, and follow-up. Operates under the constraints set by the surrounding zones.

Safety zone

Owns risk-tier evaluation and override authority. Can make matching and booking structurally unavailable above defined thresholds.

Compliance zone

Owns the PHI boundary, consent records, and data-handling clauses inherited from the BAA and Healthcare Addendum.

Review zone

Owns clinician-adjacent oversight: provider verification, escalation handling, and sign-off on behavior changes.

Seven Control Planes

Control planes are the design-time vocabulary the team uses to discuss what the system must enforce. Each plane names one category of constraint and maps to a runtime enforcement point. When a constraint is violated, the answer to "which plane caught it?" is concrete — not "the prompt."

Safety plane

Evaluates risk every turn; holds structural override authority over matching, booking, and routing.

Compliance plane

Pins the PHI boundary, consent capture, and data-handling rules inherited from the BAA and addendum.

Context plane

Governs which session, profile, and safety inputs are assembled into the structured context the agent reasons over.

Tool plane

Pins each tool to its phase, its gate requirements, and its audit shape. Out-of-phase tool calls are unavailable.

Review plane

Defines who handles escalation, who approves recommendations, and who signs off on behavior changes.

Audit plane

Owns the per-turn snapshot, state-recording schema, and replay hooks that make every turn inspectable.

Release plane

Owns versioned prompts, gate thresholds, tool contracts, and the evidence pack that gates a release.

Phase Boundaries and Runtime Gates

A prompt tells the agent what it should do; a phase boundary determines what it is allowed to do. Each Neura workflow phase exposes a different set of tools, requires a different set of gates, and produces a different shape of recorded state. The three phases below are workflow transitions, not feature buckets.

Discovery phase — intake to shortlist

Tools available in this phase: directory query, eligibility filter, matching engine. Tools structurally unavailable: booking, invoicing, payment. Readiness gate: the Client Signal must be complete enough to score. Safety gate: risk tier must be standard or elevated; high-risk and crisis paths bypass matching and route into the escalation lane.

1. Conversational intake

Adaptive dialogue collects presenting concerns, preferences, logistics, and prior therapy context. Output: a structured Client Signal.

2. Per-turn safety evaluation

Risk tier is recomputed every turn against a versioned safety policy. The tier governs which downstream phases remain available.

3. Eligibility and matching

Provider directory is filtered by insurance, location, and credential. The Matching Orchestrator produces a ranked shortlist with rationale.

4. Shortlist with explainability

Each recommendation carries which factors contributed, how they were weighted, and why alternatives were deprioritized.

Selection and booking phase

Tools available: slot lookup, hold, confirm booking, telehealth link provisioning via VideoSDK. Gate requirements: provider selected from the shortlist, identity confirmed, consent recorded, safety state still standard or elevated.

1. Provider selection

The person chooses a provider from the explained shortlist. Real-time availability is checked against the booking system.

2. Slot confirmation

Slots that are open on both the person's and the provider's schedules are presented. Holding a slot is a tool call gated by the consent record.

3. Booking confirmation

The booking tool commits the session, provisions the VideoSDK link, and writes a confirmed-state record both parties can rely on.

Invoicing, payment, and follow-up phase

Tools available: invoice generation, payment-gateway capture, reminder dispatch via Firebase Cloud Messaging. Gate requirements: booking is confirmed; payment status is reconciled before reminders engage; rescheduling routes back through the booking phase rather than improvising a new path.

Figure: Neura bookings and payment screen, showing confirmed sessions with the invoice and reminder lifecycle.

1. Invoice generation

Invoice is produced from session type, provider rate, and insurance applicability. The tool contract pins which fields are written and where.

2. Payment capture

Payment is processed via a third-party gateway. Retries, failures, and receipts are reconciled into recorded state, not held only in the gateway.

3. Reminders and follow-up

Pre- and post-session messages are dispatched through Firebase Cloud Messaging on consented channels, with rescheduling looped back to the booking phase.
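
The three phases above can be read as a lookup from phase to tool set. The sketch below assumes illustrative tool identifiers and a hypothetical `check_call` helper; it is not Neura's LangGraph definition, only a compact way to show that an out-of-phase call resolves to a defined blocked outcome rather than an exception.

```python
from enum import Enum


class Phase(Enum):
    DISCOVERY = "discovery"
    BOOKING = "selection_and_booking"
    BILLING = "invoicing_payment_followup"


# Tool identifiers are illustrative names for the tools listed in each phase.
PHASE_TOOLS = {
    Phase.DISCOVERY: frozenset({"directory_query", "eligibility_filter", "matching_engine"}),
    Phase.BOOKING: frozenset({"slot_lookup", "hold_slot", "confirm_booking", "provision_video_link"}),
    Phase.BILLING: frozenset({"generate_invoice", "capture_payment", "dispatch_reminder"}),
}


def check_call(phase: Phase, tool: str) -> dict:
    """Return a defined outcome for a tool call instead of raising."""
    if tool in PHASE_TOOLS[phase]:
        return {"status": "permitted", "tool": tool, "phase": phase.value}
    return {"status": "blocked", "reason": "out_of_phase",
            "tool": tool, "phase": phase.value}


def reschedule(phase: Phase) -> Phase:
    """Rescheduling from follow-up re-enters booking; other phases are unchanged."""
    return Phase.BOOKING if phase is Phase.BILLING else phase
```

Modelling rescheduling as a phase transition means the booking gates run again, instead of a follow-up tool acquiring booking authority on the side.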

Safety Override Authority and Risk Tiers

Safety evaluation runs every turn against a policy versioned independently of the navigator prompt. It holds authority the agent cannot override: when risk reaches the high or crisis threshold, the normal matching, booking, and routing paths become structurally unavailable, regardless of what the prompt or the user input asks for.

Risk tiers and what each tier structurally permits

Crisis

Normal flow is halted. The person is routed into the escalation lane with crisis resources surfaced immediately and a human contact path engaged.

High risk

Matching is restricted to appropriately credentialed providers; booking requires reviewer-on-the-loop. Recorded state flags the turn for clinical sign-off.

Elevated

Full workflow remains available, with matching weights adjusted to prioritize providers with relevant experience. Safety-state writes are visible to review.

Standard

Full provider pool, standard matching weights, all phase-gated tools available subject to their own readiness and consent gates.

Why the authority is structural, not instructional

Prompt-level constraints require the agent's cooperation to take effect; structural authority does not.
Risk-tier policy is versioned, reviewed, and approved as its own artifact — not edited inside navigator prompts.
Escalation triggers commit a state write, a route change, and a human owner; none of those are inferable from chat text alone.
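
A rough sketch of the tier list as data: each tier maps to what matching, booking, and routing may do, and the workflow consults the table rather than the prompt. The mode strings and the `can_book` helper are assumptions chosen to mirror the four tier descriptions above.

```python
from dataclasses import dataclass
from enum import IntEnum


class RiskTier(IntEnum):
    STANDARD = 0
    ELEVATED = 1
    HIGH = 2
    CRISIS = 3


@dataclass(frozen=True)
class TierPolicy:
    matching: str  # "full" | "experience_weighted" | "credential_restricted" | "unavailable"
    booking: str   # "available" | "reviewer_required" | "unavailable"
    route: str     # "normal" | "escalation"


POLICY = {
    RiskTier.STANDARD: TierPolicy("full", "available", "normal"),
    RiskTier.ELEVATED: TierPolicy("experience_weighted", "available", "normal"),
    RiskTier.HIGH: TierPolicy("credential_restricted", "reviewer_required", "normal"),
    RiskTier.CRISIS: TierPolicy("unavailable", "unavailable", "escalation"),
}


def can_book(tier: RiskTier, reviewer_approved: bool = False) -> bool:
    """Booking decision derived from the tier table alone."""
    booking = POLICY[tier].booking
    if booking == "available":
        return True
    if booking == "reviewer_required":
        return reviewer_approved
    return False
```

Neither the prompt nor the user's message is an input to `can_book`, which is the sense in which the authority is structural: there is nothing for the agent to talk its way around.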

Matching Logic and Explainability

Provider matching is structured reasoning over a deliberately assembled context object, not keyword search and not free-form LLM output. The matching layer reads from three authority-classified inputs and produces a shortlist whose rationale can be read by the person navigating care and by a reviewer auditing the turn.

Structured matching inputs

Provider Genome: specializations, modalities, experience, availability, populations served, session formats — durable profile data, baseline trust.
Client Signal: extracted needs, severity indicators, preferences, constraints, prior-therapy history — session-derived signal.
Patient Profile Context: longitudinal history when present, governed by consent and the compliance plane.

What the matching layer produces

Therapeutic Fit Engine: scores compatibility across clinical, logistical, and preference dimensions.
Matching Orchestrator: combines inputs in declared precedence and produces a ranked shortlist.
Explainability record: every recommendation carries which factors contributed, how they were weighted, and why alternatives were deprioritized.
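
To make the explainability record concrete, here is a small sketch of weighted scoring that keeps the per-dimension contributions alongside the total. The weights, dimension names, and the "largest shortfall against the top match" rule are hypothetical; the source does not publish how the Therapeutic Fit Engine actually scores.

```python
from typing import Dict, List

# Hypothetical weights; the real ones are not stated in this document.
WEIGHTS = {"clinical": 0.5, "logistical": 0.3, "preference": 0.2}


def explain(scores: Dict[str, float]) -> dict:
    """Weighted total plus the per-dimension contributions behind it."""
    contributions = {dim: WEIGHTS[dim] * scores[dim] for dim in WEIGHTS}
    return {"total": sum(contributions.values()), "contributions": contributions}


def shortlist(providers: Dict[str, Dict[str, float]]) -> List[dict]:
    """Rank providers and note why each alternative trails the top match."""
    ranked = sorted(
        ({"provider": name, **explain(scores)} for name, scores in providers.items()),
        key=lambda row: row["total"],
        reverse=True,
    )
    if not ranked:
        return []
    top = ranked[0]["contributions"]
    for row in ranked[1:]:
        # The dimension where this provider trails the top match by the most.
        row["deprioritized_on"] = max(
            WEIGHTS, key=lambda dim: top[dim] - row["contributions"][dim]
        )
    return ranked
```

Keeping `contributions` in the output means the readable rationale and the audit record are generated from the same numbers, not written separately after the fact.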

Bounded Execution and Memory Scope

The action surface of Neura lives in typed tool contracts, not in agent reasoning. Each tool declares the phase that allows it, the gate conditions that must be true before it runs, and the audit shape it leaves behind. Memory is partitioned by authority class so that safety state cannot be displaced by session chatter, and durable profile data is not silently re-derived from one conversation.

Tool contracts pin three properties

Phase boundary: the tool is only callable inside its declared phase; out-of-phase calls produce a defined blocked state.
Gate requirements: safety state, readiness, consent, and identity conditions that must hold before execution.
Audit shape: every execution leaves a typed record — inputs, outputs, side effects, and downstream state writes.
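
The three pinned properties could be expressed as a single typed contract plus one invocation path. The following is a sketch under assumptions: `ToolContract`, `invoke`, the gate names, and the status strings are invented for illustration and are not taken from Neura's codebase.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, Set


@dataclass(frozen=True)
class ToolContract:
    name: str
    phase: str                      # phase boundary
    required_gates: FrozenSet[str]  # gate requirements
    run: Callable[..., Any]         # the bounded action itself


def invoke(contract: ToolContract, current_phase: str,
           gates_passed: Set[str], **inputs: Any) -> dict:
    """Run the tool only if phase and gates allow it; always return an audit record."""
    record = {"tool": contract.name, "phase": current_phase, "inputs": inputs,
              "output": None, "missing_gates": []}
    if current_phase != contract.phase:
        record["status"] = "blocked:out_of_phase"
        return record
    missing = sorted(contract.required_gates - gates_passed)
    if missing:
        record["status"] = "blocked:gates_not_met"
        record["missing_gates"] = missing
        return record
    record["output"] = contract.run(**inputs)
    record["status"] = "executed"
    return record
```

Every branch returns a record with the same keys, so a blocked call and an executed call leave the same audit shape behind.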

Memory partitioned by authority class

Session-derived signal accumulates per turn and is the lowest-trust input.
Durable profile data loads from persistent store with baseline trust and is not overwritten from chat.
Safety state holds override authority; the assembly gate integrates these classes in declared precedence.
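
Declared precedence can be illustrated with a merge that applies the lowest-trust class first and lets higher-authority classes overwrite it. The class labels and the `assemble` function are hypothetical; the point of the sketch is only that a key's final value always comes from the most authoritative class that supplied it.

```python
from typing import Any, Dict, Tuple

# Lowest authority first; later classes overwrite earlier ones on key conflicts.
PRECEDENCE = ("session", "profile", "safety")


def assemble(session: Dict[str, Any], profile: Dict[str, Any],
             safety: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Merge the three memory classes and record where each value came from."""
    context: Dict[str, Any] = {}
    provenance: Dict[str, str] = {}
    for authority, data in zip(PRECEDENCE, (session, profile, safety)):
        for key, value in data.items():
            context[key] = value
            provenance[key] = authority
    return context, provenance
```

For example, if a chat turn claims a risk tier or an insurance plan that conflicts with safety state or the stored profile, the higher-authority value wins and the provenance map shows which class supplied it.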

Prompt Discipline and Versioned Behavior

Neura's prompts are behavioral specifications, not content. A prompt change is a behavior change, reviewed and versioned alongside gate thresholds, tool contracts, and the safety policy. Treating prompt edits as content edits is how systems drift invisibly between reviews; Neura's release plane refuses that path.

What the release plane versions together

Navigator and supporting prompts
Safety policy and risk-tier thresholds
Phase definitions and tool contracts
Context-assembly rules and authority precedence
Escalation rules and named human owners
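
One plausible way to version these artifacts together is to fingerprint a manifest that names all of them, so that a change to any single item yields a new release identifier. The manifest keys and version strings below are made up for illustration; only `hashlib` and `json` from the Python standard library are used.

```python
import hashlib
import json


def release_fingerprint(manifest: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the release manifest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical manifest: keys mirror the list above, versions are placeholders.
manifest = {
    "navigator_prompt": "v3",
    "safety_policy": "v2",
    "tool_contracts": "v5",
    "context_rules": "v1",
    "escalation_rules": "v4",
}
fingerprint = release_fingerprint(manifest)
```

Sorting keys before hashing makes the fingerprint independent of dictionary ordering, so two manifests with the same content always agree, while a prompt edit on its own is enough to change the identifier.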

                    

State Recording and Replay

Recording is designed into the system before the first interaction is processed; it is not bolted on later. Every turn produces a snapshot the team and reviewers can replay end-to-end: the assembled context the agent saw, the gate decisions that fired, the tool that ran, and the state written. Without that record, quality assurance becomes argument from screenshots.

Per-turn snapshot

The exact assembled context, prompt version, and policy version the agent reasoned against on each turn.

Gate decisions

Which gates evaluated, which passed, which blocked, and the inputs each gate used to decide.

State transitions

Each workflow state change — intake, matching, booking, payment — with timestamps and triggering conditions.

Safety interventions

Every risk-tier assignment, override decision, and escalation route, including the signals that triggered it.

Replay-ready by design

LangSmith traces LLM calls, token usage, and response quality against the snapshot.
Decision lineage runs from intake input through provider recommendation to booking confirmation in a single replay.
Audit records are shaped to support clinical review and incident response — not just engineering debugging.
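
A per-turn snapshot and a single-session replay might look like the sketch below. The field names are assumptions based on the items listed in this section, and the in-memory list stands in for whatever store actually holds the records; no LangSmith API is used or implied here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class TurnSnapshot:
    session_id: str
    turn: int
    prompt_version: str
    policy_version: str
    context: dict
    gate_decisions: dict
    tool_call: Optional[str]
    state_after: str
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def lineage(log: List[TurnSnapshot], session_id: str) -> List[str]:
    """Replay one session's state transitions in turn order."""
    turns = sorted(
        (snap for snap in log if snap.session_id == session_id),
        key=lambda snap: snap.turn,
    )
    return [snap.state_after for snap in turns]
```

Ordering by the recorded turn number rather than by insertion order means the replay is stable even if snapshots were written or loaded out of sequence.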

                    

Persona QA and the Evidence Pack

The release gate for the first Neura slice was not a passing demo. It was an evidence pack: a set of realistic personas walked through the full workflow, with every turn's snapshot, gate outcome, tool call, and state write inspected against the role boundary. A slice was releasable because that pack produced clean evidence, not because a meeting went well.

How personas were designed

Coverage across risk tiers: standard, elevated, high-risk, crisis — each persona pinned to the structural path it should take.
Coverage across phase transitions: intake-only, intake-to-match, match-to-book, book-to-pay, post-session reschedule.
Edge personas designed to probe role drift: requests for diagnosis, requests to bypass consent, requests to act outside phase.

What the evidence pack proves

The agent stayed inside its boundary on every turn, or the system blocked it — with a recorded reason.
Safety authority fired where it should and only where it should; risk-tier thresholds were neither over- nor under-tuned.
Every tool call had its required gates, every state write had its audit record, every escalation reached its named owner.
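
As a sketch of how an evidence pack could gate a release, each persona below is pinned to the structural path it should take, and the release check passes only if every observed path matches. The persona names, path labels, and the `run_workflow` callable are hypothetical stand-ins for the real workflow under test.

```python
from typing import Callable, Dict, List

# Hypothetical personas, one per risk tier, each pinned to its expected path.
PERSONAS = [
    {"name": "standard_intake_to_pay", "tier": "standard", "expected_path": "full_workflow"},
    {"name": "elevated_intake_to_match", "tier": "elevated", "expected_path": "full_workflow_adjusted_weights"},
    {"name": "high_risk_match_to_book", "tier": "high_risk", "expected_path": "restricted_matching_with_review"},
    {"name": "crisis_at_intake", "tier": "crisis", "expected_path": "escalation_lane"},
]


def build_evidence(personas: List[Dict[str, str]],
                   run_workflow: Callable[[str], str]) -> List[dict]:
    """Walk each persona through the workflow and record expected vs. observed."""
    rows = []
    for persona in personas:
        observed = run_workflow(persona["tier"])
        rows.append({
            "persona": persona["name"],
            "expected": persona["expected_path"],
            "observed": observed,
            "passed": observed == persona["expected_path"],
        })
    return rows


def releasable(rows: List[dict]) -> bool:
    """A release needs at least one persona and no failing rows."""
    return bool(rows) and all(row["passed"] for row in rows)
```

An empty pack is treated as not releasable, so skipping persona QA cannot be mistaken for passing it.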

Technology Stack

Neura's stack was chosen for what it allows the team to enforce, not just for what it allows the agent to do. LangGraph carries the phase machine; LangSmith carries the replay surface; the rest of the stack is shaped to the same audit and recovery requirements.

PostgreSQL: primary data store and audit records
Django: application framework
LangGraph / LangChain: phase state machine and gate authority
LangSmith: per-turn snapshot and replay
VideoSDK: telehealth session provisioning
Firebase Cloud Messaging: consented reminders and follow-up

Outcome

Neura cleared the harder bar: a release that survives an approval review, not just a demo. The first slice was narrow, end-to-end, and produced inspectable evidence on every turn — the conditions that let the team expand the workflow afterward without rebuilding the control system to do it.

For the person navigating care

One coherent conversation from intake through booking, payment, and follow-up.
Safer by structure: distress signals change what the system can do, not just what it says.
Explained recommendations: every match comes with a readable reason.

For the governed product

BAA-bounded from day one, with the PHI boundary encoded into architecture and tool contracts.
Phase-gated, replay-ready: every turn is inspectable; clinical and compliance review work from the same record engineering does.
Expandable without retrofit: the first slice exercised the full control path, so new workflows extend the system rather than rebuild it.

Build agents that pass the approval review, not just the demo.