Agentic Process Intelligence · Healthcare Distribution
From Anomaly to Action
How AI agents detect, reason, and act — reliably. A 5-stage pipeline for enterprise agentic process automation with calibrated confidence.
STAGE 1–2
Detect
→
STAGE 3
Reason
→
STAGE 4
Validate
→
STAGE 5
Execute
→
CONTINUOUS
Learn
Key Design Principle
The LLM only touches Stages 3 and 4. It never owns the facts (Stages 1–2) and never directly executes system actions (Stage 5). This separation is what makes the system auditable.
Cases Analyzed
—
order traces
Events Mined
—
raw log entries
Anomalous Cases
—
process deviations
Max Dwell Gap
—
minutes
AI-Generated Executive Headline
Annual Recoverable Value
Conservative
—
—
Upside
—
Pattern Distribution
Stage 1–2: Anomaly Detection & Root Cause
Deterministic process mining over raw event logs. No LLM involved. Every fact in the payload is a deterministic output from log data.
STEP 1
Ingest Events
ERP/WMS emits events with timestamps, resources, quantities. Events are grouped by case ID and sorted chronologically.
STEP 2
Build Process Graph
Extract activity sequences, compute transition counts. The process graph shows how cases actually flow — not how they should.
STEP 3
Conformance Check
Compare each case to the most frequent path. Detect loops, dwell gaps, quantity mismatches. Flag deviations with evidence.
STEP 4
Anomaly Payload
Package all signals as structured JSON — anomaly type, severity, evidence event IDs. This payload is what the LLM will receive.
Critical
By this point, no LLM has been involved. Every fact in the payload is a deterministic output from log data. The LLM receives evidence — it does not discover it.
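The four steps above can be sketched end-to-end in a few lines. This is a minimal illustration, not the production miner: the event rows, activity names, and the 120-minute dwell threshold are all made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical event rows: (case_id, activity, timestamp_minutes)
events = [
    ("ORD-0001", "Order Created", 0),
    ("ORD-0001", "Pick", 30),
    ("ORD-0001", "Pack", 45),
    ("ORD-0001", "Ship", 60),
    ("ORD-0002", "Order Created", 0),
    ("ORD-0002", "Pick", 30),
    ("ORD-0002", "Pick", 200),   # repeated activity with a large dwell gap
    ("ORD-0002", "Pack", 220),
    ("ORD-0002", "Ship", 240),
]

# Step 1: group by case ID and sort chronologically
cases = defaultdict(list)
for case_id, activity, ts in sorted(events, key=lambda e: (e[0], e[2])):
    cases[case_id].append((activity, ts))

# Step 2: count transitions to build the process graph
transitions = Counter()
for trace in cases.values():
    for (a, _), (b, _) in zip(trace, trace[1:]):
        transitions[(a, b)] += 1

# Step 3: flag deviations (loops, dwell gaps) with evidence
DWELL_THRESHOLD = 120  # minutes; illustrative
anomalies = []
for case_id, trace in cases.items():
    for (a, t1), (b, t2) in zip(trace, trace[1:]):
        if a == b:
            anomalies.append({"case": case_id, "type": "loop", "activity": a})
        if t2 - t1 > DWELL_THRESHOLD:
            anomalies.append({"case": case_id, "type": "dwell_gap",
                              "minutes": t2 - t1})
```

Step 4 is then just serializing `anomalies` (plus severity and evidence event IDs) into the structured JSON payload; nothing in this pipeline consults an LLM.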
Process Flow Diagram
Case-Level Conformance Check
Structured Payload Sent to LLM
This is exactly what the LLM agent receives — a pre-computed, schema-validated JSON object. No raw event rows. Only analytical signals.
Most Frequent Path (Happy Path)
The "happy path" is not a prescribed or correct sequence — it is the most frequently observed path in the data. It represents how the majority of cases actually flow through the system, derived empirically from event log frequencies.
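A minimal sketch of how the happy path falls out of variant frequencies. The traces here are hypothetical; the point is that the happy path is computed, not configured.

```python
from collections import Counter

# Hypothetical case traces (activity sequences), already mined from the log
traces = {
    "ORD-0001": ("Order Created", "Pick", "Pack", "Ship"),
    "ORD-0002": ("Order Created", "Pick", "Pack", "Ship"),
    "ORD-0003": ("Order Created", "Pick", "Pick", "Pack", "Ship"),
}

# A "variant" is a distinct activity sequence; the happy path is simply
# the variant observed most often -- an empirical fact, not a prescription.
variant_counts = Counter(traces.values())
happy_path, freq = variant_counts.most_common(1)[0]
coverage = freq / len(traces)
```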
All Transitions (sorted by frequency)
Variant Explorer
Explore process variants — select one or more to see their flow and compare deviations from the most frequent path.
0% cases covered
Variant
Count
Coverage
Pattern
Selected variant flow
Select a variant to view its process flow
Stage 3: LLM Reasoning
The LLM receives the structured anomaly payload and produces schema-validated JSON using function calling — not free text.
Without Function Calling
"Based on the following events, determine the cause and recommend an action."
Model returns free text. Any action string is possible. No schema enforcement. No auditability. The model can hallucinate actions that don't exist in your system.
With Function Calling
The model must return a schema-validated JSON object: the action field is an enum, and rationale_ids must reference real event IDs.
Why This Matters
Function calling enforces the action space at the API contract level. The model cannot return an action outside the enum — the API call fails structurally before any downstream system is touched.
Expected Output Schema
{
  "findings": [{
    "pattern_id": "ghost_pick | cold_chain | dea_rework | credit_cascade | otif_fail",
    "title": "short executive-facing title (max 8 words)",
    "affected_cases": ["ORD-XXXX"],
    "root_cause": "operational explanation (2-3 sentences)",
    "downstream_consequence": "business and compliance impact",
    "process_intelligence_signal": "specific PI signal that reveals this",
    "recommended_action": "one concrete, actionable fix",
    "urgency": "critical | high | medium"
  }],
  "executive_summary": {
    "headline": "one sentence, dollar-impact framing",
    "key_insight": "the single most important non-obvious finding"
  }
}
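The primary enforcement happens in the provider's function-calling schema, but the same contract can be re-checked on the receiving side before anything proceeds. A minimal sketch, assuming a hypothetical action enum and a known-event-ID set:

```python
# Assumed action enum and evidence check; names are illustrative, not the
# document's actual action set.
ALLOWED_ACTIONS = {"hold_shipment", "issue_credit_memo", "open_qa_ticket"}

def validate_finding(finding: dict, known_event_ids: set) -> list[str]:
    """Reject any output whose action or cited evidence falls outside the contract."""
    errors = []
    if finding.get("recommended_action") not in ALLOWED_ACTIONS:
        errors.append("action outside enum")
    bad_ids = [e for e in finding.get("rationale_ids", [])
               if e not in known_event_ids]
    if bad_ids:
        errors.append(f"unknown event ids: {bad_ids}")
    return errors
```

Anything returned by `validate_finding` fails the call structurally, before any downstream system is touched.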
Run the Analysis
Send the structured PI payload to the LLM agent. The agent reasons over anomaly signals and returns structured findings in JSON.
Findings will appear here after running the analysis.
Stage 4: Confidence Calibration & Routing
An LLM's self-reported confidence is just another generated token. We replace it with a calibrated score derived from verifiable, external signals.
The Confidence Problem
A model can be 94% confident and completely wrong. Confidence without calibration is worse than no confidence — it suppresses human review on exactly the cases that need it.
The Miscalibration Gap
What the LLM says | What the data shows | Gap
Self-reported: 0.91 | Actual accuracy: 0.63 | -0.28 (dangerous)
Self-reported: 0.72 | Actual accuracy: 0.71 | -0.01 (calibrated)
Self-reported: 0.55 | Actual accuracy: 0.80 | +0.25 (under-confident)
Four Independent Calibration Checks
CHECK 1
Faithfulness
RAGAS / TruLens: is each claim in the LLM's rationale supported by the provided event data? If the LLM says the invoice was posted before GR but the log shows the opposite, score drops sharply.
CHECK 2
Re-ranker
Cross-encoder model scores how relevant each cited event ID is to the anomaly type. Catches hallucinated but syntactically valid rationale_ids that aren't causally related.
CHECK 3
Sampling
Run the same case 5 times at temperature 0.7. Measure semantic entropy over action choices. A model that flips between actions is less trustworthy than one that's consistent.
CHECK 4
Meta-Model
XGBoost classifier trained on resolved cases. Inputs: LLM confidence, faithfulness, relevance, entropy, structural features. Output: calibrated probability that the action is correct.
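Check 3's consistency signal can be sketched as entropy over the sampled action choices. This version scores exact action strings; production systems typically cluster semantically equivalent outputs first.

```python
import math
from collections import Counter

def action_entropy(sampled_actions: list[str]) -> float:
    """Shannon entropy (bits) over actions chosen across repeated samples.
    0.0 means the model is perfectly consistent; higher means it flips."""
    counts = Counter(sampled_actions)
    n = len(sampled_actions)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Illustrative: the same case run 5 times at temperature 0.7
consistent = action_entropy(["hold_shipment"] * 5)
flippy = action_entropy(["hold_shipment", "issue_credit_memo",
                         "hold_shipment", "open_qa_ticket",
                         "issue_credit_memo"])
```

`consistent` comes out as 0.0 bits, while `flippy` is well above 1 bit: a strong down-weighting signal for the meta-model.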
The Key Insight
You cannot calibrate an LLM's confidence from inside the LLM. You calibrate it from the outside, using evidence the LLM cannot fabricate. The meta-model's output probability replaces the LLM's self-reported score.
Three-Way Routing
> 0.85
Auto-Execute
Action passed to deterministic execution layer. Full audit trail logged.
0.60 – 0.85
Human-in-the-Loop
Case presented to analyst with evidence, rationale, and confidence breakdown. One-click approve or override.
< 0.60
Escalation
Routed to a senior analyst. LLM output is shown as a suggestion only, not a recommendation.
Dollar-Value Override
Regardless of confidence, transactions above a defined threshold always require human approval. Non-negotiable for compliance.
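The routing rule, including the dollar-value override, is a few lines of deterministic code. The $10,000 threshold below is illustrative; the real value is a compliance decision.

```python
def route(calibrated_confidence: float, amount_usd: float,
          dollar_threshold: float = 10_000.0) -> str:
    """Three-way routing driven by calibrated confidence, with a hard
    dollar-value override that ignores the score entirely."""
    if amount_usd > dollar_threshold:
        return "human_in_the_loop"   # non-negotiable, regardless of confidence
    if calibrated_confidence > 0.85:
        return "auto_execute"
    if calibrated_confidence >= 0.60:
        return "human_in_the_loop"
    return "escalation"
```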
Routing Simulation
After running the LLM analysis (Stage 3), each finding is scored and routed. This simulation shows how calibrated confidence drives the routing decision.
Run the LLM analysis in Stage 3 first to see routing simulation.
Stage 5: Bounded Execution
When the routing decision is 'auto-execute', the action intent passes to a deterministic orchestration layer — not directly to the ERP. This is the last line of defence.
GATE 1
Precondition Check
Business rules run independently of the LLM. "A credit memo cannot be issued if the original invoice is already paid." Enforced in code, not in a prompt.
GATE 2
Reversibility Flag
Irreversible actions (permanent blocks, large write-offs, supplier blacklisting) are always flagged and require second human confirmation — regardless of confidence.
GATE 3
Execute
Deterministic API call to the target system (ERP, WMS). The LLM's recommendation has already been validated. This is a direct system operation, not a prompt.
GATE 4
Audit Log
Every action — auto-executed, analyst-approved, or overridden — is logged with: timestamp, action, case_id, confidence score, who approved, outcome.
The LLM's Role Ended at Stage 3
From here, a deterministic orchestration layer validates and executes. Every action is pre-checked, flagged for reversibility, and logged. The LLM's recommendation is ignored if preconditions are not met.
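A minimal sketch of the four gates in order. The rule set, action names, and case fields are all hypothetical; the point is that every branch is plain code and every path lands in the audit log.

```python
from datetime import datetime, timezone

# Gate 2's irreversible-action list; assumed names for illustration
IRREVERSIBLE_ACTIONS = {"permanent_block", "supplier_blacklist", "large_write_off"}

def execute_action(action: str, case: dict, audit_log: list) -> str:
    """Run the gates in order; the LLM's recommendation is only an input."""
    # Gate 1: precondition check, enforced in code and independent of the LLM
    if action == "issue_credit_memo" and case.get("invoice_paid"):
        outcome = "rejected_precondition"
    # Gate 2: irreversible actions require a second human confirmation
    elif action in IRREVERSIBLE_ACTIONS and not case.get("second_approval"):
        outcome = "held_for_second_approval"
    else:
        # Gate 3: deterministic API call to the target system would go here
        outcome = "executed"
    # Gate 4: every path, including rejections, is logged
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "case_id": case["case_id"],
        "confidence": case.get("confidence"),
        "outcome": outcome,
    })
    return outcome
```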
Simulated Audit Trail
Each finding from the LLM analysis flows through the execution pipeline. Below shows the audit trail for each action.
Run the LLM analysis in Stage 3 first to see the audit trail.
Feedback Loop: The System That Improves Itself
Every resolved case — auto-resolved, analyst-approved, or overridden — is ground truth. This data continuously improves every component.
LOOP 1
Case Resolved
Action taken (by agent or analyst) is recorded with outcome: correct, overridden, or escalated.
LOOP 2
Ground Truth Label
Was the LLM's recommended action correct? This becomes a training row for the meta-model.
LOOP 3
Meta-Model Retrain
XGBoost classifier updated monthly on new labelled cases. Calibration thresholds reviewed and adjusted.
If the override rate rises above 5%, routing thresholds tighten; if it drops below 1%, they can relax.
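The threshold-adjustment rule can be sketched as a small deterministic function; the step size and bounds here are illustrative.

```python
def adjust_thresholds(override_rate: float, auto_threshold: float) -> float:
    """Move the auto-execute threshold based on the observed override rate.
    Step size (0.02) and bounds (0.80-0.95) are illustrative choices."""
    if override_rate > 0.05:
        auto_threshold = min(0.95, auto_threshold + 0.02)   # tighten
    elif override_rate < 0.01:
        auto_threshold = max(0.80, auto_threshold - 0.02)   # relax
    return auto_threshold
```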
What Separates an Agentic Pipeline from a Chatbot
The system has memory, learns from outcomes, and self-calibrates. The goal is not a system that is always right. The goal is a system that knows when it is likely to be wrong — and routes those cases to a human before acting.
Simulated Improvement Over Time
Model Accuracy
87% → 93%
after 3 monthly retrains
False Escalation Rate
12% → 5%
fewer unnecessary human reviews
Auto-Resolve Rate
41% → 68%
more cases handled autonomously
Override Rate
8% → 2.1%
analyst corrections declining
Key Takeaways
The Problem | The Engineering Response
LLM confidence is self-reported | Replace with a meta-model trained on ground truth outcomes
Action space is open-ended | Enforce via function calling schema, not prompting
Rationale can hallucinate | Validate rationale_ids against the event log before acting
High-value actions need humans | Dollar-value overrides are non-negotiable for audit
System improves over time | Every resolved case is training data; close the loop
ROI Opportunity
Quantified value at risk across all detected process failure patterns. Low / high scenario modeling.
Assumptions
Pattern-Level Breakdown
Pattern
Primary Driver
Annual Low
Annual High
Raw Event Log
The source of truth — flat, unlabeled, no annotations. Process Intelligence derives all findings from this data.