Chapter 5

Hallucinations, Confidence, and Validation

Learning Objective

Learn why hallucinations occur, how to reduce them, and how confidence scoring supports safe routing.

What it means

A hallucination occurs when an LLM generates information that sounds correct but is not supported by the provided data, retrieved documents, or business rules. Hallucinations are dangerous because the response may sound confident even when it is wrong.

Why it matters

LLMs generate probable text. They are not databases and do not inherently verify truth. Hallucinations can occur due to missing context, ambiguous questions, stale knowledge, weak prompts, poor retrieval, or asking the model to answer beyond the evidence provided.
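Two of the causes above, missing context and asking the model to answer beyond the evidence, can be reduced (not eliminated) by putting the evidence directly in the prompt and giving the model an explicit way to decline. The sketch below shows one way to do that. The function name `build_grounded_prompt`, the evidence dictionary keys, and the `INSUFFICIENT_EVIDENCE` sentinel are all hypothetical choices for illustration, not part of any specific library.

```python
# Hypothetical helper: builds a prompt that restricts the model to supplied evidence.
def build_grounded_prompt(question: str, evidence: list[dict]) -> str:
    """Return a prompt that tells the model to answer only from numbered evidence."""
    # Number each evidence passage so the model can cite it, e.g. [1].
    sources = "\n".join(
        f"[{i}] ({doc['source_id']}) {doc['text']}"
        for i, doc in enumerate(evidence, start=1)
    )
    return (
        "Answer the question using ONLY the evidence below.\n"
        "Cite the evidence number for every claim, e.g. [1].\n"
        # An explicit abstain option, so the model is not pushed to guess.
        "If the evidence does not contain the answer, reply exactly: INSUFFICIENT_EVIDENCE\n\n"
        f"Evidence:\n{sources}\n\n"
        f"Question: {question}\n"
    )

evidence = [
    {"source_id": "policy-4.1", "text": "Procedure X requires prior authorization."},
]
print(build_grounded_prompt("Does procedure X need prior authorization?", evidence))
```

Prompt instructions like these only lower the odds of an unsupported answer; the output still has to be checked by the validation steps later in this chapter.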

Healthcare Example

A clinical assistant is asked whether a procedure meets a policy rule. If the model says 'approved under policy section 4.2' but no such section exists, that is a hallucination. In healthcare, this can create compliance, patient safety, and financial risk.
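A hallucinated citation like this can be caught mechanically by checking every cited section against the sections that actually exist in the policy. The sketch below assumes citations are written as "section N.N" and that the set of real section IDs is available; `KNOWN_SECTIONS` and `find_unsupported_citations` are hypothetical names with made-up sample data.

```python
import re

# Hypothetical set of section IDs that actually exist in the policy document.
KNOWN_SECTIONS = {"4.1", "4.3", "5.2"}

# Matches citations written like "section 4.2" or "Section 10.15".
SECTION_PATTERN = re.compile(r"\bsection\s+(\d+(?:\.\d+)*)", re.IGNORECASE)

def find_unsupported_citations(answer: str, known_sections: set[str]) -> list[str]:
    """Return cited section IDs that do not exist in the policy."""
    cited = SECTION_PATTERN.findall(answer)
    return [section for section in cited if section not in known_sections]

answer = "Approved under policy section 4.2."
unsupported = find_unsupported_citations(answer, KNOWN_SECTIONS)
if unsupported:
    # Any unsupported citation should block automatic approval.
    print("Hallucinated citations:", unsupported)
```

A check like this only proves the section exists; whether the section actually supports the decision still needs retrieval-based or human verification.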

Architecture Flow

Question → Retrieve trusted evidence → LLM generates answer → Validate citations and JSON → Confidence score → Auto-response or human review
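The flow above can be sketched end to end in a few functions. Everything here is a hypothetical stand-in: `retrieve_evidence` returns canned documents instead of querying a search index, `generate_answer` returns fixed JSON instead of calling an LLM, and the extraction score is a placeholder. The weights and thresholds mirror the confidence-scoring code in this chapter.

```python
import json

# Hypothetical stand-ins for real components (search index, LLM client,
# extraction checks). Replace each with your own implementation.

def retrieve_evidence(question):
    """Return (documents, retrieval_score) from a trusted source."""
    docs = [{"id": "4.1", "text": "Procedure X requires prior authorization."}]
    return docs, 0.96

def generate_answer(question, docs):
    """Return the raw model output; here a canned JSON string."""
    return '{"answer": "Prior authorization is required.", "citations": ["4.1"]}'

def validate_output(raw_output, docs):
    """Return (parsed_output, validation_score): 1.0 only for valid JSON whose
    citations all refer to retrieved documents, otherwise 0.0."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return None, 0.0
    if not isinstance(data, dict):
        return None, 0.0
    doc_ids = {doc["id"] for doc in docs}
    citations = data.get("citations")
    if not isinstance(citations, list) or not citations:
        return data, 0.0
    if all(isinstance(c, str) and c in doc_ids for c in citations):
        return data, 1.0
    return data, 0.0

def route_for(score):
    """Same thresholds as the routing in the confidence-scoring code."""
    if score >= 95:
        return "auto_process"
    if score >= 80:
        return "review_queue"
    return "human_review"

def answer_question(question):
    docs, retrieval_score = retrieve_evidence(question)
    raw_output = generate_answer(question, docs)
    data, validation_score = validate_output(raw_output, docs)
    extraction_score = 0.92  # hypothetical placeholder for an extraction-quality check
    confidence = 0.4 * retrieval_score + 0.3 * extraction_score + 0.3 * validation_score
    score = round(confidence * 100, 2)
    return data, score, route_for(score)

print(answer_question("Does procedure X need prior authorization?"))
```

Keeping each stage as a separate function makes it possible to log and audit every intermediate score, which matters when a routing decision has to be explained later.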

Code: Confidence Scoring

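def calculate_confidence(retrieval_score, extraction_score, validation_score):
    """Combine component scores (each 0.0 to 1.0) into a 0-100 confidence score."""
    for name, value in (("retrieval_score", retrieval_score),
                        ("extraction_score", extraction_score),
                        ("validation_score", validation_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Illustrative weights; they sum to 1.0, so the result stays within 0-100.
    confidence = (0.4 * retrieval_score) + (0.3 * extraction_score) + (0.3 * validation_score)
    return round(confidence * 100, 2)

score = calculate_confidence(0.96, 0.92, 1.0)
if score >= 95:
    route = "auto_process"   # High confidence: process without manual review
elif score >= 80:
    route = "review_queue"   # Medium confidence: queue for reviewer verification
else:
    route = "human_review"   # Low confidence: route to full human review

print(score, route)  # 96.0 auto_process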
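Written out, the function is a weighted sum scaled to 0-100. Using R, E, and V as shorthand (our notation, not the code's) for the retrieval, extraction, and validation scores, the example call works out as follows:

```latex
\begin{aligned}
\text{score} &= 100\,(0.4R + 0.3E + 0.3V) \\
&= 100\,(0.4 \times 0.96 + 0.3 \times 0.92 + 0.3 \times 1.0) \\
&= 100\,(0.384 + 0.276 + 0.300) = 96.0
\end{aligned}
```

Because 96.0 is at least 95, the example routes to auto_process. The weights also have a useful consequence: if validation fails completely (V = 0), the score can be at most 100 × (0.4 + 0.3) = 70, which is below 80 and therefore always goes to human_review.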

Common Mistakes

Trusting the LLM's self-reported confidence.
Assuming a bigger model eliminates hallucinations.
Not requiring source citations.
Skipping the validation layer.
Sending low-confidence outputs onward without human review.
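Instead of asking the model how confident it is, a validation score can be derived from deterministic checks on its output. The sketch below scores a JSON response as the fraction of checks it passes. The required fields in `REQUIRED_FIELDS`, the function name `validation_score`, and the choice of checks are hypothetical; a real system would define them from its own output schema.

```python
import json

# Hypothetical required fields (and types) for a policy-decision response.
REQUIRED_FIELDS = {"decision": str, "rationale": str, "citations": list}

def validation_score(raw_output: str, retrieved_ids: set[str]) -> float:
    """Return the fraction of deterministic checks passed (0.0 to 1.0).

    Any confidence value the model reports about itself is deliberately ignored.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return 0.0  # Unparseable output fails every check.
    if not isinstance(data, dict):
        return 0.0

    citations = data.get("citations")
    citations = citations if isinstance(citations, list) else []
    checks = [
        # 1. Every required field is present with the expected type.
        all(isinstance(data.get(key), typ) for key, typ in REQUIRED_FIELDS.items()),
        # 2. At least one citation is given.
        len(citations) > 0,
        # 3. Every citation points to a document that was actually retrieved.
        bool(citations) and all(isinstance(c, str) and c in retrieved_ids for c in citations),
    ]
    return sum(checks) / len(checks)

good = '{"decision": "approved", "rationale": "Meets 4.1", "citations": ["4.1"]}'
bad = '{"decision": "approved", "rationale": "Meets 4.2", "citations": ["4.2"], "self_confidence": 0.99}'
print(validation_score(good, {"4.1"}), validation_score(bad, {"4.1"}))
```

In the second response the model claims 0.99 confidence, but the citation check fails, so the score drops to two out of three checks regardless of what the model says about itself.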

Interview Q&A

Q: What hallucination percentage is acceptable?

A: There is no universal percentage. For healthcare, the goal is near-zero unvalidated hallucinations through RAG, grounding, validation, and human review.
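Whatever target is chosen, it has to be measured. One possible way, sketched under the assumption that a labeled evaluation set exists in which reviewers marked each answer as supported or not, is to count answers that were unsupported and were not caught by validation or routed to review. The `EvalRecord` schema and `unvalidated_hallucination_rate` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EvalRecord:
    """One reviewed model answer from a labeled evaluation set (hypothetical schema)."""
    supported: bool             # reviewer judged every claim supported by evidence
    caught_by_validation: bool  # validation flagged it or routed it to human review

def unvalidated_hallucination_rate(records: list[EvalRecord]) -> float:
    """Share of answers that were unsupported AND slipped past validation."""
    if not records:
        raise ValueError("need at least one record")
    escaped = sum(1 for r in records if not r.supported and not r.caught_by_validation)
    return escaped / len(records)

records = [
    EvalRecord(supported=True, caught_by_validation=False),
    EvalRecord(supported=False, caught_by_validation=True),   # caught: not counted
    EvalRecord(supported=False, caught_by_validation=False),  # escaped: counted
    EvalRecord(supported=True, caught_by_validation=False),
]
print(unvalidated_hallucination_rate(records))
```

Tracking escaped hallucinations separately from raw hallucinations shows whether the validation and review layers, not just the model, are doing their job.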

Q: Does reducing hallucination cost more?

A: Usually, yes, because better retrieval, larger models, extra validation calls, monitoring, and human review all add cost. Architects balance accuracy, cost, and latency according to the risk of the use case.

Architect Takeaway

Do not solve hallucinations by model selection alone. Solve them through evidence retrieval, constraints, validation, routing, and governance.
