Chapter 14

Healthcare Data Protection, PHI, and Responsible AI

Learning Objective

Understand how to protect sensitive healthcare data in GenAI systems.

What it means

Healthcare GenAI systems may process protected health information (PHI) such as names, dates of birth, member IDs, diagnoses, medications, clinical notes, and provider details. Protecting this data requires technical controls such as access control and monitoring, together with governance and careful vendor selection.

Why it matters

AI systems can unintentionally expose sensitive information through prompts, logs, outputs, vector stores, or debugging tools. A responsible architecture minimizes data exposure and preserves auditability.

Healthcare Example

Before sending a clinical note to an LLM, a system may mask direct identifiers and keep only the minimum information needed for the task. Full identifiers remain in a secure internal system.
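The example above can be sketched as reversible masking: identifiers are swapped for numbered placeholders, and the mapping back to the real values never leaves the internal system. This is a minimal sketch under assumptions. The regex patterns, the in-memory `vault` dictionary, and the function names `pseudonymize` and `restore` are illustrative and not part of any specific product; a real system would keep the mapping in a secured, access-controlled store.

```python
import re

# Illustrative patterns only; real identifier formats vary by organization.
MEMBER_ID = re.compile(r"\b[A-Z]{1,3}\d{6,12}\b")
DOB = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

def pseudonymize(text):
    """Replace identifiers with numbered placeholders.

    Returns the masked text plus a vault mapping placeholder -> original value.
    The vault is meant to stay inside the secure internal system.
    """
    vault = {}

    def make_replacer(label):
        def replace(match):
            token = f"[{label}_{len(vault) + 1}]"
            vault[token] = match.group(0)
            return token
        return replace

    masked = DOB.sub(make_replacer("DATE"), text)
    masked = MEMBER_ID.sub(make_replacer("MEMBER_ID"), masked)
    return masked, vault

def restore(text, vault):
    """Put the original identifiers back, for authorized internal use only."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text

note = "Patient DOB 05/12/1978. Member ID ABC123456789 has diabetes."
masked, vault = pseudonymize(note)
print(masked)                  # only this masked text would be sent to the model
print(restore(masked, vault))  # internal systems can map placeholders back
```

Only `masked` is sent to the model. Because the placeholders are numbered, a model response that mentions `[MEMBER_ID_2]` can be re-linked to the right member internally without the model ever seeing the real value.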

Architecture Flow

Sensitive Document → PHI Detection → Masking / Redaction → Minimum Necessary Context → Private Model Endpoint → Encrypted Logs → Access-controlled Output
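The first stages of the flow above could be wired together roughly as follows. This is a sketch under assumptions, not a reference implementation: `call_private_model` is a hypothetical stub that makes no network call, the patterns are illustrative, the record is made-up sample data, and log encryption and output access control are left out.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns; real deployments usually rely on a dedicated PHI detection service.
PHI_PATTERNS = {
    "MEMBER_ID": re.compile(r"\b[A-Z]{1,3}\d{6,12}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

@dataclass
class PipelineResult:
    prompt: str
    response: str
    audit: list = field(default_factory=list)

def detect_phi(text):
    """Return the sorted labels of the PHI patterns found in the text."""
    return sorted(label for label, pattern in PHI_PATTERNS.items() if pattern.search(text))

def mask_phi(text):
    """Replace every pattern match with a placeholder such as [DATE]."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def minimum_context(record, allowed_fields):
    """Keep only the fields the task actually needs."""
    return {key: value for key, value in record.items() if key in allowed_fields}

def call_private_model(prompt):
    # Hypothetical stub standing in for a private model endpoint.
    return f"(summary of a {len(prompt)}-character prompt)"

def run_pipeline(record, allowed_fields):
    # The audit trail records which PHI types were found, never the values.
    audit = [f"phi_detected={detect_phi(record['note'])}"]
    masked = {**record, "note": mask_phi(record["note"])}
    context = minimum_context(masked, allowed_fields)
    prompt = "Summarize: " + context["note"]
    response = call_private_model(prompt)
    audit.append("model_called")  # metadata only; the raw note is not logged
    return PipelineResult(prompt, response, audit)

# Made-up sample record for illustration.
record = {
    "patient_name": "Jane Doe",
    "note": "Patient DOB 05/12/1978. Member ID ABC123456789 has diabetes.",
}
result = run_pipeline(record, allowed_fields={"note"})
print(result.prompt)
print(result.audit)
```

Keeping each stage as a separate function makes it easier to test the stages independently and to swap the regex detection for a stronger detector later.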

Code: Simple PHI Masking

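import re

# Illustrative patterns only: real member ID and date formats vary by payer and source system.
MEMBER_ID_PATTERN = re.compile(r"\b[A-Z]{1,3}\d{6,12}\b")  # 1-3 capital letters, then 6-12 digits
DOB_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")  # slash-separated dates such as 05/12/1978

def mask_member_id(text):
    """Replace member-ID-like tokens with a placeholder."""
    return MEMBER_ID_PATTERN.sub("[MEMBER_ID]", text)

def mask_dob(text):
    """Replace slash-separated dates with a placeholder."""
    return DOB_PATTERN.sub("[DATE]", text)

note = "Patient DOB 05/12/1978. Member ID ABC123456789 has diabetes."
print(mask_member_id(mask_dob(note)))
# Output: Patient DOB [DATE]. Member ID [MEMBER_ID] has diabetes.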

Common Mistakes

Logging raw prompts with PHI.
Storing sensitive text in vector databases without encryption.
Not defining data retention rules.
Omitting role-based access control.
Skipping human review of high-risk outputs.
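The first mistake in the list, logging raw prompts with PHI, can be reduced by redacting log records before any handler writes them. The sketch below uses the standard library's `logging.Filter`. The class name `PHIRedactingFilter`, the logger name `genai.prompts`, and the patterns are assumptions chosen for illustration, and regex redaction will miss free-text identifiers such as names.

```python
import io
import logging
import re

# Illustrative patterns; extend or replace with a proper PHI detector.
PHI_PATTERNS = [
    (re.compile(r"\b[A-Z]{1,3}\d{6,12}\b"), "[MEMBER_ID]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[DATE]"),
]

class PHIRedactingFilter(logging.Filter):
    """Redact PHI-like patterns from a log record before it is emitted."""

    def filter(self, record):
        # getMessage() merges the format args, so PHI passed as args is covered too.
        message = record.getMessage()
        for pattern, placeholder in PHI_PATTERNS:
            message = pattern.sub(placeholder, message)
        record.msg = message
        record.args = ()
        return True  # keep the record, now redacted

# An in-memory stream is used here so the result is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(PHIRedactingFilter())

logger = logging.getLogger("genai.prompts")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # avoid passing the record on to other handlers

logger.info("Prompt: %s", "Member ID ABC123456789, DOB 05/12/1978")
print(stream.getvalue())
```

The filter is attached to the handler rather than the logger so that it also applies to records that reach this handler from child loggers.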

Interview Q&A

Q: How do you secure healthcare data in a GenAI system?

A: I apply minimum necessary data sharing, masking, encryption, role-based access, private endpoints, audit logging, retention controls, and output validation.

Q: Should PHI be sent to an external LLM?

A: Only if legal and security policies permit it, the proper agreements are in place (under HIPAA, typically a business associate agreement), and enterprise controls protect the data. Otherwise, use de-identification or a private deployment option.
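Two of the controls named in the first answer, role-based access and audit logging, can be sketched together. The roles, the action names, and the in-memory `audit_log` list are hypothetical; a production system would take roles from its identity provider and write audit events to tamper-resistant storage.

```python
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping for illustration.
ROLE_PERMISSIONS = {
    "clinician": {"read_summary", "read_full_note"},
    "analyst": {"read_summary"},
}

audit_log = []  # stand-in for a secured, append-only audit store

def authorize(user, role, action):
    """Return True if the role may perform the action; record every attempt."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "allowed": allowed,
    })
    return allowed

print(authorize("u100", "clinician", "read_full_note"))
print(authorize("u200", "analyst", "read_full_note"))
print(len(audit_log))
```

Denied attempts are logged as well as granted ones, since failed access attempts are often the most useful entries in an audit review. The audit entries hold only user, role, and action metadata, not the clinical text itself.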

Architect Takeaway

Healthcare AI architecture must treat prompts, embeddings, logs, and outputs as sensitive data surfaces.
