GoofyCubes

Chapter 14

Healthcare Data Protection, PHI, and Responsible AI

Learning Objective

Understand how to protect sensitive healthcare data in GenAI systems.

What it means

Healthcare GenAI systems may process protected health information (PHI), such as names, dates of birth, member IDs, diagnoses, medications, clinical notes, and provider details. Protecting it requires a combination of technical controls (such as encryption and access control), governance, monitoring, and careful vendor selection.
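A common first step is an inventory of which record fields carry which kind of sensitive data. The sketch below groups the keys of a record by category. The field names and groupings are hypothetical. Any of these fields can become PHI once it is linked to a patient, so the grouping guides how each field is handled. It does not decide what is safe to share.

```python
# Hypothetical field inventory; the names and groupings are illustrative
# assumptions, not a standard schema.
DIRECT_IDENTIFIERS = {"name", "member_id", "date_of_birth", "phone", "address"}
CLINICAL_FIELDS = {"diagnoses", "medications", "clinical_note"}
PROVIDER_FIELDS = {"provider_name", "provider_npi"}


def classify_fields(record: dict) -> dict[str, list[str]]:
    """Group the keys of a record by sensitivity category."""
    groups: dict[str, list[str]] = {
        "direct_identifier": [],
        "clinical": [],
        "provider": [],
        "unclassified": [],
    }
    for key in record:
        if key in DIRECT_IDENTIFIERS:
            groups["direct_identifier"].append(key)
        elif key in CLINICAL_FIELDS:
            groups["clinical"].append(key)
        elif key in PROVIDER_FIELDS:
            groups["provider"].append(key)
        else:
            # Unknown fields are flagged for review rather than assumed safe.
            groups["unclassified"].append(key)
    return groups


record = {
    "name": "Jane Doe",
    "member_id": "ABC123456789",
    "diagnoses": ["type 2 diabetes"],
    "visit_type": "telehealth",
}
print(classify_fields(record))
```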

Why it matters

AI systems can unintentionally expose sensitive information through prompts, logs, outputs, vector stores, or debugging tools. A responsible architecture minimizes data exposure and preserves auditability.
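Outputs are one of these exposure paths, because a model can echo an identifier that slipped past masking. Below is a minimal sketch of an output gate. It assumes that simple regex patterns can stand in for a real PHI detector. The pattern names and the withheld-message format are illustrative choices.

```python
import re

# Illustrative patterns only; a real system would call a dedicated PHI detector.
OUTPUT_CHECKS = {
    "member_id": re.compile(r"\b[A-Z]{1,3}\d{6,12}\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def find_leaks(output: str) -> list[str]:
    """Return the names of identifier patterns that appear in a model output."""
    return [name for name, pattern in OUTPUT_CHECKS.items() if pattern.search(output)]


def release_output(output: str) -> str:
    """Withhold outputs that appear to contain identifiers."""
    leaks = find_leaks(output)
    if leaks:
        # A real system might route this to human review instead.
        return "[WITHHELD: possible PHI (" + ", ".join(leaks) + ")]"
    return output


print(release_output("Patient is stable; continue current medications."))
print(release_output("Call member AB1234567 at 555-123-4567."))
```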

Healthcare Example

Before sending a clinical note to an LLM, a system may mask direct identifiers and keep only the minimum information needed for the task. Full identifiers remain in a secure internal system.
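One way to apply this combines a field allowlist with pseudonymization. Only the fields the task needs are sent. Any identifiers among them are replaced with random tokens, and the token-to-value map stays internal. This is a sketch: the function names and token format are hypothetical, and the in-memory map stands in for a secured token store.

```python
import secrets


def build_minimal_payload(record: dict, needed: set[str], identifiers: set[str]):
    """Keep only the fields a task needs and swap identifiers for random tokens.

    Returns (payload, token_map). The payload goes to the model; token_map
    must stay inside the secure internal system.
    """
    payload: dict = {}
    token_map: dict[str, str] = {}
    for field in sorted(needed):
        if field not in record:
            continue
        value = record[field]
        if field in identifiers:
            # Random token: not derivable from the identifier itself.
            token = f"[{field.upper()}_{secrets.token_hex(4)}]"
            token_map[token] = value
            payload[field] = token
        else:
            payload[field] = value
    return payload, token_map


def restore_identifiers(text: str, token_map: dict[str, str]) -> str:
    """Re-insert real identifiers into model output, inside the secure system."""
    for token, value in token_map.items():
        text = text.replace(token, value)
    return text


record = {
    "name": "Jane Doe",
    "member_id": "ABC123456789",
    "address": "1 Main St",
    "clinical_note": "A1c 8.2 on metformin.",
}
payload, token_map = build_minimal_payload(
    record, needed={"name", "clinical_note"}, identifiers={"name", "member_id", "address"}
)
print(payload)  # member_id and address are never included
```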

Architecture Flow

Sensitive Document → PHI Detection → Masking / Redaction → Minimum Necessary Context → Private Model Endpoint → Encrypted Logs → Access-controlled Output
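The flow above can be sketched as a chain of plain functions. This is an illustrative sketch, not a reference implementation, and it rests on three assumptions. First, the regex detector stands in for a real PHI detection service. Second, `call_model` is a hypothetical client for the private endpoint, stubbed in the usage lines. Third, the Python standard library has no symmetric encryption, so the logging stage records only metadata and a SHA-256 hash of the masked prompt, leaving encryption at rest to the log store.

```python
import hashlib
import re
from typing import Callable

MEMBER_ID = re.compile(r"\b[A-Z]{1,3}\d{6,12}\b")
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def detect_phi(text: str) -> list[str]:
    """PHI detection (regex stand-in for a real detector)."""
    return MEMBER_ID.findall(text) + DATE.findall(text)


def mask(text: str) -> str:
    """Masking / redaction."""
    return DATE.sub("[DATE]", MEMBER_ID.sub("[MEMBER_ID]", text))


def minimum_context(text: str, keywords: set[str]) -> str:
    """Minimum necessary context: keep only sentences mentioning a task keyword.

    Keywords are expected in lowercase.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(s for s in sentences if any(k in s.lower() for k in keywords))


def run_pipeline(
    document: str,
    keywords: set[str],
    call_model: Callable[[str], str],  # hypothetical private-endpoint client
    audit_log: list[dict],
    user_role: str,
    allowed_roles: set[str],
) -> str:
    found = detect_phi(document)
    prompt = minimum_context(mask(document), keywords)
    # Log metadata and a hash only, never the raw document or prompt.
    audit_log.append({
        "phi_items_masked": len(found),
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
    })
    output = call_model(prompt)
    # Access-controlled output: only approved roles receive the result.
    if user_role not in allowed_roles:
        raise PermissionError(f"role {user_role!r} may not view this output")
    return output


log: list[dict] = []
result = run_pipeline(
    "Patient DOB 05/12/1978. Member ID ABC123456789 has diabetes. Enjoys gardening.",
    keywords={"diabetes"},
    call_model=lambda prompt: "SUMMARY: " + prompt,  # stub in place of a real endpoint
    audit_log=log,
    user_role="clinician",
    allowed_roles={"clinician"},
)
print(result)  # SUMMARY: Member ID [MEMBER_ID] has diabetes.
```

Each stage is a separate function so that a real detector, masker, or endpoint client can replace the stand-in without changing the others.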

Code: Simple PHI Masking

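import re

# Member IDs: 1-3 uppercase letters followed by 6-12 digits (an assumed format).
def mask_member_id(text):
    return re.sub(r"\b[A-Z]{1,3}\d{6,12}\b", "[MEMBER_ID]", text)

# Dates written as M/D/YYYY or MM/DD/YYYY; this masks every such date, not only DOBs.
def mask_dob(text):
    return re.sub(r"\b\d{1,2}/\d{1,2}/\d{4}\b", "[DATE]", text)

note = "Patient DOB 05/12/1978. Member ID ABC123456789 has diabetes."
print(mask_member_id(mask_dob(note)))
# Patient DOB [DATE]. Member ID [MEMBER_ID] has diabetes.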
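The two functions above catch only one member-ID shape and slash-formatted dates. Names, addresses, and free-text identifiers pass straight through, so regex masking alone is not enough to de-identify data. A table-driven variant is sketched below with illustrative extra patterns for ISO dates, US-style phone numbers, and emails. It makes the pattern list easier to extend and reports how many items were masked, which is useful for audit. The patterns and the `mask_phi` name are assumptions.

```python
import re

# Hypothetical pattern table; extend it or replace it with a dedicated PHI detector.
PHI_PATTERNS = [
    ("[MEMBER_ID]", re.compile(r"\b[A-Z]{1,3}\d{6,12}\b")),
    ("[DATE]", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
    ("[DATE]", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("[PHONE]", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
]


def mask_phi(text: str) -> tuple[str, dict[str, int]]:
    """Apply each pattern in order; return masked text and per-placeholder counts."""
    counts: dict[str, int] = {}
    for placeholder, pattern in PHI_PATTERNS:
        text, n = pattern.subn(placeholder, text)
        if n:
            counts[placeholder] = counts.get(placeholder, 0) + n
    return text, counts


masked, counts = mask_phi("DOB 1978-05-12, call 555-123-4567 or j.doe@example.org. ID AB1234567.")
print(masked)  # DOB [DATE], call [PHONE] or [EMAIL]. ID [MEMBER_ID].
print(counts)
```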

Common Mistakes

  • Logging raw prompts with PHI.
  • Storing sensitive text in vector databases without encryption.
  • Not defining data retention rules.
  • No role-based access control.
  • No human review for high-risk outputs.
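The first mistake is often the easiest to address in code. The sketch below uses Python's standard `logging` module to add a filter that masks identifier patterns before a handler writes a record. The two regexes are the same illustrative patterns as in the masking example in this chapter, and a production filter would need a real PHI detector. The `PHIRedactingFilter` name and the `genai.prompts` logger name are hypothetical.

```python
import io
import logging
import re

MEMBER_ID = re.compile(r"\b[A-Z]{1,3}\d{6,12}\b")
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


class PHIRedactingFilter(logging.Filter):
    """Mask identifier patterns in a log record before a handler emits it.

    Only the formatted message is redacted; exception tracebacks are not.
    """

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg % args first
        message = MEMBER_ID.sub("[MEMBER_ID]", message)
        message = DATE.sub("[DATE]", message)
        record.msg = message
        record.args = ()  # args are already merged into msg
        return True  # keep the record, now redacted


stream = io.StringIO()  # stand-in for a real log destination
handler = logging.StreamHandler(stream)
handler.addFilter(PHIRedactingFilter())
logger = logging.getLogger("genai.prompts")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False
logger.info("Prompt for member %s, DOB %s", "ABC123456789", "05/12/1978")
print(stream.getvalue(), end="")  # Prompt for member [MEMBER_ID], DOB [DATE]
```

The filter is attached to the handler rather than the logger. A logger's own filters are consulted only for records logged directly through that logger, not for records propagated from its child loggers.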

Interview Q&A

Q: How do you secure healthcare data in a GenAI system?

A: I apply minimum necessary data sharing, masking, encryption, role-based access, private endpoints, audit logging, retention controls, and output validation.
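Two of the controls in this answer, role-based access and audit logging, can meet at a single authorization check. The sketch below denies by default and records every decision without recording any PHI. The roles, permission names, and `authorize` helper are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-to-permission map; a real deployment would load this
# from its identity provider or policy service.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "clinician": {"view_summary", "view_identifiers"},
    "care_coordinator": {"view_summary"},
    "analyst": set(),
}


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def record(self, user: str, role: str, action: str, allowed: bool) -> None:
        # Record who asked for what and the decision, never the PHI itself.
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "action": action,
            "allowed": allowed,
        })


def authorize(user: str, role: str, action: str, audit: AuditLog) -> bool:
    """Deny by default: unknown roles and unlisted actions are refused."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit.record(user, role, action, allowed)
    return allowed


audit = AuditLog()
print(authorize("dr_lee", "clinician", "view_identifiers", audit))         # True
print(authorize("ops_01", "care_coordinator", "view_identifiers", audit))  # False
```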

Q: Should PHI be sent to an external LLM?

A: Only if it is approved under legal and security policies, covered by appropriate agreements (in the US, typically a Business Associate Agreement under HIPAA), and protected by enterprise controls. Otherwise, use de-identification or a private deployment.
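The decision in this answer can be written as an explicit routing rule that a gateway applies before any request leaves the organization. This is a sketch under assumptions. The policy flags, route names, and `route_request` function are hypothetical, and in practice the flags would come from a documented legal and security review.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    SEND = "send to the endpoint"
    DEIDENTIFY_FIRST = "de-identify, then send"
    PRIVATE_DEPLOYMENT = "use a private deployment"


@dataclass(frozen=True)
class EndpointPolicy:
    # Hypothetical flags recorded by a legal/security review of an endpoint.
    approved_for_phi: bool
    agreement_in_place: bool   # e.g. a signed BAA where HIPAA applies
    enterprise_controls: bool  # e.g. private networking and audit logging


def route_request(contains_phi: bool, policy: EndpointPolicy, can_deidentify: bool) -> Route:
    """Decide how a request may reach an external LLM endpoint."""
    if not contains_phi:
        return Route.SEND
    if policy.approved_for_phi and policy.agreement_in_place and policy.enterprise_controls:
        return Route.SEND
    # PHI without full approval: strip it if possible, otherwise stay in-house.
    return Route.DEIDENTIFY_FIRST if can_deidentify else Route.PRIVATE_DEPLOYMENT


unapproved = EndpointPolicy(approved_for_phi=False, agreement_in_place=False, enterprise_controls=True)
print(route_request(contains_phi=True, policy=unapproved, can_deidentify=True))
```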

Architect Takeaway

Healthcare AI architecture must treat prompts, embeddings, logs, and outputs as sensitive data surfaces.
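Treating each surface as sensitive also means giving each one a retention rule, which addresses the missing-retention mistake listed earlier. Below is a minimal sketch of a retention sweep across the four surfaces. The retention periods, the `StoredItem` type, and the `expired` helper are hypothetical, and actual periods would come from the organization's policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per data surface; real values come from policy.
RETENTION = {
    "prompt": timedelta(days=30),
    "embedding": timedelta(days=365),
    "log": timedelta(days=90),
    "output": timedelta(days=30),
}


@dataclass
class StoredItem:
    surface: str  # one of the RETENTION keys
    created_at: datetime
    item_id: str


def expired(items: list[StoredItem], now: datetime) -> list[str]:
    """Return IDs of items past their retention period.

    An unknown surface raises KeyError, so unclassified data is caught early.
    """
    return [i.item_id for i in items if now - i.created_at > RETENTION[i.surface]]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
items = [
    StoredItem("prompt", now - timedelta(days=31), "p1"),
    StoredItem("embedding", now - timedelta(days=31), "e1"),
    StoredItem("log", now - timedelta(days=91), "l1"),
    StoredItem("output", now - timedelta(days=1), "o1"),
]
print(expired(items, now))  # ['p1', 'l1']
```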