GoofyCubes

Chapter 11

Agent Memory, Context Management, and Context Windows

Learning Objective

Learn how agents remember useful information without overwhelming the LLM context window.

What It Means

Agent memory is how an AI system stores and reuses information across steps or sessions. Context management is the process of deciding which of that information goes into each model call. Because the context window holds only a limited number of tokens, the system has to select what it sends rather than pass everything to the LLM.
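The selection step described above can be sketched as a simple token budget. This is a minimal illustration, not a definitive implementation: the helper names (estimate_tokens, select_context), the priority scores, and the rule of thumb of about 4 characters per token are all assumptions made for the example. A real system would use the tokenizer of its target model.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token.

    This is an assumption for illustration, not a real tokenizer.
    """
    return max(1, len(text) // 4)


def select_context(items: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedily keep the highest-priority items that fit within the budget.

    Each item is a (priority, text) pair; higher priorities are tried first.
    Items that would exceed the remaining budget are skipped.
    """
    selected = []
    used = 0
    for _priority, text in sorted(items, key=lambda item: item[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            selected.append(text)
            used += cost
    return selected


# Hypothetical candidate context items with made-up priorities.
candidates = [
    (0.9, "Current case: CASE-1001, clinical note, diagnosis E11.9."),
    (0.2, "Full transcript of an unrelated earlier session. " * 20),
    (0.7, "Policy excerpt relevant to the current document type."),
]

context = select_context(candidates, budget=40)
print(context)
```

With a budget of 40 estimated tokens, the two short, high-priority items fit and the long low-priority transcript is left out.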

Healthcare Example

For a clinical review assistant, short-term memory may include case ID, document type, extracted diagnosis, and missing fields. Long-term memory may include previous similar cases or policy history. The prompt should include only what is relevant to the current decision.
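As a rough sketch of this example, the code below keeps short-term state and long-term history separate and builds a prompt from only the matching records. The records, the date_of_service field, the outcome values, and the build_prompt helper are hypothetical and exist only to illustrate the idea. Matching on diagnosis code is one possible relevance rule, not the only one.

```python
# Short-term memory: state of the case currently under review.
short_term = {
    "case_id": "CASE-1001",
    "document_type": "clinical_note",
    "diagnosis_code": "E11.9",
    "missing_fields": ["date_of_service"],  # hypothetical field name
}

# Long-term memory: hypothetical prior cases. In practice these would
# likely come from a SQL or vector database rather than a literal list.
long_term = [
    {"case_id": "CASE-0712", "diagnosis_code": "E11.9", "outcome": "approved"},
    {"case_id": "CASE-0544", "diagnosis_code": "J45.909", "outcome": "denied"},
]


def build_prompt(state: dict, history: list[dict]) -> str:
    """Build a prompt from the current state plus only the relevant history."""
    # Assumed relevance rule: keep prior cases with the same diagnosis code.
    relevant = [r for r in history if r["diagnosis_code"] == state["diagnosis_code"]]
    lines = [
        f"Case: {state['case_id']} ({state['document_type']})",
        f"Diagnosis code: {state['diagnosis_code']}",
        f"Missing fields: {', '.join(state['missing_fields']) or 'none'}",
        "Similar prior cases:",
    ]
    lines += [f"- {r['case_id']}: {r['outcome']}" for r in relevant]
    return "\n".join(lines)


prompt = build_prompt(short_term, long_term)
print(prompt)
```

The unrelated case never reaches the prompt, which keeps the model call small and avoids exposing records that have no bearing on the decision.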

Types of Memory

Memory Type                 | Purpose                              | Storage
Short-term workflow memory  | Current case state                   | LangGraph state, Redis
Long-term memory            | Historical knowledge or preferences  | SQL database, vector database
Conversation memory         | Recent chat history                  | Session storage, summaries
Audit memory                | Trace of actions and decisions       | Logs, database records
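One way to keep the four memory types from blurring together is to give each its own container. The sketch below is an in-memory illustration only: the AgentMemory class and record_action method are hypothetical names, and plain Python dicts and lists stand in for the stores listed in the table.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentMemory:
    """Illustrative container with one slot per memory type."""

    workflow_state: dict = field(default_factory=dict)  # short-term workflow memory
    long_term: list = field(default_factory=list)       # historical knowledge
    conversation: list = field(default_factory=list)    # recent chat history
    audit_log: list = field(default_factory=list)       # trace of actions

    def record_action(self, action: str, detail: str) -> None:
        """Append an audit entry with a UTC timestamp."""
        self.audit_log.append(
            {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "action": action,
                "detail": detail,
            }
        )


memory = AgentMemory()
memory.workflow_state["case_id"] = "CASE-1001"
memory.conversation.append("User: please review the attached clinical note")
memory.record_action("extract_diagnosis", "diagnosis_code=E11.9")
print(memory.audit_log)
```

Keeping the audit log append-only and separate from workflow state means the state can be overwritten freely while the trace of decisions stays intact.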

Code: Workflow State Management

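# Short-term workflow state for the case currently under review.
case_state = {
    "case_id": "CASE-1001",
    "document_type": "clinical_note",
    "diagnosis_code": "E11.9",
    "missing_fields": [],
    "confidence": 0.91,
    "next_action": "review"
}

def update_state(state, key, value):
    """Return a copy of the state with one key set, leaving the input unchanged."""
    return {**state, key: value}

# Record why the case was routed to review (the threshold itself is not shown here).
case_state = update_state(case_state, "review_reason", "Confidence below threshold")
print(case_state)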

Common Mistakes

  • Putting all history into every prompt.
  • No separation between workflow state and long-term memory.
  • No summarization strategy.
  • No data retention policy.
  • Storing sensitive data without access controls.
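The first and third mistakes can be addressed together by compacting older history into a summary and keeping only recent messages verbatim. The sketch below assumes a placeholder summarize function. A real system would likely call an LLM there, and the names and the keep_recent default are illustrative choices.

```python
def summarize(messages: list[str]) -> str:
    """Placeholder summarizer (an assumption): a real system would likely
    call an LLM here instead of returning a fixed-format string."""
    return f"[summary of {len(messages)} earlier messages]"


def compact_history(messages: list[str], keep_recent: int = 3) -> list[str]:
    """Replace all but the most recent messages with a single summary entry."""
    if len(messages) <= keep_recent:
        return list(messages)
    older = messages[:-keep_recent]
    recent = messages[-keep_recent:]
    return [summarize(older)] + recent


history = [f"message {i}" for i in range(1, 8)]
compacted = compact_history(history)
print(compacted)
```

Seven messages become four entries: one summary line followed by the three most recent messages.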

Interview Q&A

Q: How do you manage agent memory?

A: I separate workflow state, short-term conversation context, long-term knowledge, and audit history. I use retrieval and summaries instead of sending everything to the model.
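To make "retrieval instead of sending everything" concrete, here is a deliberately simple sketch that ranks stored notes by keyword overlap with the query. It is a stand-in for illustration: production systems typically use embeddings and a vector database, and the notes, function names, and top_k default are all hypothetical.

```python
def score(query: str, document: str) -> int:
    """Count the distinct words shared by the query and the document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words)


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k documents with the highest non-zero overlap score."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return [doc for doc in ranked[:top_k] if score(query, doc) > 0]


# Hypothetical long-term notes.
notes = [
    "type 2 diabetes follow-up with stable glucose",
    "asthma review and inhaler refill",
    "diabetes policy update for coverage of glucose monitors",
]

results = retrieve("diabetes glucose", notes)
print(results)
```

Only the two diabetes-related notes are returned, so the asthma note never enters the prompt.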

Q: How do you handle context window limits?

A: I use chunking, retrieval, summarization, token budgeting, and strict prompt construction.
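Chunking, the first technique in that answer, can be sketched as splitting a document into overlapping word windows so that each piece fits a budget while context carries across boundaries. The function name and the chunk_size and overlap defaults are assumptions for illustration. Real pipelines often chunk by tokens or by document structure instead of words.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks that share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Synthetic 120-word document for demonstration.
document = " ".join(f"w{i}" for i in range(120))
chunks = chunk_words(document)
print(len(chunks))
```

Each chunk repeats the last 10 words of the previous one, which reduces the chance that a sentence split across a boundary loses its meaning at retrieval time.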

Architect Takeaway

The goal is not to remember everything. The goal is to retrieve the right information at the right time with the least risk and cost.