Chapter 7

RAG: Retrieval-Augmented Generation

Learning Objective

Understand how RAG grounds LLM responses in trusted enterprise knowledge.

What it means

RAG combines information retrieval with text generation. Instead of expecting the LLM to know everything, the system retrieves relevant documents and provides them as context before the model answers.

Why it matters

RAG reduces hallucination (it does not eliminate it), keeps answers aligned with current business knowledge, and lets the model cite its sources. It is especially useful when policies, procedures, guidelines, or product information change frequently.

Healthcare Example

A healthcare assistant receives a question about a clinical policy. RAG retrieves the relevant medical policy sections and the model answers only from those sections, instead of guessing from general knowledge.

Architecture Flow

Indexing (offline): Documents→Chunking→Embeddings→Vector DB
Query time: User Question→Semantic Search over the Vector DB→Relevant Chunks→LLM→Grounded Answer
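
The flow above can be sketched end to end in a few lines. This is a toy illustration, not a production design: the "embedding" is a plain word count instead of a real embedding model, the "vector DB" is a Python list, and the names chunk_text, embed, cosine, and InMemoryVectorDB are hypothetical. The policy sentences are invented sample text.

```python
import math
import re
from collections import Counter

def chunk_text(text, max_words=40):
    """Fixed-size chunking: split text into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy embedding (assumption): lowercase word counts, not a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class InMemoryVectorDB:
    """Hypothetical stand-in for a vector database."""
    def __init__(self):
        self.rows = []  # list of (embedding, chunk text)

    def add_document(self, text, max_words=40):
        # Indexing phase: chunk the document, embed each chunk, store both.
        for chunk in chunk_text(text, max_words):
            self.rows.append((embed(chunk), chunk))

    def search(self, question, top_k=5):
        # Query phase: embed the question and rank stored chunks by similarity.
        q = embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:top_k]]

db = InMemoryVectorDB()
db.add_document("Prior authorization is required for MRI scans of the knee.")
db.add_document("Annual wellness visits are covered once per calendar year.")
top = db.search("Is prior authorization required for an MRI?", top_k=1)
print(top)
```

The retrieved chunks would then be placed into the prompt and sent to the LLM, which is the step the skeleton in the next section covers.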

Code: Simple RAG Skeleton

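# Skeleton of the RAG flow. vector_db and llm are placeholder objects:
# vector_db.search() must return items with a .text attribute, and
# llm.generate() must accept a prompt string and return the answer.
def rag_answer(question, vector_db, llm):
    # Retrieve the chunks most similar to the question.
    relevant_chunks = vector_db.search(question, top_k=5)
    if not relevant_chunks:
        # Nothing retrieved: skip the model call rather than let it guess.
        return "Information not found"
    context = "\n\n".join(chunk.text for chunk in relevant_chunks)

    # Build the prompt line by line so no stray indentation reaches the model.
    prompt = (
        "Answer only using the context below.\n"
        "If the answer is not in the context, say 'Information not found'.\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{question}\n"
    )
    return llm.generate(prompt)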
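
The skeleton above leaves vector_db and llm undefined. One way to exercise it without a real database or model is with small stand-ins that expose the same two methods. Everything below is hypothetical: Chunk, KeywordVectorDB, and RecordingLLM are illustrative names, the keyword overlap is not real semantic search, and the sample sentences are invented.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str

class KeywordVectorDB:
    """Hypothetical stand-in: ranks chunks by shared words, not by embeddings."""
    def __init__(self, texts):
        self.chunks = [Chunk(t) for t in texts]

    def search(self, question, top_k=5):
        q_words = set(question.lower().split())

        def overlap(chunk):
            return len(q_words & set(chunk.text.lower().split()))

        return sorted(self.chunks, key=overlap, reverse=True)[:top_k]

class RecordingLLM:
    """Hypothetical stand-in: records the prompt it receives and returns a fixed reply."""
    def __init__(self, reply):
        self.reply = reply
        self.last_prompt = None

    def generate(self, prompt):
        self.last_prompt = prompt
        return self.reply

vector_db = KeywordVectorDB([
    "Telehealth visits require documented patient consent.",
    "Parking validation is available at the front desk.",
])
llm = RecordingLLM("Information not found")
hits = vector_db.search("Do telehealth visits require consent?", top_k=1)
```

Passing these objects to rag_answer(question, vector_db, llm) makes it possible to inspect llm.last_prompt in a unit test and confirm that the retrieved text and the question both reached the prompt.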

Common Mistakes

Poor chunking strategy.
Retrieving too many irrelevant chunks.
Not storing source metadata.
No reranking.
Allowing the model to answer outside retrieved context.
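
Two of the mistakes above, irrelevant chunks and missing source metadata, can be illustrated together. The sketch below assumes the retriever returns a similarity score between 0 and 1 with each chunk; ScoredChunk, select_context, format_context, the threshold of 0.5, and the sample data are all hypothetical. Note that this only filters and sorts by the retriever's own score. A real reranker would rescore each chunk against the question with a separate model.

```python
from dataclasses import dataclass

@dataclass
class ScoredChunk:
    text: str
    source: str    # document title or ID, kept so the answer can cite it
    section: str
    score: float   # retriever similarity, assumed to lie in [0, 1]

def select_context(chunks, min_score=0.5, max_chunks=3):
    """Drop low-scoring chunks, then keep the best few."""
    kept = [c for c in chunks if c.score >= min_score]
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:max_chunks]

def format_context(chunks):
    """Number each chunk and label it with its source so the model can cite it."""
    return "\n\n".join(
        f"[{i}] ({c.source}, {c.section})\n{c.text}"
        for i, c in enumerate(chunks, start=1)
    )

candidates = [
    ScoredChunk("MRI requires prior authorization.", "Imaging Policy", "2.1", 0.82),
    ScoredChunk("Cafeteria hours are 7am to 7pm.", "Facilities Guide", "4.3", 0.12),
    ScoredChunk("CT scans follow the same rule.", "Imaging Policy", "2.2", 0.64),
]
selected = select_context(candidates)
context = format_context(selected)
print(context)
```

The right threshold depends on the embedding model and the data, so it is usually tuned against a set of known question and answer pairs rather than fixed in advance.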

Interview Q&A

Q: What is RAG?

A: RAG retrieves relevant enterprise knowledge and gives it to the LLM so the response is grounded in trusted documents.

Q: Why use RAG instead of fine-tuning?

A: RAG is better for frequently changing knowledge because documents can be updated without retraining the model.
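
A small sketch can make this answer concrete in an interview. It assumes an index keyed by document ID in which re-ingesting a document replaces its old chunks. PolicyIndex and its methods are hypothetical names, paragraphs stand in for chunks, and the policy text is invented.

```python
class PolicyIndex:
    """Hypothetical index keyed by document ID; upsert replaces the old chunks."""
    def __init__(self):
        self.chunks_by_doc = {}

    def upsert(self, doc_id, text):
        # Replace every chunk of this document; here one paragraph is one chunk.
        self.chunks_by_doc[doc_id] = [p.strip() for p in text.split("\n\n") if p.strip()]

    def all_chunks(self):
        return [c for chunks in self.chunks_by_doc.values() for c in chunks]

index = PolicyIndex()
index.upsert("visitor-policy", "Visiting hours end at 6pm.")
index.upsert("visitor-policy", "Visiting hours end at 8pm.")  # policy revised
current = index.all_chunks()
print(current)
```

After the second call only the revised text is retrievable, and no model weights were touched. With fine-tuning, the same change would require a new training run.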

Architect Takeaway

RAG is not just vector search. It is a controlled evidence pipeline that includes ingestion, chunking, retrieval, grounding, citation, and validation.
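
The validation step can start as a simple check on the model's output. The sketch below assumes the prompt asked the model to cite retrieved chunks as [1], [2], and so on, and to reply with a fixed refusal string otherwise. validate_answer and NOT_FOUND are hypothetical names. The check confirms only that citations exist and point at retrieved chunks. It does not verify that the cited text actually supports the claim.

```python
import re

NOT_FOUND = "Information not found"

def validate_answer(answer, num_chunks):
    """Accept the refusal string, or an answer whose [n] citations all
    refer to one of the num_chunks retrieved chunks."""
    if answer.strip() == NOT_FOUND:
        return True
    cited = [int(n) for n in re.findall(r"\[(\d+)\]", answer)]
    return bool(cited) and all(1 <= n <= num_chunks for n in cited)

print(validate_answer("MRI needs prior authorization [1].", 2))
print(validate_answer("It is probably fine.", 2))
```

An answer that fails the check can be replaced with the refusal string or routed to a human reviewer, depending on how much risk the use case tolerates.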
