Chapter 8

Embeddings, Chunking, and Vector Databases

Learning Objective

Learn how documents become searchable knowledge for RAG.

What it means

Embeddings convert text into numerical vectors that capture semantic meaning. Chunking splits large documents into smaller sections. A vector database stores embeddings and supports similarity search.
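To make "similarity search" concrete, here is a minimal sketch using cosine similarity, one common way to compare embeddings. The three-dimensional vectors and the `toy_index` and `query_vector` names are invented for illustration; a real embedding model produces vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, hand-written for illustration only.
toy_index = {
    "Eligibility criteria for coverage": [0.9, 0.1, 0.0],
    "Required documentation for claims": [0.1, 0.9, 0.1],
    "Exclusions and limitations": [0.0, 0.2, 0.9],
}

# Pretend this is the embedding of the question "Who qualifies?"
query_vector = [0.8, 0.2, 0.1]

# Rank stored texts by similarity to the query, most similar first.
ranked = sorted(
    toy_index,
    key=lambda text: cosine_similarity(query_vector, toy_index[text]),
    reverse=True,
)
print(ranked[0])
```

A vector database performs the same ranking, typically with an approximate index so that it does not have to compare the query against every stored vector.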

Why it matters

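Retrieval quality depends on chunk boundaries and metadata. If chunks are too large, each retrieved chunk carries irrelevant text that dilutes the match and uses up the LLM's context window. If chunks are too small, a retrieved chunk may lack the surrounding context needed to answer the question. The vector database should also store metadata such as source, page number, document type, and version alongside each embedding, so results can be filtered and cited.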

Healthcare Example

A medical policy document can be split at headings such as Eligibility, Required Documentation, Exclusions, and Review Criteria. Each chunk is stored with its source name, page, and effective date so that answers can cite the correct version of the policy.
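The heading-based split described here might look like the following sketch. It assumes each heading sits alone on its own line and that the whole document shares one page number, which is a simplification. The `Chunk` class, the `split_by_headings` function, the sample text, and the file name are all hypothetical.

```python
from dataclasses import dataclass

SECTION_HEADINGS = (
    "Eligibility",
    "Required Documentation",
    "Exclusions",
    "Review Criteria",
)

@dataclass
class Chunk:
    section: str
    text: str
    source: str
    page: int
    effective_date: str

def split_by_headings(text, source, page, effective_date):
    """Start a new chunk whenever a line exactly matches a known heading.

    Text that appears before the first heading is ignored in this sketch.
    """
    chunks = []
    section, lines = None, []

    def flush():
        if section is not None:
            body = "\n".join(lines).strip()
            chunks.append(Chunk(section, body, source, page, effective_date))

    for line in text.splitlines():
        stripped = line.strip()
        if stripped in SECTION_HEADINGS:
            flush()  # close the previous section before opening a new one
            section, lines = stripped, []
        elif section is not None:
            lines.append(line)
    flush()  # close the final section
    return chunks

# Invented sample policy text, for illustration only.
sample = """Eligibility
Members aged 18 or older.
Required Documentation
Physician referral letter.
Exclusions
Cosmetic procedures.
Review Criteria
Reviewed annually."""

policy_chunks = split_by_headings(
    sample, source="policy_example.pdf", page=1, effective_date="2024-01-01"
)
for c in policy_chunks:
    print(c.section, "|", c.text, "|", c.effective_date)
```

Splitting at headings keeps each chunk about one topic, which tends to retrieve more cleanly than fixed-size windows that cut across sections.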

Architecture Flow

PDF / HTML / DOCX → Text Extraction → Clean Text → Chunk Text → Generate Embeddings → Store in Vector DB with Metadata
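The flow above can be sketched end to end in a few functions. This is only an outline under stated assumptions: text extraction is assumed to have happened already, `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, and a plain Python list stands in for the vector database. All function names and the sample input are hypothetical.

```python
import hashlib
import re

def clean_text(raw):
    """Collapse whitespace runs left behind by text extraction."""
    return re.sub(r"\s+", " ", raw).strip()

def split_with_overlap(text, chunk_size=200, overlap=20):
    """Fixed-size character windows; neighbours share `overlap` characters."""
    step = chunk_size - overlap
    last_start = max(len(text) - overlap, 1)  # skip a trailing overlap-only window
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]

def toy_embed(text, dims=8):
    """Stand-in for an embedding model: hashed bag-of-words counts."""
    vec = [0.0] * dims
    for word in re.findall(r"[a-z]+", text.lower()):
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    return vec

def ingest(raw_text, source, doc_type, version):
    """Clean, chunk, embed, and return records ready to be stored."""
    records = []
    for i, chunk in enumerate(split_with_overlap(clean_text(raw_text))):
        records.append({
            "id": f"{source}#{i}",
            "text": chunk,
            "embedding": toy_embed(chunk),
            "metadata": {"source": source, "doc_type": doc_type, "version": version},
        })
    return records

records = ingest(
    "Eligibility   rules\n\napply to enrolled members. " * 20,
    source="policy_example.pdf",
    doc_type="medical_policy",
    version="v1",
)
print(len(records), records[0]["metadata"])
```

In a real system, `toy_embed` would be replaced by a call to an embedding model and the returned records would be written to a vector database, but the shape of each record (text, embedding, metadata) stays the same.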

Code: Text Chunking with Overlap

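def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into character chunks of up to chunk_size that share `overlap` characters."""
    # An overlap >= chunk_size would never advance and would loop forever.
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # reached the end; avoid a trailing chunk that only repeats the overlap
        start = end - overlap  # step back so adjacent chunks share context
    return chunks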

policy_text = "Medical policy text..." * 200
chunks = chunk_text(policy_text)
print(len(chunks))

Common Mistakes

No overlap between chunks.
No metadata stored with embeddings.
Using only vector similarity without keyword filters.
Ignoring document version and effective date.
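The last two mistakes can be avoided by filtering on metadata and keywords before ranking by vector similarity. The sketch below assumes an in-memory list of records with two-dimensional toy embeddings; the `records` data, the `search` function, and the dot-product scoring are illustrative choices, not a specific vector database API.

```python
from datetime import date

# Invented records for illustration; embeddings are hand-written toy vectors.
records = [
    {"text": "Prior authorization required for MRI.",
     "embedding": [0.9, 0.1], "version": 1, "effective_date": date(2023, 1, 1)},
    {"text": "Prior authorization waived for MRI in emergencies.",
     "embedding": [0.8, 0.3], "version": 2, "effective_date": date(2024, 1, 1)},
    {"text": "Cosmetic procedures are excluded.",
     "embedding": [0.1, 0.9], "version": 2, "effective_date": date(2024, 1, 1)},
]

def search(query_vector, keyword, as_of, top_k=2):
    """Filter by effective date, version, and keyword, then rank by dot product."""
    # 1. Keep only records already in force on the date of interest.
    in_force = [r for r in records if r["effective_date"] <= as_of]
    if not in_force:
        return []
    # 2. Keep only the newest version among those in force.
    latest = max(r["version"] for r in in_force)
    candidates = [
        r for r in in_force
        if r["version"] == latest and keyword.lower() in r["text"].lower()
    ]
    # 3. Rank the remaining candidates by similarity to the query.
    def score(r):
        return sum(q * e for q, e in zip(query_vector, r["embedding"]))
    return sorted(candidates, key=score, reverse=True)[:top_k]

hits = search([1.0, 0.0], keyword="MRI", as_of=date(2024, 6, 1))
for hit in hits:
    print(hit["version"], hit["text"])
```

Without the date and version filter, the superseded version 1 record would score highest for this query, because its embedding is closest to the query vector.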

Interview Q&A

Q: What are embeddings?

A: Embeddings are vector representations of text that allow semantic similarity search.

Q: Why is chunking important?

A: Chunking controls what evidence is retrieved and sent to the LLM. Poor chunking leads to poor answers.

Architect Takeaway

Retrieval quality starts before the LLM call. Document preparation is one of the most important parts of a RAG system.
