Chapter 8
Embeddings, Chunking, and Vector Databases
Learning Objective
Learn how documents become searchable knowledge for RAG.
What it means
Embeddings convert text into numerical vectors that capture semantic meaning. Chunking splits large documents into smaller sections. A vector database stores embeddings and supports similarity search.
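To make "similarity search" concrete, here is a minimal sketch of cosine similarity, a measure commonly used to compare embeddings. The three-dimensional vectors are hand-written toy values chosen for illustration, not output from any real embedding model, and the phrases are invented examples.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Assumption: toy vectors written by hand. A real embedding model would
# produce vectors with hundreds or thousands of dimensions.
toy_vectors = {
    "prior authorization rules": [0.9, 0.1, 0.0],
    "pre-approval requirements": [0.8, 0.2, 0.1],
    "cafeteria lunch menu": [0.0, 0.1, 0.9],
}

query = toy_vectors["prior authorization rules"]
for text, vector in toy_vectors.items():
    print(f"{text}: {cosine_similarity(query, vector):.3f}")
```

With these toy values, the two phrases about authorization score close to each other and the unrelated phrase scores near zero, which is the behavior a similarity search relies on.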
Why it matters
Good retrieval depends on good chunks and metadata. If chunks are too large, retrieval may include irrelevant text. If chunks are too small, context may be incomplete. The vector database must also store metadata such as source, page number, document type, and version.
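The idea of storing metadata next to each embedding can be sketched with a small in-memory store. This is an illustration only: `ChunkRecord` and `InMemoryVectorStore` are hypothetical names, the vectors and policy sentences are invented, and a production vector database would use an approximate nearest-neighbor index rather than a full scan.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


@dataclass
class ChunkRecord:
    """One stored chunk: its text, its embedding, and its metadata."""
    text: str
    vector: list
    metadata: dict = field(default_factory=dict)


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database (brute-force search)."""

    def __init__(self):
        self.records = []

    def add(self, text, vector, **metadata):
        self.records.append(ChunkRecord(text, vector, metadata))

    def search(self, query_vector, top_k=2):
        """Return the top_k (score, record) pairs, best match first."""
        scored = [(cosine(query_vector, r.vector), r) for r in self.records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
# Assumption: toy vectors and invented policy text, for illustration only.
store.add("Members must be enrolled for 90 days.", [0.9, 0.1, 0.0],
          source="policy_a.pdf", page=2, document_type="medical_policy", version="v2")
store.add("Submit the referral letter with the claim.", [0.1, 0.9, 0.1],
          source="policy_a.pdf", page=3, document_type="medical_policy", version="v2")
store.add("Cosmetic procedures are excluded.", [0.0, 0.2, 0.9],
          source="policy_a.pdf", page=5, document_type="medical_policy", version="v2")

results = store.search([0.85, 0.15, 0.05])
for score, record in results:
    print(f"{score:.3f}", record.text, record.metadata["source"], record.metadata["page"])
```

Because the metadata travels with each record, the caller can cite the source and page of every retrieved chunk without a second lookup.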
Healthcare Example
A medical policy document can be split by headings such as Eligibility, Required Documentation, Exclusions, and Review Criteria. Each chunk includes source name, page, and effective date.
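Heading-based splitting like this might look as follows. This is a sketch under stated assumptions: the document is assumed to arrive as a list of (page number, page text) pairs from some upstream parser, headings are assumed to sit alone on their own line, and the function name, sample pages, file name, and effective date are all invented for illustration.

```python
SECTION_HEADINGS = ["Eligibility", "Required Documentation", "Exclusions", "Review Criteria"]


def split_by_headings(pages, source, effective_date, headings=SECTION_HEADINGS):
    """Split paged text into one chunk per heading, with metadata attached.

    pages: list of (page_number, page_text) tuples.
    Each chunk records the page on which its heading appears.
    """
    sections = []
    for page_number, page_text in pages:
        for line in page_text.splitlines():
            stripped = line.strip()
            if stripped in headings:
                # A known heading starts a new section.
                sections.append({"section": stripped, "page": page_number, "lines": []})
            elif sections and stripped:
                # Any other non-empty line belongs to the current section.
                sections[-1]["lines"].append(stripped)
    return [
        {
            "section": s["section"],
            "text": " ".join(s["lines"]),
            "source": source,
            "page": s["page"],
            "effective_date": effective_date,
        }
        for s in sections
        if s["lines"]  # skip headings with no body text
    ]


# Assumption: invented sample pages, not taken from a real policy.
sample_pages = [
    (1, "Eligibility\nMembers must be enrolled for 90 days.\n"
        "Required Documentation\nSubmit the referral letter."),
    (2, "Exclusions\nCosmetic procedures are excluded.\n"
        "Review Criteria\nRequests are reviewed within 14 days."),
]

policy_chunks = split_by_headings(sample_pages, source="policy_a.pdf", effective_date="2024-01-01")
for chunk in policy_chunks:
    print(chunk["section"], "| page", chunk["page"], "|", chunk["text"])
```

Splitting on headings keeps each rule with its own section, so a question about exclusions retrieves the Exclusions text rather than a window that straddles two sections.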
Architecture Flow
Code: Text Chunking with Overlap
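def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into chunks of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        # Otherwise the window never advances and the loop never ends.
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # final chunk reached; do not emit a redundant tail
        start = end - overlap  # step back so adjacent chunks share context
    return chunks


policy_text = "Medical policy text..." * 200
chunks = chunk_text(policy_text)
print(len(chunks))

Common Mistakes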
- No overlap between chunks.
- No metadata stored with embeddings.
- Using only vector similarity without keyword filters.
- Ignoring document version and effective date.
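The last two mistakes can be avoided by filtering on metadata and keywords before ranking by similarity. The sketch below is one possible approach, not a prescribed one: `retrieve` is a hypothetical function, the records are plain dictionaries with invented text and two-dimensional toy vectors, and the keyword filter is a simple substring match.

```python
import math
from datetime import date


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(records, query_vector, required_keyword, as_of, top_k=3):
    """Filter by effective date and keyword, then rank by vector similarity."""
    # 1. Keep only chunks already in effect on the date of interest.
    in_effect = [r for r in records if r["effective_date"] <= as_of]
    # 2. For each (source, section), keep only the most recent version.
    latest = {}
    for r in in_effect:
        key = (r["source"], r["section"])
        if key not in latest or r["effective_date"] > latest[key]["effective_date"]:
            latest[key] = r
    # 3. Keyword filter: require the term to appear in the chunk text.
    keyword = required_keyword.lower()
    candidates = [r for r in latest.values() if keyword in r["text"].lower()]
    # 4. Rank the surviving chunks by similarity to the query.
    candidates.sort(key=lambda r: cosine(query_vector, r["vector"]), reverse=True)
    return candidates[:top_k]


# Assumption: invented records with toy vectors, for illustration only.
records = [
    {"source": "policy_a", "section": "Exclusions",
     "text": "Cosmetic procedures are excluded.",
     "effective_date": date(2023, 1, 1), "vector": [0.9, 0.1]},
    {"source": "policy_a", "section": "Exclusions",
     "text": "Cosmetic and experimental procedures are excluded.",
     "effective_date": date(2024, 1, 1), "vector": [0.88, 0.12]},
    {"source": "policy_a", "section": "Eligibility",
     "text": "Members must be enrolled for 90 days.",
     "effective_date": date(2023, 1, 1), "vector": [0.1, 0.9]},
]

current = retrieve(records, [1.0, 0.0], "excluded", as_of=date(2024, 6, 1))
historical = retrieve(records, [1.0, 0.0], "excluded", as_of=date(2023, 6, 1))
print(current[0]["text"])
print(historical[0]["text"])
```

Without the date filter, the 2023 chunk would rank slightly above the 2024 chunk on similarity alone for this query, so the superseded rule would come back first.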
Interview Q&A
Q: What are embeddings?
A: Embeddings are vector representations of text that allow semantic similarity search.
Q: Why is chunking important?
A: Chunking controls what evidence is retrieved and sent to the LLM. Poor chunking leads to poor answers.
Architect Takeaway
Retrieval quality starts before the LLM call. Document preparation is one of the most important parts of a RAG system.