Chunking
Splitting documents into smaller pieces before embedding so retrieval returns relevant fragments.
Last updated: April 26, 2026
Definition
Chunking decides what unit gets retrieved. Too big (whole documents) and retrieved chunks contain mostly irrelevant text that wastes context window; too small (single sentences) and chunks lack the surrounding context, so retrieval quality drops. Common strategies: fixed-size (500-1000 tokens with overlap), semantic (split at paragraph/heading boundaries), and structure-aware (treat code, tables, and prose differently). The right strategy depends on your document type, and most teams iterate on chunking several times before retrieval quality is acceptable.
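To make the fixed-size-with-overlap strategy concrete, here is a minimal sketch in plain Python. The sizes and the word-based length measure are illustrative assumptions, not a production recipe; real systems usually count tokens or characters.

```python
# Fixed-size chunking with overlap (a sketch; sizes are assumptions).
# Measures length in whitespace-separated words for simplicity.
def chunk_fixed(text, size=200, overlap=50):
    words = text.split()
    step = size - overlap  # each chunk starts `step` words after the last
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

doc = " ".join(f"w{i}" for i in range(500))
chunks = chunk_fixed(doc, size=200, overlap=50)
# Consecutive chunks share their last/first 50 words, so a sentence cut
# at a chunk boundary still appears whole in at least one chunk.
```

The overlap is what protects retrieval from boundary effects: without it, a fact split across two chunks may be incomplete in both.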
Code Example
# Recursive character text splitter: a sensible default to start with
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],  # "" is the final fallback
)
chunks = splitter.split_text(document)

Start with 1000-character chunks and 200-character overlap. Note that chunk_size counts characters by default; pass a token-aware length_function if you want token budgets. Iterate based on retrieval quality.
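The definition also mentions semantic splitting at paragraph boundaries. Here is a minimal sketch of that idea without any library: split on blank lines, then merge paragraphs until a character budget is reached. The budget and the double-newline delimiter are assumptions for illustration.

```python
# Semantic chunking sketch: split on paragraph boundaries, then greedily
# merge adjacent paragraphs until a character budget is hit.
def chunk_by_paragraph(text, max_chars=1000):
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # +2 accounts for the "\n\n" joiner if we merge this paragraph
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks
```

Unlike fixed-size splitting, this never cuts mid-paragraph, which tends to keep each chunk self-contained at the cost of less uniform chunk sizes.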
When To Use
Required for any RAG system. Spend the time to get this right. Chunking quality dominates retrieval quality.
Building with Chunking?
I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.