Jahanzaib
RAG & Retrieval

Chunking

Splitting documents into smaller pieces before embedding so retrieval returns relevant fragments.

Last updated: April 26, 2026

Definition

Chunking decides what unit gets retrieved. Chunks that are too big (whole documents) fill the context window with mostly irrelevant text; chunks that are too small (single sentences) lack the surrounding context retrieval needs, and quality drops. Common strategies: fixed-size (500-1000 tokens with overlap), semantic (split at paragraph or heading boundaries), and structure-aware (treat code, tables, and prose differently). The right strategy depends on your document type. Expect to iterate on chunking several times before retrieval quality is right.
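The semantic strategy above can be sketched in a few lines. This is an illustrative toy, not a production splitter: `semantic_chunks` is a hypothetical helper that greedily packs blank-line-separated paragraphs into chunks up to a character budget, so a chunk never breaks mid-paragraph.

```python
def semantic_chunks(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    Splits at blank lines so chunks never break mid-paragraph. A single
    paragraph longer than max_chars becomes its own oversized chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        # +2 accounts for the "\n\n" joiner between paragraphs.
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks
```

Structure-aware splitters follow the same packing logic but use different boundaries per content type (function boundaries for code, rows for tables).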

Code Example

```python
# Recursive character text splitter: a sensible default starting point.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # measured in characters by default
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " "],
)
chunks = splitter.split_text(document)
```

Start with roughly 1000-character chunks and 200-character overlap. Note that the splitter counts characters by default; to size chunks in tokens, pass a token-counting `length_function`. Iterate based on retrieval quality.
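The fixed-size-with-overlap windowing itself is simple. A minimal sketch, using whitespace splitting as a rough stand-in for a real tokenizer (actual BPE token counts will differ, but the sliding-window logic is identical); `token_chunks` is a hypothetical helper:

```python
def token_chunks(text: str, max_tokens: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size token windows with overlap between consecutive chunks.

    Whitespace split is a crude tokenizer stand-in; swap in a real
    tokenizer (e.g. tiktoken) for accurate token budgets.
    """
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    tokens = text.split()
    step = max_tokens - overlap
    return [" ".join(tokens[i:i + max_tokens]) for i in range(0, len(tokens), step)]
```

The overlap means the end of each chunk is repeated at the start of the next, so facts that straddle a boundary survive in at least one chunk.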

When To Use

Required for any RAG system. Spend the time to get this right. Chunking quality dominates retrieval quality.
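One way to judge chunking quality concretely is recall@k over a small labeled query set: for each query, does the chunk you know is relevant land in the top-k retrieved? A toy sketch, using word-overlap scoring as a stand-in for embedding similarity (`recall_at_k` is a hypothetical helper):

```python
def recall_at_k(queries: list[tuple[str, int]], chunks: list[str], k: int = 3) -> float:
    """Fraction of queries whose known-relevant chunk appears in the top-k.

    `queries` pairs each query string with the index of its relevant chunk.
    Word-overlap scoring stands in for embedding similarity here; swap in
    your real retriever to evaluate an actual pipeline.
    """
    hits = 0
    for query, relevant in queries:
        q = set(query.lower().split())
        ranked = sorted(range(len(chunks)),
                        key=lambda i: -len(q & set(chunks[i].lower().split())))
        hits += relevant in ranked[:k]
    return hits / len(queries)
```

Re-run this after each chunking change; if recall@k drops when you shrink chunks, they have lost the context retrieval needs.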

Building with Chunking?

I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.