Recursive Character Text Splitter
The Recursive Character Text Splitter intelligently breaks long documents into smaller, manageable chunks while keeping related content together. It’s like having a smart librarian who knows exactly where to split a book so each section makes sense on its own.
This is the go-to choice for preparing documents for AI processing, knowledge bases, and search systems.
How it works
The splitter works through a sequence of separators, starting with natural boundaries such as paragraphs, then sentences, and finally individual words if needed. Each chunk stays within the configured size while keeping as much related text together as possible.
graph LR
  Document[Long Document] --> Analyze[Find Break Points]
  Analyze --> Split[Smart Splitting]
  Split --> Chunks[Perfect Chunks]
  style Split fill:#6d28d9,stroke:#fff,color:#fff
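The fallback from paragraphs to sentences to words can be sketched in a few lines of plain Python. This is an illustrative approximation, not the splitter's actual implementation: the function name `recursive_split` and the default separator list are assumptions, and overlap is left out to keep the idea visible.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", ". ", " ")):
    """Split text into pieces of at most chunk_size characters.

    Hypothetical sketch: tries the coarsest separator first and falls back
    to finer ones only for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: hard cut at chunk_size.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, finer)

    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # Still fits: keep merging neighbours.
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: retry it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks
```

Merging neighbouring pieces before emitting a chunk is what keeps short paragraphs together instead of producing one tiny chunk per paragraph.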
Setup guide
- Set Chunk Size: Choose the maximum size of each chunk (1000 characters is a reasonable starting point for most AI models).
- Add Overlap: Set how many characters neighboring chunks share so that context carries across chunk boundaries.
- Choose Separators: Pick where to split (paragraphs, sentences, or let the splitter decide automatically).
- Process Your Text: The splitter produces chunks no larger than the chosen size, ready for AI processing.
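To make the overlap setting concrete, here is a minimal sketch that ignores separators entirely and cuts fixed-width windows. The function `sliding_chunks` is a hypothetical helper for illustration only; the real splitter also respects separators when it applies overlap.

```python
def sliding_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Cut text into fixed-width windows that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap  # How far each window advances.
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


chunks = sliding_chunks("abcdefghij", chunk_size=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk begins with the last two characters of the one before it, which is the shared context the overlap setting controls.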
Practical example: Preparing documents for AI knowledge base
Let’s split a technical document into chunks suited to AI search and retrieval.
Option 1: General Documents
- Chunk Size: 1000 characters.
- Overlap: 200 characters.
- Best For: Business documents, articles, stories.
Option 2: Technical Docs
- Chunk Size: 800 characters (smaller for precision).
- Separators: Paragraphs, then sentences, then punctuation.
- Best For: Manuals, instructions, specs.
Option 3: Code Documentation
- Chunk Size: 1200 characters (larger to keep code blocks intact).
- Separators: Code blocks (triple-backtick fences), then paragraphs.
- Best For: API docs, tutorials with code.
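The three options above could be captured as reusable presets. The sketch below is hypothetical: `SplitterPreset` and `PRESETS` are illustrative names, the separator strings are one plausible reading of "paragraphs", "sentences" and "punctuation", and the 200-character overlap for Options 2 and 3 is an assumption because the text gives an overlap only for Option 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitterPreset:
    """Hypothetical container for the three settings described above."""
    chunk_size: int
    chunk_overlap: int
    separators: tuple[str, ...]

    def __post_init__(self):
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")


FENCE = "`" * 3  # Markdown code-fence marker, built here to avoid a literal fence.

PRESETS = {
    # Option 1: sizes from the text; separators are an assumption.
    "general": SplitterPreset(1000, 200, ("\n\n", "\n", " ")),
    # Option 2: paragraphs, then sentences, then punctuation; overlap assumed.
    "technical": SplitterPreset(800, 200, ("\n\n", ". ", ", ", " ")),
    # Option 3: code fences first, then paragraphs; overlap assumed.
    "code_docs": SplitterPreset(1200, 200, ("\n" + FENCE, "\n\n", "\n", " ")),
}
```

Validating in `__post_init__` catches the most common misconfiguration, an overlap that is as large as the chunk itself, before any text is processed.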
Why use smart splitting
| Smart Splitting (Recursive) | Simple Splitting |
|---|---|
| Keeps related content together | May break sentences mid-way |
| Respects document structure | Ignores natural boundaries |
| Maintains context between chunks | Can lose important connections |
| Works with different content types | One-size-fits-all approach |
Configuration settings
| Setting | Purpose | Recommended Values |
|---|---|---|
| Chunk Size | Maximum characters per chunk | 800-1200 for most AI models |
| Chunk Overlap | Characters shared between chunks | 150-300 to maintain context |
| Separators | Where to split text | Start with paragraphs, then sentences |
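Chunk size and overlap together determine roughly how many chunks a document produces. The helper below is a back-of-the-envelope sketch with an assumed name, `estimate_chunk_count`; it models fixed-width windows, so a boundary-aware splitter will usually produce somewhat more chunks because it ends chunks early at separators.

```python
def estimate_chunk_count(text_length: int, chunk_size: int, chunk_overlap: int) -> int:
    """Estimate the number of chunks for fixed-width windows with overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap  # New characters covered per chunk.
    # Ceiling division without floats.
    return (text_length - chunk_overlap + step - 1) // step


print(estimate_chunk_count(10_000, 1000, 0))    # 10
print(estimate_chunk_count(10_000, 1000, 200))  # 13
print(estimate_chunk_count(10_000, 1000, 300))  # 14
```

The same 10,000-character document grows from 10 to 14 chunks as overlap rises from 0 to 300, which is why larger overlap increases processing and storage cost.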
Real-world examples
Knowledge base preparation
Split company documentation for AI search:
- Input: Employee handbook, policies, procedures
- Chunk Size: 1000 (good for general business content)
- Overlap: 200 (maintains policy context)
- Result: Searchable knowledge base chunks
Research paper processing
Prepare academic papers for analysis:
- Input: Research papers and articles
- Chunk Size: 1200 (longer for academic context)
- Overlap: 300 (important for research continuity)
- Result: AI-ready research database
Technical documentation
Process API docs and technical guides:
- Input: Technical documentation with code examples
- Chunk Size: 800 (shorter for specific instructions)
- Separators: Respect code blocks and sections
- Result: Precise technical search system
Troubleshooting
- Chunks too large or small: Adjust chunk size based on your AI model’s requirements and content type.
- Lost context between chunks: Increase overlap to preserve more connections between sections.
- Poor splitting quality: Customize separators to better match your document structure.
- Slow processing: Reduce chunk overlap or process documents in smaller batches.
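When diagnosing these issues, it can help to look at the distribution of chunk lengths before changing any setting. The following is a minimal sketch: `chunk_report` is a hypothetical helper, and treating chunks under 10% of the target size as "tiny" is an arbitrary threshold chosen for illustration.

```python
from statistics import mean


def chunk_report(chunks, chunk_size):
    """Summarize chunk lengths to spot oversized or very small chunks."""
    lengths = [len(c) for c in chunks]
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0, "oversized": 0, "tiny": 0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": round(mean(lengths), 1),
        # Chunks longer than the configured limit.
        "oversized": sum(1 for n in lengths if n > chunk_size),
        # Assumed threshold: shorter than 10% of the target size.
        "tiny": sum(1 for n in lengths if n < chunk_size // 10),
    }
```

Many tiny chunks suggest the separators are too fine for the content, while oversized chunks suggest the text lacks the separators the splitter is looking for.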