Indexer Node
The Indexer Node is like a smart librarian that takes long documents and organizes them into searchable sections. It breaks your content into manageable chunks and prepares them for AI to search through and understand.
This is the essential first step for building any AI knowledge base or document search system.
How it works
The node takes your documents, intelligently splits them at natural break points (like paragraphs), and converts each chunk into a searchable format that AI can understand and find relevant information from.
```mermaid
graph LR
    Document[Long Document] --> Split[Smart Splitting]
    Split --> Index[Create Search Index]
    Index --> Ready[Searchable Chunks]
    style Index fill:#6d28d9,stroke:#fff,color:#fff
```
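The flow above can be sketched in plain Python. This is a hedged illustration, not the node's actual implementation: `split_text`, `build_index`, and `search` are toy stand-ins, and the bag-of-words `embed` function stands in for a real embedding model.

```python
import math
import re


def split_text(text, chunk_size=1000):
    """Smart Splitting: break at paragraph boundaries, packing into chunks."""
    chunks, current = [], ""
    for para in re.split(r"\n\s*\n", text):
        if current and len(current) + len(para) > chunk_size:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text, vocab):
    """Toy bag-of-words vector; a real Indexer Node calls an embedding model."""
    words = tokenize(text)
    vec = [float(words.count(term)) for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(document, chunk_size=1000):
    """Create Search Index: one (chunk, vector) pair per chunk."""
    chunks = split_text(document, chunk_size)
    vocab = sorted({w for c in chunks for w in tokenize(c)})
    return vocab, [(c, embed(c, vocab)) for c in chunks]


def search(vocab, index, query):
    """Return the chunk whose vector is most similar to the query's."""
    q = embed(query, vocab)
    return max(index, key=lambda cv: sum(a * b for a, b in zip(q, cv[1])))[0]


doc = ("Vacation policy: employees receive 20 paid days per year.\n\n"
       "Expense policy: submit receipts by the end of each month.")
vocab, index = build_index(doc, chunk_size=60)
print(search(vocab, index, "How many vacation days do I get?"))
```

Even with this toy similarity measure, the vacation question retrieves the vacation chunk rather than the expense one, which is exactly the behavior the index enables at scale.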
Setup guide
1. Provide Your Content: Connect documents, web pages, or any text content you want to make searchable.
2. Choose Chunk Size: Decide how big each searchable piece should be (1000 characters works well for most content).
3. Set Overlap: Choose how much chunks should overlap to maintain context between sections.
4. Select AI Model: Choose an embedding model to convert text into a searchable format.
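The four steps above boil down to a handful of parameters. A minimal sketch in Python, assuming a plain character-based chunker; the file path is illustrative, and the model name simply echoes the embedding model recommended later in this page:

```python
def chunk_with_overlap(text, chunk_size=1000, overlap=200):
    """Steps 2 and 3: cut text into chunk_size pieces that share `overlap` chars."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


config = {
    "content": "docs/employee-handbook.txt",     # step 1: your content (illustrative path)
    "chunk_size": 1000,                           # step 2: chunk size
    "overlap": 200,                               # step 3: overlap
    "embedding_model": "text-embedding-ada-002",  # step 4: AI model
}

# 2500 characters of content -> chunks that each repeat the last 200
# characters of the previous chunk, so context survives the boundary.
chunks = chunk_with_overlap("x" * 2500, config["chunk_size"], config["overlap"])
print(len(chunks))
```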
Practical example: Company knowledge base
Let's create a searchable knowledge base. The best settings depend on what kind of documents you have.
For General Business Docs:
- Chunk Size: 1000 characters (about 2-3 paragraphs).
- Overlap: 200 characters (helps keep context between chunks).
- Best For: Policies, procedures, and FAQs.
For Technical Manuals:
- Chunk Size: 800 characters (smaller chunks for specific instructions).
- Overlap: 150 characters.
- Separators: Split by paragraphs or new lines.
For Research Papers:
- Chunk Size: 1200 characters (larger chunks to keep complex ideas together).
- Overlap: 300 characters.
- Goal: Detailed analysis and understanding.
Common configurations
| Content Type | Chunk Size (chars) | Overlap (chars) | Best For |
|---|---|---|---|
| General Business Docs | 1000 | 200 | Policies, procedures, FAQs |
| Technical Documentation | 800 | 150 | User manuals, API docs |
| Research Papers | 1200 | 300 | Academic content, detailed analysis |
| Customer Support | 600 | 100 | Quick answers, troubleshooting |
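The table above can be captured as a simple lookup so a workflow can pick settings by content type. A sketch, with illustrative preset names:

```python
# Presets taken from the configuration table above (sizes in characters).
CHUNK_PRESETS = {
    "general_business": {"chunk_size": 1000, "overlap": 200},
    "technical_docs":   {"chunk_size": 800,  "overlap": 150},
    "research_papers":  {"chunk_size": 1200, "overlap": 300},
    "customer_support": {"chunk_size": 600,  "overlap": 100},
}


def preset_for(content_type):
    """Fall back to the general-business defaults for unknown content types."""
    return CHUNK_PRESETS.get(content_type, CHUNK_PRESETS["general_business"])


print(preset_for("research_papers"))
```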
Configuration settings
| Setting | Purpose | Recommended Values |
|---|---|---|
| Chunk Size | How big each searchable piece is | 800-1200 characters |
| Chunk Overlap | How much pieces overlap | 150-300 characters |
| Separators | Where to split content | Paragraphs, sentences, sections |
| Embedding Model | AI model for search capability | OpenAI text-embedding-ada-002 |
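Separators control where splits happen. A common strategy is to try the coarsest separator first (paragraphs), then fall back to finer ones (lines, sentences) only for pieces that are still too large. The sketch below illustrates that idea in plain Python; it is not the node's exact algorithm:

```python
def split_on_separators(text, separators=("\n\n", "\n", ". "), chunk_size=800):
    """Split on the first separator that applies; recurse on oversized pieces."""
    if len(text) <= chunk_size or not separators:
        return [text]  # small enough, or no separators left to try
    sep, rest = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:  # separator not present; try a finer one
        return split_on_separators(text, rest, chunk_size)
    out = []
    for piece in pieces:
        out.extend(split_on_separators(piece, rest, chunk_size))
    return out


# A short paragraph, an oversized unbreakable run, and a trailing line.
text = "Intro paragraph.\n\n" + "A" * 900 + "\nShort line."
chunks = split_on_separators(text, chunk_size=800)
print(len(chunks))
```

The intro paragraph stays whole, the oversized run is split off on the line separator, and the trailing line becomes its own chunk.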
Real-world examples
Employee handbook search
Make company policies instantly searchable:
- Input: Employee handbook PDF
- Chunk Size: 1000 (good for policy sections)
- Overlap: 200 (maintains context)
- Result: Searchable HR knowledge base

Technical documentation
Create searchable API documentation:
- Input: Technical documentation
- Chunk Size: 800 (shorter for specific instructions)
- Separators: ["##", "###", "\n\n"] (respects heading structure)
- Result: Instant technical support system

Research database
Build a searchable academic paper collection:
- Input: Research papers and articles
- Chunk Size: 1200 (longer for academic context)
- Overlap: 300 (important for research continuity)
- Result: AI-powered research assistant

Troubleshooting
- Poor chunk quality: Adjust chunk size and overlap settings, or customize separators for your document type.
- Slow processing: Reduce document size, process in smaller batches, or use local embedding models.
- Missing context: Increase chunk overlap to maintain better connections between sections.
- Memory issues: Process large documents in smaller segments or reduce the maximum number of chunks.
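For the "missing context" case, the effect of overlap is easy to see in miniature: with zero overlap, a word that straddles a chunk boundary is cut in two, while overlapping chunks each keep a full copy of the boundary region. A toy demonstration:

```python
def chunk(text, size, overlap):
    """Fixed-size character chunks; adjacent chunks share `overlap` characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


text = "alpha beta gamma delta epsilon"
no_overlap = chunk(text, size=12, overlap=0)
with_overlap = chunk(text, size=12, overlap=6)

# Without overlap, "gamma" is sliced across two chunks and no chunk
# contains the whole word; with overlap, one chunk keeps it intact.
print(any("gamma" in c for c in no_overlap))
print(any("gamma" in c for c in with_overlap))
```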