Indexer Node (Document Processor)
What It Does
The Indexer Node takes long documents and breaks them into smart, searchable chunks. Think of it as a librarian that organizes books into sections and creates a detailed catalog - it makes your documents ready for AI to search through and understand.
What Goes In, What Comes Out
| Name | Type | Description | Required | Default |
|---|---|---|---|---|
| inputText | Text | Document content to process | Yes | - |
| embeddingModel | Text | AI model for creating searchable vectors | Yes | - |
| chunkSize | Number | Maximum characters per chunk | Yes | - |
| chunkOverlap | Number | Characters to overlap between chunks | No | 200 |
| separators | Array | How to split text (paragraphs, sentences) | No | ["\n\n"] |
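For orientation, here is how those inputs might look as a typed object. This is a sketch based on the table above; the field names mirror the table, but the node's exact schema may differ:

```ts
// Sketch of the Indexer Node's inputs, mirroring the table above.
// Field names are assumptions, not the tool's guaranteed schema.
interface IndexerInput {
  inputText: string;      // document content to process (required)
  embeddingModel: string; // e.g. "text-embedding-ada-002" (required)
  chunkSize: number;      // maximum characters per chunk (required)
  chunkOverlap?: number;  // characters shared between adjacent chunks (default: 200)
  separators?: string[];  // split points, tried in order (default: ["\n\n"])
}

const input: IndexerInput = {
  inputText: "Employee Handbook\n\nSection 1: Onboarding...",
  embeddingModel: "text-embedding-ada-002",
  chunkSize: 1000,
};
```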
Output
| Name | Type | Description |
|---|---|---|
| chunks | Array | Smart text chunks with embeddings |
| summary | Object | Processing statistics and info |
| metadata | Object | Document details and timestamps |
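And a sketch of what comes out; the inner field names are again illustrative assumptions based on the table:

```ts
// Sketch of the Indexer Node's outputs; inner field names are
// assumptions based on the table above, not a guaranteed schema.
interface IndexerOutput {
  chunks: Array<{
    text: string;        // the chunk's content
    embedding: number[]; // searchable vector for this chunk
    metadata: Record<string, unknown>; // source, position, relationships
  }>;
  summary: { chunkCount: number; totalCharacters: number }; // processing stats
  metadata: { source?: string; indexedAt: string };         // document details
}
```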
Real-World Examples
📚 Knowledge Base Builder: Turn company docs into a searchable AI database
- Input: Employee handbook, policies, procedures
- Output: Searchable chunks ready for Q&A system
🔍 Research Assistant: Make academic papers searchable by concept
- Input: Research papers and articles
- Output: Organized chunks that AI can search through
💬 Smart Customer Support: Index help docs for instant answers
- Input: FAQ pages, user manuals, troubleshooting guides
- Output: Searchable knowledge base for support chatbot
How It Works
```mermaid
flowchart LR
    A[📄 Long Document] --> B[✂️ Smart Chunking]
    B --> C[🧠 Create Embeddings]
    C --> D[📊 Searchable Chunks]
    style A fill:#e3f2fd
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#e8f5e8
```
Simple Process:
1. Smart Splitting: Breaks documents at natural points (paragraphs, sections)
2. Create Embeddings: Converts text chunks into searchable vectors
3. Add Metadata: Keeps track of source, position, and relationships
4. Ready for Search: Chunks are ready for AI knowledge systems (see the sketch below)
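A minimal sketch of that pipeline in TypeScript, assuming a hypothetical `embed()` helper standing in for the embedding call the node makes internally:

```ts
// Sketch of the indexing pipeline: split at natural points, embed, attach metadata.
// `embed` is a stand-in for whatever embedding API the node actually calls.
declare function embed(text: string, model: string): Promise<number[]>;

// Greedy splitter: pack whole paragraphs into chunks up to chunkSize,
// carrying `overlap` trailing characters into the next chunk for context.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  const paragraphs = text.split("\n\n");
  const chunks: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    if (current && current.length + para.length > chunkSize) {
      chunks.push(current);
      current = current.slice(-overlap); // keep the tail for context
    }
    current += (current ? "\n\n" : "") + para;
  }
  if (current) chunks.push(current);
  return chunks;
}

async function indexDocument(text: string, model: string) {
  return Promise.all(
    chunkText(text, 1000, 200).map(async (chunk, i) => ({
      text: chunk,
      embedding: await embed(chunk, model),            // searchable vector
      metadata: { position: i, length: chunk.length }, // track origin
    }))
  );
}
```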
Configuration Options
Basic Settings
Section titled “Basic Settings”What to Index 📄
- Input Text: The content you want to make searchable (from documents, web pages, etc.)
- Embedding Model: The AI model that creates the searchable format (OpenAI recommended)
How to Split Content ✂️
- Chunk Size: How big each piece should be (1000 characters works well for most content)
- Chunk Overlap: How much pieces should overlap (200 characters prevents losing context)
Content Organization 🏷️
- Metadata: Extra information like document title, date, category
- Separators: Where to split (paragraphs work best for most documents)
Browser Compatibility
Works in all major browsers:
- ✅ Chrome: Full support with fast processing
- ✅ Firefox: Full support
- ⚠️ Safari: Limited storage for large documents
- ✅ Edge: Full support
Privacy & Security
- 🔒 API Keys Protected: Embedding keys stored securely
- 🌐 Secure Processing: Uses encrypted connections for AI services
- 💾 Local Caching: Processed chunks cached locally for performance
- ✅ Content Validation: Input text validated for security
Try It Yourself
Example 1: Company Knowledge Base
What you’ll build: Turn company documents into a searchable AI database
Workflow:
Get All Text From Link → Indexer Node → Local Knowledge → RAG Node
Setup:
- Chunk Size: 1000 (good for general business docs)
- Chunk Overlap: 200 (maintains context)
- Embedding Model: "text-embedding-ada-002"
Result: A smart, searchable knowledge base ready for an employee Q&A system. The sketch below shows the same chain expressed as code.
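If you prefer to think of the workflow as code, the chain above is roughly equivalent to this sketch. Every function name here is a hypothetical stand-in for a node, not a real API:

```ts
// Hypothetical stand-ins for the four nodes in this workflow.
declare function getAllTextFromLink(url: string): Promise<string>;
declare function indexerNode(
  text: string,
  opts: { embeddingModel: string; chunkSize: number; chunkOverlap: number }
): Promise<unknown[]>;
declare function localKnowledgeStore(chunks: unknown[]): Promise<void>;
declare function ragQuery(question: string): Promise<string>;

async function buildKnowledgeBase(url: string) {
  const text = await getAllTextFromLink(url);
  const chunks = await indexerNode(text, {
    embeddingModel: "text-embedding-ada-002",
    chunkSize: 1000,   // good for general business docs
    chunkOverlap: 200, // maintains context across boundaries
  });
  await localKnowledgeStore(chunks);
  // Employees can now ask questions against the indexed docs.
  return ragQuery("How many vacation days do new hires get?");
}
```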
Example 2: Research Paper Collection
What you’ll build: A searchable academic paper database
Workflow:
Upload Documents → Indexer Node → Local Knowledge → Q&A Node
Setup:
- Chunk Size: 1200 (longer for academic content)
- Chunk Overlap: 300 (important for research context)
- Preserve Format: true (keeps citations and formatting)
Result: AI-powered research assistant that can find relevant papers and concepts.
Example 3: Customer Support Knowledge Base
What you’ll build: A smart help documentation system
Workflow:
Get HTML From Link → Indexer Node → Local Knowledge → RAG Node
Setup:
- Chunk Size: 800 (shorter for quick answers)
- Separators: ["\n\n", "\n", "Q:", "A:"] (respects Q&A format)
Result: Instant, accurate customer support powered by your documentation.
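The separator list is ordered: the splitter tries the first separator and falls back to later ones only when a piece is still too large. A minimal sketch of that recursive behavior (my own illustration, not the node's actual source):

```ts
// Recursive separator-based splitting: try separators in order, and only
// fall back to the next one when a piece is still larger than chunkSize.
// Note: String.split consumes the separator token; a production splitter
// would keep markers like "Q:" attached to their chunks.
function splitBySeparators(text: string, separators: string[], chunkSize: number): string[] {
  if (text.length <= chunkSize || separators.length === 0) return [text];
  const [sep, ...rest] = separators;
  return text
    .split(sep)
    .flatMap(piece =>
      piece.length > chunkSize ? splitBySeparators(piece, rest, chunkSize) : [piece]
    )
    .filter(piece => piece.trim().length > 0);
}

const faqText = "Q: How do I reset my password?\nA: Use the account settings page...";
const chunks = splitBySeparators(faqText, ["\n\n", "\n", "Q:", "A:"], 800);
```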
🔍 Advanced Example: Multi-Language Documents
What you’ll build: Knowledge base supporting multiple languages
Setup:
- Use multilingual embedding models
- Separate chunk processing by language
- Maintain language metadata for each chunk
Use case: International company with documentation in multiple languages.
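One way to maintain language metadata per chunk, assuming a hypothetical `detectLanguage` helper (a stand-in for whatever language-detection library you already use):

```ts
// Illustrative only: tag each chunk with its detected language so the
// retrieval step can filter or route by language later.
declare function detectLanguage(text: string): string; // e.g. "en", "de", "ja"

function tagChunksWithLanguage(chunks: { text: string }[]) {
  return chunks.map(chunk => ({
    ...chunk,
    metadata: { language: detectLanguage(chunk.text) },
  }));
}
```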
Best Practices
Section titled “Best Practices”✅ Do This
- Test chunk sizes: Start with 1000 characters, adjust based on your content
- Use appropriate overlap: 150-200 characters maintains good context
- Choose smart separators: Respect natural document structure (paragraphs, sections)
- Add meaningful metadata: Include source, category, date for better organization
- Process in batches: Break very large documents into manageable pieces (see the sketch below)
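A simple way to batch, assuming the embedding service is the bottleneck; `embedBatch` is a hypothetical stand-in for your embedding call:

```ts
// Process chunks in fixed-size batches with a pause between batches,
// keeping memory use flat and staying under API rate limits.
declare function embedBatch(texts: string[]): Promise<number[][]>;

async function embedInBatches(texts: string[], batchSize = 20, delayMs = 500) {
  const vectors: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    vectors.push(...(await embedBatch(texts.slice(i, i + batchSize))));
    await new Promise(resolve => setTimeout(resolve, delayMs)); // brief pause
  }
  return vectors;
}
```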
❌ Avoid This
- Making chunks too small (loses context) or too large (reduces precision)
- Using zero overlap (can break important connections between chunks)
- Ignoring document structure (splits sentences or concepts awkwardly)
- Processing without metadata (makes it hard to track sources later)
Troubleshooting
Section titled “Troubleshooting”🚫 “Rate Limit Exceeded” Error
Problem: Too many embedding API requests
Solution: Reduce batch size, add delays between requests (see the retry sketch below), or upgrade your API plan
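If you call the embedding API yourself, exponential backoff is the usual fix. A sketch assuming a generic `fetch`-based call where the service signals rate limits with HTTP 429:

```ts
// Retry with exponential backoff on HTTP 429 (rate limit) responses.
async function fetchWithBackoff(url: string, init: RequestInit, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, init);
    if (response.status !== 429) return response;
    const waitMs = 2 ** attempt * 1000; // 1s, 2s, 4s, 8s, 16s
    await new Promise(resolve => setTimeout(resolve, waitMs));
  }
  throw new Error("Rate limit: retries exhausted");
}
```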
✂️ Poor Chunk Quality
Problem: Chunks cut off mid-sentence or lose context
Solution: Adjust chunk size and overlap, or customize separators for your document type
💾 “Out of Memory” Error
Problem: Browser crashes with very large documents
Solution: Process the document in smaller segments, or reduce the max chunks limit
🐌 Slow Processing
Problem: Indexing takes too long
Solution: Use smaller documents, reduce chunk overlap, or try local embedding models
Limitations to Know
- Document Size: Maximum 100MB per document for browser processing
- Processing Time: Large documents may take several minutes to process
- API Costs: Cloud embedding services charge per chunk processed
- Memory Usage: Large documents require significant browser memory
Related Nodes
Section titled “Related Nodes”🔗 Works Great With
- Local Knowledge: Stores the processed chunks for searching
- RAG Node: Uses indexed chunks for intelligent question-answering
- Recursive Character Text Splitter: Alternative for simpler text splitting
- Ollama Embeddings: Creates the searchable vectors from text
🔄 Essential for RAG Workflows
The Indexer Node is the first step in:
- Building smart knowledge bases
- Creating searchable document collections
- Enabling AI-powered Q&A systems
What’s Next?
Related nodes: Local Knowledge • RAG Node • Q&A Node
Common workflows: AI Knowledge Base • Document Search • Smart Research
Learn more: AI Workflow Builder • [Understanding RAG](/advanced-ai/basics/rag-in-Agentic WorkFlow/) • Vector Databases
💡 Pro Tip: Start with standard settings (1000 character chunks, 200 overlap) and adjust based on your specific content type. Technical docs might need smaller chunks, while narrative content can handle larger ones.