# Character Text Splitter (Simple Document Splitter)
## What It Does
The Character Text Splitter breaks long documents into smaller, roughly equal-sized pieces. It’s like cutting a long article into pages: each piece is about the same size, making it easier for AI models to process.
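To make this concrete, here is a minimal TypeScript sketch of the idea (a hypothetical helper, not the node's internal code): cut the text into fixed-size slices, and step forward by less than the chunk size so neighboring pieces overlap.

```ts
// Minimal character-based chunking (hypothetical helper, not the node's code).
function splitByCharacters(text: string, chunkSize: number, overlap = 0): string[] {
  const chunks: string[] = [];
  const step = Math.max(1, chunkSize - overlap); // advancing less than chunkSize creates overlap
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

console.log(splitByCharacters("abcdefghij", 4, 1));
// ["abcd", "defg", "ghij", "j"]
```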
## What Goes In, What Comes Out
### Input

| Name | Type | Description | Required | Default |
|---|---|---|---|---|
| text | Text | Document to split | Yes | - |
| chunk_size | Number | Size of each piece, in characters | Yes | - |
| separator | Text | Where to split (for example, at paragraph breaks) | No | "\n\n" |
| chunk_overlap | Number | Overlap between consecutive pieces, in characters | No | 200 |
### Output

| Name | Type | Description |
|---|---|---|
chunks | Array | Text pieces ready for AI processing |
chunk_count | Number | How many pieces were created |
## Perfect For

- 📄 Simple Document Processing: When you just need to break text into equal pieces
- ⚡ Quick Setup: The fastest way to prepare documents for AI
- 📊 Consistent Sizing: All pieces are roughly the same size
- 🔧 Basic Workflows: A good starting point for document processing
## Configuration

### Required Parameters

| Parameter | Type | Description | Example |
|---|---|---|---|
inputText | string | The text content to be split into chunks | "{{document.text}}" |
chunkSize | number | Maximum number of characters per chunk | 1000 |
### Optional Parameters

| Parameter | Type | Default | Description | Example |
|---|---|---|---|---|
separator | string | "\n\n" | Character sequence used to split the text | "\n" |
chunkOverlap | number | 200 | Number of characters to overlap between consecutive chunks | 100 |
keepSeparator | boolean | false | Whether to keep the separator in the resulting chunks | true |
stripWhitespace | boolean | true | Whether to strip leading and trailing whitespace from chunks | false |
### Advanced Configuration
```json
{
  "inputText": "{{document.content}}",
  "chunkSize": 1000,
  "separator": "\n\n",
  "chunkOverlap": 200,
  "keepSeparator": false,
  "stripWhitespace": true,
  "processingOptions": {
    "minChunkSize": 50,
    "maxChunks": 1000,
    "preserveFormatting": false
  },
  "metadata": {
    "sourceDocument": "{{document.id}}",
    "processingTimestamp": "auto"
  }
}
```
## Browser API Integration

### Required Permissions
Section titled “Required Permissions”The Character Text Splitter operates entirely within the browser environment and does not require additional browser permissions.
### Browser APIs Used
Section titled “Browser APIs Used”- String Processing APIs: Native JavaScript string manipulation for efficient text splitting
- Regular Expression Engine: For advanced separator pattern matching when needed (see the example after this list)
- Memory Management: Efficient memory usage for processing large text documents
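As a rough illustration of what these native APIs do here (assumed behavior, not the node's source code): a plain separator maps directly to `String.prototype.split`, while a RegExp with a capturing group keeps the separator tokens in the result.

```ts
const text = "Intro.\n\nBody.\n\nEnd.";

// Plain string separator: native String.prototype.split.
console.log(text.split("\n\n"));
// ["Intro.", "Body.", "End."]

// A RegExp separator with a capturing group keeps the separator tokens,
// one way a keepSeparator-style option can be implemented.
console.log(text.split(/(\n\n)/));
// ["Intro.", "\n\n", "Body.", "\n\n", "End."]
```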
### Cross-Browser Compatibility
| Feature | Chrome | Firefox | Safari | Edge |
|---|---|---|---|---|
| Text Splitting | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Character Counting | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Separator Processing | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Large Document Handling | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
### Security Considerations
- Data Processing: All text processing occurs locally within the browser environment
- Memory Safety: Efficient memory management prevents memory leaks with large documents
- Input Validation: Text input is validated and sanitized to prevent processing errors
- No External Dependencies: No external API calls or data transmission required
- Content Security: Processed text remains within the secure browser context
## Input/Output Specifications
### Input Data Structure
```json
{
  "inputText": "string",
  "processingOptions": {
    "chunkSize": "number",
    "separator": "string",
    "chunkOverlap": "number"
  },
  "metadata": {
    "sourceId": "string",
    "documentType": "string"
  }
}
```
### Output Data Structure

```json
{
  "chunks": [
    {
      "text": "string",
      "index": "number",
      "startPosition": "number",
      "endPosition": "number",
      "characterCount": "number"
    }
  ],
  "summary": {
    "totalChunks": "number",
    "totalCharacters": "number",
    "averageChunkSize": "number",
    "separator": "string",
    "chunkOverlap": "number"
  },
  "metadata": {
    "processingTime": "number_ms",
    "timestamp": "ISO_8601_string",
    "splitterType": "character",
    "langchainVersion": "string"
  }
}
```
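For readers consuming this output from typed code, the shapes above can be transcribed into TypeScript interfaces. The field names come from the structure shown; the concrete types are inferred from the placeholders, so treat this as a sketch rather than an official type definition.

```ts
interface Chunk {
  text: string;
  index: number;
  startPosition: number; // offset of the chunk's first character in the original text
  endPosition: number;   // offset just past the chunk's last character
  characterCount: number;
}

interface SplitterSummary {
  totalChunks: number;
  totalCharacters: number;
  averageChunkSize: number;
  separator: string;
  chunkOverlap: number;
}

interface SplitterMetadata {
  processingTime: number; // milliseconds
  timestamp: string;      // ISO 8601
  splitterType: "character";
  langchainVersion: string;
}

interface SplitterOutput {
  chunks: Chunk[];
  summary: SplitterSummary;
  metadata: SplitterMetadata;
}
```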
## Practical Examples

### Example 1: Basic Document Splitting
Scenario: Split a technical documentation file into chunks for embedding generation, using paragraph breaks as natural splitting points.
Configuration:
{ "inputText": "{{document.content}}", "chunkSize": 800, "separator": "\n\n", "chunkOverlap": 100, "stripWhitespace": true}Input Data:
{ "inputText": "Introduction\n\nThis document provides comprehensive guidelines for using the platform.\n\nGetting Started\n\nTo begin, create an account and log into the dashboard. The interface consists of several key components that work together to provide a seamless experience.\n\nFeatures Overview\n\nThe platform offers multiple features including workflow, data processing, and integration capabilities.", "metadata": { "sourceId": "user-guide-v2", "documentType": "technical_documentation" }}Expected Output:
{ "chunks": [ { "text": "Introduction\n\nThis document provides comprehensive guidelines for using the platform.\n\nGetting Started\n\nTo begin, create an account and log into the dashboard. The interface consists of several key components that work together to provide a seamless experience.", "index": 0, "startPosition": 0, "endPosition": 247, "characterCount": 247 }, { "text": "The interface consists of several key components that work together to provide a seamless experience.\n\nFeatures Overview\n\nThe platform offers multiple features including workflow, data processing, and integration capabilities.", "index": 1, "startPosition": 147, "endPosition": 367, "characterCount": 220 } ], "summary": { "totalChunks": 2, "totalCharacters": 367, "averageChunkSize": 233, "separator": "\n\n", "chunkOverlap": 100 }, "metadata": { "processingTime": 15, "timestamp": "2024-01-15T10:30:00Z", "splitterType": "character", "langchainVersion": "0.1.0" }}Step-by-Step Process:
1. The input text is scanned for occurrences of the separator ("\n\n")
2. The text is split at separator boundaries while respecting the chunk size limit
3. Overlap is applied between consecutive chunks to maintain context
4. Each chunk is measured by character count and positioned within the original text
5. Metadata is generated, including processing statistics and chunk information (the sketch below approximates these steps)
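A compact TypeScript approximation of these steps (a simplified model, not the node's exact algorithm): split on the separator, greedily merge pieces up to `chunkSize`, and carry a tail of roughly `chunkOverlap` characters into the next chunk.

```ts
interface Chunk {
  text: string;
  index: number;
  startPosition: number;
  endPosition: number;
  characterCount: number;
}

// Approximate the documented steps; not the node's actual implementation.
function characterSplit(
  text: string,
  chunkSize: number,
  separator = "\n\n",
  chunkOverlap = 0
): Chunk[] {
  const pieces = text.split(separator);
  const merged: string[] = [];
  let current = "";

  for (const piece of pieces) {
    const candidate = current ? current + separator + piece : piece;
    if (candidate.length > chunkSize && current) {
      merged.push(current);
      // Step 3: carry trailing characters into the next chunk as overlap.
      const tail = current.slice(Math.max(0, current.length - chunkOverlap));
      current = tail ? tail + separator + piece : piece;
    } else {
      current = candidate;
    }
  }
  if (current) merged.push(current);

  // Steps 4-5: locate each chunk in the original text and record statistics.
  // In this sketch every chunk is a contiguous substring of the original text.
  let searchFrom = 0;
  return merged.map((chunkText, index) => {
    const startPosition = text.indexOf(chunkText, searchFrom);
    searchFrom = startPosition + 1;
    return {
      text: chunkText,
      index,
      startPosition,
      endPosition: startPosition + chunkText.length,
      characterCount: chunkText.length,
    };
  });
}
```

Note that in this sketch a piece longer than `chunkSize` is emitted as an oversized chunk rather than cut mid-piece; the separator boundary takes precedence over the size limit.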
### Example 2: Custom Separator for Structured Content
Scenario: Process a CSV-like structured document where each record should be kept intact, using custom separators and specific formatting requirements.
Configuration:
{ "inputText": "{{structuredData.content}}", "chunkSize": 500, "separator": "---", "chunkOverlap": 0, "keepSeparator": true, "stripWhitespace": false}Workflow Integration:
```text
[Document Loader] → [Character Text Splitter] → [Data Validator] → [Embedding Generator]
        ↓                      ↓                        ↓                      ↓
  raw_document         structured_chunks         validated_data       vector_embeddings
```

Complete Example: This configuration is ideal for processing structured data files where maintaining the exact formatting and separator structure is crucial for downstream processing, such as preparing data for specialized embedding models or maintaining data integrity in analytical workflows.
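With `keepSeparator: true` and `stripWhitespace: false`, behavior along these lines is expected (a hypothetical TypeScript illustration using a capturing-group split, not the node's actual output):

```ts
const records = "id,name\n1,Ann\n---\nid,name\n2,Bob\n---\n";

// Splitting on a capturing-group RegExp retains the "---" delimiters
// (keepSeparator: true); surrounding whitespace is left untouched
// (stripWhitespace: false).
const parts = records.split(/(---)/);
// ["id,name\n1,Ann\n", "---", "\nid,name\n2,Bob\n", "---", "\n"]
```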
## Integration Patterns
### Common Node Combinations
#### Pattern 1: Document Preprocessing Pipeline
- Nodes: [Document Loader] → [Character Text Splitter] → [Text Cleaner] → [Embedding Generator]
- Use Case: Prepare documents for AI processing with consistent chunk sizes
- Configuration Tips: Use paragraph separators ("\n\n") for natural text boundaries
#### Pattern 2: Multi-Format Content Processing
- Nodes: [Format Detector] → [Character Text Splitter] → [Format Normalizer] → [Content Analyzer]
- Use Case: Process various document formats with consistent chunking strategy
- Data Flow: Detect format, split appropriately, normalize output, analyze content
#### Pattern 3: RAG System Data Preparation
- Nodes: [Content Extractor] → [Character Text Splitter] → [Metadata Enricher] → [Vector Store]
- Use Case: Prepare knowledge base content for retrieval-augmented generation
- Configuration Tips: Balance chunk size with embedding model requirements and retrieval accuracy
## Best Practices
- Performance: Use appropriate chunk sizes (500-1500 characters) for optimal processing speed and memory usage
- Error Handling: Validate input text length and handle edge cases like empty documents or very short texts (see the sketch after this list)
- Data Validation: Ensure separator characters exist in the input text to enable meaningful splitting
- Resource Management: Monitor memory usage when processing very large documents (>10MB)
- Separator Selection: Choose separators that respect natural document structure (paragraphs, sections, sentences)
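One way to act on the error-handling and validation advice is a small pre-flight check before invoking the splitter. This guard function and its thresholds are illustrative, not part of the node:

```ts
// Illustrative pre-flight checks before splitting; thresholds are examples.
function validateSplitterInput(text: string, chunkSize: number, separator: string): string[] {
  const warnings: string[] = [];
  if (text.trim().length === 0) {
    warnings.push("Input is empty; nothing to split.");
  }
  if (text.length <= chunkSize) {
    warnings.push("Input fits within one chunk; no splitting will occur.");
  }
  if (!text.includes(separator)) {
    warnings.push(`Separator ${JSON.stringify(separator)} not found in the input.`);
  }
  if (text.length > 10 * 1024 * 1024) {
    warnings.push("Document exceeds ~10MB; monitor memory usage.");
  }
  return warnings;
}
```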
## Troubleshooting
### Common Issues
#### Issue: No Splitting Occurs
- Symptoms: Output contains only one chunk with the entire input text
- Causes:
  - Separator character sequence not found in the input text
  - Chunk size larger than the entire input text
  - Incorrect separator configuration
- Solutions:
  - Verify the separator exists in your input text
  - Try alternative separators like "\n", ". ", or " "
  - Reduce chunk size to force splitting
  - Use a fallback separator strategy (sketched below)
- Prevention: Analyze input text structure before configuring separators
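The fallback strategy can be as simple as probing candidate separators in order of preference (a sketch; the candidate list is an example):

```ts
// Return the first candidate separator that actually occurs in the text.
function chooseSeparator(text: string, candidates = ["\n\n", "\n", ". ", " "]): string {
  return candidates.find((sep) => text.includes(sep)) ?? "";
}

// A single-line document falls back from paragraphs to sentences:
chooseSeparator("One sentence. Another sentence."); // ". "
```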
#### Issue: Chunks Too Small or Too Large
- Symptoms: Resulting chunks don’t meet expected size requirements for downstream processing
- Causes:
  - Inappropriate chunk size configuration
  - Separator placement creating uneven splits
  - Overlap settings affecting effective chunk size
- Solutions:
  - Adjust chunk size based on your specific requirements
  - Experiment with different separators for a more even distribution
  - Modify overlap settings to balance context preservation and chunk independence
  - Use multiple splitting strategies for different document sections
- Prevention: Test with sample documents to optimize chunk size settings
#### Issue: Context Loss Between Chunks
- Symptoms: Important information spans chunk boundaries, reducing effectiveness of downstream processing
- Causes:
  - Insufficient chunk overlap
  - Poor separator selection that breaks semantic units
  - Chunk size too small for content complexity
- Solutions:
  - Increase chunk overlap to preserve more context
  - Use semantic-aware separators (paragraphs, sections)
  - Increase chunk size to capture more complete thoughts
  - Consider using recursive splitting for complex documents
- Prevention: Design chunking strategy based on content structure and downstream requirements
### Browser-Specific Issues
#### Chrome
- Excellent performance with large documents up to 50MB
- Efficient memory management for text processing
- No known compatibility issues
#### Firefox
- Slightly slower processing for very large documents (>20MB)
- Good overall compatibility with all features
- May require longer processing time for complex separator patterns
#### Safari
- Consistent performance across all Safari versions
- Efficient handling of Unicode and special characters
- No known limitations for typical use cases
### Performance Issues
- Slow Processing: Optimize chunk size and separator complexity; consider processing documents in smaller segments
- Memory Usage: Monitor browser memory usage with large documents; implement streaming for very large files
- Character Encoding: Ensure proper UTF-8 encoding for international text and special characters
Limitations & Constraints
### Technical Limitations
- Maximum Input Size: Browser memory limits may restrict processing of extremely large documents (>100MB)
- Separator Complexity: Simple character-based separators only; no regex or complex pattern matching
- Character Counting: Counts Unicode characters, which may differ from byte count or token count (see the comparison below)
- Processing Speed: Large documents may require several seconds to process depending on complexity
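The difference shows up with any non-ASCII text; standard JavaScript APIs expose all three counts:

```ts
const s = "héllo 👋";

console.log(s.length);                           // 8 — UTF-16 code units ("👋" counts as 2)
console.log([...s].length);                      // 7 — Unicode code points
console.log(new TextEncoder().encode(s).length); // 11 — UTF-8 bytes
```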
### Browser Limitations
- Memory Constraints: Available browser memory limits the maximum document size that can be processed
- String Length Limits: JavaScript string length limitations may affect very large documents
- Performance Variation: Processing speed varies across different browsers and devices
### Data Limitations
- Input Format: Accepts plain text only; binary or encoded content must be converted first
- Output Size: Large numbers of chunks may impact browser performance and memory usage
- Character Encoding: Proper handling requires valid UTF-8 encoded text input
- Separator Requirements: Effective splitting requires appropriate separator characters in the source text
## Related Nodes
### Similar Functionality
- Recursive Text Splitter: More advanced splitting with hierarchical separator fallback
- Token Text Splitter: Splits text based on token count rather than character count
- Semantic Text Splitter: AI-powered splitting that respects semantic boundaries
### Complementary Nodes
- Text Cleaner: Preprocesses text by removing unwanted characters and formatting
- Document Loader: Loads and extracts text from various document formats
- Embedding Generator: Converts text chunks into vector embeddings for AI processing
- Metadata Enricher: Adds contextual information to processed text chunks
### Workflow Suggestions
- For document processing, consider combining with: Document Loader, Text Cleaner, Metadata Enricher
- For RAG systems, this node works well before: Embedding Generator, Vector Store Writer, Similarity Search
- For content analysis, follow this node with: Text Analyzer, Sentiment Processor, Topic Classifier