Process HTML
The Process HTML node takes HTML content and processes it by extracting specific elements, cleaning unwanted code, converting formats, or restructuring the content. Think of it as having a web developer clean up messy code and extract exactly what you need.
This is perfect for content cleaning, data extraction, format conversion, or preparing HTML content for use in other systems or workflows.
How it works
The node takes raw HTML content and applies various processing operations to clean, extract, convert, or restructure it. You can focus on specific elements or process the entire HTML document according to your needs.
graph LR
HTML[Raw HTML] --> Processor{HTML Processor}
Processor --> Clean[Clean Code]
Processor --> Extract[Extract Elements]
Processor --> Convert[Convert Format]
style Processor fill:#6d28d9,stroke:#fff,color:#fff
Setup guide
- Get HTML Content: Use the Get All HTML node or provide HTML content from another source.
- Choose Processing Operation: Select extract, clean, convert, or restructure based on your needs.
- Set Target Elements: Specify which parts of the HTML to focus on using CSS selectors (optional).
- Select Output Format: Choose HTML, text, or Markdown for the processed results.
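To make the "Set Target Elements" step more concrete, here is a minimal Python sketch of what targeting a class selector such as `.main-content` amounts to. It is not the node's implementation: `ClassExtractor` and `extract_by_class` are hypothetical names, only the standard library is used, and only simple class matching is covered rather than full CSS selector syntax.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassExtractor(HTMLParser):
    """Collects the text inside every element that carries a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # nesting depth inside the current match (0 = outside)
        self.matches = []   # one list of text fragments per matched element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            # Already inside a match: nested elements belong to the outer match.
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.matches.append([])

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.matches[-1].append(text)


def extract_by_class(html, target_class):
    """Return the text of each element whose class list contains target_class."""
    parser = ClassExtractor(target_class)
    parser.feed(html)
    parser.close()
    return [" ".join(parts) for parts in parser.matches]


sample = ('<nav class="menu">Home</nav>'
          '<article class="main-content"><h1>Title</h1><p>Body text</p></article>')
print(extract_by_class(sample, "main-content"))
```

Leaving the target out corresponds to processing the whole document, which is why targeting is optional.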
Practical example: Content cleaning
Let’s clean HTML content by removing ads and scripts while extracting the main article content.
What you configure:
- Content: The raw HTML code you want to work on.
- Operation: Choose “clean” to remove unwanted elements or “extract” to pull out specific ones.
- Targets: Use selectors (like .main-content) to focus on specific parts.
- Filters: Choose to remove scripts, ads, or other unwanted elements.
What you get:
- Processed Content: The clean, simplified HTML code.
- Stats: How much the file size was reduced and what elements were removed.
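The cleaning example above can be approximated outside the node with a short Python sketch. This is an illustration under stated assumptions, not how the node works internally: `Cleaner` and `clean_html` are hypothetical names, ads are assumed to be marked with an `ad` class, and comments and doctype declarations are simply dropped.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Cleaner(HTMLParser):
    """Re-emits HTML while skipping unwanted elements and their contents."""

    def __init__(self, drop_tags=("script", "style"), drop_classes=("ad",)):
        # Keep entities such as &amp; untouched so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.drop_tags = set(drop_tags)
        self.drop_classes = set(drop_classes)
        self.out = []         # pieces of the cleaned document
        self.removed = []     # tag names of the elements that were removed
        self.skip_depth = 0   # > 0 while inside a removed element

    def _unwanted(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        return tag in self.drop_tags or bool(self.drop_classes & classes)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
        elif self._unwanted(tag, attrs):
            self.removed.append(tag)
            if tag not in VOID_TAGS:
                self.skip_depth = 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no content to skip.
        if self.skip_depth:
            return
        if self._unwanted(tag, attrs):
            self.removed.append(tag)
        else:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def clean_html(html):
    """Return (cleaned_html, stats) for the given HTML string."""
    parser = Cleaner()
    parser.feed(html)
    parser.close()
    cleaned = "".join(parser.out)
    stats = {
        "original_chars": len(html),
        "cleaned_chars": len(cleaned),
        "removed": parser.removed,
    }
    return cleaned, stats
```

The returned `stats` dictionary mirrors the "Stats" output described above: the size before and after, plus the list of removed elements.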
Common settings
| Setting | Purpose | When to Use |
|---|---|---|
| Extract | Pull out specific elements or content sections | When you need only certain parts of the HTML |
| Clean | Remove unwanted elements, scripts, and attributes | For sanitizing content and improving security |
| Convert | Transform HTML to other formats (Markdown, text) | For cross-platform content use |
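As a rough illustration of the Convert setting, the sketch below turns HTML into plain text with Python's standard library. The names `TextConverter` and `html_to_text` are hypothetical, the choice of which tags start a new line is an assumption, and the node's own conversion may handle more cases (for example Markdown output).

```python
from html.parser import HTMLParser

# Tags that should start a new line in the text output (an assumed, partial list).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose contents are never meant to be read as text.
SKIP_TAGS = {"script", "style"}


class TextConverter(HTMLParser):
    """Collects visible text, inserting line breaks around block-level tags."""

    def __init__(self):
        super().__init__()  # the default convert_charrefs=True decodes &amp; etc.
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextConverter()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


sample = "<h1>Title</h1><p>Fish &amp; chips</p><ul><li>One</li><li>Two</li></ul>"
print(html_to_text(sample))
```

Plain text discards all styling, which is the trade-off to weigh when choosing between the text, Markdown, and HTML output formats.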
Troubleshooting
- No results: Check that your CSS selectors are correct and that the elements they target actually exist in the HTML.
- Missing content: Try broader CSS selectors, or process the entire HTML without targeting specific elements.
- Formatting issues: Each output format handles styling differently, so choose the one that best suits your needs.
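When a selector returns no results, it can help to list which classes the HTML actually contains. The following Python sketch does that with the standard library; `ClassCounter` and `class_counts` are hypothetical helper names, and only class-based selectors are covered.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts how many elements carry each class name."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # set() avoids double counting a class repeated on the same element.
        for name in set((dict(attrs).get("class") or "").split()):
            self.counts[name] += 1


def class_counts(html):
    """Return a Counter mapping class name -> number of elements using it."""
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


sample = '<div class="main content"><p class="content">Text</p></div>'
counts = class_counts(sample)
if counts["main-content"] == 0:
    # The selector .main-content matches nothing; show what is available instead.
    print("No match. Classes present:", sorted(counts))
```

In this sample the markup uses two separate classes, `main` and `content`, so a `.main-content` selector finds nothing even though the content is there.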