Process HTML

What it does: Takes HTML content and processes it by extracting specific elements, cleaning unwanted code, converting formats, or restructuring the content.

Inputs:

| Name | Type | Description | Required | Default |
|------|------|-------------|----------|---------|
| HTML Content | String | The HTML code to process | Yes | "" |
| Operation | String | What to do: extract, clean, convert, restructure | No | clean |
| Target Elements | String | CSS selector for elements to focus on | No | "" |
| Output Format | String | Format for results: html, text, markdown | No | html |

Outputs:

| Name | Type | Description |
|------|------|-------------|
| processedContent | String | The processed HTML or converted content |
| extractedElements | Array | Specific elements that were extracted |
| originalSize | Number | Size of original HTML in characters |
| processedSize | Number | Size after processing |
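
As a rough illustration, the inputs and outputs listed above could be modelled as typed records. This is only a sketch: the class names `ProcessHtmlInput` and `ProcessHtmlOutput` and the validation step are assumptions, and the camelCase input keys follow the JSON examples on this page rather than any confirmed schema.

```python
from dataclasses import dataclass, field, asdict

# Allowed values as listed in the input descriptions.
OPERATIONS = {"extract", "clean", "convert", "restructure"}
OUTPUT_FORMATS = {"html", "text", "markdown"}


@dataclass
class ProcessHtmlInput:
    """Hypothetical model of the node's inputs (names are illustrative)."""
    htmlContent: str                 # required
    operation: str = "clean"         # documented default
    targetElements: str = ""         # empty means the whole document
    outputFormat: str = "html"       # documented default

    def __post_init__(self) -> None:
        # Reject values outside the documented option lists.
        if self.operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {self.operation!r}")
        if self.outputFormat not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output format: {self.outputFormat!r}")


@dataclass
class ProcessHtmlOutput:
    """Hypothetical model of the node's outputs."""
    processedContent: str
    extractedElements: list = field(default_factory=list)
    originalSize: int = 0            # characters before processing
    processedSize: int = 0           # characters after processing


config = ProcessHtmlInput(htmlContent="<html>...</html>")
print(asdict(config))
```

Validating the option values up front is a design choice of this sketch; the node itself may handle unknown values differently.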

Content Cleaning: Remove ads, tracking scripts, and unnecessary elements from HTML to focus on the main content.

Data Extraction: Extract specific information like product details, prices, or contact information from HTML pages.

Format Conversion: Convert HTML content to Markdown, plain text, or other formats for use in different systems.

  1. Get HTML content - from Get All HTML or other sources
  2. Choose processing type - extract, clean, convert, or restructure
  3. Set target elements - specify what parts to focus on (optional)
  4. Run the workflow - get processed content ready for use

Simple Example:

{
  "htmlContent": "<html>...</html>",
  "operation": "clean",
  "targetElements": ".main-content",
  "outputFormat": "html"
}

🔍 Technical Details

Processing Operations:

  • Extract: Pull out specific elements or content sections
  • Clean: Remove unwanted elements, scripts, and attributes
  • Convert: Transform HTML to other formats (Markdown, text)
  • Restructure: Reorganize elements and content hierarchy
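
To make the Clean operation more concrete, here is a minimal sketch built on Python's standard-library `html.parser`. It is not the node's actual implementation: the names `Cleaner`, `clean_html`, and `DROPPED_TAGS` are assumptions, and the rules chosen (drop `script` and `style` with their contents, strip `on*` event attributes, drop comments) are just one plausible reading of "remove unwanted elements, scripts, and attributes".

```python
from html import escape
from html.parser import HTMLParser

# Elements dropped together with their contents (an illustrative choice).
DROPPED_TAGS = {"script", "style"}


class Cleaner(HTMLParser):
    """Re-emits HTML while skipping dropped tags and on* attributes.

    Comments and doctype declarations are dropped as well, because their
    handlers are left at the no-op defaults.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        # Keep every attribute except inline event handlers (onclick, ...).
        kept = [(k, v) for k, v in attrs if not k.startswith("on")]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in kept
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: emit the start tag only.
        if tag not in DROPPED_TAGS:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # The parser has already decoded entities, so re-escape the text.
            self.parts.append(escape(data, quote=False))


def clean_html(html: str) -> str:
    parser = Cleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(clean_html('<div onclick="x()" class="a"><script>alert(1)</script><p>Hi</p></div>'))
```

A streaming parser keeps the sketch dependency-free; a production cleaner would more likely work on a full DOM tree so that selectors and nesting can be handled properly.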

Target Selection:

  • Use CSS selectors to focus on specific parts of the HTML
  • Leave empty to process the entire HTML content
  • Combine multiple selectors with commas
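
The selection rules above can be sketched as follows. This is an illustration under simplifying assumptions, not the node's selector engine: the helper names are hypothetical, only bare tag, `.class`, and `#id` selectors are recognised, and the naive comma split would break on selectors that contain commas themselves (for example inside `:is(...)`).

```python
def split_selectors(target_elements: str) -> list:
    """Split a comma-separated selector list; an empty result means 'everything'."""
    return [s.strip() for s in target_elements.split(",") if s.strip()]


def matches_simple(selector: str, tag: str, attrs: dict) -> bool:
    """Match one element against a tag, .class, or #id selector only."""
    if selector.startswith("."):
        return selector[1:] in attrs.get("class", "").split()
    if selector.startswith("#"):
        return attrs.get("id") == selector[1:]
    return tag == selector.lower()


def is_targeted(target_elements: str, tag: str, attrs: dict) -> bool:
    """True if the element falls inside the scope described by target_elements."""
    selectors = split_selectors(target_elements)
    if not selectors:
        # Empty input: the entire document is in scope.
        return True
    # A comma-separated list matches if any one of its selectors matches.
    return any(matches_simple(s, tag, attrs) for s in selectors)


print(split_selectors("article, .content"))
print(is_targeted(".price, .product-name", "span", {"class": "price big"}))
```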

Output Formats:

  • HTML: Processed HTML code
  • Text: Plain text with formatting removed
  • Markdown: Markdown format for documentation
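
For a sense of what the text output format implies, the sketch below strips tags with Python's standard-library `html.parser` and keeps only the readable text. The names `TextExtractor`, `html_to_text`, and `BLOCK_TAGS`, as well as the choice to put block-level elements on separate lines, are assumptions of this example and may not match the node's actual output.

```python
from html.parser import HTMLParser

# Tags treated as line breaks in the text output (an illustrative choice).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
HIDDEN_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collects visible text and inserts newlines around block elements."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.hidden = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS:
            self.hidden = max(0, self.hidden - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self.hidden:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)


print(html_to_text("<h1>Title</h1><p>Hello <b>world</b> &amp; more</p>"))
```

Inline formatting such as `<b>` disappears entirely in this sketch, which is why the html or markdown formats are the better fit when emphasis, links, or headings need to survive.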

Performance:

  • Large HTML documents may take longer to process
  • Complex operations require more processing time
  • Simple cleaning operations are typically the fastest

Clean Content:

{
  "operation": "clean",
  "targetElements": "article, .content",
  "outputFormat": "html"
}

Extract Product Info:

{
  "operation": "extract",
  "targetElements": ".price, .product-name, .description",
  "outputFormat": "text"
}

Convert to Markdown:

{
  "operation": "convert",
  "targetElements": "",
  "outputFormat": "markdown"
}

Common Issues:

  • No results? Check that your target elements exist in the HTML
  • Missing content? Try using broader CSS selectors or no selector at all
  • Formatting issues? Output formats handle styling differently: text removes all formatting, so try html or markdown if the structure needs to be kept
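
The first two tips can be combined into a simple retry pattern, sketched below. Everything here is hypothetical scaffolding: `run` stands in for however the node is actually invoked, `fake_run` is a stub used only for demonstration, and the sketch assumes that an unmatched selector yields an empty `extractedElements` array.

```python
from typing import Callable


def process_with_fallback(run: Callable[[dict], dict], config: dict) -> dict:
    """Run once; if a selector was set but nothing matched, retry without it."""
    result = run(config)
    if config.get("targetElements") and not result.get("extractedElements"):
        # No match: the selector may not exist in this HTML, so widen the scope.
        result = run({**config, "targetElements": ""})
    return result


def fake_run(config: dict) -> dict:
    """Stand-in runner for demonstration; only '.main-content' or '' matches."""
    selector = config.get("targetElements", "")
    matched = selector in ("", ".main-content")
    return {
        "processedContent": "<p>Hello</p>" if matched else "",
        "extractedElements": ["<p>Hello</p>"] if matched else [],
    }


result = process_with_fallback(
    fake_run, {"operation": "extract", "targetElements": ".missing"}
)
print(result["processedContent"])
```

Retrying with an empty selector trades precision for coverage, so the fallback result may include content that the original selector was meant to exclude.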