
Get All HTML

The Get All HTML node extracts the complete HTML source code from a webpage, giving you all the underlying structure and content for analysis, archiving, or processing. Think of it as having a web developer’s view of how the page is built.

This is perfect for SEO analysis, competitor research, web archiving, or understanding how websites are structured. Instead of just seeing the visual page, you get the complete code that creates it.

Illustration of extracting HTML source code from a webpage

The node captures the complete HTML source code of the webpage, including all tags, attributes, and structure. It can optionally format the code for easier reading and extract metadata like SEO tags and page information.

graph LR
  Page[Web Page] --> Extractor{HTML Extractor}
  Extractor --> Code[HTML Code]
  Extractor --> Meta[Metadata]
  style Extractor fill:#6d28d9,stroke:#fff,color:#fff

  1. Navigate to Target Page: Make sure you’re on the webpage whose HTML you want to extract.
  2. Configure Options: Choose whether to include metadata, exclude scripts, or format the output.
  3. Run Extraction: The node captures all the HTML source code from the page.
  4. Process Results: Use the HTML for analysis, archiving, or further processing with other tools.

Let’s extract HTML from a webpage to analyze its SEO structure and meta tags.

What you configure:

  • Include Metadata: To capture SEO tags like description and keywords.
  • Exclude Scripts: To remove JavaScript code for cleaner output.
  • Pretty Print: To format the code with indentation so it’s easier to read.

What you get:

  • HTML Code: The full source code of the page.
  • Page Stats: Title, size of the page, and number of elements.
  • Metadata: Details like description, keywords, and social media tags.
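
To make the outputs above more concrete, here is a rough Python sketch of how a title, page size, element count, and meta tags could be pulled out of an HTML string using only the standard library. It is not the node's actual implementation: the `PageSummary` class, the `summarize` function, and the output field names are hypothetical and chosen for illustration only.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Hypothetical helper: collects the title, element count, and <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.element_count = 0
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Every opening (or self-closing) tag counts as one element.
        self.element_count += 1
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            # Standard tags use name=..., social (Open Graph) tags use property=...
            key = attr.get("name") or attr.get("property")
            if key and attr.get("content") is not None:
                self.metadata[key] = attr["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(source: str) -> dict:
    """Return page stats and metadata for an HTML string (field names are assumptions)."""
    parser = PageSummary()
    parser.feed(source)
    parser.close()
    return {
        "title": parser.title.strip(),
        "size_bytes": len(source.encode("utf-8")),
        "element_count": parser.element_count,
        "metadata": parser.metadata,
    }


sample = """<html><head>
<title>Example Store</title>
<meta name="description" content="Handmade goods">
<meta name="keywords" content="crafts, gifts">
<meta property="og:title" content="Example Store">
</head><body><h1>Welcome</h1><p>Hello</p></body></html>"""

print(summarize(sample))
```

Social media tags such as `og:title` use a `property` attribute rather than `name`, which is why the sketch checks both.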
| Setting | Purpose | When to Use |
| --- | --- | --- |
| Include Metadata | Extract SEO and page information | For SEO analysis and content research |
| Exclude Scripts | Remove JavaScript code | For cleaner analysis or security |
| Pretty Print | Format HTML for easier reading | When you need to review the code manually |
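
If you want a feel for what "Exclude Scripts" does to the output, the following Python sketch removes `<script>` blocks from an HTML string with a regular expression. This is only an approximation for illustration: how the node removes scripts is not documented here, the `exclude_scripts` name is hypothetical, and a regex is fine for rough cleanup but is not a full HTML parser.

```python
import re

# Matches an opening <script ...> tag through its closing </script>, across lines.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def exclude_scripts(source: str) -> str:
    """Return the HTML with every <script>...</script> block removed."""
    return SCRIPT_BLOCK.sub("", source)


page = """<html><head>
<script src="analytics.js"></script>
<title>Pricing</title>
</head><body>
<h1>Plans</h1>
<SCRIPT type="text/javascript">
  console.log("tracking");
</SCRIPT>
<p>Starter and Pro</p>
</body></html>"""

cleaned = exclude_scripts(page)
print(cleaned)
print(len(page), "->", len(cleaned), "characters")
```

The content tags (`<title>`, `<h1>`, `<p>`) survive untouched, while both the external and the inline script are dropped, which is usually what you want when you only care about page structure.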
  • HTML looks messy: Enable “Pretty Print” to format it with proper indentation for easier reading
  • Too much code: Enable “Exclude Scripts” to focus on content structure rather than functionality
  • Missing dynamic content: Some content is generated by JavaScript after the initial page load. Wait for the page to finish loading before you run the extraction.
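
For the "HTML looks messy" case, this small Python sketch shows the general idea behind pretty printing: put each tag and text run on its own line, indented by nesting depth. It is an illustration under simplifying assumptions, not the node's formatter. The `Indenter` class and `pretty_print` function are hypothetical names. The sketch assumes every non-void tag is explicitly closed, and it changes whitespace, so it is not suitable for whitespace-sensitive content such as `<pre>`.

```python
import html
from html.parser import HTMLParser

# Elements without a closing tag; they must not increase the indent level.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose text is passed through unescaped by the parser.
RAW_TEXT_TAGS = {"script", "style"}


class Indenter(HTMLParser):
    """Hypothetical helper: re-emits HTML one item per line, indented by depth."""

    def __init__(self, indent="  "):
        super().__init__()
        self.indent = indent
        self.depth = 0
        self.lines = []
        self._raw = False

    def _emit(self, text):
        self.lines.append(self.indent * self.depth + text)

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_starttag(self, tag, attrs):
        # Reuse the original tag text so attributes are preserved as written.
        self._emit(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1
            self._raw = tag in RAW_TEXT_TAGS

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self._raw = False
        self.depth = max(self.depth - 1, 0)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # The parser decodes entities in normal text, so escape them again.
        self._emit(text if self._raw else html.escape(text, quote=False))


def pretty_print(source: str) -> str:
    parser = Indenter()
    parser.feed(source)
    parser.close()
    return "\n".join(parser.lines)


messy = "<!DOCTYPE html><html><body><ul><li>One</li><li>Two &amp; three</li></ul><br><p>Done</p></body></html>"
print(pretty_print(messy))
```

Real-world pages often omit closing tags (for example `<li>` or `<p>`), so a production formatter needs a more forgiving parser than this sketch.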