Get All HTML

The Get All HTML node extracts the complete HTML source code from a webpage, giving you all the underlying structure and content for analysis, archiving, or processing. Think of it as having a web developer’s view of how the page is built.

This is perfect for SEO analysis, competitor research, web archiving, or understanding how websites are structured. Instead of just seeing the visual page, you get the complete code that creates it.

Illustration of extracting HTML source code from a webpage

How it works

The node captures the complete HTML source code of the webpage, including all tags, attributes, and structure. It can optionally format the code for easier reading and extract metadata like SEO tags and page information.

graph LR
  Page[Web Page] --> Extractor{HTML Extractor}
  Extractor --> Code[HTML Code]
  Extractor --> Meta[Metadata]
  style Extractor fill:#6d28d9,stroke:#fff,color:#fff
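
The split between HTML code and metadata can be sketched in a few lines of Python. This is only an illustration of the idea using the standard library's `html.parser`, not the node's actual implementation; the names `MetadataParser` and `extract_metadata` are hypothetical.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the <title> text and <meta> name/property -> content pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            # SEO tags usually use name="...", social tags use property="..."
            key = attr.get("name") or attr.get("property")
            if key and attr.get("content") is not None:
                self.meta[key.lower()] = attr["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html):
    """Return the page title and meta tags found in an HTML string."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "meta": parser.meta}


# Hypothetical sample page, used only for demonstration.
sample = """<html><head><title>Example Page</title>
<meta name="description" content="A short demo page">
<meta property="og:title" content="Example">
</head><body><p>Hello</p></body></html>"""

result = extract_metadata(sample)
print(result["title"])
print(result["meta"])
```

The same HTML string serves as the "HTML Code" output, while the returned dictionary plays the role of the "Metadata" output in the diagram.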

Setup guide

Navigate to Target Page: Make sure you’re on the webpage whose HTML you want to extract.
Configure Options: Choose whether to include metadata, exclude scripts, or format the output.
Run Extraction: The node captures all the HTML source code from the page.
Process Results: Use the HTML for analysis, archiving, or further processing with other tools.

Practical example: SEO analysis

Let’s extract HTML from a webpage to analyze its SEO structure and meta tags.

What you configure:

Include Metadata: To capture SEO tags like description and keywords.
Exclude Scripts: To remove JavaScript code for cleaner output.
Pretty Print: To format the HTML with indentation so it’s easier to read.

What you get:

HTML Code: The full source code of the page.
Page Stats: Title, size of the page, and number of elements.
Metadata: Details like description, keywords, and social media tags.
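
To make the Page Stats output more concrete, here is a rough sketch of how the title, page size, and element count could be computed from an HTML string. It assumes size means the UTF-8 byte length and that every opening tag counts as one element; the node may define both differently. `StatsParser` and `page_stats` are hypothetical names.

```python
from html.parser import HTMLParser


class StatsParser(HTMLParser):
    """Counts opening tags and captures the <title> text."""

    def __init__(self):
        super().__init__()
        self.element_count = 0
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <br/> are also routed through here.
        self.element_count += 1
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_stats(html):
    """Return title, size in bytes (UTF-8), and number of elements."""
    parser = StatsParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "size_bytes": len(html.encode("utf-8")),
        "element_count": parser.element_count,
    }


# Hypothetical sample page, used only for demonstration.
page = "<html><head><title>Demo</title></head><body><p>Hi</p><br></body></html>"
stats = page_stats(page)
print(stats)
```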

Common settings

Setting	Purpose	When to Use
Include Metadata	Extract SEO and page information	For SEO analysis and content research
Exclude Scripts	Remove JavaScript code	For cleaner analysis or security
Pretty Print	Format HTML for easier reading	When you need to review the code manually
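
As a rough illustration of what Exclude Scripts does conceptually, the sketch below removes `<script>` elements from an HTML string with a regular expression. This is an assumption about the behavior, not the node's implementation, and a regex is a simplification that can miss unusual markup; `exclude_scripts` is a hypothetical name.

```python
import re

# Matches an opening <script ...> tag, its contents, and the closing tag.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def exclude_scripts(html):
    """Return the HTML with all <script>...</script> blocks removed."""
    return SCRIPT_RE.sub("", html)


# Hypothetical sample markup, used only for demonstration.
raw = '<p>Intro</p><script src="app.js"></script><script>track();</script><p>Body</p>'
cleaned = exclude_scripts(raw)
print(cleaned)
```

With scripts removed, what remains is the content structure, which is usually what matters for SEO review or archiving.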

Troubleshooting

HTML looks messy: Enable “Pretty Print” to format it with proper indentation for easier reading.
Too much code: Enable “Exclude Scripts” to focus on content structure rather than functionality.
Missing dynamic content: Some content is generated by JavaScript after the initial page load. Wait for the page to finish loading before running the extraction.
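
The waiting advice for dynamic content can be sketched as a simple polling loop: keep re-reading the page's HTML until an expected piece of text appears or a timeout expires. Everything here is hypothetical, including `wait_for_content` and the `get_html` callable, which stands in for whatever mechanism returns the page's current HTML in your workflow.

```python
import time


def wait_for_content(get_html, marker, timeout=5.0, interval=0.25):
    """Poll get_html() until `marker` appears in the HTML, then return it.

    Raises TimeoutError if the marker has not appeared within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = get_html()
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} did not appear within {timeout} seconds")
        time.sleep(interval)


# Simulated page whose content only shows up on the third read.
reads = {"count": 0}


def fake_get_html():
    reads["count"] += 1
    if reads["count"] >= 3:
        return "<div id='app'><h1>Loaded</h1></div>"
    return "<div id='app'></div>"


html = wait_for_content(fake_get_html, "<h1>", timeout=2.0, interval=0.01)
print(html)
```

Polling for a specific marker is generally more reliable than a fixed delay, because the extraction runs as soon as the content is present and fails clearly when it never arrives.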