Get HTML From Link

The Get HTML From Link node allows you to “peek behind the curtain” of a website. It takes a web address (URL) and retrieves the entire underlying source code (HTML) of that page.

While the Get All Text From Link node is better for reading content, this node is essential when you need to see the structure of a site to find hidden data, metadata, or specific elements that aren’t visible as plain text.

Illustration of fetching source code from a web link

When you provide a link, the node visits the page in the background. It can wait for the page to finish loading (including content that appears after a delay) and then captures every line of code from the top to the bottom.
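As a rough point of comparison, the simplest possible fetch can be sketched in Python. This is only a stand-in under stated assumptions, not how the node is implemented: the helper name `fetch_html` and its `timeout_seconds` parameter are made up for illustration, and unlike the node, a plain request like this does not run the page's JavaScript, so content that appears after a delay would be absent.

```python
from urllib.request import urlopen


def fetch_html(url: str, timeout_seconds: float = 30.0) -> str:
    """Return the HTML served at `url` as text.

    Hypothetical helper: it performs a single request and gives up after
    `timeout_seconds`. No JavaScript is executed, so late-loading content
    is not captured.
    """
    with urlopen(url, timeout=timeout_seconds) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The gap between this sketch and the node is exactly what Wait For Load covers: the node keeps the page open until late-loading content has arrived.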

graph LR
  URL --> Node{Get HTML Node}
  Node --> Output
  style Node fill:#6d28d9,stroke:#fff,color:#fff

  1. Enter the Link: In the node settings, provide the URL of the website you want to analyze. You can type it manually or map it from a previous step.
  2. Wait for Load: If the website uses modern “loading” animations or takes a while to display data, ensure the Wait For Load option is enabled.
  3. Set a Timeout: Decide how long the node should try to reach the site before giving up (default is 30 seconds).
  4. Run the node: The output will contain a text field named html containing the full source code.
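Once the node has run, a later step can work with the html field. The following is a minimal sketch, assuming the output reaches your code as a dictionary with an `html` key; the `HeadInfoParser` class, the `read_head_info` helper, and the sample page are all hypothetical. It pulls the page title and meta tags out of the source using Python's standard library.

```python
from html.parser import HTMLParser


class HeadInfoParser(HTMLParser):
    """Collect the <title> text and <meta> name/content pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Meta tags identify themselves with either name= or property=.
            key = attr_map.get("name") or attr_map.get("property")
            if key and attr_map.get("content") is not None:
                self.meta[key] = attr_map["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def read_head_info(node_output):
    """Assumes node_output is a dict with an 'html' text field."""
    parser = HeadInfoParser()
    parser.feed(node_output["html"])
    parser.close()
    return {"title": parser.title.strip(), "meta": parser.meta}


# Hypothetical node output, used only to illustrate the idea.
sample_output = {
    "html": (
        "<html><head><title>Product Title - Tech Store</title>"
        '<meta name="description" content="A sample product">'
        "</head><body></body></html>"
    )
}
info = read_head_info(sample_output)
```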

Suppose you want to check whether a product page contains “Stock Level” data that isn’t shown as visible text but is embedded in the page’s code.

What you configure:

  • Link: The URL of the product page (e.g., https://shop.example.com/product/123).
  • Settings: Turn on “Wait for Load” to make sure all data is present.

What you get:

  • HTML: The full code of the page (<html>...</html>).
  • Metadata: Details about the page, such as its title (“Product Title - Tech Store”) and load time.
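To make the stock-level scenario concrete, here is a minimal sketch of searching the returned HTML for a value stored in an attribute. The attribute name `data-stock-level`, the sample markup, and the helper names are assumptions chosen for illustration; a real shop may store this information under a different name or in a different place, such as an embedded script.

```python
from html.parser import HTMLParser


class AttributeFinder(HTMLParser):
    """Record (tag, value) for every tag carrying a given attribute."""

    def __init__(self, attribute_name):
        super().__init__()
        self.attribute_name = attribute_name
        self.matches = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == self.attribute_name and value is not None:
                self.matches.append((tag, value))


def find_attribute_values(html, attribute_name):
    finder = AttributeFinder(attribute_name)
    finder.feed(html)
    finder.close()
    return finder.matches


# Hypothetical product page: the stock level is not visible text,
# it only exists as an attribute on the buy box.
sample_html = """
<html><body>
  <h1>Example Product</h1>
  <div class="buy-box" data-stock-level="7"><button>Add to cart</button></div>
</body></html>
"""
stock_matches = find_attribute_values(sample_html, "data-stock-level")
```

An empty result only means this particular attribute was not found, not that the page carries no stock data.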
| Setting | Purpose |
| --- | --- |
| Wait For Load | Keeps the browser open until the page is fully ready. Turn this on for “Single Page Apps” like dashboards or modern shops. |
| Sanitize HTML | Cleans the code to remove potentially harmful scripts. Recommended if you plan to display the HTML elsewhere. |
| Include Resources | Captures the styling (CSS) and scripts (JS) inside the HTML. This makes the file much larger but preserves the exact look. |
  • Missing Content: If the HTML you get is missing part of the page, such as the main content area, increase the Timeout and make sure Wait For Load is enabled. Some sites load their data only after the initial page appears.
  • Access Denied: Some websites block automated tools. Because Agentic Workflow Studio runs in your browser, you can often get around this by logging into the site in another tab.
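One way to notice missing content automatically is to check the returned HTML for text you expect on a fully loaded page. This is a heuristic sketch only: the `missing_markers` helper and the marker strings are hypothetical, and you would choose markers that fit your own target page.

```python
def missing_markers(html, expected_markers):
    """Return the expected snippets that do not appear in the HTML.

    The comparison ignores letter case. A non-empty result suggests the
    page was captured before it finished loading.
    """
    lowered = html.lower()
    return [marker for marker in expected_markers if marker.lower() not in lowered]


# Hypothetical capture of a page whose main content had not loaded yet.
partial_html = "<html><body><div id='app'></div></body></html>"
missing = missing_markers(partial_html, ["id='app'", "Add to cart"])
```

If the returned list is not empty, rerunning the node with Wait For Load enabled or a longer Timeout is a reasonable next step.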