Get Links From Link

The Get Links From Link node is a discovery tool that scans a web page and finds every clickable link on it. It categorizes these links so you can easily distinguish between pages on the same site (Internal) and links leading elsewhere (External). [1, 2]

This node is ideal for building site crawlers, checking for broken links, or creating automated navigation paths.

Illustration of extracting hyperlinks from a webpage

When the node runs, it visits the target URL in a background browser context. It looks for all <a> tags and other link elements, extracting the destination address and the “Anchor Text” (the clickable text you see on the screen). [3, 4]
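
To make the extraction step concrete, here is a minimal sketch using only the Python standard library. It is not the node's implementation: the node works in a browser context, while this sketch parses a static HTML string, and the names `AnchorCollector` and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects the destination and anchor text of every <a href="..."> tag."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []                      # list of {"url": ..., "text": ...}
        self._href: Optional[str] = None     # href of the <a> currently open
        self._text = []                      # text fragments inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as "/about" against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append({"url": self._href, "text": text})
            self._href = None


def extract_links(html: str, base_url: str):
    """Return every link found in `html` as a dict with "url" and "text"."""
    parser = AnchorCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Anchors without an `href` are skipped, and relative addresses are converted to absolute ones so that later steps can compare hostnames.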

graph LR
  URL --> Node{Get Links Node}
  Node --> Internal[Internal Links]
  Node --> External[External Links]
  Node --> Files
  style Node fill:#6d28d9,stroke:#fff,color:#fff

  1. Identify the Link: Provide the URL of the page you want to scan. You can type it manually or pull it from a previous step.

  2. Set Filters: Choose whether you want to capture Internal links (same domain), External links (other sites), or both.

  3. Validate Links (Optional): If enabled, the node checks whether each link actually works. This makes the run slower but flags broken links.

  4. Run the Node: The output will be a structured list of all found links, including their text and type.
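
The Internal/External filter in step 2 can be pictured as a hostname comparison. The sketch below is an assumption about how "same domain" might be decided: it compares hostnames exactly after dropping a leading `www.`, and the node may treat subdomains differently. The function name `classify_link` is hypothetical.

```python
from urllib.parse import urlparse


def _host(url: str) -> str:
    """Lower-cased hostname without a leading "www." (empty string if none)."""
    return (urlparse(url).hostname or "").lower().removeprefix("www.")


def classify_link(link_url: str, page_url: str) -> str:
    """Return "internal" if the link shares the page's hostname, else "external"."""
    return "internal" if _host(link_url) == _host(page_url) else "external"
```

For example, `classify_link("https://twitter.com/example", "https://example.com")` yields `"external"`, which matches the Type shown in the example output further down.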

Imagine you want to find all the social media profiles linked from a company’s homepage.

What you configure:

  • URL: The homepage address (e.g., https://example.com).
  • Filters: Capture External links only, since social media profiles live on other sites.

What you get:

  • Links List: A collection of found links.
  • Details:
    • URL: https://twitter.com/example
    • Text: “Follow on X”
    • Type: “external” (leads to another site)
  • Count: Total number of links found.
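
As a rough sketch of working with this output downstream, the snippet below models the result as a Python dictionary and keeps only the social media profiles. The key names (`links`, `url`, `text`, `type`, `count`), the two extra sample entries, and the `SOCIAL_HOSTS` set are assumptions made for illustration; check the node's real output for the exact field names.

```python
from urllib.parse import urlparse

# Illustrative output only: key names and the last two entries are assumed.
result = {
    "links": [
        {"url": "https://twitter.com/example", "text": "Follow on X", "type": "external"},
        {"url": "https://example.com/about", "text": "About", "type": "internal"},
        {"url": "https://partner.example.org/", "text": "Our partner", "type": "external"},
    ],
    "count": 3,
}

# Hypothetical list of hostnames treated as social networks.
SOCIAL_HOSTS = {"twitter.com", "x.com", "facebook.com", "instagram.com", "linkedin.com"}


def social_profiles(links):
    """Keep external links whose hostname is a known social network."""
    found = []
    for link in links:
        host = (urlparse(link["url"]).hostname or "").lower().removeprefix("www.")
        if link["type"] == "external" and host in SOCIAL_HOSTS:
            found.append(link)
    return found
```

Calling `social_profiles(result["links"])` on the sample data keeps only the `https://twitter.com/example` entry.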

| Setting | Purpose |
| --- | --- |
| Max Links | Limits the number of links returned (default is 200) to keep the workflow fast. |
| Validate Links | Briefly pings each link to see if it leads to a real page or a “404 Not Found” error. |
| Filter Patterns | Tell the node to ignore specific links, like those containing “mailto:” or “tel:”. |
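
The effect of Filter Patterns and Max Links can be sketched as a filter followed by a cap. Two things here are assumptions: that patterns are matched as plain substrings (the node may support wildcards or regular expressions), and that filtering happens before the limit is applied. The function name `apply_settings` is hypothetical.

```python
def apply_settings(urls, filter_patterns=("mailto:", "tel:"), max_links=200):
    """Drop URLs containing any filter pattern, then keep at most `max_links`."""
    kept = [url for url in urls if not any(p in url for p in filter_patterns)]
    return kept[:max_links]
```

With the defaults above, `mailto:` and `tel:` links are ignored and no more than 200 links are returned.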

  • Missing Links: If some links aren’t appearing, the website might be generating them after the page loads. Try adding a Wait node before this step.

  • Access Blocked: Some sites block automated link extraction. Since this node runs in your browser, try visiting the site in a normal tab first to ensure you aren’t blocked by a security challenge.

  • Performance: Validating hundreds of links can take a long time. If you only need the URLs, keep Validate Links turned off.
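
To see why validation is slow, consider this sketch of what checking links involves: one network request per link. It is not the node's implementation. The function names, the use of HEAD requests, the timeout, and the worker count are all assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status of a HEAD request, or None if the URL is unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code              # e.g. 404 for a "Not Found" page
    except (OSError, ValueError):
        return None                  # DNS failure, timeout, or malformed URL


def validate_links(urls, check=head_status, workers=8):
    """Check every URL with `check`, running up to `workers` requests in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(check, urls))
    return dict(zip(urls, statuses))
```

Even with several requests in flight, each link waits on a remote server, so hundreds of links can add up to a noticeable delay. The `check` parameter is there so a stub can stand in for real requests when experimenting.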