Get Links From Link
The Get Links From Link node is a discovery tool that scans a web page and finds the clickable links on it. It categorizes these links so you can easily distinguish between pages on the same site (Internal) and links leading elsewhere (External). [1, 2]
This node is ideal for building site crawlers, checking for broken links, or creating automated navigation paths.
How it works
When the node runs, it visits the target URL in a background browser context. It looks for all <a> tags and other link elements, extracting the destination address and the “Anchor Text” (the clickable text you see on the screen). [3, 4]
graph LR
URL --> Node{Get Links Node}
Node --> Internal[Internal Links]
Node --> External[External Links]
Node --> Files
style Node fill:#6d28d9,stroke:#fff,color:#fff
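The extraction and categorization steps can be sketched in plain Python. This is a minimal illustration that assumes "Internal" means "same host as the scanned page", not the node's actual implementation. The names `LinkCollector` and `extract_links` and the `url`/`text`/`type` keys are hypothetical, and the sketch parses static HTML rather than driving a browser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so multi-line anchor text reads cleanly.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html, page_url):
    """Return one dict per link with an absolute URL, anchor text, and type."""
    collector = LinkCollector()
    collector.feed(html)
    page_host = urlparse(page_url).netloc.lower()
    results = []
    for href, text in collector.links:
        absolute = urljoin(page_url, href)  # resolve relative links
        host = urlparse(absolute).netloc.lower()
        # Assumption: a link is internal when its host matches the page host.
        link_type = "internal" if host == page_host else "external"
        results.append({"url": absolute, "text": text, "type": link_type})
    return results


sample = """
<a href="/pricing">Pricing</a>
<a href="https://twitter.com/example">Follow on X</a>
"""
links = extract_links(sample, "https://example.com/")
```

Under this same-host rule, subdomains such as `blog.example.com` would count as external. Whether the node treats them that way is not stated here.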
Setup guide
- Identify the Link: Provide the URL of the page you want to scan. You can type it manually or pull it from a previous step.
- Set Filters: Choose whether you want to capture Internal links (same domain), External links (other sites), or both.
- Validate Links (Optional): If enabled, the node will check if each link actually works (slower but more accurate).
- Run the Node: The output will be a structured list of all found links, including their text and type.
Practical example: Site Audit
Imagine you want to find all the social media profiles linked from a company’s homepage.
What you configure:
- URL: The homepage address (e.g., https://example.com).
- Filters: Look for external links to find other sites they mention.
What you get:
- Links List: A collection of found links.
- Details:
  - URL: https://twitter.com/example
  - Text: “Follow on X”
  - Type: “external” (leads to another site)
- Count: Total number of links found.
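To finish the site audit, the external links can be narrowed down to social media profiles in a later step. The sketch below assumes the node's output is available as a dictionary with a `links` list whose items carry `url`, `text`, and `type` keys. Those key names, the `SOCIAL_HOSTS` set, and the `find_social_links` helper are illustrative assumptions, not the node's documented schema.

```python
from urllib.parse import urlparse

# Hypothetical set of social media hosts; extend it to suit your audit.
SOCIAL_HOSTS = {"twitter.com", "x.com", "facebook.com", "linkedin.com", "instagram.com"}


def find_social_links(links):
    """Keep only external links whose host is a known social media site."""
    found = []
    for link in links:
        if link["type"] != "external":
            continue
        host = urlparse(link["url"]).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.linkedin.com like linkedin.com
        if host in SOCIAL_HOSTS:
            found.append(link)
    return found


# Assumed shape of the node output, based on the fields described above.
node_output = {
    "links": [
        {"url": "https://example.com/about", "text": "About", "type": "internal"},
        {"url": "https://twitter.com/example", "text": "Follow on X", "type": "external"},
        {"url": "https://www.linkedin.com/company/example", "text": "LinkedIn", "type": "external"},
        {"url": "https://partner.example.org/", "text": "Our partner", "type": "external"},
    ],
    "count": 4,
}
social = find_social_links(node_output["links"])
```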
Common settings
| Setting | Purpose |
|---|---|
| Max Links | Limits the number of links returned (default is 200) to keep the workflow fast. |
| Validate Links | Briefly pings each link to see if it leads to a real page or a “404 Not Found” error. |
| Filter Patterns | Tell the node to ignore specific links, like those containing “mailto:” or “tel:”. |
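As a rough illustration of how Max Links and Filter Patterns interact, the sketch below drops any URL containing an ignored pattern and then caps the result. The `apply_settings` function is hypothetical, and applying the filter before the cap is an assumption; the node may order these steps differently.

```python
def apply_settings(urls, ignore_patterns=("mailto:", "tel:"), max_links=200):
    """Drop URLs containing any ignored pattern, then cap the list length."""
    kept = [u for u in urls if not any(p in u for p in ignore_patterns)]
    return kept[:max_links]


found = [
    "https://example.com/contact",
    "mailto:hello@example.com",
    "tel:+15550100",
    "https://example.com/pricing",
    "https://example.com/blog",
]
filtered = apply_settings(found, max_links=2)
```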
Troubleshooting
- Missing Links: If some links aren’t appearing, the website might be generating them after the page loads. Try adding a Wait node before this step.
- Access Blocked: Some sites block automated link extraction. Since this node runs in your browser, try visiting the site in a normal tab first to ensure you aren’t blocked by a security challenge.
- Performance: Validating hundreds of links can take a long time. If you only need the URLs, keep Validate Links turned off.
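To see why validation is slow, note that each link costs one network round trip. The sketch below shows one way to check links yourself in a Python step, sending HEAD requests in parallel with the standard library. The `head_status` and `validate_links` names are hypothetical, and this is not how the node itself validates links.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def head_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # e.g. 404 for a missing page
    except (OSError, ValueError):
        return None  # DNS failure, timeout, or unsupported URL scheme


def validate_links(urls, check=head_status, workers=8):
    """Check URLs in parallel and map each URL to its status."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(check, urls))
    return dict(zip(urls, statuses))
```

Passing a custom `check` function makes the helper easy to try without network access. Some servers reject HEAD requests, so a status other than 200 does not always mean the link is broken.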
Related nodes
- Get HTML From Link: To see the full source code where the links are located.
- Get All Text From Link: To read the content of the pages you find.
- Python Code: To perform advanced URL cleaning or domain analysis.
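As an example of the URL cleaning and domain analysis mentioned for the Python Code node, the sketch below normalizes each URL and counts links per domain. The `clean_url` and `count_domains` helpers are hypothetical, and the cleaning rules (lowercase host, drop fragments, strip `utm_` tracking parameters) are one reasonable choice among many.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def clean_url(url):
    """Lowercase the host, drop the fragment, and remove utm_* parameters."""
    parts = urlparse(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunparse((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",   # treat an empty path as the site root
        parts.params,
        urlencode(query),
        "",                  # discard the #fragment
    ))


def count_domains(urls):
    """Count how many links point at each host after cleaning."""
    return Counter(urlparse(clean_url(u)).netloc for u in urls)


cleaned = clean_url("https://Example.com/page?utm_source=x&id=7#top")
domains = count_domains([
    "https://Example.com/a",
    "https://example.com/b#section",
    "https://twitter.com/example",
])
```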