Get Links From Link
Overview
The Get Links From Link node discovers and extracts all hyperlinks from web pages, providing comprehensive link analysis including URLs, anchor text, link types, and validation status. This node is essential for site mapping, SEO analysis, link validation, and automated web navigation workflows.
Purpose and Functionality
This node performs comprehensive link discovery by:
- Scanning web pages for all hyperlink elements (a, area, link tags)
- Extracting URLs, anchor text, titles, and link attributes
- Categorizing links by type (internal, external, email, phone, etc.)
- Validating link accessibility and providing status information
- Supporting both static and dynamically generated links
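The categorization step listed above can be sketched as follows. This is a minimal illustration, not the node's actual implementation: `classifyLink`, `LinkType`, and the `FILE_EXTENSIONS` list are hypothetical names, and the rule that an `href` starting with `#` counts as an anchor is an assumption. Only the standard `URL` API is used.

```typescript
// Hypothetical helper: the node's real classification rules are not documented here.
type LinkType = "internal" | "external" | "email" | "phone" | "file" | "anchor";

// Assumed list of extensions treated as file downloads.
const FILE_EXTENSIONS = [".pdf", ".zip", ".docx", ".xlsx", ".csv"];

function classifyLink(href: string, pageUrl: string): LinkType {
  const trimmed = href.trim();
  if (trimmed.startsWith("#")) return "anchor";

  const lower = trimmed.toLowerCase();
  if (lower.startsWith("mailto:")) return "email";
  if (lower.startsWith("tel:")) return "phone";

  // Resolve relative hrefs against the page URL before comparing hosts.
  const page = new URL(pageUrl);
  const target = new URL(trimmed, page);

  const path = target.pathname.toLowerCase();
  if (FILE_EXTENSIONS.some((ext) => path.endsWith(ext))) return "file";

  // Any other scheme or host falls through to "external" in this sketch.
  return target.hostname === page.hostname ? "internal" : "external";
}
```

Comparing hostnames is the simplest possible test for "same domain"; a real implementation might also treat subdomains as internal.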
Key Features
- Complete Link Discovery: Finds all types of links including navigation, content, and metadata links
- Link Classification: Automatically categorizes links as internal, external, email, phone, or file downloads
- Validation Support: Checks link accessibility and provides status codes for validation workflows
- Metadata Extraction: Captures anchor text, titles, rel attributes, and other link properties
Primary Use Cases
- Site Mapping: Create comprehensive maps of website structure and navigation paths
- SEO Analysis: Analyze internal linking structure, external links, and anchor text optimization
- Link Validation: Identify broken links, redirects, and accessibility issues across websites
- Competitive Research: Analyze competitor linking strategies and external partnerships
Parameters & Configuration
Required Parameters
| Parameter | Type | Description | Example |
|---|---|---|---|
| url | string | The target URL from which to extract link information | "https://example.com/page" |
Optional Parameters
| Parameter | Type | Default | Description | Example |
|---|---|---|---|---|
| includeInternal | boolean | true | Include links to the same domain | true |
| includeExternal | boolean | true | Include links to external domains | false |
| validateLinks | boolean | false | Check if links are accessible (slower but more comprehensive) | true |
| maxLinks | number | 200 | Maximum number of links to return | 100 |
| includeMetadata | boolean | true | Extract detailed metadata for each link | false |
| filterPatterns | array | [] | URL patterns to exclude from results | ["mailto:", "tel:", "#"] |
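As a rough illustration of how these defaults and limits might combine, the sketch below merges a partial configuration with the documented defaults and then applies `filterPatterns` and `maxLinks`. The parameter names and default values come from the tables above; `resolveOptions`, `applyFilters`, and the choice of plain substring matching for patterns are assumptions.

```typescript
// Field names and defaults mirror the parameter tables; the logic is an assumption.
interface GetLinksOptions {
  url: string;
  includeInternal?: boolean;
  includeExternal?: boolean;
  validateLinks?: boolean;
  maxLinks?: number;
  includeMetadata?: boolean;
  filterPatterns?: string[];
}

function resolveOptions(o: GetLinksOptions): Required<GetLinksOptions> {
  if (!o.url) throw new Error("url is required");
  return {
    url: o.url,
    includeInternal: o.includeInternal ?? true,
    includeExternal: o.includeExternal ?? true,
    validateLinks: o.validateLinks ?? false,
    maxLinks: o.maxLinks ?? 200,
    includeMetadata: o.includeMetadata ?? true,
    filterPatterns: o.filterPatterns ?? [],
  };
}

// Drops URLs containing any exclusion pattern, then caps the result at maxLinks.
function applyFilters(urls: string[], options: Required<GetLinksOptions>): string[] {
  return urls
    .filter((u) => !options.filterPatterns.some((p) => u.includes(p)))
    .slice(0, options.maxLinks);
}
```

With substring matching, a pattern such as `"#"` would exclude every URL that carries a fragment, not only bare in-page anchors, so patterns are worth choosing carefully.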
Advanced Configuration
```json
{
  "url": "https://example.com/page",
  "includeInternal": true,
  "includeExternal": true,
  "validateLinks": false,
  "maxLinks": 150,
  "includeMetadata": true,
  "filterPatterns": ["javascript:", "mailto:", "#"],
  "analysisOptions": {
    "categorizeByType": true,
    "extractAnchorText": true,
    "checkRedirects": false
  }
}
```
Browser API Integration
Required Permissions
| Permission | Purpose | Security Impact |
|---|---|---|
| activeTab | Access content of the current active tab | Can read all link content and metadata from the active webpage |
| scripting | Execute content scripts for link discovery | Can run JavaScript to analyze DOM and extract link information |
Browser APIs Used
- chrome.tabs API: For navigating to target URLs and accessing page content
- chrome.scripting API: For executing content scripts that scan DOM for link elements
- Fetch API: For validating link accessibility when validation is enabled
- URL API: For parsing and categorizing different types of URLs
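To make the DOM-scanning step more concrete, here is a minimal sketch of what a content script might do once it has the link elements. `ElementLike`, `RawLink`, and `collectLinks` are hypothetical names; the small `ElementLike` interface stands in for a DOM element so the sketch also runs outside a browser. In a page context the input could plausibly be `Array.from(document.querySelectorAll("a[href], area[href], link[href]"))`, with the function injected through `chrome.scripting.executeScript`, but how the node actually wires this up is not documented here.

```typescript
// Minimal structural stand-in for a DOM element (assumption for testability).
interface ElementLike {
  getAttribute(name: string): string | null;
  textContent: string | null;
}

interface RawLink {
  url: string;
  anchorText: string;
  title: string;
  rel: string;
}

function collectLinks(elements: ElementLike[], pageUrl: string): RawLink[] {
  const links: RawLink[] = [];
  for (const el of elements) {
    const href = el.getAttribute("href");
    if (!href) continue; // skip elements without a usable href

    let url: string;
    try {
      url = new URL(href, pageUrl).href; // resolve relative URLs against the page
    } catch {
      continue; // skip hrefs that cannot be parsed
    }

    links.push({
      url,
      anchorText: (el.textContent ?? "").trim(),
      title: el.getAttribute("title") ?? "",
      rel: el.getAttribute("rel") ?? "",
    });
  }
  return links;
}
```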
Cross-Browser Compatibility
| Feature | Chrome | Firefox | Safari | Edge |
|---|---|---|---|---|
| Basic Link Extraction | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Link Validation | ✅ Full | ✅ Full | ⚠️ Limited | ✅ Full |
| Metadata Extraction | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Dynamic Links | ✅ Full | ✅ Full | ⚠️ Limited | ✅ Full |
Security Considerations
- Cross-Origin Validation: Link validation may be blocked by CORS policies for external sites
- Privacy Protection: Link URLs may contain tracking parameters or personal identifiers
- Rate Limiting: Implement delays when validating large numbers of links to avoid being blocked
- Malicious Links: Filter out potentially harmful URLs (javascript:, data:, etc.)
- Data Exposure: Be cautious with links that may contain sensitive information in URLs
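Two of the points above, filtering harmful schemes and limiting tracking data in URLs, could be handled in a post-processing step along these lines. This is a sketch under stated assumptions: `sanitizeLink` is a hypothetical helper, the allow-list of `http:` and `https:` is one possible policy, and the tracking-parameter list is illustrative, not exhaustive.

```typescript
// Assumed policy: keep only http(s) links.
const ALLOWED_PROTOCOLS = ["http:", "https:"];

// Common analytics parameters; illustrative, not exhaustive.
const TRACKING_PARAMS = [
  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
  "fbclid", "gclid",
];

// Returns a cleaned absolute URL, or null when the link should be dropped.
function sanitizeLink(rawUrl: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return null; // not an absolute, parseable URL
  }
  if (ALLOWED_PROTOCOLS.indexOf(parsed.protocol) === -1) return null;

  for (const name of TRACKING_PARAMS) {
    parsed.searchParams.delete(name);
  }
  return parsed.href;
}
```

An allow-list is used instead of a block-list so that unfamiliar schemes are rejected by default.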
Input/Output Specifications
Input Data Structure
```json
{
  "url": "string",
  "options": {
    "includeInternal": "boolean",
    "includeExternal": "boolean",
    "validateLinks": "boolean",
    "maxLinks": "number",
    "includeMetadata": "boolean",
    "filterPatterns": "array"
  }
}
```
Output Data Structure
```json
{
  "links": [
    {
      "url": "string",
      "anchorText": "string",
      "title": "string",
      "type": "internal|external|email|phone|file|anchor",
      "isValid": "boolean",
      "statusCode": "number",
      "metadata": {
        "rel": "string",
        "target": "string",
        "className": "string",
        "id": "string"
      }
    }
  ],
  "totalLinks": "number",
  "internalLinks": "number",
  "externalLinks": "number",
  "metadata": {
    "url": "string",
    "timestamp": "ISO_8601_string",
    "extractionTime": "number_ms",
    "validationTime": "number_ms"
  }
}
```
## Practical Examples
### Example 1: Website Navigation Analysis
**Scenario**: Analyze the navigation structure of a website to understand user flow and identify important pages
**Configuration**:
```json
{
  "url": "https://company.example.com",
  "includeInternal": true,
  "includeExternal": false,
  "validateLinks": false,
  "maxLinks": 100,
  "includeMetadata": true
}
```
Input Data:
```json
{
  "url": "https://company.example.com"
}
```
Expected Output:
```json
{
  "links": [
    {
      "url": "https://company.example.com/about",
      "anchorText": "About Us",
      "title": "Learn more about our company",
      "type": "internal",
      "isValid": true,
      "statusCode": 200,
      "metadata": {
        "rel": "",
        "target": "",
        "className": "nav-link",
        "id": "about-link"
      }
    },
    {
      "url": "https://company.example.com/products",
      "anchorText": "Our Products",
      "title": "Explore our product catalog",
      "type": "internal",
      "isValid": true,
      "statusCode": 200,
      "metadata": {
        "rel": "",
        "target": "",
        "className": "nav-link primary",
        "id": "products-link"
      }
    }
  ],
  "totalLinks": 47,
  "internalLinks": 47,
  "externalLinks": 0,
  "metadata": {
    "url": "https://company.example.com",
    "timestamp": "2024-01-15T10:30:00Z",
    "extractionTime": 320,
    "validationTime": 0
  }
}
```
Step-by-Step Process:
- Navigate to the company homepage
- Scan DOM for all anchor elements and link tags
- Filter to include only internal links within the same domain
- Extract anchor text, titles, and CSS metadata
- Return structured link data for navigation analysis
Example 2: SEO Link Audit with Validation
Scenario: Perform comprehensive link audit including external link validation for SEO analysis
Configuration:
```json
{
  "url": "https://blog.example.com/seo-guide",
  "includeInternal": true,
  "includeExternal": true,
  "validateLinks": true,
  "maxLinks": 75,
  "filterPatterns": ["mailto:", "tel:"]
}
```
Workflow Integration:
```
URL Input → Get Links From Link → Link Validator → SEO Report Generator
    ↓               ↓                   ↓                   ↓
target_url      all_links        validation_data       seo_analysis
```
Complete Example: This configuration extracts all links from a blog post, validates their accessibility, and provides comprehensive data for SEO analysis including broken link detection, external link quality assessment, and internal linking optimization opportunities.
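A downstream report step for this audit might group the validated links roughly as shown below. The field names follow the Output Data Structure section; `AuditedLink`, `AuditReport`, `buildAuditReport`, and the status-code thresholds are assumptions made for illustration, not part of the node.

```typescript
// Only the output fields used by this sketch are declared.
interface AuditedLink {
  url: string;
  type: string;
  isValid: boolean;
  statusCode: number;
}

interface AuditReport {
  broken: string[];
  redirects: string[];
  externalCount: number;
}

function buildAuditReport(links: AuditedLink[]): AuditReport {
  const report: AuditReport = { broken: [], redirects: [], externalCount: 0 };
  for (const link of links) {
    if (link.type === "external") report.externalCount += 1;

    if (link.statusCode >= 300 && link.statusCode < 400) {
      report.redirects.push(link.url); // 3xx: reachable, but via a redirect
    } else if (!link.isValid || link.statusCode >= 400) {
      report.broken.push(link.url); // failed validation or 4xx/5xx
    }
  }
  return report;
}
```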
Examples
Basic Usage
This example demonstrates the fundamental usage of the GetLinksFromLink node in a typical workflow scenario.
Configuration:
```json
{
  "url": "example_value",
  "followRedirects": true
}
```
Input Data:
```json
{
  "data": "sample input data"
}
```
Expected Output:
```json
{
  "result": "processed output data"
}
```
Advanced Usage
This example shows more complex configuration options and integration patterns.
Configuration:
```json
{
  "parameter1": "advanced_value",
  "parameter2": false,
  "advancedOptions": {
    "option1": "value1",
    "option2": 100
  }
}
```
Integration Example
Example showing how this node integrates with other workflow nodes:
- Previous Node → GetLinksFromLink → Next Node
- Data flows through the workflow with appropriate transformations
- Error handling and validation at each step
Integration Patterns
Common Node Combinations
Pattern 1: Site Crawling Pipeline
- Nodes: Get Links From Link → URL Filter → Page Crawler → Content Analyzer
- Use Case: Systematic website crawling and content analysis
- Configuration Tips: Focus on internal links for site mapping, use validation to identify crawlable pages
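The crawling pipeline in Pattern 1 amounts to a breadth-first traversal over internal links. The sketch below shows one way to express that loop; `crawlSite` is a hypothetical driver and `getLinks` is an injected stand-in for a single run of the Get Links From Link node that is assumed to return absolute URLs.

```typescript
// getLinks is a hypothetical stand-in for one node run; it must return absolute URLs.
type GetLinks = (url: string) => Promise<string[]>;

async function crawlSite(
  startUrl: string,
  getLinks: GetLinks,
  maxPages: number
): Promise<string[]> {
  const host = new URL(startUrl).hostname;
  const visited = new Set<string>();
  const queue: string[] = [startUrl];

  while (queue.length > 0 && visited.size < maxPages) {
    const current = queue.shift() as string;
    if (visited.has(current)) continue;
    visited.add(current);

    for (const link of await getLinks(current)) {
      // Stay on the starting host, mirroring includeExternal: false.
      if (new URL(link).hostname === host && !visited.has(link)) {
        queue.push(link);
      }
    }
  }
  return Array.from(visited); // pages in the order they were visited
}
```

The `maxPages` cap plays the same protective role for the crawl that `maxLinks` plays for a single page.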
Pattern 2: Link Quality Assessment
- Nodes: URL List → Get Links From Link → Link Validator → Quality Report
- Use Case: Automated link quality assessment across multiple pages
- Data Flow: Multiple pages processed, links extracted and validated, comprehensive quality reports generated
Best Practices
- Performance: Disable link validation for large-scale extraction to improve speed
- Resource Management: Use maxLinks parameter to prevent overwhelming downstream processing
- Error Handling: Implement robust error handling for network failures during validation
- Rate Limiting: Add delays between validation requests to respect target server limits
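The error-handling and rate-limiting advice above can be combined in a small validation loop. In this sketch `validateSequentially` and `ValidationResult` are hypothetical names, and `checkLink` is an injected callback so that no particular HTTP client is assumed; in a browser it might wrap `fetch(url, { method: "HEAD" })` and return the response status.

```typescript
// checkLink resolves to an HTTP status code, or rejects on a network failure.
type CheckLink = (url: string) => Promise<number>;

interface ValidationResult {
  url: string;
  isValid: boolean;
  statusCode: number | null;
  error?: string;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function validateSequentially(
  urls: string[],
  checkLink: CheckLink,
  delayMs: number
): Promise<ValidationResult[]> {
  const results: ValidationResult[] = [];
  let first = true;
  for (const url of urls) {
    if (!first) await sleep(delayMs); // pause between requests, not before the first
    first = false;
    try {
      const statusCode = await checkLink(url);
      results.push({ url, isValid: statusCode >= 200 && statusCode < 400, statusCode });
    } catch (err) {
      // Record the failure and continue, so one bad link cannot abort the run.
      const message = err instanceof Error ? err.message : String(err);
      results.push({ url, isValid: false, statusCode: null, error: message });
    }
  }
  return results;
}
```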
Troubleshooting
Common Issues
Issue: Missing Dynamic Links
- Symptoms: Fewer links returned than visible on the page
- Causes: JavaScript-generated links not fully loaded, AJAX content still loading
- Solutions:
  - Increase page load wait time to allow dynamic content to render
  - Check if links are generated by user interactions
  - Verify that single-page application routing is complete
- Prevention: Test with pages that have known dynamic link generation
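One way to approximate "wait until dynamic content has rendered" is to poll the link count until it stops changing. The sketch below is an assumption about how such a wait could be built, not a documented node feature; `waitForStableLinkCount` is a hypothetical helper, and `countLinks` is an injected callback that in a content script might return `document.links.length`.

```typescript
// Polls countLinks until the value is unchanged for `stableChecks` consecutive
// polls, or until `maxChecks` polls have been made. Returns the last count seen.
async function waitForStableLinkCount(
  countLinks: () => number,
  intervalMs: number,
  stableChecks: number,
  maxChecks: number
): Promise<number> {
  let last = countLinks();
  let stable = 0;
  for (let i = 0; i < maxChecks && stable < stableChecks; i++) {
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    const current = countLinks();
    stable = current === last ? stable + 1 : 0; // reset whenever the count changes
    last = current;
  }
  return last;
}
```

The `maxChecks` bound matters on pages that keep adding links indefinitely, such as infinite-scroll feeds, where the count may never settle.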
Issue: Link Validation Timeouts
- Symptoms: Validation process fails or takes an extremely long time
- Causes: Slow external servers, network connectivity issues, rate limiting
- Solutions:
  - Disable validation for initial analysis, validate separately
  - Implement timeout limits for individual link checks
  - Use batch processing with delays between validation requests
- Prevention: Start with validation disabled, enable selectively for critical links
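Per-link timeouts and batching, as suggested in the solutions above, can be sketched with two small helpers. `withTimeout` and `toBatches` are hypothetical utilities, not part of the node. Note that this timeout only stops waiting; it does not cancel the underlying request. In a browser, passing an `AbortController` signal to `fetch` would be the usual way to cancel it.

```typescript
// Rejects if `work` has not settled within timeoutMs; otherwise mirrors its outcome.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}

// Splits a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

A caller could then validate one batch at a time, wrapping each individual check in `withTimeout` and pausing between batches.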
Browser-Specific Issues
Chrome
- CORS policies may prevent validation of some external links
- Use appropriate error handling for blocked validation requests
Firefox
- Similar CORS restrictions with different error handling requirements
- May require additional permissions for cross-origin validation
Performance Issues
- Large Link Collections: Pages with hundreds of links may cause memory issues
- Validation Overhead: Link validation can significantly increase processing time
- Network Dependencies: External link validation depends on network connectivity and target server response times
Limitations & Constraints
Technical Limitations
- JavaScript Links: Links generated entirely by JavaScript may not be captured
- Authentication Required: Cannot validate links that require login credentials
- Dynamic Routing: Single-page application routes may not be detected as traditional links
Browser Limitations
- Cross-Origin Validation: CORS policies limit ability to validate external links
- Rate Limiting: Target websites may block rapid validation requests
- Memory Constraints: Processing very large numbers of links may exceed browser limits
Data Limitations
- Link Context: Cannot determine the semantic importance or context of links
- Redirect Chains: Complex redirect chains may not be fully traced
- Temporary Failures: Link validation reflects status at time of check, not permanent accessibility
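Where redirect chains matter, a workflow could trace them in a separate step with an explicit hop limit. The sketch below illustrates the idea only; `traceRedirects` and `RedirectTrace` are hypothetical, and `resolveOnce` is an injected callback assumed to return the redirect target of a URL, or `null` when the URL does not redirect.

```typescript
type ResolveOnce = (url: string) => Promise<string | null>;

interface RedirectTrace {
  chain: string[];   // every URL seen, starting with the original
  complete: boolean; // true only if the final URL was confirmed not to redirect
}

async function traceRedirects(
  startUrl: string,
  resolveOnce: ResolveOnce,
  maxHops: number
): Promise<RedirectTrace> {
  const chain = [startUrl];
  let current = startUrl;
  for (let hop = 0; hop < maxHops; hop++) {
    const next = await resolveOnce(current);
    if (next === null) return { chain, complete: true };
    if (chain.indexOf(next) !== -1) {
      return { chain: chain.concat(next), complete: false }; // redirect loop
    }
    chain.push(next);
    current = next;
  }
  return { chain, complete: false }; // hop limit reached before the chain ended
}
```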
Key Terminology
DOM: Document Object Model - Programming interface for web documents
CORS: Cross-Origin Resource Sharing - Security feature controlling cross-domain requests
CSP: Content Security Policy - Security standard preventing code injection attacks
Browser API: Programming interfaces provided by web browsers for extension functionality
Content Script: JavaScript code that runs in the context of web pages
Web Extraction: Automated extraction of data from websites
Search & Discovery
Section titled “Search & Discovery”Keywords
Section titled “Keywords”- web extraction
- browser automation
- HTTP requests
- DOM manipulation
- content extraction
- web interaction
Common Search Terms
- “scrape”
- “extract”
- “fetch”
- “get”
- “browser”
- “web”
- “html”
- “text”
- “links”
- “images”
- “api”
Primary Use Cases
- data collection
- web automation
- content extraction
- API integration
- browser interaction
- web extraction
Learning Path
Skill Level: Intermediate
Enhanced Cross-References
Workflow Patterns
Related Tutorials
Practical Examples
Related Nodes
Similar Functionality
- GetAllTextFromLink: Use when you need a different approach to similar functionality
- GetHTMLFromLink: Use when you need a different approach to similar functionality
Complementary Nodes
- Filter: Works well together in workflows
- EditFields: Works well together in workflows
- Http-Request: Works well together in workflows
Common Workflow Patterns
- GetLinksFromLink → Filter → GetAllTextFromLink: Common integration pattern
- GetLinksFromLink → EditFields → Http-Request: Common integration pattern
See Also
- Browser Content Extraction
- Web Automation Patterns
- Multi-Node Automation
- Integration Patterns
- Browser Security Guide
Version History
Current Version: 1.2.0
- Added link type categorization (internal, external, email, phone, file)
- Improved validation with status code reporting
- Enhanced metadata extraction including CSS classes and IDs
Previous Versions
- 1.1.0: Added link validation capabilities and filtering options
- 1.0.0: Initial release with basic link URL and anchor text extraction
Additional Resources
Last Updated: October 18, 2024
Tested With: Browser Extension v2.1.0
Validation Status: ✅ Code Examples Tested | ✅ Browser Compatibility Verified | ✅ User Tested