Media Extractor

What it does: Finds and extracts all media content (videos, audio, documents) from webpages, giving you URLs and details for downloading or analysis.

Inputs:

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| Include Videos | Boolean | Extract video files and embedded players | No | true |
| Include Audio | Boolean | Extract audio files and players | No | true |
| Include Documents | Boolean | Extract PDFs and document links | No | true |
| Max Items | Number | Maximum number of media items to extract | No | 100 |

Outputs:

| Name | Type | Description |
| --- | --- | --- |
| mediaItems | Array | List of all found media with URLs and details |
| videoCount | Number | Number of videos found |
| audioCount | Number | Number of audio files found |
| documentCount | Number | Number of documents found |
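
To make the relationship between mediaItems and the three count outputs concrete, here is a minimal Python sketch. It assumes each entry in mediaItems is an object with a `type` field set to "video", "audio", or "document". The field name and the `summarize` helper are hypothetical, since the item structure is not specified here.

```python
from collections import Counter


def summarize(media_items):
    """Recompute the three count outputs from a list of media items.

    Assumption: each item is a dict with a hypothetical "type" key.
    """
    counts = Counter(item.get("type") for item in media_items)
    return {
        "videoCount": counts["video"],
        "audioCount": counts["audio"],
        "documentCount": counts["document"],
    }


# Made-up sample data for illustration only.
sample = [
    {"type": "video", "url": "https://example.com/intro.mp4"},
    {"type": "document", "url": "https://example.com/guide.pdf"},
    {"type": "video", "url": "https://example.com/demo.webm"},
]
print(summarize(sample))
```

A check like this can help confirm that the counts and the item list agree before you pass results to later steps.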

Content Audit: Extract all media from your website to check for broken links, missing files, or accessibility issues.

Research Collection: Gather videos, podcasts, and documents from educational or research websites for offline study.

Media Inventory: Catalog all media assets on a website for content management or migration projects.

  1. Navigate to the webpage with media you want to extract
  2. Configure extraction options - choose what types of media to include
  3. Run the workflow - the node finds all media on the page
  4. Process the results for downloading, analysis, or cataloging
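
Step 4 could look like the following Python sketch, which turns a list of extracted items into a CSV catalog. The item fields (`type`, `url`, `title`) and the `catalog_to_csv` helper are assumptions made for illustration; adjust them to whatever the node actually returns in mediaItems.

```python
import csv
import io


def catalog_to_csv(media_items, fields=("type", "url", "title")):
    """Return CSV text with one row per media item.

    Missing fields are written as empty cells. The field names are
    hypothetical and not taken from the node's documentation.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), lineterminator="\n")
    writer.writeheader()
    for item in media_items:
        writer.writerow({name: item.get(name, "") for name in fields})
    return buffer.getvalue()


# Made-up sample data for illustration only.
items = [
    {"type": "video", "url": "https://example.com/intro.mp4", "title": "Intro"},
    {"type": "document", "url": "https://example.com/guide.pdf"},
]
print(catalog_to_csv(items))
```

Writing to a string first keeps the sketch easy to test; in practice you would likely write the text to a file or pass it to the next workflow step.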

Simple Example:

{
  "includeVideos": true,
  "includeAudio": true,
  "includeDocuments": true,
  "maxItems": 50
}
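
If you generate configurations from a script, a small helper can fill in the documented defaults and catch typos in option names. The following Python sketch uses the option keys from the example above and the defaults from the inputs table. The `build_config` helper and its validation rules are assumptions, not part of the node itself.

```python
import json

# Defaults as listed in the inputs table.
DEFAULTS = {
    "includeVideos": True,
    "includeAudio": True,
    "includeDocuments": True,
    "maxItems": 100,
}


def build_config(**overrides):
    """Merge overrides into the defaults and run basic sanity checks."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")
    config = {**DEFAULTS, **overrides}
    for key in ("includeVideos", "includeAudio", "includeDocuments"):
        if not isinstance(config[key], bool):
            raise TypeError(f"{key} must be true or false")
    max_items = config["maxItems"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be a non-negative whole number")
    return config


print(json.dumps(build_config(maxItems=50), indent=2))
```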

🔍 Technical Details

What it finds:

  • Embedded videos (YouTube, Vimeo, etc.)
  • Direct video files (MP4, WebM, etc.)
  • Audio files and players (MP3, WAV, etc.)
  • Documents (PDF, DOC, PPT, etc.)
  • Streaming media and live content
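
To illustrate how a URL might be sorted into these categories, here is a rough Python sketch based on file extensions and embed hosts. This is not the node's actual detection logic, which is not documented here. The extension sets, the host list, and the `classify` helper are assumptions built from the examples in the list above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Illustrative lists only; real detection likely covers more formats.
VIDEO_EXTENSIONS = {".mp4", ".webm"}
AUDIO_EXTENSIONS = {".mp3", ".wav"}
DOCUMENT_EXTENSIONS = {".pdf", ".doc", ".ppt"}
EMBED_HOSTS = {"youtube.com", "vimeo.com"}


def classify(url):
    """Return "video", "audio", "document", or None for a URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Match the host itself or any subdomain, e.g. www.youtube.com.
    if any(host == h or host.endswith("." + h) for h in EMBED_HOSTS):
        return "video"
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in DOCUMENT_EXTENSIONS:
        return "document"
    return None


for link in (
    "https://www.youtube.com/embed/abc123",
    "https://cdn.example.com/episode1.mp3",
    "https://example.com/files/report.PDF?download=1",
    "https://example.com/about",
):
    print(link, "->", classify(link))
```

Extension matching alone misses streaming and dynamically loaded media, which is one reason those cases need additional analysis.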

Media Information:

  • Direct download URLs
  • File formats and sizes
  • Duration (for videos and audio)
  • Titles and descriptions
  • Thumbnail images
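
Because not every detail is available for every item, it helps to treat all fields except the URL as optional when you consume the results. The Python sketch below models one media item that way. The `MediaItem` class and the raw key names (`url`, `format`, `size`, `duration`, `title`, `description`, `thumbnail`) are hypothetical and chosen only to mirror the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaItem:
    """One extracted media item; everything except the URL may be absent."""

    url: str
    format: Optional[str] = None
    size_bytes: Optional[int] = None
    duration_seconds: Optional[float] = None
    title: Optional[str] = None
    description: Optional[str] = None
    thumbnail_url: Optional[str] = None

    @classmethod
    def from_dict(cls, raw):
        # Key names here are assumptions, not the node's documented schema.
        return cls(
            url=raw["url"],
            format=raw.get("format"),
            size_bytes=raw.get("size"),
            duration_seconds=raw.get("duration"),
            title=raw.get("title"),
            description=raw.get("description"),
            thumbnail_url=raw.get("thumbnail"),
        )


item = MediaItem.from_dict({"url": "https://example.com/talk.mp4", "duration": 93.5})
print(item)
```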

Performance:

  • Handles up to 100 media items (the default Max Items value) efficiently
  • Larger collections take longer to process
  • Streaming media requires additional analysis time

Limitations:

  • Cannot access media that requires login
  • Some streaming services may block extraction
  • File sizes are estimated when the exact size is not directly available

Complete Media Audit:

{
  "includeVideos": true,
  "includeAudio": true,
  "includeDocuments": true,
  "maxItems": 0
}

Video Content Only:

{
  "includeVideos": true,
  "includeAudio": false,
  "includeDocuments": false,
  "maxItems": 25
}

Document Collection:

{
  "includeVideos": false,
  "includeAudio": false,
  "includeDocuments": true,
  "maxItems": 50
}

Common Issues:

  • No media found? The page may still be loading. Wait for it to finish before running the extraction.
  • Missing embedded content? Some media loads only after user interaction, such as clicking a play button.
  • Access denied errors? The media may be protected or require authentication.
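
For the "no media found" case, one option is to wait and retry a few times before giving up. The Python sketch below shows the idea. `run_extraction` stands in for however you trigger the node and is a hypothetical callable that returns a result with a mediaItems list; the retry count and wait time are arbitrary defaults.

```python
import time


def extract_with_retry(run_extraction, attempts=3, wait_seconds=2.0, sleep=time.sleep):
    """Call run_extraction until it returns media items or attempts run out.

    run_extraction is a hypothetical callable supplied by the caller.
    The sleep parameter exists so tests can avoid real delays.
    """
    result = {"mediaItems": []}
    for attempt in range(1, attempts + 1):
        result = run_extraction()
        if result.get("mediaItems"):
            return result
        if attempt < attempts:
            sleep(wait_seconds)  # give the page more time to load
    return result


# Simulated extraction: empty on the first call, one item on the second.
fake_results = iter([
    {"mediaItems": []},
    {"mediaItems": [{"url": "https://example.com/intro.mp4"}]},
])
outcome = extract_with_retry(lambda: next(fake_results), wait_seconds=0)
print(len(outcome["mediaItems"]), "item(s) found")
```

Retrying does not help with media behind a login or media that needs user interaction; those cases need a different approach.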