Media Extractor

The Media Extractor node finds and extracts all media content from web pages - videos, audio files, and documents. Think of it as a digital librarian that automatically catalogs every piece of media on a website, giving you direct links and details for downloading or analysis.

This is perfect for content audits, research collection, or building media inventories. Instead of manually searching through pages, the node does all the detective work to find every video, audio file, and document.

Illustration of extracting various media files from a webpage

The node scans the entire web page looking for media content - embedded videos, audio players, downloadable documents, and more. It extracts the URLs, file information, and metadata for each piece of media it finds.
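The scan described above can be approximated with Python's standard-library HTML parser. This is a minimal sketch under stated assumptions, not the node's actual implementation: the extension lists, the `MediaScanner` class, the `scan` helper, and the choice to treat every `<iframe>` as an embedded video are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Hypothetical extension lists; the node's real detection rules may differ.
VIDEO_EXT = {".mp4", ".webm", ".mov"}
AUDIO_EXT = {".mp3", ".wav", ".ogg"}
DOC_EXT = {".pdf", ".docx", ".pptx", ".xlsx"}


def classify(url):
    """Guess a media type from the file extension, or return None."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if ext in VIDEO_EXT:
        return "video"
    if ext in AUDIO_EXT:
        return "audio"
    if ext in DOC_EXT:
        return "document"
    return None


class MediaScanner(HTMLParser):
    """Collects media URLs from <video>, <audio>, <source>, <iframe>, and <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.items = []
        self._parent = None  # enclosing <video> or <audio> tag, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("video", "audio"):
            self._parent = tag
            if attrs.get("src"):
                self._add(tag, attrs["src"])
        elif tag == "source" and self._parent and attrs.get("src"):
            self._add(self._parent, attrs["src"])
        elif tag == "iframe" and attrs.get("src"):
            # Assumption: any iframe is treated as an embedded video player.
            self._add("video", attrs["src"], embedded=True)
        elif tag == "a" and attrs.get("href"):
            kind = classify(attrs["href"])
            if kind:
                self._add(kind, attrs["href"])

    def handle_endtag(self, tag):
        if tag in ("video", "audio"):
            self._parent = None

    def _add(self, kind, url, embedded=False):
        # Resolve relative links against the page URL.
        self.items.append(
            {"type": kind, "url": urljoin(self.base_url, url), "embedded": embedded}
        )


def scan(html, base_url):
    """Return a list of media items found in the given HTML string."""
    scanner = MediaScanner(base_url)
    scanner.feed(html)
    return scanner.items
```

Calling `scan(page_html, "https://example.edu/course/")` returns one dictionary per media item with its type, absolute URL, and an `embedded` flag. A static parse like this misses media that scripts inject after load, which is one reason a wait step can matter.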

```mermaid
graph LR
  Page[Web Page] --> Extractor{Media Extractor}
  Extractor --> Videos[Videos]
  Extractor --> Audio[Audio]
  Extractor --> Docs[Documents]
  style Extractor fill:#6d28d9,stroke:#fff,color:#fff
```

  1. Navigate to Target Page: Make sure you’re on the page containing the media you want to extract.
  2. Choose Media Types: Select which types of media to extract - videos, audio, documents, or all.
  3. Set Extraction Limits: Choose how many media items to extract (useful for large pages).
  4. Run Extraction: The node scans the page and returns all found media with details and download links.

Practical example: Educational content audit

Let’s extract all educational media from a university course page to create an offline study collection.

What you configure:

  • Media Types: Choose to include videos, audio files, and documents.
  • Max Items: Limit the number of files to find (e.g., 50).
  • Embedded: Also look for content inside embedded players (such as YouTube iframes).
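To make the three settings concrete, here is a minimal sketch of how they might be applied to a list of found items. The `ExtractorConfig` class, its field names, and the item dictionary shape (`type`, `url`, `embedded`) are assumptions made for illustration; the node's real configuration format is not shown in this page.

```python
from dataclasses import dataclass


@dataclass
class ExtractorConfig:
    # Hypothetical field names that mirror the settings listed above.
    media_types: tuple = ("video", "audio", "document")
    max_items: int = 50
    include_embedded: bool = True


def apply_config(items, config):
    """Filter items by media type and embedded flag, then cap the count."""
    kept = [
        item
        for item in items
        if item["type"] in config.media_types
        and (config.include_embedded or not item.get("embedded", False))
    ]
    return kept[: config.max_items]
```

For the course page example, `ExtractorConfig(max_items=50)` keeps all three media types, includes embedded players, and returns at most 50 items.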

What you get:

  • Media List: A collection of found items, each with:
    • Type: Video, audio, or document.
    • URL: The link to download or view it.
    • Details: Title, file size, and format (e.g., MP4, PDF).
  • Counts: How many of each type were found.
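The output described above could be modeled roughly as follows. This is a sketch of one plausible shape, not the node's documented schema: the `MediaItem` class, its field names, and the `summarize` helper are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaItem:
    # Hypothetical fields mirroring Type, URL, and Details from the list above.
    type: str                         # "video", "audio", or "document"
    url: str                          # link to download or view the item
    title: Optional[str] = None
    size_bytes: Optional[int] = None
    format: Optional[str] = None      # e.g. "MP4" or "PDF"


def summarize(items):
    """Bundle the media list with a per-type count."""
    counts = Counter(item.type for item in items)
    return {"media": items, "counts": dict(counts)}
```

Details such as title and file size are optional in this sketch because a page does not always expose them.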

| Setting | Purpose | When to Use |
| --- | --- | --- |
| Videos | Embedded players, direct video files | For multimedia content analysis |
| Audio | Audio players, podcast files, music | For audio content collection |
| Documents | Downloadable files, presentations | For document libraries and archives |

  • No media found: The page may still be loading, so try adding a wait step before extraction. The media may also sit behind a login wall.
  • Missing embedded content: Some media loads only after user interaction. Try scrolling or clicking on the page first.
  • Access denied errors: Some media is protected or requires authentication to access.
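The "wait, then try again" advice for pages that are still loading can be sketched as a small retry loop. This is an illustrative pattern only: `extract_with_retry` and its parameters are hypothetical, and `extract` stands in for whatever actually runs the extraction step in your workflow.

```python
import time


def extract_with_retry(extract, attempts=3, wait_seconds=2.0, sleep=time.sleep):
    """Call extract() until it returns items or the attempts run out."""
    for attempt in range(attempts):
        items = extract()
        if items:
            return items
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts - 1:
            sleep(wait_seconds)
    return []
```

The `sleep` parameter is injectable so the loop can be exercised without real delays. A retry does not help with media behind a login wall or other access restrictions, which need authentication instead.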