Media Extractor
The Media Extractor node finds and extracts all media content from web pages - videos, audio files, and documents. Think of it as a digital librarian that automatically catalogs every piece of media on a website, giving you direct links and details for downloading or analysis.
This is perfect for content audits, research collection, or building media inventories. Instead of manually searching through pages, the node does all the detective work to find every video, audio file, and document.
How it works
The node scans the entire web page looking for media content - embedded videos, audio players, downloadable documents, and more. It extracts the URLs, file information, and metadata for each piece of media it finds.
graph LR
Page[Web Page] --> Extractor{Media Extractor}
Extractor --> Videos[Videos]
Extractor --> Audio[Audio]
Extractor --> Docs[Documents]
style Extractor fill:#6d28d9,stroke:#fff,color:#fff
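The node's internals are not documented here, so the following is only a minimal sketch of the general idea: walk the page's HTML, pick out media elements and links, and sort them into videos, audio, and documents. It uses Python's standard `html.parser`. The `MediaScanner` class, the `scan` and `classify` helpers, and the extension lists are all hypothetical names chosen for illustration, not part of the node.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Hypothetical extension lists, for illustration only.
VIDEO_EXT = {".mp4", ".webm", ".mov"}
AUDIO_EXT = {".mp3", ".wav", ".ogg"}
DOC_EXT = {".pdf", ".docx", ".pptx"}


def classify(url):
    """Guess a media type from the file extension in the URL path."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if ext in VIDEO_EXT:
        return "video"
    if ext in AUDIO_EXT:
        return "audio"
    if ext in DOC_EXT:
        return "document"
    return None


class MediaScanner(HTMLParser):
    """Collects media URLs from <video>, <audio>, <source>, and <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.items = []
        self._context = None  # "video" or "audio" while inside such an element

    def _add(self, kind, url):
        # Resolve relative links against the page URL.
        self.items.append({"type": kind, "url": urljoin(self.base_url, url)})

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("video", "audio"):
            self._context = tag
            if attrs.get("src"):
                self._add(tag, attrs["src"])
        elif tag == "source" and attrs.get("src"):
            # A <source> inherits its type from the enclosing player, if any.
            kind = self._context or classify(attrs["src"])
            if kind:
                self._add(kind, attrs["src"])
        elif tag == "a" and attrs.get("href"):
            kind = classify(attrs["href"])
            if kind:
                self._add(kind, attrs["href"])

    def handle_endtag(self, tag):
        if tag in ("video", "audio"):
            self._context = None


def scan(html, base_url):
    """Return a list of {"type", "url"} dicts for media found in the HTML."""
    scanner = MediaScanner(base_url)
    scanner.feed(html)
    return scanner.items
```

A sketch like this only sees media present in the static HTML. Content injected by scripts or loaded inside third-party players would need a rendered page, which is one reason the node runs against the live page rather than the raw source.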
Setup guide
- Navigate to Target Page: Make sure you’re on the page containing the media you want to extract.
- Choose Media Types: Select which types of media to extract - videos, audio, documents, or all.
- Set Extraction Limits: Choose how many media items to extract (useful for large pages).
- Run Extraction: The node scans the page and returns all found media with details and download links.
Practical example: Educational content audit
Let’s extract all educational media from a university course page to create an offline study collection.
What you configure:
- Media Types: Choose to include videos, audio files, and documents.
- Max Items: Limit the number of files to find (e.g., 50).
- Embedded: Also look for content inside embedded players (such as YouTube iframes).
What you get:
- Media List: A collection of found items, each with:
  - Type: Video, audio, or document.
  - URL: The link to download or view it.
  - Details: Title, file size, and format (e.g., MP4, PDF).
- Counts: How many of each type were found.
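To make the shape of the result more concrete, here is a small sketch of how the configured options (Media Types and Max Items) could map onto the output (Media List and Counts). The field names, the sample URLs, and the `summarize` helper are assumptions made for illustration; the node's actual output keys may differ.

```python
from collections import Counter

# Hypothetical items as the node might return them; all values are made up.
media_list = [
    {"type": "video", "url": "https://example.edu/lectures/week1.mp4",
     "title": "Week 1 Lecture", "format": "MP4"},
    {"type": "audio", "url": "https://example.edu/podcast/ep1.mp3",
     "title": "Course Podcast 1", "format": "MP3"},
    {"type": "document", "url": "https://example.edu/slides/week1.pdf",
     "title": "Week 1 Slides", "format": "PDF"},
    {"type": "video", "url": "https://example.edu/lectures/week2.mp4",
     "title": "Week 2 Lecture", "format": "MP4"},
]


def summarize(items, media_types, max_items):
    """Keep only the selected types, cap the list, and count items per type."""
    selected = [item for item in items if item["type"] in media_types]
    selected = selected[:max_items]
    counts = Counter(item["type"] for item in selected)
    return {"media": selected, "counts": dict(counts)}


result = summarize(media_list, {"video", "document"}, max_items=50)
for item in result["media"]:
    print(item["type"], item["url"])
print(result["counts"])
```

In this sketch the counts are computed after the Max Items cap is applied, so they describe what was returned rather than everything on the page. Whether the node counts before or after the cap is not stated in this guide.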
Common settings
| Setting | Purpose | When to Use |
|---|---|---|
| Videos | Embedded players, direct video files | For multimedia content analysis |
| Audio | Audio players, podcast files, music | For audio content collection |
| Documents | Downloadable files, presentations | For document libraries and archives |
Troubleshooting
- No media found: The page might still be loading, so try adding a wait step before extraction. The media might also be behind a login wall.
- Missing embedded content: Some media requires user interaction to load, so try scrolling or clicking on the page first.
- Access denied errors: Some media may be protected or require authentication to access.
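If a page loads its media late, the "wait, then extract" advice can be expressed as a simple retry loop. This is a generic sketch, not a feature of the node: `extract_with_retry` and its parameters are hypothetical, and `extract` stands in for whatever step actually runs the extraction.

```python
import time


def extract_with_retry(extract, attempts=3, delay_seconds=1.0):
    """Call extract() until it returns a non-empty list or attempts run out."""
    for attempt in range(attempts):
        items = extract()
        if items:
            return items
        # Pause before trying again, but not after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return []
```

A retry like this only helps with slow-loading pages. Media behind a login or media that needs a click to appear will still come back empty until the page is in the right state.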