Map URLs
The Map URLs node discovers the accessible URLs on a website. Starting from a URL you provide, it reads the site's sitemap, follows links, or does both, and returns a structured list of the pages found (up to the Max URLs limit). You can then pass that list to Scrape Page or Extract Data nodes for batch processing.
When to use Map URLs
| Scenario | Why Map URLs |
|---|---|
| Scrape an entire site | Discover all product/blog/landing pages first, then scrape each |
| Monitor for new pages | Run daily to detect new URLs that weren't there yesterday |
| Build a URL inventory | Get a complete list of a competitor's public pages |
| Feed a For Each loop | Pass discovered URLs into a For Each → Scrape Page chain |
Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| Start URL | FX formula | Yes | The base URL to start crawling from (e.g., https://example.com) |
| Discovery method | Select | Yes | How to find URLs (see methods below) |
| Max URLs | Number | No | Maximum number of URLs to return. Default: 100 |
| URL filter | Text | No | Regex pattern to include/exclude URLs (e.g., /blog/.* to only get blog pages) |
| Respect robots.txt | Toggle | Yes | Whether to follow robots.txt directives. Default: Yes |
Discovery methods
| Method | What it does | Speed | Coverage |
|---|---|---|---|
| Sitemap | Reads the site's sitemap.xml file | Fast | Only pages in the sitemap |
| Link crawl | Follows <a href> links from the start URL | Slower | Pages reachable through links from the start URL |
| Both | Combines sitemap + link crawl results | Slowest | Most comprehensive |
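To make the three methods concrete, here is a minimal Python sketch of what sitemap reading, link extraction, and combining the two could look like. It is not the node's actual implementation: the sample sitemap and HTML strings, the function names, and the de-duplication strategy are all assumptions for illustration, and no HTTP requests are made.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample data standing in for real HTTP responses.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/hello</loc></url>
</urlset>"""

SAMPLE_HTML = '<a href="/blog/hello">Hello</a> <a href="/pricing">Pricing</a>'


def urls_from_sitemap(xml_text):
    """Sitemap method: read every <loc> entry from the sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href> tags on a single page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def urls_from_links(html_text, base_url):
    """Link crawl method (one page only): extract the links on a page."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


def combine(*url_lists):
    """Both: merge results, dropping duplicates while keeping first-seen order."""
    seen = {}
    for urls in url_lists:
        for url in urls:
            seen.setdefault(url, None)
    return list(seen)


sitemap_urls = urls_from_sitemap(SAMPLE_SITEMAP)
link_urls = urls_from_links(SAMPLE_HTML, "https://example.com/")
both = combine(sitemap_urls, link_urls)
print(both)
```

In this sample, /pricing appears only in the page links and not in the sitemap, which is the kind of gap the Both method is meant to close.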
Output variables
{{map_urls.urls}} → array of discovered URL objects
{{map_urls.urls[0].url}} → the URL string
{{map_urls.urls[0].title}} → page title (if available from sitemap)
{{map_urls.count}} → total number of URLs found
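As a rough picture of the data behind these variables, the Python sketch below models the output as a dictionary. The exact payload is an assumption based only on the variables listed above; in particular, how a missing title is represented (here `None`) is a guess, and the helper names are hypothetical.

```python
# Hypothetical payload mirroring the output variables listed above.
map_urls = {
    "urls": [
        {"url": "https://example.com/", "title": "Home"},
        {"url": "https://example.com/blog/hello", "title": None},
    ],
    "count": 2,
}


def url_strings(result):
    """Return just the URL strings, e.g. to feed a For Each loop."""
    return [entry["url"] for entry in result["urls"]]


def display_title(entry):
    """Fall back to the URL when no title is available."""
    return entry.get("title") or entry["url"]


print(url_strings(map_urls))
print([display_title(entry) for entry in map_urls["urls"]])
```

Because the title is only present when it is available, downstream steps are safer if they treat it as optional, as `display_title` does here.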
Common patterns
Scrape all blog posts
Map URLs (sitemap, filter: /blog/) → For Each (url in urls) →
Scrape Page (url) → Transformer (extract data) → Google Sheets (append)
Monitor for new pages
Schedule (daily) → Map URLs (sitemap) → Transformer (compare with yesterday's list) →
If-Else (new URLs found?) → Slack (alert: "3 new pages discovered")
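The comparison step in this pattern boils down to a set difference. The Python sketch below shows one way the logic could work; how yesterday's list is stored and how the Transformer node expresses this are not covered here, and the function names are hypothetical.

```python
def find_new_urls(today, yesterday):
    """Return URLs present today but not yesterday, in today's order."""
    known = set(yesterday)
    new_urls = []
    for url in today:
        if url not in known:
            new_urls.append(url)
            known.add(url)  # also guards against duplicates in today's list
    return new_urls


def alert_message(new_urls):
    """Build the alert text, or None when there is nothing to report."""
    count = len(new_urls)
    if count == 0:
        return None
    noun = "page" if count == 1 else "pages"
    return f"{count} new {noun} discovered"


yesterday = ["https://example.com/", "https://example.com/blog/a"]
today = [
    "https://example.com/",
    "https://example.com/blog/a",
    "https://example.com/blog/b",
    "https://example.com/blog/c",
    "https://example.com/pricing",
]

new_urls = find_new_urls(today, yesterday)
print(alert_message(new_urls))  # 3 new pages discovered
```

Returning `None` when nothing changed maps naturally onto the If-Else branch: only a non-empty result needs to reach Slack.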
Competitive analysis
Manual Trigger → Map URLs (competitor site, max 500) → For Each →
Scrape Page → TinyGPT (analyze content strategy) → Google Sheets (log findings)
URL filtering
The URL filter field accepts regex patterns:
| Pattern | Matches |
|---|---|
| /blog/.* | Only blog pages |
| /products/.* | Only product pages |
| .*\.pdf$ | Only PDF files |
| ^(?!.*/tag/).*$ | Everything except tag pages |
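To check a pattern before putting it in the URL filter field, you can try it against a few sample URLs locally. The Python sketch below uses the four patterns from the table, with the PDF pattern written with a single escaped dot. It assumes the filter keeps any URL where the pattern matches somewhere in the string and that the node's regex engine behaves like Python's `re` module; neither is confirmed here, and the sample URLs are made up.

```python
import re

# The four patterns from the table above.
PATTERNS = {
    "blog": r"/blog/.*",
    "products": r"/products/.*",
    "pdf": r".*\.pdf$",
    "no_tags": r"^(?!.*/tag/).*$",
}

# Hypothetical sample URLs for testing.
URLS = [
    "https://example.com/blog/hello",
    "https://example.com/products/widget",
    "https://example.com/files/guide.pdf",
    "https://example.com/tag/news",
]


def apply_filter(pattern, urls):
    """Keep URLs where the pattern matches anywhere in the string (assumed semantics)."""
    compiled = re.compile(pattern)
    return [url for url in urls if compiled.search(url)]


for name, pattern in PATTERNS.items():
    print(name, apply_filter(pattern, URLS))
```

The last pattern relies on a negative lookahead, `(?!...)`, which not every regex engine supports, so it is worth testing that one in the node itself with a small Max URLs value.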
Warning
Map URLs makes real HTTP requests to the target site. Respect rate limits and robots.txt. Crawling too aggressively can get your IP blocked. Start with small Max URLs values during testing.
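For a sense of what respecting robots.txt means in practice, the Python sketch below parses a sample robots.txt with the standard library's `urllib.robotparser` and checks two URLs against it. The robots.txt content and the `my-bot` user agent are invented for illustration, and this is not a description of how the node evaluates the directives internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one lives at https://<site>/robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 2
"""


def build_parser(robots_text):
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)

blog_allowed = parser.can_fetch("my-bot", "https://example.com/blog/hello")
admin_allowed = parser.can_fetch("my-bot", "https://example.com/admin/users")
delay_seconds = parser.crawl_delay("my-bot")

print(blog_allowed, admin_allowed, delay_seconds)  # True False 2
```

A `Crawl-delay` value like the one above is a hint about how quickly the site is willing to be crawled, which is another reason to keep Max URLs small while testing.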
Note
Map URLs costs 2 credits per execution. The credit cost is the same regardless of how many URLs are discovered.