Map URLs

The Map URLs node discovers the accessible URLs on a website. Starting from a URL you provide, it reads the sitemap, follows links, or both, and returns a structured list of the pages it finds, up to the Max URLs limit. You can then pass that list to Scrape Page or Extract Data nodes for batch processing.

When to use Map URLs

Scenario	Why Map URLs
Scrape an entire site	Discover all product/blog/landing pages first, then scrape each
Monitor for new pages	Run daily to detect new URLs that weren't there yesterday
Build a URL inventory	Get a complete list of a competitor's public pages
Feed a For Each loop	Pass discovered URLs into a For Each → Scrape Page chain

Configuration

Field	Type	Required	Description
Start URL	FX formula	Yes	The base URL to start crawling from (e.g., `https://example.com`)
Discovery method	Select	Yes	How to find URLs (see methods below)
Max URLs	Number	No	Maximum number of URLs to return. Default: 100
URL filter	Text	No	Regex pattern to include/exclude URLs (e.g., `/blog/.*` to only get blog pages)
Respect robots.txt	Toggle	Yes	Whether to follow robots.txt directives. Default: Yes

Discovery methods

Method	What it does	Speed	Coverage
Sitemap	Reads the site's `sitemap.xml` file	Fast	Only pages in the sitemap
Link crawl	Follows `<a href>` links from the start URL	Slower	Pages linked from the start URL
Both	Combines sitemap + link crawl results	Slowest	Most comprehensive
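To make the difference between the three methods concrete, here is a minimal Python sketch of what each one does conceptually. It is not the node's implementation: the function names, the sample data, and the choice to keep only same-host links are assumptions made for illustration, and it works on in-memory strings rather than fetching anything.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml: str) -> list[str]:
    """Sitemap method: read every <loc> entry from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def urls_from_links(html: str, page_url: str) -> list[str]:
    """Link crawl method: resolve <a href> links found on one page.

    Assumption: only links on the same host as page_url are kept.
    """
    collector = _LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found: list[str] = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


def combine(sitemap_urls: list[str], link_urls: list[str]) -> list[str]:
    """Both: merge the two lists, dropping duplicates and keeping order."""
    return list(dict.fromkeys(sitemap_urls + link_urls))


# Hypothetical sample inputs.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/hello</loc></url>
</urlset>"""

SAMPLE_HTML = (
    '<a href="/blog/hello">Hello</a> '
    '<a href="/pricing">Pricing</a> '
    '<a href="https://other.example.org/">External</a>'
)

from_sitemap = urls_from_sitemap(SAMPLE_SITEMAP)
from_links = urls_from_links(SAMPLE_HTML, "https://example.com/")
merged = combine(from_sitemap, from_links)
```

In this sample, `/pricing` appears only in the link crawl and the homepage appears only in the sitemap, which is why Both gives the widest coverage at the cost of doing both passes.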

Output variables

{{map_urls.urls}}           → array of discovered URL objects
{{map_urls.urls[0].url}}    → the URL string
{{map_urls.urls[0].title}}  → page title (if available from sitemap)
{{map_urls.count}}          → total number of URLs found
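As a rough mental model, the output can be pictured as the Python structure below. The field names come from the variables above; the sample values, and the use of `None` for a missing title, are assumptions for illustration only.

```python
# Hypothetical example of the node's output, modelled as a Python dict.
map_urls = {
    "urls": [
        {"url": "https://example.com/", "title": "Home"},
        # Assumption: a missing title is represented here as None.
        {"url": "https://example.com/blog/hello", "title": None},
    ],
    "count": 2,
}

# Equivalent of {{map_urls.urls[0].url}}
first_url = map_urls["urls"][0]["url"]

# What a For Each loop would iterate over: one URL string per entry.
all_urls = [entry["url"] for entry in map_urls["urls"]]
```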

Common patterns

Scrape all blog posts

Map URLs (sitemap, filter: /blog/) → For Each (url in urls) →
  Scrape Page (url) → Transformer (extract data) → Google Sheets (append)

Monitor for new pages

Schedule (daily) → Map URLs (sitemap) → Transformer (compare with yesterday's list) →
  If-Else (new URLs found?) → Slack (alert: "3 new pages discovered")

Competitive analysis

Manual Trigger → Map URLs (competitor site, max 500) → For Each →
  Scrape Page → TinyGPT (analyze content strategy) → Google Sheets (log findings)

URL filtering

The URL filter field accepts regex patterns:

Pattern	Matches
`/blog/.*`	Only blog pages
`/products/.*`	Only product pages
`.*\.pdf$`	Only PDF files
`^(?!.*/tag/).*$`	Everything except tag pages
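Before running the node, it can help to try a pattern against a few sample URLs. The Python sketch below does that with the standard `re` module. It assumes the filter is applied as a search against the full URL string and that the node's regex flavor behaves like Python's for these patterns; neither is confirmed here, so treat the results as a sanity check. The tag-exclusion pattern is written in its full negative-lookahead form, `^(?!.*/tag/).*$`.

```python
import re

# Patterns for the four cases in the table above.
PATTERNS = {
    "blog": r"/blog/.*",
    "products": r"/products/.*",
    "pdf": r".*\.pdf$",
    "not_tag": r"^(?!.*/tag/).*$",
}

# Hypothetical sample URLs.
URLS = [
    "https://example.com/blog/hello",
    "https://example.com/products/widget",
    "https://example.com/files/guide.pdf",
    "https://example.com/tag/news",
]


def filter_urls(urls: list[str], pattern: str) -> list[str]:
    """Keep the URLs where the pattern matches somewhere in the string."""
    compiled = re.compile(pattern)
    return [url for url in urls if compiled.search(url)]


results = {name: filter_urls(URLS, pattern) for name, pattern in PATTERNS.items()}
```

Note that the backslash in `\.pdf` matters: without it, the dot matches any character, so a URL ending in `xpdf` would also pass.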

Warning

Map URLs makes real HTTP requests to the target site. Respect rate limits and robots.txt. Crawling too aggressively can get your IP blocked. Start with small Max URLs values during testing.
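If you want to check in advance what a site's robots.txt allows, Python's standard `urllib.robotparser` module can evaluate the rules offline. The sketch below is only an illustration: the robots.txt content and the `my-crawler` user agent are made up, and it does not show how the node itself interprets these directives.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "my-crawler" is a placeholder user agent name.
blog_allowed = parser.can_fetch("my-crawler", "https://example.com/blog/hello")
admin_allowed = parser.can_fetch("my-crawler", "https://example.com/admin/users")

# Seconds the site asks crawlers to wait between requests, if declared.
delay = parser.crawl_delay("my-crawler")
```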

Note

Map URLs costs 2 credits per execution. The credit cost is the same regardless of how many URLs are discovered.