Map URLs

The Map URLs node discovers the accessible URLs on a website. It crawls from a starting URL and returns a structured list of the pages it finds, up to the Max URLs limit, which you can then pass to Scrape Page or Extract Data nodes for batch processing.

When to use Map URLs

| Scenario | Why Map URLs |
| --- | --- |
| Scrape an entire site | Discover all product/blog/landing pages first, then scrape each |
| Monitor for new pages | Run daily to detect new URLs that weren't there yesterday |
| Build a URL inventory | Get a complete list of a competitor's public pages |
| Feed a For Each loop | Pass discovered URLs into a For Each → Scrape Page chain |

Configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| Start URL | FX formula | Yes | The base URL to start crawling from (e.g., https://example.com) |
| Discovery method | Select | Yes | How to find URLs (see methods below) |
| Max URLs | Number | No | Maximum number of URLs to return. Default: 100 |
| URL filter | Text | No | Regex pattern to include/exclude URLs (e.g., /blog/.* to only get blog pages) |
| Respect robots.txt | Toggle | Yes | Whether to follow robots.txt directives. Default: Yes |

Discovery methods

| Method | What it does | Speed | Coverage |
| --- | --- | --- | --- |
| Sitemap | Reads the site's sitemap.xml file | Fast | Only pages listed in the sitemap |
| Link crawl | Follows <a href> links from the start URL | Slower | Pages reachable by links from the start URL |
| Both | Combines sitemap + link crawl results | Slowest | Most comprehensive |
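
To make the difference between the three methods concrete, here is a minimal Python sketch of what each one conceptually does. It is not the node's actual implementation. The function names (`urls_from_sitemap`, `urls_from_links`, `combine`) are hypothetical, and the sketch works on already-downloaded text so that it makes no network requests.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(sitemap_xml: str) -> list[str]:
    """Sitemap method: read every <loc> entry from the sitemap XML."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def urls_from_links(page_html: str, base_url: str) -> list[str]:
    """Link crawl method (one page only): resolve each <a href> against base_url."""
    collector = _LinkCollector()
    collector.feed(page_html)
    return [urljoin(base_url, href) for href in collector.hrefs]


def combine(sitemap_urls: list[str], link_urls: list[str]) -> list[str]:
    """Both: merge the two lists, dropping duplicates but keeping order."""
    return list(dict.fromkeys(sitemap_urls + link_urls))
```

A real link crawl repeats the link-extraction step on each newly discovered page, which is why it is slower than reading a single sitemap file.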

Output variables

{{map_urls.urls}}           → array of discovered URL objects
{{map_urls.urls[0].url}}    → the URL string
{{map_urls.urls[0].title}}  → page title (if available from sitemap)
{{map_urls.count}}          → total number of URLs found
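
As an illustration of how these variables fit together, the sketch below models the node's output as a plain Python dictionary. The field names `urls`, `url`, `title`, and `count` come from the variables above. The sample values, the use of `None` for a missing title, and the helper `url_strings` are assumptions made for the example.

```python
# Hypothetical sample of the Map URLs output, shaped after the variables above.
map_urls = {
    "urls": [
        {"url": "https://example.com/", "title": "Home"},
        # Assumption: a page without a known title carries no usable title value.
        {"url": "https://example.com/blog/first-post", "title": None},
    ],
    "count": 2,
}


def url_strings(output: dict) -> list[str]:
    """Return just the URL strings, e.g. to feed a For Each loop."""
    return [item["url"] for item in output["urls"]]


first_url = map_urls["urls"][0]["url"]             # like {{map_urls.urls[0].url}}
first_title = map_urls["urls"][0].get("title")     # like {{map_urls.urls[0].title}}
total = map_urls["count"]                          # like {{map_urls.count}}
```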

Common patterns

Scrape all blog posts

Map URLs (sitemap, filter: /blog/) → For Each (url in urls) →
  Scrape Page (url) → Transformer (extract data) → Google Sheets (append)

Monitor for new pages

Schedule (daily) → Map URLs (sitemap) → Transformer (compare with yesterday's list) →
  If-Else (new URLs found?) → Slack (alert: "3 new pages discovered")
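
The comparison step in this pattern amounts to a set difference between today's list and yesterday's. Below is a minimal Python sketch of that logic, assuming both lists are available as plain URL strings. The function names `find_new_urls` and `alert_message` are hypothetical, and how yesterday's list is stored between runs is left out.

```python
from typing import Optional


def find_new_urls(today: list[str], yesterday: list[str]) -> list[str]:
    """Return URLs present today that were not in yesterday's list."""
    seen = set(yesterday)
    return [url for url in today if url not in seen]


def alert_message(new_urls: list[str]) -> Optional[str]:
    """Build the alert text, or return None when there is nothing new."""
    count = len(new_urls)
    if count == 0:
        return None
    noun = "page" if count == 1 else "pages"
    return f"{count} new {noun} discovered"
```

Returning `None` when nothing changed maps naturally onto the If-Else branch: only a non-empty result needs to reach the Slack node.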

Competitive analysis

Manual Trigger → Map URLs (competitor site, max 500) → For Each →
  Scrape Page → TinyGPT (analyze content strategy) → Google Sheets (log findings)

URL filtering

The URL filter field accepts regex patterns:

| Pattern | Matches |
| --- | --- |
| /blog/.* | Only blog pages |
| /products/.* | Only product pages |
| .*\.pdf$ | Only PDF files |
| ^(?!.*/tag/).*$ | Everything except tag pages |
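
To try a pattern before putting it in the URL filter field, you can test it locally. The Python sketch below assumes the filter is applied as a search anywhere in the full URL (rather than requiring the whole URL to match) and that the node's regex dialect behaves like Python's `re` module for these patterns. Neither is confirmed here, and the `filter_urls` helper and sample URLs are hypothetical.

```python
import re


def filter_urls(urls: list[str], pattern: str) -> list[str]:
    """Keep only the URLs in which the pattern is found (search semantics assumed)."""
    regex = re.compile(pattern)
    return [url for url in urls if regex.search(url)]


sample_urls = [
    "https://example.com/blog/post-1",
    "https://example.com/products/widget",
    "https://example.com/files/guide.pdf",
    "https://example.com/blog/tag/news",
]

blog_pages = filter_urls(sample_urls, r"/blog/.*")
pdf_files = filter_urls(sample_urls, r".*\.pdf$")
without_tags = filter_urls(sample_urls, r"^(?!.*/tag/).*$")
```

Note that `/blog/.*` also keeps `/blog/tag/news`. To drop tag pages, use the negative lookahead pattern.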

Warning

Map URLs makes real HTTP requests to the target site. Respect rate limits and robots.txt. Crawling too aggressively can get your IP blocked. Start with small Max URLs values during testing.
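
For a sense of what respecting robots.txt means in practice, the sketch below uses Python's standard `urllib.robotparser` to check which URLs a given set of rules permits. It illustrates the general mechanism and is not the node's own logic. The sample rules, the `allowed_urls` helper, and the user agent string `ExampleBot` are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from text so no request is made.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def allowed_urls(urls: list[str], robots_txt: str, user_agent: str = "ExampleBot") -> list[str]:
    """Return only the URLs that the robots.txt rules allow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


candidates = [
    "https://example.com/blog/post-1",
    "https://example.com/admin/login",
]
crawlable = allowed_urls(candidates, ROBOTS_TXT)
```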

Note

Map URLs costs 2 credits per execution. The credit cost is the same regardless of how many URLs are discovered.