Crawl Site

Crawls multiple pages starting from a URL. Follows links within the domain, respects path filters, and returns content from each page.

Type: TINY_CRAWL Color: Orange (#F97316) Credits: 2 per run Tabs: Initialise → Configure → Test

Templates

Template	Settings	Use case
Crawl Blog	Include: `/blog/*`	Index all blog posts
Crawl Documentation	Include: `/docs/*`	Index documentation pages

Field	Type	Required	Description
URL	FX formula	Yes	Starting URL (domain root or specific path)
Max pages	FX formula	Yes	Maximum pages to crawl (default: 10)
Include paths	String	No	Only crawl URLs matching this pattern (e.g., `/blog/*`)
Exclude paths	String	No	Skip URLs matching this pattern (e.g., `/admin/*`)

Array of pages, each with URL and content.

Crawl Site (blog.example.com, include: /blog/*, max: 50) →
  For Each (pages) → TinyGPT (summarize each) → Create Record (store in knowledge base)

Crawl Site (competitor.com/pricing, max: 5) →
  TinyGPT (extract pricing structure) → Create Record

Warning

Large crawls (100+ pages) take time and consume credits. Start with a small max pages to test, then increase.