Crawl Site

Crawls multiple pages starting from a URL. Follows links within the domain, respects path filters, and returns content from each page.

Type: TINY_CRAWL Color: Orange (#F97316) Credits: 2 per run Tabs: Initialise → Configure → Test

Templates

TemplateSettingsUse case
Crawl BlogInclude: /blog/*Index all blog posts
Crawl DocumentationInclude: /docs/*Index documentation pages

Configure tab fields

FieldTypeRequiredDescription
URLFX formulaYesStarting URL (domain root or specific path)
Max pagesFX formulaYesMaximum pages to crawl (default: 10)
Include pathsStringNoOnly crawl URLs matching this pattern (e.g., /blog/*)
Exclude pathsStringNoSkip URLs matching this pattern (e.g., /admin/*)

How it works

  1. Starts at the given URL
  2. Extracts all links on the page
  3. Follows links that match the include pattern (if set) and don't match exclude
  4. Continues until max pages is reached or no more links to follow
  5. Returns content from all crawled pages

Output

Array of pages, each with URL and content.

Common patterns

Index a blog

Crawl Site (blog.example.com, include: /blog/*, max: 50) →
  For Each (pages) → TinyGPT (summarize each) → Create Record (store in knowledge base)

Competitive analysis

Crawl Site (competitor.com/pricing, max: 5) →
  TinyGPT (extract pricing structure) → Create Record
Warning

Large crawls (100+ pages) take time and consume credits. Start with a small max pages to test, then increase.