Crawl Site
Crawls multiple pages starting from a URL. Follows links within the domain, respects path filters, and returns content from each page.
Type: TINY_CRAWL
Color: Orange (#F97316)
Credits: 2 per run
Tabs: Initialise → Configure → Test
Templates
| Template | Settings | Use case |
|---|---|---|
| Crawl Blog | Include: /blog/* | Index all blog posts |
| Crawl Documentation | Include: /docs/* | Index documentation pages |
Configure tab fields
| Field | Type | Required | Description |
|---|---|---|---|
| URL | FX formula | Yes | Starting URL (domain root or specific path) |
| Max pages | FX formula | Yes | Maximum pages to crawl (default: 10) |
| Include paths | String | No | Only crawl URLs matching this pattern (e.g., /blog/*) |
| Exclude paths | String | No | Skip URLs matching this pattern (e.g., /admin/*) |
How it works
- Starts at the given URL
- Extracts all links on the page
- Follows links that match the include pattern (if set) and don't match exclude
- Continues until max pages is reached or no more links to follow
- Returns content from all crawled pages
Output
Array of pages, each with URL and content.
Common patterns
Index a blog
Crawl Site (blog.example.com, include: /blog/*, max: 50) →
For Each (pages) → TinyGPT (summarize each) → Create Record (store in knowledge base)
Competitive analysis
Crawl Site (competitor.com/pricing, max: 5) →
TinyGPT (extract pricing structure) → Create Record
Warning
Large crawls (100+ pages) take time and consume credits. Start with a small max pages to test, then increase.