How to Train Your Agent on 100+ Pages in 2 Minutes
You want your agent to know your entire website. But adding pages one by one? Nobody has time for that.
Maybe you've got a docs site with 200 pages. Maybe your help centre keeps growing. Maybe you've been putting this off because it sounds like a nightmare.
Good news: Chat Thing can crawl your entire site and add it to your agent automatically.
The old way vs the new way
The old way:
Copy URL. Paste. Click add. Repeat. Repeat. Repeat. Give up after 15 pages and hope nobody asks about the other 185.
The new way:
Give Chat Thing one URL. Click "Crawl". Go make a coffee. Come back to an agent that knows your entire website.
We'll discover up to 600 pages automatically, let you review them, and sync the lot in one go.
How to crawl your website
- Go to your agent's tab
- Click New data source and select Website
- Click the Crawl button
- Enter your website's URL and hit Crawl
- Wait while we discover your pages (larger sites take a few minutes)
- Review the discovered URLs and toggle off any you don't want
- Click Add URLs and then Synchronise
That's it. Your agent now knows your entire site.
Pro tip: Use content selectors
By default, we grab everything in the page body (minus headers and footers, which would just waste tokens by repeating on every page).
But you can get smarter. If your content lives in a specific container, like main or .article-content, set that as your CSS selector. Your agent gets cleaner data, and you use fewer storage tokens.
Worth spending 5 minutes on before you sync 200 pages.
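Chat Thing applies the selector for you, but if you want to preview what a selector would capture before syncing, here's a rough sketch in Python. It uses the standard library's ElementTree (with its limited XPath support standing in for a CSS selector), and the page markup and class name are made up for illustration:

```python
import xml.etree.ElementTree as ET

# A toy page: the real content lives inside <main class="article-content">,
# surrounded by nav/footer noise that would waste storage tokens.
page = """
<html>
  <body>
    <nav>Home | Docs | Pricing</nav>
    <main class="article-content">
      <h1>Getting started</h1>
      <p>The part your agent should actually learn from.</p>
    </main>
    <footer>Copyright 2024</footer>
  </body>
</html>
"""

root = ET.fromstring(page)

# Grab only the content container -- the equivalent of setting a
# selector like "main" or ".article-content" in Chat Thing.
main = root.find(".//*[@class='article-content']")
content = " ".join(" ".join(main.itertext()).split())
print(content)
```

Running this prints only the heading and paragraph text; the nav links and copyright line never reach your agent.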
What about sites behind a login?
If your site uses Basic Auth (the browser popup asking for a username and password), we've got you covered. Just add your credentials in the scraping settings and we'll access the protected content. Other login types, like form-based logins or OAuth, aren't supported: we can only crawl public pages or ones behind Basic Auth.
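Not sure whether your site uses Basic Auth? It's the scheme where the username and password travel in an Authorization header with every request. A minimal sketch of how that header is built (the credentials here are hypothetical, just to show the shape):

```python
import base64

# Hypothetical credentials -- in practice, these are whatever you
# enter in the scraping settings.
username = "docs-reader"
password = "s3cret"

# Basic Auth is just "username:password", base64-encoded,
# sent as an Authorization header on every request.
token = base64.b64encode(f"{username}:{password}".encode()).decode()
auth_header = f"Authorization: Basic {token}"
print(auth_header)
```

If your login page is a normal HTML form instead of that browser popup, it isn't Basic Auth, and the crawler won't be able to get past it.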
When crawling makes sense
Great for:
- Documentation sites
- Help centres and FAQs
- Marketing sites with lots of pages
- Blogs and content libraries
- Any site where pages link to each other
Maybe not ideal for:
- Single-page apps (crawling won't find much)
- Sites with lots of duplicate content
- Pages behind form logins or OAuth (Basic Auth is fine)
Keep it fresh with auto-sync
Once you've crawled your site, turn on auto-sync to keep it up to date. We'll check for new and changed pages on whatever schedule you set.
Your agent stays current without you lifting a finger.
Go try it
Got a website with more than a handful of pages? Give crawling a try. It takes about 2 minutes to set up, and your agent gets instant access to everything.
Create a free account and add your first website data source. Your future self will thank you.



