Data sources

Website

ChatGPT trained on your website in minutes!

The website data source allows you to create a custom ChatGPT bot that learns from the content of your website in minutes! Perfect for creating customer support bots, sales assistants, documentation bots, and more.

Once your bot has all your content, you can easily embed it on your site as a chat widget or iFrame.


Adding a website data source to your bot

You can add a website data source to your bot to let it use content from any website in its answers.

  1. From your bot’s dashboard, click the “Add” button in the website data source box.

web-card.png

  2. In the web page data source settings window that appears, you can add web pages to be included in your bot. There are a number of ways to add pages to your website data source:
  • "Add" - Add a single URL to the data source
  • "Bulk Add" - Add multiple URLs at once
  • "Sitemap XML" - Add a link to a sitemap to add all the URLs it contains
  • "Crawl" - Add a single URL and crawl the website to discover other links. A maximum of 300 URLs will be added.

web.png
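The "Sitemap XML" option works by reading the sitemap you point it at and collecting every URL it lists. To illustrate the format, here is a minimal sketch that extracts the `<loc>` entries from a hypothetical sitemap using Python's standard library. This is not our syncing code, just the general idea of what a sitemap contains:

```python
import xml.etree.ElementTree as ET

# A tiny hypothetical sitemap, as it might be served at /sitemap.xml
SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs</loc></url>
</urlset>"""

def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]

print(sitemap_urls(SITEMAP))
# ['https://example.com/', 'https://example.com/docs']
```

Every URL returned this way is added to the data source, so a sitemap is the quickest route when your site already publishes one.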

  3. Check the advanced settings (see below). It's generally a good idea to set a content selector to control which parts of each web page we load into your data source.

  4. Once you have added all the pages you want to include, click “Save & sync data source”. Your web page data source will begin to synchronise and will be ready for the bot to use shortly.

web-add.png

Do you support automatic crawling?

Yes, you can automatically crawl a website to discover all of its content. From within the web data source settings screen, click the "Crawl" button:

Enter the address of the site you would like to crawl, then click the "Crawl" button:

web-crawl.png

💡

Be patient

Please be patient while your crawl runs. Depending on the size and speed of the site, it can take several minutes.

When the crawl has finished, you will be returned to the web data source settings where you will be able to see all of the URLs added as a result of the crawl.
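Conceptually, a crawl is a breadth-first traversal of same-domain links starting from the address you enter, capped at 300 URLs. The sketch below illustrates the idea using Python's standard library and a hypothetical in-memory site in place of real HTTP requests; it is not the actual crawler, just a minimal illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

MAX_URLS = 300  # matches the product's crawl limit

class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(start_url, fetch, limit=MAX_URLS):
    """Breadth-first crawl restricted to the start URL's domain."""
    domain = urlparse(start_url).netloc
    seen, queue = {start_url}, [start_url]
    while queue and len(seen) < limit:
        url = queue.pop(0)
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(seen)

# Hypothetical in-memory site standing in for real HTTP fetches
SITE = {
    "https://example.com/": '<a href="/about">About</a><a href="/docs">Docs</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/docs": '<a href="https://other.com/">External</a>',
}
print(crawl("https://example.com/", lambda u: SITE.get(u, "")))
# ['https://example.com/', 'https://example.com/about', 'https://example.com/docs']
```

Note that the external link to `other.com` is discovered but not added: the crawl stays within the domain you started from.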

Advanced settings

There are a few advanced settings that require some additional knowledge. The first is the Content selector (a CSS selector), which allows you to control which parts of your website we load.

The default is body, a safe choice since it's present in every web page. However, this default will also pick up the site's header and footer, which are generally duplicated across all pages. Under most circumstances this isn't what you want, since a sync will use up far more storage tokens than required. It's worth a little research and a review of your site's structure to work out the optimum CSS selector to use, and worth trying it on a single URL to fine-tune it before attempting to sync your entire site.
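To see why narrowing the selector matters, the sketch below approximates content selection with plain tag names using Python's standard library (a real CSS selector is far more expressive, and the page markup here is hypothetical). Selecting body pulls in the navigation and footer text; selecting main keeps only the page's actual content:

```python
from html.parser import HTMLParser

class TagTextExtractor(HTMLParser):
    """Collect text inside one tag: a crude stand-in for a CSS selector."""
    def __init__(self, tag):
        super().__init__()
        self.tag, self.depth, self.chunks = tag, 0, []
    def handle_starttag(self, tag, attrs):
        if tag == self.tag or self.depth:
            self.depth += 1
    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())

# Hypothetical page with duplicated site chrome around the real content
PAGE = """<body><header>Site nav</header>
<main><h1>Pricing</h1><p>Plans start at $10.</p></main>
<footer>Copyright</footer></body>"""

def extract(tag, html):
    parser = TagTextExtractor(tag)
    parser.feed(html)
    return " ".join(parser.chunks)

print(extract("body", PAGE))  # includes "Site nav" and "Copyright"
print(extract("main", PAGE))  # just the content: "Pricing Plans start at $10."
```

With body as the selector, the "Site nav" and "Copyright" text would be stored again for every page you sync; with main, only the content that differs between pages is stored.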

💡

Please be aware that the structure of your website could vary between sections, and since the CSS selector can only be defined per data source, it may be helpful to split out differing sections into separate data sources so that the CSS selector can be varied as required.

If your website is protected by Basic Auth, we also provide fields for the username and password so our syncing system can access the protected content within your website.
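For reference, Basic Auth simply sends the username and password, base64-encoded, in an Authorization header with each request. A minimal sketch of that header, using hypothetical credentials:

```python
import base64

# Hypothetical credentials; with Basic Auth configured, an Authorization
# header of this form accompanies each request for a protected page.
username, password = "demo-user", "s3cret"
token = base64.b64encode(f"{username}:{password}".encode()).decode()
header = f"Authorization: Basic {token}"
print(header)
```

Because base64 is an encoding rather than encryption, Basic Auth only protects content when the site is served over HTTPS.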

Xnapper-2023-08-07-10.30.21.png
