Setup instructions
- Navigate to the Glean admin console.
- Go to the Data sources tab.
- Click Add data source and select Website.
- Configure the Website connector as described in the Website connector guide.
- Click Validate setup (optional).

Use the Crawl summary and Test a specific URL options to verify your configuration and resolve potential indexing issues before scheduling a full crawl.
Pages that require client-side rendering (CSR) cannot be previewed.

Crawl summary
The Crawl summary verifies your site configuration by performing a limited crawl that previews the content and links accessible to Glean.
- Click Validate setup in the Crawl summary panel. This crawls a few pages from your site and shows you what content and links Glean can access.
Test a specific URL
Testing a URL verifies that Glean can access it under your current configuration. Note that for a page to be crawled and indexed, it must also be reachable by following links from your starting (seed) URLs.
- Paste a URL that you want to verify.
- Click Test crawl.
- Review the results:
  - Verify that the request succeeded and was not redirected to a login or error page.
  - Confirm that the main page content and child links are correctly identified.
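
The redirect check in the review step can also be reasoned about outside the admin console. The following is a minimal sketch, not Glean's implementation: the function name `classify_fetch`, the `LOGIN_HINTS` list, and the example URLs are all hypothetical, and the login detection is a simple substring heuristic that you would need to adapt to your site.

```python
from urllib.parse import urlparse

# Hypothetical substrings that often appear in login URLs; adjust for your site.
LOGIN_HINTS = ("login", "signin", "sso")

def classify_fetch(requested_url: str, final_url: str, status: int) -> str:
    """Classify a fetch result as 'ok', 'error', or 'login_redirect'.

    requested_url: the URL you asked for
    final_url:     the URL you ended up on after following redirects
    status:        the final HTTP status code
    """
    if status >= 400:
        return "error"
    requested, final = urlparse(requested_url), urlparse(final_url)
    # Compare host and path only, so query strings do not count as a redirect.
    redirected = (requested.netloc, requested.path) != (final.netloc, final.path)
    target = (final.netloc + final.path).lower()
    if redirected and any(hint in target for hint in LOGIN_HINTS):
        return "login_redirect"
    return "ok"
```

A result of `"login_redirect"` corresponds to the case where the request technically succeeded but returned a login page instead of the expected content.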
Expected behavior and outcomes
Successful configuration
If the configuration is correct, the preview should:
- Load page content that matches what an end user sees in the browser.
- Show a representative sample of child links consistent with:
  - The configured seed URLs
  - Include/exclude patterns
  - Other advanced site fetch rules

A successful preview indicates that:
- Glean is targeting the correct areas of your site.
- Glean is not being blocked by authentication or robots-equivalent constraints (within the connector’s supported capabilities).
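
To predict which child links should appear in the preview, it can help to apply your include/exclude patterns to a few sample URLs yourself. The sketch below assumes the patterns are regular expressions and that exclude rules take precedence over include rules; both are assumptions for illustration, so check the Website connector guide for the pattern syntax and precedence Glean actually uses. The function name `in_scope` and the example patterns are hypothetical.

```python
import re

def in_scope(url: str, include: list[str], exclude: list[str]) -> bool:
    """Return True if url passes the include/exclude rules.

    Assumption: exclude patterns win over include patterns, and an empty
    include list means "include everything not excluded".
    """
    if any(re.search(pattern, url) for pattern in exclude):
        return False
    return not include or any(re.search(pattern, url) for pattern in include)

# Hypothetical rules: crawl documentation only, skipping archived pages.
include_patterns = [r"^https://example\.com/docs/"]
exclude_patterns = [r"/docs/archive/"]

for candidate in (
    "https://example.com/docs/setup",
    "https://example.com/docs/archive/old",
    "https://example.com/blog/post",
):
    print(candidate, in_scope(candidate, include_patterns, exclude_patterns))
```

If the preview shows links that this kind of check says are out of scope (or omits links that are in scope), the patterns in the connector configuration are a good first place to look.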
Misconfigurations and errors
When the configuration is incorrect, the preview typically surfaces issues immediately, such as:
- Authentication problems
  - Redirects to login pages instead of the expected content.
  - Pages that look like “access denied” or generic auth failures.
- Site fetch issues
  - The crawl returns no child links, or significantly fewer than you expect based on your site’s structure and configured starting URLs.
  - The crawl includes pages from unintended sections (for example, marketing pages instead of documentation) despite your defined include/exclude rules.
- Advanced configuration and CSR issues
  - Partial content rendering: Pages display only part of their content because elements that depend on JavaScript (client-side rendering) are not loaded during the crawl preview.
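
One rough way to anticipate partial rendering is to look at a page's raw HTML (before any JavaScript runs) and see how much visible text it contains. The following is a heuristic sketch only: the function name `likely_client_rendered` and the 200-character threshold are arbitrary assumptions, and it does not reflect how Glean detects or handles CSR pages.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects visible text and counts <script> tags in raw HTML."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self.script_count = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script":
                self.script_count += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def likely_client_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: the page loads scripts but ships very little visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    visible_text = " ".join(" ".join(parser.chunks).split())
    return parser.script_count > 0 and len(visible_text) < min_text_chars
```

A page flagged by this check is a candidate for the partial-content behavior described above, since its content is probably assembled in the browser rather than delivered in the initial HTML.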