Getting Started
Access the data sources management interface by navigating to the Glean admin console (click the wrench icon in the left-hand navigation menu). You’ll land on the Data Sources page by default.
Monitoring Data Sources
The data sources page provides an overview of all your connected sources and their current status. Status indicators like crawl status, document count, and content crawl may not always reflect real-time information. When you see “Crawl in progress”, content should still be searchable as long as:
- There is a document count displayed
- The data source is set as visible
If content isn’t appearing in search results despite meeting these conditions, please submit a support ticket.
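The two conditions above can be written down as a small check, for example when reviewing a list of sources you have noted by hand. This is only an illustrative sketch: the `DataSourceRow` type and its field names are hypothetical and do not correspond to any Glean API, and "document count displayed" is modeled here as a count that is not `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataSourceRow:
    """Hypothetical record of one row on the Data Sources page."""
    name: str
    crawl_status: str               # e.g. "Crawl in progress"
    document_count: Optional[int]   # None when no count is displayed
    visible: bool                   # True when the source is set as visible


def should_be_searchable(row: DataSourceRow) -> bool:
    """Apply the two documented conditions; crawl status is deliberately ignored."""
    return row.document_count is not None and row.visible


row = DataSourceRow("Dropbox", "Crawl in progress", 1200, True)
if should_be_searchable(row):
    print(f"{row.name}: content should appear in search; if not, file a support ticket")
```

Crawl status is left out of the check on purpose, since a crawl that is still in progress does not by itself prevent content from being searchable.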
Best practices
- During onboarding, monitor Crawl Rate to spot stalls early and confirm work is ongoing.
- In steady state, monitor Change Rate to ensure day‑to‑day updates are flowing. You can sort by the column to surface the noisiest or quietest sources.
- Investigate sustained 0 values when you expect activity. Possible causes include misconfiguration, scope issues, and throttling.
Troubleshooting
- If Crawl Rate is 0 for longer than expected during initial sync, re‑check connector permissions and configuration.
- If Change Rate is 0 but you expect edits, verify that the connector type supports the metric and confirm there were recent changes in the source system.
These metrics indicate activity, not a complete health verdict. Use them alongside other status indicators and error surfacing.
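The guidance above amounts to a simple triage rule, sketched below. Everything here is an assumption for illustration: the readings are values you would note down yourself from the Crawl Rate or Change Rate column, the three-reading window used to define "sustained" is arbitrary, and none of the function names come from Glean.

```python
from typing import Sequence


def sustained_zero(samples: Sequence[float], window: int = 3) -> bool:
    """True when the last `window` readings are all zero (window is an assumed threshold)."""
    if len(samples) < window:
        return False
    return all(value == 0 for value in samples[-window:])


def triage(phase: str, samples: Sequence[float],
           expect_activity: bool = True, metric_supported: bool = True) -> str:
    """Map readings to the next step suggested in the text.

    phase is "onboarding" (readings are Crawl Rate) or "steady" (readings are Change Rate).
    """
    if not expect_activity:
        return "no action: no activity expected"
    if not sustained_zero(samples):
        return "no action: no sustained zero observed"
    if phase == "onboarding":
        return "re-check connector permissions and configuration"
    if not metric_supported:
        return "connector type may not report Change Rate"
    return "confirm recent changes in the source system"


print(triage("onboarding", [4.0, 0, 0, 0]))
print(triage("steady", [0, 0, 0], metric_supported=False))
```

A result of "no action" only means the metric shows nothing to chase; it is not a health verdict, so other status indicators and surfaced errors still need to be checked.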
Setting Up New Data Sources
1. Add Data Source
Click the “Add data source” button to view available native connectors. You can add:
- Pre-configured data sources with native connectors
- Custom websites
- Custom data sources
2. Configure Settings
Provide necessary credentials and configurations for crawling. You can save your progress and resume setup later.
3. Manage Data
Define rules to control what content Glean can crawl:
- Set inclusion/exclusion rules
- Configure content type filters
- Customize crawling parameters
This step becomes available after saving the initial setup.
4. Start Crawling
Choose to either:
- Start crawling immediately
- Schedule crawling for later (can be initiated from the Get Started tab under Review Data Source Crawl)
Visibility Controls
Best Practices for New Data Sources
- Initially set visibility to “Visible to test group only”
- Configure your test group through the Manage test group option
- Let the test group verify search results and content accuracy
- Once verified, set visibility to “Visible to everyone”
Content Overview
Clicking into a specific data source reveals detailed information about:
- Crawl status
- Content indexing progress
- Document types (e.g., for Dropbox): Folder, Document, Spreadsheet, Paper, Video
Advanced Configuration
Managing Data Rules
Some data sources (like Google Drive) offer a dedicated “Manage Data” tab for advanced configuration options:
- Define inclusion/exclusion rules
- Set content type filters
- Configure crawling parameters
If you need custom configurations for data sources without a “Manage Data” tab, contact support or your Glean representative. We continuously expand self-service configuration options.
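When planning inclusion/exclusion rules before entering them in the console, it can help to reason about them as pattern matching over paths. The sketch below is not Glean's rule engine and makes several assumptions: rules are shell-style glob patterns, an exclusion always wins over an inclusion, and an empty inclusion list means "include everything". Check how your specific connector evaluates rules before relying on this model.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def is_crawlable(path: str, include: Sequence[str], exclude: Sequence[str]) -> bool:
    """Hypothetical rule evaluation: exclusions first, then inclusions."""
    # Assumption: any matching exclusion pattern blocks the path outright.
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    # Assumption: with no inclusion patterns, everything not excluded is allowed.
    if not include:
        return True
    return any(fnmatchcase(path, pattern) for pattern in include)


include_rules = ["/Shared/*"]      # example patterns, not real defaults
exclude_rules = ["*/Private/*"]

for candidate in ["/Shared/Eng/design.doc", "/Shared/Private/notes.doc", "/Other/plan.doc"]:
    print(candidate, is_crawlable(candidate, include_rules, exclude_rules))
```

Running a handful of representative paths through a model like this before saving rules makes it easier to spot a pattern that is broader or narrower than intended.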