Getting Started

Access the data sources management interface by navigating to the Glean admin console (click the wrench icon in the left-hand navigation menu). You’ll land on the Data Sources page by default.

Monitoring Data Sources

The Data Sources page provides an overview of all your connected sources and their current status.

Understanding Sync Status

When monitoring your data sources, you’ll see them organized into sections that reflect their position in the connector lifecycle:
  • Initial sync
  • All data sources
The sections below break down each phase in more detail.

Initial sync

Initial sync is split into two phases to help you understand where each data source is in the sync process:
  1. Crawling: The data source is actively fetching content, metadata, and permissions from the source system. A data source typically remains in this phase while Glean retrieves the initial corpus of data. It’s normal for a data source to remain in this phase for a while, especially for large data sources or data sources with strict API rate limits. As long as Items synced continues to increase (even slowly), the crawler is making healthy progress.
  2. Indexing: The data source is processing crawled content and incorporating it into the Knowledge Graph. A data source moves to this phase as crawling completes or nears completion. Indexing can take time for large volumes of content or during periods of high system load. You can monitor Items synced to confirm that indexing is progressing. Steady increases indicate healthy operation.

All data sources

Once both crawling and indexing are complete, the data source moves from Initial sync in progress to All data sources. At this point, the connector has completed its initial sync and is in steady-state operation, processing incremental updates via webhooks and periodic crawls. You can return here to monitor Change rate (items/day) to ensure ongoing freshness.
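
The lifecycle described above can be read as a small state machine. The sketch below is illustrative only: `SyncPhase`, `section_for`, and `next_phase` are hypothetical names rather than part of any Glean API, and "Steady state" is shorthand used here for a source that has finished its initial sync.

```python
from enum import Enum


class SyncPhase(Enum):
    CRAWLING = "Crawling"          # initial sync, step 1 of 2
    INDEXING = "Indexing"          # initial sync, step 2 of 2
    STEADY_STATE = "Steady state"  # hypothetical label: initial sync complete


def section_for(phase: SyncPhase) -> str:
    """Return the section of the Data Sources page that lists this phase."""
    if phase in (SyncPhase.CRAWLING, SyncPhase.INDEXING):
        return "Initial sync in progress"
    return "All data sources"


def next_phase(phase: SyncPhase) -> SyncPhase:
    """Advance one step; steady state is terminal in this simplified model."""
    order = list(SyncPhase)  # Enum members iterate in definition order
    index = order.index(phase)
    return order[min(index + 1, len(order) - 1)]
```

In this model a source only leaves the "Initial sync in progress" section once it has passed through both Crawling and Indexing.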

Using Metrics for Troubleshooting

You can use Items synced and Change rate (items/day) together with the sync status to troubleshoot connector health:
  • Items synced should increase over time. Metrics refresh on an hourly cadence.
  • If Items synced stops increasing for 24+ hours, check connector permissions, API throttling, and potential configuration issues.
  • For sources in Crawling, slow but steady growth in Items synced is typically normal, especially for large sources or those with API rate limits.
  • If Items synced is stagnant and the source remains in Indexing (step 2/2) for multiple days, contact Glean support for assistance.
  • Change rate (items/day) shows recent edits, additions, and deletions synced in the past 24 hours.
  • A Change rate of 0 may be normal if no changes occurred, but sustained 0 values when activity is expected could indicate webhook or incremental crawl issues.
  • High Change rate with flat Items synced may indicate heavy edit activity on existing content rather than new content being added.
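
The guidelines above can be sketched as a simple triage function, for example if you record these values by hand in a runbook. Everything here is an assumption for illustration: `SourceSnapshot` and `diagnose` are hypothetical names, "multiple days" is taken to mean 48 hours, any non-zero Change rate counts as "high", and a single snapshot stands in for a sustained trend. It does not call any Glean API.

```python
from dataclasses import dataclass
from typing import List

MULTI_DAY_HOURS = 48  # assumption: "multiple days" means two days or more


@dataclass
class SourceSnapshot:
    phase: str                     # "Crawling", "Indexing", or "Steady state"
    hours_since_items_grew: float  # time since Items synced last increased
    change_rate_per_day: int       # Change rate (items/day)
    activity_expected: bool        # do you know the source changed recently?


def diagnose(s: SourceSnapshot) -> List[str]:
    """Map one snapshot of the metrics to suggested follow-up actions."""
    notes: List[str] = []
    flat_24h = s.hours_since_items_grew >= 24

    if s.phase == "Indexing" and s.hours_since_items_grew >= MULTI_DAY_HOURS:
        notes.append("contact-support")
    elif flat_24h and s.change_rate_per_day > 0:
        # Updates are flowing; they are probably edits to existing items.
        notes.append("likely-edits-to-existing-content")
    elif flat_24h:
        notes.append("check-permissions-throttling-config")

    if s.change_rate_per_day == 0 and s.activity_expected:
        notes.append("check-webhooks-and-incremental-crawls")
    return notes
```

A source that is still crawling and grew two hours ago produces no notes, which matches the guidance that slow but steady growth is typically normal.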

Best Practices

  • During onboarding, monitor the Initial sync in progress section and Items synced metric to spot stalls early and confirm work is progressing.
  • In steady state, monitor Change rate (items/day) in All data sources to ensure day-to-day updates are flowing. You can sort by the column to surface the most or least active sources.
  • Investigate sustained 0 values in Items synced or Change rate when you expect activity. Possible causes include misconfiguration, scope issues, and throttling.
Status indicators like crawl status, document count, and content crawl may not always reflect real-time information. When you see “Crawl in progress”, content should still be searchable as long as:
  • There is a document count displayed
  • The data source is set as visible
If content isn’t appearing in search results despite meeting these conditions, please submit a support ticket.
These metrics and status indicators provide visibility into connector activity and health, but don’t represent a complete health verdict. Use them alongside other status indicators and error surfacing for comprehensive monitoring.

Setting Up New Data Sources

1. Add Data Source

Click the “Add data source” button to view available native connectors. You can add:
2. Configure Settings

Provide necessary credentials and configurations for crawling. You can save your progress and resume setup later.
3. Manage Data

Define rules to control what content Glean can crawl:
  • Set inclusion/exclusion rules
  • Configure content type filters
  • Customize crawling parameters
This step becomes available after saving the initial setup.
4. Start Crawling

Choose to either:
  • Start crawling immediately
  • Schedule crawling for later (can be initiated from the Get Started tab under Review Data Source Crawl)

Visibility Controls

Best Practices for New Data Sources

  1. Initially set visibility to “Visible to test group only”
  2. Configure your test group through the Manage test group option
  3. Let the test group verify search results and content accuracy
  4. Once verified, set visibility to “visible to everyone”
You can create multiple test groups to manage different rollout phases (e.g., “POC Admins” for initial testing, “Beta Users” for broader validation, “GA Rollout” for final verification). Each data source can be assigned to a specific test group. For detailed information on managing test groups, see Managing Result Visibility.
If your organization uses group-based permissions, you can populate test groups with identity provider groups (Azure AD or Google Groups) instead of adding individual users. This simplifies management for large-scale rollouts.
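
The staged rollout above can be pictured with a minimal model of how visibility gates search results. This is a conceptual sketch, not Glean's implementation: `DataSource`, `can_see_results`, the `"test_group_only"` and `"everyone"` values, and the user names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class DataSource:
    name: str
    visibility: str = "test_group_only"  # hypothetical values: "test_group_only" or "everyone"
    test_group: Optional[str] = None     # name of the assigned test group, if any


def can_see_results(user: str, source: DataSource, groups: Dict[str, Set[str]]) -> bool:
    """Return True if results from this source would be shown to the user."""
    if source.visibility == "everyone":
        return True
    # Restricted to the test group: only its members see results.
    return user in groups.get(source.test_group, set())


# Example rollout: start with a small group, then open up to everyone.
groups = {"POC Admins": {"alice"}, "Beta Users": {"alice", "bob"}}
drive = DataSource("Google Drive", test_group="POC Admins")
```

Moving `drive` to the "Beta Users" group widens the audience, and setting its visibility to `"everyone"` removes the restriction.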

Content Overview

Clicking into a specific data source reveals detailed information about:
  • Crawl status
  • Content indexing progress
  • Document types (e.g., for Dropbox):
    • Folder
    • Document
    • Spreadsheet
    • Paper
    • Video

Advanced Configuration

Managing Data Rules

Some data sources (like Google Drive) offer a dedicated “Manage Data” tab for advanced configuration options:
  • Define inclusion/exclusion rules
  • Set content type filters
  • Configure crawling parameters
If you need custom configurations for data sources without a “Manage Data” tab, contact support or your Glean representative. We continuously expand self-service configuration options.
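
To reason about how inclusion and exclusion rules might interact before you configure them, a small sketch can help. The semantics below are assumptions, not documented Glean behavior: exclusions take precedence over inclusions, an empty inclusion list means "include everything", and rules are shell-style glob patterns matched against a path. `should_crawl` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def should_crawl(path: str, include: Sequence[str], exclude: Sequence[str]) -> bool:
    """Decide whether a path is in scope under the assumed rule semantics."""
    # Assumption: any matching exclusion rule wins over inclusion rules.
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    # Assumption: with no inclusion rules, everything not excluded is in scope.
    if not include:
        return True
    return any(fnmatchcase(path, pattern) for pattern in include)


include_rules = ["/Engineering/*"]
exclude_rules = ["*/drafts/*"]
```

With these example rules, `/Engineering/specs/design.doc` is in scope, while `/Engineering/drafts/wip.doc` and `/HR/payroll.xls` are not.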