Getting Started

Access the data sources management interface by navigating to the Glean admin console (click the wrench icon in the left-hand navigation menu). You’ll land on the Data Sources page by default.

Monitoring Data Sources

The Data Sources page provides an overview of all your connected sources and their current status.
Status indicators like crawl status, document count, and content crawl may not always reflect real-time information. When you see “Crawl in progress”, content should still be searchable as long as:
  • There is a document count displayed
  • The data source is set as visible
If content isn’t appearing in search results despite meeting these conditions, please submit a support ticket.
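The two conditions above can be expressed as a small check. This is a minimal sketch for illustration only: the `DataSourceStatus` record and its field names are hypothetical, filled in by hand from what the Data Sources page shows, and do not correspond to any Glean API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataSourceStatus:
    # Hypothetical fields mirroring what the Data Sources page displays.
    crawl_status: str               # e.g. "Crawl in progress"
    document_count: Optional[int]   # None when no count is displayed
    visible: bool                   # True when the source is set as visible


def should_be_searchable(status: DataSourceStatus) -> bool:
    """Return True when content is expected to appear in search.

    Assumption: "a document count is displayed" is modeled as
    document_count not being None. The crawl status itself is not
    consulted, because content can be searchable mid-crawl.
    """
    return status.document_count is not None and status.visible
```

If this check returns `True` for a source whose content still does not appear in search, that is the situation in which a support ticket is warranted.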
Use the Crawl Rate and Change Rate columns on the Data Sources page to monitor connector activity. During initial sync, watch Crawl Rate to confirm the crawler is actively progressing. After the initial sync completes, watch Change Rate to confirm that recent user changes are being ingested each day.

Best practices

  • During onboarding, monitor Crawl Rate to spot stalls early and confirm work is ongoing.
  • In steady state, monitor Change Rate to ensure day‑to‑day updates are flowing. You can sort by the column to surface the noisiest or quietest sources.
  • Investigate sustained 0 values when you expect activity. Possible causes include misconfiguration, scope issues, and throttling.

Troubleshooting

  • If Crawl Rate is 0 for longer than expected during initial sync, re‑check connector permissions and configuration.
  • If Change Rate is 0 but you expect edits, verify that the connector type supports the metric and confirm there were recent changes in the source system.
These metrics indicate activity, not a complete health verdict. Use them alongside other status indicators and error surfacing.
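The guidance above can be sketched as a simple triage routine. This is an illustrative sketch, not a Glean feature: the `Phase` enum, the function names, and the `window` threshold are all assumptions, and the samples are presumed to be values you record yourself from the Crawl Rate and Change Rate columns.

```python
from enum import Enum
from typing import List, Sequence


class Phase(Enum):
    INITIAL_SYNC = "initial_sync"
    STEADY_STATE = "steady_state"


def is_sustained_zero(samples: Sequence[float], window: int = 3) -> bool:
    """True when the most recent `window` samples are all 0.

    `window` is an arbitrary illustrative threshold for "sustained".
    """
    if len(samples) < window:
        return False
    return all(value == 0 for value in samples[-window:])


def suggest_checks(
    phase: Phase,
    crawl_rate: Sequence[float],
    change_rate: Sequence[float],
    window: int = 3,
) -> List[str]:
    """Map a sustained 0 in the relevant metric to the checks listed above."""
    if phase is Phase.INITIAL_SYNC and is_sustained_zero(crawl_rate, window):
        return [
            "Re-check connector permissions",
            "Re-check connector configuration",
        ]
    if phase is Phase.STEADY_STATE and is_sustained_zero(change_rate, window):
        return [
            "Verify that the connector type supports Change Rate",
            "Confirm there were recent changes in the source system",
        ]
    return []
```

An empty result only means neither metric shows a sustained 0. It does not mean the source is healthy, so other status indicators still need to be reviewed.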

Setting Up New Data Sources

1. Add Data Source

Click the “Add data source” button to view available native connectors. You can add:

2. Configure Settings

Provide necessary credentials and configurations for crawling. You can save your progress and resume setup later.

3. Manage Data

Define rules to control what content Glean can crawl:
  • Set inclusion/exclusion rules
  • Configure content type filters
  • Customize crawling parameters
This step becomes available after saving the initial setup.

4. Start Crawling

Choose to either:
  • Start crawling immediately
  • Schedule crawling for later (can be initiated from the Get Started tab under Review Data Source Crawl)

Visibility Controls

Best Practices for New Data Sources

  1. Initially set visibility to “Visible to test group only”
  2. Configure your test group through the Manage test group option
  3. Let the test group verify search results and content accuracy
  4. Once verified, set visibility to “Visible to everyone”
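The rollout sequence above amounts to a single gated transition. The sketch below models it under stated assumptions: the `Visibility` enum, its values, and `next_visibility` are hypothetical names for illustration, and the visibility setting itself is changed in the admin console, not through this code.

```python
from enum import Enum


class Visibility(Enum):
    # Symbolic names for the two settings discussed above (hypothetical).
    TEST_GROUP_ONLY = "test_group_only"
    EVERYONE = "everyone"


def next_visibility(
    current: Visibility,
    test_group_configured: bool,
    results_verified: bool,
) -> Visibility:
    """Widen visibility only after the test group has verified results."""
    ready = test_group_configured and results_verified
    if current is Visibility.TEST_GROUP_ONLY and ready:
        return Visibility.EVERYONE
    return current
```

Keeping the transition one-way reflects the practice described above: a new source starts restricted and is widened once, after verification.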

Content Overview

Clicking into a specific data source reveals detailed information about:
  • Crawl status
  • Content indexing progress
  • Document types (e.g., for Dropbox):
    • Folder
    • Document
    • Spreadsheet
    • Paper
    • Video

Advanced Configuration

Managing Data Rules

Some data sources (like Google Drive) offer a dedicated “Manage Data” tab for advanced configuration options:
  • Define inclusion/exclusion rules
  • Set content type filters
  • Configure crawling parameters
If you need custom configurations for data sources without a “Manage Data” tab, contact support or your Glean representative. We continuously expand self-service configuration options.
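To reason about how inclusion and exclusion rules interact before configuring them, a small model can help. The sketch below is an assumption-laden illustration, not Glean's rule engine: it uses shell-style glob patterns from Python's standard `fnmatch` module, lets exclusions take precedence over inclusions, and treats an empty inclusion list as "include everything". The actual rule syntax and precedence for a given connector should be confirmed in its “Manage Data” tab.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def is_crawlable(
    path: str,
    include: Sequence[str],
    exclude: Sequence[str],
) -> bool:
    """Decide whether `path` would be crawled under the assumed semantics.

    Assumptions (not taken from Glean documentation):
      * patterns are case-sensitive shell-style globs, where `*` also
        matches path separators;
      * any matching exclusion rule wins;
      * an empty `include` list means everything is included.
    """
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    if not include:
        return True
    return any(fnmatchcase(path, pattern) for pattern in include)


# Hypothetical example rules and paths.
include_rules = ["/Engineering/*"]
exclude_rules = ["*/drafts/*"]
print(is_crawlable("/Engineering/specs/design.doc", include_rules, exclude_rules))
print(is_crawlable("/Engineering/drafts/notes.doc", include_rules, exclude_rules))
```

Listing a handful of representative paths and running them through a model like this makes it easier to spot rules that are broader or narrower than intended.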