Crawl types

Glean runs several types of crawls, each serving a different purpose in keeping search results accurate and fresh.

Full content crawl

A comprehensive crawl that indexes a connector's entire corpus. Full crawls run at regular intervals so that the search index stays complete and accurate.

Incremental content crawl

An update that fetches only content added or modified since the previous crawl, which conserves resources by avoiding a full repository scan.
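
The difference from a full crawl can be illustrated with a minimal Python sketch. This is not Glean's implementation; `Document`, `modified_at`, and `select_incremental` are hypothetical names, and the sketch assumes each item exposes a last-modified timestamp that can be compared with the time of the previous crawl.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Document:
    """Hypothetical item in a connector's corpus."""
    doc_id: str
    modified_at: datetime


def select_incremental(docs, last_crawl_at):
    """Return only the documents added or modified after the previous crawl."""
    return [d for d in docs if d.modified_at > last_crawl_at]


docs = [
    Document("a", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    Document("b", datetime(2024, 1, 10, tzinfo=timezone.utc)),
]
last_crawl = datetime(2024, 1, 5, tzinfo=timezone.utc)

# A full crawl would process every entry in docs; the incremental pass only
# picks up what changed since last_crawl.
changed = select_incremental(docs, last_crawl)
```

A timestamp filter like this cannot see deletions or permission changes, since removed items no longer appear in the listing.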

Activity crawl

A continuous monitoring process that tracks and indexes specific changes within a connector, including content additions, updates, deletions, and permission modifications.

Identity crawl

A specialized process for retrieving and updating identity-related information across various connectors.

People data

Organizational information about individuals, encompassing names, titles, email addresses, departmental affiliations, and other relevant attributes.

Update rate

How often the system performs incremental fetches to refresh data from each source, so that the latest information is available.

Configuration

Note: All crawl frequencies are defaults and can be customized to meet your organization's needs. Contact Glean Support to adjust crawl frequencies for your deployment.

Organizations can fine-tune their crawling configuration in the following ways:

API call rate management

Administrators can configure:

  • The rate of API calls per second
  • The number of concurrent API calls
  • Dynamic exponential backoff parameters for handling overload scenarios
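
To show how these three settings relate, here is a minimal Python sketch. It is an illustration under stated assumptions, not Glean's configuration schema: `RateLimitConfig` and all of its field names are hypothetical, and the backoff shown is a plain capped exponential without jitter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitConfig:
    """Hypothetical settings; field names are illustrative only."""
    calls_per_second: float   # steady-state API call rate
    max_concurrent: int       # upper bound on in-flight API calls
    backoff_base_s: float     # delay before the first retry
    backoff_factor: float     # multiplier applied on each further retry
    backoff_max_s: float      # ceiling on any single delay

    def min_interval_s(self) -> float:
        """Minimum spacing between call starts implied by the rate."""
        return 1.0 / self.calls_per_second

    def backoff_delay_s(self, attempt: int) -> float:
        """Capped exponential delay for a zero-based retry attempt."""
        delay = self.backoff_base_s * self.backoff_factor ** attempt
        return min(self.backoff_max_s, delay)


cfg = RateLimitConfig(
    calls_per_second=5.0,
    max_concurrent=2,
    backoff_base_s=1.0,
    backoff_factor=2.0,
    backoff_max_s=30.0,
)

# Delays for the first six retries when the source signals overload.
delays = [cfg.backoff_delay_s(n) for n in range(6)]
```

With these example values the delay doubles on each retry until it reaches the 30-second ceiling. A production crawler would typically also add random jitter so that many workers do not retry at the same moment.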

Time-based controls

The system supports granular scheduling with different rates for:

  • Peak operational hours
  • Off-peak periods
  • Specific days of the week
  • Custom time windows
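
One way to picture time-based controls is a list of windows, each carrying its own call rate. The following Python sketch is hypothetical: `RateWindow`, `rate_for`, and the example rates are assumptions made for illustration, and the idea that peak hours get a lower rate than off-peak periods is likewise an assumption, not a documented default.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RateWindow:
    """Hypothetical schedule entry; not a Glean configuration object."""
    days: frozenset          # weekday numbers, Monday=0 ... Sunday=6
    start_hour: int          # inclusive, 0-23
    end_hour: int            # exclusive, 1-24
    calls_per_second: float


def rate_for(now, windows, default_rate):
    """Return the rate of the first window matching `now`, else the default."""
    for w in windows:
        if now.weekday() in w.days and w.start_hour <= now.hour < w.end_hour:
            return w.calls_per_second
    return default_rate


WEEKDAYS = frozenset(range(5))
WEEKEND = frozenset({5, 6})

windows = [
    RateWindow(WEEKDAYS, 9, 17, 2.0),   # assumed peak hours: crawl gently
    RateWindow(WEEKEND, 0, 24, 20.0),   # assumed quiet days: crawl faster
]

# Monday 10:00 falls in the peak window; Monday 22:00 falls back to the default.
peak = rate_for(datetime(2024, 1, 8, 10, 0), windows, default_rate=10.0)
off_peak = rate_for(datetime(2024, 1, 8, 22, 0), windows, default_rate=10.0)
weekend = rate_for(datetime(2024, 1, 6, 3, 0), windows, default_rate=10.0)
```

Because the first matching window wins, more specific custom windows should be listed before broader ones.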