A comprehensive guide to Glean’s crawling system, detailing how it optimizes data retrieval while respecting API limits and source application performance.
API Call Rate Management
Time-Based Controls
Platform | Full Crawl | Incremental | People Data | Activity | Update Rate | Webhook | Notes |
---|---|---|---|---|---|---|---|
GChat | 28d | 5m | 30m | N/A | N/A | N/A | |
Microsoft Teams | 30d | 1h | 1h | N/A | N/A | N/A | |
Slack | 28d | 3h | 1h | N/A | <5m | webhook | Incremental crawl addresses cases where webhook wasn’t delivered |
Slack Enterprise | 28d | N/A | 1h | 10m | 5m | N/A | SEG uses activity crawl to find channels with recent activity, then queues incremental crawls |
Yammer | 1h | N/A | 1h | N/A | N/A | N/A |
Platform | Full Crawl | Incremental | People Data | Activity | Update Rate | Webhook | Notes |
---|---|---|---|---|---|---|---|
Box | 28d | 10m | 1h | 1m | 10m | API | Uses Events API to identify new/modified/deleted docs |
GDrive | 28d | 3h | 1h | 1m | 10m | API | Uses Reports API for 10-minute updates. Activity reports every 10m, with 12h recrawl for missed events |
OneDrive/SharePoint | 28d | 1h | 10m | 10m | 10m | API | Uses User Insights API for 10-minute updates with hourly incremental fetch |
Notion | 6h | N/A | 1h | N/A | N/A | N/A | |
Quip | 28d | 10m | 10m | 10m | N/A | API |
Platform | Full Crawl | Incremental | People Data | Activity | Update Rate | Webhook | Notes |
---|---|---|---|---|---|---|---|
Asana | 7d | 10m | 1h | 10m | 10m | webhook | |
Bitbucket | 28d | 10m | 1h | N/A | N/A | N/A | |
GitHub | 28d | 10m | 10m | N/A | N/A | webhook | |
GitLab | 28d | 10m | 1h | N/A | <5m | webhook | PR/issue/comment updates trigger immediate crawls; 10m incremental crawls |
Jira | 7d | 3h | 10m | 30m | <5m | webhook | Webhooks for issue/comment modifications |
Trello | 1d | N/A | 30m | N/A | N/A | webhook | Requires Jira or Confluence setup |
Platform | Full Crawl | Incremental | People Data | Activity | Update Rate | Webhook | Notes |
---|---|---|---|---|---|---|---|
Azure | N/A | N/A | 1h | N/A | N/A | N/A | Hourly people data crawl with additional hour for indexing |
BambooHR | N/A | N/A | 1h | N/A | N/A | N/A | Hourly people data crawl with additional hour for indexing |
Greenhouse | 28d | 1h | 10m | N/A | N/A | N/A | |
Lattice | 1d | N/A | 1h | N/A | N/A | N/A | Hourly people data crawl with additional hour for indexing |
Lever | 1h | 1h | 10m | N/A | N/A | N/A | |
People Data API | N/A | N/A | 1h | N/A | 1h | API | Real-time updates via push API |
Pingboard | N/A | N/A | 1h | N/A | N/A | N/A |
Platform | Full Crawl | Incremental | People Data | Activity | Update Rate | Webhook | Notes |
---|---|---|---|---|---|---|---|
Aha | 1d | 1h | 1h | N/A | N/A | N/A | |
Airtable | 1d | 1h | 1h | N/A | N/A | N/A | |
Coda | 7d | N/A | 1h | N/A | N/A | N/A | |
Confluence | 7d | 1h | 10m | N/A | 5m | webhook | New DC versions use webhooks for 5m updates; hourly fetch for older versions |
Guru | 1d | 10m | 1h | 10m | 1h | N/A | |
Miro | 1d | 1h | 30m | 10m | 1m | N/A | |
SmartSheet | 1h | 10m | 10m | N/A | N/A | N/A |
Platform | Full Crawl | Incremental | People Data | Activity | Update Rate | Webhook | Notes |
---|---|---|---|---|---|---|---|
Salesforce | 28d | 10m | 1h | N/A | 10m | N/A | |
ServiceNow | 3d/30m* | 1h | 1h | 30m | 3d | N/A | *3d for Knowledge Articles, 30m for Catalog Items |
Zendesk | 28d | 1h | 1h | N/A | 1h | N/A | |
Freshservice | 1h | N/A | 10m | N/A | N/A | N/A | |
Monday.com | 1h | N/A | 10m | N/A | N/A | N/A | |
Pager Duty | 1h | 10m | 10m | 10m | N/A | API |
Platform | Full Crawl | Incremental | People Data | Activity | Update Rate | Webhook | Notes |
---|---|---|---|---|---|---|---|
Brightspot | 1h | N/A | 10m | N/A | N/A | N/A | |
Google Sites | 1d | 4h | N/A | N/A | N/A | N/A | |
Web pages | 1d | N/A | N/A | N/A | 1d | N/A | Default daily crawl; fully configurable |
Wordpress | 12h | 10m | N/A | N/A | N/A | N/A |
Platform | Full Crawl | Incremental | People Data | Activity | Update Rate | Webhook | Notes |
---|---|---|---|---|---|---|---|
Docebo | 1h | N/A | 1h | N/A | N/A | N/A | |
Egnyte | 1d | 10m | 1h | N/A | N/A | N/A | |
Fifteen Five | 1h | N/A | 1h | N/A | N/A | N/A | |
Figma | 30m | N/A | 30m | N/A | N/A | N/A | |
Gmail | N/A | N/A | N/A | N/A | 0m | N/A | Real-time updates via federated API |
Gong | 1h | N/A | 1h | N/A | N/A | N/A | |
Google Groups | 7d | 1d | N/A | N/A | N/A | N/A | |
Highspot | 4h | N/A | 1h | N/A | N/A | N/A | |
Looker | 1h | N/A | 1h | N/A | N/A | N/A | External connector |
LumApps | 1h | N/A | 10m | 10m | 1m | N/A | Hourly document fetch with corpus size-dependent processing |
Okta | 3h | N/A | 3h | 1h | N/A | N/A | Additional hour for people data indexing |
Seismic | 28d | 1d | 10m | 1d | N/A | API | |
Simpplr | 28d | 1h | 1h | N/A | N/A | N/A | |
Stack Overflow | 2h | N/A | 1h | N/A | 3h | N/A | |
Tableau | 1d | 45m | 45m | N/A | N/A | N/A |
Platform | Full Crawl | Incremental | People Data | Activity | Update Rate | Webhook | Notes |
---|---|---|---|---|---|---|---|
Push API for Content | N/A | N/A | N/A | N/A | N/A | N/A | Customer controls update frequency via bulk upload APIs and real-time CRUD operations |