Common questions about crawling
How long will my crawl take?
Crawl duration depends primarily on two factors:
- Datasource size
- API rate limits
Estimating crawl completion time
If your datasource supports initial crawl estimates, you’ll have the option to enter an estimated document count during setup. Based on this input and historical data, you’ll see a projected time range for when the initial crawl is expected to finish.

This feature is available for select data sources and is designed to give you a data-driven estimate, so you can better anticipate when your content will be ready to use in Glean. Initial crawl estimates are historical averages computed from past datasource crawls. Please note that actual crawl time can vary due to factors such as data volume, change frequency, and structure.
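As a rough illustration of how a projection like this works (a minimal sketch; the throughput figures and function below are assumptions, not Glean’s actual model), you can bound the completion window with an estimated document count and a historical docs-per-hour range:

```python
# Hypothetical sketch: project an initial-crawl completion window from an
# estimated document count and historical throughput figures. The numbers
# below are illustrative assumptions, not Glean's actual averages.

def estimate_crawl_window(doc_count: int,
                          docs_per_hour_low: float = 5_000,
                          docs_per_hour_high: float = 20_000) -> tuple[float, float]:
    """Return (best_case_hours, worst_case_hours) for an initial crawl."""
    best_case = doc_count / docs_per_hour_high    # fast historical average
    worst_case = doc_count / docs_per_hour_low    # slow historical average
    return best_case, worst_case

low, high = estimate_crawl_window(250_000)
print(f"Projected initial crawl: {low:.1f}-{high:.1f} hours")  # 12.5-50.0 hours
```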
What if my crawl is taking a long time?
How can I restrict what Glean crawls?
Glean supports several kinds of crawl restrictions (see the sketch after this list):
- Time-based
- User-based
- Group-based
- Site/Channel-based
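As a purely illustrative sketch (the rule names, fields, and matching logic below are assumptions, not Glean’s configuration schema), restrictions of these four kinds could be modeled as filters applied before a document is crawled:

```python
# Hypothetical sketch of crawl-restriction rules. Field names and structure
# are illustrative assumptions, not Glean's actual configuration schema.
crawl_restrictions = {
    "time_based": {"modified_after": "2023-01-01"},
    "user_based": {"exclude_owners": ["svc-archive@example.com"]},
    "group_based": {"include_groups": ["engineering", "sales"]},
    "site_based": {"include_sites": ["https://example.sharepoint.com/sites/eng"]},
}

def should_crawl(doc: dict) -> bool:
    """Apply the illustrative rules above to one document record."""
    r = crawl_restrictions
    if doc["modified"] < r["time_based"]["modified_after"]:               # too old
        return False
    if doc["owner"] in r["user_based"]["exclude_owners"]:                 # excluded owner
        return False
    if not set(doc["groups"]) & set(r["group_based"]["include_groups"]):  # no allowed group
        return False
    return doc["site"] in r["site_based"]["include_sites"]                # allowed sites only
```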
What should I do if I see errors in my crawl status?
If you see errors in your crawl status, start by:
- Verifying your datasource configuration
- Contacting Glean support if issues persist
Can I crawl multiple datasources simultaneously?
How can I monitor crawl progress?
- Job in progress: Indicates an active crawl
- Synced: Indicates a completed crawl

Why is my crawl stuck at 'Job in progress'?
How do I delete a datasource?
How do I stop or restart a crawl?
What is Crawl Rate?
What is Change Rate?
When do the Crawl Rate and Change Rate columns appear?
How often do the Crawl Rate and Change Rate metrics update?
What does a “0” Crawl Rate mean?
What does a “0” Change Rate mean?
Why don’t I see Crawl Rate and Change Rate metrics for some connectors?
Does a high Crawl Rate mean everything is healthy?
Is “Change rate” the number of documents added, deleted, or edited per day?
- Yes—conceptually, it’s the count of document change events Glean processed in the last 24 hours for an ongoing crawl. These events include creates (adds), updates (content/metadata/permissions), moves/renames, and deletes. Think of it as an activity “heartbeat” showing that new or changed content is actively being processed.
- In the original spec, the admin table highlights “added in the past day” as the simplest, user-friendly roll-up for ongoing crawls; internally, it’s backed by the change-event stream described above.
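Conceptually (a minimal sketch under an assumed event shape; this is not Glean’s internal schema), the metric reduces to counting change events whose processing time falls in the trailing 24-hour window:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sketch: "Change rate" as the count of change events (creates,
# updates, moves/renames, deletes) processed in the trailing 24 hours.
# The event dict shape is an illustrative assumption.

def change_rate(events: list[dict], now: datetime | None = None) -> int:
    now = now or datetime.now(timezone.utc)
    window_start = now - timedelta(hours=24)
    return sum(
        1 for e in events
        if e["type"] in {"create", "update", "move", "delete"}
        and e["processed_at"] >= window_start
    )
```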
Why can Change rate be higher than items synced?
- Multiple changes to the same item: If one document is edited many times in a day, each edit is a separate change event. “Items synced” is a cumulative count of distinct items indexed, so it won’t rise with repeated edits.
- Updates and deletes don’t increase “items synced”: Edits and permission-only changes are counted in “Change rate” but do not add to the total items. Deletes can even decrease “items synced” while still incrementing the change count.
- Timing and pipeline lag: “Items synced” is a lagging, cumulative indicator that updates after indexing completes; “Change rate” reflects event processing activity within the last 24 hours and can surface earlier in the pipeline. Over short windows, you may see a high change rate without a corresponding immediate increase in the items total.
- Permission/metadata churn: Some connectors generate events for permission or metadata changes (e.g., access list updates), which raise “Change rate” even when no new items are added to the index.
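A toy replay (illustrative only; the event stream and counters are assumptions) makes the divergence concrete: six change events here yield only one distinct item synced:

```python
# Toy replay showing why "Change rate" can exceed "Items synced".
events = [
    ("create", "doc-1"), ("update", "doc-1"), ("update", "doc-1"),  # 1 item, 3 events
    ("create", "doc-2"), ("delete", "doc-2"),                       # net 0 items, 2 events
    ("permissions", "doc-1"),                                       # metadata churn, 1 event
]

items_synced: set[str] = set()
change_events = 0
for kind, doc_id in events:
    change_events += 1                # every event counts toward Change rate
    if kind == "create":
        items_synced.add(doc_id)      # only new items grow Items synced
    elif kind == "delete":
        items_synced.discard(doc_id)  # deletes can shrink it

print(change_events, len(items_synced))  # 6 change events, 1 item synced
```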
What can administrators learn by viewing this “Change rate”?
- A live “heartbeat” of connector activity: It shows whether the data source is actively processing new or changed content in the last 24 hours (creates, edits, permission/metadata changes, deletes), so you can quickly confirm the crawl isn’t idle.
- Early stall detection: If Change rate flatlines while you expect activity, it’s a signal to check connector health (auth scopes, webhook subscriptions, errors) even before the total items count moves (see the sketch after this list).
- Interpreting gaps vs. Items synced: Because Items synced is a cumulative, lagging indicator, a high Change rate with little movement in Items synced can indicate many edits/deletes or permission-only changes (which don’t add to the total).
- Validating configuration changes propagate: After updating inclusion/exclusion rules or visibility settings, a non-zero Change rate is a quick way to verify those changes are being picked up and applied by the pipeline.
- Spotting surges or operational events: Spikes can reflect bulk content uploads/migrations or large permission sweeps—useful operational context that can explain search result shifts or indexing load.
- Where it fits with Crawl rate: Use Change rate to monitor ongoing crawls; Crawl rate is the companion metric during initial syncs.
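For example, a simple stall check (a sketch; where the samples come from and the flatline threshold are assumptions) could flag a connector whose Change rate sits at zero while activity is expected:

```python
# Hypothetical stall check: flag a connector whose Change rate has been zero
# for several consecutive samples while activity is expected.

def is_stalled(recent_change_rates: list[int],
               expect_activity: bool,
               flatline_samples: int = 3) -> bool:
    if not expect_activity or len(recent_change_rates) < flatline_samples:
        return False
    return all(rate == 0 for rate in recent_change_rates[-flatline_samples:])

# Example: the last three hourly samples were zero during expected activity.
print(is_stalled([42, 7, 0, 0, 0], expect_activity=True))  # True
```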
Is “Crawl rate” the number of tasks that read permissions/metadata rather than documents?
- Not exactly. For initial crawls, Crawl rate is the number of document parts discovered in the past hour. It’s a throughput metric of the initial sync, reported in “parts,” not a count of internal processing tasks, and it’s not limited to permissions/metadata-only operations.
- In the admin UI this shows up as a parts/processing rate during initial sync. Once a data source moves to ongoing crawls, this column switches to show Change rate instead.
What can administrators learn by looking at “Crawl rate”?
- A live initial‑sync heartbeat: If Crawl rate is non‑zero and changing, the initial crawl is progressing and discovering new parts; if it flatlines during initial sync, it suggests a stall that merits a health check.
- Speed and time‑to‑completion context: Because Crawl rate is “parts discovered per hour,” monitoring it alongside the initial crawl time estimate helps you gauge how quickly the initial sync will finish. The setup flow includes a “Get sync time estimate” option (see the sketch after this list).
- Interpreting vs. Items synced: Items synced is a cumulative, lagging count that updates after indexing completes. During initial sync, you may see Crawl rate activity without an immediate Items synced increase, especially when pipelines are still processing discovered parts.
- When the metric changes: Remember that Crawl rate is shown for initial syncs; once the source transitions to ongoing syncs, the table shows Change rate as the ongoing “heartbeat.”
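Putting those points together (a sketch; the expected-parts total would have to come from the source system and is an assumption here), the observed Crawl rate gives a rough remaining-time estimate during initial sync:

```python
# Hypothetical ETA sketch: estimate remaining initial-sync time from the
# observed Crawl rate (parts discovered per hour). All inputs are assumptions.

def hours_remaining(total_parts_expected: int,
                    parts_discovered: int,
                    crawl_rate_parts_per_hour: float) -> float:
    if crawl_rate_parts_per_hour <= 0:
        raise ValueError("Crawl rate is 0: the initial sync may be stalled")
    remaining = max(total_parts_expected - parts_discovered, 0)
    return remaining / crawl_rate_parts_per_hour

print(f"{hours_remaining(1_000_000, 250_000, 50_000):.1f} hours remaining")  # 15.0
```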