
About Data Sources and Connectors
You will see frequent references to the terms “data source” and “connector” in the Glean documentation.
Data Sources
Data Sources are the platforms, services, or cloud apps where your data resides. These could be:
Category | Example Apps |
---|---|
Cloud Storage | Box, OneDrive |
Email | Outlook, Gmail |
Communication | Slack, Teams |
Documentation | Confluence, Docusign |
Ticketing & Support | Jira, Zendesk |
Code & Engineering | GitHub, Bitbucket |
HR | Workday, Lattice |
Sales & Marketing | Salesforce, Marketo |
Project Management | Asana, Monday |
…and more! |
Connectors
Connectors are the integrations that Glean uses to connect to your data sources and crawl data from them. Glean currently has more than 100 prebuilt connectors for the data sources in use at your company. Connectors typically pull data from your data sources securely over an API, but may also receive data from your data sources via a webhook.
Select a Data Source to Connect
Navigate to Admin Console > Data sources, click the Add app button at the top-right, and select a connector to configure.

Follow the instructions carefully to configure the connector. The connector needs access to the following information for each item:
- The contents of the asset itself (e.g., spreadsheet, document, message, email, event)
- Access permissions for the item (i.e., which users have access to the item)
- Activities performed on the item (i.e., when the item was created, posted, modified, or viewed, and by which users)
You must apply any API access permissions in the setup documents exactly as referenced. Failure to set the correct API access permissions will cause your Glean crawl to fail or leave data missing from Glean.
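
As a rough illustration of the three kinds of information listed above, the sketch below models a crawled item as a small Python data structure. The names `CrawledItem`, `Activity`, and `has_access` are hypothetical and chosen for this example only; they do not describe Glean's internal data model or API.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Activity:
    """One action performed on an item (hypothetical shape)."""
    user: str      # who performed the action
    action: str    # e.g. "created", "modified", "viewed"
    at: datetime   # when it happened


@dataclass
class CrawledItem:
    """Illustrative record holding the three kinds of crawled information."""
    content: str                                               # contents of the asset itself
    allowed_users: set[str] = field(default_factory=set)       # access permissions
    activities: list[Activity] = field(default_factory=list)   # activity history

    def has_access(self, user: str) -> bool:
        # True if the user appears in the item's access list.
        return user in self.allowed_users
```

If the connector cannot read one of these three pieces (for example, because an API permission was not granted), the corresponding field would be empty or wrong, which is one way to picture why missing permissions lead to missing data.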
(Optional) Apply Crawling Restrictions
If you would like to restrict the content that Glean crawls, DO NOT start crawling after saving the connector configuration. Crawling restrictions can be applied from Admin Console > Data sources once the initial configuration for the data source has been saved. The restrictions that are supported vary between apps, but most data sources support at least two of the following restrictions:
- Time-based restrictions (e.g., only crawl content created or accessed in the last 6 months)
- User-based restrictions (e.g., only crawl content from the specified users)
- Group-based restrictions (e.g., only crawl content from the specified AD group)
- Site/channel-based restrictions (e.g., only crawl content from the specified site or channel)
- Folder-based restrictions (e.g., only crawl content from within the specified folders)
Not all crawling restrictions are available in the UI: some can only be applied by Glean. Contact your Glean account team or Glean support for additional information.
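
To make the effect of combining restrictions concrete, here is a minimal Python sketch of a filter that applies time-based, user-based, and folder-based restrictions to a candidate item. It is a conceptual model only: the `Item` and `Restrictions` classes and the `should_crawl` function are hypothetical, and in practice restrictions are set in the Admin Console (or by Glean), not in code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Item:
    owner: str
    folder: str               # path-like location, e.g. "/eng/specs"
    last_accessed: datetime


@dataclass
class Restrictions:
    max_age: Optional[timedelta] = None   # time-based; None means no limit
    users: Optional[set] = None           # user-based; None means all users
    folders: Optional[tuple] = None       # folder-based; None means all folders


def should_crawl(item: Item, r: Restrictions, now: datetime) -> bool:
    """Return True only if the item passes every restriction that is set."""
    if r.max_age is not None and now - item.last_accessed > r.max_age:
        return False
    if r.users is not None and item.owner not in r.users:
        return False
    if r.folders is not None:
        # Match the folder itself or anything nested under it.
        in_scope = any(
            item.folder == f or item.folder.startswith(f.rstrip("/") + "/")
            for f in r.folders
        )
        if not in_scope:
            return False
    return True
```

For example, with `Restrictions(max_age=timedelta(days=180), folders=("/eng",))`, an item in `/eng/specs` last accessed a month ago passes, while an item in `/hr` or one untouched for a year does not. This sketch treats each restriction as an additional filter, so setting more of them narrows the crawl.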
Start Crawling
Once you have connected your data source, you can initiate the crawl for it. Crawling is the process in which Glean sifts through the data in each of your connected apps and indexes it for search. To start the crawl, click on the Start crawl button after setting up the connector configuration. You can also start the crawl later by selecting the app under Admin Console > Data sources, and selecting Start crawl.
Crawling can be started once the connector configuration is saved
Checking the Crawl Status
You can check the status of your crawl at any time by going to Admin Console > Data sources and reviewing the table of configured apps. Here, you will see information about the progress of the crawl, including how many documents have been indexed and any errors that may have occurred.
Admin Console > Data sources will display a list of all connected apps and their crawl status
For crawls of large data sources, or data sources with low rate limits, it is normal for the document count to be low initially and then increase sharply over a few days. If the document count remains low after a few days, please check the permissions granted to the Glean connector and contact Glean support.
How long does a crawl take?
The initial crawl for any data source takes some time. The total time depends on two key factors:
- The size of the data source (e.g., the number of documents or messages, and the size of each).
- The rate limit of the data source’s API.
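
The two factors above can be combined into a back-of-the-envelope lower bound. The sketch below assumes the crawl is limited purely by the API rate limit and that each item costs a fixed number of API calls; both are simplifying assumptions, and the function name and parameters are hypothetical rather than anything Glean exposes.

```python
def estimate_crawl_hours(num_items: int,
                         calls_per_item: float,
                         calls_per_minute: float) -> float:
    """Rough lower bound on crawl time when the API rate limit is the bottleneck.

    Ignores retries, throttling back-off, download time for large files,
    and indexing time, so real crawls will usually take longer.
    """
    if calls_per_minute <= 0:
        raise ValueError("calls_per_minute must be positive")
    total_calls = num_items * calls_per_item
    minutes = total_calls / calls_per_minute
    return minutes / 60
```

For instance, 1,000,000 items at 2 API calls each against a limit of 600 calls per minute gives `estimate_crawl_hours(1_000_000, 2, 600)`, or roughly 55.6 hours. Halving the rate limit doubles the estimate, which is why data sources with low rate limits can take days to finish.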