Connect Data Sources
In this section, you will learn how to connect the data sources that Glean will crawl and index for search.
About Data Sources and Connectors
You will see frequent references to the terms “data source” and “connector” in the Glean documentation.
Data Sources
Data Sources are the platforms, services, or cloud apps where your data resides. These could be:
Category | Example Apps |
---|---|
Cloud Storage | Box, OneDrive |
Email | Outlook, Gmail |
Communication | Slack, Teams |
Documentation | Confluence, Docusign |
Ticketing & Support | Jira, Zendesk |
Code & Engineering | GitHub, Bitbucket |
HR | Workday, Lattice |
Sales & Marketing | Salesforce, Marketo |
Project Management | Asana, Monday |
…and more! |
Connectors
Connectors are the integrations that Glean uses to connect to your data sources and crawl data from them. Glean has 100+ prebuilt connectors that let you connect to the different data sources in use at your company.
Connectors typically pull data from your data sources securely over API, but may also receive data from your data sources via a webhook.
Select a Data Source to Connect
Navigate to Admin Console > Data sources and click the Add app button at the top-right.
Select the data source that you want to connect Glean to and follow the instructions that are presented on-screen.
Connector configuration is typically completed through OAuth, by installing Glean from your app's marketplace or store (e.g. Atlassian Marketplace, Box App Center), or both.
As part of the setup flow for each connector, your API credentials and permissions will be validated.
For each piece of data within a data source, Glean will crawl three things:
- The contents of the item itself (e.g. spreadsheet, document, message, email, event)
- Access permissions for the item (i.e. which users have access to the item)
- Activities performed on the item (i.e. when the item was created, posted, modified, or viewed, and by which users)
Glean asks only for the minimum permissions needed to perform the above. The exact permissions vary between data sources, depending on the capabilities of the API provided by the cloud service.
You must apply any API access permissions in the setup documents exactly as referenced.
Failure to set the correct API access permissions will cause your Glean crawl to fail or leave data missing from Glean.
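The three kinds of information crawled for each item (contents, access permissions, and activities) can be pictured as a single record. The following is a minimal sketch for illustration only: the class and field names (`CrawledItem`, `Activity`, `allowed_users`) are assumptions and do not reflect Glean's actual schema or API.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Activity:
    """One event on an item (hypothetical shape)."""
    user: str
    action: str          # e.g. "created", "modified", "viewed"
    timestamp: datetime


@dataclass
class CrawledItem:
    """Illustrative record for one crawled item; not Glean's real schema."""
    item_id: str
    content: str                                              # contents of the item
    allowed_users: set[str] = field(default_factory=set)      # access permissions
    activities: list[Activity] = field(default_factory=list)  # activity history

    def is_visible_to(self, user: str) -> bool:
        # True if the crawled permissions include this user.
        return user in self.allowed_users


# Example with made-up values.
doc = CrawledItem(
    item_id="doc-1",
    content="Q3 planning notes",
    allowed_users={"ana@example.com"},
)
doc.activities.append(Activity("ana@example.com", "created", datetime(2024, 1, 15)))
```

If the connector lacks the API permission needed to read any one of these three fields, the record would be incomplete, which is one way to picture why missing permissions lead to failed crawls or missing data.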
(Optional) Apply Crawling Restrictions
If you would like to restrict the content that Glean crawls, DO NOT start crawling immediately after saving the connector configuration. Apply the restrictions first.
Crawling restrictions can be applied from Admin Console > Data sources once the initial configuration for the datasource has been saved.
The restrictions that are supported vary between apps, but most data sources support at least two of the following restrictions:
- Time-based restrictions (e.g. only crawl content created or accessed in the last 6 months)
- User-based restrictions (e.g. only crawl content from the specified users)
- Group-based restrictions (e.g. only crawl content from the specified AD group)
- Site/channel-based restrictions (e.g. only crawl content from the specified site or channel)
- Folder-based restrictions (e.g. only crawl content from within the specified folders)
Most apps support both greenlisting (explicit inclusion) and redlisting (explicit exclusion).
Not all crawling restrictions are available in the UI: some can only be applied by Glean. Contact your Glean account team or Glean support for additional information.
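To make the restriction types above concrete, here is a small sketch of how a time-based limit, a user greenlist, and a folder redlist might combine into a single crawl/skip decision. Everything in it is hypothetical: `CrawlRestrictions`, `Item`, and their fields are illustrative names, not Glean's configuration format, and the way Glean actually evaluates restrictions may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Item:
    """Hypothetical crawl candidate; field names are illustrative."""
    owner: str
    folder: str
    last_accessed: datetime


@dataclass
class CrawlRestrictions:
    """Hypothetical restriction set; not Glean's configuration format."""
    max_age: Optional[timedelta] = None                     # time-based restriction
    include_users: set[str] = field(default_factory=set)    # greenlist; empty means no user restriction
    exclude_folders: set[str] = field(default_factory=set)  # redlist

    def allows(self, item: Item, now: datetime) -> bool:
        # Time-based: skip items not accessed within the allowed window.
        if self.max_age is not None and now - item.last_accessed > self.max_age:
            return False
        # Greenlist: when set, only the listed users' content is crawled.
        if self.include_users and item.owner not in self.include_users:
            return False
        # Redlist: skip anything in or under an excluded folder.
        for folder in self.exclude_folders:
            prefix = folder.rstrip("/") + "/"
            if item.folder == folder or item.folder.startswith(prefix):
                return False
        return True


now = datetime(2024, 6, 30)
rules = CrawlRestrictions(
    max_age=timedelta(days=180),
    include_users={"ana@example.com"},
    exclude_folders={"/finance/payroll"},
)
recent = Item("ana@example.com", "/eng/specs", datetime(2024, 5, 1))
stale = Item("ana@example.com", "/eng/specs", datetime(2023, 1, 1))
payroll = Item("ana@example.com", "/finance/payroll/2024", datetime(2024, 6, 1))
other_user = Item("bob@example.com", "/eng/specs", datetime(2024, 5, 1))
```

In this sketch, only `recent` passes all three checks. The other items are skipped for being too old, sitting under a redlisted folder, or belonging to a user outside the greenlist.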
Start Crawling
Once you have connected your data source, you can initiate the crawl for it.
Crawling is the process in which Glean sifts through the data in each of your connected apps and indexes it for search.
To start the crawl, click on the Start crawl button after setting up the connector configuration.
You can also start the crawl later by selecting the app under Admin Console > Data sources, and selecting Start crawl.
Checking the Crawl Status
You can check the status of your crawl at any time by going to Admin Console > Data sources and reviewing the table of configured apps.
Here, you will see information about the progress of the crawl, including how many documents have been indexed and any errors that may have occurred.
For crawls of large data sources, or data sources with low rate limits, it is normal for the document count to be low initially and then increase rapidly over a few days.
If the document count remains low after a few days, please check the permissions granted to the Glean connector and contact Glean support.
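A rough back-of-the-envelope estimate can help set expectations for how long a rate-limited crawl might take. The sketch below assumes a fixed API rate limit and a fixed number of documents returned per request. The function name and the numbers are hypothetical, and a real crawl also fetches permissions and activity data, so treat the result as an optimistic lower bound.

```python
def estimated_crawl_days(
    total_documents: int,
    requests_per_minute: int,
    documents_per_request: int = 1,
) -> float:
    """Estimate crawl duration in days under a constant API rate limit."""
    if requests_per_minute <= 0 or documents_per_request <= 0:
        raise ValueError("rate and batch size must be positive")
    # Documents that can be fetched in one day at the given rate.
    docs_per_day = requests_per_minute * documents_per_request * 60 * 24
    return total_documents / docs_per_day


# Hypothetical example: 500,000 documents at 100 requests per minute,
# one document per request.
days = estimated_crawl_days(500_000, 100)
print(round(days, 1))  # prints 3.5
```

If the observed crawl is running far behind an estimate like this, that may be a signal to check the connector's permissions as described above.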