Note: The instructions below will work only for on-prem instances that the Glean Crawler running on GCP can access. Please reach out to Glean Support for any network configuration required.
Glean requires authentication to the GitHub instance in order to fetch relevant information.
Authentication is done by creating an application in GitHub.
Glean understands all user access permissions and strictly enforces them at query time, so users never see results that they do not have access to.
All data is stored in the GCP project in the customer’s cloud account, and no data leaves the customer’s environment.
Glean uses the standard API to ingest all data.
In order to capture changes as quickly as possible, Glean will deploy a webhook which will send push notifications to an endpoint deployed in the GCP project (in your cloud infrastructure).
The app requests access to the following with a read-only scope:
Copy the following values into the corresponding fields in Glean:
App ID
Client ID
Client Secret
At the very bottom of the page, click “Generate a private key”. The key will be downloaded to your local machine. Upload this file into the corresponding field in Glean.
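Before uploading, it can help to confirm that the downloaded file really is a PEM private key and not, for example, a certificate or a truncated download. The following is a minimal sketch of such a check, not part of Glean or GitHub tooling. The function name looks_like_private_key is hypothetical, and the check assumes that the key is either PKCS#1 (the format GitHub documents for app private keys) or PKCS#8. It inspects only the PEM header and footer and does not validate the key material itself.

```python
# Hypothetical pre-upload sanity check for a downloaded private key file.
# Assumption: the key is PEM-encoded as PKCS#1 ("RSA PRIVATE KEY") or
# PKCS#8 ("PRIVATE KEY"). Only the framing lines are checked.
PEM_HEADERS = (
    "-----BEGIN RSA PRIVATE KEY-----",
    "-----BEGIN PRIVATE KEY-----",
)


def looks_like_private_key(pem_text: str) -> bool:
    """Return True if the text has a matching private-key PEM header and footer."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    # A usable PEM file has a header, at least one body line, and a footer.
    if len(lines) < 3:
        return False
    first, last = lines[0], lines[-1]
    if first not in PEM_HEADERS:
        return False
    # The footer must mirror the header, with END in place of BEGIN.
    return last == first.replace("BEGIN", "END", 1)
```

To use it, read the downloaded file as text and pass the contents to the function; a result of False suggests downloading the key again before uploading it to Glean.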
Repositories can be excluded (redlisted), and you can control which file extensions have their full content indexed.
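To make the effect of these two settings concrete, the sketch below models them as a repository exclusion list and a set of file extensions eligible for full-content indexing. It is an illustration only and does not reflect Glean's actual implementation or configuration format. The class IndexingRules, its fields, and the repository names in the usage note are all hypothetical, and the case-insensitive matching is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class IndexingRules:
    """Hypothetical model of repository exclusions and extension controls."""

    # Repositories to skip entirely, written as "owner/name".
    excluded_repos: set[str] = field(default_factory=set)
    # File extensions (with leading dot) whose full content is indexed.
    full_content_extensions: set[str] = field(default_factory=set)

    def should_crawl(self, repo: str) -> bool:
        # Assumption: repository names are compared case-insensitively.
        excluded = {name.lower() for name in self.excluded_repos}
        return repo.lower() not in excluded

    def index_full_content(self, repo: str, path: str) -> bool:
        # Files in excluded repositories are never indexed.
        if not self.should_crawl(repo):
            return False
        extension = PurePosixPath(path).suffix.lower()
        return extension in self.full_content_extensions
```

For example, with excluded_repos={"acme/secrets"} and full_content_extensions={".md", ".py"}, a file docs/README.md in acme/website would have its full content indexed, while any file in acme/secrets would be skipped.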
Users will be prompted to authenticate with GitHub via OAuth so that Glean can sync their user aliases. They will not be able to see data in private repositories until they complete the auth flow. Once authentication is complete, the next entity crawl syncs the aliases. This crawl runs every hour.
For any questions or issues with this setup, please reach out to support@glean.com.