Features & limitations
Supported objects
The Databricks connector indexes the following entities:
- Dashboards: All dashboards, including content and permissions.
- Workspaces: Workspace metadata (for Azure, only the workspaces whose URLs you provide during setup).
- Users and Groups: All users, account groups, and system groups.
Key features
- Comprehensive Indexing: The connector crawls dashboard content and metadata, including creation/update timestamps, title, and datasets.
- Permission Enforcement: It strictly enforces Databricks permissions, including explicit and inherited access, at query time.
- Unified Identity: It fetches user and group identities and memberships to accurately map permissions in Glean.
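The interplay of the last two features can be illustrated with a toy sketch. This is not Glean's actual implementation; it simply shows how query-time enforcement can combine a dashboard's explicit permissions with inherited workspace permissions and the user's group memberships (all names below are hypothetical):

```python
# Toy sketch (not the connector's real code) of query-time permission
# enforcement: a dashboard is visible if the user, or any group the user
# belongs to, appears in the dashboard's effective ACL -- its explicit
# entries plus entries inherited from the workspace.

def effective_acl(dashboard_acl: set[str], workspace_acl: set[str]) -> set[str]:
    """Combine explicit dashboard permissions with inherited workspace ones."""
    return dashboard_acl | workspace_acl

def can_view(user: str, user_groups: set[str],
             dashboard_acl: set[str], workspace_acl: set[str]) -> bool:
    """True if the user or one of their groups is in the effective ACL."""
    acl = effective_acl(dashboard_acl, workspace_acl)
    return user in acl or bool(user_groups & acl)
```

This is why the identity sync matters: without accurate group memberships, the `user_groups & acl` check cannot be evaluated correctly.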
Limitations
- Crawling: The connector relies on full content crawls; it does not support incremental crawls or webhooks, so deletions are detected only during full crawls.
- API Scope: The connector’s indexing is limited to static objects and does not support audit activity or user actions due to API limitations. Some fields, such as dashboard author/updater, are not available for indexing.
- No Granular Permissions: Only admin-level assignment and access are supported for crawling.
- Azure Configuration: Azure Databricks does not support the “list all workspaces” API; you must provide explicit workspace URLs during setup.
Requirements
To set up the Databricks connector, you must meet the following requirements:
- Technical: Your Databricks account must have AI/BI Dashboards enabled. A Databricks Premium Plan is required for API access.
- Administrative privileges: The user setting up the connector must have the necessary privileges to perform a complete crawl of your Databricks environment. These privileges apply to both the admin account and the Service Principal used for the setup. To ensure a complete and accurate crawl, these two roles are required:
- The account admin role is needed for a global view of all users and groups, which is essential for mapping permissions on dashboards shared across workspaces.
- The workspace admin role grants access to the dashboard data and permissions within each workspace.
- For more information, see Databricks administration introduction.
- Credentials: The connector uses an M2M OAuth flow. You must provide the following credentials from your Service Principal:
- Account URL
- OAuth Client ID
- OAuth Client Secret
- (For Azure Only): All workspace URLs you wish to index.
How to set up the connector
The setup process involves two main phases: preparing your Databricks environment to get the credentials, and then configuring the connector in Glean.
Step 1: Prepare Databricks
This step requires you to obtain the three key credentials needed for the Glean connector: the Account URL, the OAuth Client ID, and the OAuth Secret.
- Determine your account URL
- Sign in to Databricks as an Account Admin by visiting the account console for your cloud provider:
- AWS: https://accounts.cloud.databricks.com
- Azure: https://accounts.azuredatabricks.net
- GCP: https://accounts.gcp.databricks.com
- After the page loads, copy the entire URL from your browser’s address bar. This is your unique Account URL.
- Generate a service principal
- In Databricks, navigate to the User management page and click on the Service principals tab.
- Click Add service principal, enter Glean as the name, and press Add.
- Go to the new service principal’s page and verify that the Account Admin role is toggled on under the Roles tab.
- Generate OAuth credentials
- While on your service principal’s page, navigate to the Credentials & secrets tab.
- Under OAuth secrets, click Generate secret.
- Set the lifetime (720 days is recommended) and click Generate.
- The generated secret value is your OAuth Secret. Copy this value immediately as it will not be displayed again.
- The Client ID is the value in the Client ID field. Copy this value as well.
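Before entering these credentials in Glean, you can sanity-check them yourself. The sketch below builds a client-credentials token request against the Databricks account-level OAuth endpoint; the endpoint path and `all-apis` scope follow Databricks' documented M2M flow, but verify them against your own account, and note that `account_id` (the UUID shown in your account console) is an input this sketch assumes you have on hand:

```python
import base64
import urllib.parse

def build_token_request(account_url: str, account_id: str,
                        client_id: str, client_secret: str):
    """Build the URL, headers, and form body for an M2M client-credentials grant."""
    token_url = f"{account_url.rstrip('/')}/oidc/accounts/{account_id}/v1/token"
    # The client ID and secret are sent as HTTP Basic auth.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    return token_url, headers, body

# Usage: POST the body to the URL with any HTTP client. A 200 response
# containing an access_token confirms the Service Principal credentials work.
```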
Step 2: Configure in Glean
- In the Glean Admin Console, navigate to Data Sources, select Add Data Source, and select Databricks.
- In the Databricks connector setup page, enter the three credentials you just obtained from Databricks into their corresponding fields:
- Account URL
- OAuth Client ID
- OAuth Secret
Step 3: Add Azure workspace URLs (For Azure customers only)
This step is only required if you are using Azure Databricks, as it does not support the “list all workspaces” API.
- In Databricks, go to the Workspaces tab.
- For each workspace you wish to index, copy the Workspace URL and enter it into the Glean setup.
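A quick format check on the collected URLs can catch copy/paste mistakes before you save. The sketch below assumes the standard `adb-<workspace-id>.<n>.azuredatabricks.net` host form; if your tenant uses custom domains, the pattern would need adjusting:

```python
import re

# Standard Azure Databricks workspace URL shape (assumption: no custom domains).
AZURE_WORKSPACE_RE = re.compile(r"^https://adb-\d+\.\d+\.azuredatabricks\.net/?$")

def invalid_workspace_urls(urls: list[str]) -> list[str]:
    """Return the URLs that do not look like Azure Databricks workspace URLs."""
    return [u for u in urls if not AZURE_WORKSPACE_RE.match(u)]
```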
Step 4: Save and validate
Click Save in the Glean Admin Console to complete the setup. The connector will automatically perform an initial crawl and validate the permissions.
FAQs
Why are account_admin and workspace_admin required?
Both roles are required to ensure the connector performs a complete and accurate crawl of your Databricks environment.
- The connector needs Account Admin privileges to list all users and groups at the account level. This is essential because dashboards can have permissions assigned to users outside of their specific workspace. Without this global view, the connector cannot fully enforce permissions.
- The Workspace Admin role is required to access the dashboard data and its permissions within each workspace.
Why is my new workspace/dashboard missing?
If the workspace was added after the admin assignment task last ran, it may be missing. It will be picked up in the next full crawl.
Do I need to configure anything special for Azure?
Yes, since Azure does not have a list-workspaces API, you must provide an array of workspace URLs.
Can I crawl Databricks on GCP/AWS/Azure?
Yes, the connector is supported on all three cloud providers. Only Azure requires special configuration and may have partial API support. For any questions or issues with this setup, please reach out to support@glean.com.