This article provides comprehensive documentation for the Glean Databricks connector, which allows you to index and search Databricks AI/BI Dashboards (formerly Lakeview Dashboards) and associated metadata. This integration brings your Databricks knowledge into Glean’s unified enterprise search, automatically enforcing permissions to ensure users only see authorized content. The connector is designed for Databricks deployments on AWS, GCP, or Azure.

Features & limitations

Supported objects

The Databricks connector indexes the following entities:
  • Dashboards: All dashboards, including content and permissions.
  • Workspaces: Workspace metadata (except for Azure).
  • Users and Groups: All users, account groups, and system groups.

Key features

  • Comprehensive Indexing: The connector crawls dashboard content and metadata, including creation/update timestamps, title, and datasets.
  • Permission Enforcement: It strictly enforces Databricks permissions, including explicit and inherited access, at query time.
  • Unified Identity: It fetches user and group identities and memberships to accurately map permissions in Glean.

Limitations

  • Crawling: The connector relies on full content crawls; it does not support incremental crawls or webhooks, so deletions are detected only during full crawls.
  • API Scope: The connector’s indexing is limited to static objects and does not support audit activity or user actions due to API limitations. Some fields, such as dashboard author/updater, are not available for indexing.
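  • No Granular Permissions: The service principal used for crawling must hold an admin role (Account Admin or Workspace Admin). Databricks does not currently offer a narrower permission set that is sufficient for crawling.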
  • Azure Configuration: Azure Databricks does not support the “list all workspaces” API; you must provide explicit workspace URLs during setup.
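Because deletions surface only when a full crawl completes, one way to picture the behavior is as a comparison of the object IDs seen in two consecutive full crawls. The sketch below is an illustration of that idea only, not Glean's implementation; the function name and the dashboard IDs are hypothetical.

```python
def detect_deletions(previous_ids: set[str], current_ids: set[str]) -> set[str]:
    """Return IDs seen in the previous full crawl but absent from the current one."""
    return previous_ids - current_ids


# Hypothetical dashboard IDs from two consecutive full crawls.
previous_crawl = {"dash-1", "dash-2", "dash-3"}
current_crawl = {"dash-1", "dash-3", "dash-4"}

deleted = detect_deletions(previous_crawl, current_crawl)
added = current_crawl - previous_crawl
print("deleted:", sorted(deleted))
print("added:", sorted(added))
```

A dashboard deleted in Databricks between two crawls stays searchable until the second crawl finishes, which is the practical consequence of having no incremental crawls or webhooks.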

Requirements

To set up the Databricks connector, you must meet the following requirements:

Technical requirements

Your Databricks account must have AI/BI Dashboards enabled. A Databricks Premium Plan is required for API access.

Administrative requirements

The requirements for the Databricks connector depend on the configuration mode you select.
  • For Account-level configuration (Recommended):
    • User: The person performing the setup must be a Databricks Account Admin or have credentials for an existing service principal that has the Account Admin role. This is required to authorize the connector and link the service principal at the account level.
    • Service principal: Must be assigned the Account Admin role.
  • For Workspace-level configuration (Non-Account Admin):
    • User: Does not require the Account Admin role, but must be able to grant the service principal Workspace Admin access within each workspace being indexed and provide the required workspace details during setup.
    • Service principal: Must be a Workspace Admin on every workspace you intend to index.

Configuration patterns

The choice of configuration pattern significantly impacts how Glean resolves permissions and surfaces dashboards to your users.
  • Account-level (Recommended): In this mode, the service principal provides a unified view of all users and groups across the entire Databricks account. This is essential for accurately resolving permissions when dashboards are shared with groups that span multiple workspaces.
  • Workspace-level (Non-Account Admin): Use this mode only if your security policy prohibits granting the Account Admin role to a service principal. For details, see Permission limitations in Non-Account Admin mode.

Permission limitations in Non-Account Admin mode

In this mode, the connector can only see groups that are members of the specific workspaces provided for indexing. If a dashboard is shared with a group that is not a member of the crawled workspaces, Glean cannot verify that group’s membership. Consequently, permissions will not be applied, and users in that group will be unable to find those dashboards in Glean, even if they have access in Databricks. For more information, see Databricks administration introduction.
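The effect can be sketched with a small model of group expansion. This is a simplified illustration under the assumption that permissions are resolved by expanding each group into its members; the function, group names, and user names are hypothetical and do not describe Glean's internal logic.

```python
def visible_users(shared_with: list[str], known_groups: dict[str, set[str]]) -> set[str]:
    """Expand the groups a dashboard is shared with into individual users.

    Groups missing from known_groups cannot be expanded, so their members
    are left out of the result.
    """
    users: set[str] = set()
    for group in shared_with:
        users |= known_groups.get(group, set())
    return users


# Groups the connector can see in the crawled workspaces (hypothetical).
known_groups = {"analysts": {"ana", "ben"}}

# The dashboard is also shared with "finance", which is not a member of
# any crawled workspace, so its membership is unknown to the connector.
print(sorted(visible_users(["analysts", "finance"], known_groups)))
```

In this model, members of "finance" never appear in the result, which mirrors why those users cannot find the dashboard in Glean even though Databricks grants them access.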

Credentials requirements

The connector uses a machine-to-machine (M2M) OAuth flow. You must provide the following credentials from your service principal:
  • Account URL
  • OAuth Client ID
  • OAuth Client Secret
  • (Azure only) The URL of every workspace you want to index.
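To show what an M2M OAuth exchange looks like with these credentials, here is a minimal sketch of a standard OAuth 2.0 client-credentials token request built with Python's standard library. It is not the connector's code. The token URL, client ID, and secret below are placeholders, and the `all-apis` scope and the account-level endpoint shape (`https://<account-console-host>/oidc/accounts/<account-id>/v1/token`) reflect Databricks documentation at the time of writing; verify both against your own deployment.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}  # scope is an assumption
    ).encode("ascii")
    # The client ID and secret are sent as HTTP Basic credentials.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


# Placeholder values; substitute your own account host, account ID, and credentials.
request = build_token_request(
    "https://accounts.example.com/oidc/accounts/ACCOUNT_ID/v1/token",
    "my-client-id",
    "my-client-secret",
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then read "access_token" from the JSON body.
```

Building the request without sending it makes it easy to check the URL and credentials before anything reaches Databricks.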

How to set up the connector

The setup process involves two main phases: preparing your Databricks environment to get the credentials, and then configuring the connector in Glean.

Step 1: Prepare Databricks

In this step, you obtain the three credentials required by the Glean connector: the Account URL, the OAuth Client ID, and the OAuth Client Secret (labeled OAuth Secret on the Glean setup page).
  1. Determine your account URL
  2. Create a service principal
    • In Databricks, navigate to the User management page and click the Service principals tab.
    • Click Add service principal, enter Glean as the name, and click Add.
    • Go to the new service principal’s page:
      • If you are using the account‑level configuration, ensure the Account Admin role is toggled on under the Roles tab.
      • If you plan to use non‑Account‑Admin mode, you can skip assigning the Account Admin role; instead, make sure the service principal is an admin on each workspace you want Glean to crawl.
  3. Generate OAuth credentials
    • While on your service principal’s page, navigate to the Credentials & secrets tab.
    • Under OAuth secrets, click Generate secret.
    • Set the lifetime (720 days is recommended) and click Generate.
    • The generated secret value is your OAuth Secret. Copy this value immediately as it will not be displayed again.
    • The Client ID is the value in the Client ID field. Copy this value as well.
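Because the OAuth secret has a fixed lifetime, it can help to record when it will expire so that a replacement is generated in time. The following is a small date-arithmetic sketch, assuming the recommended 720-day lifetime; the function names and dates are hypothetical and are not part of Databricks or Glean.

```python
from datetime import date, timedelta


def secret_expiry(created_on: date, lifetime_days: int = 720) -> date:
    """Return the date on which a secret created on created_on expires."""
    return created_on + timedelta(days=lifetime_days)


def days_until_expiry(created_on: date, today: date, lifetime_days: int = 720) -> int:
    """Return how many days remain before the secret expires (negative if expired)."""
    return (secret_expiry(created_on, lifetime_days) - today).days


created = date(2025, 1, 1)  # hypothetical creation date
print("expires on:", secret_expiry(created))
print("days left:", days_until_expiry(created, today=date(2026, 12, 1)))
```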

Step 2: Configure in Glean

  1. In the Glean Admin Console, navigate to Data Sources, select Add Data Source, and select Databricks.
  2. In the Databricks connector setup page, enter the three credentials you just obtained from Databricks into their corresponding fields:
    • Account URL
    • OAuth Client ID
    • OAuth Secret

Step 3: Non-Account Admin mode (workspace-level configuration only)

Follow these steps only if you are configuring the connector without the Account Admin role. You must manually grant the service principal Workspace Admin access to every workspace you intend to index.
  1. In the Databricks Account Console, navigate to the Workspaces tab.
  2. For each workspace to be indexed, copy the Workspace name, Workspace URL, and Workspace ID, then enter them into the Glean setup page.
  3. Click Add workspace to add each additional workspace you want to index.
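Before entering the details for many workspaces, it may be useful to sanity-check them. The sketch below models the three values from step 2 and flags obvious mistakes. The class name, the checks, and the sample values are assumptions made for illustration (for example, that workspace IDs are numeric); Glean's own validation may differ.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class WorkspaceEntry:
    """One workspace as entered in the setup page: name, URL, and ID."""

    name: str
    url: str
    workspace_id: str

    def problems(self) -> list[str]:
        """Return a list of human-readable issues; empty if the entry looks sane."""
        issues = []
        if not self.name.strip():
            issues.append("workspace name is empty")
        parsed = urlparse(self.url)
        if parsed.scheme != "https" or not parsed.netloc:
            issues.append("workspace URL should be an absolute https URL")
        if not self.workspace_id.isdigit():  # assumption: IDs are numeric
            issues.append("workspace ID should be numeric")
        return issues


# Hypothetical entries: the second one has a bad URL and a bad ID.
entries = [
    WorkspaceEntry("Analytics", "https://adb-1234567890123456.7.azuredatabricks.net", "1234567890123456"),
    WorkspaceEntry("Sandbox", "sandbox.example.com", "abc"),
]
for entry in entries:
    print(entry.name, entry.problems())
```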

Step 4: Save and validate

Click Save in the Glean Admin Console to complete the setup. The connector will automatically perform an initial crawl and validate the permissions.

FAQs

Why are account_admin and workspace_admin required?

In the recommended account-level configuration, the connector relies on both roles to perform a complete and accurate crawl of your Databricks environment.
  • The connector needs Account Admin privileges to list all users and groups at the account level. This is essential because dashboards can have permissions assigned to users outside of their specific workspace. Without this global view, the connector cannot fully enforce permissions.
  • The Workspace Admin role is required to access the dashboard data and its permissions within each workspace.
No workaround is currently available; the requirement will remain until Databricks supports more granular permissions.
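The split between the two roles can be pictured as two families of REST paths: account-scoped paths for identities and workspace-scoped paths for dashboards. This page does not list the calls the connector actually makes, so the sketch below is an assumption. The paths follow the public Databricks REST API as documented at the time of writing and should be verified; the hosts, account ID, and dashboard ID are placeholders.

```python
# Account-scoped paths (Account Admin): identities and the workspace list.
ACCOUNT_PATHS = {
    "users": "/api/2.0/accounts/{account_id}/scim/v2/Users",
    "groups": "/api/2.0/accounts/{account_id}/scim/v2/Groups",
    "workspaces": "/api/2.0/accounts/{account_id}/workspaces",
}

# Workspace-scoped paths (Workspace Admin): dashboards and their permissions.
WORKSPACE_PATHS = {
    "dashboards": "/api/2.0/lakeview/dashboards",
    "dashboard_permissions": "/api/2.0/permissions/dashboards/{dashboard_id}",
}


def account_url(account_host: str, account_id: str, resource: str) -> str:
    """Build the full URL for an account-scoped resource."""
    return account_host.rstrip("/") + ACCOUNT_PATHS[resource].format(account_id=account_id)


def workspace_url(workspace_host: str, resource: str, **params: str) -> str:
    """Build the full URL for a workspace-scoped resource."""
    return workspace_host.rstrip("/") + WORKSPACE_PATHS[resource].format(**params)


print(account_url("https://accounts.example.com", "ACCOUNT_ID", "groups"))
print(workspace_url("https://workspace.example.com", "dashboard_permissions", dashboard_id="DASHBOARD_ID"))
```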

Why is my new workspace/dashboard missing?

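A workspace added after the admin assignment task last ran may not be visible to the connector yet. Because the connector relies on full crawls, new workspaces and dashboards appear in Glean only after the next full crawl completes.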

Do I need to configure anything special for Azure?

Yes. Because Azure Databricks does not support the “list all workspaces” API, you must provide the URL of every workspace you want to index during setup.

Can I crawl Databricks on GCP/AWS/Azure?

Yes. The connector supports Databricks on AWS, GCP, and Azure. Only Azure requires additional configuration, and some APIs may be only partially supported there.

For any questions or issues with this setup, please file a support ticket at https://support.glean.com/hc/en-us.