SharePoint & OneDrive Connector Overview

The Glean SharePoint and OneDrive connectors enable secure and efficient data fetching from the Microsoft 365 (M365) SharePoint and OneDrive platforms. User permissions are strictly enforced, and all data remains securely within your Glean environment.

  • Glean requires authentication to the M365 instance to fetch relevant information.
  • Authentication is set up by creating an App Registration for each deployment.
  • Glean reads all user access permissions and enforces them at query time, so users never see results they do not have access to.
  • Quicklinks let users create Word, Excel, and PowerPoint documents in OneDrive directly.
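
Query-time permission enforcement, as described in the list above, can be pictured as a filter applied to candidate results before they are returned. The following is only an illustrative sketch, not Glean's implementation: the `Document` class, the `allowed_principals` field, and the `visible_results` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical indexed item with the principals allowed to read it."""
    title: str
    allowed_principals: frozenset  # user IDs and group IDs (assumed format)


def visible_results(results, user_id, group_ids):
    """Keep only documents the querying user may read.

    A document is visible if the user, or any group the user belongs to,
    appears in the document's allowed principals.
    """
    principals = {user_id, *group_ids}
    return [doc for doc in results if doc.allowed_principals & principals]
```

Because the check runs when the query is issued, a permission change that has already been ingested takes effect on the next search without re-indexing the document content.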

Integration Features

OneDrive

For OneDrive, Glean indexes the following content:

  • Folders
  • Documents (all document types, e.g. Word, Excel, PowerPoint)
  • OneNote (limited support: notebooks and sections are indexed)

SharePoint

For SharePoint, Glean indexes the following content:

  • Site Pages (web part or wiki page libraries)
  • Site Drives (document libraries)
  • Other Site Lists (Basic List and Calendar List items); optional, not enabled by default

Objects Supported

  • Folders: Captured and indexed in both OneDrive and SharePoint
  • Documents: Various types stored in OneDrive and SharePoint
  • Native File Types: Microsoft Office formats, including Word, Excel, and PowerPoint
  • Personal and Shared Drives: Content is indexed from both

API Usage & Permissions

Glean uses the standard Microsoft Graph API v1.0 and the SharePoint REST API to ingest data. We use application permissions with admin-granted access.
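
Application permissions are typically exercised through the OAuth 2.0 client credentials flow against the Microsoft identity platform. The sketch below only builds the token request and does not send it. It assumes a client secret is used (a certificate is equally possible, and the source does not say which Glean uses); `build_token_request` and its parameters are hypothetical names for illustration.

```python
from urllib.parse import urlencode


def build_token_request(tenant_id, client_id, client_secret):
    """Return (url, form_body) for a client credentials token request.

    The ".default" scope asks for every application permission that an
    administrator has already consented to on the App Registration.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,  # assumption: secret-based auth
        "scope": "https://graph.microsoft.com/.default",
    })
    return url, body
```

The body would be sent as a POST with content type `application/x-www-form-urlencoded`; the returned access token is then passed as a bearer token on Graph requests.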

Glean follows Microsoft's recommended best practices for both crawling documents and recording incremental changes.
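
The source does not name the specific strategy. One commonly recommended approach for tracking file changes in Microsoft Graph is the delta query, for example `GET https://graph.microsoft.com/v1.0/drives/{drive-id}/root/delta`. Assuming that approach, a minimal sketch of the paging loop looks like this. `collect_changes` and `fetch_page` are hypothetical names; `fetch_page` stands in for an authenticated HTTP GET that returns parsed JSON.

```python
def collect_changes(fetch_page, start_url):
    """Walk a Graph delta response page by page.

    fetch_page(url) must return the parsed JSON body as a dict.
    Returns (changed_items, delta_link). The delta link is stored and
    used as start_url on the next run to receive only newer changes.
    """
    changes = []
    url = start_url
    while True:
        page = fetch_page(url)
        changes.extend(page.get("value", []))
        if "@odata.nextLink" in page:
            # More pages remain in this round of changes.
            url = page["@odata.nextLink"]
        else:
            # Final page: it carries the link for the next incremental run.
            return changes, page.get("@odata.deltaLink")
```

The first call performs a full enumeration; every later call that starts from the saved delta link returns only items that changed since the previous run.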

Setup Prerequisites

A tenant administrator (with global admin privileges for both the Azure/Entra ID and SharePoint admin portals) is required to set up several dedicated service applications and grant them the required privileges.

Sites.Selected

The Sites.Selected permission can be used instead of Sites.FullControl.All, but there are significant trade-offs:

  • Each individual site to be crawled must be explicitly set in the Glean admin UI.
  • Changes to content and content permissions are only picked up every 24 hours.
  • Ranking for search and generative AI results is heavily degraded because Sites.Selected prevents the crawler from accessing activity and content metadata.
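
With Sites.Selected, an application has no access to any site until an administrator grants it access site by site, which Microsoft Graph exposes through the site permissions endpoint. The sketch below builds such a grant request without sending it. `build_site_grant` is a hypothetical helper, and the `read` role is an assumption for illustration; the source does not state which role Glean needs.

```python
def build_site_grant(site_id, app_id, app_name, role="read"):
    """Return (url, json_body) for granting one app access to one site.

    The request would be sent as a POST to Microsoft Graph by an
    identity that is allowed to manage permissions on the site.
    """
    url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/permissions"
    body = {
        "roles": [role],  # assumption: "read" is used only as an example
        "grantedToIdentities": [
            {"application": {"id": app_id, "displayName": app_name}}
        ],
    }
    return url, body
```

One such grant is needed for every site, which is why each site must also be listed explicitly in the Glean admin UI.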

Files.ReadWrite.All

Glean subscribes to webhook events for all files in the tenant. This allows Glean to react to changes in content and permissions in near real time.

The minimum permission available for webhooks to be both set up and re-authenticated is Files.ReadWrite.All.
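
Webhooks in Microsoft Graph are created by posting a subscription to `https://graph.microsoft.com/v1.0/subscriptions`. The sketch below builds a subscription body for a single drive and does not send it. Subscribing per drive, the one-day lifetime, and the `build_subscription` helper are assumptions for illustration; the source does not describe how Glean structures its subscriptions.

```python
from datetime import datetime, timedelta, timezone


def build_subscription(drive_id, notification_url, client_state,
                       now=None, lifetime=timedelta(days=1)):
    """Return the JSON body for a drive change-notification subscription.

    `now` must be timezone-aware if given. Subscriptions expire, so a
    real integration has to renew them before expirationDateTime.
    """
    now = now or datetime.now(timezone.utc)
    expires = (now + lifetime).astimezone(timezone.utc)
    return {
        "changeType": "updated",
        "notificationUrl": notification_url,  # HTTPS endpoint receiving events
        "resource": f"/drives/{drive_id}/root",
        "expirationDateTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "clientState": client_state,  # echoed back so the receiver can verify
    }
```

A notification only signals that something under the resource changed; the receiver then fetches the actual changes, which is where an incremental query comes in.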

SharePoint REST API Permissions

To read data from the SharePoint REST API and crawl site collections, site content, and content permissions, the FullControl permission must be granted. Microsoft does not provide granular controls or a dedicated read scope for these endpoints in the SharePoint REST API.

Versions Supported

There are no specific version limitations for the SharePoint & OneDrive connector.

License Tier(s) Required

There are no specific license tier requirements for the SharePoint & OneDrive connector.
