Overview

Glean leverages both the Microsoft Graph API and the SharePoint REST API to fetch content from your company’s SharePoint instance.

The Microsoft Graph API and the SharePoint REST API have separate permission sets that must be managed independently. For each API, this document covers:

  • Each permission that Glean requests, why it is required, and how it is used.
  • Typical objections encountered when requesting the specified permission.
  • Alternatives for permissions that are not permitted as part of your organization’s Standard Operating Procedure (SOP).

Graph API Permissions

Overview

Glean adheres to the principle of least privilege and requests only the minimum permission scopes required to read data from the Graph API. The following permission scopes are required when integrating with SharePoint:

Permission | Usage
User.Read.All | Used to list all the users within the directory. Used for mapping content permissions.
GroupMember.Read.All | Used to expand the members of a group. Used for mapping content permissions.
Member.Read.Hidden | Allows the app to read the memberships of any groups that are marked as “hidden” in Entra ID. Used for mapping content permissions.
Sites.Read.All | Retrieve sites, site content, and site metadata in order to map and index content. This is covered by Sites.FullControl.All, so is not explicitly needed.
Sites.FullControl.All | Retrieve details regarding permission update events (i.e. when site/content permissions change).
Files.Read.All | Retrieve items, metadata, and associated content for an item so it can be indexed. It is also used to create webhook subscriptions so that Glean can be alerted to and process changes as they occur. This is covered by Files.ReadWrite.All, so is not explicitly needed.
Files.ReadWrite.All | Webhooks expire periodically and require re-authorization. This permission is required by Microsoft to reauthorize a webhook subscription.
Reports.Read.All | Used to report on crawler progress and to ensure your infrastructure is scaled correctly based on the volume of content expected.

All permissions must be granted as Application permissions. Delegated permissions cannot be used.
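
As a quick way to sanity-check an App Registration against the table above, the following is a minimal sketch that reports which required scopes are still missing from a set of granted Application permissions. The scope names and the "covered by" relationships come from the table; the function name and the idea of passing the granted scopes in as a plain collection are assumptions made for illustration, not part of any Glean or Microsoft tooling.

```python
# Scopes listed in the Graph API permissions table.
REQUIRED_SCOPES = {
    "User.Read.All",
    "GroupMember.Read.All",
    "Member.Read.Hidden",
    "Sites.Read.All",
    "Sites.FullControl.All",
    "Files.Read.All",
    "Files.ReadWrite.All",
    "Reports.Read.All",
}

# Narrower scope -> broader scope that covers it, per the table.
COVERED_BY = {
    "Sites.Read.All": "Sites.FullControl.All",
    "Files.Read.All": "Files.ReadWrite.All",
}


def missing_scopes(granted):
    """Return the required scopes that are neither granted nor covered.

    `granted` is any iterable of scope names (hypothetical input; how you
    obtain it from your tenant is outside this sketch).
    """
    granted = set(granted)
    missing = set()
    for scope in REQUIRED_SCOPES:
        if scope in granted:
            continue
        broader = COVERED_BY.get(scope)
        if broader is not None and broader in granted:
            continue  # satisfied by the broader scope
        missing.add(scope)
    return sorted(missing)
```

For example, granting everything except Sites.Read.All and Files.Read.All yields an empty result, because both are covered by their broader counterparts.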

Permissions Explained


SharePoint REST API Permissions

Overview

Glean adheres to the principle of least privilege and requests only the minimum permission scopes required to read data from the SharePoint REST API. The following permission scopes are required when integrating with SharePoint:

Permission | Usage
Sites.FullControl.All | Retrieve site metadata, SharePoint user groups and role assignments, and contents within web components (including classic components).

Permissions Explained

There are four (4) SharePoint REST API endpoints that Glean leverages:

The minimum permission required to read data from these endpoints is Sites.FullControl.All. Glean’s crawlers never write data through the API.

Should Microsoft adjust the minimum API scope required to read from these endpoints, Glean will update its guidance on requiring Sites.FullControl.All accordingly.


Constraining Access & Scope

To constrain the scope of data that the SharePoint crawler has access to (and the actions it can subsequently perform via the Graph and SharePoint REST APIs), there are multiple methods of control that can be leveraged.

These can be divided into two categories:

  • Controls configured on the Glean side.
  • Controls configured on the Microsoft side.

Crawling Restrictions

This control is configured on the Glean side.

For SharePoint and OneDrive, Glean can constrain the SharePoint crawler in your deployment to only target specific Sites or User drives as defined by you. Conversely, specific Sites or User drives can also be excluded from crawling.

This involves providing Glean with:

  • The SharePoint Site URL(s) to explicitly include (or exclude) when crawling; and/or
  • The Azure AD/Entra ID Group ID containing the users to explicitly include (or exclude) when crawling; and/or
  • The usernames of the users to explicitly include (or exclude) when crawling.

For more information, refer to Supported Crawling Restrictions for SharePoint.
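
To make the include/exclude idea concrete, here is a minimal sketch of how a site URL might be tested against inclusion and exclusion lists. This is not Glean's implementation: the function name, the prefix-matching rule, the choice that exclusions win over inclusions, and the contoso URLs are all assumptions for illustration. The actual matching behavior is described in the referenced documentation.

```python
def is_site_in_scope(site_url, include=None, exclude=None):
    """Decide whether a site URL would be crawled under simple rules.

    Assumed rules (hypothetical): matching is case-insensitive, a rule
    matches the site itself and anything beneath it, exclusions take
    priority, and an empty include list means "everything".
    """
    url = site_url.rstrip("/").lower()

    def matches(rules):
        for rule in rules:
            prefix = rule.rstrip("/").lower()
            if url == prefix or url.startswith(prefix + "/"):
                return True
        return False

    if exclude and matches(exclude):
        return False
    if include:
        return matches(include)
    return True


# Example: only the HR site is included, and one sub-site is carved out.
include = ["https://contoso.sharepoint.com/sites/HR"]
exclude = ["https://contoso.sharepoint.com/sites/HR/payroll"]
in_scope = is_site_in_scope(
    "https://contoso.sharepoint.com/sites/HR/benefits", include, exclude
)
```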

IP Restrictions (Conditional Access)

This control is configured on the Microsoft side.

This feature requires a license for Microsoft Entra Workload ID. Please contact your Microsoft representative for more information.

Using Entra ID Conditional Access, you can restrict usage of the App Registration (and associated secret) created for the Glean SharePoint crawler to only be permitted from the IP range of your Glean tenant.

This involves:

  1. Adding the static IP address of the NAT gateway used by the crawlers in your Glean tenant as a location within Entra ID.
  2. Adding a new Conditional Access policy targeting the specific workload identities that correspond to the Service Principal(s) created for the Glean SharePoint crawler.
  3. Excluding the location created in step 1 from the Conditional Access policy.
  4. Setting the policy action to block.

For more information, refer to Permission & Security Controls for SharePoint - IP Restrictions.
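
The effect of this policy (block everything except the excluded location) can be sketched with Python's standard ipaddress module. This only models the decision logic; it is not how Entra ID evaluates policies. The function name is hypothetical, and the addresses are documentation-reserved examples rather than real Glean NAT gateway addresses.

```python
import ipaddress


def should_block(source_ip, allowed_ranges):
    """Return True when a request from source_ip would be blocked.

    The policy blocks by default; only addresses inside one of the
    excluded (allowed) ranges are let through.
    """
    ip = ipaddress.ip_address(source_ip)
    for cidr in allowed_ranges:
        if ip in ipaddress.ip_network(cidr):
            return False
    return True


# Placeholder for the static NAT gateway address of a Glean tenant.
allowed = ["203.0.113.10/32"]
from_glean = should_block("203.0.113.10", allowed)
from_elsewhere = should_block("198.51.100.7", allowed)
```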

Monitoring Audit Logs

This control is configured on the Microsoft side.

Microsoft Purview can be used to monitor the Glean SharePoint crawler and alert on any attempts to misuse granted permissions to write data back to SharePoint.

For more information, refer to Permission & Security Controls for SharePoint - Monitoring via Purview.
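
As an illustration of the kind of check an alert might encode, the sketch below filters exported audit records for write-type operations performed by the crawler's application. The record shape is a simplified assumption: the "AppId" key is hypothetical, and the operation names are examples of SharePoint file activities that should be confirmed against the audit schema in your tenant.

```python
# Example write-type operations to alert on (confirm against your tenant).
WRITE_OPERATIONS = {
    "FileModified",
    "FileUploaded",
    "FileDeleted",
    "FileMoved",
    "FileRenamed",
}


def find_write_events(records, crawler_app_id):
    """Return audit records where the crawler app performed a write.

    `records` is a list of dicts with hypothetical keys "AppId" and
    "Operation"; real exports may name or nest these fields differently.
    """
    return [
        record
        for record in records
        if record.get("AppId") == crawler_app_id
        and record.get("Operation") in WRITE_OPERATIONS
    ]


sample = [
    {"AppId": "crawler-app", "Operation": "FileAccessed"},
    {"AppId": "crawler-app", "Operation": "FileModified"},
    {"AppId": "other-app", "Operation": "FileDeleted"},
]
suspicious = find_write_events(sample, "crawler-app")
```

Since the crawler is expected to read only, any record returned here would warrant investigation.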

Sites.Selected

This control is configured on the Microsoft side.

Sites.Selected comes with significant drawbacks that should be understood before you adopt it.

The Sites.Selected permission can be used to provide the required Graph and SharePoint REST API permissions at the site level instead of globally. However, using Sites.Selected has significant trade-offs, including:

  • Severely degrading the search result quality for SharePoint.
  • Limiting Glean’s ability to synchronize content updates (including permissions) to only once every 24 hours.
  • Requiring your company to manually authorize every site AND sub-site that you want to be available in Glean (Sites.Selected does not allow for automatic site and sub-site discovery).

Glean does not recommend this approach.
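
To give a sense of the manual overhead involved, the sketch below builds one Microsoft Graph "create permission" request per site, which is how an application is typically granted access to an individual site under Sites.Selected. It only constructs the request URLs and bodies and sends nothing. The site IDs, application ID, display name, and the "read" role are placeholders; the role actually required is an assumption here and should be taken from the referenced documentation.

```python
GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_site_grants(site_ids, app_id, app_display_name, role="read"):
    """Return one (url, payload) pair per site; no request is sent.

    Every site and sub-site needs its own entry, because Sites.Selected
    does not discover sites automatically.
    """
    grants = []
    for site_id in site_ids:
        url = f"{GRAPH_BASE}/sites/{site_id}/permissions"
        payload = {
            "roles": [role],  # placeholder role, see lead-in
            "grantedToIdentities": [
                {"application": {"id": app_id, "displayName": app_display_name}}
            ],
        }
        grants.append((url, payload))
    return grants


# Hypothetical identifiers, for illustration only.
grants = build_site_grants(
    ["site-id-1", "site-id-2", "subsite-id-3"],
    app_id="00000000-0000-0000-0000-000000000000",
    app_display_name="Glean SharePoint crawler",
)
```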

For more information, refer to: