About SharePoint Connector Permissions
This document details the permission scopes required to fetch content from your company’s SharePoint environment using the Microsoft Graph and SharePoint REST APIs.
Overview
Glean leverages both the Microsoft Graph API and the SharePoint REST API to fetch content from your company’s SharePoint instance.
- → More information: SharePoint & OneDrive API Endpoints
Both the Microsoft Graph API and the SharePoint REST API have separate permission sets that need to be managed independently. For each API, this document will cover:
- Each permission that Glean requests, why it is required, and how it is used.
- Typical objections encountered when requesting the specified permission.
- Alternatives for permissions that are not permitted as part of your organization’s Standard Operating Procedure (SOP).
Graph API Permissions
Overview
Glean adheres to the principle of least privilege and only requests the minimum permission scopes required to read data from the Graph API. The following permission scopes are required when integrating with SharePoint:
Permission | Usage |
---|---|
User.Read.All | Used to list all the users within the directory. Used for mapping content permissions. |
GroupMember.Read.All | Used to expand the members of a group. Used for mapping content permissions. |
Member.Read.Hidden | Allows the app to read the memberships of any groups that are marked as “hidden” in Entra ID. Used for mapping content permissions. |
Sites.Read.All | Retrieve sites, site content, and site metadata in order to map and index content. This is covered by Sites.FullControl.All, so it is not explicitly needed. |
Sites.FullControl.All | Retrieve details regarding permission update events (i.e. when site/content permissions change). |
Files.Read.All | Retrieve items, metadata, and associated content for an item so it can be indexed. It is also used to create webhook subscriptions so that Glean can be alerted to and process changes as they occur. This is covered by Files.ReadWrite.All, so it is not explicitly needed. |
Files.ReadWrite.All | Webhooks expire periodically and require re-authorization. This permission is required by Microsoft to reauthorize a webhook subscription. |
Reports.Read.All | Used to report on crawler progress and to ensure your infrastructure is scaled correctly based on the volume of content expected. |
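The "covered by" notes in the table mean that some permissions never need to be granted on their own. As a minimal sketch, assuming you already have the set of application permissions granted to the App Registration (for example, copied from the Entra admin center), the check below reports which required permissions are still missing. The function names are hypothetical and are not part of Glean or Microsoft tooling.

```python
# Application permissions listed in the Graph permission table.
REQUIRED = {
    "User.Read.All",
    "GroupMember.Read.All",
    "Member.Read.Hidden",
    "Sites.Read.All",
    "Sites.FullControl.All",
    "Files.Read.All",
    "Files.ReadWrite.All",
    "Reports.Read.All",
}

# Narrower permission -> broader permission that covers it (per the table).
COVERED_BY = {
    "Sites.Read.All": "Sites.FullControl.All",
    "Files.Read.All": "Files.ReadWrite.All",
}


def missing_permissions(granted: set[str]) -> set[str]:
    """Return required permissions that are neither granted nor covered."""
    missing = set()
    for perm in REQUIRED:
        if perm in granted:
            continue
        broader = COVERED_BY.get(perm)
        if broader is not None and broader in granted:
            continue  # implicitly satisfied by the broader permission
        missing.add(perm)
    return missing


def minimal_grant_set() -> set[str]:
    """Required permissions minus those a broader required one covers."""
    return {p for p in REQUIRED if COVERED_BY.get(p) not in REQUIRED}
```

For example, granting only User.Read.All, GroupMember.Read.All, Sites.FullControl.All, and Files.ReadWrite.All leaves Member.Read.Hidden and Reports.Read.All missing, while Sites.Read.All and Files.Read.All are reported as covered.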
All permissions must be granted as Application permissions. Delegated permissions cannot be used.
- → More information: Application vs Delegated Permissions
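One way to confirm that a token was issued with application rather than delegated permissions is to inspect its claims: access tokens from the Microsoft identity platform list application permissions in the `roles` claim, while delegated tokens carry an `scp` claim. The sketch below decodes the JWT payload without verifying the signature, so it is only suitable for locally inspecting a token you already hold; the helper names are hypothetical.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT verifying its signature.

    Use only for local inspection of a token you already trust.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_application_token(token: str) -> bool:
    """True if the token carries application permissions ("roles")
    and no delegated scopes ("scp")."""
    claims = decode_jwt_payload(token)
    return bool(claims.get("roles")) and "scp" not in claims
```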
Permissions Explained
SharePoint REST API Permissions
Overview
Glean adheres to the principle of least privilege and only requests the minimum permission scopes required to read data from the SharePoint REST API. The following permission scopes are required when integrating with SharePoint:
Permission | Usage |
---|---|
Sites.FullControl.All | Retrieve site metadata, SharePoint user groups and role assignments, and contents within web components (including classic components). |
Permissions Explained
There are four (4) SharePoint REST API endpoints that Glean leverages:
The minimum permission required to read data from these endpoints is Sites.FullControl.All. At no point is data ever written by Glean’s crawlers using the API.
Should Microsoft adjust the minimum API scope required to read from these endpoints, Glean will follow suit and adjust its guidance on requiring Sites.FullControl.All.
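Because the Graph API and the SharePoint REST API are separate resources, an app-only token is requested for one audience at a time, and a token issued for Graph is not accepted by the SharePoint REST API (and vice versa). A minimal sketch, assuming the OAuth 2.0 client credentials flow against the Microsoft identity platform v2.0 token endpoint: it only builds the request, `contoso.sharepoint.com` is a hypothetical tenant host, and the client credential (secret or certificate assertion) is deliberately left out because it depends on how the App Registration is configured.

```python
from urllib.parse import urlencode

TOKEN_ENDPOINT = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"

GRAPH = "https://graph.microsoft.com"
SHAREPOINT = "https://contoso.sharepoint.com"  # hypothetical tenant host


def client_credentials_request(tenant_id: str, client_id: str, resource: str) -> tuple[str, str]:
    """Build the URL and form body for an app-only token request.

    The "/.default" scope asks for every application permission already
    granted on `resource`. The client credential is intentionally omitted.
    """
    url = TOKEN_ENDPOINT.format(tenant_id=tenant_id)
    body = urlencode({
        "client_id": client_id,
        "grant_type": "client_credentials",
        "scope": f"{resource.rstrip('/')}/.default",
    })
    return url, body
```

Calling this once with `GRAPH` and once with `SHAREPOINT` produces two distinct requests, which mirrors the two permission sets being managed independently.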
Constraining Access & Scope
To constrain the scope of data that the SharePoint crawler has access to (and the actions it can subsequently perform via the Graph and SharePoint REST APIs), there are multiple methods of control that can be leveraged.
These can be divided into two categories:
- Controls configured on the Glean side.
- Controls configured on the Microsoft side.
Crawling Restrictions
This control is configured on the Glean side.
For SharePoint and OneDrive, Glean can constrain the SharePoint crawler in your deployment to only target specific Sites or User drives as defined by you. Conversely, specific Sites or User drives can also be excluded from crawling.
This involves providing Glean with:
- The SharePoint Site URL(s) to explicitly include (or exclude) when crawling; and/or
- The Azure AD/Entra ID Group ID containing the users to explicitly include (or exclude) when crawling; and/or
- The usernames of the users to explicitly include (or exclude) when crawling.
For more information, refer to Supported Crawling Restrictions for SharePoint.
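To illustrate how include and exclude lists combine, the sketch below filters sites and user drives against a hypothetical restriction config. It is not Glean's implementation. It assumes that an empty include list means "everything", that an exclusion always wins over an inclusion, that URLs and usernames compare case-insensitively, and that any Group IDs have already been expanded into usernames (for example, by listing the group's members).

```python
from dataclasses import dataclass, field


@dataclass
class CrawlRestrictions:
    """Hypothetical include/exclude lists; an empty include list means 'all'."""
    include_sites: set[str] = field(default_factory=set)
    exclude_sites: set[str] = field(default_factory=set)
    include_users: set[str] = field(default_factory=set)  # group members already expanded
    exclude_users: set[str] = field(default_factory=set)


def _norm(value: str) -> str:
    # Simplification: compare URLs and usernames case-insensitively.
    return value.rstrip("/").lower()


def _in_scope(value: str, include: set[str], exclude: set[str]) -> bool:
    v = _norm(value)
    if v in {_norm(x) for x in exclude}:
        return False  # exclusion takes precedence (assumption)
    return not include or v in {_norm(x) for x in include}


def site_in_scope(r: CrawlRestrictions, site_url: str) -> bool:
    return _in_scope(site_url, r.include_sites, r.exclude_sites)


def user_drive_in_scope(r: CrawlRestrictions, username: str) -> bool:
    return _in_scope(username, r.include_users, r.exclude_users)
```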
IP Restrictions (Conditional Access)
This control is configured on the Microsoft side.
This feature requires a license for Microsoft Entra Workload ID. Please contact your Microsoft representative for more information.
Using Entra ID Conditional Access, you can restrict usage of the App Registration (and associated secret) created for the Glean SharePoint crawler to only be permitted from the IP range of your Glean tenant.
This involves:
- Adding the static IP address of the NAT gateway used by the crawlers in your Glean tenant as a named location within Entra ID.
- Adding a new Conditional Access policy targeting specific source workload identities corresponding to the Service Principal(s) created for the Glean SharePoint crawler.
- Excluding the location created in the first step from the Conditional Access policy.
- Setting the policy's action to block access.
For more information, refer to Permission & Security Controls for SharePoint - IP Restrictions.
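If you manage Conditional Access through the Graph API rather than the portal, the steps above translate into two request bodies. The sketch below only builds them; the field names reflect our reading of the Graph `ipNamedLocation` and `conditionalAccessPolicy` resources and should be checked against the current Graph reference before use. `203.0.113.10` is a documentation placeholder for your NAT gateway IP, and the service principal and location IDs are hypothetical (the location ID is the one returned when the named location is created).

```python
import json


def named_location(nat_ip: str) -> dict:
    """Step 1: an IP-based named location for the Glean NAT gateway."""
    return {
        "@odata.type": "#microsoft.graph.ipNamedLocation",
        "displayName": "Glean crawler NAT gateway",
        "ipRanges": [
            {"@odata.type": "#microsoft.graph.iPv4CidrRange", "cidrAddress": f"{nat_ip}/32"}
        ],
    }


def block_outside_location_policy(service_principal_ids: list[str], location_id: str) -> dict:
    """Steps 2-4: block the crawler's service principal(s) from every
    location except the named location."""
    return {
        "displayName": "Block Glean SharePoint crawler outside Glean tenant",
        "state": "enabled",
        "conditions": {
            "applications": {"includeApplications": ["All"]},
            "clientApplications": {"includeServicePrincipals": service_principal_ids},
            "locations": {"includeLocations": ["All"], "excludeLocations": [location_id]},
        },
        "grantControls": {"operator": "OR", "builtInControls": ["block"]},
    }


if __name__ == "__main__":
    print(json.dumps(named_location("203.0.113.10"), indent=2))
```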
Monitoring Audit Logs
This control is configured on the Microsoft side.
Microsoft Purview can be used to monitor the Glean SharePoint crawler and alert on any attempts to misuse granted permissions to write data back to SharePoint.
For more information, refer to Permission & Security Controls for SharePoint - Monitoring via Purview.
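As a rough illustration of what such an alert looks for, the sketch below scans audit records for write operations performed by the crawler's App Registration. It assumes the records were exported from a Purview audit search and parsed into dictionaries; the `ApplicationId` and `Operation` field names and the set of write operations are assumptions to check against your own exported records, and the list is illustrative rather than exhaustive.

```python
# Illustrative (not exhaustive) SharePoint audit operations that change content.
WRITE_OPERATIONS = {"FileUploaded", "FileModified", "FileDeleted", "FileRenamed", "FileMoved"}


def suspicious_writes(records: list[dict], crawler_app_id: str) -> list[dict]:
    """Return records in which the crawler's app ID performed a write.

    Field names ("ApplicationId", "Operation") are assumed, not confirmed.
    """
    app_id = crawler_app_id.lower()
    return [
        r for r in records
        if (r.get("ApplicationId") or "").lower() == app_id
        and r.get("Operation") in WRITE_OPERATIONS
    ]
```

Since the crawler only reads, any record this returns would be worth investigating.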
Sites.Selected
This control is configured on the Microsoft side.
Leveraging Sites.Selected comes with significant drawbacks that should be understood before it is used.
The Sites.Selected permission can be used to provide the required Graph & SharePoint REST API permissions at a site level instead of at a global level. However, use of Sites.Selected has significant trade-offs, including:
- Severely degrading the search result quality for SharePoint.
- Limiting Glean’s ability to synchronize content updates (including permissions) to only once every 24 hours.
- Requiring your company to manually authorize every site AND sub-site that you want to be available in Glean (Sites.Selected does not allow for automatic site and sub-site discovery).
Glean does not recommend this approach.
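To make the manual-authorization overhead concrete: with Sites.Selected, each site is granted to the app individually through the Graph site permissions endpoint (`POST /sites/{site-id}/permissions`). The sketch below builds one request per entry in a hand-maintained list of site and sub-site IDs, so the number of grants grows with every sub-site you add. The site IDs, app ID, and display name are hypothetical, and the default `fullcontrol` role is an assumption based on the REST API guidance above; confirm the role values supported by the Graph API before use.

```python
def site_grant_requests(
    site_ids: list[str], app_id: str, app_name: str, role: str = "fullcontrol"
) -> list[tuple[str, dict]]:
    """One (URL, body) pair per site or sub-site; every entry must be
    listed explicitly because nothing is discovered automatically."""
    requests = []
    for site_id in site_ids:
        url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/permissions"
        body = {
            "roles": [role],  # assumed role value; verify against the Graph docs
            "grantedToIdentities": [
                {"application": {"id": app_id, "displayName": app_name}}
            ],
        }
        requests.append((url, body))
    return requests
```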
For more information, refer to: