About SharePoint Connector Permissions
This document details the permission scopes required to fetch content from your company’s SharePoint environment using the Microsoft Graph and SharePoint REST APIs.
Overview
Glean leverages both the Microsoft Graph API and the SharePoint REST API to fetch content from your company’s SharePoint instance.
- → More information: SharePoint & OneDrive API Endpoints
Both the Microsoft Graph API and the SharePoint REST API have separate permission sets that need to be managed independently. For each API, this document will cover:
- Each permission that Glean requests, why it is required, and how it is used.
- Typical objections encountered when requesting the specified permission.
- Alternatives for permissions that are not permitted as part of your organization’s Standard Operating Procedure (SOP).
Graph API Permissions
Overview
Glean adheres to the principle of least privilege and only requests the minimum permission scopes required to read data from the Graph API. The following permission scopes are required when integrating with SharePoint:
Permission | Usage |
---|---|
User.Read.All | Used to list all the users within the directory. Used for mapping content permissions. |
GroupMember.Read.All | Used to expand the members of a group. Used for mapping content permissions. |
Member.Read.Hidden | Allows the app to read the memberships of any groups that are marked as “hidden” in Entra ID. Used for mapping content permissions. |
Sites.Read.All | Retrieve sites, site content, and site metadata in order to map and index content. This is covered by Sites.FullControl.All, so is not explicitly needed. |
Sites.FullControl.All | Retrieve details regarding permission update events (i.e. when site/content permissions change). |
Files.Read.All | Retrieve items, metadata, and associated content for an item so it can be indexed. It is also used to create webhook subscriptions so that Glean can be alerted to and process changes as they occur. This is covered by Files.ReadWrite.All, so is not explicitly needed. |
Files.ReadWrite.All | Webhooks expire periodically and require re-authorization. This permission is required by Microsoft to reauthorize a webhook subscription. |
Reports.Read.All | Used to report on crawler progress and to ensure your infrastructure is scaled correctly based on the volume of content expected. |
Deprecated Permissions
These permissions were used in previous versions of the Glean SharePoint/OneDrive connector and are no longer required. They can be safely removed provided the permissions above are present:
- Group.Read.All
- Directory.Read.All
All permissions must be granted as Application permissions. Delegated permissions cannot be used.
- → More information: Application vs Delegated Permissions
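To check that the app registration holds the Application permissions listed above, you can inspect the `roles` claim of an app-only access token issued for Microsoft Graph. The following is a minimal sketch, not part of Glean: it decodes the token payload without verifying the signature (inspection only), and the "covered by" rules simply mirror the notes in the permissions table.

```python
import base64
import json

# Application permissions listed in the Graph API permissions table.
REQUIRED = {
    "User.Read.All",
    "GroupMember.Read.All",
    "Member.Read.Hidden",
    "Sites.Read.All",
    "Sites.FullControl.All",
    "Files.Read.All",
    "Files.ReadWrite.All",
    "Reports.Read.All",
}

# Narrower scope -> broader scope that covers it (per the table's notes).
COVERED_BY = {
    "Sites.Read.All": "Sites.FullControl.All",
    "Files.Read.All": "Files.ReadWrite.All",
}


def token_roles(access_token: str) -> set[str]:
    """Return the `roles` claim of a JWT access token WITHOUT verifying its signature.

    Inspection only: never make authorization decisions on an unverified token.
    """
    payload = access_token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # JWTs strip base64url padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    # App-only tokens carry application permissions in `roles`;
    # delegated tokens use the space-separated `scp` claim instead.
    return set(claims.get("roles", []))


def missing_permissions(granted: set[str]) -> set[str]:
    """Required permissions that are neither granted nor covered by a broader one."""
    return {
        perm
        for perm in REQUIRED
        if perm not in granted and COVERED_BY.get(perm) not in granted
    }
```

An empty result from `missing_permissions(token_roles(token))` means every required scope is present, either directly or through the broader scope that covers it.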
Permissions Explained
Sites.Read.All
Although Sites.FullControl.All (discussed below) encompasses the permissions granted by Sites.Read.All, we are outlining Sites.Read.All here to explicitly define the minimum permissions required for specific connector operations.
Purpose
Sites.Read.All allows Glean to crawl each of your SharePoint sites to provide search capability for them.
Specifically, this permission is used to fetch site collections, sub-sites, site lists, site columns (attributes), and site metadata. Site metadata is a prerequisite for crawling files from the associated document libraries (see Files.Read.All below).
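The read-only Graph calls behind this crawl can be sketched as below. This is an illustrative Python sketch, not Glean's crawler: the endpoint paths are standard Graph v1.0 site resources, and `fetch` is a hypothetical callable standing in for an authenticated HTTP GET. Graph returns large collections in pages, so the helper follows `@odata.nextLink` until it is absent.

```python
from typing import Callable, Iterator, Optional

GRAPH = "https://graph.microsoft.com/v1.0"


def site_endpoints(site_id: str) -> dict[str, str]:
    """Read-only Graph endpoints for the site data described above."""
    base = f"{GRAPH}/sites/{site_id}"
    return {
        "site": base,                  # site metadata
        "subsites": f"{base}/sites",   # sub-sites
        "lists": f"{base}/lists",      # site lists
        "columns": f"{base}/columns",  # site columns (attributes)
        "drives": f"{base}/drives",    # document libraries, exposed as drives
    }


def paginate(url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item of a Graph collection, following @odata.nextLink.

    `fetch` is a hypothetical callable that performs an authenticated GET
    and returns the parsed JSON body.
    """
    next_url: Optional[str] = url
    while next_url:
        page = fetch(next_url)
        yield from page.get("value", [])
        next_url = page.get("@odata.nextLink")
```

For example, `paginate(site_endpoints(site_id)["subsites"], fetch)` would walk every sub-site of one site; injecting `fetch` keeps the paging logic testable without network access.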
Common Concerns
Concerns often arise regarding the potential for Glean to access sensitive information within SharePoint sites that are subject to stringent access controls.
It is crucial to understand that Glean respects the existing permissions and access controls of each piece of content it crawls. This means that search results will only display content to users who have the appropriate permissions to view it in SharePoint. Glean’s ability to map a piece of content to the users who are permitted to access it is tied to the User.Read.All permission (detailed below).
The SharePoint sites that are crawled by Glean can be restricted by Site URL in the Glean UI.
For more information on setting these restrictions, refer to:
Sites.FullControl.All
Purpose
Sites.FullControl.All is required for Glean to process permission-only changes to documents as they occur. For example, if access to a SharePoint document or site is changed, this needs to be reflected in Glean as quickly as possible.
Microsoft requires the elevated Sites.FullControl.All permission in order for permission-only change events to be read by an integrating service (i.e. Glean).
Glean relies on these events to process changes in real time. Without this permission scope, permission-only changes are not returned by the Graph API, hence Glean will only be able to process them in non-real time (once every 24 hours during an incremental API crawl).
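As an illustration of what "permission-only change events" look like to a client, the sketch below builds a drive delta request and picks out items flagged as having sharing changes. It assumes the `Prefer` header values (`hierarchicalsharing`, `deltashowsharingchanges`) and the `@microsoft.graph.sharedChanged` annotation described in Microsoft's driveItem delta documentation; treat those names as assumptions to verify, and note this is not Glean's code.

```python
from typing import Optional

GRAPH = "https://graph.microsoft.com/v1.0"

# Assumed Prefer values and annotation from Microsoft's driveItem delta
# documentation ("scanning permissions hierarchies"); verify before relying on them.
PREFER = "hierarchicalsharing, deltashowsharingchanges"
SHARED_CHANGED = "@microsoft.graph.sharedChanged"


def delta_request(drive_id: str, delta_link: Optional[str] = None) -> tuple[str, dict[str, str]]:
    """URL and headers for the next delta call on a drive.

    The first call starts at root/delta; later calls reuse the @odata.deltaLink
    returned by the previous round.
    """
    url = delta_link or f"{GRAPH}/drives/{drive_id}/root/delta"
    return url, {"Prefer": PREFER}


def sharing_changed_ids(delta_page: dict) -> list[str]:
    """IDs of items flagged as having permission (sharing) changes."""
    return [
        item["id"]
        for item in delta_page.get("value", [])
        if str(item.get(SHARED_CHANGED, "")).lower() == "true"
    ]
```

The annotation value is compared case-insensitively as a string so that either `"True"` or a JSON boolean is accepted.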
Common Concerns
Concerns arise regarding the potential for Glean to perform write actions (altering data) using the permission scope.
As per Microsoft’s documentation, Sites.FullControl.All is required in order to process changes to permissions.
Glean’s crawlers have no capability to perform actions that would write/alter/modify data in a customer environment. They are only able to request and read data from a vendor’s API.
Any code changes to our crawlers require an extensive code review process with engineering management, alongside a security review. We also have comprehensive audit logging in place for all data access activities performed by our crawlers, and these logs can be exported to a customer SIEM on request.
Glean supports applying restrictions to the sites/data crawled within the configuration of the SharePoint connector.
Should Microsoft reduce the scope required to read permission change events, Glean will follow suit.
User.Read.All
Purpose
The User.Read.All permission is essential for Glean to accurately map and enforce the permissions of every site and content piece that is indexed.
This permission enables Glean to identify and respect the access controls set for your SharePoint content, ensuring that search results are appropriately restricted. Without it, Glean would be unable to apply any access controls, potentially exposing sensitive content.
Common Concerns
Concerns regarding this permission often revolve around the privacy and security of accessing user/employee information. There’s also a question of necessity, particularly if Glean as a tool is only accessible to a subset of users within the organization.
Glean uses this permission to verify the access permissions of content it crawls, obtaining a list of User and Group IDs with authorized access. It is crucial for Glean to recognize the user identities linked to these IDs in SharePoint/OneDrive, aligning them with user profiles in Glean. This ensures that search results are only shown to users with the right permissions. The [List users](https://learn.microsoft.com/en-us/graph/api/user-list?view=graph-rest-1.0&tabs=http) endpoint of the Graph API, which requires User.Read.All, facilitates this process.
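The ID-to-identity mapping described above can be sketched as a lookup table built from List users results and applied to a driveItem's permission entries. This is a minimal illustration under stated assumptions, not Glean's implementation: it reads only the `grantedToV2.user` and `grantedToV2.group` facets of Graph permission objects and uses hypothetical sample data.

```python
from typing import Iterable


def build_user_index(users: Iterable[dict]) -> dict[str, dict]:
    """Index users from GET /users?$select=id,userPrincipalName,... by object ID."""
    return {u["id"]: u for u in users}


def resolve_grantees(permissions: Iterable[dict],
                     users: dict[str, dict]) -> tuple[set[str], set[str]]:
    """Split permission entries into known user principal names and group IDs.

    Group IDs still need to be expanded into members (see GroupMember.Read.All).
    Unknown user IDs are skipped in this sketch.
    """
    upns: set[str] = set()
    group_ids: set[str] = set()
    for perm in permissions:
        granted = perm.get("grantedToV2", {})
        if "user" in granted:
            user = users.get(granted["user"].get("id", ""))
            if user:
                upns.add(user["userPrincipalName"])
        elif "group" in granted:
            group_ids.add(granted["group"]["id"])
    return upns, group_ids
```

Without the user index (that is, without User.Read.All), the permission entries would remain opaque IDs that cannot be matched to Glean user profiles.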
Additionally, understanding user identities helps Glean enhance the metadata of indexed content, improving search result relevance. For example, displaying the document owner’s name alongside search results enriches the user’s search experience, even if the document owner doesn’t use Glean.
Files.Read.All
Although Files.ReadWrite.All (discussed below) encompasses the permissions granted by Files.Read.All, we are outlining Files.Read.All here to explicitly define the minimum permissions required for specific connector operations.
Purpose
The Files.Read.All permission is crucial for Glean to access and index files from SharePoint and OneDrive. It enables Glean to retrieve metadata, permissions, and content from user drives in OneDrive and document libraries on SharePoint sites.
This permission is also used to create webhook subscriptions, which play a pivotal role in enabling Glean to immediately reflect changes such as the creation, modification, or deletion of documents and site content.
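A webhook subscription on a drive is created with a `POST https://graph.microsoft.com/v1.0/subscriptions` request. The sketch below only builds the request body; it is an illustration, not Glean's code. It assumes, per Microsoft's subscription documentation at the time of writing, that driveItem subscriptions support only the `updated` change type, target the drive root, and last at most 42,300 minutes. The notification URL and client state are hypothetical placeholders.

```python
from datetime import datetime, timedelta, timezone

# Assumed maximum lifetime for driveItem subscriptions (42,300 minutes,
# just under 30 days); verify against Microsoft's current documentation.
MAX_DRIVEITEM_MINUTES = 42300


def subscription_body(drive_id: str, notification_url: str,
                      client_state: str, now: datetime) -> dict:
    """Body for POST /subscriptions; `now` must be timezone-aware UTC."""
    expires = now + timedelta(minutes=MAX_DRIVEITEM_MINUTES)
    return {
        "changeType": "updated",             # the only change type for driveItem
        "notificationUrl": notification_url,  # hypothetical receiver endpoint
        "resource": f"/drives/{drive_id}/root",
        "expirationDateTime": expires.isoformat().replace("+00:00", "Z"),
        "clientState": client_state,         # echoed back to validate notifications
    }
```

Because every subscription has a fixed maximum lifetime, it must be renewed or re-authorized periodically, which is where Files.ReadWrite.All comes in.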
Common Concerns
Concerns about this permission often focus on the potential for Glean to access sensitive or restricted company documents.
It is important to understand that Glean respects the existing permissions and access controls for each document it indexes. This means that only users with the appropriate permissions in the source application will see the content in search results.
This careful mapping of permissions ensures that even the most sensitive documents are only visible to authorized employees. Glean’s ability to map a piece of content to the users who are permitted to access it is tied to the User.Read.All permission (detailed above).
Additionally, Glean offers options to restrict crawling to specific SharePoint sites and user drives by specifying URLs or usernames associated with the drives. It is also possible to exclude specific individual content from search results.
For details on how to apply these restrictions, refer to:
Files.ReadWrite.All
Purpose
The Files.ReadWrite.All permission is crucial for Glean to manage webhook subscriptions for SharePoint and OneDrive content updates. Webhooks play a pivotal role in enabling Glean to immediately reflect changes, such as the creation, modification, or deletion of documents and site content.
This real-time update capability is essential not only for keeping the search index current, but also for accurately maintaining the map of permission and access controls. For example, if a user’s access to specific content is revoked, Glean leverages webhooks to swiftly update its records, thereby preventing unauthorized access to sensitive data.
Webhook subscriptions frequently require re-authorization for security purposes. The minimum permission scope to be able to re-authorize an existing webhook is Files.ReadWrite.All. Should Microsoft reduce the minimum permission scope required to perform this action, Glean will align with the change and remove the requirement for Files.ReadWrite.All.
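The maintenance loop described here can be sketched as two decisions: renew a subscription shortly before it expires (a `PATCH /subscriptions/{id}` with a new `expirationDateTime`), and answer a `reauthorizationRequired` lifecycle notification with `POST /subscriptions/{id}/reauthorize`. This is a hedged Python sketch, not Glean's code; the 12-hour renewal margin is an arbitrary assumption, and whether lifecycle notifications are delivered for a given resource should be checked against Microsoft's documentation.

```python
from datetime import datetime, timedelta
from typing import Optional

GRAPH = "https://graph.microsoft.com/v1.0"


def renewal_due(expires_at: datetime, now: datetime,
                margin: timedelta = timedelta(hours=12)) -> bool:
    """True when a subscription should be renewed via PATCH /subscriptions/{id}.

    The 12-hour margin is an illustrative assumption.
    """
    return expires_at - now <= margin


def reauthorize_call(notification: dict) -> Optional[tuple[str, str]]:
    """(method, URL) answering a `reauthorizationRequired` lifecycle notification.

    Other lifecycle events are out of scope for this sketch and return None.
    """
    if notification.get("lifecycleEvent") != "reauthorizationRequired":
        return None
    return "POST", f"{GRAPH}/subscriptions/{notification['subscriptionId']}/reauthorize"
```

Both calls act on the subscription object rather than on files, but Microsoft ties them to Files.ReadWrite.All for driveItem subscriptions, as noted above.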
Common Concerns
Concerns about this permission often stem from its capability to write back to the Graph API, potentially altering data. As per Microsoft’s documentation, Files.ReadWrite.All is the least privileged permission that allows Glean to reauthorize its subscriptions to the driveItem resource.
Glean is committed to adopting less permissive options should they become available. To mitigate concerns, Glean advises implementing restrictions on the data it can access by specifying Site URLs or usernames in the Glean UI.
Monitoring audit logs for both Glean and the Microsoft Graph API, with configured alerts for unexpected API usage, is also recommended.
GroupMember.Read.All
Purpose
The GroupMember.Read.All permission is essential for Glean to accurately map user access to content. It works in tandem with User.Read.All to identify which users are allowed access to each piece of content Glean indexes.
Specifically, when permissions for a site or file are associated with a Group email/ID, Glean uses the List group members endpoint of the Graph API to determine the user IDs within that group. This endpoint requires the GroupMember.Read.All permission, which is the least privileged permission required for this operation.
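Because a group can contain other groups, resolving a group into user IDs is a graph traversal. The sketch below expands nested membership with a cycle guard; it is an illustration, not Glean's implementation. `list_members` is a hypothetical callable wrapping `GET /groups/{id}/members` (pagination already handled), and members are distinguished by their `@odata.type`.

```python
from typing import Callable

USER = "#microsoft.graph.user"
GROUP = "#microsoft.graph.group"


def expand_group(group_id: str, list_members: Callable[[str], list[dict]]) -> set[str]:
    """Return the user IDs reachable from a group, following nested groups."""
    users: set[str] = set()
    seen: set[str] = set()
    stack = [group_id]
    while stack:
        gid = stack.pop()
        if gid in seen:  # guard against membership cycles and repeated groups
            continue
        seen.add(gid)
        for member in list_members(gid):
            kind = member.get("@odata.type")
            if kind == USER:
                users.add(member["id"])
            elif kind == GROUP:
                stack.append(member["id"])  # expand the nested group later
    return users
```

Graph also offers `GET /groups/{id}/transitiveMembers`, which returns nested members in a single collection; the explicit traversal above just makes the logic visible.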
Common Concerns
Concerns about this permission often mirror those for User.Read.All, focusing on its scope. Given that groups assigned to specific sites or files could span across the entire organization, it’s crucial for Glean to understand the membership of these groups comprehensively. This understanding allows Glean to accurately enforce permissions, ensuring that only authorized users can access specific content.
The necessity of mapping group memberships across all groups is fundamental to maintaining the integrity and security of data access within the organization.
Member.Read.Hidden
Reports.Read.All
Purpose
The Reports.Read.All permission enables Glean to access SharePoint & OneDrive usage data for sites, pages, users, and files within a specified time period. This is crucial for:
- Monitoring and reporting on the status of the OneDrive and SharePoint crawl (estimated time, time remaining, etc).
- Allowing the Glean infrastructure to appropriately and accurately scale to manage the vast content volume in SharePoint & OneDrive.
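As a rough illustration of how usage data can feed a progress estimate, the sketch below builds a usage-report URL and computes time remaining from an expected item count and an observed crawl rate. It is not Glean's logic: `getSharePointSiteUsageDetail` is used only as an example report function (the specific endpoints Glean uses are not listed here), and the rate figure is hypothetical.

```python
from datetime import timedelta

GRAPH = "https://graph.microsoft.com/v1.0"


def report_url(report: str, period: str = "D7") -> str:
    """URL for a usage report function, e.g. getSharePointSiteUsageDetail."""
    return f"{GRAPH}/reports/{report}(period='{period}')"


def time_remaining(total_items: int, crawled_items: int,
                   items_per_hour: float) -> timedelta:
    """Estimate crawl time left from an expected item count and an observed rate."""
    if items_per_hour <= 0:
        raise ValueError("items_per_hour must be positive")
    remaining = max(total_items - crawled_items, 0)
    return timedelta(hours=remaining / items_per_hour)
```

For instance, with 10,000 expected items, 4,000 already crawled, and a rate of 2,000 items per hour, the estimate is three hours.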
Common Concerns
Concerns about this permission often center on the potential access to sensitive activity and usage data via other accessible reports.
Glean strictly accesses data from four reporting endpoints:
These endpoints are essential for Glean’s functionality and require Reports.Read.All, the least privileged permission necessary, for access.
SharePoint REST API Permissions
Overview
Glean adheres to the principle of least privilege and only requests the minimum permission scopes required to read data from the SharePoint REST API. The following permission scope is required when integrating with SharePoint:
Permission | Usage |
---|---|
Sites.FullControl.All | Retrieve site metadata, SharePoint user groups and role assignments, and contents within web components (including classic components). |
Deprecated Permissions
These permissions were used in previous versions of the Glean SharePoint/OneDrive connector and are no longer required. They can be safely removed provided the permissions above are present:
- Tenant.FullControl + SiteCollections.FullControl (XML method)
Permissions Explained
There are four (4) SharePoint REST API endpoints that Glean leverages:
The minimum permission to be able to read data from these endpoints is Sites.FullControl.All. At no point is data ever written by Glean’s crawlers using the API.
Should Microsoft adjust the minimum API scope required to read from these endpoints, Glean will follow suit and adjust our guidance on requiring Sites.FullControl.All.
Why is a broad permission required to read data?
If these API endpoints are queried with a permission scope other than Sites.FullControl.All (e.g. Sites.Read.All), the SharePoint REST API returns an HTTP 403 error.
- For more information, see this StackOverflow post.
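The read-only SharePoint REST calls for the data in the table (site metadata, SharePoint user groups, role assignments) and the status codes you might see can be sketched as below. This is an illustrative Python sketch, not Glean's code: the `/_api/web`, `/_api/web/sitegroups`, and `/_api/web/roleassignments` paths are standard SharePoint REST resources, while the status interpretations simply restate the behavior described above.

```python
def rest_endpoints(site_url: str) -> dict[str, str]:
    """Read-only SharePoint REST endpoints for one site (GET requests only)."""
    api = site_url.rstrip("/") + "/_api/web"
    return {
        "web": api,                          # site (web) metadata
        "site_groups": f"{api}/sitegroups",  # SharePoint user groups
        "role_assignments": f"{api}/roleassignments?$expand=Member,RoleDefinitionBindings",
    }


def rest_headers(access_token: str) -> dict[str, str]:
    # The token's audience must be the SharePoint host, not graph.microsoft.com.
    return {
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json;odata=nometadata",
    }


def explain_status(status: int) -> str:
    """Map a REST response status to a likely cause."""
    if 200 <= status < 300:
        return "ok"
    if status == 401:
        return "unauthorized: token missing, expired, or issued for the wrong audience"
    if status == 403:
        return "forbidden: token likely lacks Sites.FullControl.All"
    return f"unexpected status {status}"
```

A 403 on these GET requests, even with Sites.Read.All granted, is the symptom described above.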
Why does Glean need access to the SharePoint REST API in addition to the Graph API?
Constraining Access & Scope
To constrain the scope of data that the SharePoint crawler has access to (and the actions it can subsequently perform via the Graph and SharePoint REST APIs), there are multiple methods of control that can be leveraged.
These can be divided into two categories:
- Controls configured on the Glean side.
- Controls configured on the Microsoft side.
Crawling Restrictions
This control is configured on the Glean side.
For SharePoint and OneDrive, Glean can constrain the SharePoint crawler in your deployment to only target specific Sites or User drives as defined by you. Conversely, specific Sites or User drives can also be excluded from crawling.
This involves providing Glean with:
- The SharePoint Site URL(s) to explicitly include (or exclude) when crawling; and/or
- The Azure AD/Entra ID Group ID containing the users to explicitly include (or exclude) when crawling; and/or
- The usernames of the users to explicitly include (or exclude) when crawling.
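The include/exclude behavior can be modeled locally to reason about which sites and drives a rule set would cover. This is a hypothetical sketch, not Glean's configuration schema or matching logic. It assumes that exclusions take precedence, that an empty include list means "everything", that a site rule also applies to its sub-sites, and it omits group-based rules.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlRestrictions:
    """Hypothetical include/exclude rules (not Glean's configuration schema)."""
    include_sites: set[str] = field(default_factory=set)
    exclude_sites: set[str] = field(default_factory=set)
    include_users: set[str] = field(default_factory=set)
    exclude_users: set[str] = field(default_factory=set)


def _norm(url: str) -> str:
    return url.rstrip("/").lower()


def _under(url: str, prefixes: set[str]) -> bool:
    # Assumption: a rule for a site also applies to its sub-sites.
    u = _norm(url)
    return any(u == _norm(p) or u.startswith(_norm(p) + "/") for p in prefixes)


def site_allowed(rules: CrawlRestrictions, site_url: str) -> bool:
    if _under(site_url, rules.exclude_sites):
        return False  # exclusions win over inclusions
    return not rules.include_sites or _under(site_url, rules.include_sites)


def user_allowed(rules: CrawlRestrictions, username: str) -> bool:
    name = username.lower()
    if name in {u.lower() for u in rules.exclude_users}:
        return False
    return not rules.include_users or name in {u.lower() for u in rules.include_users}
```

Matching on a trailing `/` keeps a rule for `/sites/eng` from accidentally covering `/sites/engineering`.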
For more information, refer to Supported Crawling Restrictions for SharePoint.
IP Restrictions (Conditional Access)
This control is configured on the Microsoft side.
This feature requires a license for Microsoft Entra Workload ID. Please contact your Microsoft representative for more information.
Using Entra ID Conditional Access, you can restrict usage of the App Registration (and associated secret) created for the Glean SharePoint crawler to only be permitted from the IP range of your Glean tenant.
This involves:
- Adding the static IP address of the NAT gateway used by the crawlers in your Glean tenant as a named location within Entra ID.
- Adding a new Conditional Access policy targeting specific source workload identities corresponding to the Service Principal(s) created for the Glean SharePoint crawler.
- Excluding the location created in the first step from the Conditional Access policy.
- Setting the policy's action to block access.
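Entra ID evaluates the policy itself; the sketch below only mirrors its effective logic (block the workload identity unless the sign-in comes from the excluded location), which can help sanity-check the CIDR you enter. It uses Python's standard `ipaddress` module, and `203.0.113.10/32` is a documentation address standing in for your tenant's NAT gateway IP.

```python
import ipaddress


def sign_in_allowed(source_ip: str, trusted_ranges: list[str]) -> bool:
    """True if the source IP falls inside the excluded (trusted) location.

    Mirrors a block-unless-trusted policy; addresses of a different IP version
    than a range simply do not match it.
    """
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in trusted_ranges)
```

With `trusted_ranges=["203.0.113.10/32"]`, only sign-ins from exactly that address pass; any other source, including a leaked secret used elsewhere, would be blocked.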
For more information, refer to Permission & Security Controls for SharePoint - IP Restrictions.
Monitoring Audit Logs
This control is configured on the Microsoft side.
Microsoft Purview can be used to monitor the Glean SharePoint crawler and alert on any attempts to misuse granted permissions to write data back to SharePoint.
For more information, refer to Permission & Security Controls for SharePoint - Monitoring via Purview.
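Once audit records are exported, flagging write activity by the crawler's app registration is a simple filter. This sketch is illustrative only: the field names `Operation` and `AppId` are assumptions about the shape of an exported record (map them to your export's schema), and the operation names are examples of SharePoint file-write activities rather than an exhaustive list.

```python
# Example SharePoint audit operations that indicate a write; extend as needed.
WRITE_OPERATIONS = {"FileModified", "FileUploaded", "FileDeleted", "FileRenamed", "FileMoved"}


def suspicious_writes(records: list[dict], crawler_app_id: str) -> list[dict]:
    """Records where the crawler's app (assumed `AppId` field) performed a write."""
    return [
        r for r in records
        if r.get("AppId") == crawler_app_id and r.get("Operation") in WRITE_OPERATIONS
    ]
```

Since the crawlers only read data, any non-empty result is worth an alert.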
Sites.Selected
This control is configured on the Microsoft side.
Leveraging Sites.Selected comes with significant drawbacks that should be understood before it is used.
The Sites.Selected permission can be used to provide the required Graph & SharePoint REST API permissions at a site level instead of at a global level. However, use of Sites.Selected has significant trade-offs, including:
- Severely degrading the search result quality for SharePoint.
- Limiting Glean’s ability to synchronize content updates (including permissions) to only once every 24 hours.
- Requiring your company to manually authorize every site AND sub-site that you want to be available in Glean (Sites.Selected does not allow for automatic site and sub-site discovery).
Glean does not recommend this approach.
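To make the manual overhead concrete: under Sites.Selected, an administrator grants the app access one site at a time with `POST /sites/{site-id}/permissions`. The sketch below only builds that request; it is an illustration, not a Glean procedure. The app ID and display name are hypothetical, and the role to grant is left as a parameter because the role Glean would need is not specified here.

```python
GRAPH = "https://graph.microsoft.com/v1.0"


def site_permission_grant(site_id: str, app_id: str, display_name: str,
                          role: str) -> tuple[str, dict]:
    """URL and body granting one application access to ONE site under Sites.Selected.

    Must be sent by an administrator identity allowed to manage site permissions.
    """
    url = f"{GRAPH}/sites/{site_id}/permissions"
    body = {
        "roles": [role],
        "grantedToIdentities": [
            {"application": {"id": app_id, "displayName": display_name}}
        ],
    }
    return url, body
```

One such request is needed per site, so the list of grants grows with every site that should appear in Glean.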
For more information, refer to: