This document details the permission scopes required to fetch content from your company’s SharePoint environment using the Microsoft Graph and SharePoint REST APIs.
Permission | Usage |
---|---|
User.Read.All | Used to list all the users within the directory. Used for mapping content permissions. |
GroupMember.Read.All | Used to expand the members of a group. Used for mapping content permissions. |
Member.Read.Hidden | Allows the app to read the memberships of any groups that are marked as “hidden” in Entra ID. Used for mapping content permissions. |
Sites.Read.All | Retrieve sites, site content, and site metadata in order to map and index content. This is covered by Sites.FullControl.All, so it does not need to be granted explicitly. |
Sites.FullControl.All | Retrieve details regarding permission update events (i.e. when site/content permissions change). |
Files.Read.All | Retrieve items, metadata, and associated content for an item so it can be indexed. It is also used to create webhook subscriptions so that Glean can be alerted to and process changes as they occur. This is covered by Files.ReadWrite.All, so it does not need to be granted explicitly. |
Files.ReadWrite.All | Webhooks expire periodically and require re-authorization. This permission is required by Microsoft to reauthorize a webhook subscription. |
Reports.Read.All | Used to report on crawler progress and to ensure your infrastructure is scaled correctly based on the volume of content expected. |
Deprecated Permissions
Group.Read.All
Directory.Read.All
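As a rough illustration of how the scopes in the table above relate to one another, the following Python sketch checks a set of granted application permissions against the table, treating Sites.Read.All and Files.Read.All as satisfied when the broader scope that covers them is granted. The constant and function names are hypothetical and are not part of Glean or Microsoft Graph. How the list of granted permissions is obtained is deliberately left out.

```python
# Scope names copied from the permissions table in this document.
REQUIRED_SCOPES = {
    "User.Read.All",
    "GroupMember.Read.All",
    "Member.Read.Hidden",
    "Sites.Read.All",
    "Sites.FullControl.All",
    "Files.Read.All",
    "Files.ReadWrite.All",
    "Reports.Read.All",
}

# "Covered by" relationships stated in the table:
# granting the broader scope (value) satisfies the narrower one (key).
COVERED_BY = {
    "Sites.Read.All": "Sites.FullControl.All",
    "Files.Read.All": "Files.ReadWrite.All",
}


def missing_scopes(granted):
    """Return the required scopes that are neither granted nor covered (hypothetical helper)."""
    granted = set(granted)
    missing = set()
    for scope in REQUIRED_SCOPES:
        if scope in granted:
            continue
        # A narrower scope counts as satisfied when its covering scope is granted.
        if COVERED_BY.get(scope) in granted:
            continue
        missing.add(scope)
    return sorted(missing)
```

For example, passing every scope except Sites.Read.All and Files.Read.All yields an empty list, because both are covered by their broader counterparts.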
Sites.Read.All
Although Sites.FullControl.All (discussed below) encompasses the permissions granted by Sites.Read.All, we are outlining Sites.Read.All here to explicitly define the minimum permissions required for specific connector operations.
Sites.Read.All allows Glean to crawl each of your SharePoint sites to provide search capability for them.
Specifically, this permission is used to fetch site collections, sub-sites, site lists, site columns (attributes), and site metadata. Site metadata must be retrieved before files from the associated document libraries can be crawled (see Files.Read.All below).
Common Concerns
User.Read.All permission (detailed below).
The SharePoint sites that are crawled by Glean can be restricted by Site URL in the Glean UI.
For more information on setting these restrictions, refer to:
Sites.FullControl.All
Sites.FullControl.All is required for Glean to process permission-only changes to documents as they occur. For example, if access to a SharePoint document or site is changed, this needs to be reflected in Glean as quickly as possible.
Microsoft requires the elevated Sites.FullControl.All permission in order for permission-only change events to be read by an integrating service (i.e. Glean).
Glean relies on these events to process changes in real time. Without this permission scope, permission-only changes are not returned by the Graph API, so Glean can only process them with a delay (once every 24 hours during an incremental API crawl).
Common Concerns
Sites.FullControl.All is required in order to process changes to permissions.
Glean’s crawlers have no capability to perform actions that would write, alter, or modify data in a customer environment. They are only able to request and read data from a vendor’s API.
Any code changes to our crawlers require an extensive code review process with engineering management, alongside a security review. We also have comprehensive audit logging in place for all data access activities performed by our crawlers, and these logs can be exported to a customer SIEM on request.
Glean supports applying restrictions to the sites/data crawled within the configuration of the SharePoint connector.
Should Microsoft reduce the scope required to read permission change events, Glean will follow suit.
User.Read.All
The User.Read.All permission is essential for Glean to accurately map and enforce the permissions of every site and content piece that is indexed.
This permission enables Glean to identify and respect the access controls set for your SharePoint content, ensuring that search results are appropriately restricted. Without it, Glean would be unable to apply any access controls, potentially exposing sensitive content.
Common Concerns
User.Read.All, facilitates this process.
Additionally, understanding user identities helps Glean enhance the metadata of indexed content, improving search result relevance. For example, displaying the document owner’s name alongside search results enriches the user’s search experience, even if the document owner doesn’t use Glean.
Files.Read.All
Although Files.ReadWrite.All (discussed below) encompasses the permissions granted by Files.Read.All, we are outlining Files.Read.All here to explicitly define the minimum permissions required for specific connector operations.
The Files.Read.All permission is crucial for Glean to access and index files from SharePoint and OneDrive. It enables Glean to retrieve metadata, permissions, and content from user drives in OneDrive and from document libraries on SharePoint sites.
This permission is also used to create webhook subscriptions, which play a pivotal role in enabling Glean to immediately reflect changes such as the creation, modification, or deletion of documents and site content.
Common Concerns
User.Read.All permission (detailed above).
Additionally, Glean offers options to restrict crawling to specific SharePoint sites and user drives by specifying URLs or usernames associated with the drives. It is also possible to exclude specific individual content from search results.
For details on how to apply these restrictions, refer to:
Files.ReadWrite.All
The Files.ReadWrite.All permission is crucial for Glean to manage webhook subscriptions for SharePoint and OneDrive content updates. Webhooks play a pivotal role in enabling Glean to immediately reflect changes, such as the creation, modification, or deletion of documents and site content.
This real-time update capability is essential not only for keeping the search index current, but also for accurately maintaining the map of permission and access controls. For example, if a user’s access to specific content is revoked, Glean leverages webhooks to swiftly update its records, thereby preventing unauthorized access to sensitive data.
Webhook subscriptions frequently require re-authorization for security purposes. The minimum permission scope required to re-authorize an existing webhook is Files.ReadWrite.All. Should Microsoft reduce the minimum permission scope required to perform this action, Glean will align with the change and remove the requirement for Files.ReadWrite.All.
Common Concerns
Files.ReadWrite.All is the least privileged permission that allows re-authorizing subscriptions to the driveItem resource that Glean requires.
Glean is committed to adopting less permissive options should they become available. To mitigate concerns, Glean advises implementing restrictions on the data it can access by specifying Site URLs or usernames in the Glean UI.
Monitoring audit logs for both Glean and the Microsoft Graph API, with configured alerts for unexpected API usage, is also recommended.
GroupMember.Read.All
The GroupMember.Read.All permission is essential for Glean to accurately map user access to content. It works in tandem with User.Read.All to identify which users are allowed access to each piece of content Glean indexes.
Specifically, when permissions for a site or file are associated with a Group email/ID, Glean uses the List group members endpoint of the Graph API to determine the user IDs within that group. This endpoint requires the GroupMember.Read.All permission, which is the least privileged permission for this operation.
Common Concerns
User.Read.All, focusing on its scope. Given that groups assigned to specific sites or files could span the entire organization, it’s crucial for Glean to understand the membership of these groups comprehensively. This understanding allows Glean to accurately enforce permissions, ensuring that only authorized users can access specific content.
Mapping memberships across all groups is fundamental to maintaining the integrity and security of data access within the organization.
Member.Read.Hidden
Reports.Read.All
The Reports.Read.All permission enables Glean to access SharePoint & OneDrive usage data for sites, pages, users, and files within a specified time period. This is crucial for:
Reports.Read.All, the least privileged permission necessary, for access.
Permission | Usage |
---|---|
Sites.FullControl.All | Retrieve site metadata, SharePoint user groups and role assignments, and contents within web components (including classic components). |
Deprecated Permissions
Tenant.FullControl + SiteCollections.FullControl (XML method)
Sites.FullControl.All. At no point is data ever written by Glean’s crawlers using the API.
Should Microsoft adjust the minimum API scope required to read from these endpoints, Glean will follow suit and adjust our guidance on requiring Sites.FullControl.All.
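To make the read-only nature of these calls more concrete, here is a minimal Python sketch that builds the kind of SharePoint REST URLs a crawler could issue GET requests against for site metadata, SharePoint groups, and role assignments. The choice of endpoints is an assumption for illustration, since this document does not list the exact endpoints Glean calls. The function name and the example site URL are hypothetical.

```python
def sharepoint_rest_urls(site_url):
    """Build read-only SharePoint REST URLs for one site (hypothetical helper).

    The endpoints below are assumed for illustration; this is not
    a statement of which endpoints Glean's crawlers actually use.
    """
    base = site_url.rstrip("/") + "/_api/web"
    return {
        # Site (web) metadata such as title and URL.
        "site_metadata": base,
        # SharePoint groups defined on the site.
        "site_groups": base + "/sitegroups",
        # Role assignments, expanded to include the principal and its roles.
        "role_assignments": base
        + "/roleassignments?$expand=Member,RoleDefinitionBindings",
    }


# Example site URL is a placeholder, not a real tenant.
urls = sharepoint_rest_urls("https://contoso.sharepoint.com/sites/engineering/")
```

Each of these URLs would be requested with HTTP GET only, which is consistent with the statement above that no data is written through the API.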
Why is a broad permission required to read data?
Sites.FullControl.All, e.g. Sites.Read.All, the SharePoint REST API returns an HTTP 403 error.
Sites.FullControl.All.
Why does Glean need access to the SharePoint REST API in addition to the Graph API?
Sites.Selected comes with significant drawbacks that should be understood before it is used.
The Sites.Selected permission can be used to provide the required Graph & SharePoint REST API permissions at the site level instead of at the global level. However, use of Sites.Selected has significant trade-offs, including:
Sites.Selected does not allow for automatic site and sub-site discovery)