Permission & Security Controls for SharePoint
Available alternatives for when Sites.FullControl.All and/or Files.ReadWrite.All cannot be used in your company environment
Introduction
Glean’s integration with Microsoft SharePoint utilizes both the Microsoft Graph API and SharePoint REST API to index your content. As part of our security practices, Glean requests only the minimum permission scopes required by these APIs to perform necessary operations.
When working with Microsoft’s APIs, the following permission scopes are required for standard read functionality for certain API endpoints:
- Graph API - Sites.FullControl.All: Required to read permission changes, which ensures Glean only shows content to users who have appropriate access in the source system
- Graph API - Files.ReadWrite.All: Required to rotate webhook secrets on expiry, enabling Glean to process content changes as they occur
- SharePoint REST API - Sites.FullControl.All: Required to read site content and fetch role assignments for accurate permission mapping
We recognize that these permission requirements may not align with your organization’s Security Guidelines or Standard Operating Procedures (SOPs).
For more information on why these permissions are necessary, please refer to SharePoint Permission Design & Management at Glean.
This document provides alternative approaches to enable Glean’s integration while addressing various security considerations. Each option includes an analysis of potential limitations and associated risks to help your team make an informed decision based on your specific requirements.
Supported Controls & Alternatives
Some security teams may not be comfortable providing Glean with the Sites.FullControl.All and/or Files.ReadWrite.All Graph API permissions required to integrate with SharePoint.
To constrain the scope of data that the SharePoint crawler has access to and the actions it can perform via the Graph API and SharePoint REST API, there are multiple methods of control that can be leveraged.
Recommended controls:
- Crawling Restrictions
- IP Restrictions (Conditional Access)
- Extended monitoring & alerting via Microsoft Purview
Other supported controls (not recommended):
- Sites.Selected permission for Graph API access
- Sites.Selected permission for SharePoint REST API access
Crawling Restrictions
Content restrictions can adversely affect the usefulness of Glean for your users: excluded content will not be searchable in Glean, despite remaining accessible at the source.
Glean has built-in DLP reporting that can help security teams identify content that is too broadly accessible throughout the organization so that it can be remediated, negating the need to restrict content.
Glean supports a variety of configurations for the SharePoint connector that allow your organization to restrict the content that is crawled and brought into the Glean search index.
These restrictions include:
- Greenlist (allowlist) sites to be explicitly crawled: The SharePoint crawler will only fetch content from the site URLs set by your organization.
- Redlist (blocklist) sites to be explicitly excluded from crawling: The SharePoint crawler will fetch content from all SharePoint sites EXCEPT the site URLs set by your organization.
- User Group / Entra ID Group: SharePoint content will only be visible to a user in Glean if they are a member of a specified Entra ID group.
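The greenlist and redlist behavior described above can be pictured with a minimal sketch. This is not Glean's implementation: the function names, the URL normalization, the rule that a listed site also covers its sub-sites, and the precedence of the redlist when both lists are supplied are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def normalize_site_url(url: str) -> str:
    """Reduce a site URL to a lowercase 'host/path' form without a trailing slash."""
    parts = urlsplit(url.strip())
    return f"{parts.netloc.lower()}{parts.path.rstrip('/').lower()}"


def is_under(site: str, root: str) -> bool:
    """True if `site` is `root` itself or sits below it (assumed sub-site rule)."""
    return site == root or site.startswith(root + "/")


def should_crawl(site_url: str, greenlist=(), redlist=()) -> bool:
    """Hypothetical filter: the redlist always excludes; a non-empty greenlist restricts."""
    site = normalize_site_url(site_url)
    if any(is_under(site, normalize_site_url(r)) for r in redlist):
        return False
    if greenlist:
        return any(is_under(site, normalize_site_url(g)) for g in greenlist)
    # No greenlist configured: every site that is not redlisted is crawled.
    return True
```

With a greenlist of `https://contoso.sharepoint.com/sites/hr`, this sketch would crawl `/sites/HR` and its sub-sites but skip `/sites/Finance`. The `contoso` URLs are placeholders.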
For more information on the restrictions supported, and how to configure them, please refer to:
IP Restrictions (Conditional Access)
This feature requires a license for Microsoft Entra Workload ID. Please contact your Microsoft representative for more information.
Overview
Using Entra ID Conditional Access, you can restrict usage of the App Registration (and associated secret) created for the Glean SharePoint crawler to only be permitted from the IP range of your Glean tenant.
This involves:
1. Adding the static IP address of the NAT gateway used by the crawlers in your Glean tenant as a location within Entra ID.
2. Adding a new Conditional Access policy targeting specific source workload identities corresponding to the Service Principal(s) created for the Glean SharePoint crawler.
3. Excluding the location created in step 1 from the Conditional Access policy.
4. Setting the policy to block as an action.
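As a rough illustration of the steps above, the sketch below builds the two JSON bodies that would be sent to Microsoft Graph: one for the named location and one for the Conditional Access policy. The field names follow our understanding of the Graph `ipNamedLocation` and `conditionalAccessPolicy` resources and should be checked against the Microsoft article referenced in the Configuration section. The helper names, display names, IP range, and IDs are placeholders, and the policy is created in report-only mode as a cautious assumption.

```python
def named_location_payload(display_name: str, cidr: str) -> dict:
    """Body for creating an IP-based named location (step 1)."""
    return {
        "@odata.type": "#microsoft.graph.ipNamedLocation",
        "displayName": display_name,
        "isTrusted": True,
        "ipRanges": [
            {"@odata.type": "#microsoft.graph.iPv4CidrRange", "cidrAddress": cidr}
        ],
    }


def block_policy_payload(display_name, service_principal_ids, allowed_location_id) -> dict:
    """Body for a policy that blocks the crawler's service principals
    from every location except the named location (steps 2 to 4)."""
    return {
        "displayName": display_name,
        # Report-only first; switch to "enabled" once sign-in logs look correct.
        "state": "enabledForReportingButNotEnforced",
        "conditions": {
            "applications": {"includeApplications": ["All"]},
            "clientApplications": {
                "includeServicePrincipals": list(service_principal_ids)
            },
            "locations": {
                "includeLocations": ["All"],
                "excludeLocations": [allowed_location_id],
            },
        },
        "grantControls": {"operator": "and", "builtInControls": ["block"]},
    }


# Placeholder values only: 203.0.113.10 is a documentation address,
# not the IP address of any Glean tenant.
location = named_location_payload("Glean crawler NAT gateway", "203.0.113.10/32")
policy = block_policy_payload(
    "Block Glean crawler outside Glean IP",
    ["<service-principal-object-id>"],
    "<named-location-id>",
)
```

The named location must be created first, because its ID is what the policy excludes.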
Configuration
For detailed information on how to configure a Conditional Access policy involving a Workload Identity, please refer to the following Microsoft help article: Conditional Access for Workload Identities (learn.microsoft.com)
You will need the static IP address assigned to the crawler infrastructure in your Glean tenant to proceed. As this IP address is different for every Glean deployment, please contact Glean support to obtain this information.
Monitoring via Microsoft Purview
Overview
Microsoft Purview can be used to monitor the Glean SharePoint crawler and alert on any attempts to misuse granted permissions to write data back to SharePoint.
Glean recommends configuring monitoring at three levels:
- Application-specific activity
- Endpoint-specific activity
- Write activities
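To make the first and third levels concrete, here is a small sketch that filters exported audit records down to activity performed by the crawler's App Registrations and then flags write operations. It is an offline illustration, not a Purview configuration. The field names `ApplicationId` and `Operation` and the list of write operations are assumptions about the exported record format and are not exhaustive. Endpoint-specific monitoring is not covered here.

```python
# Illustrative, non-exhaustive set of SharePoint audit operations treated as writes.
WRITE_OPERATIONS = frozenset(
    {"FileModified", "FileUploaded", "FileDeleted", "FileMoved", "FileRenamed"}
)


def classify_records(records, glean_app_ids,
                     app_id_field="ApplicationId", operation_field="Operation"):
    """Return (app_activity, write_activity) for the given Application IDs.

    `records` is an iterable of dicts, e.g. parsed from an audit log export.
    The field names are parameters because they are assumptions.
    """
    app_ids = {app_id.lower() for app_id in glean_app_ids}
    app_activity, write_activity = [], []
    for record in records:
        if str(record.get(app_id_field, "")).lower() not in app_ids:
            continue  # not performed by the crawler's App Registration
        app_activity.append(record)
        if record.get(operation_field) in WRITE_OPERATIONS:
            write_activity.append(record)  # candidate for an alert
    return app_activity, write_activity
```

Any record that lands in `write_activity` is the kind of event an alert policy would be expected to surface.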
Configuration
You will need the Application ID of each of the App Registrations created for the Glean SharePoint crawler in Entra ID to proceed.
Sites.Selected (Graph API)
Glean already requests the minimum permissions required to fetch and update SharePoint content and permissions.
Alternative permissions discussed here will begin to restrict Glean’s ability to function correctly, introducing other risks that will need to be considered. These will be highlighted so that you can make an informed decision.
Be advised that there is currently no risk-free solution to fetching SharePoint content with narrow scopes due to the way Microsoft has structured the minimum permissions required for certain API endpoints. We are confident this will be rectified in time, and encourage all customers to raise this with their Microsoft representative as an area of concern.
Overview
The Sites.Selected permission replaces both the global Sites.FullControl.All and Files.ReadWrite.All permissions for the Graph API.
Sites.Selected still requires the FullControl permission, but unlike before, this is granted at the site level to each site and sub-site individually, rather than at a global level.
While Sites.Selected alleviates the need for global Sites.FullControl.All and Files.ReadWrite.All permissions, it comes with significant drawbacks as a result.
Sites.Selected replaces the following permissions:
- Sites.FullControl.All
- Files.Read.All
- Files.ReadWrite.All
The following permissions are still required alongside Sites.Selected:
- User.Read.All
- GroupMember.Read.All
- Reports.Read.All
- Member.Read.Hidden
- FullControl (site level)
User.Read.All and GroupMember.Read.All are required for Glean to obtain and enforce document and site permissions. Member.Read.Hidden is also used to obtain and enforce document/site permissions for any group member that may have been marked as ‘hidden’ in Entra ID.
Reports.Read.All is used to verify crawling state and progress, and to ensure that your search infrastructure is correctly scaled (SharePoint is one of the largest data sources typically connected to Glean).
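For illustration, one common way to grant a site-level role to an application under Sites.Selected is the Microsoft Graph site permissions endpoint. The sketch below only builds the request and sends nothing. The helper name, site ID, and application ID are placeholders, and the accepted role names reflect our understanding of the Graph API. Follow the Glean setup instructions for Sites.Selected for the authoritative steps.

```python
ALLOWED_ROLES = {"read", "write", "manage", "fullcontrol"}


def site_permission_request(site_id: str, app_id: str, app_display_name: str,
                            role: str = "fullcontrol") -> dict:
    """Describe the Graph request that grants `role` on one site to one application."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return {
        "method": "POST",
        "url": f"https://graph.microsoft.com/v1.0/sites/{site_id}/permissions",
        "json": {
            "roles": [role],
            "grantedToIdentities": [
                {"application": {"id": app_id, "displayName": app_display_name}}
            ],
        },
    }


# Placeholder identifiers; one request is needed per site and per sub-site.
request = site_permission_request(
    "<site-id>", "<application-id>", "Glean SharePoint crawler"
)
```

Because the grant is per site, the request has to be repeated for every site and sub-site that should be available in Glean.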
Comparison Table
 | Sites.FullControl.All | Sites.Selected (with site-specific FullControl) |
---|---|---|
Adheres to Microsoft Best Practices for Apps requiring access to permissions | Yes | Yes |
Supported by Glean | Yes | Yes |
Processing frequency for content updates | <2 hours (can vary) | 24 hours |
Processing frequency for permission-only updates | <2 hours (can vary) | 24 hours |
Availability of Signals & Metadata to enhance Search Rankings | Excellent | Poor |
Automatic detection of sites and sub-sites to scan | Yes | No |
Management Overhead | Low | Very High |
Permission Scope | Wide | Narrow |
Limitations
Using Sites.Selected will heavily impact the end-user experience for both Search and Glean Assistant, and may have other impacts on your security posture.
Glean generally recommends against its use.
🚨 Changes to sites/documents will take up to 24 hours to be reflected in Glean.
- Sites.Selected prevents Glean from leveraging webhook subscriptions to be notified of changes within SharePoint when they occur.
- Without webhook subscriptions, changes in SharePoint will only be reflected in Glean once a scheduled incremental crawl takes place every 24 hours. This includes changes to site and file permissions.
🚨 Search Ranking of SharePoint results will be poor compared to other data sources.
- Sites.Selected prevents Glean from accessing activity and insights data for SharePoint content. This data is used extensively for ranking results in search.
- Without access to activity and insights data, you will notice a significant degradation in search quality for SharePoint results. SharePoint results will typically be ranked lower than results from other data sources where activity data can be accessed.
🚨 You must manually authorize every site AND sub-site that you want to be available in Glean.
- Sites.Selected prevents Glean from discovering sites and site content that need to be fetched.
- Every site and sub-site must be explicitly added to the Sites.Selected permission set by your M365 administrator AND must be added to a list of sites to crawl in the Glean UI.
- Without the former step, Glean will not have the correct permissions to fetch content from a site.
- Without the latter step, Glean has no way of knowing what sites to crawl.
- This creates a high degree of friction and can hinder expansion of Glean within your organization.
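Because a site is only crawled when it appears in both places, it can help to reconcile the two lists periodically. The sketch below is a hypothetical helper, not a Glean feature. It assumes you can export the sites granted under Sites.Selected and the sites entered in the Glean UI as plain lists of URLs.

```python
def reconcile_sites(granted_sites, crawl_list) -> dict:
    """Compare sites granted in M365 with sites listed for crawling in Glean."""

    def norm(url: str) -> str:
        # Assumed normalization: ignore case, whitespace, and trailing slashes.
        return url.strip().rstrip("/").lower()

    granted = {norm(u) for u in granted_sites}
    listed = {norm(u) for u in crawl_list}
    return {
        # Present in both places: Glean can fetch these sites.
        "crawlable": sorted(granted & listed),
        # Listed in Glean but not granted: the crawl would lack permission.
        "missing_permission": sorted(listed - granted),
        # Granted but not listed: Glean does not know to crawl them.
        "missing_from_crawl_list": sorted(granted - listed),
    }
```

The two `missing_*` lists correspond to the two failure cases described above.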
Configuration
The setup process for Sites.Selected differs from the standard setup steps for SharePoint.
Sites.Selected (SharePoint REST API)
Overview
The Sites.Selected permission replaces the global Sites.FullControl.All permission for the SharePoint REST API.
Sites.Selected still requires the FullControl permission, but unlike before, this is granted at the site level to each site and sub-site individually, rather than at a global level.
Sites.Selected replaces the following permissions:
- Sites.FullControl.All
The following permissions are still required alongside Sites.Selected:
- FullControl (site level)
FullControl for the SharePoint REST API is mandatory at some level (either site or global) due to limitations in the SharePoint REST API: it is the minimum permission scope needed to read data from certain API endpoints that are critical for Glean functionality (permissions mapping).
If Read permission is used instead of FullControl, the API returns an HTTP 403 (Forbidden) error, preventing any data from being fetched.
- → More information: About SharePoint Connector Permissions - SharePoint REST API
Comparison Table
 | Sites.FullControl.All | Sites.Selected (with site-specific FullControl) |
---|---|---|
Adheres to Microsoft Best Practices for Apps requiring access to permissions | Yes | Yes |
Supported by Glean | Yes | Yes |
Automatic detection of sites and sub-sites to scan | Yes | No |
Management Overhead | Low | Very High |
Permission Scope | Wide | Narrow |
Limitations
🚨 You must manually authorize every site AND sub-site that you want to be available in Glean.
- Sites.Selected prevents Glean from discovering sites and site content that need to be fetched.
- Every site and sub-site must be explicitly added to the Sites.Selected permission set by your M365 administrator AND must be added to a list of sites to crawl in the Glean UI.
- Without the former step, Glean will not have the correct permissions to fetch content from a site.
- Without the latter step, Glean has no way of knowing what sites to crawl.
- This creates a high degree of friction and can hinder expansion of Glean within your organization.
Configuration
The setup process for Sites.Selected differs from the standard setup steps for SharePoint.