Permission & Security Controls for SharePoint
Available alternatives for when Sites.FullControl.All and/or Files.ReadWrite.All cannot be used in your company environment
Introduction
Glean’s integration with Microsoft SharePoint utilizes both the Microsoft Graph API and SharePoint REST API to index your content. As part of our security practices, Glean requests only the minimum permission scopes required by these APIs to perform necessary operations.
When working with Microsoft’s APIs, the following permission scopes are required for standard read functionality for certain API endpoints:
- Graph API - Sites.FullControl.All:
Required to read permission changes, which ensures Glean only shows content to users who have appropriate access in the source system
- Graph API - Files.ReadWrite.All:
Required to rotate webhook secrets on expiry, enabling Glean to process content changes as they occur
- SharePoint REST API - Sites.FullControl.All:
Required to read site content and fetch role assignments for accurate permission mapping
We recognize that these permission requirements may not align with your organization’s Security Guidelines or Standard Operating Procedures (SOPs).
For more information on why these permissions are necessary, please refer to SharePoint Permission Design & Management at Glean.
This document provides alternative approaches to enable Glean’s integration while addressing various security considerations. Each option includes an analysis of potential limitations and associated risks to help your team make an informed decision based on your specific requirements.
Supported Controls & Alternatives
Some security teams may not be comfortable providing Glean with the Sites.FullControl.All and/or Files.ReadWrite.All Graph API permissions required to integrate with SharePoint.
To constrain the scope of data that the SharePoint crawler has access to and the actions it can perform via the Graph API and SharePoint REST API, there are multiple methods of control that can be leveraged.
Recommended controls:
- Crawling Restrictions
- IP Restrictions (Conditional Access)
- Extended monitoring & alerting via Microsoft Purview
Other supported controls (not recommended):
- Sites.Selected permission for Graph API access
- Sites.Selected permission for SharePoint REST API access
Crawling Restrictions
Content restrictions can adversely affect the usefulness of Glean for your users, because excluded content will not be searchable in Glean despite remaining accessible at the source.
Glean has built-in DLP reporting that can help security teams identify content that is too broadly accessible throughout the organization so that it can be remediated, which can remove the need to restrict content.
Glean supports a variety of configurations for the SharePoint connector that allow your organization to restrict the content that is crawled and brought into the Glean search index.
These restrictions include:
- Greenlist (allowlist) sites to be explicitly crawled
The SharePoint crawler will only fetch content from the site URLs set by your organization.
- Redlist (blocklist) sites to be explicitly excluded from crawling
The SharePoint crawler will fetch content from all SharePoint sites EXCEPT the site URLs set by your organization.
- User Group / Entra ID Group
SharePoint content will only be visible to a user in Glean if they are a member of a specified Entra ID group.
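The effect of these three restrictions can be sketched in a few lines of Python. This is only an illustration of the behavior described above: the `CrawlRestrictions` class, its field names, and the exact-URL matching are assumptions made for the example and do not reflect Glean's actual configuration schema. The sketch also assumes that a greenlist, when present, takes precedence over a redlist.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class CrawlRestrictions:
    """Hypothetical model of the restrictions; not Glean's real config schema."""
    greenlist: Set[str] = field(default_factory=set)  # if non-empty, only these sites are crawled
    redlist: Set[str] = field(default_factory=set)    # these sites are never crawled
    required_group: Optional[str] = None              # Entra ID group needed to see results


def normalize(url: str) -> str:
    """Compare site URLs case-insensitively and without a trailing slash."""
    return url.rstrip("/").lower()


def is_site_crawled(site_url: str, restrictions: CrawlRestrictions) -> bool:
    site = normalize(site_url)
    if restrictions.greenlist:
        # Greenlist: crawl ONLY the listed sites.
        return site in {normalize(u) for u in restrictions.greenlist}
    # Redlist: crawl everything EXCEPT the listed sites.
    return site not in {normalize(u) for u in restrictions.redlist}


def is_visible_to(user_groups: Iterable[str], restrictions: CrawlRestrictions) -> bool:
    """Content is visible only to members of the required group, if one is set."""
    if restrictions.required_group is None:
        return True
    return restrictions.required_group in set(user_groups)
```

For example, with a greenlist containing only an HR site, a Finance site would not be crawled, and so would not appear in Glean even though it remains accessible in SharePoint.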
For more information on the restrictions supported, and how to configure them, please refer to:
IP Restrictions (Conditional Access)
This feature requires a license for Microsoft Entra Workload ID. Please contact your Microsoft representative for more information.
Overview
Using Entra ID Conditional Access, you can restrict usage of the App Registration (and associated secret) created for the Glean SharePoint crawler to only be permitted from the IP range of your Glean tenant.
This involves:
- Adding the static IP address of the NAT gateway used by the crawlers in your Glean tenant as a location within Entra ID.
- Adding a new Conditional Access policy targeting specific source workload identities corresponding to the Service Principal(s) created for the Glean SharePoint crawler.
- Excluding the location created in step 1 from the Conditional Access policy.
- Setting the policy to block as an action.
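The decision this policy makes can be sketched as follows. This is a simplified model of the logic described above, not how Entra ID is implemented. The IP range and the service principal name are placeholders; your real values come from Glean support and your own App Registration.

```python
import ipaddress

# Placeholder values for illustration only.
GLEAN_CRAWLER_RANGES = ["203.0.113.10/32"]                # named location (excluded from the policy)
GLEAN_SERVICE_PRINCIPALS = {"glean-sharepoint-crawler"}   # workload identities the policy targets


def evaluate(service_principal: str, source_ip: str) -> str:
    """Return the outcome of the sketched policy for one token request."""
    if service_principal not in GLEAN_SERVICE_PRINCIPALS:
        # The policy only targets the crawler's service principals.
        return "not applicable"
    address = ipaddress.ip_address(source_ip)
    in_named_location = any(
        address in ipaddress.ip_network(cidr) for cidr in GLEAN_CRAWLER_RANGES
    )
    # The named location is excluded from the policy, so requests from it are
    # not matched; requests from anywhere else hit the block action.
    return "allow" if in_named_location else "block"
```

In this model, a request from the crawler's static IP is allowed, while the same credentials used from any other address are blocked.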
Configuration
For detailed information on how to configure a Conditional Access policy involving a Workload Identity, please refer to the following Microsoft help article: Conditional Access for Workload Identities (learn.microsoft.com)
You will need the static IP address assigned to the crawler infrastructure in your Glean tenant to proceed. As this IP address is different for every Glean deployment, please contact Glean support to obtain this information.
Step 1. Access the Microsoft Entra admin center
- Sign in to the Microsoft Entra admin center with the Conditional Access Administrator role (or above).
- Navigate to the Conditional Access section: Protection > Conditional Access
Step 2. Create a new named location
- In the Conditional Access section, navigate to Named locations.
- Click + IP ranges location at the top to add a new location.
- Provide a name for the location, for example: Glean SharePoint Crawler.
- Click the + button to add a new IP range. Enter the static IP address(es) of your Glean tenant provided by Glean support.
- Click Save to create the location.
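If you prefer to script this step, a named location can also be described as a JSON payload for the Microsoft Graph conditional access API. The sketch below only builds the payload and does not send it. It follows the `ipNamedLocation` resource shape as we understand it, so please verify the field names against Microsoft's documentation before use. The IP address in the usage example is a placeholder.

```python
import ipaddress
import json
from typing import Dict, List


def named_location_payload(display_name: str, addresses: List[str]) -> Dict:
    """Build a request body describing an IP-based named location."""
    ranges = []
    for value in addresses:
        # Accept single addresses ("203.0.113.10") or CIDR ranges ("203.0.113.0/24").
        network = ipaddress.ip_network(value, strict=False)
        odata_type = (
            "#microsoft.graph.iPv4CidrRange"
            if network.version == 4
            else "#microsoft.graph.iPv6CidrRange"
        )
        ranges.append({"@odata.type": odata_type, "cidrAddress": str(network)})
    return {
        "@odata.type": "#microsoft.graph.ipNamedLocation",
        "displayName": display_name,
        "isTrusted": False,
        "ipRanges": ranges,
    }


if __name__ == "__main__":
    # Placeholder address; use the value provided by Glean support.
    print(json.dumps(named_location_payload("Glean SharePoint Crawler", ["203.0.113.10"]), indent=2))
```

The resulting body would typically be sent as a POST to the `identity/conditionalAccess/namedLocations` endpoint of Microsoft Graph by an identity with sufficient Conditional Access permissions.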
Step 3. Create a new Conditional Access policy
- In the Conditional Access section, navigate to Policies.
- Click the New policy button at the top.
- Provide a name for the policy, for example: IP Restrictions - Glean Crawler.
- Under Assignments, select Users or workload identities.
  - Under What does this policy apply to?, select Workload identities from the dropdown.
  - Under Include, choose Select service principals, and select the names of all App Registrations created for the Glean SharePoint crawler.
  If you do not see “workload identities” mentioned, it is likely that you do not have a license for Microsoft Entra Workload ID. Please contact your Microsoft representative for more information.
- Under Target resources > Resources (formerly cloud apps) > Include, select All resources (formerly ‘All cloud apps’).
  The policy applies only when a service principal requests a token.
- Under Network (formerly Locations), set the Configure option to Yes. Under Include, select Any network or location. Under Exclude, select Selected networks and locations, then select the location created earlier. This way the block applies everywhere except the Glean IP range.
- Under Grant, select Block access as an action.
- Under Enable policy, select either Report-only or On.
  - Report-only will not block access, but will record what the policy would have done in the Microsoft Entra sign-in logs. This is useful for checking that the policy works as expected before enforcing it.
  - On will enforce the policy immediately, blocking token requests that do not come from the excluded location.
- Click Create to create the policy.
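For teams that manage Conditional Access as code, the policy described in this step can be approximated as a Microsoft Graph request body. This is a sketch based on our understanding of the `conditionalAccessPolicy` resource for workload identities. It blocks the selected service principals from every location except the excluded Glean named location, matching the overview above. The IDs are placeholders, and the function only builds the payload; verify the schema against Microsoft's documentation before sending it.

```python
import json
from typing import Dict, List


def crawler_policy_payload(
    service_principal_ids: List[str],
    named_location_id: str,
    enforce: bool = False,
) -> Dict:
    """Build a block policy for the crawler's service principals."""
    return {
        "displayName": "IP Restrictions - Glean Crawler",
        # Report-only by default so the policy can be observed before enforcement.
        "state": "enabled" if enforce else "enabledForReportingButNotEnforced",
        "conditions": {
            "applications": {"includeApplications": ["All"]},
            "clientApplications": {
                "includeServicePrincipals": list(service_principal_ids)
            },
            "locations": {
                "includeLocations": ["All"],
                # The Glean crawler location is excluded, so it is never blocked.
                "excludeLocations": [named_location_id],
            },
        },
        "grantControls": {"operator": "OR", "builtInControls": ["block"]},
    }


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    body = crawler_policy_payload(
        ["00000000-0000-0000-0000-000000000001"],
        "00000000-0000-0000-0000-0000000000aa",
    )
    print(json.dumps(body, indent=2))
```

Starting in report-only mode mirrors the recommendation to confirm the policy behaves as expected before switching it to On.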
Step 4. Test the policy
The What If feature can be used to test the policy from different IP addresses to ensure it is working as expected.
- In the Conditional Access section, navigate to Policies.
- Click the What If button at the top.
- For Users or workload identities, select ONE of the service principals created for the Glean SharePoint crawler.
- For the IP address and Country fields, fill in a random IP address and country, e.g. 192.0.2.1 and United States. Alternatively, enter the IP address of your Glean tenant and the country it is hosted in.
- Click What If to test the policy.
  - If an IP address other than the allowlisted Glean IP address is used, you will see the policy created above appear in the Evaluation result section with the Block access action.
  - If the IP address of your Glean tenant is used, the policy is not matched (the action is permitted as it is coming from an allowed IP), and the Evaluation result section will be empty.
Monitoring via Microsoft Purview
Overview
Microsoft Purview can be used to monitor the Glean SharePoint crawler and alert on any attempts to misuse granted permissions to write data back to SharePoint.
Glean recommends configuring monitoring at three levels:
- Application-specific activity
- Endpoint-specific activity
- Write activities
Configuration
You will need the Application ID of each of the App Registrations created for the Glean SharePoint crawler in Entra ID to proceed.
Step 1. Access the Microsoft Purview Compliance Portal
- Sign in to the Microsoft Purview Compliance Portal.
- Navigate to Solutions > Audit > Search.
- If prompted, select Start recording user and admin activity. This is required for logs to be captured.
Step 2. Review audit logs for the SharePoint crawler
Step 3. Monitor specific API-endpoints
- Review the list of API-endpoints that the Glean SharePoint crawler is configured to use.
- For each API-endpoint, cross-reference the Purview audit logs for activities related to these endpoints.
  - Verify that all activities are read-only operations.
  - The Keyword search field can be used to search for specific endpoints.
Focus on any API-endpoints that are marked as requiring a ReadWrite or FullControl permission, e.g. https://graph.microsoft.com/v1.0/subscriptions/<subscriptionsId>/reauthorize
An HTTP POST activity is not necessarily indicative of a write operation. Some API-endpoints expect an HTTP POST method to fetch data instead of a GET request. Additionally, a POST request is required for Glean to create and reauthorize webhook subscriptions.
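When reviewing many requests, it can help to triage them mechanically before looking at each one. The sketch below sorts an observed request into "read", "expected", or "review". The only expected non-GET calls it knows about are the webhook subscription calls mentioned above; any other POST endpoints that Glean uses to fetch data would need to be added from Glean's endpoint list. The function and constant names are illustrative.

```python
import re

# Non-GET requests that are expected from the crawler. Only the webhook
# subscription calls named in this document are listed; extend this from
# the list of API-endpoints the Glean SharePoint crawler uses.
EXPECTED_NON_GET = [
    ("POST", re.compile(r"^https://graph\.microsoft\.com/v1\.0/subscriptions$")),
    ("POST", re.compile(r"^https://graph\.microsoft\.com/v1\.0/subscriptions/[^/]+/reauthorize$")),
]


def classify(method: str, url: str) -> str:
    """Triage one observed request as 'read', 'expected', or 'review'."""
    method = method.upper()
    path = url.split("?", 1)[0].rstrip("/")  # ignore query string and trailing slash
    if method == "GET":
        return "read"
    for expected_method, pattern in EXPECTED_NON_GET:
        if method == expected_method and pattern.match(path):
            return "expected"
    # Anything else needs a human to compare it against Glean's endpoint list.
    return "review"
```

A result of "review" does not mean the request was a write. It only means the request was not a GET and is not on the expected list, so it should be checked by hand.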
Step 4. Configure alerts for Write activities
- In the Purview portal, navigate to Solutions > Compliance alerts.
- Click the View alert policies in Defender button. This will open the Microsoft 365 Defender portal where you will be able to configure new alerts.
- Create a new alert policy for each high-risk write activity you wish to monitor.
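Alongside portal alerts, an exported audit log can be scanned for write activity attributed to the crawler. The sketch below assumes each exported record has already been reduced to a dictionary with `app_id` and `operation` keys; these key names are hypothetical, and the real export format will need to be mapped onto them. The operation names are examples of SharePoint file activities and should be checked against the activity names shown in your tenant.

```python
from typing import Dict, Iterable, List, Set

# Example write-type activity names; confirm against your own audit data.
WRITE_OPERATIONS: Set[str] = {
    "FileModified",
    "FileUploaded",
    "FileDeleted",
    "FileMoved",
    "FileRenamed",
}


def find_write_activity(
    records: Iterable[Dict[str, str]],
    crawler_app_ids: Set[str],
) -> List[Dict[str, str]]:
    """Return records where a crawler App Registration performed a write-type operation."""
    flagged = []
    for record in records:
        is_crawler = record.get("app_id") in crawler_app_ids
        is_write = record.get("operation") in WRITE_OPERATIONS
        if is_crawler and is_write:
            flagged.append(record)
    return flagged
```

Because the crawler is expected to be read-only, any record returned by this function is worth investigating.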
Sites.Selected (Graph API)
Glean already requests the minimum permissions required to fetch and update SharePoint content and permissions.
Alternative permissions discussed here will begin to restrict Glean’s ability to function correctly, introducing other risks that will need to be considered. These will be highlighted so that you can make an informed decision.
Be advised that there is currently no risk-free solution to fetching SharePoint content with narrow scopes due to the way Microsoft has structured the minimum permissions required for certain API endpoints. We are confident this will be rectified in time, and encourage all customers to raise this with their Microsoft representative as an area of concern.
Overview
The Sites.Selected permission replaces both the global Sites.FullControl.All and Files.ReadWrite.All permissions for the Graph API.
Sites.Selected still requires the FullControl permission, but unlike before, this is granted at the site level to each site and sub-site individually, rather than at a global level.
While Sites.Selected alleviates the need for global Sites.FullControl.All and Files.ReadWrite.All permissions, it comes with significant drawbacks as a result.
Sites.Selected replaces the following permissions:
- Sites.FullControl.All
- Files.Read.All
- Files.ReadWrite.All
The following permissions are still required alongside Sites.Selected:
- User.Read.All
- GroupMember.Read.All
- Reports.Read.All
- Member.Read.Hidden
- FullControl (site level)
User.Read.All and GroupMember.Read.All are required for Glean to obtain and enforce document and site permissions. Member.Read.Hidden is also used to obtain and enforce document/site permissions for any group member that may have been marked as ‘hidden’ in Entra ID.
Reports.Read.All is used to verify crawling state and progress, and to ensure that your search infrastructure is correctly scaled (SharePoint is one of the largest data sources typically connected to Glean).
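When switching an existing App Registration to Sites.Selected, it can be useful to compare the permissions currently granted against the lists above. The following sketch does that with plain set operations. The permission names come from this section; the function name and return shape are illustrative.

```python
from typing import Dict, Iterable, List

# Graph API permissions needed when using Sites.Selected (from the lists above).
# Site-level FullControl is granted per site and is not part of this check.
REQUIRED_WITH_SITES_SELECTED = {
    "Sites.Selected",
    "User.Read.All",
    "GroupMember.Read.All",
    "Reports.Read.All",
    "Member.Read.Hidden",
}

# Global permissions that Sites.Selected replaces.
REPLACED_BY_SITES_SELECTED = {
    "Sites.FullControl.All",
    "Files.Read.All",
    "Files.ReadWrite.All",
}


def review_grants(granted: Iterable[str]) -> Dict[str, List[str]]:
    """Report which permissions are missing and which can be removed."""
    granted_set = set(granted)
    return {
        "missing": sorted(REQUIRED_WITH_SITES_SELECTED - granted_set),
        "no_longer_needed": sorted(REPLACED_BY_SITES_SELECTED & granted_set),
    }
```

For instance, an app that still holds Sites.FullControl.All but lacks Sites.Selected would show one entry in each list.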
Comparison Table
Criterion | Sites.FullControl.All | Sites.Selected (with site-specific FullControl) |
---|---|---|
Adheres to Microsoft Best Practices for Apps requiring access to permissions | Yes | Yes |
Supported by Glean | Yes | Yes |
Processing frequency for content updates | <2 hours (can vary) | 24 hours |
Processing frequency for permission-only updates | <2 hours (can vary) | 24 hours |
Availability of Signals & Metadata to enhance Search Rankings | Excellent | Poor |
Automatic detection of sites and sub-sites to scan | Yes | No |
Management Overhead | Low | Very High |
Permission Scope | Wide | Narrow |
Limitations
Using Sites.Selected will heavily impact the end-user experience for both Search and Glean Assistant, and may have other impacts on your security posture.
Glean generally recommends against its use.
🚨 Changes to sites/documents will take up to 24 hours to be reflected in Glean.
- Sites.Selected prevents Glean from leveraging webhook subscriptions to be notified of changes within SharePoint when they occur.
- Without webhook subscriptions, changes in SharePoint will only be reflected in Glean once a scheduled incremental crawl takes place every 24 hours. This includes changes to site and file permissions.
🚨 Search Ranking of SharePoint results will be poor compared to other data sources.
- Sites.Selected prevents Glean from accessing activity and insights data for SharePoint content. This data is used extensively for ranking results in search.
- Without access to activity and insights data, you will notice a significant degradation in search quality for SharePoint results. SharePoint results will typically be ranked lower than results from other data sources where activity data is able to be accessed.
🚨 You must manually authorize every site AND sub-site that you want to be available in Glean.
- Sites.Selected prevents Glean from discovering sites and site content that need to be fetched.
- Every site and sub-site must be explicitly added to the Sites.Selected permission set by your M365 administrator AND must be added to a list of sites to crawl in the Glean UI.
  - Without the former step, Glean will not have the correct permissions to fetch content from a site.
  - Without the latter step, Glean has no way of knowing what sites to crawl.
- This creates a high degree of friction and can hinder expansion of Glean within your organization.
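Because both steps are required for every site, the two lists can drift apart over time. The sketch below compares them and reports which sites are fully set up and which are missing one of the two steps. The function name and the exact-URL comparison are assumptions for illustration; in practice the authorized list would come from your M365 administrator and the crawl list from the Glean UI.

```python
from typing import Dict, Iterable, List


def _normalize(url: str) -> str:
    """Compare site URLs case-insensitively and without a trailing slash."""
    return url.rstrip("/").lower()


def reconcile_sites(
    authorized_sites: Iterable[str],
    glean_crawl_list: Iterable[str],
) -> Dict[str, List[str]]:
    """Compare sites granted via Sites.Selected with sites listed for crawling."""
    authorized = {_normalize(u) for u in authorized_sites}
    configured = {_normalize(u) for u in glean_crawl_list}
    return {
        # Both steps done: Glean can fetch content.
        "crawlable": sorted(authorized & configured),
        # Listed in Glean but not authorized: fetches will fail.
        "missing_permission": sorted(configured - authorized),
        # Authorized but not listed in Glean: never crawled.
        "not_in_crawl_list": sorted(authorized - configured),
    }
```

Only sites in the "crawlable" group will appear in Glean; the other two groups each indicate one missing step.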
Configuration
The setup process for Sites.Selected differs from the standard setup steps for SharePoint.
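As an illustration of what granting site-level access involves, the sketch below builds the kind of Microsoft Graph request an administrator might use to give an application the FullControl role on a single site. It reflects our understanding of the Graph site permissions endpoint and is not Glean's setup guide; the site ID, application ID, and display name are placeholders, and the function does not send the request.

```python
from typing import Dict


def site_grant_request(site_id: str, app_id: str, app_display_name: str) -> Dict:
    """Describe a request that grants one application FullControl on one site."""
    return {
        "method": "POST",
        "url": f"https://graph.microsoft.com/v1.0/sites/{site_id}/permissions",
        "body": {
            # Site-level role; FullControl is what this document says is required.
            "roles": ["fullcontrol"],
            "grantedToIdentities": [
                {"application": {"id": app_id, "displayName": app_display_name}}
            ],
        },
    }
```

One such request is needed for every site and sub-site that should be available in Glean, which is the main source of the management overhead noted in the comparison table.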
Sites.Selected (SharePoint REST API)
Overview
The Sites.Selected permission replaces the global Sites.FullControl.All permission for the SharePoint REST API.
Sites.Selected still requires the FullControl permission, but unlike before, this is granted at the site level to each site and sub-site individually, rather than at a global level.
Sites.Selected replaces the following permissions:
- Sites.FullControl.All
The following permissions are still required alongside Sites.Selected:
- FullControl (site level)
FullControl for the SharePoint REST API is mandatory at some level (either site or global) due to limitations in the SharePoint REST API: it is the minimum permission scope needed to read data from certain API endpoints that are critical for Glean functionality (permissions mapping).
If Read permission is used instead of FullControl, the API returns an HTTP 403 (Forbidden) error, preventing any data from being fetched.
- → More information: About SharePoint Connector Permissions - SharePoint REST API
Comparison Table
Criterion | Sites.FullControl.All | Sites.Selected (with site-specific FullControl) |
---|---|---|
Adheres to Microsoft Best Practices for Apps requiring access to permissions | Yes | Yes |
Supported by Glean | Yes | Yes |
Automatic detection of sites and sub-sites to scan | Yes | No |
Management Overhead | Low | Very High |
Permission Scope | Wide | Narrow |
Limitations
🚨 You must manually authorize every site AND sub-site that you want to be available in Glean.
- Sites.Selected prevents Glean from discovering sites and site content that need to be fetched.
- Every site and sub-site must be explicitly added to the Sites.Selected permission set by your M365 administrator AND must be added to a list of sites to crawl in the Glean UI.
  - Without the former step, Glean will not have the correct permissions to fetch content from a site.
  - Without the latter step, Glean has no way of knowing what sites to crawl.
- This creates a high degree of friction and can hinder expansion of Glean within your organization.
Configuration
The setup process for Sites.Selected differs from the standard setup steps for SharePoint.