Introduction
Glean’s integration with Microsoft SharePoint uses both the Microsoft Graph API and the SharePoint REST API to index your content. As part of our security practices, Glean requests only the minimum permission scopes required by these APIs to perform necessary operations. When working with Microsoft’s APIs, the following permission scopes are required for standard read functionality on certain API endpoints:
- Graph API - Sites.FullControl.All: Required to read permission changes, which ensures Glean only shows content to users who have appropriate access in the source system
- Graph API - Files.ReadWrite.All: Required to rotate webhook secrets on expiry, enabling Glean to process content changes as they occur
- SharePoint REST API - Sites.FullControl.All: Required to read site content and fetch role assignments for accurate permission mapping
Supported Controls & Alternatives
Some security teams may not be comfortable granting Glean the Sites.FullControl.All and/or Files.ReadWrite.All Graph API permissions required to integrate with SharePoint.
To constrain the scope of data that the SharePoint crawler has access to and the actions it can perform via the Graph API and SharePoint REST API, there are multiple methods of control that can be leveraged.
Recommended controls:
- Crawling Restrictions
- IP Restrictions (Conditional Access)
- Extended monitoring & alerting via Microsoft Purview
Alternatives:
- Sites.Selected permission for Graph API access
- Sites.Selected permission for SharePoint REST API access
Crawling Restrictions
Content restrictions can reduce the usefulness of Glean for your users: excluded content will not be searchable in Glean, even though it remains accessible at the source.
- Greenlist (allowlist) sites to be explicitly crawled: The SharePoint crawler will only fetch content from the site URLs set by your organization.
- Redlist (blocklist) sites to be explicitly excluded from crawling: The SharePoint crawler will fetch content from all SharePoint sites EXCEPT the site URLs set by your organization.
- User Group / Entra ID Group: SharePoint content will only be visible to a user in Glean if they are a member of a specified Entra ID group.
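The greenlist and redlist behavior described above can be sketched as a simple URL filter. This is an illustration only, not Glean's implementation: the function names are hypothetical, and the choices to match sub-sites by URL prefix and to let the greenlist take precedence when both lists are set are assumptions.

```python
from urllib.parse import urlparse


def normalize(url: str) -> str:
    """Lower-case host and path and drop any trailing slash."""
    parsed = urlparse(url.strip())
    return f"{parsed.netloc.lower()}{parsed.path.rstrip('/').lower()}"


def is_under(site: str, root: str) -> bool:
    """True if `site` is `root` itself or a path beneath it (assumed rule)."""
    return site == root or site.startswith(root + "/")


def should_crawl(site_url, allowlist=None, blocklist=None) -> bool:
    """Decide whether a site would be crawled under the configured lists."""
    site = normalize(site_url)
    if allowlist:
        # Greenlist: only explicitly listed sites are fetched.
        return any(is_under(site, normalize(u)) for u in allowlist)
    if blocklist:
        # Redlist: everything is fetched except the listed sites.
        return not any(is_under(site, normalize(u)) for u in blocklist)
    return True  # No restriction configured.
```

For example, `should_crawl("https://contoso.sharepoint.com/sites/HR/", allowlist=["https://contoso.sharepoint.com/sites/hr"])` evaluates to `True` under these assumptions, while a site outside the greenlist does not.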
IP Restrictions (Conditional Access)
This feature requires a license for Microsoft Entra Workload ID. Please contact your Microsoft representative for more information.
Overview
Using Entra ID Conditional Access, you can restrict usage of the App Registration (and associated secret) created for the Glean SharePoint crawler so that it is only permitted from the IP range of your Glean tenant. This involves:
- Adding the static IP address of the NAT gateway used by the crawlers in your Glean tenant as a location within Entra ID.
- Adding a new Conditional Access policy targeting specific source workload identities corresponding to the Service Principal(s) created for the Glean SharePoint crawler.
- Excluding the location created in step 1 from the Conditional Access policy.
- Setting the policy to block as an action.
Configuration
Step 1. Access the Microsoft Entra admin center
- Sign in to the Microsoft Entra admin center with the Conditional Access Administrator role (or above).
- Navigate to the Conditional Access section: Protection > Conditional Access

Step 2. Create a new named location
- In the Conditional Access section, navigate to Named locations.
- Click + IP ranges location at the top to add a new location.
- Provide a name for the location, for example: Glean SharePoint Crawler.
- Click the + button to add a new IP range. Enter the static IP address(es) of your Glean tenant provided by Glean support.
- Click Save to create the location.
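If you prefer to script this step, the same named location can be expressed as a Microsoft Graph request body. The sketch below only builds the JSON payload and sends nothing. The helper name and the example IP address are hypothetical, and it assumes the payload is posted to the Graph `identity/conditionalAccess/namedLocations` endpoint by an identity allowed to manage Conditional Access.

```python
import ipaddress
import json


def build_named_location(display_name, addresses):
    """Build an ipNamedLocation body from IP addresses or CIDR ranges."""
    ranges = []
    for addr in addresses:
        # A bare address becomes a /32 (IPv4) or /128 (IPv6) range.
        network = ipaddress.ip_network(addr, strict=False)
        odata_type = (
            "#microsoft.graph.iPv4CidrRange"
            if network.version == 4
            else "#microsoft.graph.iPv6CidrRange"
        )
        ranges.append({"@odata.type": odata_type, "cidrAddress": str(network)})
    return {
        "@odata.type": "#microsoft.graph.ipNamedLocation",
        "displayName": display_name,
        "isTrusted": False,
        "ipRanges": ranges,
    }


if __name__ == "__main__":
    # 203.0.113.10 is a documentation address; use the IPs from Glean support.
    body = build_named_location("Glean SharePoint Crawler", ["203.0.113.10"])
    print(json.dumps(body, indent=2))
```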

Step 3. Create a new Conditional Access policy
- In the Conditional Access section, navigate to Policies.
- Click the New policy button at the top.
- Provide a name for the policy, for example: IP Restrictions - Glean Crawler.
- Under Assignments, select Users or workload identities.
- Under What does this policy apply to?, select Workload identities from the dropdown.
- Under Include, choose Select service principals, and select the names of all App Registrations created for the Glean SharePoint crawler.

- Under Target resources > Resources (formerly cloud apps) > Include, select All resources (formerly ‘All cloud apps’).

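- Under Network (formerly Locations), set the Configure option to Yes. Under Include, select Any network or location. Under Exclude, select Selected networks and locations, then select the location created earlier, so that the block applies to every IP address except those of your Glean tenant.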

- Under Grant, select Block access as an action.

- Under Enable policy, select either Report-only or On.
  - Report-only will not block access, but will record what the policy would have done for each request in the Microsoft Entra sign-in logs. This is useful for confirming that the policy works as expected before enforcing it.
  - On enforces the policy immediately, blocking access from any IP address outside the named location.

- Click Create to create the policy.
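For teams that manage Conditional Access as code, the policy outlined in the overview (target the crawler's service principals, exclude the Glean named location, block everything else) might look like the following Graph request body. This is a sketch under assumptions: the helper name and the placeholder IDs are hypothetical, and the payload is only constructed here, not submitted.

```python
import json


def build_policy(name, service_principal_ids, glean_location_id, enforce=False):
    """Build a Conditional Access policy body for workload identities."""
    return {
        "displayName": name,
        # Report-only first; switch to "enabled" once the results look right.
        "state": "enabled" if enforce else "enabledForReportingButNotEnforced",
        "conditions": {
            "clientApplications": {
                "includeServicePrincipals": list(service_principal_ids)
            },
            "applications": {"includeApplications": ["All"]},
            "locations": {
                "includeLocations": ["All"],
                # The Glean named location is excluded, so it is never blocked.
                "excludeLocations": [glean_location_id],
            },
        },
        "grantControls": {"operator": "OR", "builtInControls": ["block"]},
    }


if __name__ == "__main__":
    policy = build_policy(
        "IP Restrictions - Glean Crawler",
        ["<service-principal-object-id>"],  # placeholder
        "<named-location-id>",  # placeholder
    )
    print(json.dumps(policy, indent=2))
```

Defaulting to report-only mirrors the recommendation above to check the policy before enforcing it.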
Step 4. Test the policy
The What If feature can be used to test the policy from different IP addresses to ensure it is working as expected.
- In the Conditional Access section, navigate to Policies.
- Click the What If button at the top.
- For Users or workload identities, select ONE of the service principals created for the Glean SharePoint crawler.
- For the IP address and Country fields, fill in a random IP address and country, e.g. 192.0.2.1 and United States. Alternatively, enter the IP address of your Glean tenant and the country it is hosted in.
- Click What If to test the policy.
- If an IP address other than the allowlisted Glean IP address is used, you will see the policy created above appear in the Evaluation result section with the Block access action.
- If the IP address of your Glean tenant is used, the policy is not matched (the action is permitted as it is coming from an allowed IP), and the Evaluation result section will be empty.
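The two What If outcomes above follow from a simple rule: the policy blocks a targeted service principal unless the request comes from the excluded Glean range. The sketch below is a simplified local model of that rule for reasoning about expected results, not how Entra ID evaluates policies internally. All names and addresses are hypothetical.

```python
import ipaddress


def evaluate(policy_targets, allowed_ranges, service_principal, source_ip):
    """Return "block" or "not applied" for a simulated sign-in."""
    if service_principal not in policy_targets:
        return "not applied"  # The policy does not target this identity.
    ip = ipaddress.ip_address(source_ip)
    for cidr in allowed_ranges:
        if ip in ipaddress.ip_network(cidr, strict=False):
            return "not applied"  # Request comes from the excluded location.
    return "block"


if __name__ == "__main__":
    targets = {"glean-sharepoint-crawler"}
    glean_ranges = ["203.0.113.10/32"]  # documentation address, not a real one
    print(evaluate(targets, glean_ranges, "glean-sharepoint-crawler", "192.0.2.1"))
    print(evaluate(targets, glean_ranges, "glean-sharepoint-crawler", "203.0.113.10"))
```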

Monitoring via Microsoft Purview
Overview
Microsoft Purview can be used to monitor the Glean SharePoint crawler and alert on any attempts to misuse granted permissions to write data back to SharePoint. Glean recommends configuring monitoring at three levels:
- Application-specific activity
- Endpoint-specific activity
- Write activities
Configuration
Step 1. Access the Microsoft Purview Compliance Portal
- Sign in to the Microsoft Purview Compliance Portal.
- Navigate to Solutions > Audit > Search.
- If prompted, select Start recording user and admin activity. This is required for logs to be captured.
Step 2. Review audit logs for the SharePoint crawler
Step 3. Monitor specific API-endpoints
- Review the list of API endpoints that the Glean SharePoint crawler is configured to use.
- For each API endpoint, cross-reference the Purview audit logs for activities related to that endpoint.
  - Verify that all activities are read-only operations.
  - The Keyword search field can be used to search for specific endpoints.
An HTTP POST activity is not necessarily indicative of a write operation. Some API endpoints expect an HTTP POST method to fetch data instead of a GET request. Additionally, a POST request is required for Glean to create and reauthorize webhook subscriptions.
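Because the HTTP method alone is not a reliable signal, a review of exported audit records is better keyed on the recorded operation. The sketch below assumes records have been exported from Purview and parsed into dictionaries with `Operation` and `ApplicationId` fields. The field names, the list of write operations, and the function name are illustrative assumptions to adapt to your export.

```python
# Operation names treated as writes in this sketch (illustrative, not exhaustive).
WRITE_OPERATIONS = {
    "FileModified",
    "FileUploaded",
    "FileDeleted",
    "FileMoved",
    "FileRenamed",
}


def find_write_activity(records, crawler_app_id):
    """Return audit records where the crawler's app performed a write."""
    return [
        record
        for record in records
        if record.get("ApplicationId") == crawler_app_id
        and record.get("Operation") in WRITE_OPERATIONS
    ]


if __name__ == "__main__":
    sample = [
        {"Operation": "FileAccessed", "ApplicationId": "app-1"},
        {"Operation": "FileModified", "ApplicationId": "app-1"},
        {"Operation": "FileModified", "ApplicationId": "other-app"},
    ]
    for hit in find_write_activity(sample, "app-1"):
        print("Unexpected write:", hit)
```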
Step 4. Configure alerts for Write activities
- In the Purview portal, navigate to Solutions > Compliance alerts.
- Click the View alert policies in Defender button. This opens the Microsoft 365 Defender portal, where you can configure new alerts.
- Create a new alert policy for each high-risk write activity you wish to monitor.
Sites.Selected (Graph API)
Overview
The Sites.Selected permission replaces both the global Sites.FullControl.All and Files.ReadWrite.All permissions for the Graph API.
Sites.Selected still requires the FullControl permission, but unlike before, this is granted at a site-level to each site and sub-site individually, rather than at a global level.
While Sites.Selected alleviates the need for global Sites.FullControl.All and Files.ReadWrite.All permissions, it comes with significant drawbacks as a result.
Sites.Selected replaces the following permissions:
- Sites.FullControl.All
- Files.Read.All
- Files.ReadWrite.All
Permissions required when using Sites.Selected:
- User.Read.All
- GroupMember.Read.All
- Reports.Read.All
- Member.Read.Hidden
- FullControl (site level)
User.Read.All and GroupMember.Read.All are required for Glean to obtain and enforce document and site permissions. Member.Read.Hidden is also used to obtain and enforce document/site permissions for any group member that may have been marked as ‘hidden’ in Entra ID.
Reports.Read.All is used to verify crawling state and progress, and to ensure that your search infrastructure is correctly scaled (SharePoint is one of the largest data sources typically connected to Glean).
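As an illustration of what a site-level FullControl grant involves, the sketch below builds the request an administrator might send to the Microsoft Graph site permissions endpoint for one site. It only constructs the URL and body and makes no network call. The helper names and placeholder IDs are hypothetical, and the `fullcontrol` role value is an assumption to verify against current Microsoft documentation and Glean's Sites.Selected setup guide.

```python
import json


def site_permissions_url(site_id):
    """URL of the Graph permissions collection for a single site."""
    return f"https://graph.microsoft.com/v1.0/sites/{site_id}/permissions"


def build_site_grant(app_id, app_display_name, role="fullcontrol"):
    """Body granting one application a role on one site."""
    return {
        "roles": [role],
        "grantedToIdentities": [
            {"application": {"id": app_id, "displayName": app_display_name}}
        ],
    }


if __name__ == "__main__":
    # Placeholders: substitute your own site ID and App Registration client ID.
    print(site_permissions_url("<site-id>"))
    print(json.dumps(build_site_grant("<app-client-id>", "Glean SharePoint"), indent=2))
```

Since the grant is per site, one such request is needed for every site and sub-site that Glean should index.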
Comparison Table
| | Sites.FullControl.All | Sites.Selected (with site-specific FullControl) |
|---|---|---|
| Adheres to Microsoft Best Practices for Apps requiring access to permissions | Yes | Yes |
| Supported by Glean | Yes | Yes |
| Processing frequency for content updates | <2 hours (can vary) | 24 hours |
| Processing frequency for permission-only updates | <2 hours (can vary) | 24 hours |
| Availability of Signals & Metadata to enhance Search Rankings | Excellent | Poor |
| Automatic detection of sites and sub-sites to scan | Yes | No |
| Management Overhead | Low | Very High |
| Permission Scope | Wide | Narrow |
Limitations
🚨 Changes to sites/documents will take up to 24 hours to be reflected in Glean.
- Sites.Selected prevents Glean from leveraging webhook subscriptions to be notified of changes within SharePoint when they occur.
  - Without webhook subscriptions, changes in SharePoint will only be reflected in Glean once a scheduled incremental crawl takes place every 24 hours. This includes changes to site and file permissions.
- Sites.Selected prevents Glean from accessing activity and insights data for SharePoint content. This data is used extensively for ranking results in search.
  - Without access to activity and insights data, you will notice a significant degradation in search quality for SharePoint results. SharePoint results will typically be ranked lower than results from other data sources where activity data can be accessed.
- Sites.Selected prevents Glean from discovering sites and site content that need to be fetched.
  - Every site and sub-site must be explicitly added to the Sites.Selected permission set by your M365 administrator AND must be added to a list of sites to crawl in the Glean UI.
    - Without the former step, Glean will not have the correct permissions to fetch content from a site.
    - Without the latter step, Glean has no way of knowing what sites to crawl.
  - This creates a high degree of friction and can hinder expansion of Glean within your organization.
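Because each site has to appear in two places, it can help to compare the two lists periodically. The sketch below assumes you can export both lists as site URLs. The function and key names are hypothetical, and the URL normalization is a simplifying assumption.

```python
def reconcile(granted_sites, configured_sites):
    """Compare sites granted via Sites.Selected with sites listed in Glean."""
    granted = {url.rstrip("/").lower() for url in granted_sites}
    configured = {url.rstrip("/").lower() for url in configured_sites}
    return {
        # Listed in Glean but not granted: Glean cannot fetch the content.
        "missing_permission": sorted(configured - granted),
        # Granted but not listed in Glean: Glean does not know to crawl it.
        "not_configured_in_glean": sorted(granted - configured),
    }


if __name__ == "__main__":
    report = reconcile(
        ["https://contoso.sharepoint.com/sites/hr"],
        [
            "https://contoso.sharepoint.com/sites/HR/",
            "https://contoso.sharepoint.com/sites/eng",
        ],
    )
    print(report)
```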
Configuration
The setup process for Sites.Selected differs from the standard setup steps for SharePoint.
Sites.Selected (SharePoint REST API)
Overview
The Sites.Selected permission replaces the global Sites.FullControl.All permission for the SharePoint REST API.
Sites.Selected still requires the FullControl permission, but unlike before, this is granted at a site-level to each site and sub-site individually, rather than at a global level.
Sites.Selected replaces the following permissions:
- Sites.FullControl.All
Permissions required when using Sites.Selected:
- FullControl (site level)
FullControl for the SharePoint REST API is mandatory at some level (either site or global) due to limitations in the SharePoint REST API: It is the minimum permission scope needed to read data from certain API endpoints that are critical for Glean functionality (permissions mapping).
If the Read permission is used instead of FullControl, the API returns an HTTP 403 Forbidden error, preventing any data from being fetched.
- → More information: About SharePoint Connector Permissions - SharePoint REST API
Comparison Table
| | Sites.FullControl.All | Sites.Selected (with site-specific FullControl) |
|---|---|---|
| Adheres to Microsoft Best Practices for Apps requiring access to permissions | Yes | Yes |
| Supported by Glean | Yes | Yes |
| Automatic detection of sites and sub-sites to scan | Yes | No |
| Management Overhead | Low | Very High |
| Permission Scope | Wide | Narrow |
Limitations
🚨 You must manually authorize every site AND sub-site that you want to be available in Glean.
- Sites.Selected prevents Glean from discovering sites and site content that need to be fetched.
  - Every site and sub-site must be explicitly added to the Sites.Selected permission set by your M365 administrator AND must be added to a list of sites to crawl in the Glean UI.
    - Without the former step, Glean will not have the correct permissions to fetch content from a site.
    - Without the latter step, Glean has no way of knowing what sites to crawl.
  - This creates a high degree of friction and can hinder expansion of Glean within your organization.
Configuration
The setup process for Sites.Selected differs from the standard setup steps for SharePoint.
