Introduction

Glean’s integration with Microsoft SharePoint utilizes both the Microsoft Graph API and SharePoint REST API to index your content. As part of our security practices, Glean requests only the minimum permission scopes required by these APIs to perform necessary operations.

When working with Microsoft’s APIs, the following permission scopes are required for standard read functionality for certain API endpoints:

  • Graph API - Sites.FullControl.All:
    Required to read permission changes, which ensures Glean only shows content to users who have appropriate access in the source system
  • Graph API - Files.ReadWrite.All:
    Required to rotate webhook secrets on expiry, enabling Glean to process content changes as they occur
  • SharePoint REST API - Sites.FullControl.All:
    Required to read site content and fetch role assignments for accurate permission mapping

We recognize that these permission requirements may not align with your organization’s Security Guidelines or Standard Operating Procedures (SOPs).

For more information on why these permissions are necessary, please refer to SharePoint Permission Design & Management at Glean.

This document provides alternative approaches to enable Glean’s integration while addressing various security considerations. Each option includes an analysis of potential limitations and associated risks to help your team make an informed decision based on your specific requirements.


Supported Controls & Alternatives

Some security teams may not be comfortable granting Glean the Sites.FullControl.All and/or Files.ReadWrite.All Graph API permissions required to integrate with SharePoint.

To constrain the scope of data that the SharePoint crawler has access to and the actions it can perform via the Graph API and SharePoint REST API, there are multiple methods of control that can be leveraged.

Recommended controls:

  • Crawling Restrictions
  • IP Restrictions (Conditional Access)
  • Extended monitoring & alerting via Microsoft Purview

Other supported controls (not recommended):

  • Sites.Selected permission for Graph API access
  • Sites.Selected permission for SharePoint REST API access

Crawling Restrictions

Content restrictions can reduce the usefulness of Glean for your users: excluded content will not be searchable in Glean, even though it remains accessible at the source.

Glean has built-in DLP reporting that helps security teams identify content that is too broadly accessible throughout the organization so that it can be remediated, which can remove the need to restrict content at all.

Glean supports a variety of configurations for the SharePoint connector that allow your organization to restrict the content that is crawled and brought into the Glean search index.

These restrictions include:

  • Greenlist (allowlist) sites to be explicitly crawled
    The SharePoint crawler will only fetch content from the site URLs set by your organization.
  • Redlist (blocklist) sites to be explicitly excluded from crawling
    The SharePoint crawler will fetch content from all SharePoint sites EXCEPT the site URLs set by your organization.
  • User Group / Entra ID Group
    SharePoint content will only be visible to a user in Glean if they are a member of a specified Entra ID group.
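
The greenlist and redlist rules above can be sketched as a simple scope check. This is a minimal illustration, not Glean's implementation: the function name, the exact-match comparison, and the rule that a non-empty greenlist takes precedence over the redlist are all assumptions made for the example.

```python
from urllib.parse import urlsplit


def _normalize(url: str) -> str:
    """Lowercase the host and path and drop any trailing slash for comparison."""
    parts = urlsplit(url.strip())
    return f"{parts.netloc.lower()}{parts.path.rstrip('/').lower()}"


def should_crawl(site_url, greenlist=(), redlist=()):
    """Decide whether a site is in scope (hypothetical logic, for illustration).

    Assumption: if a greenlist is set, only listed sites are crawled;
    otherwise every site is crawled except those on the redlist.
    """
    site = _normalize(site_url)
    if greenlist:
        return site in {_normalize(u) for u in greenlist}
    return site not in {_normalize(u) for u in redlist}
```

With a greenlist containing only the HR site, a Finance site would be skipped; with the same URL used as a redlist, the outcome is reversed.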

For more information on the restrictions supported, and how to configure them, please refer to:


IP Restrictions (Conditional Access)

This feature requires a license for Microsoft Entra Workload ID. Please contact your Microsoft representative for more information.

Overview

Using Entra ID Conditional Access, you can restrict usage of the App Registration (and associated secret) created for the Glean SharePoint crawler to only be permitted from the IP range of your Glean tenant.

This involves:

  1. Adding the static IP address of the NAT gateway used by the crawlers in your Glean tenant as a location within Entra ID.
  2. Adding a new Conditional Access policy targeting specific source workload identities corresponding to the Service Principal(s) created for the Glean SharePoint crawler.
  3. Excluding the location created in step 1 from the Conditional Access policy.
  4. Setting the policy’s action to block.
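
For teams that prefer to script these steps, the sketch below builds the two request bodies that the Microsoft Graph Conditional Access endpoints accept (a named location and a policy). It only constructs the payloads and sends nothing. The display names, the example IP address, and the placeholder IDs are assumptions; the policy is created in report-only mode as a cautious default, and the field names should be checked against the Microsoft article referenced under Configuration before use.

```python
import ipaddress


def named_location_payload(display_name: str, nat_ip: str) -> dict:
    """Body for an IP-based named location holding the crawler's static IP."""
    # Validate the address and express it as a single-host CIDR range.
    addr = ipaddress.ip_address(nat_ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    return {
        "@odata.type": "#microsoft.graph.ipNamedLocation",
        "displayName": display_name,
        "isTrusted": False,
        "ipRanges": [
            {
                "@odata.type": "#microsoft.graph.iPv4CidrRange",
                "cidrAddress": f"{addr}/32",
            }
        ],
    }


def block_policy_payload(display_name, service_principal_ids, allowed_location_id):
    """Body for a policy that blocks the crawler's service principals
    everywhere except the named location created for the Glean NAT gateway."""
    return {
        "displayName": display_name,
        # Report-only first; switch to "enabled" once the impact is understood.
        "state": "enabledForReportingButNotEnforced",
        "conditions": {
            "applications": {"includeApplications": ["All"]},
            "clientApplications": {
                "includeServicePrincipals": list(service_principal_ids)
            },
            "locations": {
                "includeLocations": ["All"],
                "excludeLocations": [allowed_location_id],
            },
        },
        "grantControls": {"operator": "OR", "builtInControls": ["block"]},
    }
```

The named location must be created first, because the policy refers to it by the ID returned when it is created.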

Configuration

For detailed information on how to configure a Conditional Access policy involving a Workload Identity, please refer to the following Microsoft help article: Conditional Access for Workload Identities (learn.microsoft.com)

You will need the static IP address assigned to the crawler infrastructure in your Glean tenant to proceed. As this IP address is different for every Glean deployment, please contact Glean support to obtain this information.


Monitoring via Microsoft Purview

Overview

Microsoft Purview can be used to monitor the Glean SharePoint crawler and alert on any attempts to misuse granted permissions to write data back to SharePoint.

Glean recommends configuring monitoring at three levels:

  1. Application-specific activity
  2. Endpoint-specific activity
  3. Write activities

Configuration

You will need the Application ID of each of the App Registrations created for the Glean SharePoint crawler in Entra ID to proceed.
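
To make the write-activity level concrete, the sketch below filters a list of audit events down to write operations performed by the Glean App Registrations. It is an illustration only: the AuditEvent shape is a hypothetical, already-normalized record (not the raw Purview schema), and the set of operation names is an incomplete sample of SharePoint file activities that your team would need to extend.

```python
from dataclasses import dataclass

# Operation names treated as writes in this sketch (illustrative, incomplete).
WRITE_OPERATIONS = {
    "FileUploaded",
    "FileModified",
    "FileDeleted",
    "FileMoved",
    "FileRenamed",
}


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical normalized audit record; not the raw Purview schema."""
    app_id: str
    operation: str
    target: str


def flag_write_activity(events, glean_app_ids):
    """Return events where a Glean App Registration wrote to SharePoint."""
    watched = {app_id.lower() for app_id in glean_app_ids}
    return [
        event
        for event in events
        if event.app_id.lower() in watched and event.operation in WRITE_OPERATIONS
    ]
```

Because the crawler is expected to read content only, any event returned by this filter would be a candidate for an alert.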


Sites.Selected (Graph API)

Glean already requests the minimum permissions required to fetch and update SharePoint content and permissions.

Alternative permissions discussed here will begin to restrict Glean’s ability to function correctly, introducing other risks that will need to be considered. These will be highlighted so that you can make an informed decision.

Be advised that there is currently no risk-free solution to fetching SharePoint content with narrow scopes due to the way Microsoft has structured the minimum permissions required for certain API endpoints. We are confident this will be rectified in time, and encourage all customers to raise this with their Microsoft representative as an area of concern.

Overview

The Sites.Selected permission replaces both the global Sites.FullControl.All and Files.ReadWrite.All permissions for the Graph API.

Sites.Selected still requires the FullControl permission, but it is granted individually to each site and sub-site rather than globally.

While Sites.Selected alleviates the need for global Sites.FullControl.All and Files.ReadWrite.All permissions, it comes with significant drawbacks as a result.

Sites.Selected replaces the following permissions:

  • Sites.FullControl.All
  • Files.Read.All
  • Files.ReadWrite.All

The following permissions are still required alongside Sites.Selected:

  • User.Read.All
  • GroupMember.Read.All
  • Reports.Read.All
  • Member.Read.Hidden
  • FullControl (site level)

User.Read.All and GroupMember.Read.All are required for Glean to obtain and enforce document and site permissions. Member.Read.Hidden is also used to obtain and enforce document/site permissions for any group member that may have been marked as ‘hidden’ in Entra ID.

Reports.Read.All is used to verify crawling state and progress, and to ensure that your search infrastructure is correctly scaled (SharePoint is one of the largest data sources typically connected to Glean).

Comparison Table

                                                                             | Sites.FullControl.All | Sites.Selected (with site-specific FullControl)
-----------------------------------------------------------------------------|-----------------------|------------------------------------------------
Adheres to Microsoft Best Practices for Apps requiring access to permissions | Yes                   | Yes
Supported by Glean                                                           | Yes                   | Yes
Processing frequency for content updates                                     | <2 hours (can vary)   | 24 hours
Processing frequency for permission-only updates                             | <2 hours (can vary)   | 24 hours
Availability of Signals & Metadata to enhance Search Rankings                | Excellent             | Poor
Automatic detection of sites and sub-sites to scan                           | Yes                   | No
Management Overhead                                                          | Low                   | Very High
Permission Scope                                                             | Wide                  | Narrow

Limitations

Using Sites.Selected will heavily degrade the end-user experience for both Search and Glean Assistant, and may affect your security posture in other ways.

Glean generally recommends against its use.

🚨 Changes to sites/documents will take up to 24 hours to be reflected in Glean.

  • Sites.Selected prevents Glean from leveraging webhook subscriptions to be notified of changes within SharePoint when they occur.
  • Without webhook subscriptions, changes in SharePoint will only be reflected in Glean once a scheduled incremental crawl takes place every 24 hours. This includes changes to site and file permissions.

🚨 Search Ranking of SharePoint results will be poor compared to other data sources.

  • Sites.Selected prevents Glean from accessing activity and insights data for SharePoint content. This data is used extensively for ranking results in search.
  • Without access to activity and insights data, you will notice a significant degradation in search quality for SharePoint results. SharePoint results will typically be ranked lower than results from other data sources where activity data is able to be accessed.

🚨 You must manually authorize every site AND sub-site that you want to be available in Glean.

  • Sites.Selected prevents Glean from discovering sites and site content that need to be fetched.
  • Every site and sub-site must be explicitly added to the Sites.Selected permission set by your M365 administrator AND must be added to a list of sites to crawl in the Glean UI.
    • Without the former step, Glean will not have the correct permissions to fetch content from a site.
    • Without the latter step, Glean has no way of knowing what sites to crawl.
  • This creates a high degree of friction and can hinder expansion of Glean within your organization.
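
Because both lists must stay in sync, a small reconciliation check can help catch sites that were added to one list but not the other. The sketch below is a hypothetical helper, not a Glean feature: it assumes you can export the sites granted under Sites.Selected and the sites configured in the Glean UI as two lists of URLs, and it compares them after simple normalization.

```python
def _norm(url: str) -> str:
    """Trim whitespace and trailing slashes and lowercase for comparison."""
    return url.strip().rstrip("/").lower()


def reconcile_sites(granted_sites, crawl_list):
    """Compare sites granted via Sites.Selected with the Glean crawl list.

    Returns the sites Glean is told to crawl but cannot read, and the sites
    Glean can read but has not been told to crawl.
    """
    granted = {_norm(u) for u in granted_sites}
    listed = {_norm(u) for u in crawl_list}
    return {
        "missing_permission": sorted(listed - granted),
        "missing_from_crawl_list": sorted(granted - listed),
    }
```

An empty result in both keys means every site appears in both places; anything else points to a site that will silently fail to appear in Glean.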

Configuration

The setup process for Sites.Selected differs from the standard setup steps for SharePoint.
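
As a rough illustration of what a per-site grant involves, Microsoft Graph lets an administrator create a permission on an individual site for a specific application. The sketch below only builds the request URL and body for that call and sends nothing; the site ID, application ID, and display name are placeholders, and this is not a substitute for Glean’s setup steps for Sites.Selected.

```python
from typing import Tuple


def site_grant_request(site_id: str, app_id: str, app_display_name: str) -> Tuple[str, dict]:
    """Build the URL and JSON body for granting an app FullControl on one site."""
    url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/permissions"
    body = {
        # Site-level role; Sites.Selected grants nothing until a role is assigned.
        "roles": ["fullcontrol"],
        "grantedToIdentities": [
            {"application": {"id": app_id, "displayName": app_display_name}}
        ],
    }
    return url, body
```

One such request is needed for every site and sub-site, which is the main source of the management overhead described under Limitations.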


Sites.Selected (SharePoint REST API)

Overview

The Sites.Selected permission replaces the global Sites.FullControl.All permission for the SharePoint REST API.

Sites.Selected still requires the FullControl permission, but it is granted individually to each site and sub-site rather than globally.

Sites.Selected replaces the following permissions:

  • Sites.FullControl.All

The following permissions are still required alongside Sites.Selected:

  • FullControl (site level)

FullControl is mandatory at some level (either site or global) because of a limitation in the SharePoint REST API: it is the minimum permission scope needed to read data from certain API endpoints that are critical for Glean functionality (permissions mapping).

If the Read permission is used instead of FullControl, the API returns an HTTP 403 (Forbidden) error, preventing any data from being fetched.

Comparison Table

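                                                                             | Sites.FullControl.All | Sites.Selected (with site-specific FullControl)
-----------------------------------------------------------------------------|-----------------------|------------------------------------------------
Adheres to Microsoft Best Practices for Apps requiring access to permissions | Yes                   | Yes
Supported by Glean                                                           | Yes                   | Yes
Automatic detection of sites and sub-sites to scan                           | Yes                   | No
Management Overhead                                                          | Low                   | Very High
Permission Scope                                                             | Wide                  | Narrow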

Limitations

🚨 You must manually authorize every site AND sub-site that you want to be available in Glean.

  • Sites.Selected prevents Glean from discovering sites and site content that need to be fetched.
  • Every site and sub-site must be explicitly added to the Sites.Selected permission set by your M365 administrator AND must be added to a list of sites to crawl in the Glean UI.
    • Without the former step, Glean will not have the correct permissions to fetch content from a site.
    • Without the latter step, Glean has no way of knowing what sites to crawl.
  • This creates a high degree of friction and can hinder expansion of Glean within your organization.

Configuration

The setup process for Sites.Selected differs from the standard setup steps for SharePoint.