Supported Crawling Restrictions for SharePoint

Overview

  • Greenlist restrictions permit crawling only for the specified content.
  • Redlist restrictions prohibit crawling for the specified content.
Restriction Type | Greenlist | Redlist | Details
Time-based Restrictions |  |  | Restrict crawling to include/exclude content created/modified/viewed after a certain date.
Identity-based Restrictions |  |  | Restrict crawling to include/exclude content created/modified/viewed by specific users or a specific group (plus public content).
Content-based Restrictions |  |  | Restrict crawling to include/exclude specific content, documents, messages, or objects.

Supported Restrictions

Restriction | Greenlist | Redlist | Details
Date |  |  | Restrict crawling to only content created/modified/viewed after a specific date.
Site |  |  | Restrict crawling to include/exclude specific SharePoint sites.
Note

Sites should be provided in URL format without a trailing forward slash. For example:

https://<domain>.sharepoint.com/sites/<siteName>
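As a minimal sketch of the format rule above, the helper below strips any trailing forward slash from a site URL before it is submitted as a restriction. The function name `normalize_site_url` and the tenant `contoso` are hypothetical and used only for illustration; the checks on scheme, host, and path are assumptions drawn from the example format, not rules confirmed by Glean.

```python
from urllib.parse import urlsplit


def normalize_site_url(url: str) -> str:
    """Return the site URL without a trailing forward slash.

    Hypothetical helper: the shape checks below are assumptions based on
    the example format https://<domain>.sharepoint.com/sites/<siteName>.
    """
    cleaned = url.strip().rstrip("/")
    parts = urlsplit(cleaned)
    # Assumption: a site restriction is an absolute https URL with a host.
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"expected an absolute https URL, got: {url!r}")
    # Assumption: the URL must point at a site, not just the tenant root.
    if not parts.path:
        raise ValueError(f"expected a site path after the host, got: {url!r}")
    return cleaned


# 'contoso' is a placeholder tenant name.
print(normalize_site_url("https://contoso.sharepoint.com/sites/Marketing/"))
```

Running a check like this before submitting a list of sites may catch copy-paste errors, such as a trailing slash taken from the browser address bar.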

For Group restrictions when using Azure AD/Entra ID, the Object ID of the AD Group should be provided, NOT the Group name. For example:

7c77a355-c78c-6362-a195-d2428d285107
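Because a group name and an Object ID are easy to confuse, a quick format check can help before sending values to be configured. The sketch below assumes an Object ID always uses the 8-4-4-4-12 hexadecimal layout shown in the example above; the function name `looks_like_object_id` and the sample group name are hypothetical. It checks only the format and does not confirm that the group exists in the directory.

```python
import re

# Assumption: Object IDs use the 8-4-4-4-12 hexadecimal layout
# shown in the example above.
_OBJECT_ID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def looks_like_object_id(value: str) -> bool:
    """Return True if the value has the shape of an Object ID.

    Hypothetical helper: a format check only, with no directory lookup.
    """
    return _OBJECT_ID_RE.fullmatch(value.strip()) is not None


print(looks_like_object_id("7c77a355-c78c-6362-a195-d2428d285107"))
print(looks_like_object_id("Engineering Team"))  # a group name, not an Object ID
```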

Limitations

Applying Restrictions

Method | Supported | Details
Admin UI |  | Restrictions can be applied in the Admin UI under the connector settings.
Glean Support |  | Restrictions can be applied by Glean support on request.
Note

Not all restrictions can be applied in the Admin UI. Please contact Glean support to apply the restriction if it is missing from the UI.

Hide O365 content via Purview sensitivity labels

Glean can exclude sensitive Microsoft 365 content from search results and Glean Assistant by integrating with Microsoft Purview sensitivity labels. When this integration is enabled, items in SharePoint and OneDrive with selected labels are excluded from indexing. Because these items are not indexed, they do not appear in Glean results. For more information, see Hide O365 content via sensitivity labels.

Draft Content Exclusions

Glean does not index SharePoint content that is in draft mode. Content must be published in SharePoint to be crawled and indexed by Glean. Draft items will not appear in Glean search results or Glean Assistant.