Supported Crawling Restrictions for SharePoint
Overview
- Greenlist restrictions permit crawling only for the specified content.
- Redlist restrictions prohibit crawling for the specified content.
| Restriction Type | Greenlist | Redlist | Details |
|---|---|---|---|
| Time-based Restrictions | ✅ | ❌ | Restrict crawling to include/exclude content created/modified/viewed after a certain date. |
| Identity-based Restrictions | ✅ | ❌ | Restrict crawling to include/exclude content created/modified/viewed by specific users or a specific group (plus public content). |
| Content-based Restrictions | ✅ | ❌ | Restrict crawling to include/exclude specific content, documents, messages, or objects. |
Supported Restrictions
| Restriction | Greenlist | Redlist | Details |
|---|---|---|---|
| Date | ✅ | ❌ | Restrict crawling to only content created/modified/viewed after a specific date. |
| Site | ✅ | ✅ | Restrict crawling to include/exclude specific SharePoint sites. |
Sites should be provided in URL format without a trailing forward slash. For example:
https://<domain>.sharepoint.com/sites/<siteName>
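A minimal sketch of how a list of site URLs could be checked against this format before it is submitted. This is not part of Glean's tooling: the helper name `normalize_site_url` and the strictness of the pattern are assumptions made for illustration. It only strips trailing slashes and verifies the `/sites/<siteName>` shape shown above.

```python
import re

# Assumed shape, taken from the example above:
#   https://<domain>.sharepoint.com/sites/<siteName>
_SITE_RE = re.compile(r"^https://[A-Za-z0-9-]+\.sharepoint\.com/sites/[^/\s]+$")


def normalize_site_url(url: str) -> str:
    """Strip surrounding whitespace and trailing slashes, then check the shape."""
    cleaned = url.strip().rstrip("/")
    if not _SITE_RE.match(cleaned):
        raise ValueError(f"Unexpected SharePoint site URL format: {url!r}")
    return cleaned


if __name__ == "__main__":
    # Hypothetical tenant and site names, used only for demonstration.
    print(normalize_site_url("https://contoso.sharepoint.com/sites/Marketing/"))
```

The pattern is deliberately narrow. If your tenant hosts sites under other paths, the expression would need to be loosened accordingly.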
For Group restrictions when using Azure AD/Entra ID, provide the Object ID of the AD group, not the group name. For example:
7c77a355-c78c-6362-a195-d2428d285107
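Because an Object ID is a GUID, a quick format check can catch a group name entered by mistake. The sketch below is illustrative only: the function name `is_object_id` is hypothetical. It confirms that a value looks like a hyphenated GUID, not that the group actually exists in your directory.

```python
import uuid


def is_object_id(value: str) -> bool:
    """Return True if value is a hyphenated GUID string (format check only)."""
    candidate = value.strip()
    try:
        # uuid.UUID parses the hex digits; comparing against its canonical
        # string form rejects inputs without the usual 8-4-4-4-12 hyphenation.
        return str(uuid.UUID(candidate)) == candidate.lower()
    except ValueError:
        return False


if __name__ == "__main__":
    print(is_object_id("7c77a355-c78c-6362-a195-d2428d285107"))
    print(is_object_id("Engineering Team"))
```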
Limitations
Applying Restrictions
| Method | Supported | Details |
|---|---|---|
| Admin UI | ✅ | Restrictions can be applied in the Admin UI under the connector settings. |
| Glean Support | ✅ | Restrictions can be applied by Glean support on request. |
Not all restrictions can be applied in the Admin UI. If a restriction is missing from the UI, contact Glean support to have it applied.
Hide O365 content via Purview sensitivity labels
Glean can exclude sensitive Microsoft 365 content from search results and Glean Assistant by integrating with Microsoft Purview sensitivity labels. When this integration is enabled, items in SharePoint and OneDrive with selected labels are excluded from indexing. Because these items are not indexed, they do not appear in Glean results. For more information, see Hide O365 content via sensitivity labels.
Draft Content Exclusions
Glean does not index SharePoint content that is in draft mode. Content must be published in SharePoint to be crawled and indexed by Glean. Draft items will not appear in Glean search results or Glean Assistant.