Overview
The Glean SharePoint connector enables secure and efficient data fetching from the Microsoft 365 (M365) SharePoint platform. SharePoint is configured as a child connector under the Microsoft 365 parent connector, which provides shared authentication and identity crawling.
- Set up Microsoft 365 first to configure shared authentication and identity crawling.
- Then configure SharePoint as a child connector. SharePoint inherits the parent credentials (client ID, tenant ID, and client secret or certificate) and only requires SharePoint-specific configuration.
- Glean reads all user access permissions and enforces them at query time, so users only see results they have access to.
- Quicklinks let users create Word, Excel, and PowerPoint documents directly.
Supported features
Glean supports the following features for the SharePoint connector. These features enhance how Glean crawls, indexes, and surfaces SharePoint content for your users.
- Webhooks — Syncs content and permission changes in near real-time instead of waiting for incremental crawls
- Data fetching — Supplements indexed results with live data fetched directly from SharePoint for search, Glean Assistant, and Glean Agents
- Quicklinks — Create Word, Excel, and PowerPoint documents directly from Glean
- Search visibility — Honors the SharePoint search visibility setting on sites and lists to exclude content from indexing
- Crawling restrictions — Greenlist or redlist by site
- Purview sensitivity labels — Excludes content with selected sensitivity labels from indexing
- Draft content exclusion — Excludes unpublished/draft SharePoint content from indexing
- Permission enforcement — Enforces SharePoint user permissions at query time
- Activity signals — Ingests item insights and activity data for search ranking
Supported objects
Glean captures the following content from SharePoint.
- Pages — Classic site pages, web part pages, wiki pages, and modern page library items
- Document libraries — Site drives containing various types of documents, and folders
- Native file types — Office formats (Word, Excel, PowerPoint), PDF, and text files
- Site lists — Basic lists, calendar lists, discussion boards, custom lists in datasheet view, tasks lists, and issue tracking lists
- OneNote — Notebooks indexed as folders, sections indexed as standalone content (limited support)
- Folders — Captured and indexed within SharePoint sites
Known limitations
- SharePoint view pages are not indexed. Glean does not index SharePoint list or library view pages, such as URLs ending in AllItems.aspx or DocSetHome.aspx. These are UI containers, not content items.
- Draft documents. When a SharePoint document's latest version is a minor (draft) version, Glean may not display it in search results to avoid exposing unpublished content. If the document has a previously published major version, it may still not appear while a draft exists.
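When troubleshooting a missing search result, it can help to check whether the item falls under one of these limitations. The following is a minimal sketch, not part of Glean: `is_view_page` and `is_minor_version` are hypothetical helper names, the suffix list contains only the two examples named above, and the version check assumes SharePoint's usual `major.minor` version labels (for example `2.0` for a published version and `2.1` for a draft).

```python
from urllib.parse import urlsplit

# View-page filenames taken from the limitation above; extend as needed.
VIEW_PAGE_SUFFIXES = ("allitems.aspx", "docsethome.aspx")


def is_view_page(url: str) -> bool:
    """Return True if the URL path ends in a known list or library view page."""
    path = urlsplit(url).path.lower()  # query string and fragment are ignored
    return path.endswith(VIEW_PAGE_SUFFIXES)


def is_minor_version(label: str) -> bool:
    """Return True if a 'major.minor' version label denotes a draft.

    Assumption: a nonzero minor part (e.g. '2.1') means an unpublished draft.
    """
    _major, _, minor = label.partition(".")
    return minor not in ("", "0")
```

For example, `is_view_page("https://contoso.sharepoint.com/sites/hr/Shared%20Documents/Forms/AllItems.aspx")` is true, while a URL ending in `SitePages/Home.aspx` is not matched.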
API usage and permissions
Glean uses the standard Microsoft Graph API v1.0 and the SharePoint REST API to ingest data. The connector uses application permissions with admin-granted access.
Required permissions and access scope
The SharePoint connector requires the Sites.FullControl.All scope on both the Microsoft Graph API and SharePoint REST API to index all content and synchronize permission changes.
| API | Required scope | Reason |
|---|---|---|
| Microsoft Graph API | Sites.FullControl.All | Consumes webhooks and scans permission hierarchies using the /delta endpoint. |
| SharePoint REST API | Sites.FullControl.All | Retrieves permissions for site pages and site lists. |
Avoid using Sites.Selected or Sites.Read.All as substitutes for Sites.FullControl.All. These alternatives cause significant indexing latency, data freshness issues, and ranking drawbacks. If you cannot grant Sites.FullControl.All, contact Glean Support to discuss configuration options.
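For context on what a /delta scan involves, the following is a minimal sketch of paging through a Microsoft Graph drive delta query. It is an illustration only, not Glean's implementation: `collect_delta` and the injected `fetch_json` callable are hypothetical names, and authentication, throttling, and error handling are left out.

```python
from typing import Callable, Optional

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def collect_delta(
    drive_id: str,
    fetch_json: Callable[[str], dict],
) -> tuple[list[dict], Optional[str]]:
    """Follow @odata.nextLink pages and return (changed items, delta link).

    fetch_json is a caller-supplied function that performs an authenticated
    GET on the given URL and returns the parsed JSON body (hypothetical).
    """
    url = f"{GRAPH_BASE}/drives/{drive_id}/root/delta"
    items: list[dict] = []
    while True:
        page = fetch_json(url)
        items.extend(page.get("value", []))
        next_link = page.get("@odata.nextLink")
        if next_link is None:
            # The last page carries the delta link to use for the next sync.
            return items, page.get("@odata.deltaLink")
        url = next_link
```

Storing the returned delta link and passing it to a later request is what lets a client fetch only the changes made since the previous scan.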
Prerequisites
- Set up the Microsoft 365 parent connector first. SharePoint is configured as a child connector that inherits the parent app credentials (client ID, tenant ID, and client secret or certificate).
- A tenant administrator with global admin privileges for both the Azure/Entra ID and SharePoint admin portals must set up dedicated service applications with the required privileges.
Sites.Selected
You can use Sites.Selected instead of Sites.FullControl.All, but this introduces significant trade-offs:
- You must explicitly set each individual site to crawl in the Glean admin UI.
- Glean updates content and content permissions only every 24 hours.
- Search and generative AI ranking degrades because Sites.Selected prevents the crawler from accessing activity and content metadata.
Files.ReadWrite.All
Glean subscribes to webhook events for all files in the tenant, allowing it to react to content and permission changes in near real-time.
The minimum permission for setting up and re-authenticating webhooks is Files.ReadWrite.All.
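To illustrate what a webhook subscription looks like, the following is a minimal sketch of the request body for the Microsoft Graph `POST /subscriptions` endpoint, scoped to a single drive. It is an assumption-laden example rather than Glean's actual setup: `build_subscription`, the seven-day lifetime, and the example URL and IDs are hypothetical, and Glean manages its own subscriptions once the permission is granted.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_subscription(
    drive_id: str,
    notification_url: str,
    client_state: str,
    now: Optional[datetime] = None,
    lifetime: timedelta = timedelta(days=7),  # assumed lifetime; renew before expiry
) -> dict:
    """Build a Graph subscription body for change notifications on a drive root."""
    now = now or datetime.now(timezone.utc)
    expires = (now + lifetime).astimezone(timezone.utc)
    return {
        "changeType": "updated",
        "notificationUrl": notification_url,  # HTTPS endpoint that receives events
        "resource": f"/drives/{drive_id}/root",
        "expirationDateTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "clientState": client_state,  # echoed back so the receiver can validate
    }
```

Subscriptions expire, so a receiver has to renew them periodically, which is why the permission also covers re-authenticating webhooks.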
SharePoint REST API permissions
To read data from the SharePoint REST API and crawl site collections, site content, and content permissions, you must grant the FullControl permission. Microsoft does not provide granular controls or a dedicated read scope for these data endpoints, so FullControl is required.