Set up the indexing connector

This section covers the indexing connector's permissions, indexed content, and scope. To add real-time search as an additional layer once the connector is running, see Real-time access. For step-by-step configuration in the Admin console, see the SharePoint setup guide.

Before you start

Have these in place before you configure the SharePoint indexing connector, to avoid mid-setup delays:

  • Global (tenant) administrator: A tenant administrator with global admin privileges for both the Azure/Entra ID and SharePoint admin portals must grant admin consent for the API permissions and configure the SharePoint REST API. This can't be completed with a lower-privileged role.
  • Microsoft 365 parent connector already configured: SharePoint inherits its app registration, client ID, tenant ID, and certificate from the Microsoft 365 parent connector. Complete M365 setup first.
  • Permissions approved in advance: The scopes below must be granted as application permissions, on both the Microsoft Graph API and the SharePoint REST API, with admin consent. Share the permissions table with your CISO or security team before starting.

Required permissions

The SharePoint connector requires Sites.FullControl.All on both the Microsoft Graph API and the SharePoint REST API to index all content and synchronize permission changes. All permissions are granted as application permissions on the shared Microsoft 365 app registration. Delegated permissions are not supported for indexing.

For your CISO or IT security team
  • Glean is read-only for content. Glean never writes, modifies, or deletes anything in SharePoint.
  • Sites.FullControl.All includes write capabilities, but Glean never uses them. Microsoft does not offer a granular read-only scope for these SharePoint endpoints, so FullControl is the only tier that can read site, page, and list permissions.
  • Files.ReadWrite.All is used only for change notifications. It is required solely to create and reauthorize webhook subscriptions, not to change files.
  • Every result is permission-scoped. Users only ever see SharePoint content they are already allowed to open.

The table below lists each scope, why Glean needs it, and what breaks without it, so you can justify every grant in your IT request. The same scopes are grouped by API surface under Build your request.

Scope | Why Glean needs it | What breaks without it
Sites.FullControl.All | Consumes webhooks and scans permission hierarchies using the /delta endpoint to pick up permission changes in near real-time. Enables activity signals used for search ranking. | Permissions update every 24 hours instead of near real-time, and activity signals for AI ranking are lost.
Files.ReadWrite.All | Required by Microsoft to create and manage webhook subscriptions over drives. Used solely to subscribe to change events; no files are written. | Can't subscribe to change notifications. Content freshness falls back to scheduled crawls, causing stale results.
User.Read.All | Enumerates tenant users to align SharePoint identities with Glean profiles and enforce document-level permissions. | Permission mapping fails. Users may see content they shouldn't, or be blocked from content they should access.
GroupMember.Read.All | Reads Microsoft 365 group membership to resolve group-based permissions. | Group-based permissions aren't resolved. Content shared with groups may be over- or under-shared in search.
Member.Read.Hidden | Reads hidden-visibility group membership, used when confidential teams have shared content. | Content shared with hidden groups may be over-shared in search results.
Reports.Read.All | Reads usage reports to validate crawl coverage and auto-scale Glean infrastructure for your corpus size. | Infrastructure scaling relies on manual estimates, which may cause slower crawls for large tenants.
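
Before filing the IT request, it can help to compare the scopes that already have admin consent against the table above. The following is a minimal sketch of such a check, not a Glean or Microsoft tool: `REQUIRED_SCOPES` and `missing_scopes` are hypothetical names, the granted scopes are typed in by hand rather than read from Entra ID, and the note for the SharePoint REST API entry paraphrases the purpose given for that scope.

```python
# Scope names and impact notes are copied from the permissions table.
# REQUIRED_SCOPES and missing_scopes are illustrative names only; they
# are not part of any Glean or Microsoft API.
REQUIRED_SCOPES = {
    "Microsoft Graph API": {
        "Sites.FullControl.All": "Permissions update every 24 hours; activity signals are lost.",
        "Files.ReadWrite.All": "Can't subscribe to change notifications; results go stale.",
        "User.Read.All": "Permission mapping fails.",
        "GroupMember.Read.All": "Group-based permissions aren't resolved.",
        "Member.Read.Hidden": "Content shared with hidden groups may be over-shared.",
        "Reports.Read.All": "Infrastructure scaling relies on manual estimates.",
    },
    "SharePoint REST API": {
        "Sites.FullControl.All": "Needed to read permissions for site pages and site lists.",
    },
}


def missing_scopes(granted):
    """Return {api: {scope: note}} for every required scope not in `granted`.

    `granted` maps an API name to the set of application permissions that
    already have admin consent (entered by hand in this sketch).
    """
    gaps = {}
    for api, scopes in REQUIRED_SCOPES.items():
        have = granted.get(api, set())
        absent = {scope: note for scope, note in scopes.items() if scope not in have}
        if absent:
            gaps[api] = absent
    return gaps


if __name__ == "__main__":
    # Example input: only two Graph scopes have been consented so far.
    granted = {"Microsoft Graph API": {"Sites.FullControl.All", "User.Read.All"}}
    for api, scopes in missing_scopes(granted).items():
        for scope, note in scopes.items():
            print(f"{api}: {scope} is missing. {note}")
```

Keeping the two API surfaces as separate keys matters because Sites.FullControl.All must be granted on both; a flat set of scope names would hide a missing SharePoint REST API grant.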

Build your request

Indexing SharePoint sites and files always requires the permission set listed below. Copy it into your IT ticket or email so that every scope is requested in one pass.

Permissions to request from IT
Microsoft Graph API (application permissions)
  • Sites.FullControl.All: Read site, page, and list permissions; consume webhooks for near real-time permission updates.
  • Files.ReadWrite.All: Create and reauthorize webhook subscriptions over drives. No files are written.
  • User.Read.All: Map SharePoint identities to Glean profiles for document-level permissions.
  • GroupMember.Read.All: Resolve group-based permissions from Microsoft 365 group membership.
  • Member.Read.Hidden: Resolve permissions for content shared with hidden-visibility groups.
  • Reports.Read.All: Read usage reports to validate crawl coverage and auto-scale infrastructure.
SharePoint REST API (application permissions)
  • Sites.FullControl.All: Read permissions for site pages and site lists (no granular read scope exists for these endpoints).

What gets indexed

Content type | Notes
Pages | Classic site pages, web part pages, wiki pages, and modern page library items.
Document libraries and folders | Site drives containing documents, and the folders within SharePoint sites.
Native file types (Word, Excel, PowerPoint, PDF, text) | Office formats, PDF, and text files.
Site lists | Basic lists, calendar lists, discussion boards, custom lists in datasheet view, task lists, and issue tracking lists. Not all list types and list attachments are crawled.
OneNote | Limited support: notebooks indexed as folders; sections indexed as standalone content.
View pages | Not indexed. List or library view pages (for example, URLs ending in AllItems.aspx or DocSetHome.aspx) are UI containers, not content.

Glean also honors the SharePoint search-visibility setting on sites and lists, supports greenlist/redlist crawling restrictions, and excludes draft content and content with selected Purview sensitivity labels.
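
When a URL you expected to find is missing from search, it is worth checking whether it is a view page. The sketch below tests only the two suffixes named above; it is an illustration, not the connector's actual logic. The function name, the case-insensitive match, and the contoso URLs in the usage example are assumptions.

```python
from urllib.parse import urlparse

# Suffixes taken from the examples in the text. The connector may treat
# other view pages the same way; only these two are documented here.
VIEW_PAGE_SUFFIXES = ("allitems.aspx", "docsethome.aspx")


def is_view_page(url: str) -> bool:
    """Return True if the URL path ends in a known view-page name.

    The query string is ignored, and matching is case-insensitive
    (an assumption of this sketch).
    """
    path = urlparse(url).path.lower()
    return path.endswith(VIEW_PAGE_SUFFIXES)


if __name__ == "__main__":
    # contoso.sharepoint.com is a placeholder tenant.
    for url in (
        "https://contoso.sharepoint.com/sites/hr/Lists/Tasks/AllItems.aspx?viewid=1",
        "https://contoso.sharepoint.com/sites/hr/SitePages/Home.aspx",
    ):
        print(url, "->", "view page" if is_view_page(url) else "content")
```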

Known limitations

  • SharePoint view pages are not indexed. Glean doesn't index list or library view pages such as URLs ending in AllItems.aspx or DocSetHome.aspx. These are UI containers, not content items.
  • Draft content is excluded. Glean excludes pages and list items in a draft state: when the latest version is a minor (draft) version, the item is checked out, or it's pending approval. See Exclude draft content.
  • Sites.Selected causes 24-hour permission staleness. If you grant Sites.Selected instead of Sites.FullControl.All, permission changes take up to 24 hours to propagate, and activity signals for ranking are unavailable.
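
The draft rule in the list above combines three conditions with a logical OR. The sketch below spells that out; `ItemState` and its field names are hypothetical and do not correspond to SharePoint or Glean field names.

```python
from dataclasses import dataclass


@dataclass
class ItemState:
    """Hypothetical snapshot of a page or list item's publishing state."""
    latest_version_is_minor: bool = False  # latest version is a minor (draft) version
    checked_out: bool = False              # item is currently checked out
    pending_approval: bool = False         # item is awaiting approval


def is_draft(item: ItemState) -> bool:
    """An item counts as a draft if any one of the three conditions holds."""
    return item.latest_version_is_minor or item.checked_out or item.pending_approval


if __name__ == "__main__":
    print(is_draft(ItemState()))                  # published item
    print(is_draft(ItemState(checked_out=True)))  # checked-out item
```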

Setup steps

1. Set up the Microsoft 365 parent connector

This step registers the shared Azure app and configures certificate authentication. It is required before you set up any child connector. See the Microsoft 365 setup guide.

2. Grant the required API permissions

Add the application permissions on both the Microsoft Graph API and the SharePoint REST API, then grant admin consent. Use the permissions table to brief your security team in advance.

3. Configure SharePoint in the Glean Admin console

Select SharePoint as a child connector under Microsoft 365. Credentials are inherited from the parent connector. Set the crawl scope and save. See the SharePoint setup guide.

4. Enable real-time access

Add real-time search as an additional layer once the base connector is running. See Real-time access.

Verify the connection

After the initial crawl completes:

  • Check crawl status. In the Glean Admin console, confirm the SharePoint connector's crawl completed without errors.
  • Search with the app filter. Search in Glean using the SharePoint app filter and confirm pages, documents, and list content appear for a user who has access in SharePoint.
  • No results? See Troubleshooting.

Troubleshooting

  • Certificate expired or invalid. Errors such as "certificate is not authorized for this application" or "Invalid JWT token. The token is expired" mean the authentication certificate needs to be rotated. Generate a new certificate, upload it to every Azure app registration in use, update the Microsoft 365 parent connector in Glean, then re-save to validate. If you're moving off an older client-secret (ACS) setup, switch fully to certificate authentication.
  • Private key rejected on upload. If validation fails because the private key begins with BEGIN RSA PRIVATE KEY, it's in the wrong format. Convert it to unencrypted PKCS#8 with openssl pkcs8 -topk8 -inform PEM -outform PEM -in oldkey.key -out privatekey.key -nocrypt so it begins with BEGIN PRIVATE KEY, then re-upload the matching certificate and key. Use the same certificate on every Azure app registration.
  • A site or list returns nothing. Glean honors the SharePoint search-visibility setting and any greenlist/redlist restrictions; content excluded there isn't indexed. If you use Sites.Selected, newly added sites must be granted explicitly.
  • Permission changes lag. With Sites.FullControl.All, permission updates are near real-time; with Sites.Selected, they can take up to 24 hours.
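
The private-key check described above can be done before upload by reading the first line of the PEM file. The following is a minimal sketch with hypothetical function names; it only inspects the header and builds the argument list for the openssl command quoted above, without running it. The ENCRYPTED PRIVATE KEY header is the standard PEM marker for a password-protected PKCS#8 key, which would also not meet the unencrypted requirement.

```python
def key_format(pem_text: str) -> str:
    """Classify a PEM private key by its first non-empty line."""
    stripped = pem_text.strip()
    first_line = stripped.splitlines()[0].strip() if stripped else ""
    if first_line == "-----BEGIN PRIVATE KEY-----":
        return "pkcs8"            # the format the upload expects
    if first_line == "-----BEGIN ENCRYPTED PRIVATE KEY-----":
        return "encrypted-pkcs8"  # PKCS#8, but password-protected
    if first_line == "-----BEGIN RSA PRIVATE KEY-----":
        return "pkcs1"            # traditional RSA format; needs conversion
    return "unknown"


def conversion_command(src: str, dst: str) -> list[str]:
    """Build the openssl arguments from the troubleshooting step.

    Returned as a list so it can be passed to subprocess.run without a shell.
    """
    return [
        "openssl", "pkcs8", "-topk8",
        "-inform", "PEM", "-outform", "PEM",
        "-in", src, "-out", dst, "-nocrypt",
    ]


if __name__ == "__main__":
    sample = "-----BEGIN RSA PRIVATE KEY-----\n(key material)\n-----END RSA PRIVATE KEY-----\n"
    if key_format(sample) != "pkcs8":
        print(" ".join(conversion_command("oldkey.key", "privatekey.key")))
```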