Usage Methodology

Glean uses the Microsoft Graph API and the SharePoint REST API to crawl your SharePoint and OneDrive environments.

Glean follows Microsoft's recommended best practices both to crawl documents and to record incremental changes to them.


Authentication Endpoints

| Endpoint | Description | Permissions | URL |
| --- | --- | --- | --- |
| Token request (Graph API) | Obtain and refresh an access token to interact with the Graph API using OAuth 2.0. | - | `https://login.microsoftonline.com/<tenant>/oauth2/v2.0/token` |
| Token request (SharePoint REST API) | Obtain and refresh an access token to interact with the SharePoint REST API using OAuth 2.0. | - | `https://accounts.accesscontrol.windows.net/<tenant_id>/tokens/OAuth/2` |
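
As a rough illustration of the Graph API token request, the sketch below builds the URL and form body for an OAuth 2.0 token call. The client credentials grant, the `.default` scope, and the function and parameter names are assumptions made for this example; the table above only states that OAuth 2.0 is used. The SharePoint REST API token endpoint takes different parameters and is not shown.

```python
from typing import Tuple
from urllib.parse import urlencode

GRAPH_TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"


def build_graph_token_request(tenant: str, client_id: str, client_secret: str) -> Tuple[str, str]:
    """Return (url, form-encoded body) for a Graph API token request.

    Assumption: the app authenticates with the client credentials grant.
    """
    url = GRAPH_TOKEN_URL.format(tenant=tenant)
    body = urlencode({
        "grant_type": "client_credentials",  # assumed grant type
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
    })
    return url, body
```

The returned body would be sent as `application/x-www-form-urlencoded` in a POST to the returned URL; the JSON response is expected to carry the access token.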

Identity Endpoints

| Endpoint | Description | Permissions | URL |
| --- | --- | --- | --- |
| List users | List all the users within the tenant. | User.Read.All | `https://graph.microsoft.com/v1.0/users` |
| List groups | List all the groups within the tenant. | GroupMember.Read.All, Member.Read.Hidden | `https://graph.microsoft.com/v1.0/groups` |
| List group members | List all the members of a group. | GroupMember.Read.All, Member.Read.Hidden | `https://graph.microsoft.com/v1.0/groups/<group_id>/members` |
| Get profilePhoto | Get the profile photo of a user. | User.Read.All | `https://graph.microsoft.com/v1.0/users/<user_id>/photo/$value` |
| Get site groups | Get the default site groups and associated user memberships for a given site from the SharePoint REST API. | Sites.FullControl.All | `https://<site_domain>.sharepoint.com/sites/<subsite_url>/_api/web/SiteGroups?$expand=Users` |
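
Graph API list endpoints such as `users`, `groups`, and `groups/<group_id>/members` return results in pages, with an `@odata.nextLink` property pointing at the next page. The sketch below shows one way to walk such a collection. `fetch_json` is a hypothetical callable that performs an authenticated GET and returns the parsed JSON; it is not part of any Glean or Microsoft library.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_collection(first_url: str, fetch_json: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every entry of a paged Graph collection, following @odata.nextLink."""
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("value", [])
        # The last page has no @odata.nextLink, which ends the loop.
        url = page.get("@odata.nextLink")
```

For example, `iter_collection("https://graph.microsoft.com/v1.0/users", fetch_json)` would yield each user object in the tenant in turn.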

Content Endpoints

Sites

Sites include both the SharePoint site pages and the associated site metadata required for document library crawls.

| Endpoint | Description | Permissions | URL |
| --- | --- | --- | --- |
| List sites | List all site collections within the tenant. | Sites.Read.All | `https://graph.microsoft.com/v1.0/sites/delta` |
| List subsites | List all the subsites within a site or subsite. | Sites.Read.All | `https://graph.microsoft.com/v1.0/sites/<id>/sites` |
| List lists | List all the lists within the site. | Sites.Read.All | `https://graph.microsoft.com/v1.0/sites/<site_id>/lists` |
| List columns | List all columns within the site (attributes of the site). | Sites.Read.All | `https://graph.microsoft.com/v1.0/sites/<id>/sites/<id>/columns` |
| List items delta | List all items from the delta endpoint (metadata). Used heavily in conjunction with the `List sites` endpoint, because `List sites` on its own only returns site collections from the main geolocation. | Sites.FullControl.All | `https://graph.microsoft.com/v1.0/sites/<id>/sites/<id>/lists/<id>/items/delta` |
| Get site list items | Get the items within a list for a site using the SharePoint REST API. The SharePoint REST API is used because some content for classic sites is not available via the Graph API. | Sites.FullControl.All | `https://<site_domain>.sharepoint.com/sites/<subsite_url>/_api/web/lists('<list_id>')/items` |
| Get site item permissions | Get the permissions for an item on the site using the SharePoint REST API. The SharePoint REST API is required for site pages and web components, because the Graph API only exposes permissions for Document Library items. | Sites.FullControl.All | `https://<site_domain>.sharepoint.com/sites/<subsite_url>/_api/web/lists('<list_id>')/items('<item_id>')/roleassignments` |
| Get page content | Get the web parts on a particular page (e.g. blocks of content within text boxes, titles, etc.) using the SharePoint REST API. | Sites.FullControl.All | `https://<site_domain>.sharepoint.com/sites/<subsite_url>/_api/web/GetFileById('<id>')/GetLimitedWebPartManager(scope=1)/ExportWebPart` |
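
The SharePoint REST API URLs above all share the same `_api/web` base for a given site. The helper below is a small sketch of how those URLs could be assembled from their placeholders. The function names are hypothetical and chosen for this example only; the path shapes follow the table, with list items addressed under `items`.

```python
def sharepoint_api_base(site_domain: str, subsite_url: str) -> str:
    """Base of the SharePoint REST API for one site, e.g. .../sites/<subsite_url>/_api/web."""
    return f"https://{site_domain}.sharepoint.com/sites/{subsite_url}/_api/web"


def list_items_url(site_domain: str, subsite_url: str, list_id: str) -> str:
    """URL that returns the items of a list (the 'Get site list items' row)."""
    base = sharepoint_api_base(site_domain, subsite_url)
    return f"{base}/lists('{list_id}')/items"


def item_role_assignments_url(site_domain: str, subsite_url: str, list_id: str, item_id: str) -> str:
    """URL that returns the role assignments of one item (the 'Get site item permissions' row)."""
    base = sharepoint_api_base(site_domain, subsite_url)
    return f"{base}/lists('{list_id}')/items('{item_id}')/roleassignments"
```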

Drives

Drives include both OneDrive for Business (user drives) and Document Libraries on SharePoint Sites.

| Endpoint | Description | Permissions | URL |
| --- | --- | --- | --- |
| List drives | List all the drives within a given site. | Files.Read.All | `https://graph.microsoft.com/v1.0/sites/<site_id>/drives` |
| Get driveItem | List all the items within a drive (change-based, as per Microsoft's scanning guidance). | Sites.FullControl.All | `https://graph.microsoft.com/v1.0/drives/<drive_id>/root/delta` |
| Get driveItem resource | Retrieve metadata for an item in a specified drive. | Files.Read.All | `https://graph.microsoft.com/v1.0/drives/<drive_id>/items/<item_id>` |
| Download file | Fetch the contents of an item to index its body. | Files.Read.All | `https://graph.microsoft.com/v1.0/drives/<drive_id>/items/<item_id>/content` |
| Get permissions | Get the permissions of a given item within a drive. | Files.Read.All | `https://graph.microsoft.com/v1.0/drives/<drive_id>/items/<item_id>/permissions` |
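
The change-based listing on `drives/<drive_id>/root/delta` can be pictured as a loop: follow `@odata.nextLink` until the response carries an `@odata.deltaLink`, then keep that link as the starting point for the next incremental crawl. The sketch below illustrates that pattern only and is not Glean's actual crawler. `fetch_json` is a hypothetical callable that performs an authenticated GET and returns parsed JSON.

```python
from typing import Callable, Dict, List, Tuple


def crawl_drive_delta(start_url: str, fetch_json: Callable[[str], Dict]) -> Tuple[List[Dict], str]:
    """Collect all changed items and return them with the delta link for the next run."""
    changes: List[Dict] = []
    url = start_url
    while True:
        page = fetch_json(url)
        changes.extend(page.get("value", []))
        if "@odata.nextLink" in page:
            url = page["@odata.nextLink"]  # more pages in this round
        else:
            # The final page carries the link to resume from next time.
            return changes, page["@odata.deltaLink"]


def split_changes(items: List[Dict]) -> Tuple[List[Dict], List[str]]:
    """Separate items to (re)index from IDs of items reported as deleted."""
    upserts = [item for item in items if "deleted" not in item]
    deleted_ids = [item["id"] for item in items if "deleted" in item]
    return upserts, deleted_ids
```

On the first run, `start_url` would be the `root/delta` URL from the table; on later runs it would be the delta link saved from the previous run, so only changes made since then are returned.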

Activity Endpoints

Activity data is critical both to ranking search results correctly and to keeping content within Glean up to date.

Insights

The insights endpoint is used to enhance search rankings.

| Endpoint | Description | Permissions | URL |
| --- | --- | --- | --- |
| List used | List recent activities performed by the user on specific items. | Sites.Read.All | `https://graph.microsoft.com/v1.0/users/<user_id>/insights/used` |

Reports

Glean uses the reports API endpoint to obtain site, page, user, and file usage information for SharePoint & OneDrive. This data is used to validate crawler progress, and to ensure your search index is scaled correctly in relation to the volume of data expected.

| Endpoint | Description | Permissions | URL |
| --- | --- | --- | --- |
| Get OneDrive Usage: File Count | Get the total number of files across all sites and how many have been created, modified, and shared within the time period. | Reports.Read.All | `https://graph.microsoft.com/v1.0/reports/getOneDriveUsageFileCounts(period='{period_value}')` |
| Get SharePoint Usage: Site Count | Get the total number of active sites within the time period. | Reports.Read.All | `https://graph.microsoft.com/v1.0/reports/getSharePointSiteUsageSiteCounts(period='{period_value}')` |
| Get SharePoint Usage: User Count | Get the total number of active SharePoint users within the time period. | Reports.Read.All | `https://graph.microsoft.com/v1.0/reports/getSharePointActivityUserCounts(period='{period_value}')` |
| Get SharePoint Usage: Pages | Get the number of pages viewed across all sites within the time period. | Reports.Read.All | `https://graph.microsoft.com/v1.0/reports/getSharePointSiteUsagePages(period='{period_value}')` |
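
Each report URL takes a `period` argument. The sketch below shows how the four report URLs could be built while checking the period against the values the Graph reports API accepts (`D7`, `D30`, `D90`, `D180`). The helper name and the default period are assumptions for illustration; this page does not state which period Glean requests.

```python
VALID_PERIODS = ("D7", "D30", "D90", "D180")

REPORT_FUNCTIONS = (
    "getOneDriveUsageFileCounts",
    "getSharePointSiteUsageSiteCounts",
    "getSharePointActivityUserCounts",
    "getSharePointSiteUsagePages",
)


def report_url(function_name: str, period: str = "D7") -> str:
    """Build the URL for one of the usage reports listed above."""
    if function_name not in REPORT_FUNCTIONS:
        raise ValueError(f"unknown report function: {function_name}")
    if period not in VALID_PERIODS:
        raise ValueError(f"period must be one of {VALID_PERIODS}")
    return f"https://graph.microsoft.com/v1.0/reports/{function_name}(period='{period}')"
```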

Webhooks

Webhooks allow Glean to sync changes to content in your environment as those changes occur, instead of waiting for the daily incremental crawl to complete. Examples include a document being deleted or its access permissions changing.

| Endpoint | Description | Permissions | URL |
| --- | --- | --- | --- |
| Create a webhook subscription | Glean subscribes to the `driveItem` resource, which requires the `Files.ReadWrite.All` permission (the least-privileged option) to create the subscription. | Files.ReadWrite.All | `https://webhook.azurewebsites.net/api/send/<client>` |
| Reauthorize a webhook subscription | Reauthorize a subscription after timeout when a `reauthorizationRequired` challenge is received. | Files.ReadWrite.All | `https://graph.microsoft.com/v1.0/subscriptions/<subscriptionsId>/reauthorize` |

Without webhooks, changes within SharePoint and OneDrive, including changes to document permissions, can take up to 24 hours to be processed (via incremental crawling), compared to under 2 hours with webhooks.
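
To make the subscription step more concrete, the sketch below builds the JSON body for a Graph API `driveItem` subscription, which would be sent in a POST to `https://graph.microsoft.com/v1.0/subscriptions`. The one-day lifetime, the `client_state` value, and the function name are assumptions for this example, as is the reading that the `azurewebsites.net` URL in the table is the notification endpoint that receives change events.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def build_subscription(
    drive_id: str,
    notification_url: str,
    client_state: str,
    lifetime: timedelta = timedelta(days=1),  # assumed lifetime, not Glean's actual value
    now: Optional[datetime] = None,
) -> Dict[str, str]:
    """Return the request body for a driveItem change subscription."""
    now = now or datetime.now(timezone.utc)
    expires = now + lifetime
    return {
        "changeType": "updated",  # change type used for driveItem subscriptions
        "notificationUrl": notification_url,
        "resource": f"/drives/{drive_id}/root",
        "expirationDateTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "clientState": client_state,  # echoed back in notifications for verification
    }
```

Because subscriptions expire, a receiver also has to handle lifecycle events such as the `reauthorizationRequired` challenge listed in the table.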