SharePoint and OneDrive
Setup selected sites
Connect to SharePoint
These instructions use Sites.Selected, a limited Microsoft Graph API permission scope, to grant access explicitly to a particular SharePoint site collection only.
Required permissions for setup
- The user setting up this data source must be a Global Admin.
Register a new app
- Sign in to the Azure portal. Select Azure Active Directory, then App registrations > New registration.
- On the Register an application page, register an app with the following:

| Field | Value |
|---|---|
| Name | Glean |
| Supported account types | Accounts in this organizational directory only (Single tenant) |
| Redirect URI | (Leave this field blank) |

- Click Register.
Configure permissions
- In the left navigation of the Overview page, click Manage > API permissions.
- Click Add a permission and select Microsoft Graph. Choose Application permissions and add the following:
- User.Read.All
- GroupMember.Read.All
- Sites.Selected
- Reports.Read.All
- Member.Read.Hidden
- Click Add a permission and select SharePoint. Choose Application permissions and add the following:
- Sites.Selected
Grant admin consent
- Ensure you are signed in to Azure as a Global Administrator, Application Administrator, or Cloud Application Administrator.
- Use the search box to navigate to Enterprise applications. Select the Glean app you just created from the list of applications.
- Click Permissions under Security. Review the permissions shown, and then click Grant admin consent. At this point, the API permissions page should list every permission added in the Configure permissions section, each with admin consent granted.
Fill out keys
- Scroll to the top of the left sidebar and click Overview.
- Copy the following content from the center Essentials panel and enter it in Glean:
- Application (client) ID
- Directory (tenant) ID
- Enter your SharePoint domain in Glean. Your SharePoint domain should end with “sharepoint.com”.
- Glean recommends creating 5 additional applications with the same permission settings as the initial app to maximize crawl speed. For each one, repeat the setup steps from “Register a new app” through this step, saving the client ID and client secret in the process. Paste each client ID and client secret into the Glean web app.
- Complete the next section to set up SharePoint REST API permissions before clicking Save; otherwise the save will not succeed.
Grant REST API permissions to individual apps
Please complete these steps for each application created. This will require using PnP PowerShell. If you have already installed PnP PowerShell, you can skip the installation step.
Install PnP PowerShell
- PnP PowerShell requires PowerShell 7, which installs side by side with Windows PowerShell 5.1. If you have not installed PowerShell 7, follow the PowerShell 7 installation instructions from your normal PowerShell session.
- Run the following commands in PowerShell 7 to install PnP PowerShell.
Install-Module -Name PnP.PowerShell
Import-Module PnP.PowerShell
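Before moving on, it can help to confirm that the session really is PowerShell 7 and that the module was installed. A minimal sketch using only built-in PowerShell commands; nothing here is specific to Glean:

```powershell
# Show the PowerShell version of the current session; the Major value should be 7 or higher.
$PSVersionTable.PSVersion

# List the installed PnP.PowerShell module versions; no output means the module is not installed.
Get-Module -ListAvailable -Name PnP.PowerShell | Select-Object Name, Version
```

If the version check reports 5.x, the session is Windows PowerShell 5.1 rather than PowerShell 7, and the install commands should be rerun from a PowerShell 7 session.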
Provision REST API permissions
- Upload the same certificate (certificate.crt) that you generated previously to all applications. See the Upload Certificate to Azure step. Ensure that you have SharePoint PowerShell installed. If any of the following commands do not work, you may need to install the module first and then run the commands again in PowerShell.
- To allow a connection in PowerShell with the individual application, navigate to Authentication and toggle Yes for the Allow public client flows section. This is temporary for step 3; you may toggle this back to No after provisioning the service principal permission. Failure to do this may result in the following error message: The request body must contain the following parameter: ‘client_assertion’ or ‘client_secret’.
- Grant consent for PnP management in your Azure tenant for the specific site collection, using the site collection URL. Use either of the two options below, whichever works for you:
Connect-PnPOnline -Url $SITE_COLLECTION_URL -DeviceLogin -ClientId <clientId> -Tenant <tenantId>
Connect-PnPOnline -Url $SITE_COLLECTION_URL -Interactive -ClientId <clientId>
(See the Interactive Connection Troubleshoot section if this does not work.)
- With the application client ID and site collection URL, grant Full Control for the site collection:
Grant-PnPAzureADAppSitePermission -AppId $CLIENT_ID -Site $SITE_COLLECTION_URL -Permissions FullControl
- Now, click Save in Glean to save the uploaded files and secrets. If all permissions have been provisioned appropriately, Glean should show a green dialogue indicating changes saved.
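Because the grant has to be repeated for each application, the steps above can be sketched as a loop. This is a minimal sketch, not a definitive procedure. It assumes an existing Connect-PnPOnline session to the site collection with sufficient rights, the URL and client IDs are placeholders, and the -DisplayName value "Glean" is an assumption added here because recent PnP PowerShell versions ask for a display name when granting a site permission.

```powershell
# Placeholder values; replace with your own site collection URL and app client IDs.
$SITE_COLLECTION_URL = "https://contoso.sharepoint.com/sites/example"
$clientIds = @(
    "00000000-0000-0000-0000-000000000001",
    "00000000-0000-0000-0000-000000000002"
)

# Grant Full Control on the site collection to each application in turn.
foreach ($id in $clientIds) {
    # -DisplayName is an assumed label for the permission entry, not a value from the steps above.
    Grant-PnPAzureADAppSitePermission -AppId $id -DisplayName "Glean" -Site $SITE_COLLECTION_URL -Permissions FullControl
}

# List the application permissions now present on the site collection to confirm the grants.
Get-PnPAzureADAppSitePermission -Site $SITE_COLLECTION_URL
```

Each client ID should appear in the final listing with the FullControl role before you click Save in Glean.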
Provide the list of all sites to be crawled
Glean cannot determine ahead of time which sites have Sites.Selected permissions applied, so the sites to crawl must be configured explicitly via the Manage Data tab.
- Navigate to the Manage Data > Inclusion Rules tab. Provide the list of URLs for the explicit sites to be crawled (this can be just the subsites of the site collections with permissions). If a site collection and all associated subsites should be crawled, provide all of the URLs explicitly in the greenlist.
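One way to collect the URLs of a site collection and its subsites is to enumerate them with PnP PowerShell. This is a minimal sketch, assuming the installed module version supports the -Recurse and -IncludeRootWeb switches on Get-PnPSubWeb and that the returned web objects expose a Url property. The URL, client ID, and output file name are hypothetical placeholders.

```powershell
# Placeholder values; replace with your own site collection URL and app client ID.
$SITE_COLLECTION_URL = "https://contoso.sharepoint.com/sites/example"
$CLIENT_ID = "00000000-0000-0000-0000-000000000000"

# Connect to the site collection using the same interactive option as in the grant step.
Connect-PnPOnline -Url $SITE_COLLECTION_URL -Interactive -ClientId $CLIENT_ID

# Collect the URL of the root site and of every subsite beneath it.
$urls = Get-PnPSubWeb -Recurse -IncludeRootWeb | Select-Object -ExpandProperty Url

# Write the de-duplicated list to a file (hypothetical name) for pasting into the inclusion rules.
$urls | Sort-Object -Unique | Set-Content -Path .\glean-site-urls.txt
```

Review the resulting list before pasting it into Glean, since it includes every subsite the connected account can see, not only the ones you intend to crawl.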
Interactive Connection Troubleshoot
- In the Azure portal, find the app you just created. In the menu, under Manage, click Authentication.
- Under Platform configurations on the page, click on Add a platform
- In the panel that shows up on the right, click on Mobile and desktop applications
- Leave the three boxes shown in the panel on the right unchecked and, in the Custom redirect URIs field, enter http://localhost. Note that this must be http, not https.
- Click Configure at the bottom.
- Retry the command:
Connect-PnPOnline -Url $SITE_COLLECTION_URL -Interactive -ClientId <clientId>