The Outlook connector enables your organization to index and search Microsoft Outlook email and metadata directly in Glean. It also optionally fetches live data at query time to ensure coverage of the freshest messages.
The connector leverages Microsoft Graph API and supports both native (crawled/indexed) and federated (on-demand fetch) modes. Indexing can be scoped to specific users or domains, and configuration occurs at the tenant level using Azure Active Directory groups.
Retrieval modes
Glean supports two retrieval paths, which can be used independently or in combination.
| Feature | Data crawling and indexing | Data fetching |
|---|---|---|
| Method | Builds a native indexed corpus in Glean. | Retrieves live data via Microsoft Graph API at query time. |
| Access scope | Up to 5,000 threads per user over the last 12 months. | Accesses emails outside the indexed window or limits. |
| Visibility | Results appear directly in the "All" tab and global search in Glean Search and Assistant. | Results typically surface when explicitly filtering for Outlook (for example, app:Outlook). |
| Auth type | Certificate-based only. | Certificate-based or Client ID + Secret. |
Supported features
- Supports Microsoft 365 / Exchange Online only. On‑premises Exchange / legacy Exchange are not supported.
- Default indexing scope is up to 12 months of history and up to 5,000 threads per user; older messages are aged out to respect limits.
- To maintain data privacy, the connector enforces a 1:1 relationship between a user and their primary mailbox; shared or delegated access patterns are excluded to prevent unauthorized data exposure.
- Glean does not use any write permissions; all Graph permissions are read‑only for mail and directory data.
- Junk and spam folders are not indexed. If an email is marked as junk after crawling, it is deleted on the next sync.
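The default indexing scope can be pictured as a window filter followed by a per-user cap. The following is a minimal sketch of that idea, not Glean's implementation: the `Thread` type, the `select_indexable` function, the 365-day approximation of "12 months", and the choice to keep the most recently active threads when the cap is hit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_THREADS_PER_USER = 5000      # per-user cap stated above
LOOKBACK = timedelta(days=365)   # assumption: "12 months" approximated as 365 days


@dataclass(frozen=True)
class Thread:
    thread_id: str
    last_activity: datetime  # timestamp of the newest message in the thread


def select_indexable(threads, now, max_threads=MAX_THREADS_PER_USER, lookback=LOOKBACK):
    """Keep threads inside the lookback window, newest first, capped at max_threads."""
    cutoff = now - lookback
    recent = [t for t in threads if t.last_activity >= cutoff]
    # Assumption: when the cap is exceeded, the oldest threads are dropped first.
    recent.sort(key=lambda t: t.last_activity, reverse=True)
    return recent[:max_threads]
```

Threads that fall outside the result are the ones that Data fetching would have to cover at query time.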
Supported objects
The Outlook connector indexes the text content of Outlook emails and related metadata, providing fast, secure search over each user's mailbox. Incremental syncs and deletion handling keep the index up to date: when an email is deleted from the source mailbox, it is deleted from the Glean index as well.
- Email threads: Indexes the Inbox, Sent folder, and other configured mail folders within the user's primary mailbox for users in the product access group. Outlook Online Archive mailboxes (separate archive mailboxes) are not indexed.
- Threaded view: Individual messages are logically grouped into threads per user.
- Metadata: Captures subject lines, participants (From, To, Cc, Bcc), sent/received timestamps, and conversation identifiers.
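As an illustration of the threaded view, the sketch below groups message records into threads by conversation identifier. It assumes messages shaped like Microsoft Graph message resources, which expose `conversationId` and `receivedDateTime` properties. Whether Glean groups on exactly this field is an assumption, and `group_into_threads` is a hypothetical helper.

```python
from collections import defaultdict


def group_into_threads(messages):
    """Group message dicts by conversationId, oldest message first in each thread."""
    threads = defaultdict(list)
    for msg in messages:
        threads[msg["conversationId"]].append(msg)
    for msgs in threads.values():
        # Uniformly formatted ISO 8601 UTC timestamps sort correctly as strings.
        msgs.sort(key=lambda m: m["receivedDateTime"])
    return dict(threads)
```

Because grouping happens within one user's mailbox, the same conversation appears as a separate thread for each participant.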
Supported API endpoints
The connector uses the Microsoft Graph API for mailbox and identity data:
- Mailbox and message data
  - Mail.Read – read access to messages across allowed mailboxes.
  - Calendars.Read – read access to calendar events used by Outlook‑powered experiences.
  - Message/folder endpoints (examples):
    - GET /users/{id | userPrincipalName}/messages
    - GET /users/{id | userPrincipalName}/mailFolders/{id}/messages
  - Delta queries on mail folders for incremental updates and deletion detection:
    - GET /users/{id | userPrincipalName}/mailFolders/{id}/messages/delta
- Identity and group data
  - User.Read.All – user identities for mapping mailboxes to Glean users.
  - GroupMember.Read.All – membership for the product access group and other Azure AD groups.
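To make the endpoint shapes concrete, here is a small sketch that builds the folder-level message and delta URLs listed above. The `https://graph.microsoft.com/v1.0` base URL is an assumption (this page does not state which Graph API version the connector calls), and `folder_messages_url` is a hypothetical helper.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"  # assumption: v1.0 endpoint


def folder_messages_url(user, folder_id, delta=False):
    """Build the messages (or messages/delta) URL for one mail folder.

    user: object ID or userPrincipalName; folder_id: folder ID or a
    well-known folder name such as "inbox" or "sentitems".
    """
    # Keep "@" readable in UPNs; percent-encode everything else, including "/".
    user_part = quote(user, safe="@")
    folder_part = quote(folder_id, safe="")
    url = f"{GRAPH_BASE}/users/{user_part}/mailFolders/{folder_part}/messages"
    return url + "/delta" if delta else url
```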
Limitations and exclusions
- Content types: Glean crawls email text content only; attachments are not indexed.
- Folder scope: Junk and Spam folders are excluded.
- Archived emails: Outlook Online Archive mailboxes (also known as In-Place Archive or archive mailboxes) are not indexed by Glean. Emails that reside only in an Online Archive mailbox will not appear in Glean search results. This applies to both Data crawling and indexing and Data fetching modes. Note that the standard "Archive" folder within a user's primary mailbox is supported; only the separate Online Archive mailbox is excluded.
- Mailbox types: Shared mailboxes, delegated mailboxes, group conversations, and public folders are not supported.
- Identity model: Permissions are enforced at the mailbox owner level only; delegated or shared access patterns are not modeled.
Update and crawling strategy
The Outlook connector uses periodic full crawls plus frequent incremental updates based on Microsoft Graph delta queries.
Content crawling
- Initial full crawl
  - After enabling Data crawling and indexing, Glean performs a full crawl of:
    - Mailboxes for users in the configured product access group.
    - Messages in Inbox, Sent, and selected folders.
- Incremental content crawl
  - Uses delta queries on mail folders to:
    - Discover new messages.
    - Detect updates (flags, categories).
    - Detect deletions or moves (for example, to Deleted Items or Junk).
  - Runs on a frequent schedule (minutes), subject to Outlook mail API rate limits (~10,000 requests / 10 minutes per mailbox).
- Deletion and junk handling
  - When messages are deleted or moved to junk/spam, the thread is deleted from Glean on the next incremental pass or activity update.
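The incremental pass described above follows the general Microsoft Graph delta pattern: page through results via `@odata.nextLink`, treat items carrying an `@removed` annotation as deletions, and keep the final `@odata.deltaLink` as the starting point for the next pass. The sketch below shows that pattern only. `run_delta_pass`, the injected `fetch_page` callable, and the in-memory `index` dict are hypothetical stand-ins, and the sketch does not represent Glean's crawler.

```python
def run_delta_pass(fetch_page, start_url, index):
    """Apply one incremental pass to `index` and return the next delta link.

    fetch_page: callable taking a URL and returning the parsed JSON page (dict).
    index: dict mapping message id -> message dict, mutated in place.
    """
    url = start_url
    while True:
        page = fetch_page(url)
        for item in page.get("value", []):
            if "@removed" in item:
                # The item was deleted or left this folder: drop it from the index.
                index.pop(item["id"], None)
            else:
                # New or updated message. Simplification: replace the stored copy.
                index[item["id"]] = item
        if "@odata.nextLink" in page:
            url = page["@odata.nextLink"]    # more pages in this round
        else:
            return page["@odata.deltaLink"]  # persist for the next pass
```

Injecting `fetch_page` keeps the loop independent of any HTTP client, so throttling and retry handling for the rate limits mentioned above can live in that one function.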
Data fetching behavior
- Fetch emails that are not yet indexed by the crawler.
- Access emails outside the indexing lookback window or beyond thread limits.
Admin control: Crawl schedules and polling intervals are centrally managed by Glean’s crawling system; there is no per‑connector crawl‑frequency setting in the Admin Console.
Authentication modes
Authentication requirements depend on the selected retrieval model.
- Data crawling and indexing: Requires certificate-based application authentication. The backend forces certificate auth for native setup; any client-secret configuration is ignored. This aligns with Microsoft's guidance to prefer certificate credentials over client secrets for app authentication.
- Data fetching: Supports either certificate-based auth or Client ID + client secret.
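The rules above can be summarized as a small decision function. This is an illustrative sketch only: the `Credentials` type, the mode names `"crawl"` and `"fetch"`, and the choice to prefer the certificate when both credentials are supplied in fetch mode are assumptions (in practice the admin picks the method in the setup UI).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Credentials:
    certificate: Optional[str] = None    # hypothetical field, e.g. a PEM string
    client_secret: Optional[str] = None  # hypothetical field


def resolve_auth(mode: str, creds: Credentials) -> str:
    """Return "certificate" or "client_secret" for the given retrieval mode."""
    if mode == "crawl":
        # Crawling and indexing is certificate-only; a client secret is ignored.
        if not creds.certificate:
            raise ValueError("Data crawling and indexing requires a certificate")
        return "certificate"
    if mode == "fetch":
        if creds.certificate:
            return "certificate"  # assumption: certificate wins if both are set
        if creds.client_secret:
            return "client_secret"
        raise ValueError("Data fetching requires a certificate or a client secret")
    raise ValueError(f"unknown mode: {mode!r}")
```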
Requirements
Technical requirements
To use the Outlook connector, you need:
- A Microsoft 365 tenant with Exchange Online mailboxes.
- A Glean tenant with Admin Console access.
- A registered Azure AD application with Microsoft Graph application permissions (Mail, Calendar, Directory).
Credential requirements
The credentials you need depend on the retrieval mode you choose.
For Data crawling and indexing
You must configure:
- Azure AD Application (Client ID) – Application (client) ID.
- Directory/Tenant ID – Tenant identifier for your Microsoft 365 directory.
- X.509 certificate:
  - Public certificate uploaded in Azure AD (App registration → Certificates & secrets → Certificates).
  - Corresponding private key securely stored by Glean to obtain Graph tokens.
Optional scoping fields:
- productAccessGroupId – Azure AD group object ID. Glean indexes only the mailboxes of users in this group.
- Allowed domains – Comma‑separated domain list (for example, company.com,subsidiary.com) to restrict which sender domains are indexed.
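As a rough illustration of how an allowed-domains value might be interpreted, the sketch below parses the comma-separated list and checks a sender address against it. The helper names, the case-insensitive exact-match rule (subdomains are not matched), and the "empty list means no restriction" behavior are assumptions, not documented connector behavior.

```python
def parse_allowed_domains(raw):
    """Turn "company.com, subsidiary.com" into a set of lowercase domains."""
    return {d.strip().lower() for d in raw.split(",") if d.strip()}


def sender_allowed(sender_address, allowed_domains):
    """Return True if the sender's domain is in the allowed set."""
    if not allowed_domains:
        return True  # assumption: no configured domains means no restriction
    domain = sender_address.rsplit("@", 1)[-1].lower()
    return domain in allowed_domains
```

For example, with `company.com,subsidiary.com` configured, a message from `ada@subsidiary.com` would pass this check and one from `bob@other.org` would not.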
For Data fetching
You can choose either of these in the setup UI:
- Certificate‑based auth (same as for Data crawling and indexing above), or
- Client ID + client secret, consisting of:
  - Azure AD Application (client) ID
  - Directory/Tenant ID
  - Client secret value for the app registration.
Permission requirements
Your Azure AD app needs application permissions with admin consent:
- Mail.Read: read mail across user mailboxes.
- Calendars.Read: read calendar events (for Outlook‑powered calendar experiences).
- User.Read.All: read directory users.
- GroupMember.Read.All: read group memberships, especially for the product access group.
These are granted by an Azure tenant admin in the Azure Portal.
Permissions & security
Permission propagation
- Each thread is associated with a single mailbox owner (user) from Azure AD.
- Indexed emails are visible only to that owner in Glean.
- Scope is further restricted by:
  - productAccessGroupId membership, and
  - Allowed sender domains (if configured).
Shared/delegated mailboxes, group conversations, and public folders are not modeled and not indexed.
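The owner-only visibility model can be expressed in a few lines. The sketch below is a simplification under stated assumptions: the index is modeled as a plain mapping from thread ID to owner user ID, `visible_threads` is a hypothetical helper, and sender-domain filtering is assumed to have been applied earlier, at indexing time.

```python
def visible_threads(index, viewer_id, access_group_members):
    """Return the IDs of threads the viewer may see.

    index: dict mapping thread id -> owner user id (hypothetical shape).
    access_group_members: set of user ids in the product access group.
    """
    # Users outside the product access group have no indexed Outlook content.
    if viewer_id not in access_group_members:
        return []
    # A thread is visible only to the single mailbox owner it belongs to.
    return [tid for tid, owner in index.items() if owner == viewer_id]
```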
Security & compliance
- Read‑only access: The Graph app uses only read scopes; Glean never writes or modifies Outlook data.
- Data residency: All indexed data is stored in your isolated Glean tenant (Glean‑hosted or customer‑hosted), encrypted in transit and at rest.
- Lifecycle:
  - Removing a user from the product access group causes their indexed Outlook content to be deleted at the next sync.
  - Deleted/junked emails are also removed on subsequent incremental updates.
Configuration and setup instructions
Connector setup occurs primarily in the Glean Admin Console and Azure Portal. Administrators will need to perform steps in both environments.
Select the setup guide below that matches your organization’s retrieval and authentication requirements.