The Glean Gmail connector integrates your organization’s email content into the unified search platform. It supports two data retrieval modes: Data crawling and indexing, and Data fetching. You can scope the connector to index emails from specific users, groups, or domains while strictly respecting Gmail’s permission model.

Key capabilities and features

Feature | Details | Access and scope
Data crawling and indexing (Recommended) | Gathers and organizes data for fast, high-quality search access across all your sources. Provides robust results that appear directly in the main search tab. | Indexes up to 15,000 threads per user, with history limited to the last 12 months (configurable).
Data fetching | Retrieves live data directly from Gmail on a per-query basis. Best for quick access to smaller volumes of data and ensuring coverage of the most current emails. | Results appear only when explicitly filtered (e.g., app:gmail).
Searchable metadata | Indexes sender, recipients, subject, date, and thread-level metadata. | Supports search operators like from, recipient, date, subject, and documentcategory:email.
Security controls | Supports both inclusion (greenlist) and exclusion (redlist) rules to control which mailboxes are indexed. | Spam and trash folders are automatically excluded from indexing.
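As an illustration of how the operators in the table might be combined, the sketch below assembles a search string scoped to Gmail. Only app:gmail and documentcategory:email appear in operator:value form on this page; the same colon syntax for from, recipient, date, and subject, the quoting of values that contain spaces, and the helper name build_gmail_query are all assumptions made for this example.

```python
def build_gmail_query(terms, filters):
    """Combine free-text terms with operator:value filters, scoped to Gmail.

    `filters` maps an operator name (e.g. "from", "subject") to its value.
    The operator:value syntax and the quoting rule are assumptions.
    """
    parts = ["app:gmail"]  # explicit filter, needed for Data fetching results
    for name, value in filters.items():
        # Quote values containing spaces so they stay attached to the operator.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    if terms:
        parts.append(terms)
    return " ".join(parts)


query = build_gmail_query(
    "quarterly report",
    {"from": "alice@company.com", "subject": "Q3 plan"},
)
print(query)
```

The app:gmail filter is always included here because Data fetching results appear only when explicitly filtered.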

Limitations

  • Attachments: Attachments are not indexed; only the text content of email threads is crawled.
  • Indexing limits: Indexing is limited to 15,000 threads per user and a maximum of 12 months of history (configurable).
  • Label support: By default, custom labels and folders beyond the Inbox/Sent folders are not indexed.
  • Spam and trash folders: Spam and trash folders are not indexed.

Update frequency

  • Full content crawl: Monthly (every ~28–30 days). This ensures the indexed corpus is fully reconciled on a regular cadence.
  • Incremental content crawl: Continuous, activity-based updates via the Gmail History API (historyId). We poll for changes every few minutes and batch updates to refresh modified threads efficiently. A periodic incremental pass also runs as a safety net to discover any newly eligible threads and to cover missed events.
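The incremental crawl described above can be sketched roughly as follows. This is not Glean's implementation; it only illustrates the general pattern of paging through Gmail history records from a stored historyId and collecting the threads that need a refresh. The response shape (history, messages, threadId, nextPageToken, historyId) follows the Gmail API's users.history.list method. The list_history callable, the fake client, and the function name are hypothetical stand-ins so that the example runs without network access.

```python
def collect_changed_threads(list_history, start_history_id):
    """Return (changed thread IDs, newest historyId) since start_history_id.

    `list_history` is a hypothetical callable wrapping users.history.list;
    it takes startHistoryId and pageToken and returns the parsed response.
    """
    changed = set()
    latest = start_history_id
    page_token = None
    while True:
        resp = list_history(startHistoryId=start_history_id, pageToken=page_token)
        for record in resp.get("history", []):
            # Each history record lists the messages it touched.
            for msg in record.get("messages", []):
                changed.add(msg["threadId"])
        latest = resp.get("historyId", latest)
        page_token = resp.get("nextPageToken")
        if not page_token:
            break
    return changed, latest


# Fake two-page response standing in for the real API (illustrative data only).
_PAGES = {
    None: {
        "history": [{"messages": [{"id": "m1", "threadId": "t1"}]}],
        "nextPageToken": "p2",
        "historyId": "105",
    },
    "p2": {
        "history": [{"messages": [{"id": "m2", "threadId": "t2"},
                                  {"id": "m3", "threadId": "t1"}]}],
        "historyId": "105",
    },
}


def fake_list_history(startHistoryId, pageToken=None):
    return _PAGES[pageToken]


changed, cursor = collect_changed_threads(fake_list_history, "100")
print(sorted(changed), cursor)
```

Collecting thread IDs into a set means several changes to the same thread result in a single refresh, which matches the batching behavior described above. The returned historyId would be stored as the starting point for the next poll.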

Indexing scope

  • Coverage: Inbox and sent folders are indexed.
  • Lookback window: Up to 12 months (configurable per deployment).
  • Thread limits: Up to 15,000 threads per user are indexed. To bound cost for very high-volume senders, we cap Sent mailbox indexing at ~6,000 threads over a rolling 30-day window; once volume drops below the cap, older sent threads age in naturally.
  • Mailbox scope: Only the mailbox of allowed users is indexed (delegated mailboxes are not supported).
  • Deleted emails: If a user deletes an email, we will also delete the associated email thread in the Glean corpus.
  • Spam emails: Emails that have been filtered as spam will not be indexed, even if the user has previously interacted with them. If the user marks an email as spam after Glean has crawled it, we will delete the associated email after processing the next activity update or incremental crawl.
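The scope rules above can be read as a filter followed by a cap. The sketch below is one possible reading under stated assumptions, not the connector's actual logic: the thread dictionaries, the function names, the approximation of 12 months as 365 days, and the choice to keep the most recent threads when the cap is reached are all assumptions. INBOX, SENT, SPAM, and TRASH are Gmail's system label IDs. The separate Sent mailbox cap is not modeled here.

```python
from datetime import datetime, timedelta, timezone

INDEXED_LABELS = {"INBOX", "SENT"}
EXCLUDED_LABELS = {"SPAM", "TRASH"}
MAX_THREADS_PER_USER = 15_000
LOOKBACK = timedelta(days=365)  # assumption: 12 months approximated as 365 days


def is_eligible(thread, now):
    """Apply the coverage, spam/trash, and lookback rules to one thread."""
    labels = set(thread["labels"])
    if labels & EXCLUDED_LABELS:
        return False  # spam and trash are never indexed
    if not labels & INDEXED_LABELS:
        return False  # custom labels alone are not indexed by default
    return now - thread["last_message_at"] <= LOOKBACK


def select_threads(threads, now, limit=MAX_THREADS_PER_USER):
    """Keep eligible threads, newest first (an assumption), up to the cap."""
    eligible = [t for t in threads if is_eligible(t, now)]
    eligible.sort(key=lambda t: t["last_message_at"], reverse=True)
    return eligible[:limit]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
threads = [
    {"id": "t1", "labels": ["INBOX"], "last_message_at": now - timedelta(days=10)},
    {"id": "t2", "labels": ["INBOX", "SPAM"], "last_message_at": now - timedelta(days=1)},
    {"id": "t3", "labels": ["SENT"], "last_message_at": now - timedelta(days=400)},
    {"id": "t4", "labels": ["Label_1"], "last_message_at": now - timedelta(days=5)},
    {"id": "t5", "labels": ["SENT"], "last_message_at": now - timedelta(days=2)},
]
print([t["id"] for t in select_threads(threads, now)])
```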

Prerequisites

Successful setup requires administrative access to both Glean and Google Workspace, along with an operational Google Drive connector.
Requirement | Details
Admin access | Glean Administrator and Google Workspace Administrator access is required.
Google Drive connector (not required if only Data fetching is used) | The Google Drive connector must be installed and operational to index Gmail content. For more information, see Google Drive authentication mechanisms.
Service account | The service account must have Domain-wide Delegation enabled, and a Directory admin email must be provided.
Required scopes | The service account must be granted the following OAuth scope: https://www.googleapis.com/auth/gmail.readonly (read-only access to Gmail).

Configuration and setup

Follow these steps to configure the Gmail connector, starting in the Google Admin Console.

Step 1: Add OAuth Scopes in Google Admin Console

  1. Go to the Domain-wide Delegation section in the Google Admin Console.
  2. Click Add new. If you have already connected Google Drive with the same Client ID, click Edit on the existing client instead.
  3. Paste the 21-digit Unique ID (Client ID) into the Client ID field.
  4. Copy and paste the following into the OAuth scopes (comma-delimited) field:
https://www.googleapis.com/auth/gmail.readonly
  5. Click Authorize.
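Before pasting values into the Domain-wide Delegation form, it can help to sanity-check them. The sketch below validates that a Client ID is exactly 21 digits and appends the Gmail scope to an existing comma-delimited scope list, which is the situation when you edit a client already used for Google Drive. The helper names are hypothetical, and the Drive scope in the usage example is only a placeholder for whatever scopes your existing client already lists.

```python
import re

GMAIL_READONLY_SCOPE = "https://www.googleapis.com/auth/gmail.readonly"


def is_valid_client_id(client_id):
    """Check that the service account Unique ID is exactly 21 ASCII digits."""
    return re.fullmatch(r"[0-9]{21}", client_id) is not None


def merge_scopes(existing, required=GMAIL_READONLY_SCOPE):
    """Return a comma-delimited scope list that includes `required` once."""
    scopes = [s.strip() for s in existing.split(",") if s.strip()]
    if required not in scopes:
        scopes.append(required)
    return ",".join(scopes)


# Placeholder for scopes already present on an existing client.
existing_scopes = "https://www.googleapis.com/auth/drive.readonly"
print(merge_scopes(existing_scopes))
print(is_valid_client_id("123456789012345678901"))
```

Merging rather than replacing keeps the scopes that the existing client already depends on.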

Step 2: Configure and connect in Glean

  1. In the Glean Admin Console, navigate to Data sources and select the Gmail connector.
  2. Enter a Name and icon for your data source.
  3. Select data retrieval methods: Allow Glean to access data by checking both methods (recommended):
    • Data crawling and indexing: Gathers and organizes data for fast access.
    • Data fetching: Retrieves live data on a per-query basis.
  4. Associate Google Drive instance: Select the corresponding Google Drive instance to associate with this Gmail connection.
     This option is visible when the Data crawling and indexing method is selected.
  5. Save: Click Save to complete the connection.

(Optional) Manage data

These restrictions are optional and apply only to Data crawling and indexing. Data fetching will not be affected. If you have no specific rules, leave these sections blank.

Policy priority

If both inclusion and exclusion rules are applied to the same piece of content, the exclusion rule takes priority, and the content will not be indexed. Best practice: Use these rules minimally to preserve the enterprise search experience, as most end-users expect to find all content. Most customers do not apply any rules, or apply exclusion rules sparingly for sensitive data.
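The precedence rule can be expressed compactly. The sketch below applies it to mailbox-level group rules: exclusion is checked first, and an empty inclusion list means no restriction. The function name and the representation of rules as sets of group addresses are assumptions for illustration, and the addresses are made up.

```python
def should_crawl_mailbox(user_groups, included_groups, excluded_groups):
    """Decide whether a mailbox is crawled, with exclusion taking priority."""
    if user_groups & excluded_groups:
        return False  # an exclusion match always wins
    if included_groups:
        return bool(user_groups & included_groups)
    return True  # no inclusion rules configured: everything is in scope


included = {"sales@company.com"}
excluded = {"legal@company.com"}

# Member of both an included and an excluded group: not crawled.
print(should_crawl_mailbox({"sales@company.com", "legal@company.com"}, included, excluded))
# Member of an included group only: crawled.
print(should_crawl_mailbox({"sales@company.com"}, included, excluded))
```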

Inclusion rules (Greenlist)

Use these options to explicitly limit the content that Glean indexes.
Configuration | Description | Action
Crawl only mailboxes of members of a Google Group | Limits the crawl to only mailboxes belonging to members of the specified Google Groups. | Enter Google Group email addresses (comma-separated).
Greenlisted domains: Index email threads which include these domains | Restricts email indexing to only threads where the sender belongs to the specified domain(s). | Enter domain names (e.g., company.com), separated by commas.
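To make the domain greenlist concrete, the sketch below checks a thread's From headers against a comma-separated domain list. It is an interpretation under assumptions, not the connector's matching logic: it treats a thread as matching if any sender belongs to a listed domain, compares domains exactly and case-insensitively (no subdomain matching), and treats an empty list as no restriction. The function names are hypothetical.

```python
from email.utils import parseaddr


def sender_domain(from_header):
    """Extract the lowercased domain from a From header value."""
    _, addr = parseaddr(from_header)
    return addr.rpartition("@")[2].lower()


def thread_matches_greenlist(from_headers, greenlisted_domains):
    """Return True if any sender in the thread belongs to a listed domain."""
    allowed = {d.strip().lower() for d in greenlisted_domains.split(",") if d.strip()}
    if not allowed:
        return True  # blank setting: no domain restriction
    return any(sender_domain(h) in allowed for h in from_headers)


print(thread_matches_greenlist(["Alice <alice@Company.com>"], "company.com, example.org"))
print(thread_matches_greenlist(["bob@other.net"], "company.com, example.org"))
```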

Exclusion rules (Redlist)

Use these options to explicitly remove sensitive or unnecessary content from the index.
Configuration | Description | Action
Exclude members of a Google Group from crawling | Prevents mailboxes belonging to members of the specified Google Groups from being crawled. | Enter Google Group email addresses (comma-separated).
Exclude threads sent to recipients in this Google Group from being crawled | Prevents threads from being indexed if they were sent to recipients in the specified Google Groups. | Enter Google Group email addresses (comma-separated).

Security and data access

  • Data integrity: The connector uses activity-based updates via the Gmail API to handle deletions and maintain a secure, current index. Emails marked as spam or deleted in Gmail are removed from the Glean index on the next update.
  • Security: Glean enforces all Gmail permissions at query time. Only authorized users see search results, and sensitive items (spam/trash) are excluded from the index.
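Query-time enforcement can be pictured as a final filter applied to candidate results before they are shown. The sketch below is a simplified model only: it assumes each indexed thread records the mailbox it was crawled from (the hypothetical mailbox_owner field) and that a user may see only threads from their own mailbox, which is consistent with delegated mailboxes not being supported. Glean's actual permission checks are not described on this page.

```python
def visible_results(results, user_email):
    """Keep only results from the searching user's own mailbox."""
    user = user_email.lower()
    return [r for r in results if r["mailbox_owner"].lower() == user]


candidates = [
    {"thread_id": "t1", "mailbox_owner": "alice@company.com"},
    {"thread_id": "t2", "mailbox_owner": "bob@company.com"},
    {"thread_id": "t3", "mailbox_owner": "Alice@company.com"},
]
print([r["thread_id"] for r in visible_results(candidates, "alice@company.com")])
```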