The Glean platform architecture consists of three primary components that work together to provide secure and effective enterprise data access:

  • Query Path: Handles user search requests and authentication
  • Data Ingestion Path: Manages data collection from enterprise sources
  • Data Processing Pipeline: Processes and indexes collected data

Query Path

Web Application Overview

1. Initial Access

Users access Glean through the web application at https://app.glean.com, hosted within Glean’s central cloud infrastructure. The application serves static assets including images, CSS, and JavaScript.

2. Session Check

The web client checks for an existing session state in the user’s local storage. If none exists, the user must authenticate, because anonymous search is not supported.

3. Authentication Process

Users begin by entering their email address (e.g., user@company.com).
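
The session check can be pictured with a minimal sketch. It models the browser's local storage as a plain mapping. The key name `glean_session` and the function `next_step` are hypothetical names chosen for illustration, not part of the Glean web client.

```python
from typing import Mapping

# Hypothetical storage key; the real client's key name is not documented here.
SESSION_KEY = "glean_session"


def next_step(local_storage: Mapping[str, str]) -> str:
    """Decide what the web client does after loading its static assets."""
    session = local_storage.get(SESSION_KEY)
    if session:
        # An existing session state lets the user go straight to search.
        return "search"
    # Anonymous search is not supported, so the user must sign in first.
    return "authenticate"
```

With an empty store, `next_step({})` returns `"authenticate"`. With a stored session value it returns `"search"`.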

Tenant Resolution

Each customer tenant is configured with a list of company domain names used for authentication. These domains map to a tenant-specific Query Endpoint (QE) of the form <tenant_id>-be.glean.com.

The authentication process follows these steps:

1. Domain Lookup

When a user submits their email, the web app performs a domain lookup to determine the appropriate QE domain.

2. QE Assignment

The QE domain resolves to a static IP uniquely assigned to your company’s Glean tenant, whether deployed in Glean SaaS or your own cloud environment.

3. SSO Integration

Unauthenticated users are redirected to your configured SSO provider for authentication.
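
The domain lookup step can be sketched as a small function, assuming an in-memory table that maps company domains to tenant IDs. The table contents, the tenant ID `acme`, and the function name `resolve_query_endpoint` are illustrative assumptions. Only the `<tenant_id>-be.glean.com` host format comes from the text above.

```python
from typing import Dict

# Hypothetical domain-to-tenant table; real mappings are configured per tenant.
DOMAIN_TO_TENANT: Dict[str, str] = {
    "company.com": "acme",
    "company.co.uk": "acme",
}


def resolve_query_endpoint(email: str, table: Dict[str, str] = DOMAIN_TO_TENANT) -> str:
    """Map a user's email address to the Query Endpoint host of their tenant."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    tenant_id = table.get(domain.lower())
    if tenant_id is None:
        raise LookupError(f"no tenant configured for domain {domain!r}")
    # QE hosts follow the <tenant_id>-be.glean.com form described above.
    return f"{tenant_id}-be.glean.com"
```

In this sketch, `resolve_query_endpoint("user@company.com")` returns `"acme-be.glean.com"`, and an address from an unlisted domain raises `LookupError`.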

Authentication Flow Diagram

[Diagram: the complete process from initial access to query execution.]

Query Endpoint Communication

When users perform searches, requests are sent to:

https://<tenant_id>-be.glean.com/api/v1/search

API Documentation

Find detailed field descriptions in our Developer Documentation.
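
As a rough illustration of how a client might address the search endpoint, the sketch below builds a request with Python's standard library and does not send it. The HTTP method, the `query` body field, and the bearer-token header are assumptions made for the example; the Developer Documentation is the authority on the actual request schema and authentication.

```python
import json
import urllib.request


def build_search_request(tenant_id: str, query: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a search request for a tenant's Query Endpoint."""
    url = f"https://{tenant_id}-be.glean.com/api/v1/search"
    # Assumption: a JSON body with a "query" field, sent via POST.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the session is presented as a bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


request = build_search_request("acme", "quarterly roadmap", "example-token")
print(request.full_url)  # https://acme-be.glean.com/api/v1/search
```

Passing the result to `urllib.request.urlopen` would send it. The sketch stops short of that because `acme` and `example-token` are placeholders.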

Data Ingestion Flow

Glean’s data ingestion process is built around specialized connectors deployed within your tenant’s dedicated cloud project. These connectors serve multiple purposes:

  • Content Retrieval: Fetches content from connected enterprise sources
  • Activity Tracking: Monitors user interaction data
  • Permission Mapping: Maps and maintains access controls

Connection Methods

Data retrieval occurs via HTTPS, with two primary connection patterns depending on the data source location.

  • SaaS Applications: For services like Google Drive, connections occur over the public internet using HTTPS.
  • On-Premises Systems: For internal systems like on-prem Jira, secure private connections are established via VPN or Shared VPC.
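
The two connection patterns amount to a choice keyed on where the data source lives. The sketch below shows that choice. The type and function names (`Location`, `Route`, `DataSource`, `connection_route`) are hypothetical and do not reflect connector configuration in Glean.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    SAAS = "saas"        # e.g. Google Drive
    ON_PREM = "on_prem"  # e.g. on-prem Jira


class Route(Enum):
    PUBLIC_INTERNET = "HTTPS over the public internet"
    PRIVATE_NETWORK = "HTTPS over VPN or Shared VPC"


@dataclass(frozen=True)
class DataSource:
    name: str
    location: Location


def connection_route(source: DataSource) -> Route:
    """Pick the connection pattern for a data source based on its location."""
    if source.location is Location.SAAS:
        return Route.PUBLIC_INTERNET
    return Route.PRIVATE_NETWORK
```

For instance, `connection_route(DataSource("Google Drive", Location.SAAS))` returns `Route.PUBLIC_INTERNET`, while an on-prem Jira source returns `Route.PRIVATE_NETWORK`.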

Data Processing Pipelines

All data processing occurs within your tenant’s project using Google Dataflow pipelines. Your data never leaves your tenant’s environment.

The processing pipeline combines:

  • Content from connected sources
  • Permission mappings
  • User data
  • Activity metrics (creation, edits, views)

This combined data is then indexed to create a secure, searchable knowledge base within your tenant.
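
To make the combination step concrete, here is a toy in-memory sketch of joining content, permission mappings, user data, and activity metrics into a permission-aware index. It is plain Python, not a Dataflow pipeline. Every name in it (`IndexedDoc`, `build_index`, `search`) is hypothetical, and ranking by view count is an assumption made only to show activity data being used.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class IndexedDoc:
    doc_id: str
    text: str
    allowed_users: Set[str]
    views: int = 0


def build_index(
    content: Dict[str, str],
    permissions: Dict[str, Set[str]],
    users: Set[str],
    activity: Dict[str, int],
) -> Dict[str, IndexedDoc]:
    """Join the four inputs on document ID into one index."""
    index: Dict[str, IndexedDoc] = {}
    for doc_id, text in content.items():
        # Keep only permissions that refer to known users.
        allowed = permissions.get(doc_id, set()) & users
        index[doc_id] = IndexedDoc(doc_id, text, allowed, activity.get(doc_id, 0))
    return index


def search(index: Dict[str, IndexedDoc], user: str, term: str) -> List[str]:
    """Return IDs of matching documents the user may see, most viewed first."""
    hits = [
        doc for doc in index.values()
        if user in doc.allowed_users and term.lower() in doc.text.lower()
    ]
    hits.sort(key=lambda doc: (-doc.views, doc.doc_id))
    return [doc.doc_id for doc in hits]


index = build_index(
    content={"d1": "Roadmap Q3", "d2": "roadmap draft", "d3": "Payroll"},
    permissions={"d1": {"ann"}, "d2": {"ann", "bob"}, "d3": {"bob"}},
    users={"ann", "bob"},
    activity={"d1": 5, "d2": 9},
)
print(search(index, "ann", "roadmap"))  # ['d2', 'd1']
print(search(index, "bob", "roadmap"))  # ['d2']
```

Because the permission filter runs at query time against the indexed access list, a user never sees a document they cannot open in the source system. That is the property "secure" describes in the text above.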