This article provides instructions for customers hosted on GCP or AWS to configure Glean to use GPT models directly through their own OpenAI account for billing and capacity management.
Do not use this document if you are leveraging the Glean Key option. For the Glean Key option, Glean manages the configuration and provisioning of LLM resources transparently.

Enable access to models

Request access to the following models from the OpenAI Library:
  • GPT-5: Agentic reasoning model used in Fast and Thinking modes in Chat. This is the primary model for Glean Assistant.
  • GPT-4.1 (legacy) or GPT-4o (gpt-4o-2024-05-13) (legacy): Large model used for other, more complex tasks in Glean Assistant.
  • GPT-4.1-mini (recommended) or GPT-4o-mini: Small model used for simpler tasks such as follow-up question generation.
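Once access is granted, you can confirm that your OpenAI organization can see the required models with the OpenAI Python SDK. A minimal sketch (the `missing_models` helper is our own; the model ids match the list above):

```python
# Sketch: check that the models Glean needs are visible to your OpenAI org.
# The helper is pure logic so it can be tested without network access.

REQUIRED_MODELS = ["gpt-5", "gpt-4.1-mini"]  # add "gpt-4.1" or "gpt-4o" if selected

def missing_models(available_ids, required=REQUIRED_MODELS):
    """Return the required model ids absent from the org's model list."""
    available = set(available_ids)
    return [m for m in required if m not in available]

# Live usage (requires the `openai` package and OPENAI_API_KEY):
#   from openai import OpenAI
#   ids = [m.id for m in OpenAI().models.list()]
#   print(missing_models(ids) or "all required models available")
```

If any id comes back missing, request access for it in the OpenAI Library before continuing.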

Capacity for OpenAI Models

We strongly recommend using priority processing from OpenAI: it provides faster, more consistent performance while retaining the flexibility of a pay-as-you-go model (see OpenAI's priority processing FAQ). Check the rate and usage limits for your organization under Settings → Organization → Limits, and ensure you have at least the capacity listed below for the number of users in your organization. See OpenAI's documentation on usage tiers for details.

Capacity Requirements for the latest assistant architecture on Agentic Engine 2 using GPT-5

Users     High capacity model       Low capacity model
          TPM          RPM          TPM         RPM
500       125,000      10           5,000       5
1,000     250,000      15           5,000       5
2,500     625,000      35           10,000      10
5,000     1,245,000    65           15,000      15
10,000    2,490,000    130          30,000      30
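The table above can be expressed as a simple lookup: given a user count, pick the smallest tier that covers it. A sketch (the `minimum_capacity` helper and its behavior above 10,000 users are our own assumptions, not a Glean API):

```python
# Sketch of the capacity table as a tier lookup.
# Each row: (users, high-cap TPM, high-cap RPM, low-cap TPM, low-cap RPM).
CAPACITY_TIERS = [
    (500,    125_000,   10,  5_000,  5),
    (1_000,  250_000,   15,  5_000,  5),
    (2_500,  625_000,   35, 10_000, 10),
    (5_000,  1_245_000, 65, 15_000, 15),
    (10_000, 2_490_000, 130, 30_000, 30),
]

def minimum_capacity(users):
    """Return the minimum TPM/RPM to provision for a given user count."""
    for max_users, hi_tpm, hi_rpm, lo_tpm, lo_rpm in CAPACITY_TIERS:
        if users <= max_users:
            return {"high": {"tpm": hi_tpm, "rpm": hi_rpm},
                    "low": {"tpm": lo_tpm, "rpm": lo_rpm}}
    raise ValueError("user count exceeds the published table; size beyond 10,000 users separately")
```

For example, an organization of 800 users falls into the 1,000-user tier and should provision at least 250,000 TPM / 15 RPM for the high-capacity model.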

Select the model in Glean Workspace

  1. Go to Admin Console → Platform → LLM
  2. Click on Add LLM
  3. Select OpenAI
  4. Select:
    • GPT-5 for the agentic engine model
    • GPT-4.1 (recommended) or GPT-4o for the large model
    • GPT-4.1-mini (recommended) or GPT-4o-mini for the small model
  5. Click Validate to confirm that Glean can access the model
  6. Once validated, click Save

Verify the model used by Glean Assistant

  1. Go to Glean Assistant and select the Public Knowledge Assistant.
  2. Ask the question: Who created you?
You should get a response similar to: "I was created by OpenAI."
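The same check can be run against your OpenAI account directly with the Python SDK, factored so the client can be swapped out for testing (this assumes the Chat Completions API and the GPT-5 model selected above):

```python
# Sketch: send the verification prompt and return the reply text.
def ask_creator(client, model="gpt-5"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Who created you?"}],
    )
    return resp.choices[0].message.content

# Live usage (requires the `openai` package and OPENAI_API_KEY):
#   from openai import OpenAI
#   print(ask_creator(OpenAI()))  # expect a mention of OpenAI
```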

FAQ

All data is encrypted in transit between your Glean instance and your OpenAI service. Please review the Data controls in the OpenAI Platform guide. You can request Zero Data Retention and opt out of modified abuse monitoring so that your prompts and generated content are not stored on OpenAI servers or subject to human review by OpenAI employees. Note that modified abuse monitoring is required for some OpenAI features, such as data analysis.
The number of tokens we use will vary depending on the type of request (e.g. summarizing a long document will use many tokens). For requests that are retrieving an answer from the Glean search engine, the current token usage is:
  • Large Model: 19,000 input tokens + 450 output tokens
  • Small Model: 5,300 input tokens + 150 output tokens
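The per-request figures above can be turned into a rough usage estimate. A sketch, assuming one large-model call plus one small-model call per search-answer request (an illustrative simplification; real call patterns vary by feature):

```python
# Per-request token usage for search-answer requests, from the figures above.
LARGE = {"input": 19_000, "output": 450}
SMALL = {"input": 5_300, "output": 150}

def tokens_per_query():
    """Input/output tokens for one answer: one large + one small model call."""
    return {
        "input": LARGE["input"] + SMALL["input"],
        "output": LARGE["output"] + SMALL["output"],
    }

def monthly_tokens(queries_per_day, days=30):
    """Rough monthly token totals for capacity and budget planning."""
    per_query = tokens_per_query()
    return {k: v * queries_per_day * days for k, v in per_query.items()}
```

At 100 queries per day, for example, this works out to roughly 72.9M input tokens and 1.8M output tokens per month.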

Architecture Diagram

A system architecture diagram illustrating a user query being processed through a series of steps within the Customer Glean Project VPC. The process begins with a user question, which is then processed by Tool Selection & Query Planning, Glean Planner, Glean Index & Knowledge Graph, Query Execution, Governance Engine & Doc Redlisting, Intelligent Data Selector, and Answer Generation, ultimately providing an answer to the user. The diagram also shows the interaction with OpenAI.