This article explains how to configure Glean to use GPT models on Azure OpenAI so that LLM usage is billed directly to your Azure account. It applies to customers hosted on GCP or AWS who want to bill their LLM usage through Azure.
Do not use this document if you are leveraging the Glean Key option. For the Glean Key option, Glean manages the configuration and provisioning of LLM resources transparently.

Enable access to models

Fill out the Azure OpenAI Service form and request access to the following models:
| Model name | How Glean uses the model |
| --- | --- |
| GPT-5 | Agentic reasoning model used in fast and thinking modes in chat. This is the primary model for Glean chat. |
| GPT-4.1 (legacy) or GPT-4o (gpt-4o-2024-05-13) (legacy) | Large model used for other, more complex tasks in Glean Assistant |
| GPT-4.1-mini (recommended) or GPT-4o-mini | Small model used for simpler tasks such as follow-up question generation |
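Once access is granted, each model must be deployed in your Azure OpenAI resource before Glean can call it. A minimal Azure CLI sketch is shown below; the resource group, resource name, deployment name, model version, and SKU capacity are all placeholders you should replace with values valid for your subscription and region.

```shell
# Sketch: deploy a model into an existing Azure OpenAI resource.
# All names below are hypothetical; model availability and versions vary by region.
az cognitiveservices account deployment create \
  --resource-group my-glean-rg \
  --name my-aoai-resource \
  --deployment-name gpt-5 \
  --model-name gpt-5 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100
```

Repeat the command for each model tier (agentic, large, small) that Glean will use.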

Request additional capacity from Azure

Please see Azure OpenAI Service quotas and limits for the default quotas and instructions for requesting additional quota.

Capacity requirements for the latest assistant architecture (Agentic Engine 2 with GPT-5)

TPM = tokens per minute; RPM = requests per minute.

| Users | High capacity model TPM | High capacity model RPM | Low capacity model TPM | Low capacity model RPM |
| --- | --- | --- | --- | --- |
| 500 | 125,000 | 10 | 5,000 | 5 |
| 1,000 | 250,000 | 15 | 5,000 | 5 |
| 2,500 | 625,000 | 35 | 10,000 | 10 |
| 5,000 | 1,245,000 | 65 | 15,000 | 15 |
| 10,000 | 2,490,000 | 130 | 30,000 | 30 |
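When your user count falls between the published tiers, a conservative approach is to request the quota for the next-largest tier. A small lookup sketch, using the tier values from the table above:

```python
# Capacity tiers from the table above:
# (users, high-capacity TPM, high-capacity RPM, low-capacity TPM, low-capacity RPM)
TIERS = [
    (500,       125_000,  10,  5_000,  5),
    (1_000,     250_000,  15,  5_000,  5),
    (2_500,     625_000,  35, 10_000, 10),
    (5_000,   1_245_000,  65, 15_000, 15),
    (10_000,  2_490_000, 130, 30_000, 30),
]

def required_quota(users: int) -> dict:
    """Return the smallest published tier that covers the given user count."""
    for tier_users, hi_tpm, hi_rpm, lo_tpm, lo_rpm in TIERS:
        if users <= tier_users:
            return {"high": {"TPM": hi_tpm, "RPM": hi_rpm},
                    "low": {"TPM": lo_tpm, "RPM": lo_rpm}}
    raise ValueError("Above the largest published tier; contact Glean support.")

# Example: 3,000 users rounds up to the 5,000-user tier.
print(required_quota(3_000))
```

This simply rounds up to the next tier; it does not interpolate, which matches how quota is typically requested in fixed increments.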

Select the model in Glean Workspace

  1. Go to Admin Console > Platform > LLM.
  2. Click Add LLM.
  3. Select Azure OpenAI.
  4. Select:
    • GPT-5 for the agentic engine model
    • GPT-4.1 (recommended) or GPT-4o for the large model
    • GPT-4.1-mini (recommended) or GPT-4o-mini for the small model
  5. Click Validate to ensure Glean can use the model.
  6. Once validated, click Save.

Verify the model used by Glean Chat

  1. Go to Glean Chat and select the public knowledge assistant.
  2. Ask the question: “Who created you?”
You should get a response similar to: I was created by OpenAI.

FAQ

All data is encrypted in transit between your Glean instance and the Azure OpenAI service. Please review the Data, privacy, and security for Azure OpenAI Service guide. We have highlighted some relevant excerpts (as of June 4, 2024) below.

Your prompts (inputs) and completions (outputs), your embeddings, and your training data:
  • are NOT available to other customers.
  • are NOT available to OpenAI.
  • are NOT used to improve OpenAI models.
  • are NOT used to improve any Microsoft or 3rd party products or services.
To reduce the risk of harmful use of the Azure OpenAI Service, the Azure OpenAI Service includes both content filtering and abuse monitoring features. To learn more about content filtering, see Azure OpenAI Service content filtering. To learn more about abuse monitoring, see abuse monitoring.

Content filtering occurs synchronously as the service processes prompts to generate content as described above and here. No prompts or generated results are stored in the content classifier models, and prompts and results are not used to train, retrain, or improve the classifier models.

Azure OpenAI abuse monitoring detects and mitigates instances of recurring content and/or behaviors that suggest use of the service in a manner that may violate the code of conduct or other applicable product terms. To detect and mitigate abuse, Azure OpenAI stores all prompts and generated content securely for up to thirty (30) days.

The data store where prompts and completions are stored is logically separated by customer resource (each request includes the resource ID of the customer’s Azure OpenAI resource). A separate data store is located in each region in which the Azure OpenAI Service is available, and a customer’s prompts and generated content are stored in the Azure region where the customer’s Azure OpenAI service resource is deployed, within the Azure OpenAI service boundary.

Human reviewers assessing potential abuse can access prompts and completions data only when that data has been flagged by the abuse monitoring system. The human reviewers are authorized Microsoft employees who access the data via point-wise queries using request IDs, Secure Access Workstations (SAWs), and Just-In-Time (JIT) request approval granted by team managers. For Azure OpenAI Service deployed in the European Economic Area, the authorized Microsoft employees are located in the European Economic Area.

We strongly recommend that you request an exemption from Azure abuse monitoring so that your prompts and generated content are not stored on Azure servers or subject to human review by Microsoft employees. (Azure abuse monitoring is disabled for all customers on the Glean key.)
The number of tokens we use will vary depending on the type of request (e.g., summarizing a long document will use many tokens). For requests that are retrieving an answer from the Glean search engine, the current token usage is:
  • Large model: 19,000 input tokens + 450 output tokens
  • Small model: 5,300 input tokens + 150 output tokens
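The per-request figures above can be turned into a throughput estimate for quota planning. In the sketch below, the request rate is an assumption for illustration only; the token counts come directly from this article:

```python
# Per-request token usage for search-answer requests (from this article).
LARGE = {"input": 19_000, "output": 450}
SMALL = {"input": 5_300, "output": 150}

def tokens_per_minute(requests_per_minute: int, model: dict) -> int:
    """Total tokens (input + output) consumed per minute at a given request rate."""
    return requests_per_minute * (model["input"] + model["output"])

# Hypothetical load: 10 answer requests per minute against each model.
print(tokens_per_minute(10, LARGE))  # 194500
print(tokens_per_minute(10, SMALL))  # 54500
```

Comparing the result against your TPM quota (see the capacity table earlier in this article) shows whether your current allocation covers the expected load.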

Architecture Diagram

A system architecture diagram illustrating a user query being processed through a series of modules within the Customer Glean Project VPC, including Tool Selection & Query Planning, Glean Planner, Glean Index & Knowledge Graph, Query Execution, Governance Engine & Doc Redlisting, Intelligent Data Selector, and Answer Generation, utilizing Azure OpenAI for GPT model inference.