Enable access to models
Fill out the Azure OpenAI Service form and request access to the following models:

| Model name | How Glean uses the model |
|---|---|
| GPT-5 | Agentic Reasoning model used in fast and thinking modes in chat. This is the primary model for Glean chat. |
| GPT-4.1 (legacy) or GPT-4o (gpt-4o-2024-05-13) (legacy) | Large model used for other, more complex tasks in Glean Assistant |
| GPT-4.1-mini (recommended) or GPT-4o-mini | Small model used for simpler tasks such as follow-up question generation |
Request additional capacity from Azure
Please see Azure OpenAI Service quotas and limits for the default quotas and instructions for requesting additional quota.

Capacity requirements for the latest assistant architecture on Agentic Engine 2 using GPT-5:

| Users | High capacity model TPM | High capacity model RPM | Low capacity model TPM | Low capacity model RPM |
|---|---|---|---|---|
| 500 | 125000 | 10 | 5000 | 5 |
| 1000 | 250000 | 15 | 5000 | 5 |
| 2500 | 625000 | 35 | 10000 | 10 |
| 5000 | 1245000 | 65 | 15000 | 15 |
| 10000 | 2490000 | 130 | 30000 | 30 |
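
As a rough planning aid, the table above can be expressed as a small lookup. The sketch below is illustrative only: the names `Capacity`, `CAPACITY_TIERS`, and `required_capacity` are hypothetical, and rounding a user count up to the next listed tier is an assumption, since the table does not say how to handle values between rows.

```python
from bisect import bisect_left
from typing import NamedTuple


class Capacity(NamedTuple):
    users: int
    high_tpm: int  # high capacity model TPM
    high_rpm: int  # high capacity model RPM
    low_tpm: int   # low capacity model TPM
    low_rpm: int   # low capacity model RPM


# Values copied row by row from the capacity table above.
CAPACITY_TIERS = [
    Capacity(500, 125000, 10, 5000, 5),
    Capacity(1000, 250000, 15, 5000, 5),
    Capacity(2500, 625000, 35, 10000, 10),
    Capacity(5000, 1245000, 65, 15000, 15),
    Capacity(10000, 2490000, 130, 30000, 30),
]


def required_capacity(users: int) -> Capacity:
    """Return the smallest listed tier that covers `users`.

    Assumption: user counts between rows are rounded up to the next row.
    """
    if users <= 0:
        raise ValueError("users must be positive")
    sizes = [tier.users for tier in CAPACITY_TIERS]
    idx = bisect_left(sizes, users)
    if idx == len(CAPACITY_TIERS):
        raise ValueError("user count exceeds the largest tier in the table")
    return CAPACITY_TIERS[idx]


if __name__ == "__main__":
    print(required_capacity(1200))
```

For deployments larger than the last row, the sketch raises an error rather than extrapolating, because the table gives no guidance beyond 10,000 users.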
Select the model in Glean Workspace
- Go to Admin Console > Platform > LLM.
- Click Add LLM.
- Select Azure OpenAI.
- Select:
  - GPT-5 for the agentic engine model
  - GPT-4.1 (recommended) or GPT-4o for the large model
  - GPT-4.1-mini (recommended) or GPT-4o-mini for the small model
- Click Validate to ensure Glean can use the model.
- Once validated, click Save.
Verify the model used by Glean Chat
- Go to Glean Chat and select the public knowledge assistant.
- Ask the question: “Who created you?”
FAQ
How do you ensure data security and handle potentially harmful content?
All data is encrypted in transit between your Glean instance and the Azure OpenAI service.

Please review the Data, privacy, and security for Azure OpenAI Service guide. We have highlighted some relevant excerpts (as of June 4, 2024) below:

Your prompts (inputs) and completions (outputs), your embeddings, and your training data:
- are NOT available to other customers.
- are NOT available to OpenAI.
- are NOT used to improve OpenAI models.
- are NOT used to improve any Microsoft or 3rd party products or services.
How can we estimate LLM costs?
The number of tokens we use will vary depending on the type of request (e.g., summarizing a long document will use many tokens). For requests that are retrieving an answer from the Glean search engine, the current token usage is:
- Large model: 19,000 input tokens + 450 output tokens
- Small model: 5,300 input tokens + 150 output tokens
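
To turn these per-request figures into a rough estimate, multiply them by an expected request volume and by your own per-token prices. The sketch below is a minimal illustration, not a pricing tool: `TokenUsage`, `tokens_for_requests`, and `estimate_cost` are hypothetical names, the prices are parameters you supply from your Azure agreement, and it covers only the large and small model figures listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    input_tokens: int
    output_tokens: int


# Per-request usage figures quoted above for search-backed answers.
LARGE_MODEL = TokenUsage(input_tokens=19_000, output_tokens=450)
SMALL_MODEL = TokenUsage(input_tokens=5_300, output_tokens=150)


def tokens_for_requests(usage: TokenUsage, requests: int) -> TokenUsage:
    """Scale per-request token usage to a number of requests."""
    return TokenUsage(
        input_tokens=usage.input_tokens * requests,
        output_tokens=usage.output_tokens * requests,
    )


def estimate_cost(
    usage: TokenUsage,
    requests: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Estimate cost from caller-supplied prices per 1,000 tokens.

    The prices are assumptions provided by the caller; none are built in.
    """
    total = tokens_for_requests(usage, requests)
    return (
        total.input_tokens / 1000 * input_price_per_1k
        + total.output_tokens / 1000 * output_price_per_1k
    )


if __name__ == "__main__":
    # Token totals for 1,000 search-backed requests on the large model.
    print(tokens_for_requests(LARGE_MODEL, 1000))
```

Because other request types, such as summarizing a long document, use more tokens, treat the result as a lower-bound style estimate for search-backed answers only.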