Setting up Glean to use Anthropic Claude models on Amazon Bedrock
This article provides instructions for configuring Glean to use Anthropic Claude models on Amazon Bedrock, allowing direct billing of LLM usage through your Amazon Bedrock account.
Do not use this document if you are using the Glean Key option; with Glean Key, Glean transparently manages the configuration and provisioning of LLM resources.
Enable access to foundation models in Bedrock
- Log into the AWS Console with a user account that has permissions to subscribe to Bedrock models.
- Navigate to Amazon Bedrock → Model access.
- Choose the same region as your Glean AWS instance (or the nearest supported one).
- Request access to the following models:
| Model name | How Glean uses the model |
|---|---|
| Claude Sonnet 4.5 (preferred model) `claude-sonnet-4-5-20250929` | Agentic reasoning model used for the assistant and autonomous agents. This is the primary model for Glean Chat. |
| Claude 3.7 Sonnet | Large model used for other, more complex tasks in Glean Assistant. |
| Claude 3.5 Haiku | Small model used for simpler tasks such as follow-up question generation. |
If prompted for a use case for the models, you can state: "Generate answers to questions about internal company documents."
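Once access is granted, you can optionally confirm from your own environment that the model is invocable before configuring Glean. The sketch below uses the Bedrock Converse API via `boto3`; it assumes AWS credentials are configured locally, and the exact model ID and region are assumptions — check the Bedrock console for the IDs active in your account.

```python
# Sketch: verify Bedrock model access with boto3 (independent of Glean).
# Assumptions: AWS credentials are configured, and the model ID / region
# below match what is enabled in your Bedrock console.

MODEL_ID = "anthropic.claude-sonnet-4-5-20250929-v1:0"  # assumed ID format


def build_converse_request(model_id: str, prompt: str) -> dict:
    """Build a request body for the Bedrock Converse API."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 64},
    }


if __name__ == "__main__":
    import boto3

    # Region is an assumption; use the region chosen above.
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    request = build_converse_request(MODEL_ID, "Who created you?")
    response = client.converse(**request)
    print(response["output"]["message"]["content"][0]["text"])
```

If the call returns a response mentioning Anthropic, model access is working; an `AccessDeniedException` usually means the model-access request above has not been approved for that region.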
Ensure you have enough quota from Bedrock
For default pay-as-you-go quotas on these models, refer to the Amazon Bedrock quotas documentation. If you need more quota, contact your AWS account manager; Bedrock does not currently offer a self-service method for increasing quotas.
Capacity requirements
On average, Glean Assistant consumes the following per query with Claude Sonnet 4.5:
- Full input: 64.4k tokens
- Cached input: 10.3k tokens
- Output: 1.2k tokens
This usage is equivalent to about $0.08 per query based on current Claude API pricing. You can estimate your weekly Glean Assistant LLM costs by multiplying your weekly query volume by this amount. Actual token usage can vary based on query complexity and document size.
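The weekly estimate described above is a single multiplication; the sketch below just makes it explicit, using the ~$0.08-per-query average quoted here. Actual costs vary with query complexity and document size.

```python
# Quick estimate of weekly Glean Assistant LLM spend, based on the
# ~$0.08-per-query average derived from the token figures above.

AVG_COST_PER_QUERY_USD = 0.08  # average from the article; actuals vary


def estimate_weekly_cost(weekly_queries: int,
                         cost_per_query: float = AVG_COST_PER_QUERY_USD) -> float:
    """Estimated weekly LLM cost in USD for a given query volume."""
    return weekly_queries * cost_per_query
```

For example, a deployment serving 25,000 Assistant queries per week would budget roughly $2,000 per week at this average rate.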
To estimate throughput requirements in tokens per minute (TPM), identify your deployment's queries-per-minute (QPM) rate at a desired percentile (e.g., p90) and multiply it by the average tokens per query. The table below shows example TPM conversions assuming a rate of 0.004 QPM per Daily Active User (DAU).
TPM per Glean DAU
| Users | TPM |
|---|---|
| 500 | 125,000 |
| 1000 | 245,000 |
| 2500 | 615,000 |
| 5000 | 1,225,000 |
| 10000 | 2,450,000 |
| 20000 | 4,895,000 |
It is highly recommended to use your deployment's actual QPM for estimating capacity, as QPM per DAU can vary significantly across customers.
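The conversion behind the table can be sketched as follows. The QPM-per-DAU rate and the average tokens per query are assumptions taken from this article (the table's smaller rows appear rounded); substitute your deployment's measured p90 QPM where available.

```python
# TPM estimate: DAUs -> QPM -> tokens per minute.
# Both defaults are assumptions from the article; prefer measured values.

QPM_PER_DAU = 0.004        # example rate used by the table above
TOKENS_PER_QUERY = 61_250  # assumed average; approximately matches the table


def estimate_tpm(daily_active_users: int,
                 qpm_per_dau: float = QPM_PER_DAU,
                 tokens_per_query: int = TOKENS_PER_QUERY) -> float:
    """Estimated tokens-per-minute throughput for a given DAU count."""
    qpm = daily_active_users * qpm_per_dau
    return qpm * tokens_per_query
```

For example, `estimate_tpm(5000)` yields 1,225,000 TPM, matching the 5,000-user row of the table; compare the result for your DAU count against your Bedrock quota to see whether you need a quota increase.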
Select the models in Glean
- Navigate to Admin Console → Platform → LLM.
- Click on Add LLM.
- Choose Bedrock.
- Select the models:
- Claude Sonnet 4.5 for the agentic reasoning model.
- Claude 3.7 Sonnet for the large model.
- Claude 3.5 Haiku for the small model.
- Click Validate to confirm that Glean can use the models.
- After validation, click Save.
- To use Claude Sonnet 4.5 with Glean Assistant, the agentic engine features must be enabled. Until then, the assistant will use the large and small models you have configured.
- Glean will automatically apply an IAM policy to grant its servers access to Bedrock, so no extra authentication is needed.
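For reference only, an IAM policy granting Bedrock model invocation typically has roughly the shape sketched below; Glean applies its own policy automatically, so you should not need to create one yourself, and the exact policy Glean uses may differ.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.*"
    }
  ]
}
```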
Verify model used in Glean
- Go to Glean Chat and choose the Public Knowledge Assistant.
- Ask the question: "Who created you?"
- You should receive a response like: "I was created by the artificial intelligence company Anthropic."
FAQ
Architecture diagram
The diagram below illustrates how a user query is processed. It flows through modules for tool selection, query planning, and execution, interacts with the Glean Index and Knowledge Graph, and uses Amazon Bedrock for Claude model inference to generate an answer.
