This article provides instructions for configuring Glean to use Anthropic Claude models on Amazon Bedrock, allowing direct billing of LLM usage through your Amazon Bedrock account.
Do not use this document if you are using the Glean Key option; with Glean Key, Glean transparently manages the configuration and provisioning of LLM resources.

Enable access to foundation models in Bedrock

  1. Log into the AWS Console with a user account that has permissions to subscribe to Bedrock models.
  2. Navigate to Amazon Bedrock → Model access.
  3. Choose the same region as your Glean AWS instance (or the nearest supported one).
  4. Request access to the following models:
    • Claude Sonnet 4.5 (claude-sonnet-4-5-20250929), preferred model: agentic reasoning model used for the assistant and autonomous agents. This is the primary model for Glean Chat.
    • Claude 3.7 Sonnet: large model used for other, more complex tasks in Glean Assistant.
    • Claude 3.5 Haiku: small model used for simpler tasks such as follow-up question generation.
If prompted for a use case for the models, you can state: “Generate answers to questions about internal company documents.”
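Before moving on, you may want to confirm that access was actually granted in your region. The sketch below, assuming boto3 credentials are configured, filters a `list_foundation_models` response for the models above; the `REQUIRED` name fragments and the sample model IDs are illustrative, since Bedrock model IDs carry provider prefixes and version suffixes (e.g., `-v1:0`) that you should verify in your console.

```python
# Sketch: check which of the required Anthropic models are visible in
# your region. The live boto3 call is commented out; the helper works
# on any list_foundation_models response dict.
REQUIRED = ["claude-sonnet-4-5", "claude-3-7-sonnet", "claude-3-5-haiku"]

def missing_models(response: dict) -> list[str]:
    """Return the required model name fragments that do not appear in a
    list_foundation_models response."""
    available = [m["modelId"] for m in response.get("modelSummaries", [])]
    return [r for r in REQUIRED
            if not any(r in model_id for model_id in available)]

# Live check (requires AWS credentials in the same region as Glean):
# import boto3
# bedrock = boto3.client("bedrock", region_name="us-east-1")
# print(missing_models(bedrock.list_foundation_models(byProvider="Anthropic")))

# Offline demonstration with a stubbed response (illustrative IDs):
sample = {"modelSummaries": [
    {"modelId": "anthropic.claude-3-7-sonnet-20250219-v1:0"},
    {"modelId": "anthropic.claude-3-5-haiku-20241022-v1:0"},
]}
print(missing_models(sample))  # Sonnet 4.5 not yet granted in this sample
```

If the list is non-empty, revisit the Model access page for the missing models before configuring Glean.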

Ensure you have enough quota from Bedrock

For default pay-as-you-go quotas on these models, refer to the Amazon Bedrock quotas documentation. If you need more quota, contact your AWS account manager; Bedrock does not currently offer a self-service method for increasing quota.

Capacity requirements

On average, Glean Assistant consumes the following per query with Claude Sonnet 4.5:
  • Full input: 64.4k tokens
  • Cached input: 10.3k tokens
  • Output: 1.2k tokens
This usage is equivalent to about $0.08 per query at current Claude API pricing, so you can estimate your weekly Glean Assistant LLM costs by multiplying your weekly query volume by this amount. Actual token usage varies with query complexity and document size.

To estimate throughput requirements in tokens per minute (TPM), identify your deployment's queries-per-minute (QPM) rate at a desired percentile (e.g., p90) and multiply it by the average tokens per query. The table below shows example TPM conversions assuming a rate of 0.004 QPM per Daily Active User (DAU).

TPM per Glean DAU
  Users     TPM
  500       125,000
  1,000     245,000
  2,500     615,000
  5,000     1,225,000
  10,000    2,450,000
  20,000    4,895,000
It is highly recommended to use your deployment’s actual QPM for estimating capacity, as QPM per DAU can vary significantly across customers.
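The estimate above can be sketched as a couple of lines of arithmetic. The per-query token figure below (~61,250) is the effective average implied by the table's rows; the raw averages quoted earlier (64.4k full input + 1.2k output) give a somewhat higher number, so treat both as rough planning inputs rather than exact values.

```python
# Sketch of the capacity estimate described above:
#   TPM = QPM x average tokens per query, with QPM = DAU x 0.004.
QPM_PER_DAU = 0.004        # example rate from the table above
TOKENS_PER_QUERY = 61_250  # effective average implied by the table
COST_PER_QUERY = 0.08      # USD, per the article's estimate

def estimate_tpm(daily_active_users: int) -> float:
    """Tokens per minute needed for a given number of DAUs."""
    return daily_active_users * QPM_PER_DAU * TOKENS_PER_QUERY

def estimate_weekly_cost(weekly_queries: int) -> float:
    """Approximate weekly LLM spend in USD."""
    return weekly_queries * COST_PER_QUERY

print(estimate_tpm(1000))          # matches the 1,000-user table row
print(estimate_weekly_cost(5000))  # weekly cost at 5k queries/week
```

Substituting your deployment's measured QPM for the `QPM_PER_DAU` assumption gives a much more reliable number, per the recommendation above.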

Select the models in Glean

  1. Navigate to Admin Console → Platform → LLM.
  2. Click on Add LLM.
  3. Choose Bedrock.
  4. Select the models:
    • Claude Sonnet 4.5 for the agentic reasoning model.
    • Claude 3.7 Sonnet for the large model.
    • Claude 3.5 Haiku for the small model.
  5. Click Validate to confirm that Glean can use the models.
  6. After validation, click Save.
  • To use Claude Sonnet 4.5 with Glean Assistant, the agentic engine features must be enabled. If they are not yet enabled, the assistant uses the large and small models you have configured.
  • Glean will automatically apply an IAM policy to grant its servers access to Bedrock, so no extra authentication is needed.

Verify model used in Glean

  1. Go to Glean Chat and choose the Public Knowledge Assistant.
  2. Ask the question: “Who created you?”
  3. You should receive a response like: “I was created by the artificial intelligence company Anthropic.”
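The same smoke test can be run directly against Bedrock, which is useful for isolating access problems from Glean configuration problems. The request body below follows the Anthropic Messages API format that Bedrock's InvokeModel expects; the model ID suffix (`-v1:0`) is illustrative, so verify the exact ID in your console.

```python
import json

# Build the InvokeModel request body for a quick "Who created you?" check.
MODEL_ID = "anthropic.claude-3-5-haiku-20241022-v1:0"  # verify in your console

def smoke_test_body(question: str) -> str:
    """Serialize an Anthropic Messages API request for Bedrock."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": question}],
    })

body = smoke_test_body("Who created you?")
print(json.loads(body)["messages"][0]["content"])  # Who created you?

# Live call (requires AWS credentials and granted model access):
# import boto3
# rt = boto3.client("bedrock-runtime")
# resp = rt.invoke_model(modelId=MODEL_ID, body=body)
# print(json.loads(resp["body"].read())["content"][0]["text"])
```

A response mentioning Anthropic confirms that the Claude model, rather than a different provider, is serving the request.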

FAQ

Is my data secure, and is it used for model training?
All data is encrypted in transit between your Glean instance and the Amazon Bedrock service, which operate in the same AWS region. Amazon Bedrock does not use customer prompts and completions to train AWS models or share them with third parties. Model providers do not have access to Amazon Bedrock logs or customer data.

How does Amazon Bedrock handle abuse detection?
Please refer to the Amazon Bedrock abuse detection guide.

How many tokens does a typical request use?
Token usage varies by request type. For answers retrieved from the Glean search engine, the current token usage is:
  • Claude 3.5 Sonnet v2 or Claude 3.7 Sonnet: 19,000 input tokens + 450 output tokens.
  • Claude 3.5 Haiku: 5,300 input tokens + 150 output tokens.

What if I use Provisioned Throughput?
If you opt for Provisioned Throughput, you pay a fixed monthly cost instead of a per-query cost.
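For pay-as-you-go planning, the token counts above translate directly into a per-answer cost. The rates below are assumed list prices in USD per million tokens, not figures from this article; check current Amazon Bedrock pricing for your region before relying on them.

```python
# Rough per-answer cost using the FAQ token counts. The per-million-token
# rates are ASSUMED placeholders; substitute current Bedrock pricing.
PRICES = {
    "claude-3-7-sonnet": {"input": 3.00, "output": 15.00},  # assumed rates
    "claude-3-5-haiku": {"input": 0.80, "output": 4.00},    # assumed rates
}

def answer_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one answer at the assumed per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1e6

print(answer_cost("claude-3-7-sonnet", 19_000, 450))  # Sonnet answer, ~$0.06
print(answer_cost("claude-3-5-haiku", 5_300, 150))    # Haiku answer, well under a cent
```

Comparing this per-answer figure, multiplied by your expected monthly volume, against the fixed monthly cost of Provisioned Throughput shows which billing model is cheaper for your deployment.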

Architecture diagram

The diagram below illustrates how a user query is processed: it flows through the tool selection and query planning module (Glean Planner), query execution, the governance engine, and the intelligent data selector; interacts with the Glean Index and Knowledge Graph; and uses Amazon Bedrock for Claude model inference in the answer generation module.