This article provides instructions for configuring Glean to use Anthropic Claude models on Amazon Bedrock, allowing direct billing of LLM usage through your Amazon Bedrock account.
Do not use this document if you are using the Glean Key option; with Glean Key, Glean transparently manages the configuration and provisioning of LLM resources.

Enable access to foundation models in Bedrock

  1. Log into the AWS Console with a user account that has permissions to subscribe to Bedrock models.
  2. Navigate to Amazon Bedrock → Model access.
  3. Choose the same region as your Glean AWS instance (or the nearest supported one).
  4. Request access to the following models:
    • Claude Sonnet 4.5 (claude-sonnet-4-5-20250929), preferred model: agentic reasoning model used for the assistant and autonomous agents. This is the primary model for Glean Chat.
    • Claude 3.7 Sonnet: large model used for other, more complex tasks in Glean Assistant.
    • Claude 3.5 Haiku: small model used for simpler tasks such as follow-up question generation.
If prompted for a use case for the models, you can state: “Generate answers to questions about internal company documents.”
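Before moving on, you may want to confirm that access was actually granted in your region. The sketch below, assuming boto3 credentials are configured, filters a `list_foundation_models` response for the models above; the `REQUIRED` name fragments and the sample model IDs are illustrative, since Bedrock model IDs carry provider prefixes and version suffixes (e.g., `-v1:0`) that you should verify in your console.

```python
# Sketch: check which of the required Anthropic models are visible in
# your region. The live boto3 call is commented out; the helper works
# on any list_foundation_models response dict.
REQUIRED = ["claude-sonnet-4-5", "claude-3-7-sonnet", "claude-3-5-haiku"]

def missing_models(response: dict) -> list[str]:
    """Return the required model name fragments that do not appear in a
    list_foundation_models response."""
    available = [m["modelId"] for m in response.get("modelSummaries", [])]
    return [r for r in REQUIRED
            if not any(r in model_id for model_id in available)]

# Live check (requires AWS credentials in the same region as Glean):
# import boto3
# bedrock = boto3.client("bedrock", region_name="us-east-1")
# print(missing_models(bedrock.list_foundation_models(byProvider="Anthropic")))

# Offline demonstration with a stubbed response (illustrative IDs):
sample = {"modelSummaries": [
    {"modelId": "anthropic.claude-3-7-sonnet-20250219-v1:0"},
    {"modelId": "anthropic.claude-3-5-haiku-20241022-v1:0"},
]}
print(missing_models(sample))  # Sonnet 4.5 not yet granted in this sample
```

If the list is non-empty, revisit the Model access page for the missing models before configuring Glean.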

Ensure you have enough quota from Bedrock

For default pay-as-you-go quotas on these models, refer to the Amazon Bedrock quotas documentation. If you need more quota, contact your AWS account manager; Bedrock does not currently offer a self-service method for increasing quota.

Capacity requirements

On average, Glean Assistant consumes the following per query with Claude Sonnet 4.5:
  • Full input: 64.4k tokens
  • Cached input: 10.3k tokens
  • Output: 1.2k tokens
This usage is equivalent to about $0.08 per query at current Claude API pricing, so you can estimate your weekly Glean Assistant LLM costs by multiplying your weekly query volume by this amount. Actual token usage varies with query complexity and document size.

To estimate throughput requirements in tokens per minute (TPM), identify your deployment's queries-per-minute (QPM) rate at a desired percentile (e.g., p90) and multiply it by the average tokens per query. The table below shows example TPM conversions assuming a rate of 0.004 QPM per Daily Active User (DAU).

TPM per Glean DAU
  Users     TPM
  500       125,000
  1,000     245,000
  2,500     615,000
  5,000     1,225,000
  10,000    2,450,000
  20,000    4,895,000
It is highly recommended to use your deployment’s actual QPM for estimating capacity, as QPM per DAU can vary significantly across customers.
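The estimate above can be sketched as a couple of lines of arithmetic. The per-query token figure below (~61,250) is the effective average implied by the table's rows; the raw averages quoted earlier (64.4k full input + 1.2k output) give a somewhat higher number, so treat both as rough planning inputs rather than exact values.

```python
# Sketch of the capacity estimate described above:
#   TPM = QPM x average tokens per query, with QPM = DAU x 0.004.
QPM_PER_DAU = 0.004        # example rate from the table above
TOKENS_PER_QUERY = 61_250  # effective average implied by the table
COST_PER_QUERY = 0.08      # USD, per the article's estimate

def estimate_tpm(daily_active_users: int) -> float:
    """Tokens per minute needed for a given number of DAUs."""
    return daily_active_users * QPM_PER_DAU * TOKENS_PER_QUERY

def estimate_weekly_cost(weekly_queries: int) -> float:
    """Approximate weekly LLM spend in USD."""
    return weekly_queries * COST_PER_QUERY

print(estimate_tpm(1000))          # matches the 1,000-user table row
print(estimate_weekly_cost(5000))  # weekly cost at 5k queries/week
```

Substituting your deployment's measured QPM for the `QPM_PER_DAU` assumption gives a much more reliable number, per the recommendation above.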

Select the models in Glean

  1. Navigate to Admin Console → Platform → LLM.
  2. Click on Add LLM.
  3. Choose Bedrock.
  4. Select the models:
    • Claude Sonnet 4.5 for the agentic reasoning model.
    • Claude 3.7 Sonnet for the large model.
    • Claude 3.5 Haiku for the small model.
  5. Click Validate to confirm that Glean can use the models.
  6. After validation, click Save.
  • To use Claude Sonnet 4.5 with Glean Assistant, the agentic engine features must be enabled. If they are not yet enabled, the assistant uses the large and small models you have configured.
  • Glean will automatically apply an IAM policy to grant its servers access to Bedrock, so no extra authentication is needed.

Verify model used in Glean

  1. Go to Glean Chat and choose the Public Knowledge Assistant.
  2. Ask the question: “Who created you?”
  3. You should receive a response like: “I was created by the artificial intelligence company Anthropic.”
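The same smoke test can be run directly against Bedrock, which is useful for isolating access problems from Glean configuration problems. The request body below follows the Anthropic Messages API format that Bedrock's InvokeModel expects; the model ID suffix (`-v1:0`) is illustrative, so verify the exact ID in your console.

```python
import json

# Build the InvokeModel request body for a quick "Who created you?" check.
MODEL_ID = "anthropic.claude-3-5-haiku-20241022-v1:0"  # verify in your console

def smoke_test_body(question: str) -> str:
    """Serialize an Anthropic Messages API request for Bedrock."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": question}],
    })

body = smoke_test_body("Who created you?")
print(json.loads(body)["messages"][0]["content"])  # Who created you?

# Live call (requires AWS credentials and granted model access):
# import boto3
# rt = boto3.client("bedrock-runtime")
# resp = rt.invoke_model(modelId=MODEL_ID, body=body)
# print(json.loads(resp["body"].read())["content"][0]["text"])
```

A response mentioning Anthropic confirms that the Claude model, rather than a different provider, is serving the request.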

FAQ

Is my data secure, and is it used for model training?
All data is encrypted in transit between your Glean instance and the Amazon Bedrock service, which operate in the same AWS region. Amazon Bedrock does not use customer prompts and completions to train AWS models or share them with third parties. Model providers do not have access to Amazon Bedrock logs or customer data.

How does Amazon Bedrock handle abuse detection?
Please refer to the Amazon Bedrock abuse detection guide.

How many tokens does a typical request use?
Token usage varies by request type. For answers retrieved from the Glean search engine, the current token usage is:
  • Claude 3.5 Sonnet v2 or Claude 3.7 Sonnet: 19,000 input tokens + 450 output tokens.
  • Claude 3.5 Haiku: 5,300 input tokens + 150 output tokens.

What if I use Provisioned Throughput?
If you opt for Provisioned Throughput, you pay a fixed monthly cost instead of a per-query cost.
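For pay-as-you-go planning, the token counts above translate directly into a per-answer cost. The rates below are assumed list prices in USD per million tokens, not figures from this article; check current Amazon Bedrock pricing for your region before relying on them.

```python
# Rough per-answer cost using the FAQ token counts. The per-million-token
# rates are ASSUMED placeholders; substitute current Bedrock pricing.
PRICES = {
    "claude-3-7-sonnet": {"input": 3.00, "output": 15.00},  # assumed rates
    "claude-3-5-haiku": {"input": 0.80, "output": 4.00},    # assumed rates
}

def answer_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one answer at the assumed per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1e6

print(answer_cost("claude-3-7-sonnet", 19_000, 450))  # Sonnet answer, ~$0.06
print(answer_cost("claude-3-5-haiku", 5_300, 150))    # Haiku answer, well under a cent
```

Comparing this per-answer figure, multiplied by your expected monthly volume, against the fixed monthly cost of Provisioned Throughput shows which billing model is cheaper for your deployment.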

Architecture diagram

The diagram below illustrates how a user query is processed: it flows through the tool selection and query planning module (Glean Planner), query execution, the governance engine, and the intelligent data selector; interacts with the Glean Index and Knowledge Graph; and uses Amazon Bedrock for Claude model inference in the answer generation module.