Setting up Glean to use Anthropic Claude models on Google Vertex AI

This article explains how to configure Glean to use Anthropic Claude models on Google Vertex AI, so that LLM usage is billed directly to your Google Vertex AI account via the customer key option.

warning

Do not use this document if you are leveraging the Glean Key option. For the Glean Key option, Glean manages the configuration and provisioning of LLM resources transparently.

Enable access to models in Vertex AI

Go to the Vertex AI Model Garden and make sure you have enabled access to the following foundation models from the GCP project that Glean is running in:

Model name | How Glean uses the model
Claude Sonnet 4.6 (claude-sonnet-4-6-20260217) | Agentic reasoning model used for the assistant and autonomous agents
Claude Sonnet 4.6 (claude-sonnet-4-6-20260217) | Agentic model used for other, more complex tasks in Glean
Claude Sonnet 4.6 (claude-sonnet-4-6-20260217) | Fast agentic model used for simpler tasks such as follow-up question generation

Request additional quota from Vertex AI

You will need to file a standard GCP quota request, which is expressed in Requests Per Minute (RPM) and Tokens Per Minute (TPM). Filter on base_model: for the model names listed in the table above, and on region: for the region that your GCP project is running in.

Please note that the quota is not a guarantee of capacity, but is intended by Google to ensure fair use of the shared capacity, and your requests may not be served during peak periods. To obtain guaranteed capacity, please speak with your Google account team about purchasing Provisioned Throughput.

[Screenshot: Google Cloud console Vertex AI quota settings, filtered by service, base_model (anthropic-claude-sonnet-4-5), and region (us-east5), showing quotas for online prediction input tokens, output tokens, and requests per minute.]

Capacity Requirements

Glean token consumption varies by customer depending on query complexity and document size. To estimate your weekly LLM costs, multiply your expected weekly query volume by the per-query cost based on current Claude API pricing.

To estimate throughput requirements (TPM), identify your deployment's query-per-minute (QPM) rate at the desired percentile (typically p90), then multiply by the average tokens per query.

The table below illustrates example TPM conversions assuming 0.004 QPM per daily active user (DAU), based on historical customer data.

Users | TPM
500 | 125,000
1,000 | 245,000
2,500 | 615,000
5,000 | 1,225,000
10,000 | 2,450,000
20,000 | 4,895,000
note

Glean strongly recommends estimating capacity from your deployment's actual QPM, as QPM per DAU varies widely across customers.
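As a rough sketch, the TPM arithmetic above can be expressed in a few lines of Python. The 0.004 QPM-per-DAU default comes from this article; the ~61,250 average-tokens-per-query figure is an assumption back-derived from the example table (1,000 users → 4 QPM → 245,000 TPM) and should be replaced with your deployment's measured values.

```python
def tokens_per_minute(dau: int,
                      qpm_per_dau: float = 0.004,
                      avg_tokens_per_query: int = 61_250) -> int:
    """Estimate required TPM from daily active users.

    qpm_per_dau and avg_tokens_per_query are illustrative defaults;
    substitute your deployment's measured p90 QPM and average tokens
    per query for real capacity planning.
    """
    qpm = dau * qpm_per_dau          # queries per minute at the assumed rate
    return round(qpm * avg_tokens_per_query)
```

For example, `tokens_per_minute(1000)` reproduces the 245,000 TPM row from the table above (other rows in the table are rounded and may differ slightly).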

Select the model in Glean Workspace

  1. Go to Admin Console → Platform → LLMs.
  2. Click on Add LLM.
  3. Select Vertex AI.
  4. Select Claude Sonnet 4.6 for the agentic model.
  5. Click Validate to confirm that Glean can access the model.
  6. Once validated, click Save.

[Screenshot: the Glean admin interface for selecting LLM models, with Vertex AI chosen as the hosting provider and Claude Sonnet 4.6 selected for the agentic, fast agentic, and agentic reasoning models.]

note
  • To use Claude Sonnet 4.6 with Glean, agentic engine features must be turned on. See details here. Until these features are enabled, Glean will continue to use the agentic and fast agentic models you previously configured; you do not need to change them at this time.
  • We will use Application Default Credentials to call the models, so no additional authentication is required.
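To illustrate what "no additional authentication" means in practice, the sketch below (not Glean's internal code) shows how a Claude publisher model is addressed on Vertex AI: with Application Default Credentials, the bearer token is obtained from the runtime environment rather than from a configured key. The endpoint pattern and anthropic_version value follow Google's published Vertex AI documentation for Anthropic models; the project and region used in the test are placeholders.

```python
def vertex_claude_endpoint(project: str, region: str, model: str) -> str:
    # rawPredict is the non-streaming publisher-model endpoint on Vertex AI;
    # streamRawPredict is its streaming counterpart.
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{region}/"
        f"publishers/anthropic/models/{model}:rawPredict"
    )

def claude_request_body(prompt: str, max_tokens: int = 1024) -> dict:
    # anthropic_version is required in the request body when calling
    # Anthropic models on Vertex AI.
    return {
        "anthropic_version": "vertex-2023-10-16",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
```

A caller would POST the body to the endpoint with an `Authorization: Bearer` header whose token comes from Application Default Credentials (for example, via `google.auth.default()`).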

FAQ

Architecture Diagram

[Diagram: a user's question goes to the Glean Planner, which performs query planning, tool selection, and execution against the Glean Index, Governance Engine, and Knowledge Graph, then generates an answer via Google Vertex AI models and returns it to the user.]