
Agentic Engine V2 Token Consumption

Upgrading to Agentic Engine V2 changes both the underlying model (from GPT-4.1 to GPT-5) and token consumption patterns. This document presents testing data that estimates the pricing implications of this upgrade.

Among our beta customers, token usage fell after upgrading to Agentic Engine V2, translating to a $0.05 (-39%) reduction in cost per query. The savings were driven by GPT-5's lower per-token prices as well as the use of cached input tokens.

Methodology and Considerations

We estimate Agentic Engine V2's impact on token consumption by comparing per-query token usage for our beta customers before and after upgrading from Agentic Engine v1.

Comparisons of token impact are made at the median and the average. When interpreting these estimates, note that token consumption depends on the complexity of user queries, on the company's corpus of documents, and on the split between fast and thinking mode. In testing, we saw GleanChat usage heavily skewed toward thinking mode.

Assumptions:

  • Cost is determined strictly by token count.
  • Cached prompts are cheaper than full prompts.
  • Baseline estimates are given by Agentic Engine v1 using GPT-4.1.
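
The assumptions above can be sketched as a minimal cost model in which cost is strictly a function of token counts. The per-million-token rates below are placeholders for illustration, not actual GPT-4.1 or GPT-5 pricing.

```python
# Minimal per-query cost model matching the assumptions above.
# Rates are hypothetical placeholders, not real pricing.
RATE_INPUT = 2.00    # $ per 1M full input tokens (placeholder)
RATE_CACHED = 0.20   # $ per 1M cached input tokens (placeholder; 90% discount on RATE_INPUT)
RATE_OUTPUT = 8.00   # $ per 1M output tokens (placeholder)

def query_cost(input_tok: int, cached_tok: int, output_tok: int) -> float:
    """Per-query cost under a strictly token-count-based model."""
    per_m = 1_000_000
    return (input_tok * RATE_INPUT
            + cached_tok * RATE_CACHED
            + output_tok * RATE_OUTPUT) / per_m

print(query_cost(1_000_000, 0, 0))       # cost of 1M full input tokens
print(query_cost(0, 1_000_000, 0))       # cost of 1M cached input tokens
```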

Input and Cached Token Usage

The table below shows per-query median and average input, cached input, and net input token usage on Agentic Engine V2's fast and thinking modes, compared to Agentic Engine v1. Net input token counts are calculated by subtracting 0.9 × cached input tokens from the sum of full input and cached tokens (net = input + cached − 0.9 × cached), reflecting the effective cost basis under a 90% cached-token discount.

| Metric  | Fast  | Thinking | Agentic Engine v1 |
|---------|-------|----------|-------------------|
| Median  | 10.3k | 34.9k    | 16.7k             |
| Average | 18.9k | 70.7k    | 58.6k             |
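
The net-input calculation described above can be sketched as follows; the token counts in the usage example are illustrative, not measured values.

```python
# Sketch of the net-input calculation: net = input + cached - 0.9 * cached,
# i.e. cached input tokens count at only 10% of their raw count.
CACHED_DISCOUNT = 0.9  # cached input tokens are billed at a 90% discount

def net_input_tokens(full_input: float, cached_input: float) -> float:
    """Effective (cost-basis) input tokens under the cached-token discount."""
    return full_input + cached_input - CACHED_DISCOUNT * cached_input

# Illustrative: 10k full input tokens plus 5k cached input tokens.
print(net_input_tokens(10_000, 5_000))  # → 10500.0
```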

Output Token Usage

The table below shows per-query median and average output token usage on Agentic Engine V2's fast and thinking modes, compared to Agentic Engine v1.

| Metric  | Fast | Thinking | Agentic Engine v1 |
|---------|------|----------|-------------------|
| Median  | 0.5k | 3.6k     | 1.1k              |
| Average | 0.7k | 2.9k     | 0.9k              |

Per-Query Cost Implications

This table shows per-query deltas between Agentic Engine V2's fast and thinking modes and Agentic Engine v1. To compute a single per-query cost estimate across fast and thinking modes, we take a weighted average, assuming 95% of query traffic goes to thinking mode. On net, we see a $0.05 per-query cost reduction after upgrading to Agentic Engine V2, benefiting from cached input tokens and GPT-5's overall lower per-token costs relative to GPT-4.1.

| Metric  | Input token delta | Output token delta | Total cost delta |
|---------|-------------------|--------------------|------------------|
| Median  | +7.1k             | +1.9k              | +$0.01           |
| Average | -13.1k            | +1.8k              | -$0.05           |
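
The weighted average described above can be sketched as follows. The 95% thinking-mode traffic share comes from this document; the per-mode cost deltas in the usage example are hypothetical placeholders, not the measured values.

```python
# Blended per-query cost delta across fast and thinking modes,
# weighted by the assumed traffic split (95% thinking mode).
THINKING_SHARE = 0.95

def blended_delta(fast_delta: float, thinking_delta: float,
                  thinking_share: float = THINKING_SHARE) -> float:
    """Traffic-weighted average of per-query cost deltas."""
    return (1 - thinking_share) * fast_delta + thinking_share * thinking_delta

# Hypothetical per-mode deltas of -$0.01 (fast) and -$0.052 (thinking):
print(blended_delta(-0.01, -0.052))
```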