
Adaptive Reasoning

We’re introducing a new default mode for Glean: Adaptive Reasoning. Adaptive mode automatically adjusts how much reasoning Glean applies to each question, so users get the right balance of speed and intelligence. We recommend adaptive mode for all users. You can switch reasoning modes at any time in Glean.


Waldo, Glean's agentic search model

Adaptive reasoning is powered by Waldo, Glean's agentic search model. Waldo is a retrieval-optimized model that runs automatically before the frontier model on eligible queries. It quickly gathers the most relevant information from your organization and passes that evidence to the frontier model, making the final answer faster, better grounded, and more efficient to produce.

In practice, Glean finds the right context first, then answers with that context second.

How adaptive reasoning works

  1. A user asks a question in Glean.
  2. Waldo determines whether the question would benefit from a retrieval plan. If so, it issues targeted searches in parallel across your organization's content using a controlled set of tools.
  3. The pre-collected evidence is passed to the frontier model (GPT, Claude, or Gemini), which reasons over the results of those initial searches, carries out any further reasoning it needs, and produces a grounded, cited answer.
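The three steps above can be sketched as a small asyncio pipeline. This is a minimal, hypothetical illustration, not Glean's implementation: `plan_searches`, `search`, and `frontier_answer` are placeholder stubs, and the rule for deciding whether a question needs a retrieval plan (skip small talk, split compound questions on " and ") is an assumption made only for this example. What it shows is the shape of the flow: plan first, run the searches concurrently, then hand the pooled evidence to a single final-answer call.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str
    snippet: str


# Hypothetical heuristic: questions that are pure small talk get no retrieval plan.
SMALL_TALK = {"hi", "hello", "thanks", "thank you"}


def plan_searches(question: str) -> list[str]:
    """Step 2 (planning): return sub-queries, or [] if retrieval would not help."""
    text = question.strip().rstrip("?!.").strip()
    if text.lower() in SMALL_TALK:
        return []
    # Assumed decomposition rule: one search per " and "-separated clause.
    return [part.strip() for part in text.split(" and ") if part.strip()]


async def search(query: str) -> list[Evidence]:
    """Placeholder for one search over company content."""
    await asyncio.sleep(0)  # stands in for network I/O
    return [Evidence(source=f"doc-for:{query}", snippet=f"result for {query}")]


async def frontier_answer(question: str, evidence: list[Evidence]) -> str:
    """Placeholder for the frontier model call that writes the cited answer."""
    citations = ", ".join(e.source for e in evidence) or "none"
    return f"Answer to {question!r} (citations: {citations})"


async def adaptive_answer(question: str) -> str:
    plan = plan_searches(question)
    # Step 2 (retrieval): run every planned search concurrently.
    batches = await asyncio.gather(*(search(q) for q in plan)) if plan else []
    evidence = [item for batch in batches for item in batch]
    # Step 3: the frontier model reasons over the pre-collected evidence.
    return await frontier_answer(question, evidence)


if __name__ == "__main__":
    print(asyncio.run(adaptive_answer("Q3 revenue and headcount plan?")))
```

Run on "Q3 revenue and headcount plan?", the sketch issues two searches in parallel and the answer cites both; on "Thanks!" it skips retrieval entirely and goes straight to the final-answer call.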

Model usage in adaptive and auto modes

When a question is within Waldo's scope, Waldo plans retrieval: it breaks the question down and determines which search tools to call. Waldo never generates user-visible text. The frontier model always runs afterward and is responsible for the final response.

Glean's Universal Model Key currently uses GPT-5.4, a premium model, in Auto mode, but we are not currently charging for this usage. Under our FlexCredits pricing, FlexCredits are normally consumed when a premium model is automatically selected. We may begin charging for GPT-5.4 usage in Auto mode in the future; if we do, we will notify you 14 days in advance. Because this usage is not currently tracked for billing, your billing dashboard may show lower usage today.

If a user explicitly selects GPT-5.4 themselves, that usage consumes FlexCredits and appears on your billing dashboard.
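The GPT-5.4 billing rule described above reduces to a couple of conditions. The sketch below is illustrative only: the function names and the `auto_billing_enabled` flag are hypothetical, with the flag standing in for the future change that Glean says it would announce 14 days in advance. It is not how Glean's billing system is implemented.

```python
from datetime import date, timedelta

NOTICE_PERIOD = timedelta(days=14)  # advance notice stated in the text


def gpt54_consumes_flexcredits(user_selected: bool, auto_billing_enabled: bool = False) -> bool:
    """Does a GPT-5.4 request consume FlexCredits?

    user_selected: the user explicitly chose GPT-5.4 (always billed).
    auto_billing_enabled: hypothetical switch for Auto-mode billing; False today.
    """
    if user_selected:
        return True
    return auto_billing_enabled


def earliest_auto_billing_start(notice_date: date) -> date:
    """Earliest date Auto-mode GPT-5.4 charges could begin after a notice."""
    return notice_date + NOTICE_PERIOD


if __name__ == "__main__":
    print(gpt54_consumes_flexcredits(user_selected=False))  # Auto mode today
    print(earliest_auto_billing_start(date(2025, 1, 20)))   # arbitrary example date
```

With an arbitrary notice date of 2025-01-20, the earliest possible start of Auto-mode charges would be 2025-02-03.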

Benefits

  • Faster answers: By front-loading retrieval, Glean spends less time searching and more time reasoning, delivering noticeably faster responses. In our testing, time to first token dropped by roughly 50%.
  • Lower LLM costs at scale: The agentic search model reduces frontier model token consumption by handling retrieval planning more efficiently.

Performance

Waldo delivers measurable latency improvements with no regression in answer quality:

Metric | Change
P25 time to first token | -51.0%
P50 time to first token | -51.9%
P75 time to first token | -45.8%
Answer quality and satisfaction | No change (neutral)
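Percentile time-to-first-token figures like those above are typically computed by taking the same percentile of latency samples before and after a change and expressing the difference relative to the baseline. The document doesn't say which percentile definition or sample set Glean used; this sketch assumes the nearest-rank definition, and the helper names are hypothetical. As a worked reading of the table, a -51.9% change at P50 means the new median is about 0.481 times the baseline median.

```python
import math


def nearest_rank_percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def percent_change(baseline: float, new: float) -> float:
    """Relative change in percent; negative means the new value is lower (faster)."""
    return (new - baseline) / baseline * 100


def ttft_change(baseline_ms: list[float], new_ms: list[float], p: float) -> float:
    """Percent change in the p-th percentile of time to first token."""
    return percent_change(
        nearest_rank_percentile(baseline_ms, p),
        nearest_rank_percentile(new_ms, p),
    )
```

Other percentile definitions (for example, linear interpolation between ranks) can give slightly different numbers on small samples, so comparisons are only meaningful when both sides use the same definition.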

Who is affected by adaptive reasoning

Waldo runs automatically for organizations on the Glean Universal Model Key.

Organizations on AWS or Azure, organizations using their own LLM keys, and customers in the EU are not affected by this change.
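The eligibility rules in this section can be written as a single predicate. The `OrgConfig` fields and values (such as `"gcp"`) are hypothetical and don't correspond to a real Glean settings schema; they encode only the conditions stated here: on the Universal Model Key, not on AWS or Azure, and not in the EU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrgConfig:
    # Hypothetical fields, for illustration only.
    uses_universal_model_key: bool  # False if the org brings its own LLM keys
    hosting: str                    # e.g. "gcp", "aws", "azure"
    in_eu: bool


def waldo_runs_for(org: OrgConfig) -> bool:
    """True only when none of the documented exclusions applies."""
    if not org.uses_universal_model_key:
        return False  # own LLM keys: not affected
    if org.hosting in {"aws", "azure"}:
        return False  # AWS or Azure deployments: not affected
    if org.in_eu:
        return False  # EU customers: not affected
    return True
```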

Configuration

No configuration is required from admins or end users. Waldo runs automatically on eligible queries behind the scenes.

If a query falls outside Waldo's scope, Waldo immediately hands it off to the frontier model, with no degradation in quality.

Data flow

The following describes the query processing flow for Waldo:

  1. The user's query is sent to your Glean deployment.
  2. Your Glean deployment calls Waldo (hosted on Glean-managed infrastructure on Google Vertex AI), executes retrieval tool calls, and collects relevant context.
  3. Waldo's response, containing the tool calls it issued and the retrieved context, goes back to your Glean deployment.
  4. Your Glean deployment calls the frontier model. The frontier model does the rest of the reasoning for the query using the pre-collected context and generates the final response.
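To make the four steps concrete, the sketch below records each hop in a trace as a toy query moves through the system. Every function is a hypothetical stub; real calls would go over the network to Waldo on Vertex AI and to the frontier model provider. It illustrates the division of labor described above: Waldo only proposes retrieval tool calls, your Glean deployment executes them, and the frontier model writes the answer. A single planning round is assumed for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    hops: list[str] = field(default_factory=list)


def call_waldo(query: str, trace: Trace) -> list[str]:
    """Hypothetical stub: Waldo plans retrieval and returns tool calls, never user-facing text.
    It receives everything it needs in its arguments and keeps no state of its own."""
    trace.hops.append("deployment -> Waldo")
    return [f"search({query!r})"]


def execute_tool_call(call: str, trace: Trace) -> str:
    """Hypothetical stub: the deployment runs a retrieval tool call and returns context."""
    trace.hops.append(f"deployment runs {call}")
    return f"context from {call}"


def call_frontier(query: str, context: list[str], trace: Trace) -> str:
    """Hypothetical stub: the frontier model writes the final answer from the context."""
    trace.hops.append("deployment -> frontier model")
    return f"answer to {query!r} using {len(context)} context item(s)"


def handle_query(query: str) -> tuple[str, Trace]:
    trace = Trace(hops=["user -> deployment"])                      # step 1
    tool_calls = call_waldo(query, trace)                           # step 2: plan
    context = [execute_tool_call(c, trace) for c in tool_calls]     # step 2: retrieve
    trace.hops.append("tool calls + context held by deployment")    # step 3
    answer = call_frontier(query, context, trace)                   # step 4
    return answer, trace
```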

Data flow diagram

Model information

Waldo is built on NVIDIA's Nemotron-3 Nano (30B-A3B) model and fine-tuned by Glean using reinforcement learning. It's hosted entirely on Glean-managed infrastructure on Vertex AI in the United States and isn't served through a third-party model provider endpoint.

Customer data isn't used to train this model. The model was trained on Glean's own internal dataset of enterprise information-seeking queries.

Security and data handling

Property | Detail
No data persistence | Query and response content is not logged in the Waldo serving path.
No model training on customer data | The model was trained exclusively on Glean's internal data. Your organization's data is never used to train or fine-tune the model.
Stateless information flows | Each call to Waldo runs independently of every other call, which simplifies the system and strengthens data isolation.
Glean-hosted and controlled | The model runs on Glean's GCP infrastructure, not a third-party endpoint.
US-based infrastructure | Model inference runs in Google Cloud (US region).
Permission enforcement | The model only retrieves content the requesting user is already authorized to access.
Admin controls unchanged | All existing Glean policies, including document restrictions, folder exclusions, and connector restrictions, continue to apply.
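The permission and admin-control properties amount to one rule: keep a result only if the user can already read it and no admin policy excludes it. As a minimal sketch, assuming each candidate document carries its allowed groups, folder, and connector (hypothetical fields, not Glean's data model), the rule looks like this. Whether Glean enforces it inside the search index or after retrieval isn't stated; this sketch simply filters a candidate list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    folder: str
    connector: str
    allowed_groups: frozenset[str]  # groups permitted to read this document


@dataclass(frozen=True)
class AdminPolicy:
    # Hypothetical stand-ins for document restrictions, folder exclusions,
    # and connector restrictions.
    restricted_docs: frozenset[str] = frozenset()
    excluded_folders: frozenset[str] = frozenset()
    restricted_connectors: frozenset[str] = frozenset()


def visible_results(
    results: list[Doc], user_groups: frozenset[str], policy: AdminPolicy
) -> list[Doc]:
    """Keep only documents the user can already read and that no admin policy excludes."""
    return [
        d
        for d in results
        if d.allowed_groups & user_groups  # user already has access
        and d.doc_id not in policy.restricted_docs
        and d.folder not in policy.excluded_folders
        and d.connector not in policy.restricted_connectors
    ]
```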

Updated terms

Glean's AI Terms Addendum has been updated to reflect that the Service supports Glean-hosted model deployments in addition to direct third-party LLM providers. You can review the updated terms at glean.com/legal.