LLM Insights
Admins can monitor Glean-attributed LLM usage and reliability for customer key (BYOK) deployments using the Insights → LLM dashboard. This dashboard displays token throughput, request rates, and error patterns to help you provision capacity and troubleshoot issues.
Use the dashboard's Glean-specific traffic data to provision and troubleshoot your LLM setup with confidence and to avoid under- or over-provisioning.
The LLM insights dashboard is available only for customer key (BYOK) deployments. If your deployment uses a Glean key, this dashboard is not available.
Data retention
The Insights dashboard guarantees data availability for the last 270 days. While historical data beyond this period may be available, Glean cannot guarantee its persistence. Future updates to the Insights dashboard may remove data that falls outside the 270-day retention period.
This retention policy applies to all metrics and data displayed in the Insights dashboard.
Filters and controls
The LLM dashboard provides the following filters and controls:
- Time period filter to change the analysis window across all charts and metrics. You can select predefined ranges (Past week, Past month) or custom date ranges.
- Model filter to focus on specific models or view the top 5 models by usage. You can select up to 5 models at once; deselect one to choose another.
The dashboard updates once daily and reflects the data as of the previous day. Data might be delayed by up to 24 hours. Values reflect estimated Glean-attributed usage only.
Overview metrics
These cards provide a high-level snapshot of LLM usage and reliability in the selected time window.
The TPM and RPM values are estimates based on Glean's internal instrumentation. They should be used for trend analysis and capacity planning rather than precise billing calculations. For exact billing data, refer to your provider's invoice.
Peak TPM
Peak TPM (Tokens Per Minute) shows the highest token throughput observed in any single minute during the selected window. This metric helps you understand your maximum capacity requirements.
Use this to:
- Identify peak load periods
- Make sure your provider quota can handle traffic spikes
- Plan capacity increases based on actual usage patterns
Avg TPM
Avg TPM (Average Tokens Per Minute) shows the average estimated Glean-attributed tokens per minute across the selected window.
Use this to:
- Understand typical baseline token consumption
- Compare against your provisioned capacity
- Estimate ongoing costs based on average load
Peak RPM
Peak RPM (Requests Per Minute) shows the highest request rate observed in any single minute during the selected window.
Use this to:
- Identify maximum request load
- Make sure your provider quota can handle request spikes
- Understand request patterns independent of token volume
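As a rough sketch, the Peak TPM, Avg TPM, and Peak RPM cards correspond to the following aggregation over minute-bucketed request data. The data shape, field names, and numbers below are illustrative only, not Glean's internal schema, and this sketch averages over observed minutes rather than the full window:

```python
from collections import defaultdict

# Hypothetical minute-bucketed request log: (minute, tokens_used) per request.
requests = [
    (0, 1200), (0, 800),            # minute 0: 2 requests, 2000 tokens
    (1, 3000),                      # minute 1: 1 request, 3000 tokens
    (2, 500), (2, 700), (2, 900),   # minute 2: 3 requests, 2100 tokens
]

tokens_per_minute = defaultdict(int)
requests_per_minute = defaultdict(int)
for minute, tokens in requests:
    tokens_per_minute[minute] += tokens
    requests_per_minute[minute] += 1

peak_tpm = max(tokens_per_minute.values())    # highest single-minute token throughput -> 3000
avg_tpm = sum(tokens_per_minute.values()) / len(tokens_per_minute)  # mean over observed minutes
peak_rpm = max(requests_per_minute.values())  # highest single-minute request rate -> 3
```

Comparing a value like `peak_tpm` against your provisioned quota shows whether a traffic spike would exceed capacity even when the average load looks comfortable.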
429 rate
429 rate shows the percentage of requests that returned a 429 error (quota/capacity pressure) in the selected window.
A high 429 rate indicates:
- Your provisioned TPM or RPM quota is being exceeded
- Glean is experiencing throttling from your provider
- Users may see degraded performance or errors
If your 429 rate is elevated, consider increasing your provider quota or reviewing which models are experiencing the most pressure.
5xx rate
5xx rate shows the percentage of requests that returned a 5xx error (provider-side error) in the selected window.
A high 5xx rate indicates:
- Provider instability or outages
- Infrastructure issues on the provider side
- Errors not related to your quota limits
If your 5xx rate is elevated, check your provider status page or contact your provider support.
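Conceptually, both error-rate cards are simple percentages over the requests in the window. The status codes below are a made-up sample, not real Glean traffic:

```python
# Hypothetical HTTP status codes for Glean-attributed requests in the window.
statuses = [200, 200, 429, 200, 503, 200, 429, 200, 200, 200]

total = len(statuses)
rate_429 = 100 * sum(1 for s in statuses if s == 429) / total          # quota/capacity pressure
rate_5xx = 100 * sum(1 for s in statuses if 500 <= s <= 599) / total   # provider-side errors

print(f"429 rate: {rate_429:.1f}%")  # -> 429 rate: 20.0%
print(f"5xx rate: {rate_5xx:.1f}%")  # -> 5xx rate: 10.0%
```

Tracking the two rates separately matters because the remediation differs: 429s point to quota limits you control, while 5xxs point to provider-side reliability.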
Usage over time
The Usage over time chart visualizes token and request throughput over the selected time period to help you identify trends and spikes.
- Avg TPM toggle shows average estimated Glean-attributed tokens per minute over time, broken down by model
- Avg RPM toggle shows average requests per minute over time, broken down by model
- Each model displays as a separate colored line in the chart
- Hover over the chart to see exact values for each model at any point in time
Use this chart to:
- Spot usage trends and growth patterns
- Identify specific dates or times with unusual activity
- Understand which models drive the most traffic
- Correlate usage spikes with business events or rollouts
Error rate monitoring
429 Total count
429 Total count shows the total number of requests that received a 429 error (quota pressure) in the selected window, along with the percentage of all requests they represent.
Use this to:
- Quantify the scale of quota-related throttling
- Understand how many requests were affected
- Track improvements after increasing capacity
429 error rate chart
429 error rate chart visualizes the percentage of requests returning 429 errors over time, broken down by model.
- Each model displays as a separate colored line
- Hover to see exact 429 rate for each model at any point in time
- Helps identify which models are experiencing the most quota pressure and when
5xx Total count
5xx Total count shows the total number of requests that received a 5xx error (provider-side error) in the selected window, along with the percentage of all requests they represent.
Use this to:
- Quantify the scale of provider-related errors
- Distinguish between quota issues (429s) and provider instability (5xxs)
- Track provider reliability over time
5xx error rate chart
5xx error rate chart visualizes the percentage of requests returning 5xx errors over time, broken down by model.
- Each model displays as a separate colored line
- Hover to see exact 5xx rate for each model at any point in time
- Helps identify provider instability patterns
If no 5xx errors occurred in the selected period, the chart shows "No insights for this time period."
Usage breakdown
The Usage breakdown table shows where traffic concentrates across your enabled models. Each row represents a model and includes the following information:
- Model name: The LLM model identifier (for example, CLAUDE_4_6_OPUS_20260205, GPT5_4, GEMINI_3_1_PRO)
- Peak TPM: The highest token throughput observed for this model in any single minute
- Avg TPM: Average tokens per minute for this model across the selected window
- Avg 429 Rate: Average percentage of requests for this model that received a 429 error
- Avg 5xx Rate: Average percentage of requests for this model that received a 5xx error
- Health status: Visual indicator showing whether the model is experiencing issues
- Healthy: Low error rates, performing normally
- High pressure: Elevated 429 rate, indicating quota pressure
Use this table to:
- Identify which models are consuming the most capacity
- Spot models experiencing quota pressure or reliability issues
- Prioritize capacity increases for high-usage or high-error models
- Compare performance across different model types or providers
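The health status column can be thought of as a threshold on the model's average 429 rate. The 5% cutoff below is an illustrative assumption, not the dashboard's actual (undocumented) threshold:

```python
def health_status(avg_429_rate_pct: float, pressure_threshold_pct: float = 5.0) -> str:
    # "High pressure" when the model's average 429 rate crosses the
    # (assumed, illustrative) threshold; otherwise "Healthy".
    return "High pressure" if avg_429_rate_pct >= pressure_threshold_pct else "Healthy"

health_status(0.4)   # -> "Healthy"
health_status(12.0)  # -> "High pressure"
```

A classification like this lets you sort the table by health first, then prioritize quota increases for the models under the most pressure.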