Agent execution limits

Glean agents run within guardrails that protect cost, latency, and output quality. Understanding these limits helps you design agents that scale to large datasets without running into unexpected truncation or timeouts.

Tool-call limit

Each agent run can make only a limited number of tool calls. This cap exists to:

  • Control cost. Each tool call incurs model inference cost, so unbounded loops can become expensive.
  • Prevent timeouts. Agent runs have an overall time limit. Long chains of sequential calls risk exceeding it.
  • Preserve quality. Extended chains accumulate context drift and compounding errors, which reduces reliability.
  • Keep responses timely. Teammates are waiting on results.

When you build agents that iterate over many items (for example, making one tool call per record), keep the total number of calls well within this cap. For large datasets, prefer strategies that need fewer calls, such as batching queries by time window, retrieving full results through a download link, or querying a data warehouse in a single step.
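To check whether a plan stays within the cap, you can count the calls it will make before running it. The sketch below is a minimal illustration, assuming a hypothetical cap of 50 calls (the actual limit is not stated on this page) and a hypothetical rule of thumb of using at most half of it. It splits a date range into fixed-size windows so that each window becomes one query instead of one call per record.

```python
from datetime import date, timedelta

# Hypothetical cap for illustration only; the real tool-call limit is not stated here.
TOOL_CALL_CAP = 50


def time_windows(start: date, end: date, days: int) -> list[tuple[date, date]]:
    """Split the half-open range [start, end) into consecutive windows of `days` days.

    Each window is meant to become one batched query (one tool call).
    """
    if days <= 0:
        raise ValueError("days must be positive")
    windows = []
    current = start
    while current < end:
        nxt = min(current + timedelta(days=days), end)  # last window may be shorter
        windows.append((current, nxt))
        current = nxt
    return windows


def plan_fits_budget(num_calls: int, cap: int = TOOL_CALL_CAP, headroom: float = 0.5) -> bool:
    """Return True if the plan uses at most `headroom` (a hypothetical 50%) of the cap."""
    return num_calls <= cap * headroom
```

For example, splitting January 1 to April 1, 2024 into 7-day windows yields 13 queries, which fits the hypothetical budget, whereas 2,000 per-record calls would not.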

Response-size caps

Each tool action returns its results into the agent's context, and that in-context response is capped in size. When a single tool call produces more data than fits:

  • The in-context response is truncated to the portion that fits.
  • When available, a download link is included in the intermediate step output so teammates can retrieve the full raw result.
  • Some actions also return metadata (such as a total count) so the agent can tell the user how many records matched versus how many were returned in-context.
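When an action reports a total count, the agent can compare it with the number of records it actually received and tell the user plainly. The following is a sketch only: `ToolResult`, `total_count`, and `download_url` are hypothetical names chosen for illustration, not a documented Glean response schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolResult:
    """Hypothetical shape of a tool action's result; field names are illustrative."""
    records: list
    total_count: Optional[int] = None   # total matches, if the action reports it
    download_url: Optional[str] = None  # link to the full raw result, if provided


def summarize(result: ToolResult) -> str:
    """Describe how many records matched versus how many were returned in-context."""
    returned = len(result.records)
    if result.total_count is None:
        # Without a total, truncation cannot be detected, so don't imply completeness.
        msg = f"Returned {returned} records (total match count not reported)."
    elif result.total_count > returned:
        msg = (f"Showing {returned} of {result.total_count} matching records; "
               "the in-context response was truncated.")
    else:
        msg = f"Showing all {returned} matching records."
    if result.download_url:
        msg += f" Full results: {result.download_url}"
    return msg
```

If no total count is reported, the agent cannot tell whether the response was truncated, so the message says that rather than implying the results are complete.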

When you need to analyze very large result sets, combine the download link with a follow-up data-analysis step, or batch the source query into smaller windows. See Handling large result sets for a Jira-specific example of these patterns.
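If you don't know in advance how large each window's result will be, one option is to split adaptively: halve a time window until each piece is small enough to return in-context. This minimal sketch assumes a hypothetical `count(start, end)` callable (for example, a count-only query) and a per-call record limit you choose; neither is a documented Glean API. Keep in mind that each count may itself be a tool call, so this approach also draws on the tool-call limit.

```python
from datetime import datetime, timedelta
from typing import Callable

Window = tuple[datetime, datetime]


def split_to_fit(
    window: Window,
    count: Callable[[datetime, datetime], int],  # hypothetical: records in [start, end)
    max_per_call: int,                           # hypothetical per-call record limit
    min_span: timedelta = timedelta(hours=1),
) -> list[Window]:
    """Recursively halve `window` until each piece's count is at most `max_per_call`.

    Windows no longer than `min_span` are returned as-is even if still too large,
    which guarantees the recursion stops when many records share one timestamp.
    """
    start, end = window
    if count(start, end) <= max_per_call or end - start <= min_span:
        return [window]
    mid = start + (end - start) / 2
    return (split_to_fit((start, mid), count, max_per_call, min_span)
            + split_to_fit((mid, end), count, max_per_call, min_span))
```

Each returned window can then be queried separately, in order, so the combined results cover the original range without gaps or overlaps.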