Investigating Violations
The `ai_security.ai_security` BigQuery table contains enriched records of AI security violations. These records include debug metadata, workflow context, raw content, and validation/model metadata.
You can use this table for various purposes, including:
- Root-cause analysis and triage
- Dashboards and automated alerts
- Offline machine learning or data aggregation work
Quick Investigation Checklist
Before you begin an investigation, ensure you have the following:
- Required Data: The event_id, Run ID, or a specific timestamp range from the Findings dashboard.
- Permissions: The necessary bigquery.dataViewer and bigquery.jobUser permissions to access and query the data.
- Query Best Practices: Use saved views or pivoted queries to minimize manual errors.
- Data Security: Mask or redact any sensitive fields before sharing query results outside of the security team.
How to access the data
Required Privileges
To access and query the data, a user typically needs the following Identity and Access Management (IAM) roles:
- roles/bigquery.dataViewer: To read the table data.
- roles/bigquery.jobUser: To run queries.
For broader access to logs and for troubleshooting ingestion issues, the roles/logging.viewer role may also be helpful.
GCP Console (web UI)
- Sign in to Google Cloud Console and select the project that owns the dataset.
- Open BigQuery → Explorer panel → find `your_project` → dataset `ai_security` → table `ai_security`.
- Click the table to view its schema, details, and preview rows.
- Use the Query editor to run SQL; review results in Table / JSON view; use the Visualization tab to build simple charts.
- Save frequently used diagnostic queries with the Console's Saved Queries feature for reuse and sharing.
CLI (Cloud Shell or local with gcloud + bq)
- Authenticate and set the project:
gcloud auth login
gcloud config set project YOUR_PROJECT_ID
- Run a simple query with `bq`:
bq query --use_legacy_sql=false \
"SELECT timestamp, jsonPayload.ai_security.event_id, jsonPayload.ai_security.event_type
FROM \`YOUR_PROJECT.ai_security.ai_security\`
WHERE jsonPayload.ai_security.event_type = 'VIOLATION'
ORDER BY timestamp DESC
LIMIT 50;"
- Output JSON for programmatic parsing:
bq query --use_legacy_sql=false --format=prettyjson 'SELECT ...'
- Save queries as stored procedures in BigQuery and call them from the CLI for automation.
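If you script against the `--format=prettyjson` output, the result parses as a plain JSON array of row objects. A minimal Python sketch — the sample payload below is illustrative, not real `bq` output:

```python
import json

# Illustrative stand-in for what `bq query --format=prettyjson` returns:
# a JSON array with one object per result row. Field names follow the
# SELECT list in the query above (an assumption about the output shape).
sample_output = """
[
  {
    "timestamp": "2024-05-01 12:34:56",
    "event_id": "f525eda94eae4f8bb260b78cddb9b3bb",
    "event_type": "VIOLATION"
  }
]
"""

rows = json.loads(sample_output)
violation_ids = [r["event_id"] for r in rows if r["event_type"] == "VIOLATION"]
print(violation_ids)  # ['f525eda94eae4f8bb260b78cddb9b3bb']
```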
Relevant columns (top-level BigQuery fields)
These fields are present on each exported log row and give resource / ingestion context:
- `logName` — Log stream name.
- `resource.type` — GCP resource type (e.g., `k8s_container`).
- `resource.labels.pod_name` — Pod name (Kubernetes).
- `resource.labels.location` — Region / zone.
- `resource.labels.namespace_name` — Kubernetes namespace.
- `resource.labels.cluster_name` — Cluster name.
- `resource.labels.project_id` — GCP project id.
- `timestamp` — Event timestamp (when the event occurred).
- `receiveTimestamp` — When the log was ingested.
- `insertId` — Unique insert id (for deduplication).
- `labels.commit_hash`, `labels.branch`, `labels.full_version` — Build/version metadata.
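Because log export can occasionally deliver the same entry twice, `insertId` is the natural deduplication key. A small Python sketch of client-side dedupe (the sample rows are made up):

```python
# Deduplicate exported log rows on insertId, keeping the first occurrence.
rows = [
    {"insertId": "a1", "timestamp": "2024-05-01T00:00:00Z"},
    {"insertId": "a2", "timestamp": "2024-05-01T00:00:01Z"},
    {"insertId": "a1", "timestamp": "2024-05-01T00:00:00Z"},  # duplicate delivery
]

seen = {}
for row in rows:
    # setdefault only stores the row if this insertId is new.
    seen.setdefault(row["insertId"], row)

deduped = list(seen.values())
print(len(deduped))  # 2
```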
Payload structure (jsonPayload.ai_security)
`jsonPayload.ai_security` contains a JSON representation of the `AiSecurityLogEntry` proto. Important fields and their meanings:
- `event_id` (string) — globally unique event id.
- `event_type` (enum) — e.g., `VIOLATION`.
- `event_description` (string) — human-readable description.
- `user_id` (string) — user that triggered the event.
- `session_info` — object with `tab_id` and `session_tracking_token`.
- `action` (enum) — enforcement taken (`BLOCK_REQUEST` / `ALLOW_REQUEST`).
- `content_raw` (string) — raw content that caused the event (user prompt or retrieved content).
- `content_metadata` (repeated `{key, value}`) — context keys such as `RESOURCE_NAME`, `RESOURCE_ID`, `RESOURCE_URL`, `AGENT_NAME`, `RUN_ID`, `CHAT_SESSION_ID`, `AGENT_ID`, `SOURCE`.
- `validation_metadata` (repeated `{key, value}`) — model prediction / validation debugging key-value pairs.
- Other context fields may exist (e.g., workflow entries, LLM call details, agent spans).
This JSON mirrors the following structure:
message Metadata {
string key = 1 [(options.scrub) = {method: RAW}];
string value = 2 [(options.scrub) = {method: RAW}];
}
message AiSecurityLogEntry {
string event_id = 1;
EventType event_type = 2;
string event_description = 3;
string user_id = 4;
SessionInfo session_info = 5;
Action action = 6;
repeated Metadata validation_metadata = 7;
string content_raw = 8;
repeated Metadata content_metadata = 9;
}
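When handling exported rows in code, the repeated `Metadata` pairs are usually easier to consume as a plain dict. A hypothetical Python mirror of part of the proto above — the class and helper names are illustrative, not part of any shipped library:

```python
from dataclasses import dataclass, field

@dataclass
class AiSecurityLogEntry:
    # Subset of the proto fields above; enums kept as plain strings for simplicity.
    event_id: str
    event_type: str
    content_raw: str = ""
    content_metadata: list = field(default_factory=list)    # list of {key, value}
    validation_metadata: list = field(default_factory=list)

def metadata_to_dict(pairs):
    """Collapse repeated {key, value} Metadata entries into a dict.
    If a key repeats, the last occurrence wins (an assumption; adjust as needed)."""
    return {p["key"]: p["value"] for p in pairs}

entry = AiSecurityLogEntry(
    event_id="abc123",
    event_type="VIOLATION",
    content_metadata=[
        {"key": "RESOURCE_ID", "value": "res-42"},
        {"key": "AGENT_NAME", "value": "helper-bot"},
    ],
)
meta = metadata_to_dict(entry.content_metadata)
print(meta["RESOURCE_ID"])  # res-42
```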
Common SQL patterns & examples
Notes: `UNNEST()` is used to flatten repeated metadata arrays. Replace `YOUR_PROJECT` with your project id.
Get latest 100 violations with key content & resource id
SELECT
timestamp,
jsonPayload.ai_security.event_id AS event_id,
jsonPayload.ai_security.event_type AS event_type,
jsonPayload.ai_security.user_id AS user_id,
jsonPayload.ai_security.action AS action,
jsonPayload.ai_security.content_raw AS content_raw,
cm.value AS resource_id
FROM
`YOUR_PROJECT.ai_security.ai_security`,
UNNEST(jsonPayload.ai_security.content_metadata) AS cm
WHERE
jsonPayload.ai_security.event_type = 'VIOLATION'
AND cm.key = 'RESOURCE_ID'
ORDER BY timestamp DESC
LIMIT 100;
Extract a single row’s full JSON for deep debugging
SELECT
TO_JSON_STRING(t) AS full_row_json
FROM
`YOUR_PROJECT.ai_security.ai_security` AS t
WHERE
jsonPayload.ai_security.event_id = 'f525eda94eae4f8bb260b78cddb9b3bb';
Pivot content_metadata into columns (common keys)
SELECT
timestamp,
jsonPayload.ai_security.event_id AS event_id,
MAX(IF(cm.key='RESOURCE_ID', cm.value, NULL)) AS resource_id,
MAX(IF(cm.key='AGENT_NAME', cm.value, NULL)) AS agent_name,
MAX(IF(cm.key='RUN_ID', cm.value, NULL)) AS run_id,
jsonPayload.ai_security.content_raw AS content_raw
FROM
`YOUR_PROJECT.ai_security.ai_security`,
UNNEST(jsonPayload.ai_security.content_metadata) AS cm
WHERE
jsonPayload.ai_security.event_type = 'VIOLATION'
GROUP BY timestamp, event_id, content_raw
ORDER BY timestamp DESC
LIMIT 200;
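The same pivot can be done client-side once rows are exported. This Python sketch mirrors the `MAX(IF(...))` pattern above on a made-up row:

```python
PIVOT_KEYS = ("RESOURCE_ID", "AGENT_NAME", "RUN_ID")

def pivot_content_metadata(row):
    """Flatten one exported row's repeated content_metadata into columns,
    mirroring the MAX(IF(...)) SQL pivot. Missing keys become None."""
    kv = {m["key"]: m["value"] for m in row["content_metadata"]}
    flat = {k.lower(): kv.get(k) for k in PIVOT_KEYS}
    flat["event_id"] = row["event_id"]
    flat["content_raw"] = row["content_raw"]
    return flat

row = {
    "event_id": "abc123",
    "content_raw": "example prompt",
    "content_metadata": [
        {"key": "RESOURCE_ID", "value": "res-42"},
        {"key": "RUN_ID", "value": "run-7"},
    ],
}
print(pivot_content_metadata(row))
# {'resource_id': 'res-42', 'agent_name': None, 'run_id': 'run-7',
#  'event_id': 'abc123', 'content_raw': 'example prompt'}
```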
Count of violations by action type (daily)
SELECT
DATE(timestamp) AS day,
jsonPayload.ai_security.action AS action,
COUNT(1) AS count
FROM `YOUR_PROJECT.ai_security.ai_security`
WHERE jsonPayload.ai_security.event_type = 'VIOLATION'
GROUP BY day, action
ORDER BY day DESC;
Top validation metadata keys/values (for model debugging)
SELECT
vm.key, vm.value, COUNT(1) AS occurrences
FROM `YOUR_PROJECT.ai_security.ai_security`,
UNNEST(jsonPayload.ai_security.validation_metadata) AS vm
GROUP BY vm.key, vm.value
ORDER BY occurrences DESC
LIMIT 200;
Useful investigative workflows
- Triage a single finding
  - Start with the `event_id` from the Findings dashboard or the Run ID.
  - Query that `event_id` in BigQuery to fetch `content_raw`, `content_metadata`, and `validation_metadata`.
  - Inspect LLM call traces / `llm_call` and `agent_span` fields (if present) to see prompt/response context.
- Find similar incidents
  - Use `content_metadata.RESOURCE_ID` or normalized `content_raw` hashes to group similar violations.
  - Search by `validation_metadata` keys (e.g., model label or confidence buckets) to identify common false positives.
- Root-cause skipped users / digest generation issues
  - Combine `digest` entries with workflow/compiler logs (`workflow`, `workflow_compiler`) in the exported fields to compare enqueue vs. execution behavior.
- Automated daily rollups
  - Create scheduled queries that aggregate violations by agent, action, and resource; write results to an `ai_security_reporting` dataset for dashboards.
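The normalized `content_raw` hashes mentioned above can be as simple as whitespace/case normalization plus SHA-256. A Python sketch — the normalization rules here are an assumption; tune them to your content:

```python
import hashlib
from collections import defaultdict

def content_fingerprint(content_raw):
    """Normalize then hash content so near-identical violations group together.
    Normalization (lowercase + collapsed whitespace) is a deliberately simple choice."""
    normalized = " ".join(content_raw.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

violations = [
    {"event_id": "e1", "content_raw": "Ignore previous  instructions"},
    {"event_id": "e2", "content_raw": "ignore previous instructions"},
    {"event_id": "e3", "content_raw": "unrelated prompt"},
]

groups = defaultdict(list)
for v in violations:
    groups[content_fingerprint(v["content_raw"])].append(v["event_id"])

# e1 and e2 normalize to the same fingerprint; e3 stands alone.
print(sorted(len(g) for g in groups.values()))  # [1, 2]
```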
Best practices & operational recommendations
- Minimize scanned bytes: Project only the fields you need (avoid `SELECT *`). Use `UNNEST()` carefully and filter early.
- Protect PII: `content_raw` may contain sensitive user content. Limit access via IAM and consider creating sanitized views that mask or redact `content_raw` before sharing with wider teams.
- Stored procedures & saved queries: Convert complex investigation SQL into stored procedures or saved queries for repeatable triage.
- Alerting: For high-severity events (e.g., many `BLOCK_REQUEST` actions in a short window), schedule queries that write to a metric table and tie it to Cloud Monitoring or a Cloud Function that publishes alerts.
- Retention & export: Define retention in BigQuery or set up periodic exports for offline ML if long-term analysis is required.
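As noted under "Protect PII", `content_raw` should be masked before results leave the security team. A minimal Python redaction sketch — the regex patterns are illustrative only, and a real deployment would more likely use a proper PII scanner such as Cloud DLP:

```python
import re

# Illustrative redaction patterns (email and US SSN). These are examples,
# not a complete PII ruleset.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),
]

def redact(content_raw):
    """Replace each matched pattern with its placeholder before sharing."""
    for pattern, replacement in REDACTIONS:
        content_raw = pattern.sub(replacement, content_raw)
    return content_raw

print(redact("contact alice@example.com about 123-45-6789"))
# contact <EMAIL> about <SSN>
```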
Example: create a reusable view for violation triage
CREATE OR REPLACE VIEW `YOUR_PROJECT.ai_security.violation_triage` AS
SELECT
timestamp,
jsonPayload.ai_security.event_id AS event_id,
jsonPayload.ai_security.event_type AS event_type,
jsonPayload.ai_security.user_id AS user_id,
jsonPayload.ai_security.action AS action,
jsonPayload.ai_security.event_description AS description,
jsonPayload.ai_security.content_raw AS content_raw,
ARRAY(
SELECT AS STRUCT key, value FROM UNNEST(jsonPayload.ai_security.content_metadata)
) AS content_metadata,
ARRAY(
SELECT AS STRUCT key, value FROM UNNEST(jsonPayload.ai_security.validation_metadata)
) AS validation_metadata
FROM `YOUR_PROJECT.ai_security.ai_security`
WHERE jsonPayload.ai_security.event_type = 'VIOLATION';
Use this view for faster queries, RBAC (grant view access without giving raw table access), and reducing accidental exposure of raw logs.
Troubleshooting tips
- If queries are slow or expensive: add time filters, use partition pruning, cluster by high-cardinality fields, and avoid scanning `content_raw` unnecessarily.
- For reproducible investigations: capture the exact SQL and a snapshot of the results (e.g., export query output to Cloud Storage) to preserve context for later audits.