Investigating Violations
The `ai_security.ai_security` BigQuery table contains enriched records of AI security violations. These records include debug metadata, workflow context, raw content, and validation/model metadata.
You can use this table for various purposes, including:
- Root-cause analysis and triage
- Dashboards and automated alerts
- Offline machine learning or data aggregation work
Quick Investigation Checklist
Before you begin an investigation, ensure you have the following:
- Required Data: The event_id, Run ID, or a specific timestamp range from the Findings dashboard.
- Permissions: The necessary bigquery.dataViewer and bigquery.jobUser permissions to access and query the data.
- Query Best Practices: Use saved views or pivoted queries to minimize manual errors.
- Data Security: Mask or redact any sensitive fields before sharing query results outside of the security team.
How to access the data
Required Privileges
To access and query the data, a user typically needs the following Identity and Access Management (IAM) roles:
- roles/bigquery.dataViewer: To read the table data.
- roles/bigquery.jobUser: To run queries.
For broader access to logs and for troubleshooting ingestion issues, the roles/logging.viewer role may also be helpful.
GCP Console (web UI)
- Sign in to Google Cloud Console and select the project that owns the dataset.
- Open BigQuery → Explorer panel → find `your_project` → dataset `ai_security` → table `ai_security`.
- Click the table to view its schema, details, and preview rows.
- Use the Query editor to run SQL; review results in Table / JSON view; use the Visualization tab to build simple charts.
- Save frequently used diagnostic queries with the Console's Saved Queries feature for reuse and sharing.
CLI (Cloud Shell or local with gcloud + bq)
- Authenticate and set the project:
gcloud auth login
gcloud config set project YOUR_PROJECT_ID
- Run a simple query with `bq`:
bq query --use_legacy_sql=false \
"SELECT timestamp, jsonPayload.ai_security.event_id, jsonPayload.ai_security.event_type
FROM \`YOUR_PROJECT.ai_security.ai_security\`
WHERE jsonPayload.ai_security.event_type = 'VIOLATION'
ORDER BY timestamp DESC
LIMIT 50;"
- Output JSON for programmatic parsing:
bq query --use_legacy_sql=false --format=prettyjson 'SELECT ...'
- Save queries as stored procedures in BigQuery and call them from the CLI for automation.
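If you script against the `--format=prettyjson` output, the result parses as a plain JSON array of row objects. A minimal Python sketch — the sample payload below is illustrative, not real `bq` output:

```python
import json

# Illustrative stand-in for what `bq query --format=prettyjson` returns:
# a JSON array with one object per result row. Field names follow the
# SELECT list in the query above (an assumption about the output shape).
sample_output = """
[
  {
    "timestamp": "2024-05-01 12:34:56",
    "event_id": "f525eda94eae4f8bb260b78cddb9b3bb",
    "event_type": "VIOLATION"
  }
]
"""

rows = json.loads(sample_output)
violation_ids = [r["event_id"] for r in rows if r["event_type"] == "VIOLATION"]
print(violation_ids)  # ['f525eda94eae4f8bb260b78cddb9b3bb']
```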
Relevant columns (top-level BigQuery fields)
These fields are present on each exported log row and give resource / ingestion context:
- `logName` — Log stream name.
- `resource.type` — GCP resource type (e.g., `k8s_container`).
- `resource.labels.pod_name` — Pod name (Kubernetes).
- `resource.labels.location` — Region / zone.
- `resource.labels.namespace_name` — Kubernetes namespace.
- `resource.labels.cluster_name` — Cluster name.
- `resource.labels.project_id` — GCP project id.
- `timestamp` — Event timestamp (when the event occurred).
- `receiveTimestamp` — When the log was ingested.
- `insertId` — Unique insert id (for deduplication).
- `labels.commit_hash`, `labels.branch`, `labels.full_version` — Build/version metadata.
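Because log export can occasionally deliver the same entry twice, `insertId` is the natural deduplication key. A small Python sketch of client-side dedupe (the sample rows are made up):

```python
# Deduplicate exported log rows on insertId, keeping the first occurrence.
rows = [
    {"insertId": "a1", "timestamp": "2024-05-01T00:00:00Z"},
    {"insertId": "a2", "timestamp": "2024-05-01T00:00:01Z"},
    {"insertId": "a1", "timestamp": "2024-05-01T00:00:00Z"},  # duplicate delivery
]

seen = {}
for row in rows:
    # setdefault only stores the row if this insertId is new.
    seen.setdefault(row["insertId"], row)

deduped = list(seen.values())
print(len(deduped))  # 2
```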
Payload structure (jsonPayload.ai_security)
`jsonPayload.ai_security` contains a JSON representation of the `AiSecurityLogEntry` proto. Important fields and their meanings:
- `event_id` (string) — globally unique event id.
- `event_type` (enum) — e.g., `VIOLATION`.
- `event_description` (string) — human-readable description.
- `user_id` (string) — user that triggered the event.
- `session_info` — object with `tab_id` and `session_tracking_token`.
- `action` (enum) — enforcement taken (`BLOCK_REQUEST` / `ALLOW_REQUEST`).
- `content_raw` (string) — raw content that caused the event (user prompt or retrieved content).
- `content_metadata` (repeated `{key, value}`) — context keys such as `RESOURCE_NAME`, `RESOURCE_ID`, `RESOURCE_URL`, `AGENT_NAME`, `RUN_ID`, `CHAT_SESSION_ID`, `AGENT_ID`, `SOURCE`.
- `validation_metadata` (repeated `{key, value}`) — model prediction / validation debugging key-value pairs.
- Other context fields may exist (e.g., workflow entries, LLM call details, agent spans).
This JSON mirrors the following structure:
message Metadata {
string key = 1 [(options.scrub) = {method: RAW}];
string value = 2 [(options.scrub) = {method: RAW}];
}
message AiSecurityLogEntry {
string event_id = 1;
EventType event_type = 2;
string event_description = 3;
string user_id = 4;
SessionInfo session_info = 5;
Action action = 6;
repeated Metadata validation_metadata = 7;
string content_raw = 8;
repeated Metadata content_metadata = 9;
}
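When handling exported rows in code, the repeated `Metadata` pairs are usually easier to consume as a plain dict. A hypothetical Python mirror of part of the proto above — the class and helper names are illustrative, not part of any shipped library:

```python
from dataclasses import dataclass, field

@dataclass
class AiSecurityLogEntry:
    # Subset of the proto fields above; enums kept as plain strings for simplicity.
    event_id: str
    event_type: str
    content_raw: str = ""
    content_metadata: list = field(default_factory=list)    # list of {key, value}
    validation_metadata: list = field(default_factory=list)

def metadata_to_dict(pairs):
    """Collapse repeated {key, value} Metadata entries into a dict.
    If a key repeats, the last occurrence wins (an assumption; adjust as needed)."""
    return {p["key"]: p["value"] for p in pairs}

entry = AiSecurityLogEntry(
    event_id="abc123",
    event_type="VIOLATION",
    content_metadata=[
        {"key": "RESOURCE_ID", "value": "res-42"},
        {"key": "AGENT_NAME", "value": "helper-bot"},
    ],
)
meta = metadata_to_dict(entry.content_metadata)
print(meta["RESOURCE_ID"])  # res-42
```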
Common SQL patterns & examples
Notes: `UNNEST()` is used to flatten repeated metadata arrays. Replace `YOUR_PROJECT` with your project id.
Get latest 100 violations with key content & resource id
SELECT
timestamp,
jsonPayload.ai_security.event_id AS event_id,
jsonPayload.ai_security.event_type AS event_type,
jsonPayload.ai_security.user_id AS user_id,
jsonPayload.ai_security.action AS action,
jsonPayload.ai_security.content_raw AS content_raw,
cm.value AS resource_id
FROM
`YOUR_PROJECT.ai_security.ai_security`,
UNNEST(jsonPayload.ai_security.content_metadata) AS cm
WHERE
jsonPayload.ai_security.event_type = 'VIOLATION'
AND cm.key = 'RESOURCE_ID'
ORDER BY timestamp DESC
LIMIT 100;
Extract a single row’s full JSON for deep debugging
SELECT
TO_JSON_STRING(t) AS full_row_json
FROM
`YOUR_PROJECT.ai_security.ai_security` AS t
WHERE
jsonPayload.ai_security.event_id = 'f525eda94eae4f8bb260b78cddb9b3bb';
Pivot content_metadata into columns (common keys)
SELECT
timestamp,
jsonPayload.ai_security.event_id AS event_id,
MAX(IF(cm.key='RESOURCE_ID', cm.value, NULL)) AS resource_id,
MAX(IF(cm.key='AGENT_NAME', cm.value, NULL)) AS agent_name,
MAX(IF(cm.key='RUN_ID', cm.value, NULL)) AS run_id,
jsonPayload.ai_security.content_raw AS content_raw
FROM
`YOUR_PROJECT.ai_security.ai_security`,
UNNEST(jsonPayload.ai_security.content_metadata) AS cm
WHERE
jsonPayload.ai_security.event_type = 'VIOLATION'
GROUP BY timestamp, event_id, content_raw
ORDER BY timestamp DESC
LIMIT 200;
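The same pivot can be done client-side once rows are exported. This Python sketch mirrors the `MAX(IF(...))` pattern above on a made-up row:

```python
PIVOT_KEYS = ("RESOURCE_ID", "AGENT_NAME", "RUN_ID")

def pivot_content_metadata(row):
    """Flatten one exported row's repeated content_metadata into columns,
    mirroring the MAX(IF(...)) SQL pivot. Missing keys become None."""
    kv = {m["key"]: m["value"] for m in row["content_metadata"]}
    flat = {k.lower(): kv.get(k) for k in PIVOT_KEYS}
    flat["event_id"] = row["event_id"]
    flat["content_raw"] = row["content_raw"]
    return flat

row = {
    "event_id": "abc123",
    "content_raw": "example prompt",
    "content_metadata": [
        {"key": "RESOURCE_ID", "value": "res-42"},
        {"key": "RUN_ID", "value": "run-7"},
    ],
}
print(pivot_content_metadata(row))
# {'resource_id': 'res-42', 'agent_name': None, 'run_id': 'run-7',
#  'event_id': 'abc123', 'content_raw': 'example prompt'}
```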
Count of violations by action type (daily)
SELECT
DATE(timestamp) AS day,
jsonPayload.ai_security.action AS action,
COUNT(1) AS count
FROM `YOUR_PROJECT.ai_security.ai_security`
WHERE jsonPayload.ai_security.event_type = 'VIOLATION'
GROUP BY day, action
ORDER BY day DESC;
Top validation metadata keys/values (for model debugging)
SELECT
vm.key, vm.value, COUNT(1) AS occurrences
FROM `YOUR_PROJECT.ai_security.ai_security`,
UNNEST(jsonPayload.ai_security.validation_metadata) AS vm
GROUP BY vm.key, vm.value
ORDER BY occurrences DESC
LIMIT 200;
Useful investigative workflows
- Triage a single finding
  - Start with the `event_id` from the Findings dashboard or the Run ID.
  - Query that `event_id` in BigQuery to fetch `content_raw`, `content_metadata`, and `validation_metadata`.
  - Inspect LLM call traces / `llm_call` and `agent_span` fields (if present) to see prompt/response context.
- Find similar incidents
  - Use `content_metadata.RESOURCE_ID` or normalized `content_raw` hashes to group similar violations.
  - Search by `validation_metadata` keys (e.g., model label or confidence buckets) to identify common false positives.
- Root-cause skipped users / digest generation issues
  - Combine `digest` entries with workflow/compiler logs (`workflow`, `workflow_compiler`) in the exported fields to compare enqueue vs. execution behavior.
- Automated daily rollups
  - Create scheduled queries that aggregate violations by agent, action, and resource; write results to an `ai_security_reporting` dataset for dashboards.
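The normalized `content_raw` hashes mentioned above can be as simple as whitespace/case normalization plus SHA-256. A Python sketch — the normalization rules here are an assumption; tune them to your content:

```python
import hashlib
from collections import defaultdict

def content_fingerprint(content_raw):
    """Normalize then hash content so near-identical violations group together.
    Normalization (lowercase + collapsed whitespace) is a deliberately simple choice."""
    normalized = " ".join(content_raw.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

violations = [
    {"event_id": "e1", "content_raw": "Ignore previous  instructions"},
    {"event_id": "e2", "content_raw": "ignore previous instructions"},
    {"event_id": "e3", "content_raw": "unrelated prompt"},
]

groups = defaultdict(list)
for v in violations:
    groups[content_fingerprint(v["content_raw"])].append(v["event_id"])

# e1 and e2 normalize to the same fingerprint; e3 stands alone.
print(sorted(len(g) for g in groups.values()))  # [1, 2]
```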
Best practices & operational recommendations
- Minimize scanned bytes: Project only the fields you need (avoid `SELECT *`). Use `UNNEST()` carefully and filter early.
- Protect PII: `content_raw` may contain sensitive user content. Limit access via IAM and consider creating sanitized views that mask or redact `content_raw` before sharing with wider teams.
- Stored procedures & saved queries: Convert complex investigation SQL into stored procedures or saved queries for repeatable triage.
- Alerting: For high-severity events (e.g., many `BLOCK_REQUEST` actions in a short window), schedule queries that write to a metric table and tie it to Cloud Monitoring or a Cloud Function that publishes alerts.
- Retention & export: Define retention in BigQuery or set up periodic exports for offline ML if long-term analysis is required.
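As noted under "Protect PII", `content_raw` should be masked before results leave the security team. A minimal Python redaction sketch — the regex patterns are illustrative only, and a real deployment would more likely use a proper PII scanner such as Cloud DLP:

```python
import re

# Illustrative redaction patterns (email and US SSN). These are examples,
# not a complete PII ruleset.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),
]

def redact(content_raw):
    """Replace each matched pattern with its placeholder before sharing."""
    for pattern, replacement in REDACTIONS:
        content_raw = pattern.sub(replacement, content_raw)
    return content_raw

print(redact("contact alice@example.com about 123-45-6789"))
# contact <EMAIL> about <SSN>
```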
Example: create a reusable view for violation triage
CREATE OR REPLACE VIEW `YOUR_PROJECT.ai_security.violation_triage` AS
SELECT
timestamp,
jsonPayload.ai_security.event_id AS event_id,
jsonPayload.ai_security.event_type AS event_type,
jsonPayload.ai_security.user_id AS user_id,
jsonPayload.ai_security.action AS action,
jsonPayload.ai_security.event_description AS description,
jsonPayload.ai_security.content_raw AS content_raw,
ARRAY(
SELECT AS STRUCT key, value FROM UNNEST(jsonPayload.ai_security.content_metadata)
) AS content_metadata,
ARRAY(
SELECT AS STRUCT key, value FROM UNNEST(jsonPayload.ai_security.validation_metadata)
) AS validation_metadata
FROM `YOUR_PROJECT.ai_security.ai_security`
WHERE jsonPayload.ai_security.event_type = 'VIOLATION';
Use this view for faster queries, RBAC (grant view access without giving raw table access), and reducing accidental exposure of raw logs.
Troubleshooting tips
- If queries are slow or expensive: add time filters, use partition pruning, cluster by high-cardinality fields, and avoid scanning `content_raw` unnecessarily.
- For reproducible investigations: capture the exact SQL and a snapshot of the results (e.g., export query output to Cloud Storage) to preserve context for later audits.