Reports let you scan indexed content across connected data sources to find broadly shared or externally exposed sensitive content based on criteria you define. Use them for point‑in‑time audits, exports to CSV, and investigations where you don’t need ongoing dashboard workflows.

Reports and policies serve different purposes. Reports are on-demand CSV exports for ad‑hoc audits and offline analysis of content that is already searchable. Policies are saved, scheduled rules that scan continuously and surface findings in the Sensitive findings dashboard for in‑product triage and remediation.

Access and prerequisites

  • Access is restricted to admins with appropriate governance privileges (for example, Super Admin or Sensitive content moderator), as configured by your organization.
  • If you use Infotype scanning and your Glean deployment is on AWS, you must connect a Google Cloud DLP API key before running reports that include Infotypes.

Where to find Reports

  • In the Admin Console, go to Glean Protect → Reports.

Generate a report

Prerequisites: You must be a super admin or have the sensitive content moderator role enabled to create and view reports. If you run Glean on AWS and want to generate info type reports, your organization must first configure AWS for info type scanning with GCP.
  1. Go to Glean Protect → Sensitive findings → Reports and click Create report.
  2. Configure scope (data sources, time window, permission filters) and sensitive content criteria (see “Report configuration options” below).
  3. Define sensitive content. In this step, describe the types of content that should be flagged as sensitive.
  4. Define the terms that should be excluded from the search.
  5. Click Generate report.
  6. Enter a name for the report and click Generate report. The report will appear in the Reports list with its status. When complete, download the CSV.

Report configuration options

You can tailor a report to reduce noise and focus on real risk:

Scope

  • Data sources: Specify whether the report covers all data sources in your organization or only selected data sources.
  • Time period: Select a time window for the documents to be searched (for example, created/updated/viewed in the past 30/90/365 days). You can specify any of the following options:
    • Past year (recommended): Optimize for faster report generation while covering the majority of documents.
    • All-time: Expect slower report generation times.
    • Custom time period: Specify the start and end date.
    • Last N days: Specify the number of days.
  • Permissions: Narrow the scope of documents to scan based on how broadly they are shared. A document is included in the sensitive content search if it meets any one of these conditions.
    • “Visible to anyone in your organization” refers to documents that can be viewed by anyone at your company. For example, a Slack thread posted in a public channel or a Google Doc that can be searched and accessed by anyone at your company.
    • “Visible to anyone on the internet” refers to documents that can be searched and accessed by individuals outside your organization (e.g. a Google Doc that can be viewed by “Anyone on the internet with the link”).
    • “Visible to [N] people or more, internal or external to your organization” refers to documents that are accessible to at least N people. You cannot choose a number lower than 5, because documents accessible to four or fewer people generally present a lower risk and including them may significantly increase the processing time.
    • “Specific teammates” refers to documents that can be accessed by specific users mentioned in this field.
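
The Permissions conditions above combine as a logical OR: a document is in scope as soon as any selected condition holds. The following is a minimal sketch of that rule, not Glean's implementation. The `DocVisibility` and `PermissionFilter` classes, their field names, and the `in_scope` function are hypothetical names introduced for illustration; only the OR behavior and the minimum of 5 people come from this article.

```python
from dataclasses import dataclass, field
from typing import Optional

MIN_PEOPLE_THRESHOLD = 5  # the article states that values lower than 5 are not allowed


@dataclass
class DocVisibility:
    """Hypothetical visibility summary for one indexed document."""
    org_wide: bool = False            # viewable by anyone in the organization
    public_on_internet: bool = False  # viewable by anyone on the internet
    viewer_count: int = 0             # people with access, internal or external
    viewers: set = field(default_factory=set)


@dataclass
class PermissionFilter:
    """Hypothetical mirror of the Permissions options; unset options are ignored."""
    include_org_wide: bool = False
    include_public: bool = False
    min_people: Optional[int] = None
    teammates: set = field(default_factory=set)

    def __post_init__(self):
        # Reject thresholds below the documented minimum.
        if self.min_people is not None and self.min_people < MIN_PEOPLE_THRESHOLD:
            raise ValueError(f"min_people must be at least {MIN_PEOPLE_THRESHOLD}")


def in_scope(doc: DocVisibility, flt: PermissionFilter) -> bool:
    """Return True if the document meets any one of the selected conditions."""
    return any([
        flt.include_org_wide and doc.org_wide,
        flt.include_public and doc.public_on_internet,
        flt.min_people is not None and doc.viewer_count >= flt.min_people,
        bool(flt.teammates & doc.viewers),
    ])
```

For example, with `PermissionFilter(include_public=True, min_people=10)`, a document shared with 12 people is in scope even if it is not public, because one matching condition is enough.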

Sensitive content criteria

Define sensitive content:
  1. InfoType: Specify the infoType of the data that needs to be scanned. For more information, see Supported info types.
  2. Terms: Define a Primary term to flag as a sensitive finding. You can define up to 25 related terms per primary term and up to 1,000 related terms per report.

    Primary term (Required)

    This is the main term the system scans for.
    Field | Details
    Primary term | Must be alphanumeric. Terms are matched case-insensitively. All non-alphanumeric characters (like commas, hyphens, and parentheses) are treated as spaces during the scan.
    You can add additional sets of related terms to refine your matches and reduce false positives. The primary term is flagged as sensitive only after all the defined related term conditions are met.
    Component | Description | Evaluation logic
    Term set (e.g., loan, savings) | A list of related words separated by commas. | OR relationship. Only the presence of one term from the set is required.
    Term condition (two or more sets) | Multiple term sets evaluated together. | AND relationship. The presence of a match from each defined set is required for the primary term to be flagged.

    Example: Primary term

    If your policy defines the Primary term as Currency and adds the following two Related term sets:
    • Related term set 1: loan, savings
    • Related term set 2: USD, dollar
    A primary term is flagged as sensitive only if the primary term Currency is present AND the document text satisfies the following logic: ((loan OR savings) AND (USD OR dollar)).

    Limits

    • Maximum 25 related terms can be defined per primary term.
    • Maximum 1,000 related terms can be used per report.
  3. Regex: Define regular expressions for your report.
    1. Specify a name or an ID that identifies your regex. You can filter your findings report using this name or ID.
    2. Specify a regex that follows RE2 syntax.
    3. Optionally, define Related terms to improve precision.
    Related terms for a regex use the same logic and limits as related terms for a primary term. The regex itself must match, and the related term sets are evaluated with the same AND/OR relationships. A regex match is flagged as sensitive only after all the defined related term conditions are met.

    Example: Regex

    In this example, the regex is named Q3-Q4-Report, and the system flags the content only if both of the following criteria are met:
    • The custom 8-digit pattern (\d{8}) is found, for example, EmpID: 45678901.
    • The context satisfies the following Boolean logic: (budget OR forecasting OR finance) AND (Q3 OR Q4).
  • Specify terms that should not be flagged in this report.
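
The evaluation logic for primary terms, regexes, and related term sets can be sketched in a few lines of Python. This is an illustrative approximation under stated assumptions, not the scanner itself: the function names are hypothetical, terms are matched as whole tokens (the article does not say whether partial matches count), and Python's `re` module stands in for RE2, which supports a different feature set.

```python
import re


def normalize(text: str) -> set:
    """Lowercase the text and treat every non-alphanumeric character as a space."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return set(cleaned.split())


def related_sets_satisfied(tokens: set, related_sets) -> bool:
    """AND across term sets, OR within each set. No sets means no extra conditions."""
    return all(
        any(term.lower() in tokens for term in term_set)
        for term_set in related_sets
    )


def term_flagged(text: str, primary: str, related_sets=()) -> bool:
    """Flag only if the primary term is present and every related set has a match."""
    tokens = normalize(text)
    return primary.lower() in tokens and related_sets_satisfied(tokens, related_sets)


def regex_flagged(text: str, pattern: str, related_sets=()) -> bool:
    """Flag only if the regex matches and every related set has a match.

    Assumption: Python's re engine is used here in place of RE2.
    """
    tokens = normalize(text)
    return re.search(pattern, text) is not None and related_sets_satisfied(tokens, related_sets)


currency_sets = [["loan", "savings"], ["USD", "dollar"]]
print(term_flagged("Savings account, currency: USD", "Currency", currency_sets))
print(term_flagged("Currency exchange for a loan", "Currency", currency_sets))

report_sets = [["budget", "forecasting", "finance"], ["Q3", "Q4"]]
print(regex_flagged("EmpID: 45678901 in the Q3 finance plan", r"\d{8}", report_sets))
```

The first and third calls print True because every condition is met. The second prints False: the primary term and the first related set match, but neither USD nor dollar appears.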

Limits and behavior

  • Concurrency: You can generate up to 5 reports at a time (including queued weekly runs).
  • Manage runs: From the Reports list, you can view status, see the parameters used, cancel an in‑progress run, or cancel a weekly series.
  • Scheduling: Weekly reports run automatically once per week. For continuous or near‑real‑time monitoring and in‑product remediation, use policies instead.
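
Before you submit a report, it can help to check a draft configuration against the limits stated in this article: 25 related terms per primary term, 1,000 related terms per report, and 5 reports generating at a time. The sketch below is a hypothetical pre-flight check, not a Glean API. The `check_limits` function and its input shape are assumptions, and it assumes the per-term limit counts terms across all related term sets of one primary term or regex.

```python
MAX_RELATED_TERMS_PER_PRIMARY = 25   # stated limit per primary term
MAX_RELATED_TERMS_PER_REPORT = 1000  # stated limit per report
MAX_CONCURRENT_REPORTS = 5           # stated limit, including queued weekly runs


def check_limits(related_sets_by_detector: dict, active_reports: int) -> list:
    """Return a list of human-readable problems; an empty list means no limit is exceeded.

    related_sets_by_detector maps a primary term or regex name (hypothetical key)
    to its list of related term sets.
    """
    problems = []
    total = 0
    for name, term_sets in related_sets_by_detector.items():
        count = sum(len(term_set) for term_set in term_sets)
        total += count
        if count > MAX_RELATED_TERMS_PER_PRIMARY:
            problems.append(
                f"{name}: {count} related terms (max {MAX_RELATED_TERMS_PER_PRIMARY})"
            )
    if total > MAX_RELATED_TERMS_PER_REPORT:
        problems.append(
            f"report: {total} related terms (max {MAX_RELATED_TERMS_PER_REPORT})"
        )
    if active_reports >= MAX_CONCURRENT_REPORTS:
        problems.append(
            f"{active_reports} reports already generating or queued "
            f"(max {MAX_CONCURRENT_REPORTS})"
        )
    return problems
```

Calling `check_limits({"Currency": [["loan", "savings"], ["USD", "dollar"]]}, active_reports=0)` returns an empty list, because the draft uses 4 related terms and no other reports are running.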

Output and fields

  • Download format: CSV.
  • Typical fields include document identifiers and metadata, visibility signals, and matched sensitive content attributes to help triage and remediate. Exact columns may vary as capabilities evolve.
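
For offline analysis, the downloaded CSV can be summarized with a short script. Because the exact columns may vary, the sketch below does not assume any column name: you pass the column to group by, and the script checks it against the file's header first. The `summarize_report` function name and the example file and column names in the usage note are hypothetical.

```python
import csv
from collections import Counter


def summarize_report(path: str, group_by: str) -> Counter:
    """Count rows in a downloaded report CSV, grouped by one column."""
    # utf-8-sig reads plain UTF-8 and also strips a byte-order mark if present.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        if group_by not in header:
            raise KeyError(f"column {group_by!r} not found; available columns: {header}")
        return Counter(row[group_by] for row in reader)
```

For example, `summarize_report("report.csv", "datasource").most_common(5)` would list the five most frequent values of a column named datasource, if your export contains one. Check the header row of your own file for the actual column names.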

Troubleshooting

  • Report isn’t starting or appears blocked
    • Check if you already have multiple reports generating or queued (respect the concurrency limit).
    • Confirm your role/permissions grant access to Sensitive findings and Reports.
  • No results or unexpected volume
    • Narrow the time window, refine permission filters, or switch from broad terms to precise Info types / regex.
    • Run smaller, data‑source‑specific scans to validate criteria before expanding.
  • Info types not matching

Supported info types

See Supported info types.
Last updated: November 2025