A policy is a configurable set of rules that defines how sensitive content is detected, monitored, and managed within your organization’s data ecosystem. Policies specify criteria such as:
  • The types of information to detect (e.g., predefined info types, custom terms, regular expressions)
  • The scope of data sources
  • Frequency
  • Exclusions
Each policy enables administrators to establish and automate protection measures tailored to organizational needs, ensuring compliance with data protection requirements and reducing the risk of unauthorized data exposure. You can manage policies through the Sensitive findings page, which provides tools for policy creation, enforcement, and continuous monitoring of violations.
Note: Policies differ from Reports. Policies are scheduled, ongoing scans with in‑product triage in the Sensitive findings dashboard. Reports are ad‑hoc, one‑time CSV exports for offline review.

Policy scope

Your policy’s scope determines what information Glean will review. The scope can be configured using the following criteria:
  • Data sources: Specify whether the policy applies to all data sources in your organization, or only to selected repositories or platforms.
  • Time period: Choose the range of document activity (such as when a document was viewed, created, or modified) that the policy will review.
  • Permissions: Set parameters for which documents are included based on user or group access levels (e.g., documents visible to all users, specific roles, or external collaborators).

Permissions

Narrow the scope of documents to scan based on how broadly each document is shared. A document is included in the sensitive content search if it meets any one of these conditions.
  • “Visible to anyone in your organization” refers to documents that can be viewed by anyone at your company. For example, a Slack thread posted in a public channel or a Google Doc that can be searched and accessed by anyone at your company.
  • “Visible to anyone on the internet” refers to documents that can be searched and accessed by individuals outside your organization (e.g. a Google Doc that can be viewed by “Anyone on the internet with the link”).
  • “Visible to [N] people or more, internal or external to your organization” refers to documents that are accessible to at least N people. You cannot choose a value of N lower than 5, because documents accessible to four or fewer people generally present a lower risk, and scanning them may significantly increase processing time.
  • “Specific teammates” refers to documents that can be accessed by specific users mentioned in this field.
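The "any one of these conditions" rule above can be sketched in code. This is a minimal illustration, not Glean's implementation: the `Document` and `PermissionScope` classes and the `in_scope` function are hypothetical names, and treating "Specific teammates" as "at least one listed teammate has access" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

MIN_ALLOWED_N = 5  # the text above says values lower than 5 cannot be chosen


@dataclass
class Document:
    """Hypothetical view of a document's sharing state (not a Glean type)."""
    visible_to_org: bool = False
    visible_to_internet: bool = False
    viewers: Set[str] = field(default_factory=set)  # everyone with access


@dataclass
class PermissionScope:
    """Hypothetical model of the four permission conditions."""
    anyone_in_org: bool = False
    anyone_on_internet: bool = False
    min_viewers: Optional[int] = None  # the N in "visible to N people or more"
    specific_teammates: Set[str] = field(default_factory=set)


def in_scope(doc: Document, scope: PermissionScope) -> bool:
    """Return True if the document meets at least one enabled condition."""
    if scope.min_viewers is not None and scope.min_viewers < MIN_ALLOWED_N:
        raise ValueError(f"N must be at least {MIN_ALLOWED_N}")
    return any([
        scope.anyone_in_org and doc.visible_to_org,
        scope.anyone_on_internet and doc.visible_to_internet,
        scope.min_viewers is not None and len(doc.viewers) >= scope.min_viewers,
        # Assumption: one listed teammate with access is enough to match.
        bool(scope.specific_teammates & doc.viewers),
    ])
```

Because the conditions are combined with OR, enabling more of them widens the set of scanned documents rather than narrowing it.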

Sensitive content

The sensitive content you define determines what types of information Glean’s sensitive insights feature will detect. You can specify sensitive data in multiple ways:
  • Defining specific info types, or by selecting them from a recommended list. Info types include things like credit card numbers, date of birth, SSN, and more. For more information, see Supported info types.
  • Entering custom terms, which Glean will attempt to find matches for. Terms are specific words or phrases that match important company information, like employee IDs or job titles.
  • Defining rules using regular expressions to match specific data formats or keywords. Regular expressions help you find custom types of sensitive information that follow a flexible format, like record numbers or user IDs. Glean uses RE2 syntax for these expressions.
This configurable approach allows you to create policies that accurately identify a broad range of sensitive content, supporting any compliance and organizational needs you may have. You can also adjust your policy to exclude content that is not sensitive, but may otherwise turn up in your findings. For example, if you set a policy to report email addresses as sensitive, you may wish to exclude a sample user ([email protected]).
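As a rough illustration of the exclusion idea described above, the sketch below finds email addresses with a regular expression and then drops any that appear on an exclusion list. It is not Glean's matching engine: the pattern, the `find_emails` function, and the excluded address `sample.user@example.com` are all hypothetical, and case-insensitive comparison of exclusions is an assumption. Python's `re` module is not RE2, but the pattern uses only constructs that both support (no backreferences or lookaround).

```python
import re

# Simple email pattern using only constructs that RE2 also supports.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical exclusion list: known sample values that are not sensitive.
EXCLUDED = {"sample.user@example.com"}


def find_emails(text, excluded=EXCLUDED):
    """Return email matches in text, skipping excluded values."""
    # Assumption: exclusions are compared case-insensitively.
    excluded_lower = {value.lower() for value in excluded}
    return [
        match
        for match in EMAIL_PATTERN.findall(text)
        if match.lower() not in excluded_lower
    ]


text = "Contact jane.doe@example.com or sample.user@example.com."
print(find_emails(text))  # ['jane.doe@example.com']
```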

Frequency

All policies run on a recurring schedule. You can adjust how often policies run depending on your needs. For the highest priority sensitive content, you will likely want to set a continuous frequency to ensure that any findings are identified and addressed promptly. For lower priority sensitive content, you may wish to run on a weekly basis.
Because certain events and data sources do not feed into continuous scanning, a periodic (weekly) scan also runs for the same continuous policy to pick up any documents that the continuous scan missed.

Create a policy

Create a policy to start generating findings.
Prerequisites: You must be a super admin or have the sensitive content moderator role enabled to create and view policies. If you’re running Glean on AWS and want to generate infotype policies, your organization must have configured AWS for infotype scanning with GCP.
  1. Navigate to the Glean Admin console > Glean Protect > Sensitive findings page, then select the policies tab.
  2. Select the Create policy button to start creating your policy. You can create a policy either from scratch or from a template.
  3. Define your policy’s scope:
    1. Choose a data source or scope your policy to all data sources in Glean
    2. Define a time period your policy will apply to
    3. Select the permissions or the viewership of the documents (e.g., anyone in the organization, anyone on the internet, or specific teammates)
  4. Define sensitive content:
    1. InfoType: Specify the infoType of the data that needs to be scanned. For more information, see Supported info types.
    2. Terms: Define a Primary term to flag as a sensitive finding. You can also define Related terms to improve precision.

      Primary term (Required)

      This is the main term the system scans for.
      Field | Details
      Primary term | Must be alphanumeric. Terms are matched case-insensitively. All non-alphanumeric characters (like commas, hyphens, and parentheses) are treated as spaces during the scan.
      You can add additional sets of related terms to refine your matches and reduce false positives. The primary term is flagged as sensitive only after all the defined related term conditions are met.
      Component | Description | Evaluation logic
      Term set (e.g., loan, savings) | A list of related words separated by commas. | OR relationship. Only the presence of one term from the set is required.
      Term condition (two or more sets) | Multiple term sets evaluated together. | AND relationship. The presence of a match from each defined set is required for the primary term to be flagged.

      Example: Primary term

      If your policy defines the Primary term as Currency and adds the following two Related term sets:
      • Related term set 1: loan, savings
      • Related term set 2: USD, dollar
      The primary term Currency is flagged as sensitive only if it is present AND the document text satisfies the following logic: ((loan OR savings) AND (USD OR dollar)).

      Limits

      • A maximum of 25 related terms can be defined per primary term.
      • A maximum of 1,000 related terms can be used per report.
    3. Define regular expressions for your policy.
      1. Specify a name or an ID that can identify your regex. You can filter your findings report using this name or ID.
      2. Specify a regex that follows RE2 syntax.
      3. You can also define Related terms to improve the precision.
      Related terms for a regex use the same logic and limits as related terms for a primary term. The regex itself is required, and the system evaluates contextual rules using the same AND/OR relationships defined for related term sets. A regex match is flagged as sensitive only after all the defined related term conditions are met.

      Example: Regex

      In this example, the regex is named Q3-Q4-Report, and the system flags content only if both of the following criteria are met:
      • The custom 8-digit pattern (\d{8}) is found, for example, EmpID: 45678901
      • AND the context satisfies the following Boolean logic: (budget OR forecasting OR finance) AND (Q3 OR Q4)
  5. Excluded from search: Specify terms that this policy should not flag. These terms are excluded from the sensitive findings search.
  6. Enforcement: Define the enforcement policy to determine the default action the system takes when a document is flagged by this policy.
    Action | Description | Visibility outcome
    Keep Visible | The document remains fully visible on Glean. | The document is visible to all users who have the original permissions to view it.
    Hide from All | The document is hidden from all users across Glean. | No user, regardless of original permissions, will see the document in Glean’s search results or Assistant responses.
  7. Frequency: Set frequency for this policy to scan documents for sensitive content.
  8. Click the Create policy button to save your policy.
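
The term and regex rules from step 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not Glean's scanner: the function names are hypothetical, normalization is limited to ASCII letters and digits, terms are matched as whole words or phrases, and Python's `re` module stands in for RE2.

```python
import re


def normalize(text):
    """Lowercase and treat every non-alphanumeric character as a space."""
    # Assumption: only ASCII letters and digits count as alphanumeric.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def contains_term(normalized_text, term):
    """Whole-word (or whole-phrase) match against normalized text."""
    needle = normalize(term)
    return bool(needle) and f" {needle} " in f" {normalized_text} "


def related_sets_satisfied(normalized_text, related_sets):
    """AND across term sets, OR within each set."""
    return all(
        any(contains_term(normalized_text, term) for term in term_set)
        for term_set in related_sets
    )


def primary_term_flagged(text, primary_term, related_sets=()):
    """Flag only if the primary term and every related set match."""
    normalized = normalize(text)
    return contains_term(normalized, primary_term) and related_sets_satisfied(
        normalized, related_sets
    )


def regex_flagged(text, pattern, related_sets=()):
    """Flag only if the regex matches and every related set matches."""
    return re.search(pattern, text) is not None and related_sets_satisfied(
        normalize(text), related_sets
    )


# The Currency example: (loan OR savings) AND (USD OR dollar).
currency_sets = [["loan", "savings"], ["USD", "dollar"]]
print(primary_term_flagged("Currency: loan balance in USD", "Currency", currency_sets))  # True
print(primary_term_flagged("Currency conversion to USD", "Currency", currency_sets))  # False

# The regex example: \d{8} AND (budget OR forecasting OR finance) AND (Q3 OR Q4).
report_sets = [["budget", "forecasting", "finance"], ["Q3", "Q4"]]
print(regex_flagged("EmpID: 45678901 appears in the Q3 budget", r"\d{8}", report_sets))  # True
```

With no related term sets defined, both functions reduce to a plain match on the primary term or regex.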

Archive a policy

Active policies run on a recurring basis. When you no longer need a policy, you can archive it. Archived policies no longer scan and no longer generate findings. Any content that the policy had hidden becomes visible again. To archive a policy:
  1. Navigate to the Glean Admin console > Glean Protect > Sensitive findings page, then select the policies tab.
  2. Select the menu icon on the right side of the policy you wish to archive.
  3. Select the Archive option.

Restore a policy

You can restore an archived policy at any time. Once archived policies are restored, they will begin generating findings in accordance with their frequency. To restore a policy:
  1. Navigate to the Glean Admin console > Glean Protect > Sensitive findings page, then select the policies tab.
  2. Select the Restore button for the policy you wish to restore.
Note: Only the following data sources are supported for continuous policies: Aha, Airtable, Asana, Bitbucket, Box, Confluence, Egnyte, Google Chat, Google Drive, GitLab, GitHub, Google Groups, Google Sites, Greenhouse, Guru, Jira, Lessonly, Lever, Miro, Microsoft Teams, O365 OneDrive, O365 SharePoint, PagerDuty, Quip, Slack, Seismic, Trello, WordPress, and Zendesk.
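
If you keep policy definitions in your own tooling, a small check like the one below can catch data sources that the list above does not cover before you choose a continuous frequency. This is a hypothetical helper, not part of Glean: the function name and the `ExampleWiki` source are illustrative, and the name comparison is case-insensitive by assumption.

```python
# Data sources listed above as supported for continuous policies.
CONTINUOUS_SOURCES = {
    "Aha", "Airtable", "Asana", "Bitbucket", "Box", "Confluence", "Egnyte",
    "Google Chat", "Google Drive", "GitLab", "GitHub", "Google Groups",
    "Google Sites", "Greenhouse", "Guru", "Jira", "Lessonly", "Lever", "Miro",
    "Microsoft Teams", "O365 OneDrive", "O365 SharePoint", "PagerDuty", "Quip",
    "Slack", "Seismic", "Trello", "WordPress", "Zendesk",
}


def unsupported_for_continuous(selected_sources):
    """Return the selected sources that are not in the continuous list."""
    supported = {name.lower() for name in CONTINUOUS_SOURCES}
    return sorted(
        name for name in selected_sources if name.lower() not in supported
    )


# "ExampleWiki" is a made-up source used only to show the unsupported case.
print(unsupported_for_continuous(["Slack", "Jira", "ExampleWiki"]))  # ['ExampleWiki']
```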

Supported info types

See Supported info types.
Last updated: November 2025