- The types of information to detect (e.g., predefined info types, custom terms, regular expressions)
- The scope of data sources
- Frequency
- Exclusions
Policy scope
Your policy’s scope determines what information Glean will review. The scope can be configured using the following criteria:
- Data sources: Specify whether the policy applies to all data sources in your organization, or only to selected repositories or platforms.
- Time period: Choose the range of document activity (such as when a document was viewed, created, or modified) that the policy will review.
- Permissions: Set parameters for which documents are included based on user or group access levels (e.g., documents visible to all users, specific roles, or external collaborators).
Permissions
Narrow down the scope of documents to scan based on how broadly each document is shared. If any one of these conditions is met, we will include that document in our sensitive content search.
- “Visible to anyone in your organization” refers to documents that can be viewed by anyone at your company. For example, a Slack thread posted in a public channel or a Google Doc that can be searched and accessed by anyone at your company.
- “Visible to anyone on the internet” refers to documents that can be searched and accessed by individuals outside your organization (e.g. a Google Doc that can be viewed by “Anyone on the internet with the link”).
- “Visible to [N] people or more, internal or external to your organization” refers to documents that have been made accessible to N or more people. We prevent you from choosing a number lower than 5, because documents accessible to four or fewer people generally present a lower risk and scanning them may significantly increase the processing time.
- “Specific teammates” refers to documents that can be accessed by specific users mentioned in this field.
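The permission conditions above combine with OR logic: a document is in scope as soon as one enabled condition matches. The following is a minimal sketch of that rule, not Glean's implementation. The `Document` and `PermissionScope` classes and their field names are hypothetical, and treating "Specific teammates" as "at least one listed teammate can access the document" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

MIN_AUDIENCE_THRESHOLD = 5  # the text above says values lower than 5 are not allowed


@dataclass
class Document:
    # Hypothetical document model, for illustration only.
    visible_to_org: bool = False
    visible_to_internet: bool = False
    viewers: Set[str] = field(default_factory=set)


@dataclass
class PermissionScope:
    # Hypothetical policy settings mirroring the four options above.
    anyone_in_org: bool = False
    anyone_on_internet: bool = False
    min_audience: Optional[int] = None
    specific_teammates: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Reject thresholds below the documented minimum of 5 people.
        if self.min_audience is not None and self.min_audience < MIN_AUDIENCE_THRESHOLD:
            raise ValueError(f"min_audience must be at least {MIN_AUDIENCE_THRESHOLD}")


def in_scope(doc: Document, scope: PermissionScope) -> bool:
    """Return True if any one enabled permission condition matches (OR logic)."""
    return any([
        scope.anyone_in_org and doc.visible_to_org,
        scope.anyone_on_internet and doc.visible_to_internet,
        scope.min_audience is not None and len(doc.viewers) >= scope.min_audience,
        # Assumption: one listed teammate with access is enough.
        bool(scope.specific_teammates & doc.viewers),
    ])
```

For example, a document shared with six people is in scope for a policy with `min_audience=5`, even if none of the other conditions are enabled.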
Sensitive content
The sensitive content you define determines what types of information Glean’s sensitive insights feature will detect. You can specify sensitive data in multiple ways:
- Defining specific info types, or selecting them from a recommended list. Info types include things like credit card numbers, date of birth, SSN, and more. For more information, see Supported info types.
- Entering custom terms, which Glean will attempt to find matches for. Terms are specific words or phrases that match important company information, like employee IDs or job titles.
- Defining rules using regular expressions to match specific data formats or keywords. Regular expressions help you find custom types of sensitive information that follow a flexible format, like record numbers or user IDs. We use the RE2 syntax for these expressions.
Frequency
All policies run on a recurring schedule. You can adjust how often policies run depending on your needs. For the highest priority sensitive content, you will likely want to set a continuous frequency to ensure that any findings are identified and addressed promptly. For lower priority sensitive content, you may wish to run on a weekly basis.
Because certain events and data sources do not feed into continuous report scanning, a periodic (weekly) scan also runs on the same continuous report to pick up any documents that the continuous scan did not pick up.
Create a policy
Create a policy to start generating findings.
Prerequisites
You must be a super admin or have the sensitive content moderator role enabled to create and view policies. If you’re running Glean on AWS and want to generate infotype policies, your organization must have configured AWS for infotype scanning with GCP.
- Navigate to the Glean Admin console > Glean Protect > Sensitive findings page, then select the policies tab.
- Select the Create policy button to start creating your policy. You can create a policy either from scratch or from a template.
- Define your policy’s scope:
  - Choose a data source, or scope your policy to all data sources in Glean.
  - Define the time period your policy will apply to.
  - Select the permissions, or viewership, of the documents (e.g., anyone in the organization, anyone on the internet, or specific teammates).
- Define sensitive content:
  - InfoType: Specify the infoType of the data that needs to be scanned. For more information, see Supported info types.
  - Terms: Define a Primary term that needs to be flagged as a sensitive finding. You can also define Related terms to improve the precision.
Primary term (Required)
This is the main term the system scans for.

| Field | Details |
| --- | --- |
| Primary Term | Must be alphanumeric. Terms are matched case-insensitively. All non-alphanumeric characters (like commas, hyphens, and parentheses) are treated as spaces during the scan. |

Related terms for precision
You can add additional sets of related terms to refine your matches and reduce false positives. The primary term is flagged as sensitive only after all the defined related term conditions are met.

| Component | Description | Evaluation logic |
| --- | --- | --- |
| Term set (e.g., loan, savings) | A list of related words separated by commas. | OR relationship. Only the presence of one term from the set is required. |
| Term condition (two or more sets) | Multiple term sets evaluated together. | AND relationship. The presence of a match from each defined set is required for the primary term to be flagged. |

Example: Primary term
If your policy defines the Primary term as Currency and adds the following two Related term sets:
- Related term set 1: loan, savings
- Related term set 2: USD, dollar

then the primary term is flagged as sensitive only if Currency is present AND the document text satisfies the following logic: ((loan OR savings) AND (USD OR dollar)).
Limits
- Maximum 25 related terms can be defined per primary term.
- Maximum 1,000 related terms can be used per report.
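The matching rules described for terms (case-insensitive matching, non-alphanumeric characters treated as spaces, OR within a related term set, AND across sets) can be sketched as follows. This is an illustrative approximation, not Glean's matcher: the function names are hypothetical, whole-word matching is an assumption, and "alphanumeric" is assumed to mean ASCII letters and digits.

```python
import re
from typing import List

MAX_RELATED_TERMS_PER_PRIMARY = 25  # limit stated above


def normalize(text: str) -> str:
    """Lowercase the text and turn every non-alphanumeric run into one space."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", text.lower())
    # Pad with spaces so whole-word checks also work at the edges.
    return f" {cleaned.strip()} "


def contains_term(normalized_text: str, term: str) -> bool:
    """Assumption: a term matches only as a whole word or whole phrase."""
    return normalize(term) in normalized_text


def is_flagged(text: str, primary: str, related_sets: List[List[str]]) -> bool:
    """Primary term must be present AND every related set needs one match."""
    total_related = sum(len(term_set) for term_set in related_sets)
    if total_related > MAX_RELATED_TERMS_PER_PRIMARY:
        raise ValueError("too many related terms for one primary term")

    normalized = normalize(text)
    if not contains_term(normalized, primary):
        return False
    # OR inside each set (any), AND across the sets (all).
    return all(
        any(contains_term(normalized, term) for term in term_set)
        for term_set in related_sets
    )


related = [["loan", "savings"], ["USD", "dollar"]]
print(is_flagged("Currency: savings account (USD)", "Currency", related))  # True
print(is_flagged("Currency of the loan is EUR", "Currency", related))      # False
```

The second call returns False because the text matches the first related set (loan) but nothing in the second set (USD, dollar).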
- Define regular expressions for your policy.
  - Specify a name or an ID that identifies your regex. You can filter your findings report using this name or ID.
  - Specify a regex that follows RE2 syntax.
  - You can also define Related terms to improve the precision.
Related terms for a regex use the same logic and limits as related terms for a primary term. The system requires a regex match and evaluates the related terms using the same AND/OR relationships defined for related term sets. The regex match is flagged as sensitive only after all the defined related term conditions are met.
Example: Regex
In this example, the system flags the content if the following criteria are met:
- Name is Q3-Q4-Report
- The custom 8-digit pattern (\d{8}) is found, for example, EmpID: 45678901
- AND the context satisfies the following Boolean logic: (budget OR forecasting OR finance) AND (Q3 OR Q4)
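The regex example above can be approximated with the short sketch below. It uses Python's `re` module as a stand-in for an RE2 engine, which is fine for a simple pattern like `\d{8}` but is not how Glean evaluates policies. The function names are hypothetical, and the sketch assumes single-word related terms matched as whole words.

```python
import re
from typing import List, Set

# Pattern from the example above. Note that it also matches any 8 consecutive
# digits inside a longer digit run.
EMP_ID_PATTERN = r"\d{8}"


def words_of(text: str) -> Set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def regex_is_flagged(text: str, pattern: str, related_sets: List[List[str]]) -> bool:
    """The regex must match AND every related set needs at least one word present."""
    if re.search(pattern, text) is None:
        return False
    words = words_of(text)
    # OR inside each set (any), AND across the sets (all).
    return all(
        any(term.lower() in words for term in term_set)
        for term_set in related_sets
    )


context = [["budget", "forecasting", "finance"], ["Q3", "Q4"]]
print(regex_is_flagged("EmpID: 45678901 in the Q3 budget review", EMP_ID_PATTERN, context))  # True
print(regex_is_flagged("EmpID: 45678901 in the Q3 roadmap", EMP_ID_PATTERN, context))        # False
print(regex_is_flagged("Q3 budget review", EMP_ID_PATTERN, context))                         # False
```

The second call fails the first related set, and the third has no 8-digit match, so neither is flagged.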

- Excluded from search: Specify terms that should not be flagged by this policy and be excluded from the sensitive findings search.
- Enforcement: Define the enforcement policy to determine the default action the system takes when a document is flagged by this policy.

| Action | Description | Visibility outcome |
| --- | --- | --- |
| Keep Visible | The document remains fully visible on Glean. | The document is visible to all users who have the original permissions to view it. |
| Hide from All | The document is hidden from all users across Glean. | No user, regardless of original permissions, will see the document in Glean’s search results or Assistant responses. |

- Frequency: Set how often this policy scans documents for sensitive content.
- Click the Create policy button to save your policy.
Archive a policy
Active policies run on a recurring basis. When you no longer need a policy, you can archive it. Archived policies no longer scan and no longer generate findings. Any content that the policy has hidden will be made available again. To archive a policy:
- Navigate to the Glean Admin console > Glean Protect > Sensitive findings page, then select the policies tab.
- Select the menu icon on the right side of the policy you wish to archive.
- Select the Archive option.
Restore a policy
You can restore an archived policy at any time. Once archived policies are restored, they will begin generating findings in accordance with their frequency. To restore a policy:
- Navigate to the Glean Admin console > Glean Protect > Sensitive findings page, then select the policies tab.
- Select the Restore button for the policy you wish to restore.
Note that only the following data sources are supported for continuous policies: Aha, Airtable, Asana, Bitbucket, Box, Confluence, Egnyte, Google Chat, Google Drive, GitLab, GitHub, Google Groups, Google Sites, Greenhouse, Guru, Jira, Lessonly, Lever, Miro, Microsoft Teams, O365 OneDrive, O365 SharePoint, PagerDuty, Quip, Slack, Seismic, Trello, WordPress, and Zendesk.
Supported info types
See Supported info types.
Last updated: November 2025