A policy is a configurable set of rules that defines how sensitive content is detected, monitored, and managed within your organization’s data ecosystem. Policies specify criteria such as:
The types of information to detect (e.g., predefined info types, custom terms, regular expressions)
The scope of data sources
How often the policy runs (frequency)
Exclusions for content that should not be reported
Each policy lets administrators establish and automate protection measures tailored to organizational needs, helping ensure compliance with data protection requirements and reducing the risk of unauthorized data exposure. You can manage policies from the Sensitive findings dashboard, which provides tools for creating policies, enforcing them, and continuously monitoring violations.
Note: Policies differ from Reports. Policies are scheduled, ongoing scans with in-product triage in the Sensitive findings dashboard. Reports are ad-hoc, one-time CSV exports for offline review.
Your policy’s scope determines what information Glean will review. The scope can be configured using the following criteria:
Data sources: Specify whether the policy applies to all data sources in your organization, or only to selected repositories or platforms.
Time period: Choose the range of document activity (such as when a document was viewed, created, or modified) that the policy will review.
Permissions: Set parameters for which documents are included based on user or group access levels (e.g., documents visible to all users, specific roles, or external collaborators).
Narrow down the scope of documents to scan based on how broadly they are shared. If any one of these conditions is met, we include the document in our sensitive content search.
“Visible to anyone in your organization” refers to documents that can be viewed by anyone at your company. For example, a Slack thread posted in a public channel or a Google Doc that can be searched and accessed by anyone at your company.
“Visible to anyone on the internet” refers to documents that can be searched and accessed by individuals outside your organization (e.g., a Google Doc that can be viewed by “Anyone on the internet with the link”).
“Visible to [N] people or more, internal or external to your organization” refers to documents that are accessible to at least N people. You can't choose a number lower than 5, because documents accessible to four or fewer people generally present a lower risk and including them could significantly increase processing time.
“Specific teammates” refers to documents that can be accessed by specific users mentioned in this field.
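To make the scope rules concrete, here is a minimal Python sketch of how the data source, time period, and permission criteria might combine. It is not Glean's implementation or API: the Document and Scope classes, their field names, and the rule that selecting no sharing condition applies no permission filter are all hypothetical assumptions. It only models two documented behaviors: a document is included when any one sharing condition is met, and the people threshold cannot be set below 5.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MIN_AUDIENCE = 5  # documented lower bound for "Visible to [N] people or more"


@dataclass
class Document:
    # Hypothetical document metadata; not a Glean data model.
    source: str
    last_activity: datetime
    visible_to_org: bool = False
    visible_to_internet: bool = False
    audience_size: int = 0
    accessible_users: frozenset = frozenset()


@dataclass
class Scope:
    # Hypothetical policy scope settings.
    sources: Optional[set] = None        # None means "all data sources"
    since: Optional[datetime] = None     # start of the reviewed time period
    include_org_visible: bool = False
    include_internet_visible: bool = False
    min_audience: Optional[int] = None
    teammates: frozenset = frozenset()

    def __post_init__(self):
        # Reject thresholds below the documented minimum of 5 people.
        if self.min_audience is not None and self.min_audience < MIN_AUDIENCE:
            raise ValueError(f"min_audience must be at least {MIN_AUDIENCE}")


def permission_match(doc: Document, scope: Scope) -> bool:
    selected = (scope.include_org_visible or scope.include_internet_visible
                or scope.min_audience is not None or bool(scope.teammates))
    if not selected:
        return True  # assumption: no sharing condition means no permission filter
    # Include the document if ANY selected sharing condition holds.
    return ((scope.include_org_visible and doc.visible_to_org)
            or (scope.include_internet_visible and doc.visible_to_internet)
            or (scope.min_audience is not None
                and doc.audience_size >= scope.min_audience)
            or bool(scope.teammates & doc.accessible_users))


def in_scope(doc: Document, scope: Scope) -> bool:
    if scope.sources is not None and doc.source not in scope.sources:
        return False  # outside the selected data sources
    if scope.since is not None and doc.last_activity < scope.since:
        return False  # outside the selected time period
    return permission_match(doc, scope)


now = datetime(2024, 6, 1)
scope = Scope(sources={"gdrive", "slack"}, since=now - timedelta(days=30),
              include_internet_visible=True, min_audience=10)
doc = Document("gdrive", now - timedelta(days=3), audience_size=12)
print(in_scope(doc, scope))  # True: shared with 12 people, threshold is 10
```

Validating the threshold when the scope is built, rather than at scan time, mirrors how the product prevents you from choosing a number that is too low.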
The sensitive content you define determines what types of information Glean’s sensitive insights feature will detect. You can specify sensitive data in multiple ways:
Selecting info types from a recommended list, or defining specific ones. Info types include things like credit card numbers, dates of birth, Social Security numbers (SSNs), and more.
Entering custom terms, which Glean attempts to match. Terms are specific words or phrases tied to important company information, such as employee IDs or job titles.
Defining rules using regular expressions to match specific data formats or keywords. Regular expressions help you find custom types of sensitive information that follow a flexible format, like record numbers or user IDs. We use RE2 syntax for these expressions.
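As an illustration of a custom regular expression, the sketch below matches a hypothetical employee ID format ("EMP-" followed by exactly six digits). It uses Python's standard re module, which is not RE2. The pattern sticks to constructs both engines accept (RE2 rejects backreferences and lookaround), so it should behave the same in a policy, but test your own patterns against RE2 before relying on them.

```python
import re

# Hypothetical format: "EMP-" followed by exactly six ASCII digits.
# [0-9] is used instead of \d because Python's \d also matches non-ASCII
# digits, while RE2's \d is ASCII-only. No backreferences or lookaround.
EMPLOYEE_ID = re.compile(r"\bEMP-[0-9]{6}\b")


def find_employee_ids(text: str) -> list[str]:
    """Return every substring that matches the hypothetical ID format."""
    return EMPLOYEE_ID.findall(text)


sample = "Escalate EMP-004217 and EMP-99; ticket EMP-1234567 is unrelated."
print(find_employee_ids(sample))  # ['EMP-004217']
```

The trailing \b is what keeps "EMP-1234567" from matching on its first six digits: without it, the pattern would report a false partial match.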
This configurable approach lets you create policies that accurately identify a broad range of sensitive content, supporting your compliance and organizational needs.
You can also adjust your policy to exclude content that is not sensitive but may otherwise turn up in your findings. For example, if a policy reports email addresses as sensitive, you may wish to exclude a sample user's address (sample-user@example.com).
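One way to picture exclusions is as a filter applied to matches after detection. This Python sketch assumes exact, case-insensitive matching against an exclusion list; the EMAIL pattern, the EXCLUSIONS set, and the findings function are hypothetical and much simpler than a production email detector.

```python
import re

# Hypothetical, simplified email detector (not Glean's info type).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical exclusion list, stored lowercase for case-insensitive lookup.
EXCLUSIONS = {"sample-user@example.com"}


def findings(text: str, exclusions: set[str] = EXCLUSIONS) -> list[str]:
    """Detect emails, then drop any that appear on the exclusion list."""
    return [m for m in EMAIL.findall(text) if m.lower() not in exclusions]


text = "Contact sample-user@example.com or jane.doe@corp.example.org."
print(findings(text))  # ['jane.doe@corp.example.org']
```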
All policies run on a recurring schedule, and you can adjust how often each policy runs. For the highest-priority sensitive content, you will likely want a continuous frequency so that findings are identified and addressed promptly. For lower-priority sensitive content, a weekly schedule may be enough.
Because certain events and data sources do not feed into continuous scanning, each continuous policy also runs a periodic (weekly) scan that picks up any in-scope documents the continuous scan missed.
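The interplay between the continuous scan and the weekly catch-up scan can be sketched as set arithmetic. In this hypothetical Python model, ChangeEvent, the lowercase source names, and the CONTINUOUS_SOURCES subset are illustrative assumptions, not Glean internals: documents whose change events come from continuous-capable sources are scanned right away, and the weekly sweep covers every in-scope document that was missed.

```python
from dataclasses import dataclass

# Hypothetical subset of continuous-capable sources, for illustration only.
CONTINUOUS_SOURCES = {"slack", "gdrive", "confluence", "jira"}


@dataclass(frozen=True)
class ChangeEvent:
    # Hypothetical change notification for a document.
    doc_id: str
    source: str


def continuous_pass(events: list[ChangeEvent]) -> set[str]:
    """Scan only documents whose events arrive from continuous-capable sources."""
    return {e.doc_id for e in events if e.source in CONTINUOUS_SOURCES}


def weekly_sweep(in_scope_ids: set[str], covered: set[str]) -> set[str]:
    """Pick up every in-scope document the continuous pass did not cover."""
    return in_scope_ids - covered


events = [ChangeEvent("d1", "slack"), ChangeEvent("d2", "custom_connector")]
covered = continuous_pass(events)
# d2 came from an unsupported source; d3 produced no event at all.
print(sorted(weekly_sweep({"d1", "d2", "d3"}, covered)))  # ['d2', 'd3']
```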
You must be a super admin or have the sensitive content moderator role enabled to create and view policies. If you’re running Glean on AWS and want to generate infotype policies, your organization must have configured AWS for infotype scanning with GCP.
Active policies run on a recurring basis. When you no longer need a policy, you can archive it. Archived policies stop scanning and no longer generate findings, and any content the policy had hidden becomes available again.
To archive a policy:
You can restore an archived policy at any time. A restored policy resumes generating findings according to its frequency.
To restore a policy:
Select the Restore button for the policy you wish to restore.
Only the following data sources are supported for continuous policies: Aha, Airtable, Asana, Bitbucket, Box, Confluence, Egnyte, Google Chat, Google Drive, GitLab, GitHub, Google Groups, Google Sites, Greenhouse, Guru, Jira, Lessonly, Lever, Miro, Microsoft Teams, O365 OneDrive, O365 SharePoint, PagerDuty, Quip, Slack, Seismic, Trello, WordPress, and Zendesk.