Target the needle in the haystack
Glean’s sensitive policies support over 100 predefined Infotypes, along with custom term and regex-based matching. The goal is to define each policy with precision, targeting the exact identifiers or patterns that reflect truly sensitive data. These policies are most effective when you know the specific “needle in the haystack” you’re looking for. For example, using a broad term like “salary” may generate a high volume of false positives, making it difficult to isolate real risks. Precision helps ensure policies are actionable, accurate, and low-noise, a hallmark of effective DLP.
- Employee data: Use Infotypes to detect personally identifiable information (PII), such as national ID numbers, taxpayer IDs, or social security numbers across different countries.
- Passwords and authentication tokens: Leverage Infotypes that match common token patterns (e.g., OAuth, JWT, API keys) to reduce accidental exposure of credentials in places like docs, chat messages, or tickets.
- Customer data: Use custom regex patterns to match structured customer identifiers (e.g., customer IDs, account numbers). Pair with Infotypes to detect associated emails, phone numbers, or payment information.
- Financial data: Use Infotypes for financial terms, bank account numbers, and payment card details (e.g., PCI data). When applicable, supplement with regexes for internal codes or ledger IDs.
- Reduction in force (RIF) planning: Use a combination of project codenames (as terms), personal identifiers (Infotypes), and regexes for equity grant IDs or other sensitive HR references.
- Top secret projects: Use specific project names or codewords as custom terms. These are often referenced informally across less-controlled environments and can indicate early leakage points.

Build policies using real examples
The best way to craft effective policies is to start with a known sensitive document:
- What specific identifiers appear?
- Are there named projects, IDs, or structured patterns?
- Can you generalize those patterns into Infotypes, terms, or regexes?
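One way to work through these questions before creating a policy is to test candidate terms and regexes locally against a known sensitive excerpt and a few benign samples. The sketch below is only an illustration: the sample text, the codename "Project Bluefin", and the `EQ-YYYY-NNNNN` grant ID format are all hypothetical, and it uses Python's `re` module, whose syntax may differ from the regex flavor Glean accepts.

```python
import re

# Hypothetical excerpt from a known sensitive document (all values are made up).
KNOWN_SENSITIVE = (
    "Project Bluefin headcount plan: grant EQ-2024-00417 for employee E104233, "
    "grant EQ-2024-00932 for employee E118745."
)

# Hypothetical benign text that a precise policy should not flag.
BENIGN_SAMPLES = [
    "The salary survey results are published on the HR wiki.",
    "Ticket EQ-12 tracks the equipment refresh.",
    "Bluefin tuna is on the cafeteria menu on Friday.",
]

# Candidate detectors, from broad to precise. The ID format is an assumption.
CANDIDATES = {
    "broad term 'salary'": re.compile(r"\bsalary\b", re.IGNORECASE),
    "term 'Project Bluefin'": re.compile(r"\bProject Bluefin\b"),
    "equity grant ID regex": re.compile(r"\bEQ-\d{4}-\d{5}\b"),
}


def evaluate(pattern, sensitive, benign):
    """Return (matches in the sensitive sample, number of benign samples flagged)."""
    true_hits = len(pattern.findall(sensitive))
    false_positives = sum(1 for text in benign if pattern.search(text))
    return true_hits, false_positives


results = {
    name: evaluate(pattern, KNOWN_SENSITIVE, BENIGN_SAMPLES)
    for name, pattern in CANDIDATES.items()
}

for name, (hits, fps) in results.items():
    print(f"{name}: {hits} match(es) in sensitive sample, {fps} benign sample(s) flagged")
```

On this made-up data, the broad term flags a benign sample and misses the sensitive excerpt entirely, while the codename and the grant ID regex match only the sensitive text. A real evaluation would need a larger and more representative set of samples.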