Glean Language Support
Use Glean in your preferred language.
Capabilities
- Search – Results are drawn from the documents written in the same language as the query.
- Chat – Responses match the language of the query and are typically based on documents written in the same language as the query.
  - This means Glean can only answer questions from documents in the same language as the query. For example, answering an English query that needs knowledge from a Spanish document is not generally available.
  - Initial support for this cross-language case (asking a question in language X that requires knowledge from language Y) is in early access (🟦) for bilingual corpora.
- Summarization – Summaries are provided in the user interface language, regardless of the source document’s language.
WARNING: Because our LLM engines are multilingual, casual testing may make it appear that Assistant understands languages not enumerated below. That is very different from our end-to-end core technology actually functioning in that language, so please do not use it to infer that Glean supports the language!
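The same-language behavior described for Search can be pictured as a filter applied before ranking. The sketch below is purely illustrative and is not Glean's implementation: the `Doc` type, the `lang` field, and the `same_language_candidates` helper are hypothetical names, and the query language is assumed to have been detected already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    title: str
    lang: str  # language code, assumed to be known at indexing time


def same_language_candidates(docs, query_lang):
    """Keep only documents whose language matches the query language."""
    return [d for d in docs if d.lang == query_lang]


docs = [
    Doc("Expense policy", "en"),
    Doc("Política de gastos", "es"),
    Doc("Onboarding guide", "en"),
]

# An English query only sees the English documents.
titles = [d.title for d in same_language_candidates(docs, "en")]
print(titles)  # ['Expense policy', 'Onboarding guide']
```

Under this picture, an English query that needs the Spanish document finds nothing to draw on, which is the limitation the cross-language early-access work addresses.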
Support matrix
✅ Generally available
🟦 Early access; we welcome design partners to help battle-test it!
Language | Keyword Search | Semantic Search | Assistant | UI |
---|---|---|---|---|
English | ✅ | ✅ | ✅ | ✅ |
German | ✅ | ✅ | ✅ | ✅ |
Japanese | ✅ | ✅ | ✅ | ✅ |
French | ✅ | 🟦 | 🟦 | ✅ |
Spanish | ✅ | 🟦 | 🟦 | ✅ |
Dutch | ✅ | 🟦 | ✅ | |
Italian | ✅ | 🟦 | ✅ | |
Chinese (Simplified) | 🟦 | 🟦 | ✅ | |
Chinese (Traditional) | 🟦 | 🟦 | ✅ | |
Korean | 🟦 | 🟦 | ✅ | |
Portuguese | 🟦 | 🟦 | ✅ | |
Turkish | 🟦 | 🟦 | ||
Greek | ✅ | ✅ | ||
Hungarian | ✅ | ✅ | ||
Croatian | 🟦 | ✅ | ||
Czech | 🟦 | ✅ | ||
Slovak | 🟦 | ✅ | ||
Albanian | 🟦 | |||
Arabic | 🟦 | |||
Bengali | 🟦 | |||
Bulgarian | 🟦 | |||
Danish | 🟦 | |||
Finnish | 🟦 | |||
Hindi | 🟦 | |||
Indonesian | 🟦 | |||
Macedonian | 🟦 | |||
Norwegian | 🟦 | |||
Polish | 🟦 | |||
Romanian | 🟦 | |||
Russian | 🟦 | |||
Swedish | 🟦 | |||
Tamil | 🟦 | |||
Telugu | 🟦 | |||
Ukrainian | 🟦 | |||
Glossary
- Keyword Search – The syntax and grammatical structure of the language are understood by the search stack. Search is functional.
  - Language detection – The language of the query is understood.
  - Segmentation – The boundaries between words are understood.
  - Stemming – Concepts such as plurals and verb tenses are understood.
  - Stop words – Common words such as articles (e.g. a, the) and prepositions (e.g. of, from, in) are ignored.
- Semantic Search – The semantics of the language, as used in the particular enterprise context, are understood. Search is stronger.
  - Frequency-based term weights – The system understands the relative frequency of all terms (not just stop words) and weighs them appropriately when constructing a result set.
  - Domain-Adapted Vector Search – A fine-tuned embedding model is used within the larger hybrid search system.
  - Acronyms – Corpus-specific acronyms are automatically mined.
  - Synonyms – Corpus-specific synonyms are automatically mined.
- Assistant – Glean Chat has been optimized for the language, and in-context learning examples have been provided in the language. Because Assistant relies on Search through RAG, quality depends on how much of the first two columns (Keyword Search and Semantic Search) is complete for a given language: Keyword Search is a strict requirement, and Semantic Search further improves quality.
- User Interface (UI) – All end-user-facing product surfaces are localized into the given language / region. Note that external help documentation and admin workspace setup are not yet localized.
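To make the Keyword Search terms above concrete, here is a minimal, English-only sketch of segmentation, stop-word removal, and stemming, followed by one common form of frequency-based term weighting (inverse document frequency). Everything here is an assumption for illustration: the function names are hypothetical, the stemmer is a deliberately naive suffix stripper, and the source does not say which algorithms or weighting scheme Glean uses.

```python
import math
import re
from collections import Counter

# Stop words taken from the glossary's examples; a real list is much longer.
STOP_WORDS = {"a", "the", "of", "from", "in"}


def segment(text):
    """Segmentation: split lowercase text into runs of letters (English only)."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Stemming: strip one common suffix; far cruder than a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Segment, drop stop words, then stem what remains."""
    return [stem(t) for t in segment(text) if t not in STOP_WORDS]


def idf_weights(docs):
    """Weight each term by how rare it is across documents (smoothed IDF)."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(normalize(doc)))
    return {term: math.log(n / count) + 1.0 for term, count in doc_freq.items()}


corpus = [
    "The reports from the finance team",
    "Reported invoices in the report",
    "Travel policy",
]

print(normalize(corpus[0]))  # ['report', 'finance', 'team']
weights = idf_weights(corpus)
# "report" appears in two documents, "finance" in one, so "finance" weighs more.
print(weights["finance"] > weights["report"])  # True
```

A regex over `[a-z]+` works only because English separates words with spaces and punctuation. Languages written without spaces need a dedicated segmenter, which is one reason each language has to be supported individually.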