Capabilities
- Search: Results are drawn only from documents written in the same language as the query.
- Chat: Responses match the language of the query and are typically based on documents written in the same language as the query.
  - This means Glean can generally answer questions only from documents in the query's language. For example, answering an English query that needs knowledge from a Spanish document is not generally available (GA).
  - Initial support for this cross-language case (asking a question in language X that requires knowledge from language Y) is available as early access (EA) for corpora that span two languages.
- Summarization: Summaries are provided in the user interface language, regardless of the source document's language.
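The same-language behavior described for Search can be pictured as a filter applied before ranking. The sketch below is illustrative only and is not Glean's implementation. The `Document` type, the `same_language_candidates` function, and the use of ISO 639-1 language codes are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical document record; not a Glean type."""
    doc_id: str
    language: str  # assumed to be an ISO 639-1 code such as "en" or "es"
    text: str


def same_language_candidates(query_language: str, documents: list[Document]) -> list[Document]:
    """Keep only documents whose language matches the query's language."""
    return [doc for doc in documents if doc.language == query_language]


corpus = [
    Document("d1", "en", "Quarterly travel policy"),
    Document("d2", "es", "Política de viajes trimestral"),
    Document("d3", "en", "Expense report template"),
]

english_hits = same_language_candidates("en", corpus)
print([doc.doc_id for doc in english_hits])  # ['d1', 'd3']
```

In this toy corpus the Spanish document `d2` is never a candidate for an English query, which mirrors the limitation noted for Chat.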
Support matrix
GA = Generally available; EA = Early access and welcoming design partners to help battle-test it!

| Language | Keyword Search | Semantic Search | Assistant | UI |
|---|---|---|---|---|
| English | GA | GA | GA | GA |
| German | GA | GA | GA | GA |
| Japanese | GA | GA | GA | GA |
| French | GA | EA | EA | GA |
| Spanish | GA | EA | EA | GA |
| Dutch | GA | EA | GA | |
| Italian | GA | EA | GA | |
| Chinese (Simplified) | EA | EA | GA | |
| Chinese (Traditional) | EA | EA | GA | |
| Korean | EA | EA | GA | |
| Portuguese | EA | EA | GA | |
| Turkish | EA | EA | | |
| Greek | GA | GA | | |
| Hungarian | GA | GA | | |
| Croatian | EA | GA | | |
| Czech | EA | GA | | |
| Slovak | EA | GA | | |
| Albanian | EA | | | |
| Arabic | EA | | | |
| Bengali | EA | | | |
| Bulgarian | EA | | | |
| Danish | EA | | | |
| Finnish | EA | | | |
| Hindi | EA | | | |
| Indonesian | EA | | | |
| Macedonian | EA | | | |
| Norwegian | EA | | | |
| Polish | EA | | | |
| Romanian | EA | | | |
| Russian | EA | | | |
| Swedish | EA | | | |
| Tamil | EA | | | |
| Telugu | EA | | | |
| Ukrainian | EA | | | |
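For teams that want to check support programmatically, the matrix can be encoded as a small lookup table. The sketch below is a hypothetical encoding, not a Glean API. It copies only the four fully populated rows for English, German, French, and Spanish. The names `Status`, `SUPPORT_EXCERPT`, `COLUMNS`, and `support_status` are assumptions introduced for illustration.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    GA = "generally available"
    EA = "early access"


# Column order mirrors the matrix: keyword search, semantic search, assistant, UI.
COLUMNS = ("keyword_search", "semantic_search", "assistant", "ui")

# Excerpt only: the four rows of the matrix that have a status in every column.
SUPPORT_EXCERPT = {
    "English": (Status.GA, Status.GA, Status.GA, Status.GA),
    "German": (Status.GA, Status.GA, Status.GA, Status.GA),
    "French": (Status.GA, Status.EA, Status.EA, Status.GA),
    "Spanish": (Status.GA, Status.EA, Status.EA, Status.GA),
}


def support_status(language: str, capability: str) -> Optional[Status]:
    """Return the status for a language and capability, or None if the language is not in the excerpt.

    Raises ValueError if `capability` is not one of COLUMNS.
    """
    column = COLUMNS.index(capability)
    row = SUPPORT_EXCERPT.get(language)
    return None if row is None else row[column]


print(support_status("French", "assistant"))  # Status.EA
print(support_status("English", "ui"))        # Status.GA
```

Returning `None` for a language outside the excerpt keeps "not encoded here" distinct from either listed status.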
Glossary
- Keyword Search: The syntax and grammatical structure of the language is understood by the search stack. Search is functional.
  - Language detection: The language of the query is understood.
  - Segmentation: The boundaries between words are understood.
  - Stemming: Concepts such as plurals and verb tenses are understood.
  - Stop words: Common words such as articles (e.g. a, the) and prepositions (e.g. of, from, in) are ignored.
- Semantic Search: The semantics of the language as used in the particular enterprise context are understood. Search is stronger.
  - Frequency-based term weights: The system understands the relative frequency of all terms (not just stop words) and weighs them appropriately when constructing a result set.
  - Domain-Adapted Vector Search: A fine-tuned embedding model is used within the larger hybrid search system.
  - Acronyms: Corpus-specific acronyms are automatically mined.
  - Synonyms: Corpus-specific synonyms are automatically mined.
- Assistant: Glean Chat has been optimized for the language, and in-context learning examples have been provided in the language. Because Assistant relies on Search through RAG, quality depends on how much of the first two columns is complete for a given language: Keyword Search is a strict requirement, and Semantic Search improves quality.
- User Interface: All end-user-facing product surfaces are localized into the given language or region. External help documentation and admin workspace setup are not yet localized.
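To make the Keyword Search and term-weighting entries more concrete, here is a toy English-only pipeline that applies segmentation, stop-word removal, and stemming, then computes a smoothed inverse document frequency as a stand-in for frequency-based term weights. It is a sketch under stated assumptions, not Glean's search stack: the stop-word set reuses only the examples named in the glossary, the `stem` function is a deliberately naive suffix stripper, and every function name is hypothetical.

```python
import math
import re
from collections import Counter

# Only the example stop words named in the glossary; a real list would be longer.
STOP_WORDS = {"a", "the", "of", "from", "in"}


def segment(text: str) -> list[str]:
    """Split on non-alphanumerics. Adequate only for space-delimited languages."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token: str) -> str:
    """Toy suffix stripper (not a real stemming algorithm)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    """Segment, drop stop words, then stem the remaining tokens."""
    return [stem(tok) for tok in segment(text) if tok not in STOP_WORDS]


def idf_weights(docs: list[str]) -> dict[str, float]:
    """Smoothed IDF: terms that appear in fewer documents get larger weights."""
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for doc in docs:
        doc_freq.update(set(analyze(doc)))
    return {term: math.log((n_docs + 1) / (df + 1)) + 1.0 for term, df in doc_freq.items()}


docs = ["The reports from the teams", "Filing a report", "Team offsite in Lisbon"]
print(analyze(docs[0]))  # ['report', 'team']

weights = idf_weights(docs)
print(weights["lisbon"] > weights["report"])  # True: "lisbon" is rarer than "report"
```

The regex-based `segment` would fail for languages written without spaces between words, such as Japanese or Chinese, which is one reason segmentation is listed as its own capability.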