Capabilities

  • Search – Results are drawn only from documents written in the same language as the query.
  • Chat – Responses match the language of the query and are typically based on documents written in the same language as the query.
    • This means Glean can only answer questions from documents in the query's language. For example, answering an English query that needs knowledge from a Spanish document is not generally available (GA).
    • Initial support for this cross-language case (a question in language X that requires knowledge from a document in language Y) is available in early access (🟦) for bilingual corpora.
  • Summarization – Summaries are provided in the user interface language, regardless of the source document’s language.
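The same-language behavior of Search can be pictured as a filter applied before ranking. The following is a minimal sketch under stated assumptions, not Glean's implementation: the `Document` type, its `language` field, and the `candidates_for_query` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    language: str  # hypothetical field holding a language code such as "en" or "es"


def candidates_for_query(query_language: str, corpus: list[Document]) -> list[Document]:
    """Keep only the documents written in the same language as the query."""
    return [doc for doc in corpus if doc.language == query_language]


corpus = [
    Document("Onboarding guide", "en"),
    Document("Guía de incorporación", "es"),
    Document("Expense policy", "en"),
]

# An English query only ever sees the two English documents.
english_results = candidates_for_query("en", corpus)
print([doc.title for doc in english_results])
```

In this sketch the Spanish document is never considered for an English query, which is why a question that needs knowledge from it cannot be answered without the early-access cross-language support.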

WARNING: Because our LLM engines are multilingual, casual testing may suggest that Assistant understands languages not enumerated below. That is very different from our end-to-end core technology actually functioning in that language, so please do not infer from it that Glean supports the language!

Support matrix

✅ Generally available
🟦 Early access; we welcome design partners to help battle-test it!

Keyword Search | Semantic Search | Assistant | UI
English
German
Japanese
French🟦🟦
Spanish🟦🟦
Dutch🟦
Italian🟦
Chinese (Simplified)🟦🟦
Chinese (Traditional)🟦🟦
Korean🟦🟦
Portuguese🟦🟦
Turkish🟦🟦
Greek
Hungarian
Croatian🟦
Czech🟦
Slovak🟦
Albanian🟦
Arabic🟦
Bengali🟦
Bulgarian🟦
Danish🟦
Finnish🟦
Hindi🟦
Indonesian🟦
Macedonian🟦
Norwegian🟦
Polish🟦
Romanian🟦
Russian🟦
Swedish🟦
Tamil🟦
Telugu🟦
Ukrainian🟦

Glossary

Keyword Search – The syntax/grammatical structure of the language is understood by the search stack. Search is functional.
Language detection – The language of the query is understood.
Segmentation – The boundary between words is understood.
Stemming – Concepts such as plurals and verb tenses are understood.
Stop words – Common words such as articles (e.g. a, the) and prepositions (e.g. of, from, in) are ignored.
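The three text-processing steps above (segmentation, stop-word removal, stemming) can be sketched for English as follows. This is a deliberately naive illustration, not the search stack's actual logic: the function names, the stop-word set (taken from the examples above), and the suffix rules are all assumptions, and language detection is skipped by assuming the input is English.

```python
import re

# Stop words taken from the examples in the glossary entry above.
STOP_WORDS = {"a", "the", "of", "from", "in"}


def segment(text: str) -> list[str]:
    # Segmentation: splitting on non-alphanumeric characters is enough for
    # English; languages written without spaces need a dedicated segmenter.
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token: str) -> str:
    # Toy stemmer: strips a few plural and past-tense suffixes.
    # Real stemmers use far more careful, language-specific rules.
    for suffix in ("es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text: str) -> list[str]:
    # Segment, drop stop words, then stem what remains.
    return [stem(token) for token in segment(text) if token not in STOP_WORDS]


print(normalize("The reports from the meetings"))
print(normalize("reported in a meeting"))
```

Both inputs normalize to the same terms (`report`, `meeting`), which is what lets a query match a document despite differences in plurals, verb tense, and filler words.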

Semantic Search – The semantics of the language as used in the particular enterprise context is understood. Search is stronger.
Frequency-based term weights – The system understands the relative frequency of all terms (not just stop words) and weights them appropriately when constructing a result set.
Domain-Adapted Vector Search – A fine-tuned embedding model is used within the larger hybrid search system.
Acronyms – Corpus-specific acronyms are automatically mined.
Synonyms – Corpus-specific synonyms are automatically mined.
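To illustrate the idea behind frequency-based term weights, here is a small sketch using smoothed inverse document frequency (IDF). IDF is a common textbook technique chosen here for illustration; the weighting scheme Glean actually uses is not described in this document, and the function name and sample corpus are hypothetical.

```python
import math
from collections import Counter


def inverse_document_frequency(docs: list[list[str]]) -> dict[str, float]:
    """Weight each term by how rare it is across the corpus (smoothed IDF)."""
    total = len(docs)
    # Count the number of documents that contain each term at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {
        term: math.log((1 + total) / (1 + count)) + 1.0
        for term, count in doc_freq.items()
    }


# Hypothetical, already-tokenized corpus.
docs = [
    ["quarterly", "report", "finance"],
    ["quarterly", "report", "sales"],
    ["kubernetes", "report"],
]

weights = inverse_document_frequency(docs)
for term in ("report", "quarterly", "kubernetes"):
    print(term, round(weights[term], 3))
```

A term that appears in every document ("report") receives the lowest weight, while a term specific to one document ("kubernetes") receives the highest, so rarer terms contribute more when a result set is constructed.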

Assistant – Glean Chat has been optimized for the language, and in-context learning examples have been provided in the language. Because Assistant relies on Search through retrieval-augmented generation (RAG), its quality depends on how complete the first two columns are for a given language: Keyword Search is a strict requirement, and Semantic Search further improves quality.

User Interface – All end-user facing product surfaces are localized into the given language / region. Note that external help documentation and admin workspace setup are not yet localized.