Overview
Glean supports crawling and indexing Salesforce Files, so users can find file content and ask questions about it directly in Glean Search and Assistant. The connector respects all Salesforce permissions, ensuring users can only see the files they are authorized to access.
Prerequisites
- Verify the Salesforce admin account has the Query All Files permission.
- Ensure your Salesforce connector is set up per the Salesforce Connector Overview.
Quick Start Checklist
- Authorize the connector with a Salesforce admin account.
- Add ContentDocument to the list of objects to crawl.
- If your Salesforce org uses a custom login domain (for example, a sandbox), enter it during setup.
- Validate the setup by searching for a known file in Glean after the crawl is complete.
How it Works
The Glean connector crawls Salesforce Files, which are represented by the ContentDocument object. This process follows the connector’s standard crawl behavior and limits.
What Gets Indexed
- Object: ContentDocument. Glean fetches file content through the related ContentVersion record.
- File Types (default): PDF, DOCX, TXT, and HTML.
- File Size: Files up to 64 MB are fully indexed for search. Files larger than 64 MB are indexed by their metadata only.
- Unsupported Files: The connector does not support encrypted files, compressed files, or rich media such as images and videos.
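The indexing rules above can be sketched as a small decision function. This is an illustration only, not the connector's implementation: the `SalesforceFile` class, the `indexing_mode` function, and the result labels are hypothetical. It assumes 64 MB means 64 * 1024 * 1024 bytes, and it assumes that any file outside the allowed extensions is treated like an unsupported file; neither detail is stated above.

```python
from dataclasses import dataclass

# Assumption: 64 MB is read as 64 * 1024 * 1024 bytes. The exact byte
# threshold used by the connector is not documented here.
MAX_FULL_INDEX_BYTES = 64 * 1024 * 1024

# Default file types from the list above, as lowercase extensions.
DEFAULT_EXTENSIONS = frozenset({"pdf", "docx", "txt", "html"})


@dataclass
class SalesforceFile:
    """Hypothetical stand-in for a file; not a Glean or Salesforce type."""
    title: str
    extension: str
    size_bytes: int
    encrypted: bool = False


def indexing_mode(f: SalesforceFile, allowed=DEFAULT_EXTENSIONS) -> str:
    """Return 'full_text', 'metadata_only', or 'no_text' for a file."""
    # Encrypted files and extensions outside the allowed set (for example
    # zip, png, mp4) are grouped as 'no_text' in this sketch.
    if f.encrypted or f.extension.lower() not in allowed:
        return "no_text"
    # Files larger than the limit are indexed by metadata only.
    if f.size_bytes > MAX_FULL_INDEX_BYTES:
        return "metadata_only"
    return "full_text"


print(indexing_mode(SalesforceFile("Contract", "pdf", 2_000_000)))
print(indexing_mode(SalesforceFile("Big export", "pdf", MAX_FULL_INDEX_BYTES + 1)))
print(indexing_mode(SalesforceFile("Archive", "zip", 1_000)))
```

A file that shows up in search without its text content would land in the second or third branch of this sketch.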
How Permissions Work
A single file can be linked to multiple records (for example, an Opportunity, a Case, and a Contact) via the ContentDocumentLink object. Glean computes the union of permissions from all linked records for that file.
- Permissions from indexed records: If a file is linked to multiple records that are indexed by Glean, a user can access the file if they have permission to view any one of those linked records.
- Permissions from non-indexed records: If a file is linked to a record that Glean does not index (e.g., a custom object not in the crawl list), the permissions from that specific link will not be applied.
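The union rule can be illustrated with a minimal sketch. The `file_viewers` function, the record IDs, and the user names are all hypothetical, and real Salesforce sharing is much richer than a set of user names per record. The sketch only shows that viewers accumulate across indexed linked records and that links to non-indexed records contribute nothing.

```python
from typing import Dict, Iterable, Set


def file_viewers(
    linked_record_ids: Iterable[str],
    record_viewers: Dict[str, Set[str]],
) -> Set[str]:
    """Union of viewers across the linked records that are indexed.

    `record_viewers` maps each indexed record ID to the users who can view
    that record. A linked record that is missing from the map stands in for
    a non-indexed record, so its link adds no permissions.
    """
    viewers: Set[str] = set()
    for record_id in linked_record_ids:
        viewers |= record_viewers.get(record_id, set())
    return viewers


# One file linked to an Opportunity, a Case, and a non-indexed custom object.
links = ["opp-1", "case-7", "custom-9"]
indexed = {"opp-1": {"alice"}, "case-7": {"alice", "bob"}}

print(sorted(file_viewers(links, indexed)))  # ['alice', 'bob']
```

In this example, a user who can view only the `custom-9` record gains no access to the file, because that record is not in the indexed set.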
Setup
To enable crawling of Salesforce Files, follow these steps in the Glean Admin Console.
- In Glean Admin, navigate to your Salesforce data source and open the Setup tab.
- In the list of Salesforce objects to be crawled, add ContentDocument.
- Note: Do not add ContentDocumentLink to the list; it is used internally by Glean for permission resolution. The Attachment object is a legacy object and is not recommended for new setups.
- If your organization uses a sandbox or a non-standard login endpoint, set the optional custom login domain.
- Click Authorize and complete the OAuth flow using a Salesforce admin account that has the Query All Files permission.
- Click Save to save the changes. The connector will begin to crawl the ContentDocument objects.
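Before saving, the object list can be sanity-checked against the guidance in these steps. The helper below is a hypothetical pre-flight check, not part of Glean or Salesforce. It assumes the object names are available as plain strings and compares them exactly, with no case folding.

```python
from typing import Iterable, List


def check_crawl_objects(objects: Iterable[str]) -> List[str]:
    """Return warnings for a proposed list of Salesforce objects to crawl."""
    names = {name.strip() for name in objects}
    warnings: List[str] = []
    if "ContentDocument" not in names:
        warnings.append("Add ContentDocument to crawl Salesforce Files.")
    if "ContentDocumentLink" in names:
        warnings.append(
            "Remove ContentDocumentLink; it is used internally for permission resolution."
        )
    if "Attachment" in names:
        warnings.append(
            "Attachment is a legacy object and is not recommended for new setups."
        )
    return warnings


for message in check_crawl_objects(["Account", "Case", "ContentDocumentLink"]):
    print(message)
```

An empty result means the list follows the guidance above. It does not confirm that authorization or the crawl itself succeeded.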

FAQs
Why don’t I see ContentDocument in the object picker?
Ensure the service account used for authorization has access to the object in Salesforce.
I see files but not their text content. Why?
Confirm that the file is 64 MB or smaller and is not an unsupported file type (e.g., encrypted or compressed).
Do files linked from custom objects work?
Yes. Files are indexed via ContentDocument regardless of whether the linked custom object is indexed. Permissions, however, are resolved through ContentDocumentLink and applied only from linked records that Glean is configured to crawl.
What if I have transcripts in JSON format?
You can enable support for JSON files by adding json to the allowed file extensions in the Glean Admin settings. JSON files are not enabled by default, and their content is indexed as plain text without schema-aware parsing.
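As a rough illustration of what indexing without schema-aware parsing implies, the sketch below extends the default extension set with json and tokenizes a transcript as raw text. The `plain_text_tokens` function and the sample transcript are hypothetical, and the regular-expression tokenizer is a stand-in rather than Glean's actual text processing. The point is that JSON keys and values become ordinary words, with no notion of fields.

```python
import re

# Default file types from this page, plus json once it has been allowed.
DEFAULT_EXTENSIONS = {"pdf", "docx", "txt", "html"}
allowed_extensions = DEFAULT_EXTENSIONS | {"json"}


def plain_text_tokens(raw: str) -> list:
    """Split raw file text into lowercase words, ignoring JSON structure."""
    return re.findall(r"\w+", raw.lower())


transcript = '{"speaker": "Ana", "text": "Renewal is approved"}'
print(plain_text_tokens(transcript))
# ['speaker', 'ana', 'text', 'renewal', 'is', 'approved']
```

Under this assumption, a search for "renewal approved" could match the transcript, but a query cannot target the "speaker" field specifically.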