Key features
- Glean captures Confluence pages (including their hierarchical parent/child structure), blog posts, attachment metadata, comments, and more.
- Glean respects all user access permissions, ensuring users only see search results for documents they can access. When a user clicks on a search result, they are taken to the Confluence web application, which enforces the permission.
- The connector ensures comprehensive data coverage, including metadata, identity data, permissions data, and activity data. It provides real-time synchronization, reflecting updates and permission changes immediately in search results.
- All data is stored in the cloud project within the customer’s cloud account (Glean or customer hosted), ensuring no data leaves the customer’s environment.
- Glean uses Atlassian’s standard REST API for Confluence to ingest all data.
Versions supported
- The Confluence Cloud connector, which covers Atlassian’s SaaS offering of Confluence, has no specific version limitations.
- Glean also supports Confluence Data Center, a customer-managed (non-SaaS) deployment. For information on the Confluence Data Center (Confluence On-Prem) connector, see the Confluence Data Center Connector.
Indexed content and data
The Glean Confluence connector crawls three distinct types of data (Content, Identity, and Activity) to ensure a fast, comprehensive, and securely managed index.
Content
Glean crawls the following core Confluence content entities and their associated metadata:
- Pages: All Confluence pages, including their hierarchical structure and content.
- Blog Posts: All blog posts created within Confluence.
- Comments: Comments from both pages and blog posts (footer comments only, not inline).
- Spaces: The structural container for content.
- Folders: All folders created within Confluence including their hierarchical structure.
- Attachments: Metadata about files attached to pages and blog posts.
- Restricted pages: Supported through additional user configuration (see Setup section).
Activity data and webhooks
Glean crawls and indexes user interactions with content to keep the index current and provide personalized search results.

| Activity Type | Description |
|---|---|
| Adds/Updates/Deletions | Tracks new content (Spaces, Pages, Blogs) and modifications or deletions of existing content. |
| Permissions Changes | Records all changes to content sharing permissions. |
| View Activity | Events indicating when any piece of content (Page, Blog, etc.) has been viewed by a user. |
Identity data
Glean crawls the following identity information to map permissions and group access correctly:
- Users: Information about all users.
- Groups: Details about groups within the domain.
- Memberships: Information about group memberships (which users belong to which groups).
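To illustrate how users, groups, and memberships combine to decide who can see a document, here is a minimal sketch. The `IdentityIndex` class and `can_view` function are hypothetical names invented for this example; they do not describe Glean's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class IdentityIndex:
    """Hypothetical in-memory view of crawled identity data."""
    users: set[str] = field(default_factory=set)
    groups: set[str] = field(default_factory=set)
    # Maps a group name to the set of user ids that belong to it.
    memberships: dict[str, set[str]] = field(default_factory=dict)

    def groups_of(self, user: str) -> set[str]:
        """Return every group that lists the user as a member."""
        return {g for g, members in self.memberships.items() if user in members}


def can_view(index: IdentityIndex, user: str,
             allowed_users: set[str], allowed_groups: set[str]) -> bool:
    """A user may view a document if named directly or via one of their groups."""
    return user in allowed_users or bool(index.groups_of(user) & allowed_groups)


index = IdentityIndex(
    users={"ana", "bo"},
    groups={"eng"},
    memberships={"eng": {"ana"}},
)
print(can_view(index, "ana", set(), {"eng"}))  # ana is in the eng group
print(can_view(index, "bo", set(), {"eng"}))   # bo is not
```

Because access is resolved through memberships, a stale membership record would produce a stale answer, which is why group changes need to be re-crawled regularly.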
Limitations
The Confluence connector for Glean has the following known limitations in its crawling process:
- The Glean app can read all unrestricted pages in the Confluence spaces. However, Glean can only read restricted pages if the admin grants the app access to them.
- Glean only indexes page (footer) comments and not inline comments.
- Blog posts in Confluence Cloud and Confluence Server have no hierarchical structure, so Glean lists them with a normal list-all-content-ids REST API call. Additionally, Glean does not support databases, whiteboards, smart links, or other custom content.
- Archived pages are not crawled.
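The limitations above can be read as a filter over candidate items. The following sketch encodes them as a single predicate; the `Item` fields and the `SUPPORTED_KINDS` set are assumptions made for illustration, not the connector's actual schema.

```python
from dataclasses import dataclass

# Content kinds listed under "Indexed content and data" (names are illustrative).
SUPPORTED_KINDS = {"page", "blogpost", "comment", "space", "folder", "attachment"}


@dataclass(frozen=True)
class Item:
    """Hypothetical description of one Confluence item seen by a crawler."""
    kind: str
    archived: bool = False
    inline: bool = False           # only meaningful for comments
    restricted: bool = False
    app_has_access: bool = False   # whether the crawler app was granted access


def is_crawlable(item: Item) -> bool:
    """Apply the documented limitations in order."""
    if item.kind not in SUPPORTED_KINDS:
        return False  # databases, whiteboards, smart links, custom content
    if item.kind == "page" and item.archived:
        return False  # archived pages are not crawled
    if item.kind == "comment" and item.inline:
        return False  # only footer comments are indexed
    if item.restricted and not item.app_has_access:
        return False  # restricted content needs an explicit grant
    return True


print(is_crawlable(Item("page")))                  # True
print(is_crawlable(Item("whiteboard")))            # False
print(is_crawlable(Item("comment", inline=True)))  # False
```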
Prerequisites
- The user setting up this data source must have administrator permissions.
- For Confluence Cloud, the Atlassian admin needs to install Glean’s Forge App on the instance. The Admin scope is required to fetch permissions associated with Confluence objects, which is necessary for correctly enforcing permissions in the search experience.
- Glean requires authentication to the Atlassian instance to fetch relevant information from Confluence.
- The connector requires the write:space.permission:confluence scope during the initial installation. This permission allows the Glean app to automatically be added to all Confluence spaces. If your organization cannot grant this write scope, an administrator must manually add the Glean crawler to every Confluence space. If you choose the manual route, contact Glean Support to enable the necessary configuration flag in your environment.
Setup instructions
Perform the following steps to connect your Confluence Cloud to Glean:
- In the Glean Admin console, go to Data source > Confluence (Cloud).
- Sign in to Confluence as an administrator.
- Copy your Atlassian domain from the browser URL bar (e.g., YourAtlassianDomain.atlassian.net) and paste it into the corresponding field in Glean.
- Go to https://admin.atlassian.com and select your organization.
- In the row for Confluence, click the three dots (...) and select Manage product access.
- Identify the default groups with access and enter them into Glean as a comma-separated list. (Only users in these groups will see results in Glean.)

- Click Create Forge Crawler App in Glean. This should create an installation link for the Glean crawler app.
- Click on Get app and install the app in the correct Confluence instance.
- After the app installation is successful, click Save in Glean.
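The setup steps ask for two free-text values: the Atlassian domain copied from the browser and a comma-separated list of default groups. A small sketch of how such input could be tidied before pasting is shown below. Both helper names are hypothetical, and Glean's own validation of these fields is not described here.

```python
from urllib.parse import urlparse


def atlassian_domain(url_or_host: str) -> str:
    """Extract the bare host name from a pasted browser URL or host string."""
    text = url_or_host.strip()
    # urlparse only recognizes the host when a scheme is present.
    parsed = urlparse(text if "://" in text else "https://" + text)
    if not parsed.hostname:
        raise ValueError(f"no host found in {url_or_host!r}")
    return parsed.hostname  # urlparse lowercases the host name


def parse_groups(raw: str) -> list[str]:
    """Split a comma-separated group list, dropping blanks and stray spaces."""
    return [name.strip() for name in raw.split(",") if name.strip()]


print(atlassian_domain("https://YourAtlassianDomain.atlassian.net/wiki/home"))
print(parse_groups("confluence-users, engineering ,"))
```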
(Optional) Configuring Glean Search for Confluence to crawl restricted pages
By default, the Glean connector for Confluence is configured to access all Confluence spaces and pages except restricted pages. Atlassian admins cannot view restricted pages unless they are given explicit access. Restricted pages can be important to users who want them included in search results. Glean can crawl and index restricted pages in a permissions-enforced way by giving the “Glean Crawler” app view access to the pages to be indexed. The following is an overview of the procedure:
- Users must:
  - Have edit access to a set of Restricted Pages in Confluence
  - Have “Add/Delete Restrictions” permissions for the space
  - Create an API token through the Atlassian settings workflow
  - Upload that API token into their Glean application settings to store the token securely
- Glean securely reads the token and adds the “Glean Search Crawler for Confluence” application, with view-only access, to the restricted pages that the user can edit
- Glean can then crawl and index those pages with the “Glean Search Crawler for Confluence”
- Users with edit or view access to the Restricted Pages can view those pages in Glean search results.
- Multiple users can upload their API tokens. For each such user, Glean will add the Glean Search Crawler For Confluence app to the view restrictions of restricted pages that the user can edit.
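The procedure above can be pictured with a small in-memory simulation. This is only a sketch: `RestrictedPage` and `grant_crawler_view` are hypothetical names, no Confluence API is called, and the space-level “Add/Delete Restrictions” check is not modeled.

```python
from dataclasses import dataclass, field

CRAWLER_APP = "Glean Search Crawler for Confluence"


@dataclass
class RestrictedPage:
    """Hypothetical model of one restricted page and who may edit or view it."""
    page_id: str
    editors: set[str]
    viewers: set[str] = field(default_factory=set)


def grant_crawler_view(pages: list[RestrictedPage], token_owners: set[str]) -> list[str]:
    """Add the crawler app as a viewer on pages that any token owner can edit.

    Returns the ids of the pages whose view restrictions were changed.
    """
    updated = []
    for page in pages:
        editable_by_owner = bool(page.editors & token_owners)
        if editable_by_owner and CRAWLER_APP not in page.viewers:
            page.viewers.add(CRAWLER_APP)
            updated.append(page.page_id)
    return updated


pages = [
    RestrictedPage("101", editors={"ana"}),
    RestrictedPage("102", editors={"bo"}),
    RestrictedPage("103", editors={"cy"}),
]
# ana and bo uploaded tokens; cy did not, so page 103 stays invisible to the crawler.
print(grant_crawler_view(pages, {"ana", "bo"}))  # ['101', '102']
```

Running the function a second time with the same inputs changes nothing, which mirrors the idea that each uploaded token only extends coverage to pages its owner can edit.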
Create Confluence API token
Users who have edit access to restricted pages and need those pages crawled and indexed by Glean must do the following:
- Log in to your Atlassian account.
- Go to https://id.atlassian.com/manage-profile/security/api-tokens.
- Select Create API Token.
- Enter a name for the token, for example “Glean Search Crawler”.
- Enter the token in the Glean UI by clicking your profile picture (bottom left corner) > Your settings > Data sources > Confluence Cloud.
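For background, Atlassian Cloud API tokens are normally presented to the REST API through HTTP Basic authentication, with the account email as the user name and the token as the password. The sketch below builds such a header; the function name and sample values are illustrative, and how Glean itself stores or sends the token is not described in this document.

```python
import base64


def basic_auth_header(email: str, api_token: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header from an email and an API token."""
    credentials = f"{email}:{api_token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# Placeholder values; never hard-code a real token.
headers = basic_auth_header("user@example.com", "example-token")
print(headers["Authorization"].startswith("Basic "))  # True
```

Because the token stands in for the user's password, it should be treated as a secret and revoked from the same Atlassian page if it is ever exposed.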
Rate limits
Queries per Second (QPS): The default rate limit is set to 2 queries per second per user.
Update Frequency
Content updates for the Confluence connector in Glean can happen quite rapidly, depending on the type of update and the configuration settings. Here are the key areas:
- People / identity crawl: Changes to group memberships are picked up by the identity crawl, which runs every 3 hours. This ensures that updates to user groups and their permissions are reflected promptly.
- Incremental crawl: These occur every 8 hours to provide additional reliability beyond the minute-by-minute activity reports.
- Full crawl: The frequency of full crawls is configurable, but they run less often than incremental crawls, typically every 30 days.
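As a worked example of the figures in this section, the sketch below converts the crawl intervals into runs per day and the 2 QPS default into a request budget for a time window. The constant and function names are made up for illustration, and the 30-day full crawl interval is taken as the typical value mentioned above.

```python
from datetime import timedelta

DEFAULT_QPS_PER_USER = 2
IDENTITY_CRAWL_INTERVAL = timedelta(hours=3)
INCREMENTAL_CRAWL_INTERVAL = timedelta(hours=8)
FULL_CRAWL_INTERVAL = timedelta(days=30)  # configurable; typical value assumed


def runs_per_day(interval: timedelta) -> float:
    """How many times a crawl with this interval starts in 24 hours."""
    return timedelta(days=1) / interval


def max_requests(window: timedelta, qps: int = DEFAULT_QPS_PER_USER) -> int:
    """Upper bound on requests that fit in a window at a fixed QPS."""
    return int(window.total_seconds() * qps)


print(runs_per_day(IDENTITY_CRAWL_INTERVAL))     # 8.0
print(runs_per_day(INCREMENTAL_CRAWL_INTERVAL))  # 3.0
print(max_requests(INCREMENTAL_CRAWL_INTERVAL))  # 57600
```

In other words, at the default limit a single user's budget between two incremental crawls is at most 57,600 requests.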
How the crawl works
The crawler follows the traditional crawler strategy, using the API and the following mechanisms to get and update data:
- Identity crawl: Adds and updates People data, including users, groups, and other information.
- Webhooks: Messages sent by the application to notify Glean of changes in real time. Glean then either initiates a crawl or picks up the change on the next crawl.
- Content crawls: A full crawl covers the entire defined scope of the application, whereas an incremental crawl captures only the changes since the previous full or incremental crawl.
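The difference between full and incremental content crawls can be sketched as follows. This is a simplified model under stated assumptions: `Doc`, `full_crawl`, and `incremental_crawl` are hypothetical names, change detection uses a plain modification timestamp, and deletions (which a timestamp filter cannot see) are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical document record with a modification timestamp."""
    id: str
    updated_at: int


def full_crawl(source: list[Doc]) -> dict[str, Doc]:
    """Rebuild the index from everything in scope."""
    return {doc.id: doc for doc in source}


def incremental_crawl(index: dict[str, Doc], source: list[Doc], since: int) -> list[Doc]:
    """Update the index with documents changed after the previous crawl."""
    changed = [doc for doc in source if doc.updated_at > since]
    for doc in changed:
        index[doc.id] = doc
    return changed


source = [Doc("a", 1), Doc("b", 5)]
index = full_crawl(source)

# Later, "a" is edited and "c" is created.
source = [Doc("a", 8), Doc("b", 5), Doc("c", 9)]
changed = incremental_crawl(index, source, since=5)
print([doc.id for doc in changed])  # ['a', 'c']
```

The incremental pass touches two documents instead of three, which is the whole point: the cost scales with the amount of change rather than the size of the corpus.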
API Endpoints
| Purpose | Cloud Endpoint | Cloud Method | Cloud Permission/Scope | DC Endpoint | DC Method | DC Permission |
|---|---|---|---|---|---|---|
| List users | search/user | GET | READ | Same | GET | |
| List groups | group | GET | READ | Same | GET | |
| List group members | group/member | GET | READ | group/%s/member | GET | |
| List groups of user | user/memberof | GET | READ | Same | GET | |
| Get current user | user/current | GET | READ | N/A | | |
| Get email of users | user/email/bulk | POST | ADMIN | user/non-system | GET | |
| List spaces | space | GET | SPACE_ADMIN | Same | GET | |
| CQL based list spaces | search | GET | READ | Same | GET | |
| List pages in space | space/%s/content/page | GET | READ | Same | GET | |
| List blogposts in space | space/%s/content/blogpost | GET | READ | Same | GET | |
| Get space permissions | space/%s | GET | READ | spaces/spacepermissions.action | GET | Confluence Administrator |
| List content | content | GET | READ | Same | GET | |
| Get content | content/%s | GET | READ | Same | GET | |
| CQL based list content | content/search | GET | READ | Same | GET | |
| List children of page | pages/%s/children | GET | READ | Same | GET | |
| Fetch applinks | N/A | GET | | rest/applinks/1.0/listApplicationlinks | GET | Confluence Administrator |
| Create webhook | N/A | POST | | rest/api/webhooks | POST | Confluence Administrator |
| Get content restrictions | content/%s/restriction/byOperation/read | GET | READ | Same | GET | |
| Update content restriction | content/%s/restriction | PUT | Uses API tokens provided by users | N/A | | |
| Configure plugin | N/A | POST | | scio_search/1.0/configure | POST | |
| Get installed plugin version | N/A | GET | | scio_search/1.0/version | GET | |
| Get space permissions via plugin | N/A | GET | | scio_search/1.0/space_permissions | GET | |
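The endpoints in the table are relative paths in which `%s` stands for an identifier such as a space key or content ID. The sketch below shows one way to expand such a template into a full URL. The `build_url` helper is hypothetical, and the base URL in the example is an assumption: the table does not state which base path each endpoint is served from.

```python
from urllib.parse import quote


def build_url(base_url: str, template: str, *values: str) -> str:
    """Fill each %s in an endpoint template with a URL-escaped value."""
    escaped = tuple(quote(value, safe="") for value in values)
    path = template % escaped
    return base_url.rstrip("/") + "/" + path


# Assumed base URL for illustration only.
BASE = "https://example.atlassian.net/wiki/rest/api"

print(build_url(BASE, "space/%s/content/page", "ENG"))
print(build_url(BASE, "content/%s/restriction/byOperation/read", "12345"))
```

Escaping each value before substitution keeps a space key or ID that contains reserved characters from changing the shape of the path.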
Content configuration
Note: If Inclusion (Green-Listing) options are enabled, only content from the Inclusion category will be indexed. If Exclusion (Red-Listing) options are enabled, all content in the Exclusion category will be removed. If both rules apply to the same content, the content will NOT be indexed, because exclusion rules take priority. Use the rules below MINIMALLY to preserve the enterprise search experience, as most end users expect to find all content. Most customers apply no rules, or apply exclusion rules sparingly for sensitive folders. Exclusion rules take effect automatically after the next full crawl, the timing of which varies with corpus size. If a recrawl is needed, please reach out to your Glean representative.
Exclusion (Red-listing) options
Glean provides several options for excluding content from the data crawl, which also removes that content from search and chat results.
- Space: Exclude certain Confluence spaces from being crawled by Glean by specifying space keys
- Pages with specific labels: Exclude pages and blog posts with specific labels from being crawled by Glean
- Pages with content matching specific regex: Exclude pages and blog posts with content matching specific regex from being crawled by Glean
- Creators: Exclude content created by certain creators from being crawled by Glean.

Inclusion (Green-listing) options
Glean provides the following option for including content in the data crawl, which limits search and chat results to that content.
- Spaces: Only allow Glean to crawl certain Confluence spaces. If no spaces are specified, Glean crawls all spaces except those covered by the Exclusion rules.
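The interaction between inclusion and exclusion rules, with exclusion taking priority, can be summarized in a short sketch. The `Rules` and `Page` classes and the `should_index` function are hypothetical and only model the options listed in this section.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rules:
    """Hypothetical container for the rule types described above."""
    include_spaces: set[str] = field(default_factory=set)
    exclude_spaces: set[str] = field(default_factory=set)
    exclude_labels: set[str] = field(default_factory=set)
    exclude_creators: set[str] = field(default_factory=set)
    exclude_content_regex: list[str] = field(default_factory=list)


@dataclass
class Page:
    space_key: str
    creator: str = ""
    body: str = ""
    labels: set[str] = field(default_factory=set)


def should_index(page: Page, rules: Rules) -> bool:
    """Exclusion rules win; inclusion rules apply only if any are set."""
    excluded = (
        page.space_key in rules.exclude_spaces
        or bool(rules.exclude_labels & page.labels)
        or page.creator in rules.exclude_creators
        or any(re.search(pattern, page.body) for pattern in rules.exclude_content_regex)
    )
    if excluded:
        return False
    if rules.include_spaces:
        return page.space_key in rules.include_spaces
    return True


rules = Rules(include_spaces={"ENG", "HR"}, exclude_spaces={"HR"})
print(should_index(Page("ENG"), rules))  # True: included and not excluded
print(should_index(Page("HR"), rules))   # False: exclusion takes priority
print(should_index(Page("OPS"), rules))  # False: not in the inclusion list
```

With an empty `Rules()` object every page is indexed, matching the note that most customers apply no rules at all.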
