GCP deployment guide

Overview

Glean lets customers deploy Glean software inside their own Google Cloud Platform (GCP) project. This deployment requires your GCP admin to:

  1. Create a new GCP project.
  2. Associate a valid billing account.
  3. Enable applicable GCP APIs.
  4. Request the required quota increases from GCP.
  5. Create a service account with the Project Owner role and generate a JSON key for it.
  6. Notify Glean of the GCP zone selected, the Project Name, Project ID, Project Number, and the service account JSON key.

After completing the above, Glean's systems will automatically build and deploy the required compute, workflows, and software into your GCP project.

At this stage, Glean will advise you that your tenant is ready, and your admins can proceed with the setup process in our Getting Started guide.

This document covers the steps your GCP admins must take to prepare a GCP project for your Glean build.


1. Select a GCP Region

First, select a supported GCP region for Glean to build your environment in.

Then notify Glean of the specific zone you selected within that region, e.g. asia-northeast1-a (a zone in the asia-northeast1 region).

warning

The region selected cannot be changed once your tenant has been built. Changing the region requires a complete rebuild of your tenant.
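
Because the guide asks for a zone but quotas and the rebuild warning are expressed per region, it can help to derive one from the other. The following is a minimal sketch, assuming zone names follow the usual `<region>-<letter>` shape; the pattern and the helper name `region_of` are illustrative assumptions, not part of Glean's tooling.

```python
import re

# Assumed naming convention: a zone is its region plus a one-letter suffix,
# e.g. "asia-northeast1-a" is a zone in the region "asia-northeast1".
ZONE_PATTERN = re.compile(r"^[a-z]+-[a-z]+\d+-[a-z]$")


def region_of(zone: str) -> str:
    """Return the region part of a zone name such as 'asia-northeast1-a'."""
    if not ZONE_PATTERN.match(zone):
        raise ValueError(f"not a zone name: {zone!r}")
    # Drop the trailing "-<letter>" to get the region.
    return zone.rsplit("-", 1)[0]


print(region_of("asia-northeast1-a"))
```

Passing a bare region such as us-central1 raises a ValueError, which is a cheap way to catch a region being sent where a zone is expected.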

2. Create the GCP Project

  1. Go to the Manage resources page in the GCP console and click Create Project.

  2. In the New Project window that appears, add a project name, organization, and location.

    • For the project name, the preferred format is glean-{customer name} or glean-{customer name}-{prod/sandbox}
    • E.g. glean-company or glean-company-prod
  3. Make sure that your project is created under the same organization as your Google Workspace account, and not "No Organization".

warning

Glean is not able to proceed with the build if the project is created under "No Organization". If you are unsure of how to resolve this, please contact your GCP account team or GCP support.

  4. Save the Project ID (which is directly below the Project name) and Project Number.

  5. Click Create.

  6. Notify Glean of the following information:

    a. Project name, e.g. glean-company → This was set in Step 2 above.

    b. Project ID, e.g. glean-company → This was saved in Step 4 above.

    c. Project number, e.g. 715000000000 → This was saved in Step 4 above.

    d. Region and zone where you want to deploy Glean, e.g. us-central1-a (zone a in the us-central1 region)
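
The naming convention above can be checked before you create the project. The sketch below builds a name in the preferred glean-{customer name} format and checks a candidate Project ID against GCP's published ID rules (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen). The function names are hypothetical and nothing here calls GCP.

```python
import re
from typing import Optional

# GCP project ID rules: 6-30 chars, lowercase letters, digits, hyphens,
# must start with a letter and must not end with a hyphen.
PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def preferred_project_name(customer: str, env: Optional[str] = None) -> str:
    """Build glean-{customer} or glean-{customer}-{prod|sandbox}."""
    # Lowercase the customer name and collapse anything else into hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", customer.lower()).strip("-")
    if not slug:
        raise ValueError("customer name has no usable characters")
    parts = ["glean", slug]
    if env is not None:
        if env not in ("prod", "sandbox"):
            raise ValueError("env must be 'prod' or 'sandbox'")
        parts.append(env)
    return "-".join(parts)


def is_valid_project_id(project_id: str) -> bool:
    """Check a candidate project ID against the format rules above."""
    return PROJECT_ID_PATTERN.fullmatch(project_id) is not None


name = preferred_project_name("Company", "prod")
print(name, is_valid_project_id(name))
```

GCP may append a suffix to the Project ID if the name is already taken, so always report the ID shown in the console rather than the one you expected.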

3. Configure Billing

  1. Go to Billing in the GCP console.

  2. Click Link a billing account to set up billing for this project.

warning

Ensure that the billing account has a corporate credit card attached to it. Using the "free trial billing tier" will not work.

4. Enable APIs

Glean requires the following GCP APIs to be enabled for the deployment to succeed. To enable an API on the project, replace [PROJECT_ID] at the end of its URL below with your Project ID and open the link.
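
Substituting the Project ID into dozens of URLs by hand is error-prone, so the links can be generated instead. This is a small sketch that fills in the URL pattern used in the tables below; the helper name `api_console_url` and the example Project ID are assumptions for illustration.

```python
from urllib.parse import quote

# URL pattern taken from the API tables in this guide.
CONSOLE_API_URL = (
    "https://console.cloud.google.com/apis/api/{service}/overview"
    "?project={project_id}"
)


def api_console_url(service: str, project_id: str) -> str:
    """Return the console page where `service` can be enabled for a project."""
    return CONSOLE_API_URL.format(
        service=service,
        project_id=quote(project_id, safe=""),
    )


# Example with a few service names from the tables; the ID is a placeholder.
for service in ("serviceusage.googleapis.com", "compute.googleapis.com"):
    print(api_console_url(service, "glean-company"))
```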

Required APIs

These APIs are required for every Glean deployment.

API | Description | URL
Artifact Registry API (artifactregistry.googleapis.com) | Container image and artifact storage for deployment images | https://console.cloud.google.com/apis/api/artifactregistry.googleapis.com/overview?project=[PROJECT_ID]
BigQuery API (bigquery.googleapis.com) | Log export datasets and analytics | https://console.cloud.google.com/apis/api/bigquery.googleapis.com/overview?project=[PROJECT_ID]
Binary Authorization API (binaryauthorization.googleapis.com) | Enforces container image signing policies on GKE and Cloud Run | https://console.cloud.google.com/apis/api/binaryauthorization.googleapis.com/overview?project=[PROJECT_ID]
Cloud Billing API (cloudbilling.googleapis.com) | Billing account management for the project | https://console.cloud.google.com/apis/api/cloudbilling.googleapis.com/overview?project=[PROJECT_ID]
Cloud Billing Budget API (billingbudgets.googleapis.com) | Monthly cost alerts and budget notifications | https://console.cloud.google.com/apis/api/billingbudgets.googleapis.com/overview?project=[PROJECT_ID]
Cloud Build API (cloudbuild.googleapis.com) | Build worker pools for deployment pipelines | https://console.cloud.google.com/apis/api/cloudbuild.googleapis.com/overview?project=[PROJECT_ID]
Cloud Data Loss Prevention (DLP) API (dlp.googleapis.com) | Content inspection for sensitive data detection | https://console.cloud.google.com/apis/api/dlp.googleapis.com/overview?project=[PROJECT_ID]
Cloud Dataproc API (dataproc.googleapis.com) | Spark jobs for data processing pipelines | https://console.cloud.google.com/apis/api/dataproc.googleapis.com/overview?project=[PROJECT_ID]
Cloud Error Reporting API (clouderrorreporting.googleapis.com) | Error aggregation and reporting | https://console.cloud.google.com/apis/api/clouderrorreporting.googleapis.com/overview?project=[PROJECT_ID]
Cloud Functions API (cloudfunctions.googleapis.com) | Serverless functions for deploy build triggers | https://console.cloud.google.com/apis/api/cloudfunctions.googleapis.com/overview?project=[PROJECT_ID]
Cloud Key Management Service (KMS) API (cloudkms.googleapis.com) | Encryption key management and CMEK | https://console.cloud.google.com/apis/api/cloudkms.googleapis.com/overview?project=[PROJECT_ID]
Cloud Logging API (logging.googleapis.com) | Operational and audit logging | https://console.cloud.google.com/apis/api/logging.googleapis.com/overview?project=[PROJECT_ID]
Cloud Memorystore for Memcached API (memcache.googleapis.com) | Managed Memcached instances for caching | https://console.cloud.google.com/apis/api/memcache.googleapis.com/overview?project=[PROJECT_ID]
Cloud Monitoring API (monitoring.googleapis.com) | Metrics, alerts, and notification channels | https://console.cloud.google.com/apis/api/monitoring.googleapis.com/overview?project=[PROJECT_ID]
Cloud Profiler API (cloudprofiler.googleapis.com) | Application performance profiling | https://console.cloud.google.com/apis/api/cloudprofiler.googleapis.com/overview?project=[PROJECT_ID]
Cloud Pub/Sub API (pubsub.googleapis.com) | Event messaging for alerts and ML inference pipelines | https://console.cloud.google.com/apis/api/pubsub.googleapis.com/overview?project=[PROJECT_ID]
Cloud Quotas API (cloudquotas.googleapis.com) | Quota management and increase requests | https://console.cloud.google.com/apis/api/cloudquotas.googleapis.com/overview?project=[PROJECT_ID]
Cloud Resource Manager API (cloudresourcemanager.googleapis.com) | Project metadata management; prerequisite for enabling all other APIs | https://console.cloud.google.com/apis/api/cloudresourcemanager.googleapis.com/overview?project=[PROJECT_ID]
Cloud Run Admin API (run.googleapis.com) | Cloud Run services for the deploy pipeline | https://console.cloud.google.com/apis/api/run.googleapis.com/overview?project=[PROJECT_ID]
Cloud SQL (sql-component.googleapis.com) | Underlying Cloud SQL component API | https://console.cloud.google.com/apis/api/sql-component.googleapis.com/overview?project=[PROJECT_ID]
Cloud SQL Admin API (sqladmin.googleapis.com) | Cloud SQL instance management | https://console.cloud.google.com/apis/api/sqladmin.googleapis.com/overview?project=[PROJECT_ID]
Cloud Scheduler API (cloudscheduler.googleapis.com) | Managed cron job scheduling | https://console.cloud.google.com/apis/api/cloudscheduler.googleapis.com/overview?project=[PROJECT_ID]
Cloud Storage (storage-component.googleapis.com) | GCS bucket management for secrets, data, and artifacts | https://console.cloud.google.com/apis/api/storage-component.googleapis.com/overview?project=[PROJECT_ID]
Cloud Tasks API (cloudtasks.googleapis.com) | Async task queue processing | https://console.cloud.google.com/apis/api/cloudtasks.googleapis.com/overview?project=[PROJECT_ID]
Cloud Trace API (cloudtrace.googleapis.com) | Distributed request tracing | https://console.cloud.google.com/apis/api/cloudtrace.googleapis.com/overview?project=[PROJECT_ID]
Compute Engine API (compute.googleapis.com) | VMs, networking, VPC, firewall rules, and load balancers | https://console.cloud.google.com/apis/api/compute.googleapis.com/overview?project=[PROJECT_ID]
Container Analysis API (containeranalysis.googleapis.com) | Container vulnerability scanning; used by Binary Authorization | https://console.cloud.google.com/apis/api/containeranalysis.googleapis.com/overview?project=[PROJECT_ID]
Container Security API (containersecurity.googleapis.com) | Security scanning for GKE containers | https://console.cloud.google.com/apis/api/containersecurity.googleapis.com/overview?project=[PROJECT_ID]
Dataflow API (dataflow.googleapis.com) | Apache Beam data processing pipelines | https://console.cloud.google.com/apis/api/dataflow.googleapis.com/overview?project=[PROJECT_ID]
Database Migration API (datamigration.googleapis.com) | Cloud SQL instance version upgrades | https://console.cloud.google.com/apis/api/datamigration.googleapis.com/overview?project=[PROJECT_ID]
GKE Hub API (gkehub.googleapis.com) | GKE fleet management | https://console.cloud.google.com/apis/api/gkehub.googleapis.com/overview?project=[PROJECT_ID]
Generative Language API (generativelanguage.googleapis.com) | Google Gemini API for LLM inference | https://console.cloud.google.com/apis/api/generativelanguage.googleapis.com/overview?project=[PROJECT_ID]
IAM Service Account Credentials API (iamcredentials.googleapis.com) | Short-lived credential generation for service-to-service auth | https://console.cloud.google.com/apis/api/iamcredentials.googleapis.com/overview?project=[PROJECT_ID]
Identity and Access Management (IAM) API (iam.googleapis.com) | Service account and role management | https://console.cloud.google.com/apis/api/iam.googleapis.com/overview?project=[PROJECT_ID]
Identity-Aware Proxy (IAP) API (iap.googleapis.com) | Secure tunnel access for bastion VMs | https://console.cloud.google.com/apis/api/iap.googleapis.com/overview?project=[PROJECT_ID]
Kubernetes Engine API (container.googleapis.com) | GKE cluster and node pool management | https://console.cloud.google.com/apis/api/container.googleapis.com/overview?project=[PROJECT_ID]
Organization Policy API (orgpolicy.googleapis.com) | Organization policy compliance checking | https://console.cloud.google.com/apis/api/orgpolicy.googleapis.com/overview?project=[PROJECT_ID]
Secret Manager API (secretmanager.googleapis.com) | Secure storage for VPN secrets and API keys | https://console.cloud.google.com/apis/api/secretmanager.googleapis.com/overview?project=[PROJECT_ID]
Service Health API (servicehealth.googleapis.com) | GCP service health monitoring | https://console.cloud.google.com/apis/api/servicehealth.googleapis.com/overview?project=[PROJECT_ID]
Service Networking API (servicenetworking.googleapis.com) | VPC peering for Cloud SQL private networking | https://console.cloud.google.com/apis/api/servicenetworking.googleapis.com/overview?project=[PROJECT_ID]
Service Usage API (serviceusage.googleapis.com) | API enablement management; prerequisite for enabling all other APIs | https://console.cloud.google.com/apis/api/serviceusage.googleapis.com/overview?project=[PROJECT_ID]
Serverless VPC Access API (vpcaccess.googleapis.com) | VPC connectors for Cloud Functions and Cloud Run | https://console.cloud.google.com/apis/api/vpcaccess.googleapis.com/overview?project=[PROJECT_ID]
Stackdriver API (stackdriver.googleapis.com) | Legacy monitoring and metrics | https://console.cloud.google.com/apis/api/stackdriver.googleapis.com/overview?project=[PROJECT_ID]
Storage Transfer API (storagetransfer.googleapis.com) | Cross-bucket and cross-cloud data transfers | https://console.cloud.google.com/apis/api/storagetransfer.googleapis.com/overview?project=[PROJECT_ID]
Vertex AI API (aiplatform.googleapis.com) | ML model hosting and LLM inference | https://console.cloud.google.com/apis/api/aiplatform.googleapis.com/overview?project=[PROJECT_ID]
Workflows API (workflows.googleapis.com) | Pipeline orchestration and sequencing | https://console.cloud.google.com/apis/api/workflows.googleapis.com/overview?project=[PROJECT_ID]

Optional APIs

These APIs are required only if you use specific Glean features or connectors.

API | Description | URL
Admin SDK API (admin.googleapis.com) | Required for the Google Workspace connector (user and group directory sync) | https://console.cloud.google.com/apis/api/admin.googleapis.com/overview?project=[PROJECT_ID]
Cloud Speech-to-Text API (speech.googleapis.com) | Required for speech recognition features | https://console.cloud.google.com/apis/api/speech.googleapis.com/overview?project=[PROJECT_ID]
Cloud Vision API (vision.googleapis.com) | Required for OCR (optical character recognition) of images in documents | https://console.cloud.google.com/apis/api/vision.googleapis.com/overview?project=[PROJECT_ID]
Custom Search API (customsearch.googleapis.com) | Required for web search features | https://console.cloud.google.com/apis/api/customsearch.googleapis.com/overview?project=[PROJECT_ID]
Drive Labels API (drivelabels.googleapis.com) | Required for the Google Drive connector (Drive Labels support) | https://console.cloud.google.com/apis/api/drivelabels.googleapis.com/overview?project=[PROJECT_ID]
Gmail API (gmail.googleapis.com) | Required for the Gmail connector | https://console.cloud.google.com/apis/api/gmail.googleapis.com/overview?project=[PROJECT_ID]
Google Chat API (chat.googleapis.com) | Required for the Google Chat connector. See GCHAT-3 if you encounter issues. | https://console.cloud.google.com/apis/api/chat.googleapis.com/overview?project=[PROJECT_ID]
Google Docs API (docs.googleapis.com) | Required for the Google Docs connector | https://console.cloud.google.com/apis/api/docs.googleapis.com/overview?project=[PROJECT_ID]
Google Drive API (drive.googleapis.com) | Required for the Google Drive connector | https://console.cloud.google.com/apis/api/drive.googleapis.com/overview?project=[PROJECT_ID]
Google Sheets API (sheets.googleapis.com) | Required for the Google Sheets connector | https://console.cloud.google.com/apis/api/sheets.googleapis.com/overview?project=[PROJECT_ID]
Web Risk API (webrisk.googleapis.com) | Required for URL safety checking against Google's threat lists | https://console.cloud.google.com/apis/api/webrisk.googleapis.com/overview?project=[PROJECT_ID]
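
As an alternative to opening each console link, the service names shown in parentheses in the tables can be passed to the gcloud CLI. The sketch below only assembles argument lists for `gcloud services enable` and does not run them. The helper name is hypothetical, the service list is a short excerpt rather than the full set, and the default batch size of 20 is an assumption about the per-request limit, so lower it if a call is rejected.

```python
from typing import Iterator, List, Sequence


def enable_commands(
    project_id: str, services: Sequence[str], batch_size: int = 20
) -> Iterator[List[str]]:
    """Yield one `gcloud services enable` argv per batch of services."""
    for start in range(0, len(services), batch_size):
        batch = list(services[start:start + batch_size])
        yield ["gcloud", "services", "enable", *batch, f"--project={project_id}"]


# Excerpt only. The two prerequisite APIs from the table are listed first.
SERVICES = [
    "serviceusage.googleapis.com",
    "cloudresourcemanager.googleapis.com",
    "compute.googleapis.com",
    "container.googleapis.com",
]

for argv in enable_commands("glean-company", SERVICES):
    print(" ".join(argv))
```

Each yielded list can be handed to `subprocess.run(argv, check=True)` once you have confirmed the project ID and the full service list.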

5. Request Quota Changes

Search for [Quotas] in the search box of the GCP Console and navigate to All Quotas, under IAM & Admin.

For each of the quotas in the table below, request a quota change by completing the following:

  1. Click on the required quota.
  2. Select Edit Quotas.
  3. Enter the value specified by Glean for the quota.
  4. Click Submit Request.

note

Some quota requests require filing a ticket with GCP support. Response time is typically within 2 days.

warning

You must ensure that the region/location specified in each quota request matches the GCP region and zone that you wish to deploy in. For more information, see Supported GCP Regions.

Quota Type | Service | Metric | Location | New Value | Justification
All Quotas | Compute Engine API | CPUs | us-central1 (or primary deployment region) | 110 | The Glean search system deploys 20+ instances of crawler services on nodes, as well as multiple nodes of the Elastic index service in a Kubernetes cluster, so this quota is needed. Without this quota the system cannot be deployed to the project. We generally run on less than 50% of this quota, and go beyond 50% during Elastic index rolling deployments.
All Quotas | Compute Engine API | N2 CPUs | us-central1 (or primary deployment region) | 110 | The Glean search system deploys 20+ instances of crawler services on nodes, as well as multiple nodes of the Elastic index service in a Kubernetes cluster, so this quota is needed. Without this quota the system cannot be deployed to the project. We generally run on less than 50% of this quota, and go beyond 50% during Elastic index rolling deployments.
All Quotas | Compute Engine API | N2D CPUs | us-central1 (or primary deployment region) | 110 | The Glean search system deploys 20+ instances of crawler services on nodes, as well as multiple nodes of the Elastic index service in a Kubernetes cluster, so this quota is needed. Without this quota the system cannot be deployed to the project. We generally run on less than 50% of this quota, and go beyond 50% during Elastic index rolling deployments.
All Quotas | Compute Engine API | T2D CPUs | us-central1 (or primary deployment region) | 128 | The Glean search system runs batch Dataflow pipelines to generate training data, compute statistics, and perform model inference. Without this quota, these pipelines cannot run efficiently.
All Quotas | Compute Engine API | VM Instances | us-central1 (or primary deployment region) | 240 | The Glean search system deploys Dataflow jobs and a Kubernetes cluster, which create VM instances when jobs are launched.
All Quotas | Compute Engine API | NVIDIA T4 GPUs | us-central1 (or primary deployment region) | 4 | The Glean search system runs batch Dataflow pipelines to generate training data, compute statistics, and perform model inference. Without this quota, these pipelines cannot run efficiently.
All Quotas | Vertex AI API | Custom model training TPU V2 Cores | us-central1 (or primary deployment region) | 8 | The Glean search system trains a custom AI language model on the corpus, enabling features such as semantic search, synonyms, and more. We use these TPU accelerators to power the training.
All Quotas | Vertex AI API | Custom model training Nvidia V100 GPUs per region | us-central1 (or primary deployment region) | 8 | The Glean search system trains a custom AI language model on the corpus, enabling features such as semantic search, synonyms, and more. When there are no TPUs available, we use these GPU accelerators to power the training.
All Quotas | Vertex AI API | Custom model training Nvidia T4 GPUs per region | us-central1 (or primary deployment region) | 4 | The Glean search system trains a custom AI language model on the corpus, enabling features such as semantic search, synonyms, and more. When there are no TPUs or V100s available, we use these GPU accelerators to power the training.
All Quotas | Compute Engine API | Persistent Disk Standard | us-central1 (or primary deployment region) | 10 TB | The Glean search system stores millions of enterprise documents in Cloud SQL and in a search index with persistent storage. Due to the number and size of documents stored, we need the quota to be increased.
All Quotas | Compute Engine API | In-use IP addresses | us-central1 (or primary deployment region) | 20 | The Glean search system deploys 20-25 flex instances of crawler services on Kubernetes Engine, and each flex instance requires its own IP address.
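
Once the requests are filed, it is useful to track which quotas still fall short of the values in the table. The following is a minimal sketch that compares a set of current limits against those values. The required numbers are copied from the table; the current limits in the example are made up, and you would read the real ones from the Quotas page yourself.

```python
from typing import Dict, Tuple

# New values from the quota table. Persistent Disk Standard is in TB.
REQUIRED_QUOTAS: Dict[str, float] = {
    "CPUs": 110,
    "N2 CPUs": 110,
    "N2D CPUs": 110,
    "T2D CPUs": 128,
    "VM Instances": 240,
    "NVIDIA T4 GPUs": 4,
    "Custom model training TPU V2 Cores": 8,
    "Custom model training Nvidia V100 GPUs per region": 8,
    "Custom model training Nvidia T4 GPUs per region": 4,
    "Persistent Disk Standard": 10,
    "In-use IP addresses": 20,
}


def quota_shortfalls(current: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Map each metric below its required value to (current, required)."""
    shortfalls = {}
    for metric, required in REQUIRED_QUOTAS.items():
        have = current.get(metric, 0)  # a missing metric counts as zero
        if have < required:
            shortfalls[metric] = (have, required)
    return shortfalls


# Hypothetical current limits, for illustration only.
current_limits = dict(REQUIRED_QUOTAS, **{"CPUs": 24, "VM Instances": 72})
for metric, (have, need) in quota_shortfalls(current_limits).items():
    print(f"{metric}: have {have}, need {need}")
```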

6. Create a Service Account

The service account is used to allow Glean's systems to access the project and perform the build. You will create the service account and provide Glean with the private JSON key required to use it.

  1. Go to the Service Accounts page in the GCP console and click Select a Project.

  2. Click Create Service Account. Enter the service account name (glean-admin), ID, and description (optional), then click Create.

  3. In the Select a role dropdown, choose Owner to make your service account an Owner of the project. Click Continue.

  4. Ignore the Grant users access to this service account option. It is not required.

  5. Click Create Key. In the panel that appears, select the key type JSON, then Create. This will save a private JSON key to your computer.
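
The console steps above have a command-line equivalent. The Python sketch below only assembles the argument lists for the corresponding gcloud calls and does not run them. The key file name glean-admin-key.json and the helper names are illustrative assumptions, not part of Glean's process.

```python
from typing import List


def service_account_email(account_id: str, project_id: str) -> str:
    """Service accounts are addressed as <id>@<project>.iam.gserviceaccount.com."""
    return f"{account_id}@{project_id}.iam.gserviceaccount.com"


def service_account_commands(
    project_id: str,
    account_id: str = "glean-admin",
    key_path: str = "glean-admin-key.json",  # hypothetical local file name
) -> List[List[str]]:
    """Return argv lists that create the account, grant Owner, and create a key."""
    email = service_account_email(account_id, project_id)
    return [
        # Create the service account in the project.
        ["gcloud", "iam", "service-accounts", "create", account_id,
         f"--display-name={account_id}", f"--project={project_id}"],
        # Grant the Owner role on the project.
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         f"--member=serviceAccount:{email}", "--role=roles/owner"],
        # Create and download a JSON key for the account.
        ["gcloud", "iam", "service-accounts", "keys", "create", key_path,
         f"--iam-account={email}"],
    ]


for argv in service_account_commands("glean-company"):
    print(" ".join(argv))
```

The downloaded key grants Owner access to the project, so store it securely and delete local copies once it has been provided to Glean.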

7. Upload the Service Account Key to the Glean Admin UI

  1. If you haven't already, follow the instructions from the Access the Admin UI section of the Getting Started guide.

  2. On the page titled Create a Google Cloud Platform project, click the box under Step 2 to upload the private JSON key to Glean.

  3. Click Save. Glean will now use the JSON key to validate that all the steps above have been performed correctly.

Success

If the save is successful, your Glean tenant is ready to be built. Contact your Glean account team to proceed.

warning

If the save fails, you will be presented with a red error message detailing the issues to correct. The key must be saved correctly before the build of your Glean tenant can proceed.
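
Some upload failures can be caught before you reach the Admin UI. The sketch below is a local sanity check of the downloaded key file, assuming the standard fields of a GCP service account JSON key. It is not Glean's validation and does not replace it; the function name and messages are hypothetical.

```python
import json
from typing import List

# Fields present in a standard GCP service account key file.
REQUIRED_KEY_FIELDS = (
    "type", "project_id", "private_key_id", "private_key", "client_email",
)


def key_problems(key_json: str, expected_project_id: str) -> List[str]:
    """Return a list of problems found in the key; empty means it looks sane."""
    try:
        data = json.loads(key_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    for field in REQUIRED_KEY_FIELDS:
        if not data.get(field):
            problems.append(f"missing field: {field}")
    if data.get("type") and data["type"] != "service_account":
        problems.append("type is not 'service_account'")
    if data.get("project_id") and data["project_id"] != expected_project_id:
        problems.append("key belongs to a different project")
    return problems
```

A typical use is `key_problems(open(path).read(), "glean-company")`, where the path and Project ID are your own. The function never prints the key contents, so its output is safe to share when asking for help.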

Troubleshooting

For error codes and troubleshooting steps, see the Troubleshooting section.

FAQ