Service restoration access

Glean provides a secure, audited, and time-bound service restoration process. This mechanism allows on-call engineers to restore service during rare, high-severity incidents when normal administrative paths are unavailable. The process is purpose-built to maximize safety and restore business continuity without granting persistent access or exposing customer data.

Infrastructure and security

To facilitate this, an isolated backend service automatically executes pre-approved infrastructure commands using least-privilege service account credentials.

  • Restricted access: Engineers submit commands through an internal portal accessible only to approved production personnel.
  • Mandatory approval: Every command requires peer review and approval before execution.
  • Zero direct access: Employees are never granted direct permission to execute commands via this service account.
  • Framework isolation: This process operates independently from Glean’s Debug Operation request handling framework.

Key advantages

Business continuity: Ensures incident mitigation during rare outages, even if standard paths—such as vault keys or admin controls—are impaired.
Security by design: Enforces security through zero persistent entitlements, a narrow scope of action, and strict traceability.
Operational accountability: Provides transparent auditing for every request, approval, and command.

Architectural and security specifications

  • Isolated execution environment: A lightweight, fully isolated service that runs within the same Kubernetes cluster but operates in a separate namespace from the main application. This separation reduces the attack surface and maintains operational independence.
  • Time-bound, least-privilege credentials: The system generates temporary credentials scoped exclusively to infrastructure commands (e.g., kubectl, gcloud, aws, az). These credentials explicitly deny access to customer data stores.
  • Zero standing privileges: No personnel at Glean maintain persistent service restoration access. The system creates access just-in-time, limits usage to a minimum duration, and ensures automatic expiration.
  • Auditable approval workflow: Requests require a clear justification and approval by an authorized second party (the "two-person rule"). The system prohibits self-approval.
  • Immutable auditing: The system logs every action—including requester, approver, execution details, timestamps, and justification—to a tamper-evident cloud-provider audit log. These logs are available to customers upon request.
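The time-bound, just-in-time behavior described above can be illustrated with a minimal sketch. It is not Glean's implementation: the `TemporaryGrant` type, the `issue_grant` helper, and the 30-minute `MAX_TTL` cap are all hypothetical names and values chosen for illustration, since this document does not state the real maximum duration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical cap; the real maximum duration is not stated in this document.
MAX_TTL = timedelta(minutes=30)


@dataclass(frozen=True)
class TemporaryGrant:
    """An illustrative just-in-time grant that expires on its own."""
    request_id: str
    issued_at: datetime
    expires_at: datetime

    def is_valid(self, now: datetime) -> bool:
        # Valid only inside the half-open window [issued_at, expires_at).
        return self.issued_at <= now < self.expires_at


def issue_grant(request_id: str, requested_ttl: timedelta, now: datetime) -> TemporaryGrant:
    """Create a grant whose lifetime never exceeds MAX_TTL."""
    if requested_ttl <= timedelta(0):
        raise ValueError("timeout must be positive")
    ttl = min(requested_ttl, MAX_TTL)
    return TemporaryGrant(request_id, now, now + ttl)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = issue_grant("req-001", timedelta(hours=2), now)  # clamped to MAX_TTL
print(grant.is_valid(now + timedelta(minutes=10)))  # inside the window
print(grant.is_valid(now + timedelta(minutes=31)))  # after expiry
```

Because the grant carries its own expiry and nothing is stored as a standing entitlement, there is no revocation step to forget: once the window closes, the grant is simply no longer valid.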

Scope and restrictions

Area | Not included
Data access | Prohibits direct access to customer data stores.
Persistent access | No long‑lived credentials, backdoors, or standing entitlements.
Routine ops | Excludes day-to-day maintenance and convenience tasks.
One‑person control | Prevents any single actor from both requesting and approving access.

How it works

  • Request: A production on‑call engineer submits a service restoration request via Glean’s internal admin UI, including a justification and an intended timeout.

  • Approval: A separate authorized production engineer or manager (Glean employee) reviews and approves or rejects the request. Approval is required before any command can be executed.

  • Execution in isolation: Upon receiving an approved command payload, the isolated service automatically executes the vetted infrastructure command using least-privilege service account credentials. Access to customer data is explicitly denied by policy.

  • Audit everywhere: The system logs the unique request identifier, requester, approver, timestamps, commands, and outcomes in an immutable audit store.
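The request, approval, execution, and audit steps above can be sketched as a small state machine. This is an illustrative model under stated assumptions, not Glean's code: the `RestorationRequest` class, its field names, the `Status` values, and the `runner` callable are all hypothetical, and the in-memory list stands in for what the document describes as an immutable audit store.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXECUTED = "executed"


@dataclass
class RestorationRequest:
    """Hypothetical model of one service restoration request."""
    request_id: str
    requester: str
    command: str
    justification: str
    timeout_minutes: int
    status: Status = Status.PENDING
    approver: Optional[str] = None
    audit_log: List[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        self._audit("requested", self.requester, self.justification)

    def _audit(self, event: str, actor: str, detail: str = "") -> None:
        # Append-only by convention here; a real store would be immutable.
        self.audit_log.append({
            "request_id": self.request_id,
            "event": event,
            "actor": actor,
            "detail": detail,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def review(self, approver: str, approve: bool) -> None:
        """Record a second party's decision; self-approval is refused."""
        if self.status is not Status.PENDING:
            raise RuntimeError("request has already been reviewed")
        if approver == self.requester:
            self._audit("self_approval_refused", approver)
            raise PermissionError("requester cannot approve their own request")
        self.approver = approver
        self.status = Status.APPROVED if approve else Status.REJECTED
        self._audit(self.status.value, approver)

    def execute(self, runner: Callable[[str], str]) -> str:
        """Run the command through `runner` only after approval."""
        if self.status is not Status.APPROVED:
            raise PermissionError("command requires approval before execution")
        outcome = runner(self.command)
        self.status = Status.EXECUTED
        self._audit("executed", "isolated-service", outcome)
        return outcome


req = RestorationRequest(
    request_id="req-001",
    requester="alice",
    command="kubectl rollout restart deployment/web",
    justification="Restore service during a high-severity incident",
    timeout_minutes=15,
)
req.review(approver="bob", approve=True)
result = req.execute(lambda cmd: "ok")  # stand-in runner; nothing is executed
```

The two guards carry the safeguards named in the workflow: `review` refuses a decision from the requester, and `execute` refuses to run anything that has not been approved. Every transition, including a refused self-approval, leaves an audit entry.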

Customer safeguards by design

  • Strong separation of duties with the two‑person rule.

  • Explicit denial of data store access for querying or viewing customer data.

  • Short‑lived execution credentials that automatically expire.

  • Full transparency via immutable, time‑sequenced audit logs.

  • Principle of least privilege applied to all service restoration access actions.

  • No shared infrastructure between the service restoration access service and the application runtime to minimize blast radius.

Permissions granted to the service

The following roles enable troubleshooting and administration beyond what on-call engineers have by default.

Cloud service account “service-restoration-gke” with roles:

  • Kubernetes Engine Developer
  • Dataflow Developer
  • Custom Cloud Build Developer
  • App Engine Service Admin
  • Cloud SQL Admin
  • Service Usage Admin
  • Cloud Run Admin
  • Cloud Scheduler Admin
  • Cloud Functions Admin
  • Cloud Tasks Admin
  • Redis Admin
  • IAM Role Viewer
  • Service Account Viewer
  • Compute Viewer
  • Compute Network User
  • Compute Load Balancer Viewer
  • DNS Reader

Kubernetes service account “service-restoration-gke”, on glean-cluster only, with role:

  • Kubernetes Cluster Admin

Blocked operations

The service restoration service blocks all commands that could expose customer data:

Database access

  • Direct database connections (mysql, psql, redis-cli)
  • Database dumps and backups (mysqldump, pg_dump)
  • Data export/import operations

Object storage

  • Google Cloud Storage (gsutil, gcloud storage)
  • AWS S3 (aws s3, s3api, s3control)
  • Azure Blob Storage (az storage)
  • Third-party storage tools (rclone, MinIO)

File operations

  • Reading sensitive file types (.sql, .db, .json, .csv, .parquet, .log)
  • File transfers (ssh, scp, rsync)

Secrets & credentials

  • Viewing secrets (kubectl get secret, gcloud secrets, aws secretsmanager, az keyvault)

Command chaining

  • Shell operators (pipes, semicolons, backgrounding) are blocked to prevent bypass attempts
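A command filter of the kind listed above might look like the following sketch. It is a simplified denylist under stated assumptions, not Glean's actual filter: the `check_command` function and the constant names are hypothetical, the lists cover only tools named in this section, and the example resource names (`deployment/web`, `db-pass`, `dump.sql`) are made up.

```python
import shlex
from typing import Tuple

# Programs blocked outright (database clients, dump tools, storage and transfer tools).
BLOCKED_PROGRAMS = {
    "mysql", "psql", "redis-cli", "mysqldump", "pg_dump",
    "gsutil", "rclone", "ssh", "scp", "rsync",
}
# Subcommand prefixes blocked for otherwise permitted CLIs.
BLOCKED_PREFIXES = [
    ("gcloud", "storage"), ("gcloud", "secrets"),
    ("aws", "s3"), ("aws", "s3api"), ("aws", "s3control"), ("aws", "secretsmanager"),
    ("az", "storage"), ("az", "keyvault"),
    ("kubectl", "get", "secret"), ("kubectl", "get", "secrets"),
]
SHELL_OPERATORS = ("|", ";", "&")
SENSITIVE_SUFFIXES = (".sql", ".db", ".json", ".csv", ".parquet", ".log")


def check_command(command: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a single command line."""
    # Reject chaining first, on the raw string, so quoting cannot hide an operator.
    for op in SHELL_OPERATORS:
        if op in command:
            return False, f"shell operator {op!r} is not allowed"
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False, "command could not be parsed"
    if not tokens:
        return False, "empty command"
    if tokens[0] in BLOCKED_PROGRAMS:
        return False, f"program {tokens[0]!r} is blocked"
    for prefix in BLOCKED_PREFIXES:
        if tuple(tokens[:len(prefix)]) == prefix:
            return False, f"subcommand {' '.join(prefix)!r} is blocked"
    for token in tokens[1:]:
        if token.lower().endswith(SENSITIVE_SUFFIXES):
            return False, f"argument {token!r} has a sensitive file type"
    return True, "allowed"


for cmd in (
    "kubectl rollout restart deployment/web",
    "kubectl get secret db-pass",
    "kubectl get pods | grep web",
    "cat dump.sql",
):
    print(cmd, "->", check_command(cmd))
```

The sketch deliberately errs on the side of refusal: it rejects any operator character even inside quotes, and any argument ending in a listed extension. A plain prefix match like this one would still miss variants such as flags placed before the subcommand, which is one reason a filter of this kind is a complement to, not a replacement for, the credential-level denial of data store access.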

This capability is part of Glean’s standard reliability and safety controls to ensure we can restore service under adverse conditions. You can learn more about Glean’s access to customer deployments in the Trust Center.

Frequently asked questions