Review the AWS Account Access and Deployment Model. This document describes the proposed access model between Glean and a customer’s account as well as how Glean will deploy software within an account.
glean-deployer: This role is used by Glean’s central infrastructure to perform deployment operations such as software setup and upgrades. It has permissions to invoke specific lambdas, manage load balancers and SSL certificates, and manipulate secrets in AWS Secrets Manager.
glean-viewer: This role is used by Glean engineers for viewing resources during debugging. It provides read access to a subset of account resources, including EC2 instances, load balancers, CloudWatch metrics, VPC components, Lambda functions, and EKS clusters.
cron-helper-invoker: This role is assumed by a Glean Central project service account to orchestrate workflows like machine learning jobs by invoking the cron_helper lambda function.
Glean does not support customer-managed keys (CMK). Please verify that CMK is not enabled. Note: CMK can block setup and deployment.
Create a new, empty AWS account for Glean in the region of interest.
Important: Glean provides a read-only script that checks whether you have Service Control Policies (SCPs) that could impact the Glean deployment. Please run the Glean Service Control Policies (SCP) Checker script. Please refer to the documentation here for details.
Notify Glean if you have any issues or conflicts after the script has run.
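To make the kind of conflict the checker looks for more concrete, here is a minimal sketch of how a Deny statement in an SCP can collide with actions a deployment needs. This is not the Glean SCP Checker script. The function name, the example policy, and the list of needed actions are hypothetical and chosen only for illustration, and the matching is deliberately simplified (it ignores Condition, Resource, and NotAction).

```python
from fnmatch import fnmatchcase


def _as_list(value):
    # IAM policy fields may hold a single item or a list of items.
    return value if isinstance(value, list) else [value]


def denied_actions(scp_document, needed_actions):
    """Return the needed actions matched by any Deny statement's Action patterns.

    Simplification: a Deny with a Condition is still reported, so treat
    every hit as something to review rather than as a confirmed block.
    """
    hits = set()
    for stmt in _as_list(scp_document.get("Statement", [])):
        if stmt.get("Effect") != "Deny" or "Action" not in stmt:
            continue
        for pattern in _as_list(stmt["Action"]):
            for action in needed_actions:
                # IAM action names are case-insensitive and allow * and ? wildcards.
                if fnmatchcase(action.lower(), pattern.lower()):
                    hits.add(action)
    return sorted(hits)


# Hypothetical SCP and hypothetical list of needed actions.
example_scp = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Deny", "Action": ["secretsmanager:*"], "Resource": "*"},
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
    ],
}
needed = ["lambda:InvokeFunction", "secretsmanager:PutSecretValue"]
print(denied_actions(example_scp, needed))
```

Any action reported this way is worth raising with Glean before deployment begins.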
Notify Glean when the initial account setup is complete and you have executed the SCP Checker script, and provide the following details:
AWS Account ID (e.g., #182306642168)
AWS Account Name (e.g., aws-glean-customer)
AWS Region (e.g., us-west-1)
Your organization’s email domain(s) (e.g. @acme.com)
Your SaaS admin email address(es) (e.g. johnsmith@acme.com, Admin of Slack)
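Before sending these details, a quick format check can catch typos. The sketch below is an illustration only: the function name and the patterns are assumptions, and it checks shape rather than whether the account or region actually exists. The one hard rule it relies on is that AWS account IDs are 12 digits.

```python
import re

ACCOUNT_ID_RE = re.compile(r"^\d{12}$")              # AWS account IDs are 12 digits
REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d$")   # shape of names like us-west-1
DOMAIN_RE = re.compile(r"^@?[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_handoff_details(account_id, account_name, region, email_domains, admin_emails):
    """Return a list of human-readable problems; an empty list means the shapes look fine."""
    problems = []
    # Tolerate a leading '#', as in the example above.
    if not ACCOUNT_ID_RE.match(account_id.lstrip("#")):
        problems.append("account ID must be 12 digits")
    if not account_name.strip():
        problems.append("account name is empty")
    if not REGION_RE.match(region):
        problems.append("region does not look like an AWS region name")
    for domain in email_domains:
        if not DOMAIN_RE.match(domain):
            problems.append(f"email domain looks malformed: {domain}")
    for email in admin_emails:
        if not EMAIL_RE.match(email):
            problems.append(f"admin email looks malformed: {email}")
    return problems


print(check_handoff_details(
    "#182306642168", "aws-glean-customer", "us-west-1",
    ["@acme.com"], ["johnsmith@acme.com"],
))
```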
After setup, the admin(s) will receive a “magic link” to access the Glean platform/interface to aid in setup.
Note: After the information above has been provided to the Glean team, please stand by for further instructions.
Once instructed by Glean, prepare to run the AWS CloudFormation Template (CFT) by following the steps outlined below:
Log in to the new AWS Account as an Administrator.
Navigate the console to your preferred (Glean-approved) region.
Navigate to the CloudFormation service > Create a new Stack > Upload the Template via the link below… > You can skip all other fields > When you are ready, click Submit to deploy the stack.
Note: After executing the AWS CloudFormation template in Step 5, the deployment process will take over an hour to complete. Once finished, an “External ID” will be generated. Glean will be notified of the External ID and will, in turn, inform your team, as this information is required for Step 6.
Please create the Glean Admin role via a different CloudFormation Template with your external ID. The Glean Team will provide an external ID to be used as a parameter when running the CloudFormation template:
STOP: Please wait for further instructions from the Glean team.
Important Notes:
The Glean Admin role will only be used in urgent and severe situations where Glean’s on-call support engineers need short-lived admin-level access to root-cause and address issues. Access is short-lived in all cases and requires direct approval from the Glean leadership team.
If you utilize AWS Config, please review the cost reduction documentation, as the default AWS Config settings can lead to unexpectedly high costs. Glean provides recommended settings for AWS Config that maintain its value while keeping costs low. If you’re not actively using AWS Config, you can disable it completely to reduce costs.
What are additional processes within the AWS environment that Glean should be aware of?
What is the preferred method of connecting on-prem data sources to the Glean VPC? At Glean we prefer using Site-to-Site VPN but support Transit Gateway Peering, Shared Transit Gateway, and PrivateLink.
Generally yes. We now support any region that offers all of our required services.
Certain regions have limitations. For example, as of early April 2024, VPC endpoints for specific managed services may be available only in us-east-1 and us-west-2. You can look through https://www.aws-services.info/ to see which services are supported in your desired region.
We do not currently support GovCloud regions, and we have no immediate time frame for adding support.
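The region rule above amounts to a set comparison, sketched below. The service list is illustrative only, assembled from services named elsewhere in this document, and is not Glean's actual list of required services. The function names are likewise hypothetical. The per-region availability data would come from a source such as the site linked above.

```python
# Illustrative only: NOT Glean's actual list of required services.
REQUIRED_SERVICES = {"ec2", "eks", "lambda", "rds", "s3", "sagemaker", "secretsmanager"}


def missing_services(available_in_region, required=REQUIRED_SERVICES):
    """Return the required services that the region does not offer."""
    return sorted(set(required) - set(available_in_region))


def region_is_candidate(region, available_in_region, required=REQUIRED_SERVICES):
    """A region is a candidate if it is not GovCloud and offers every required service."""
    if region.startswith("us-gov-"):
        return False
    return not missing_services(available_in_region, required)


# Hypothetical availability data for two regions.
full = {"ec2", "eks", "lambda", "rds", "s3", "sagemaker", "secretsmanager", "sqs"}
partial = {"ec2", "eks", "lambda", "rds", "s3"}

print(region_is_candidate("us-west-2", full))
print(missing_services(partial))
```

A region that passes this check may still have the limitations described above, so confirm the final choice with Glean.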
Why does Glean request the customer to create an admin role?
There are situations where the Glean on-call engineer needs admin-level access to remediate or mitigate escalations. They must get approval from Glean leadership to access the Glean-side internal admin service account, which can then be used for federated access to the AWS-side IAM admin role.
Do you apply data protection filters on CloudWatch logs?
Currently, we do not apply this masking to our logs. We’re discussing internally if it makes sense to apply, but we are wary of rendering logs unusable in important support and debugging situations.
Can we attach custom security groups to one of the managed services?
Please provide the details to our support team, who can discuss this further.
Does Glean provide any Intrusion Detection capabilities on AWS?
Glean recommends that customers use Amazon GuardDuty for IDS capabilities on AWS. See this doc for more information.
Glean will set up and deploy all infrastructure, including VPC components, within an empty AWS account the customer owns, so there is nothing that the customer needs to do proactively with respect to networking.
What OS are the EC2 instances running on and where do the AMIs come from?
Generally Amazon Linux 2 on EKS nodes. We use the default AWS-provided AMIs here.
For some standalone EC2 instances, we run a Glean AMI image built on top of Ubuntu 20.04 LTS (Focal).
Use of custom AMIs is not currently supported.
Will Glean patch the OS, or is that the customer’s responsibility?
Glean will handle the patching and maintenance of all compute instances.
Can you give me an estimate of the cost of the AWS resources? Can you give me an estimate of (1) how much data is transferred out of the AWS account per day (2) number of instances and their sizes across all services (e.g. EC2, RDS, EKS, S3, SageMaker)?
All of this can vary depending on the characteristics of your Glean deployment. To answer this question, please reach out to your Glean contact with the following information:
Number of employees in your organization
Number of documents in your corpus
The data sources to be connected, and ideally the number of docs per data source
While these are some high-level factors, many more nuances go into figuring out how much data needs to be stored and processed. We can provide some estimates based on comparable deployments.
Our SageMaker training jobs require ml.g4dn.* instance types (primarily ml.g4dn.xlarge). We run about 1-4 training jobs a day, with varying runtimes from 30 minutes to a few hours.
However, none of the instances we explicitly create, e.g. on the EKS cluster, require GPUs.
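For rough capacity or quota planning, the training figures above can be turned into a daily range of GPU instance-hours. This is a back-of-the-envelope sketch under two assumptions that are not stated above: "a few hours" is taken as 3 hours, and each job is assumed to run on a single instance.

```python
def daily_training_hours(jobs_per_day, hours_per_job):
    """Return (low, high) instance-hours per day from (min, max) ranges.

    Assumes one instance per training job.
    """
    low = jobs_per_day[0] * hours_per_job[0]
    high = jobs_per_day[1] * hours_per_job[1]
    return low, high


# 1-4 jobs a day, 30 minutes to an assumed upper bound of 3 hours each.
print(daily_training_hours((1, 4), (0.5, 3.0)))  # (0.5, 12.0)
```

Multiplying either end of the range by the current on-demand price for the instance type in your region gives a rough daily cost bound.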