AWS FAQ
Frequently asked questions about Glean’s AWS deployment options, infrastructure requirements, and feature support
If you have questions about Glean’s AWS support that aren’t covered below, please reach out to Glean Support or your designated Glean contact.
General Information
Generally, yes. Glean supports any AWS region that offers all of the services we require.
However, certain regions have limitations. For example, as of early April 2024, VPC endpoints for specific managed services may be available only in us-east-1 and us-west-2.
We do not currently support GovCloud regions and have no immediate timeline for adding support.
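The region rule above can be expressed as a simple check. The following is a minimal sketch, not Glean's actual tooling: the required-service list is an assumption drawn from services named elsewhere in this FAQ, and the availability data (including the made-up region `xx-example-1`) is purely illustrative.

```python
from __future__ import annotations

# ASSUMPTION: illustrative list only. The full set of services Glean
# requires is not stated in this FAQ.
REQUIRED_SERVICES = {"eks", "rds", "s3", "lambda", "sagemaker", "waf"}


def is_govcloud(region: str) -> bool:
    """GovCloud region names start with 'us-gov-' (e.g. us-gov-west-1)."""
    return region.startswith("us-gov-")


def supported_regions(available: dict[str, set[str]]) -> list[str]:
    """Return non-GovCloud regions that offer every required service."""
    return sorted(
        region
        for region, services in available.items()
        if not is_govcloud(region) and REQUIRED_SERVICES <= services
    )


# Hypothetical availability data, not real AWS service coverage.
example = {
    "us-east-1": {"eks", "rds", "s3", "lambda", "sagemaker", "waf"},
    "xx-example-1": {"eks", "rds", "s3"},  # made-up region missing services
    "us-gov-west-1": {"eks", "rds", "s3", "lambda", "sagemaker", "waf"},
}

print(supported_regions(example))
```

Your Glean contact can confirm whether a specific region currently qualifies.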
LLM Support
Yes. As long as the LLM is supported by Glean, you can leverage it. Glean offers the following options:
- Anthropic Claude (via Bedrock, recommended)
- Anthropic Claude (BYOK)
- GPT (BYOK, Azure OpenAI)
- GPT (BYOK, OpenAI)
- GPT (Glean account key; additional charges apply)
- Gemini (BYOK)
Security
Glean requires access from:
- The central Glean project which orchestrates setup and release deployments
- The Glean AWS account which hosts the images
More information is available in the following documents (Trust Portal Access Required):
- Glean AWS Account Access and Deployment Model
- Glean Architecture on AWS
In some situations, the Glean on-call engineer needs admin-level access to remediate or mitigate an escalation. The engineer must first get approval from Glean leadership to use an internal Glean-side admin GCP service account, which then provides federated access to the AWS-side IAM admin role.
Yes.
We’re using AWS WAF natively: https://aws.amazon.com/waf/
Yes, this is enabled by default for all logs except for deny requests.
Currently, we do not apply AWS masking to our logs. Doing so would render the logs unusable in important support and debugging situations.
The path for incoming webhooks is:
- Webhooks first go through the WAF (you can add rules like IP restrictions)
- Then the application load balancer
- Then the k8s cluster
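To illustrate how an IP restriction at the first hop affects this path, here is a minimal sketch in plain Python. It models the logic only and is not AWS WAF configuration; the allowlist range and the function names are hypothetical.

```python
from __future__ import annotations

import ipaddress

# ASSUMPTION: hypothetical allowlist. 203.0.113.0/24 is a range reserved
# for documentation (RFC 5737), not a real vendor address block.
ALLOWED_SOURCES = [ipaddress.ip_network("203.0.113.0/24")]


def waf_allows(source_ip: str) -> bool:
    """Return True if the source address falls inside an allowed network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in ALLOWED_SOURCES)


def route_webhook(source_ip: str) -> list[str]:
    """Return the hops a webhook traverses, stopping at the WAF if blocked."""
    if not waf_allows(source_ip):
        return ["waf (blocked)"]
    return ["waf", "application load balancer", "k8s cluster"]


print(route_webhook("203.0.113.7"))
print(route_webhook("198.51.100.9"))
```

A request from outside the allowed range never reaches the load balancer or the cluster.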
The authentication scheme depends on the specific vendor’s API.
Please provide the details to our support team, who can discuss this further.
Glean recommends customers leverage AWS GuardDuty for IDS capabilities on AWS.
For more information, see: AWS GuardDuty and Glean
Networking
Glean will set up and deploy all infrastructure, including VPC components, within an empty AWS account the customer owns, so there is nothing that the customer needs to do proactively with respect to networking.
Compute
Generally Amazon Linux 2 on EKS nodes. We use the default AWS-provided AMIs here.
For some standalone EC2 instances, we run a Glean AMI built on top of Ubuntu 20.04 LTS (Focal).
Glean will handle the patching and maintenance of all compute instances. This is automated by our internal systems.
Cost & Resourcing
Currently we estimate that the infrastructure costs for AWS are 1.5x those of GCP. Glean will continue to iterate on reducing the cost on both platforms.
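As a rough worked example of that ratio, the sketch below scales a GCP infrastructure figure to an AWS estimate. The function name and the input amount are hypothetical, and the 1.5x factor is the current estimate stated above, which may change.

```python
# Current estimate from this FAQ; subject to change as costs are reduced.
AWS_TO_GCP_COST_RATIO = 1.5


def estimate_aws_cost(gcp_monthly_cost: float) -> float:
    """Scale a GCP infrastructure cost to a rough AWS equivalent."""
    return gcp_monthly_cost * AWS_TO_GCP_COST_RATIO


# Hypothetical input: a deployment costing 10,000 per month on GCP.
print(estimate_aws_cost(10_000.0))
```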
Glean will handle dynamically sizing all of the infrastructure based on many different factors relevant to the customer-specific corpus.
These details can vary depending on the characteristics of your Glean deployment. To receive accurate estimates, please provide your Glean contact with:
- Number of employees in your organization
- Number of documents in your corpus
- The data sources to be connected, and ideally the number of docs per data source
While these are high-level factors, there are many more nuances that affect data storage and processing requirements. We can provide estimates based on comparable deployments, but any numbers provided should be used as estimates only.
Our SageMaker training jobs require ml.g4dn.* instance types (primarily ml.g4dn.xlarge). We run about 1-4 training jobs a day, with varying runtimes from 30 minutes to a few hours.
However, none of the instances we explicitly create, e.g. on the EKS cluster, require GPUs.
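The training-job figures above translate into a range of GPU instance hours per day. This is a minimal sketch under stated assumptions: "a few hours" is taken as 3 hours for illustration, and no instance price is assumed, so multiply the result by your region's current ml.g4dn.xlarge training rate to get a cost range.

```python
from __future__ import annotations

# From this FAQ: about 1-4 training jobs a day, 30 minutes to a few hours each.
MIN_JOBS_PER_DAY = 1
MAX_JOBS_PER_DAY = 4
MIN_JOB_HOURS = 0.5
# ASSUMPTION: "a few hours" is taken as 3 hours for illustration only.
MAX_JOB_HOURS = 3.0


def daily_instance_hours_range() -> tuple[float, float]:
    """Return (low, high) GPU instance hours per day across all jobs."""
    low = MIN_JOBS_PER_DAY * MIN_JOB_HOURS
    high = MAX_JOBS_PER_DAY * MAX_JOB_HOURS
    return low, high


def daily_cost_range(hourly_rate: float) -> tuple[float, float]:
    """Scale the hours range by an hourly rate that the caller supplies."""
    low, high = daily_instance_hours_range()
    return low * hourly_rate, high * hourly_rate


print(daily_instance_hours_range())
```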
Most Glean-relevant traffic is ingress (incoming data). AWS generally does not charge for ingress.
Storage - RDS
We’re using AWS RDS for MySQL: https://aws.amazon.com/rds/mysql/
Once a day.
Storage - S3
No, we don’t enable Inventory.
No.
No, we don’t configure cross-region replication and in practice have not had a strong reason to.
Lambda
These lambdas are used for:
- Setup & deployment (Bootstrap configuration template)
- Maintenance operations and cron jobs, e.g. restarting or upgrading node pools
No. None of them are publicly accessible.
No, Glean doesn’t add layers to lambdas.
No, they are disabled.
Disaster Recovery
Please refer to the Glean Business Continuity & Disaster Recovery Policy (Trust Portal Access Required).
Feature Support
Yes. Vanity URLs, i.e. companyname.glean.com, are supported on AWS.
Glean on AWS has feature parity with Glean on GCP, with the exception of:
- DLP / Sensitive Content Reporting: This feature depends on GCP’s DLP service. While AWS does have a similar service that Glean could leverage, it functions in a fundamentally different way (requiring a full export of all data to S3), which is incompatible with our platform. We are investigating alternative options to bring this capability to our AWS customers.
- OCR: OCR is an optional feature and requires a subscription uplift. Glean leverages GCP Cloud Vision for OCR capabilities. We are currently investigating the use of a local multimodal LLM to provide this capability instead.