Best practices for production

Review the following recommended practices for preparing optional security measures and setting up Gloo in a production environment.

Deployment model

A production Gloo Gateway setup consists of one management cluster that the Gloo Gateway management components are installed in, and one or more workload clusters that run gateway proxies which are registered with and managed by Gloo Gateway. The management cluster serves as the management plane, and the workload clusters serve as the data plane, as depicted in the following diagram.

By default, the management server is deployed with one replica. To increase availability, you can increase the number of replicas that you deploy in the management cluster. Additionally, you can create multiple management clusters, and deploy one or more replicas of the management server to each cluster. For more information, see High availability and disaster recovery.

In a production deployment, you typically want to avoid installing the management plane into a workload cluster that also runs a gateway proxy and other app workloads. Although Gloo Gateway remains fully functional when the management and agent components both run within the same cluster, workload pods might act as noisy neighbors that consume cluster resources and potentially constrain the management processes. This constraint on management processes can in turn affect other workload clusters that the management components oversee. However, you can prevent resource consumption issues by using Kubernetes best practices, such as node affinity, resource requests, and resource limits; for an example, see the sketch after the following diagram. Note that in such a setup, you must also use the same name for the cluster during both the management plane installation and cluster registration.

Figure of a multicluster Gloo quick-start architecture, with a dedicated management cluster.
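For example, a minimal sketch of higher replica counts and resource isolation in the management plane Helm values might look like the following. It uses the deploymentOverrides field that is described later in this topic; the node label, container name, and resource values are assumptions to adjust for your environment.

glooMgmtServer:
  deploymentOverrides:
    spec:
      # Run multiple replicas of the management server for higher availability.
      replicas: 3
      template:
        spec:
          # Hypothetical label for nodes that are dedicated to management components.
          nodeSelector:
            dedicated: gloo-management
          containers:
            # Default container name (assumption); must match your deployment.
            - name: gloo-mesh-mgmt-server
              resources:
                requests:
                  cpu: "2"
                  memory: 4Gi
                limits:
                  memory: 4Gi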

Control plane settings

Before you install the Gloo Gateway management plane into your management cluster, review the following options to help secure your installation. Each section details the benefits of the security option, and the necessary settings to specify in a Helm values file to use during your Helm installation.

You can see all possible fields for the Helm chart by running the following command:

helm show values gloo-platform/gloo-platform --version v2.5.3 > all-values.yaml

You can also review these fields in the Helm values documentation.

Licensing

During installation, you can provide your license key strings directly in license fields such as glooGatewayLicenseKey. For a more secure setup, you might want to provide those license keys in a secret instead.

  1. Before you install Gloo Gateway, create a secret with your license keys in the gloo-mesh namespace of your management cluster.
    cat << EOF | kubectl apply -n gloo-mesh -f -
    apiVersion: v1
    kind: Secret
    type: Opaque
    metadata:
      name: license-secret
      namespace: gloo-mesh
    data:
      # Provide each license key that you have as a base64-encoded string.
      # Leave any license that you do not have as an empty string.
      gloo-mesh-license-key: ""
      gloo-network-license-key: ""
      gloo-gateway-license-key: ""
      gloo-trial-license-key: ""
    EOF
    
  2. When you install the Gloo Gateway management plane in your management cluster, specify the secret name as the value for the licensing.licenseSecretName field in your Helm values file.
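    For example, with the secret from the previous step:

    licensing:
      # Name of the secret that you created in the gloo-mesh namespace
      licenseSecretName: license-secret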

FIPS-compliant image

If your environment runs workloads that must comply with the Federal Information Processing Standards (FIPS), you can use images of Gloo Gateway components that are specially built to comply with NIST FIPS. Open the values.yaml file, search for the image section, and append -fips to the tag, as in the following example.
...
glooMgmtServer:
  image:
    pullPolicy: IfNotPresent
    registry: gcr.io/gloo-mesh
    repository: gloo-mesh-mgmt-server
    tag: 2.5.3-fips

Relay connection

Gloo offers options to secure the relay connection between the Gloo management server and agent by using simple or mutual TLS. In POC or test environments, you can use the self-signed certificates that Gloo Gateway generates for you. However, in production environments, use your own custom TLS certificates for the Gloo management server and, optionally, the Gloo agent. Derive these certificates from a root or intermediate CA that is stored with your preferred PKI provider.

For more information about available options to secure the relay connection, see Setup options.

When bringing your own certificate for the Gloo management server, you must also bring your own certificate for the Gloo telemetry pipeline.
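As a sketch only, wiring up a custom server certificate for the relay connection might look like the following in your management plane Helm values. The field and secret names are assumptions that can vary by Gloo version, so verify them against the Helm values documentation and the Setup options guide, and make sure that the referenced secret exists before you install.

glooMgmtServer:
  relay:
    # Secret in the gloo-mesh namespace with your custom server TLS certificate (assumed name).
    tlsSecret:
      name: relay-server-tls-secret
    # Skip generation of the default self-signed certificates.
    disableCa: true
    disableCaCertGeneration: true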

Certificate management

If you decide to secure the connection between the Gloo management server and agent by using mutual TLS, you can use Gloo Gateway's built-in capability to automatically rotate the client TLS certificate that the agents use to prove their identity before the certificate expires. Note that this option is available only if the intermediate CA credentials are stored in a Kubernetes secret on the management cluster. For more information, see Bring your own server TLS certificate.

Other certificates, such as the relay root CA and intermediate CA certificates, as well as the server TLS certificate that the Gloo management server uses to prove its identity to the Gloo agent, are not automatically rotated by Gloo Gateway. Instead, you must set up your own process to monitor the expiration of these certificates and to rotate them before they expire.
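For example, a quick way to spot-check expiration is to decode a certificate from its secret with openssl. The secret name in this sketch is an assumption; point the command at the secrets where your relay certificates are stored.

kubectl get secret relay-server-tls-secret -n gloo-mesh \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate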

For more information about the certificate lifecycle, see Certificate rotation overview.

Deployment and service overrides

In some cases, you might need to modify the default deployment of the glooMgmtServer with your own Kubernetes resources. You can specify resources and annotations for the management server deployment in the glooMgmtServer.deploymentOverrides field, and resources and annotations for the service that exposes the deployment in the glooMgmtServer.serviceOverrides field.

Most commonly, the serviceOverrides section specifies cloud provider-specific annotations that might be required for your environment. For example, the following section applies the recommended Amazon Web Services (AWS) annotations for modifying the created load balancer service.

glooMgmtServer:
  serviceOverrides:
    metadata:
      annotations:
        # AWS-specific annotations
        service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
        service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
        service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "10"
        service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "9900"
        service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: "tcp"

        service.beta.kubernetes.io/aws-load-balancer-type: external
        service.beta.kubernetes.io/aws-load-balancer-scheme: internal
        service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
        service.beta.kubernetes.io/aws-load-balancer-backend-protocol: TCP
        service.beta.kubernetes.io/aws-load-balancer-private-ipv4-addresses: 10.0.50.50, 10.0.64.50
        service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-0478784f04c486de5, subnet-09d0cf74c0117fcf3
        service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: deregistration_delay.connection_termination.enabled=true,deregistration_delay.timeout_seconds=1
  # Kubernetes load balancer service type
  serviceType: LoadBalancer
  ...

In less common cases, you might want to provide other resources, like a config map or service account. This example shows how you might use the deploymentOverrides to add a config map and mount it into the management server container.

glooMgmtServer:
  deploymentOverrides:
    spec:
      template:
        spec:
          containers:
            # The container name must match the management server container in the default deployment.
            - name: gloo-mesh-mgmt-server
              volumeMounts:
                - name: envoy-config
                  # Example mount path; choose the path that your use case requires.
                  mountPath: /etc/envoy
          volumes:
            # Config maps are defined as volumes in the pod spec and mounted into a container.
            - name: envoy-config
              configMap:
                name: my-custom-envoy-config
  ...

UI authentication

The Gloo UI supports OpenID Connect (OIDC) authentication from common providers such as Google, Okta, and Auth0. Users that access the UI must authenticate with the OIDC provider, and all requests to retrieve data from the API are authenticated.

You can configure OIDC authentication for the UI by providing your OIDC provider details in the glooUi section, such as the following.

...
glooUi:
  enabled: true
  auth:
    enabled: true
    backend: oidc
    oidc:
      appUrl: # The URL that the UI for the OIDC app is available at, from the DNS and other ingress settings that expose the OIDC app UI service.
      clientId: # From the OIDC provider
      clientSecret: # From the OIDC provider. Stored in a secret.
      clientSecretName: dashboard
      issuerUrl: # The issuer URL from the OIDC provider, usually something like 'https://<domain>.<provider_url>/'.
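The clientSecretName field references a Kubernetes secret that stores the client secret value. As a minimal sketch, you might create it as follows; the oidc-client-secret key name is an assumption, so check the Gloo UI authentication documentation for the exact format that your version expects.

kubectl create secret generic dashboard -n gloo-mesh \
  --from-literal=oidc-client-secret=<client-secret-from-your-oidc-provider>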

Redis instance

By default, a Redis instance is deployed for certain management plane components, such as the Gloo management server and Gloo UI. For a production deployment, you can disable the default Redis deployment and provide your own backing database instead.
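As a minimal sketch, disabling the built-in Redis instance might look like the following in your Helm values. The redis.deployment.enabled field is an assumption; the exact fields for connecting the management plane to your own database are covered in the Backing databases guide.

redis:
  deployment:
    # Disable the built-in Redis instance and bring your own backing database.
    enabled: false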

For more information, see Backing databases.

Redis I/O threads

A new Helm value, redis.deployment.ioThreads, was introduced to specify the number of I/O threads for the built-in Redis instance. Redis is mostly single-threaded; however, some operations, such as UNLINK or slow I/O accesses, can be performed on side threads. Increasing the number of side threads can help improve and maximize the performance of Redis, because these operations can run in parallel.

The default and minimum valid value for this setting is 1. If you plan to increase the number of I/O side threads, make sure that you also change the CPU requests and CPU limits for the Redis pod. Set the CPU requests and limits to the number of I/O side threads plus 1. That way, each side thread has an available CPU core, and an additional CPU core is left for the main Redis thread. For example, if you want to set the I/O threads to 2, set the resource requests and limits for the Redis pod to 3 CPU cores. You can find further recommendations regarding I/O threads in this Redis configuration example.

If you set I/O threads, the Redis pod must be restarted during the upgrade so that the changes can be applied. During the restart, the input snapshots from all connected Gloo agents are removed from the Redis cache. If you also update settings in the Gloo management server that require the management server pod to restart, the management server's local memory is cleared and all Gloo agents are disconnected. Although the Gloo agents attempt to reconnect to send their input snapshots and re-populate the Redis cache, some agents might take longer to connect or fail to connect at all. To ensure that the Gloo management server halts translation until the input snapshots of all workload cluster agents are present in Redis, it is recommended to enable safe mode on the management server alongside updating the I/O threads for the Redis pod. For more information, see Safe mode. Note that in version 2.6.0 and later, safe mode is enabled by default.

To update the I/O side threads in Redis as part of your Gloo Gateway upgrade:

  1. Scale down the number of Gloo management server pods to 0.

    kubectl scale deployment gloo-mesh-mgmt-server --replicas=0 -n gloo-mesh
    
  2. Upgrade Gloo Mesh Gateway and use the following settings in your Helm values file for the management server. Make sure to also increase the number of CPU cores to one core per thread, and add an additional CPU core for the main Redis thread. The following example also enables safe mode on the Gloo management server to ensure translation is done with the complete context of all workload clusters.

    glooMgmtServer:
      safeMode: true
    redis: 
      deployment: 
        ioThreads: 2
        resources: 
          requests: 
            cpu: 3
          limits: 
            cpu: 3
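
  3. Verify that the Redis and management server pods come back up after the upgrade. The deployment names in this sketch are the defaults; adjust them if you customized your installation. If the management server does not scale back up automatically after the Helm upgrade, scale it manually to your desired number of replicas.

    kubectl scale deployment gloo-mesh-mgmt-server --replicas=1 -n gloo-mesh
    kubectl rollout status deployment/gloo-mesh-redis -n gloo-mesh
    kubectl rollout status deployment/gloo-mesh-mgmt-server -n gloo-mesh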
    

Prometheus metrics

By default, a Prometheus instance is deployed with the management plane Helm chart to collect metrics for the Gloo management server. For a production deployment, you can either replace the built-in Prometheus server with your own instance, or locally federate metrics and provide them to your production monitoring system. For more information on each option, see Best practices for collecting metrics in production.
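For example, if you bring your own Prometheus server, a minimal sketch for disabling the built-in instance in your Helm values might look like the following. The prometheus.enabled field is an assumption; verify it against the Helm values documentation.

prometheus:
  # Disable the built-in Prometheus server in favor of your own monitoring system.
  enabled: false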

Redis safe mode

By default, safe mode is enabled on the Gloo management server to ensure that custom Gloo resources are translated only if the complete context of all workload clusters is populated in Redis or the management server's local memory.

If you disabled safe mode during your tests, such as to register a workload cluster with connectivity issues, it is recommended to enable safe mode before you move to production. To enable safe mode, add the following values to the Helm values file for the Gloo management plane:

glooMgmtServer:
  safeMode: true

To learn more about safe mode and how translation works in Gloo Gateway, see Safe mode.

Data plane settings

Before you register workload clusters with Gloo, review the following options to help secure your registration. Each section details the benefits of the security option, and the necessary settings to specify in a Helm values file to use when you register the cluster with Helm.

You can see all possible fields for the Helm chart by running the following command:

helm show values gloo-platform/gloo-platform --version v2.5.3 > all-values.yaml

You can also review these fields in the Helm values documentation.

FIPS-compliant image

If your environment runs workloads that must comply with the Federal Information Processing Standards (FIPS), you can use images of Gloo Gateway components that are specially built to comply with NIST FIPS. Open the values.yaml file, search for the image section, and append -fips to the tag, as in the following example.
...
glooAgent:
  image:
    pullPolicy: IfNotPresent
    registry: gcr.io/gloo-mesh
    repository: gloo-mesh-agent
    tag: 2.5.3-fips

Kubernetes RBAC

For information about controlling access to your Gloo resources with Kubernetes role-based access control (RBAC), see User access.

To review the permissions of deployed Gloo components such as the management server and agent, see Gloo component permissions.
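For example, one quick way to review those permissions on a cluster is to inspect the cluster roles that the Gloo components use. This sketch filters by name, which assumes the default naming of the Gloo roles.

# List cluster roles that follow the Gloo naming convention.
kubectl get clusterroles | grep gloo
# Describe a specific role from the output of the previous command.
kubectl describe clusterrole <gloo-role-name>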