Best practices for Gloo Mesh in prod

Review the following recommended practices for preparing optional security measures and setting up Gloo Mesh Enterprise in a production environment.

Deployment model

A Gloo Mesh setup consists of one management cluster, in which the Gloo Mesh Enterprise management components are installed, and one or more workload clusters that run service meshes that are registered with and managed by the management cluster. The management cluster serves as the management plane, and the workload clusters serve as the data plane.

In a production deployment, do not install the management components into a workload cluster that runs a service mesh. Instead, use a dedicated cluster for the management plane, as depicted in the following diagram.

Figure of a multicluster Gloo Mesh quick-start architecture, with a dedicated management plane cluster.

Management plane settings

Before you install the Gloo Mesh management components into your management cluster, review the following options to help secure your installation. Each section details the benefits of the security option, and the necessary settings to specify in a Helm values file to use during your Helm installation.

The settings in the following sections are provided in the sample values-mgmt-plane.yaml values file. Alternatively, you can view the default values for the management plane Helm chart by saving them to a YAML file.

helm show values gloo-mesh-enterprise/gloo-mesh-enterprise --version 2.0.7 > values-mgmt-plane.yaml

FIPS-compliant image

If your environment runs workloads that must comply with Federal Information Processing Standards (FIPS), you can use images of Gloo Mesh Enterprise components that are specially built to comply with NIST FIPS. Open your Helm values file, search for the image section, and append -fips to the tag, such as in the following example.
...
glooMeshMgmtServer:
  image:
    pullPolicy: IfNotPresent
    registry: gcr.io/gloo-mesh
    repository: gloo-mesh-mgmt-server
    tag: 2.0.7-fips

Certificate management

By default, Gloo Mesh Enterprise creates self-signed certificates at install time to bootstrap mTLS connectivity between its management server and agent components. To use these default certificates, leave the disableCa and disableCaCertGeneration values set to false. If you prefer to set up Gloo Mesh without secure communication for quick demonstrations, include the --set insecure=true flag. Note that neither the default self-signed certificate authorities (CAs) nor insecure mode is suitable for production environments.

In production installations, do not use the default root CA certificate and intermediate signing CAs that are automatically generated and self-signed by Gloo Mesh. Instead, supply your own certificates, and add automation so that they can be easily rotated, as described in the certificate management guide.

To supply your custom certificates during Gloo Mesh installation:

  1. Select the certificate management approach that you want to use, such as AWS Certificate Manager, HashiCorp Vault, or your own custom certs.
  2. As you follow those instructions, make sure that you create relay forwarding and identity secrets in the management and workload clusters.
  3. As you follow those instructions, modify your Helm values file to use the custom CAs, such as in the following glooMeshMgmtServer section. Note that you might need to update the tlsSecret name value, depending on your certificate setup.
insecure: false
glooMeshMgmtServer:
  relay:
    disableCa: true
    disableCaCertGeneration: true
    signingTlsSecret:
      name: relay-tls-signing-secret
    tlsSecret:
      name: relay-server-tls-secret

Deployment and service overrides

In some cases, you might need to modify the default deployment of the glooMeshMgmtServer with your own Kubernetes resources. You can specify resources and annotations for the management server deployment in the glooMeshMgmtServer.deploymentOverrides field, and resources and annotations for the service that exposes the deployment in the glooMeshMgmtServer.serviceOverrides field.

Most commonly, the serviceOverrides section specifies cloud provider-specific annotations that might be required for your environment. For example, the following section applies the recommended Amazon Web Services (AWS) annotations for modifying the created load balancer service.

glooMeshMgmtServer:
  ...
  serviceOverrides:
    metadata:
      annotations:
        # AWS-specific annotations
        service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
        service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
        service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "10"
        service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "9900"
        service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: "tcp"

        service.beta.kubernetes.io/aws-load-balancer-type: external
        service.beta.kubernetes.io/aws-load-balancer-scheme: internal
        service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
        service.beta.kubernetes.io/aws-load-balancer-backend-protocol: TCP
        service.beta.kubernetes.io/aws-load-balancer-private-ipv4-addresses: 10.0.50.50, 10.0.64.50
        service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-0478784f04c486de5, subnet-09d0cf74c0117fcf3
        service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: deregistration_delay.connection_termination.enabled=true,deregistration_delay.timeout_seconds=1
  # Kubernetes load balancer service type
  serviceType: LoadBalancer
  ...
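
As another illustration of provider-specific annotations, the following sketch requests an internal load balancer on Google Kubernetes Engine (GKE). This snippet is not part of the sample values file. It assumes that the management cluster runs on GKE and that the management server only needs to be reachable from inside your VPC. The networking.gke.io/load-balancer-type annotation is a GKE service annotation, so verify it against your cloud provider's documentation before you use it.

```yaml
glooMeshMgmtServer:
  serviceOverrides:
    metadata:
      annotations:
        # GKE-specific annotation (assumption: the management cluster runs on GKE
        # and the service must only be reachable inside the VPC)
        networking.gke.io/load-balancer-type: "Internal"
  # Kubernetes load balancer service type
  serviceType: LoadBalancer
```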

In less common cases, you might want to provide other resources, like a config map or service account. This example shows how you might use the deploymentOverrides to specify a config map as a pod volume.

glooMeshMgmtServer:
  deploymentOverrides:
    spec:
      template:
        spec:
          # Config map sources are declared in pod-level volumes, not in container volumeMounts
          volumes:
            - name: envoy-config
              configMap:
                name: my-custom-envoy-config
  ...

UI authentication

The Gloo Mesh UI supports OpenID Connect (OIDC) authentication from common providers such as Google, Okta, and Auth0. Users who access the UI must authenticate with the OIDC provider, and all requests to retrieve data from the API are authenticated.

You can configure OIDC authentication for the UI by providing your OIDC provider details in the glooMeshUi section, such as in the following example.

...
glooMeshUi:
  enabled: true
  auth:
    enabled: true
    backend: oidc
    oidc:
      appUrl: # The URL that the UI for the OIDC app is available at, from the DNS and other ingress settings that expose the OIDC app UI service.
      clientId: # From the OIDC provider
      clientSecret: # From the OIDC provider. Stored in a secret.
      clientSecretName: dashboard
      issuerUrl: # The issuer URL from the OIDC provider, usually something like 'https://<domain>.<provider_url>/'.

Redis instance

By default, a Redis instance is deployed with the management plane Helm chart to store OIDC ID tokens. For a production deployment, you can disable the default Redis deployment and optionally provide your own Redis deployment instead.

  1. Add the following values into the glooMeshUi.auth section of the Helm values file with your OIDC details that you created in the previous section.

    ...
        oidc:
          session:
            backend: redis
            redis:
              host: # Point to your Redis instance. For example, the host for the default instance is 'redis-dashboard.gloo-mesh.svc.cluster.local:6379'
    
  2. In a glooMeshRedis section of the values file, set enabled to false to disable the default Redis instance.

    ...
    glooMeshRedis:
      enabled: false
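
Taken together, the two steps might look like the following sketch in a single values file. The host value redis.example.internal:6379 is a hypothetical placeholder for your own Redis instance, and the remaining OIDC provider settings from the previous section are omitted.

```yaml
glooMeshUi:
  enabled: true
  auth:
    enabled: true
    backend: oidc
    oidc:
      # ...OIDC provider settings from the UI authentication section...
      session:
        backend: redis
        redis:
          # Hypothetical address of an externally managed Redis instance
          host: redis.example.internal:6379
# Do not deploy the default Redis instance
glooMeshRedis:
  enabled: false
```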
    

Prometheus metrics

By default, a Prometheus instance is deployed with the management plane Helm chart to collect metrics for the Gloo Mesh app. For a production deployment, you can disable the default Prometheus deployment and optionally integrate your own Prometheus instance instead.

The minimum supported version of Prometheus is 2.16.0.

  1. In your Helm values file for the Gloo Mesh management plane chart, disable the default Prometheus instance and instead read from your custom Prometheus instance's full URL, including the port number.

    ...
    # Disable default Prometheus instance to provide your own instance
    prometheus:
      enabled: false
    # Provide the URL (with port) to your Prometheus instance
    prometheusUrl: 
    
  2. Configure your Prometheus server to scrape metrics from the gloo-mesh-mgmt-server-admin.gloo-mesh:9091 endpoint. For example, if you use the Prometheus community chart, update your values file for the Prometheus chart as follows to scrape metrics from Gloo Mesh.

    serverFiles:
      prometheus.yml:
        scrape_configs:
        - job_name: gloo-mesh
          scrape_interval: 15s
          scrape_timeout: 10s
          static_configs:
          - targets:
            - gloo-mesh-mgmt-server-admin.gloo-mesh:9091
    

    Note: Make sure that your Prometheus configuration respects the following scraping annotations in the Gloo Mesh deployment.

    annotations:
      app.kubernetes.io/name: gloo-mesh-mgmt-server
      prometheus.io/path: /metrics
      prometheus.io/port: "9091"
      prometheus.io/scrape: "true"
      sidecar.istio.io/inject: "false"
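
One way to honor these annotations is annotation-based pod discovery instead of a static target. The following minimal sketch uses the standard Prometheus kubernetes_sd_configs and relabel_configs settings. It assumes that the annotations are present on the management server pods in the gloo-mesh namespace and that your Prometheus server has RBAC permissions to list pods. The job name gloo-mesh-pods is a hypothetical placeholder.

```yaml
scrape_configs:
- job_name: gloo-mesh-pods   # hypothetical job name
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names:
      - gloo-mesh
  relabel_configs:
  # Keep only pods that are annotated with prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # Use the path from prometheus.io/path as the metrics path
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  # Rewrite the scrape address to use the port from prometheus.io/port
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: '([^:]+)(?::\d+)?;(\d+)'
    replacement: $1:$2
    target_label: __address__
```

If you already scrape the static gloo-mesh-mgmt-server-admin.gloo-mesh:9091 target, you likely need only one of the two approaches to avoid collecting duplicate series.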
    

Data plane settings

Before you register workload clusters with Gloo Mesh, review the following options to help secure your registration. Each section details the benefits of the security option, and the necessary settings to specify in a Helm values file to use during your Helm registration.

The settings in the following sections are provided in the sample values-data-plane.yaml values file. Alternatively, you can view the default values for the agent Helm chart by saving them to a YAML file.

helm show values gloo-mesh-agent/gloo-mesh-agent --version 2.0.7 > values-data-plane.yaml

FIPS-compliant image

If your environment runs workloads that must comply with Federal Information Processing Standards (FIPS), you can use images of Gloo Mesh Enterprise components that are specially built to comply with NIST FIPS. Open your Helm values file, search for the image section, and append -fips to the tag, such as in the following example.
...
glooMeshAgent:
  image:
    pullPolicy: IfNotPresent
    registry: gcr.io/gloo-mesh
    repository: gloo-mesh-agent
    tag: 2.0.7-fips

Certificate management

If you use the default self-signed certificates during Gloo Mesh installation, you can follow the steps in the cluster registration documentation to use these certificates during cluster registration. If you set up Gloo Mesh without secure communication for quick demonstrations, include the --set insecure=true flag during registration. Note that neither the default self-signed certificate authorities (CAs) nor insecure mode is suitable for production environments.

In production environments, use the same custom certificates during cluster registration that you set up for the Gloo Mesh installation:

  1. Ensure that when you installed Gloo Mesh, you set up the relay certificates, such as with AWS Certificate Manager, HashiCorp Vault, or your own custom certs, including the relay forwarding and identity secrets in the management and workload clusters.
  2. The relay certificate instructions include steps to modify your Helm values file to use the custom CAs, such as in the following relay section. Note that you might need to update the clientTlsSecret name and rootTlsSecret name values, depending on your certificate setup.
insecure: false
relay:
  authority: gloo-mesh-mgmt-server.gloo-mesh
  clientTlsSecret:
    name: gloo-mesh-agent-$REMOTE_CLUSTER-tls-cert
    namespace: gloo-mesh
  rootTlsSecret:
    name: relay-root-tls-secret
    namespace: gloo-mesh
  serverAddress: $MGMT_SERVER_NETWORKING_ADDRESS
...

Rate limiting and external authentication

To enable mTLS with rate limiting and external authentication, you must add an injection directive for those components. Although you can enable an injection directive on the gloo-mesh namespace, this directive makes the management plane components dependent on the functionality of Istio’s mutating webhook, which can be a fragile coupling and is not a recommended practice. In production setups, install the Gloo Mesh Enterprise chart with only the rate limiting and external authentication services enabled in the gloo-mesh-addons namespace, and label the gloo-mesh-addons namespace for Istio injection.
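
The namespace label mentioned above can be set declaratively. The following sketch assumes that your Istio installation uses the default istio-injection: enabled namespace label. Revision-based Istio installations use an istio.io/rev label instead.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: gloo-mesh-addons
  labels:
    # Assumption: non-revisioned Istio installation
    istio-injection: enabled
```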

When you initially register a cluster, set the rate-limiter and ext-auth-service settings to false in the values file, such as in the following example.

rate-limiter: 
  enabled: false
ext-auth-service: 
  enabled: false
glooMeshAgent:
  enabled: true
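
For the later installation into the gloo-mesh-addons namespace, the values are presumably the inverse of the registration values. The following sketch only flips the enabled settings shown above. It is an assumption, not a complete values file, so verify any additional settings that the add-on services need in your setup.

```yaml
# Assumed values for a separate release in the gloo-mesh-addons namespace:
# only the add-on services are enabled, not the agent itself.
rate-limiter:
  enabled: true
ext-auth-service:
  enabled: true
glooMeshAgent:
  enabled: false
```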