Number of clusters

Single-cluster: Gloo Mesh Enterprise is fully functional when the management plane (management server) and data plane (agent and service mesh) both run within the same cluster. You can install both the management plane and data plane components in one installation process. If you choose to install the components in separate processes, ensure that you use the same cluster name in both processes.

Multicluster: A multicluster Gloo Mesh Enterprise setup consists of one management cluster in which the Gloo Mesh Enterprise management server is installed, and one or more workload clusters that serve as the data plane (agent and service mesh). By running the management plane in a dedicated management cluster, you can ensure that no workload pods consume cluster resources that might impede management processes. Many guides throughout the documentation use one management cluster and two workload clusters as an example setup.

Cluster details

Review the following recommendations and considerations when creating clusters for your Gloo Mesh Enterprise environment.

Supported platforms

Gloo Mesh Enterprise is supported on the following platforms:

  • Kubernetes
  • OpenShift: Some changes are required to allow Istio to run on OpenShift clusters. To make these changes, use commands throughout the installation guides that are labeled for use with OpenShift. For more information, see Installation options.

Note that in multicluster setups, you can use both Kubernetes and OpenShift clusters.

Note: Be sure to verify that your cluster’s Kubernetes or OpenShift version is supported for Gloo.

Name

The cluster name must be lowercase and alphanumeric, must not contain special characters other than hyphens (-), and must begin with a letter (not a number).

Cluster context names cannot include underscores. The generated certificate that connects workload clusters to the management cluster uses the context name as a SAN specification, and underscores in SAN are not FQDN compliant. You can rename a context by running kubectl config rename-context "<oldcontext>" <newcontext>.
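The naming rules above can be checked before installation with a short script. The following Python sketch is one possible reading of those rules, not an official validator: the regular expression and the function names are assumptions for illustration, and no maximum length is enforced because none is stated here.

```python
import re

# Assumed reading of the cluster name rule: lowercase letters, digits, and
# hyphens only, starting with a letter. No length limit is applied.
CLUSTER_NAME_PATTERN = re.compile(r"[a-z][a-z0-9-]*")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the whole name matches the assumed cluster name rule."""
    return CLUSTER_NAME_PATTERN.fullmatch(name) is not None


def is_valid_context_name(context: str) -> bool:
    """Return True if the context name contains no underscores."""
    return "_" not in context


def suggest_context_name(context: str) -> str:
    """Propose a replacement context name with underscores swapped for hyphens."""
    return context.replace("_", "-")
```

If a context name fails the check, you can pass the suggested name to the kubectl config rename-context command.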

Throughout the guides in this documentation, examples use a single-cluster setup and a three-cluster setup.

  • Single-cluster: When a guide requires an example name, the examples use mgmt. Otherwise, you can save the name of your cluster in the $CLUSTER_NAME environment variable, and the context of your cluster in the $MGMT_CONTEXT environment variable.
  • Multicluster: When a guide requires example names, the examples use mgmt, cluster1, and cluster2. Otherwise, you can save the names of your clusters in the $MGMT_CLUSTER, $REMOTE_CLUSTER1, and $REMOTE_CLUSTER2 environment variables, and the contexts of your clusters in the $MGMT_CONTEXT, $REMOTE_CONTEXT1, and $REMOTE_CONTEXT2 environment variables.

Size and memory

The following tables suggest minimum vCPU and memory sizes for the Gloo and Redis components of the management plane, depending on the size of your environment. If the Gloo management server translation time is consistently above 60 seconds in your environment, you can try to improve performance by allocating more CPU and memory resources to the Gloo management server.

For more information, see the following resources:

  • Relay architecture, which explains how the management server, agent, and backing Redis instance work together to maintain the desired state of your environment.
  • Scalability reference, which explains other factors that impact performance at scale, such as the number of workspaces or virtual destinations.

    Demos, POCs, and non-production testing environments

    Review the following table of suggestions for sizing non-production environments, such as for development, demos, or proofs of concept (POCs).

    | Your environment | Gloo management cluster | Gloo workload cluster | Redis for management plane |
    |---|---|---|---|
    | Minimum setup for POCs, such as: single cluster or 1 management and 1 workload cluster; a couple small sample apps, such as httpbin | 4 vCPU and 8 GB memory | 2 vCPU and 8 GB memory | General cache.m7g.large, 2 vCPU and 8 GiB memory |
    | Larger POCs, such as: 1 management and 2 workload clusters; many microservices-based apps, such as Bookinfo, a developer portal demo frontend, or a boutique store | 8 vCPU and 16 GB memory | 4 vCPU and 16 GB memory | General cache.m7g.large, 2 vCPU and 8 GiB memory |

    Production environments

    Review the following table of suggestions for sizing production environments. The table is organized into three general sizes based on the number of clusters and resources that you have.

    | Your environment* | Management cluster instances | Gloo management server | Gloo agent | Redis for management plane |
    |---|---|---|---|---|
    | Small: 1 management cluster; 2 workload clusters; < 1,000 services | Minimum: 8 vCPU and 32 GB memory | Resource requests: 4 vCPU and 16 GB memory; Resource limits: 8 vCPU and 32 GB memory | Resource requests: 1 vCPU and 2 GB memory; Resource limits: 2 vCPU and 4 GB memory | General cache.m7g.large; 2 vCPU and 8 GiB memory; Backing PV with at least 1Gi for persistence |
    | Medium: 1 management cluster; 5-10 workload clusters; < 4,000 services | Minimum: 16 vCPU and 64 GB memory | Resource requests: 8 vCPU and 16 GB memory; Resource limits: 16 vCPU and 32 GB memory | Resource requests: 2 vCPU and 4 GB memory; Resource limits: 4 vCPU and 8 GB memory | General cache.m7g.2xlarge; 8 vCPU and 32 GiB memory; Backing PV with 2Gi - 5Gi for persistence |
    | Large: 1 management cluster; > 10 workload clusters; > 5,000 services | Minimum: 16 vCPU and 64 GB memory; Suggested: 32 vCPU and 128 GB memory | Resource requests: 8 vCPU and 16 GB memory; Resource limits: 16 vCPU and 32 GB memory | Resource requests: 2 vCPU and 8 GB memory; Resource limits: 4 vCPU and 16 GB memory | General cache.m7g.4xlarge; 16 vCPU and 64 GiB memory; Backing PV with at least 10Gi for persistence |

    Notes on the table:

    * Your environment: A service is 1 Kubernetes service in a single cluster. For high availability and multicluster communication, services are translated across clusters. For example, if you have 100 Kubernetes services in a single cluster, and 10 workload clusters, then your total number of services is 1,000. To manage services more efficiently for translation, create workspaces. Then, export and import only the services that you need across workspaces.
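The service count in the note above is a simple multiplication. The following Python sketch reproduces it under the assumption that every workload cluster runs the same number of Kubernetes services; the function name is hypothetical and is not part of any Gloo tooling.

```python
def total_translated_services(services_per_cluster: int, workload_clusters: int) -> int:
    """Estimate the total service count used for sizing.

    Assumes each workload cluster has the same number of Kubernetes services
    and that every service is translated across all workload clusters.
    """
    if services_per_cluster < 0 or workload_clusters < 0:
        raise ValueError("counts must be non-negative")
    return services_per_cluster * workload_clusters


# The example from the note: 100 services and 10 workload clusters.
example_total = total_translated_services(100, 10)
```

Compare the result against the service counts in the sizing table. Exporting and importing only the services that you need across workspaces lowers the effective number.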

    Gloo management server: To improve translation performance, you can run multiple replicas of the Gloo management server. This horizontal scaling shards translation and runs it in parallel. However, horizontal scaling does not evenly distribute the compute load among replicas. Instead, it improves translation performance at a 1:1 cost in resource utilization. At scale, this increased cost mostly affects memory, because each management server replica maintains a global snapshot of the entire environment in memory. As such, the suggested memory sizes increase at a higher ratio than the vCPUs as your environment scales.

    Redis for management plane: The sizing suggestions in the table are based on AWS ElastiCache node types and their corresponding AWS EC2 instance types. If you use a different Redis-compatible provider, try to use a comparable instance size. For more Redis deployment considerations, see the Backing Redis databases guide.

    Version

    The following versions of Gloo Mesh Enterprise are supported with the compatible open source project versions of Istio and Kubernetes. Later versions of the open source projects that are released after Gloo Mesh Enterprise might also work, but are not tested as part of the Gloo Mesh Enterprise release.

    | Gloo Mesh Enterprise | Release date | Supported Solo distributions of Istio and related Kubernetes versions tested by Solo |
    |---|---|---|
    | 2.6 | 15 Aug 2024 | Istio 1.23 on Kubernetes 1.27 - 1.30; Istio 1.22 on Kubernetes 1.27 - 1.30; Istio 1.21 on Kubernetes 1.26 - 1.29 (see the note for 2.6); Istio 1.20 on Kubernetes 1.25 - 1.29; Istio 1.19 on Kubernetes 1.25 - 1.28 |
    | 2.5 | 09 Jan 2024 | Istio 1.20 on Kubernetes 1.25 - 1.29; Istio 1.19 on Kubernetes 1.25 - 1.28; Istio 1.18 on Kubernetes 1.24 - 1.27; Istio 1.17 on Kubernetes 1.23 - 1.26; Istio 1.16 on Kubernetes 1.22 - 1.25 (see the note for 2.5) |
    | 2.4 | 28 Aug 2023 | Istio 1.18 on Kubernetes 1.24 - 1.27; Istio 1.17 on Kubernetes 1.23 - 1.26; Istio 1.16 on Kubernetes 1.22 - 1.25; Istio 1.15 on Kubernetes 1.22 - 1.25; Istio 1.14 on Kubernetes 1.21 - 1.24 |
    | 2.3 | 17 Apr 2023 | Istio 1.18 on Kubernetes 1.24 - 1.27; Istio 1.17 on Kubernetes 1.23 - 1.26; Istio 1.16 on Kubernetes 1.22 - 1.25; Istio 1.15 on Kubernetes 1.22 - 1.25; Istio 1.14 on Kubernetes 1.21 - 1.24 |

    Note for 2.6: A bug was identified when upgrading from Istio version 1.20 or earlier to Istio version 1.21 or later on Gloo Mesh Enterprise version 2.6. This bug can lead to disabled JWT authentication and authorization policies that fail closed, which means that the gateway rejects requests as unauthenticated on any route that is protected by a JWT policy. This bug is fixed in version 2.6.5 and later. Make sure to upgrade to Gloo Mesh Enterprise version 2.6.5 or later before you upgrade to Istio version 1.21 or later.

    Note for 2.5: Istio 1.21 is not supported in Gloo Mesh Enterprise version 2.5. You must upgrade the Gloo management server and agents to version 2.6 prior to upgrading to Istio 1.21.
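To check a planned combination against the tested versions listed above, you can encode the list as data. The following Python sketch copies the version ranges from this section. The names SUPPORT_MATRIX and is_tested_combination are hypothetical, and the check does not account for the patch-level caveats in the notes, such as the 2.6.5 requirement.

```python
from typing import Dict, Tuple

# Tested Kubernetes ranges (lowest, highest) per Istio minor version,
# keyed by Gloo Mesh Enterprise version. Values are copied from this section.
SUPPORT_MATRIX: Dict[str, Dict[str, Tuple[str, str]]] = {
    "2.6": {
        "1.23": ("1.27", "1.30"),
        "1.22": ("1.27", "1.30"),
        "1.21": ("1.26", "1.29"),
        "1.20": ("1.25", "1.29"),
        "1.19": ("1.25", "1.28"),
    },
    "2.5": {
        "1.20": ("1.25", "1.29"),
        "1.19": ("1.25", "1.28"),
        "1.18": ("1.24", "1.27"),
        "1.17": ("1.23", "1.26"),
        "1.16": ("1.22", "1.25"),
    },
    "2.4": {
        "1.18": ("1.24", "1.27"),
        "1.17": ("1.23", "1.26"),
        "1.16": ("1.22", "1.25"),
        "1.15": ("1.22", "1.25"),
        "1.14": ("1.21", "1.24"),
    },
    "2.3": {
        "1.18": ("1.24", "1.27"),
        "1.17": ("1.23", "1.26"),
        "1.16": ("1.22", "1.25"),
        "1.15": ("1.22", "1.25"),
        "1.14": ("1.21", "1.24"),
    },
}


def _major_minor(version: str) -> Tuple[int, int]:
    """Parse '1.30' or '1.30.2' into (1, 30) so that 1.30 sorts after 1.9."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_tested_combination(gloo: str, istio: str, kubernetes: str) -> bool:
    """Return True if the Istio and Kubernetes versions are listed for the Gloo version."""
    bounds = SUPPORT_MATRIX.get(gloo, {}).get(istio)
    if bounds is None:
        return False
    lowest, highest = bounds
    return _major_minor(lowest) <= _major_minor(kubernetes) <= _major_minor(highest)
```

Versions are compared as integer pairs rather than as strings or floats, because a plain string or float comparison would sort Kubernetes 1.30 before 1.9.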

    Feature gates

    To review the required Gloo Mesh Enterprise versions for specific features that you can optionally enable, see Feature gates.

    For more information, see Supported versions.

    Load balancer connectivity

    If you use an Istio ingress gateway and want to test connectivity through it in your Gloo environment, ensure that your cluster setup enables you to externally access LoadBalancer services on the workload clusters.

    Port and repo access from cluster networks

    If you have restrictions for your cluster networks in your cloud infrastructure provider, you must open ports, protocols, and image repositories to install Gloo Mesh Enterprise and to allow your Gloo installation to communicate with the Solo APIs. For example, you might have firewall rules set up on the public network of your clusters so that they do not have default access to all public endpoints. The following sections detail the required and optional ports and repositories that your management and workload clusters must access.

    Management cluster

    Required

    In your firewall or network rules for the management cluster, open the following required ports and repositories.

    | Name | Port | Protocol | Source | Destination | Network | Description |
    |---|---|---|---|---|---|---|
    | Agent communication | 9900 | TCP | ClusterIPs of agents on workload clusters | IP addresses of management cluster nodes | Cluster network | Allow the gloo-mesh-agent on each workload cluster to send data to the gloo-mesh-mgmt-server in the management cluster. |
    | Management server images | - | - | IP addresses of management cluster nodes | https://gcr.io/gloo-mesh | Public | Allow installation and updates of the gloo-mesh image in the management cluster. |
    | Redis image | - | - | IP addresses of management cluster nodes | docker.io/redis | Public | Allow installation of the Redis image in the management cluster to store OIDC ID tokens for the Gloo UI. |
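After you open a required port, such as 9900 for agent communication, you might want to confirm that a TCP connection can be established from the source network. The following Python sketch uses only the standard library. The function name and the address in the usage comment are hypothetical. A successful connect shows only that the network path is open; it does not verify the relay connection itself.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Hypothetical usage, run from a node or pod in a workload cluster:
#   can_connect("mgmt-server.example.com", 9900)
```

You can reuse the same check for other TCP ports that you open in your firewall or network rules.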

    Optional

    In your firewall or network rules for the management cluster, open the following optional ports as needed.

    | Name | Port | Protocol | Source | Destination | Network | Description |
    |---|---|---|---|---|---|---|
    | Healthchecks | 8090 | TCP | Check initiator | IP addresses of management cluster nodes | Public or cluster network, depending on whether checks originate from outside or inside service mesh | Allow healthchecks to the management server. |
    | OpenTelemetry gateway | 4317 | TCP | OpenTelemetry agent | IP addresses of management cluster nodes | Public | Collect telemetry data, such as metrics, logs, and traces to show in Gloo observability tools. |
    | Prometheus | 9091 | TCP | Scraper | IP addresses of management cluster nodes | Public | Scrape your Prometheus metrics from a different server, or a similar metrics setup. |
    | Other tools | - | - | - | - | Public | For any other tools that you use in your Gloo environment, consult the tool’s documentation to ensure that you allow the correct ports. For example, if you use tools such as cert-manager to generate and manage the Gloo certificates for your setup, consult the cert-manager platform reference. |

    Workload clusters

    Required

    In your firewall or network rules for the workload clusters, open the following required ports and repositories.

    | Name | Port | Protocol | Source | Destination | Network | Description |
    |---|---|---|---|---|---|---|
    | Agent image | - | - | IP addresses of workload cluster nodes | https://gcr.io/gloo-mesh | Public | Allow installation and updates of the gloo-mesh image in workload clusters. |
    | East-west gateway | 15443 | TCP | Node IP addresses of other workload clusters | Gateway load balancer IP address on one workload cluster | Cluster network | Allow services in one workload cluster to access the mesh’s east-west gateway for services in another cluster. Repeat this rule for the east-west gateway on each workload cluster. Note that you can customize this port in the spec.options.eastWestGatewaySelector.hostInfo.port setting of your workspace settings resource. |
    | Ingress gateway | 80 and/or 443 | HTTP, HTTPS | - | Gateway load balancer IP address | Public or private network | Allow incoming traffic requests to the Istio ingress gateway. |

    Optional

    In your firewall or network rules for the workload clusters, open the following optional ports as needed.

    | Name | Port | Protocol | Source | Destination | Network | Description |
    |---|---|---|---|---|---|---|
    | Agent healthchecks | 8090 | TCP | Check initiator | IP addresses of workload cluster nodes | Public or cluster network, depending on whether checks originate from outside or inside service mesh | Allow healthchecks to the Gloo agent. |
    | Envoy telemetry | 15090 | HTTP | Scraper | IP addresses of workload cluster nodes | Public | Scrape your Prometheus metrics from a different server, or a similar metrics setup. |
    | Istio Pilot | 15017 | HTTPS | IP addresses of workload cluster nodes | - | Public | Depending on your cloud provider, you might need to open ports to install Istio. For example, in GKE clusters, you must open port 15017 for the Pilot discovery validation webhook. For more ports and requirements, see Ports used by Istio. |
    | Istio healthchecks | 15021 | HTTP | Check initiator | IP addresses of workload cluster nodes | Public or cluster network, depending on whether checks originate from outside or inside service mesh | Allow healthchecks on path /healthz/ready. |
    | Solo distributions of Istio | - | - | IP addresses of workload cluster nodes | A repo key for a Solo distribution of Istio that you can get by logging in to the Support Center and reviewing the Istio images built by Solo.io support article | Public | Allow installation and updates of the Solo distribution of Istio in workload clusters. |
    | VM onboarding | 15012, 15443 | TCP | Gateway load balancer IP addresses on workload clusters | VMs | Cluster network | To add virtual machines to your Gloo Mesh Enterprise setup, allow traffic and updates through east-west routing from the workload clusters to the VMs. |

    Port and repo access from local systems

    If corporate network policies prevent access from your local system to public endpoints via proxies or firewall rules:

    • Allow access to https://run.solo.io/meshctl/install to install the meshctl CLI tool.
    • Allow access to the Gloo Helm repository, https://storage.googleapis.com/gloo-platform/helm-charts, to install Gloo Mesh Enterprise via the helm CLI.

    Reserved ports and pod requirements

    Review the following service mesh and platform docs that outline what ports are reserved, so that you do not use these ports for other functions in your apps. You might use other services such as a database or application monitoring tool that reserve additional ports.

    Considerations for running Cilium and Istio on EKS

    If you plan to run Istio with sidecar injection and the Cilium CNI in tunneling mode (VXLAN or GENEVE) on an Amazon EKS cluster, the Istio control plane istiod is not reachable by the Kubernetes API server by default.

    Istio uses Kubernetes admission webhooks to inject sidecar proxies into pods. In EKS environments, the Cilium CNI cannot run on the same nodes where the Kubernetes API server is deployed, which leads to communication issues when trying to inject Istio sidecars into pods.

    You can choose from the following options to allow istiod to communicate with the Kubernetes API server:

    • Configure istiod with direct access to the networking stack of the underlying host node by setting hostNetwork to true as shown in the following Istio Lifecycle Manager example:
        # Traffic management
        components:
          pilot:
            k8s:
              overlays:
                - kind: Deployment
                  name: istiod-1-20
                  patches:
                    # Give istiod direct access to the host node's networking stack
                    - path: spec.template.spec.hostNetwork
                      value: true
        
    • Chain the Cilium CNI with the aws-vpc-cni. For more information, see the Cilium documentation.
    • Choose a different Cilium routing mode instead, such as eBPF-based routing. For more information about available modes, see the Cilium documentation.
    • Consider running Istio in ambient mode. For more information, see Beta: Ambient mesh.