Solo runs internal scalability tests for every release to confirm that the translation and user experience times remain within expected boundaries, and to measure scalability and performance improvements between releases.

This page summarizes factors that impact Gloo Mesh Enterprise scalability and recommendations for how to improve scalability and performance in large-scale environments. To learn about the scalability tests that Solo runs, see Internal scalability tests.

Scalability threshold definition

Gloo Mesh Enterprise is considered performant when Gloo resources that the user creates are translated into Envoy and Istio resources, and are applied in the user’s environment in a reasonable amount of time. The following images illustrate the components that are involved during the Gloo resource translation (A to B) and reconciliation (A to C) process when these resources are applied in the workload or management cluster.

A scalability threshold is reached when one of the following conditions is met:

  • Translation time too high: The time it takes from applying a Gloo resource in the cluster (A) to generating the output snapshot (B) is greater than 60 seconds.
  • User experience time too high: The time it takes from applying a Gloo resource in the cluster (A) to propagating the user’s config changes to the workload’s sidecar proxy or an Istio gateway (C) is greater than 120 seconds.
  • Gloo management server unavailable: The Gloo management server becomes unavailable or crashes, even though you provided enough compute resources.

Factors that impact the scalability threshold and recommendations

Review the factors that impact the scalability threshold in Gloo Mesh Enterprise. By accounting for these factors in your environment setup, you can optimize Gloo Mesh Enterprise scalability and performance.

Workspace boundaries

Workspaces define the cluster resources that users have access to. These resources can be spread across multiple Kubernetes namespaces and clusters. To allow services within a workspace to communicate with each other across namespaces and clusters, Gloo Mesh Enterprise automatically translates the Gloo resources into Istio resources and applies them to the namespace and cluster where they are needed. This process is also referred to as federation.

The complexity of this setup increases even more when you choose to import and export Gloo resources in your workspace to allow services to communicate across workspaces. To learn more about how resources are federated in your workspace when you import from and export to other workspaces, see Import and export resources across workspaces.

The more services a workspace includes, or imports from and exports to, the more Istio resources must be added in each of your clusters. As resources scale up into the thousands, the Gloo Mesh Enterprise scalability threshold might be reached more quickly. To optimize Gloo Mesh Enterprise scalability, define proper workspace boundaries. This way, Gloo only federates the necessary resources across Kubernetes namespaces and clusters.

  • ✅ Use multiple, smaller workspaces that include only the services that a team has access to.
  • ✅ Export and import only the services that you need access to.
  • ❌ Do not create a global workspace that selects all Kubernetes clusters and namespaces.
  • ❌ Do not create one workspace per cluster, which makes multi-cluster communication more difficult.
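For example, a team-scoped workspace might look like the following sketch. The workspace name, cluster name, and namespaces are placeholders for your own environment; check the Workspace API reference for your Gloo version before applying.

```yaml
apiVersion: admin.gloo.solo.io/v2
kind: Workspace
metadata:
  name: web-team            # hypothetical team workspace
  namespace: gloo-mesh
spec:
  workloadClusters:
    - name: cluster-1        # select only the clusters this team uses
      namespaces:
        - name: web-frontend # select only the team's namespaces,
        - name: web-backend  # rather than a '*' wildcard
```

Because the workspace selects specific namespaces instead of a wildcard, Gloo federates resources only for the services that this team owns.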

Multicluster routing with virtual destinations and external services

To enable routing between services across cluster boundaries, Istio resources must be federated across Kubernetes namespaces and clusters. You have the option to enable service federation in your workspace settings. With this setting, Gloo automatically creates Istio ServiceEntries in every namespace of each cluster. Depending on the number of services you have in your environment and the number of services that you export to and import from other workspaces, the number of ServiceEntries can quickly grow and impact the Gloo Mesh Enterprise scalability threshold in your environment.

Instead of enabling federation for your entire workspace, use Gloo virtual destinations and external services to enable intelligent, multicluster routing for the services that are selected by these resources. Gloo Mesh Enterprise retrieves the services that the virtual destinations and external services select, and automatically federates them in each namespace within the workspace, across clusters, and even within other workspaces if you set up importing and exporting. With this setup, you federate only the services that must be federated across clusters. For more information about federation, see Federated services.

  • ✅ Use virtual destinations when you want to access services that are located in other clusters.
  • ✅ Export and import only the services that you need access to in a particular workspace.
  • ❌ Do not use wildcards to import or export all services in a workspace.
  • ❌ Do not enable federation for all services at the workspace level.
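The following virtual destination sketch illustrates this approach. The resource name, namespace, hostname, labels, and port are hypothetical; adapt them to your services and verify the fields against the VirtualDestination API reference for your Gloo version.

```yaml
apiVersion: networking.gloo.solo.io/v2
kind: VirtualDestination
metadata:
  name: reviews-global       # hypothetical multicluster destination
  namespace: web-team        # hypothetical workspace namespace
spec:
  hosts:
    - reviews.global         # mesh-internal hostname for multicluster routing
  services:
    - labels:
        app: reviews         # federate only services with this label
  ports:
    - number: 9080
      protocol: HTTP
```

Only the services that match the `app: reviews` label are federated, instead of every service in the workspace.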

Output snapshot size

When Gloo resources are created, the Gloo agent creates an input snapshot that includes the Gloo resources that must be translated. This input snapshot is sent to the Gloo management server for translation. After the Gloo management server translates the Gloo resources into the corresponding Istio resources, an output snapshot is sent back to the Gloo agents. The agents use this snapshot to apply the Istio resources in the workload clusters.

The number of services that belong to a workspace and the number of services that are imported and exported from other workspaces impact the size of the output snapshot that the Gloo management server creates.

Internal scalability tests show that the Gloo Mesh Enterprise scalability threshold is reached more quickly if the snapshot size is greater than 20 MB. As such, define proper workspace boundaries to reduce the number of services that you import from and export to other workspaces. Scoping a workspace to keep the number of services small usually results in smaller output snapshot sizes.

  • ✅ Use multiple, smaller workspaces that include only the services that a team has access to.
  • ✅ Export and import only the services that you need access to.
  • ❌ Do not create a global workspace that selects all Kubernetes clusters and namespaces.

Gloo management server compute resources

The Gloo management server’s compute resource consumption varies depending on the changes that must be applied in the Gloo Mesh Enterprise environment. For example, the resource consumption is high when a change occurs and the management server must translate the resource and propagate the change to the gateways and workload proxies. However, if no changes occur in the environment, the CPU and memory resources that are allocated to the Gloo management server are usually underutilized.

To find an overview of the minimum resource requirements, see System requirements for size and memory.
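If you install with the Gloo Platform Helm chart, you can typically set compute resources on the management server through Helm values along these lines. The specific values below are illustrative only, not sizing guidance; confirm the field paths and recommended sizes against the system requirements for your chart version.

```yaml
# Example Helm values for the management server (illustrative sizes)
glooMgmtServer:
  resources:
    requests:
      cpu: "2"        # reserve CPU for translation bursts
      memory: 4Gi
    limits:
      memory: 8Gi     # headroom for large output snapshots
```

Because resource consumption spikes during translation and is low when the environment is idle, size requests for steady state and leave limit headroom for change bursts.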

Number of clusters

The number of clusters that you add to your Gloo environment impacts the number of Gloo agents that need to be kept in sync and the time it takes to properly propagate changes in your environment. If you find that the reconciliation time (A to C) is continuously above 120s, you can try to scale the Gloo management server pod to multiple replicas.
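As a sketch, scaling the management server with the Gloo Platform Helm chart might use a deployment override like the following. The `deploymentOverrides` path is an assumption based on common chart conventions; verify it against the Helm values reference for your chart version.

```yaml
# Example Helm values to run multiple management server replicas
glooMgmtServer:
  deploymentOverrides:
    spec:
      replicas: 2   # agents are balanced across replicas
```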

Internal scalability tests

Solo runs internal scalability tests for every release to verify translation and user experience times, to measure scalability and performance improvements between releases, and to confirm scalability in emulated customer environments.

To access the results of a sample scalability test that was run by Solo engineers, log in to Zendesk and review the Gloo Mesh Enterprise internal scalability tests article.