Gloo Mesh scalability

Learn about factors that impact the scalability of Gloo Mesh Enterprise.

Solo runs internal scalability tests for every release to confirm that the translation and user experience times remain within expected boundaries, and to measure scalability and performance improvements between releases.

This page summarizes factors that impact Gloo Mesh Enterprise scalability and recommendations for how to improve scalability and performance in large-scale environments. To learn about the scalability tests that Solo runs, see Internal scalability tests.

If you are interested in validating the scalability of Gloo Mesh Enterprise components in your environment or if you need help with optimizing the performance, contact your account representative. Solo engineers can help with replicating your production environment and running performance tests to verify the maximum number of workloads and resources that Gloo Mesh Enterprise supports for your use case.

Scalability threshold definition

Gloo Mesh Enterprise is considered performant when Gloo resources that the user creates are translated into Envoy and Istio resources, and are applied in the user’s environment in a reasonable amount of time. The following image illustrates the Gloo components that are included during the translation (A to B) and reconciliation process (A to C).

Figure: Gloo Mesh Enterprise resource translation and reconciliation for resources applied in workload clusters

A scalability threshold is reached when one of the following conditions is met:

Translation time too high: The time it takes from applying a Gloo resource in the cluster (A) to generating the output snapshot (B) is greater than 60 seconds.
User experience time too high: The time it takes from applying a Gloo resource in the cluster (A) to propagating the user’s config changes to the workload’s sidecar proxy or an Istio gateway (C) is greater than 120 seconds.
Gloo management server unavailable: The Gloo management server becomes unavailable or crashes, even though you provided enough compute resources.
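
The three conditions above can be expressed as a simple check. The following Python sketch is illustrative only: the `Measurement` structure and the `threshold_violations` function are hypothetical names that are not part of any Gloo API, and the sketch assumes that you already collect the translation time (A to B) and user experience time (A to C) yourself.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the threshold definition above.
MAX_TRANSLATION_SECONDS = 60.0        # A to B
MAX_USER_EXPERIENCE_SECONDS = 120.0   # A to C


@dataclass
class Measurement:
    """One observed config change (hypothetical structure, not a Gloo type)."""
    translation_seconds: float
    user_experience_seconds: float
    management_server_available: bool = True


def threshold_violations(m: Measurement) -> List[str]:
    """Return the scalability threshold conditions that this measurement meets."""
    violations = []
    if m.translation_seconds > MAX_TRANSLATION_SECONDS:
        violations.append("translation time too high")
    if m.user_experience_seconds > MAX_USER_EXPERIENCE_SECONDS:
        violations.append("user experience time too high")
    if not m.management_server_available:
        violations.append("management server unavailable")
    return violations
```

An empty list means that no threshold was reached for that measurement. A single entry is enough to consider the scalability threshold reached.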

Factors that impact the scalability threshold and recommendations

Review the factors that impact the scalability threshold in Gloo Mesh Enterprise. By accounting for these factors in your environment setup, you can optimize the Gloo Mesh Enterprise scalability and performance.

Workspace boundaries

Workspaces define the cluster resources that users have access to. These resources can be spread across multiple Kubernetes namespaces and clusters. To allow services within a workspace to communicate with each other across namespaces and clusters, Gloo Mesh Enterprise automatically translates the Gloo resources into Istio resources and applies them to the namespace and cluster where they are needed. This process is also referred to as federation.

The complexity of this setup increases even more when you choose to import and export Gloo resources in your workspace to allow services to communicate across workspaces. To learn more about how resources are federated in your workspace when you import from and export to other workspaces, see Import and export resources across workspaces.

The more services a workspace includes, or imports from and exports to, the more Istio resources must be added in each of your clusters. As resources scale up into the thousands, the Gloo Mesh Enterprise scalability threshold might be reached more quickly. To optimize Gloo Mesh Enterprise scalability, define proper workspace boundaries. This way, Gloo only federates the necessary resources across Kubernetes namespaces and clusters.

✅ Use multiple, smaller workspaces that include only the services that a team has access to.
✅ Export and import only the services that you need access to.
🟡 Avoid having a global workspace that selects all Kubernetes clusters and namespaces.
🟡 Reduce the amount of cross-workspace traffic required in multicluster environments.

Multicluster routing with virtual destinations and external services

To enable routing between services across cluster boundaries, Istio resources must be federated across Kubernetes namespaces and clusters. You have the option to enable service federation in your workspace settings. With this setting, Gloo automatically creates Istio ServiceEntries in every namespace of each cluster. Depending on the number of services you have in your environment and the number of services that you export to and import from other workspaces, the number of ServiceEntries can quickly grow and impact the Gloo Mesh Enterprise scalability threshold in your environment.

Instead of enabling federation for your entire workspace, use Gloo virtual destinations and external services to enable intelligent, multicluster routing for only the services that these resources select. Gloo Mesh Enterprise retrieves the services that the virtual destinations and external services select, and automatically federates them in each namespace within the workspace, across clusters, and even within other workspaces if you set up importing and exporting. With this setup, you federate only the services that must be federated across clusters. For more information about federation, see Federated services.

✅ Use virtual destinations when you want to access services that are located in other clusters.
✅ Export and import only the services that you need access to in a particular workspace.
🟡 Avoid wildcards to import or export all services in a workspace.
🟡 Avoid using federation for all services at the workspace level.

Output snapshot size

When Gloo resources are created, the Gloo agent creates an input snapshot that includes the Gloo resources that must be translated. This input snapshot is sent to the Gloo management server for translation. After the Gloo management server translates the Gloo resources into the corresponding Istio resources, an output snapshot is sent back to the Gloo agents. The agents use this snapshot to apply the Istio resources in the workload clusters.

The number of services that belong to a workspace and the number of services that are imported and exported from other workspaces impact the size of the output snapshot that the Gloo management server creates.

Internal scalability tests show that the Gloo Mesh Enterprise scalability threshold is reached more quickly if the snapshot size is greater than 20 MB. As such, define proper workspace boundaries to reduce the number of services that you import from and export to other workspaces. Scoping a workspace to keep the number of services small usually results in smaller output snapshot sizes.

✅ Use multiple, smaller workspaces that include only the services that a team has access to.
✅ Export and import only the services that you need access to.
🟡 Avoid having a global workspace that selects all Kubernetes clusters and namespaces.

Gloo management server compute resources

The Gloo management server’s compute resource consumption varies depending on the changes that must be applied in the Gloo Mesh Enterprise environment. For example, the resource consumption is high when a change occurs and the management server must translate the resource and propagate the change to the gateways and workload proxies. However, if no changes occur in the environment, the CPU and memory resources that are allocated to the Gloo management server are usually underutilized.

To find an overview of the minimum resource requirements, see System requirements for size and memory.

Number of clusters

The number of clusters that you add to your Gloo environment impacts the number of Gloo agents that need to be kept in sync and the time it takes to properly propagate changes in your environment. If you find that the reconciliation time (A to C) is consistently above 120 seconds, you can try to scale the Gloo management server pod to multiple replicas.

Internal scalability tests

Solo runs internal scalability tests for every release to verify translation and user experience times, measure scalability and performance improvements between releases, and to confirm scalability in emulated customer environments.

You can review the results of a sample scalability test that was run by Solo engineers. The test results are meant to show the scalability that was achieved and verified for Gloo Mesh in a given test setup with a predefined set of test data and workloads.

Scalability of Gloo Mesh can vary greatly and is highly dependent on various factors, such as the workspace boundary definitions, the number of workloads, the number of namespaces, available compute resources, and more. If you are interested in validating the performance and scalability of Gloo Mesh in your environment or if you need help with optimizing the performance, contact your account representative. Solo engineers can help with replicating your production environment and running performance tests to verify the maximum number of workloads and resources that Gloo Mesh supports for your environment.

Gloo Mesh Enterprise version

Tests were performed on Gloo Mesh version 2.2.5.

Test environment setup

The following compute resources were used to run the scalability test.

Compute resource	Value
Management server pod CPU	4 vCPUs
Management server pod memory	8Gi
Management server pod replica count	1
Agent pod CPU	2 vCPUs
Agent pod memory	2Gi
Agent pod replica count	1
Istiod pod CPU	2 vCPUs
Istiod memory	2Gi
Number of management clusters	1
Number of workload clusters	3
Node config for management cluster	2 nodes with 8 vCPU and 32Gi memory
Node config for workload clusters	4 nodes with 4 vCPU and 16Gi memory

Load increments

During the scalability test, load was added to the test environment in increments as shown in the following table.

Resource	Amount
Namespaces	3
Workspaces	6
Workspace connections (number of workspaces to export to/import from)	2
Workloads (Kubernetes Services) per namespace	4
Total number of workloads (multiply the namespaces, workspaces, and workloads)	72
Route tables	1 per total workload
Virtual destinations	1 per total workload
Header manipulation	1 per workspace
Transformation policy	1 per workspace
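
The workload arithmetic in the table can be sketched in Python as follows. The constant and function names are illustrative assumptions. Policy counts are intentionally left out of the sketch because they are not derived here.

```python
# Values taken from the load increments table above.
NAMESPACES = 3
WORKSPACES = 6
WORKLOADS_PER_NAMESPACE = 4


def workloads_per_increment() -> int:
    """Workloads added by one load increment: namespaces x workspaces x workloads."""
    return NAMESPACES * WORKSPACES * WORKLOADS_PER_NAMESPACE


def cumulative_load(increments: int) -> dict:
    """Cumulative workloads, route tables, and virtual destinations after N increments.

    Route tables and virtual destinations are created 1 per total workload,
    so both counts equal the workload count.
    """
    workloads = workloads_per_increment() * increments
    return {
        "workloads": workloads,
        "route_tables": workloads,
        "virtual_destinations": workloads,
    }


print(workloads_per_increment())          # 72
print(cumulative_load(32)["workloads"])   # 2304
```

With these values, one increment adds 72 workloads, and 32 increments add up to 2304 workloads.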

Test procedure

The scalability test comprised increasing the load and total number of Gloo resources as defined in the load increments. After each load increment, the performance of Gloo Mesh was measured. Tests were stopped when one of the scalability thresholds was reached as defined in the Gloo Mesh scalability threshold definition.

Test results

The following graphs illustrate the scalability that was achieved in Gloo Mesh during the scalability tests before the translation time threshold (B) was reached.

Figure: Scalability results

Refer to the following tables to find a detailed overview of the number of workloads and Gloo Mesh resources, and of the translation and user experience times that were achieved during the scalability tests. These numbers are represented in the graphs.

Run	Total number of workloads	Route tables	Virtual destinations	Header manipulation policy	Transformation policy	Total number of Gloo resources
1	72	72	72	9	9	162
2	144	144	144	15	15	318
3	216	216	216	21	21	474
4	288	288	288	27	27	630
5	360	360	360	33	33	786
6	432	432	432	39	39	942
7	504	504	504	45	45	1098
8	576	576	576	51	51	1254
9	648	648	648	57	57	1410
10	720	720	720	63	63	1566
11	792	792	792	69	69	1722
12	864	864	864	75	75	1878
13	936	936	936	81	81	2034
14	1008	1008	1008	87	87	2190
15	1080	1080	1080	93	93	2346
16	1152	1152	1152	99	99	2502
17	1224	1224	1224	105	105	2658
18	1296	1296	1296	111	111	2814
19	1368	1368	1368	117	117	2970
20	1440	1440	1440	123	123	3126
21	1512	1512	1512	129	129	3282
22	1584	1584	1584	135	135	3438
23	1656	1656	1656	141	141	3594
24	1728	1728	1728	147	147	3750
25	1800	1800	1800	153	153	3906
26	1872	1872	1872	159	159	4062
27	1944	1944	1944	165	165	4218
28	2016	2016	2016	171	171	4374
29	2088	2088	2088	177	177	4530
30	2160	2160	2160	183	183	4686
31	2232	2232	2232	189	189	4842
32	2304	2304	2304	195	195	4998
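
The last column of the preceding table is the sum of the four resource columns. As a quick sanity check, the following Python sketch recomputes the total for a few rows that are copied from the table. The function name is an illustrative assumption.

```python
def total_gloo_resources(route_tables: int,
                         virtual_destinations: int,
                         header_policies: int,
                         transformation_policies: int) -> int:
    """Sum the four Gloo resource counts of one test run."""
    return (route_tables + virtual_destinations
            + header_policies + transformation_policies)


# Rows copied from the table above:
# (run, route tables, virtual destinations, header policies, transformation policies)
sample_rows = [
    (1, 72, 72, 9, 9),
    (16, 1152, 1152, 99, 99),
    (32, 2304, 2304, 195, 195),
]

for run, rt, vd, hm, tp in sample_rows:
    print(run, total_gloo_resources(rt, vd, hm, tp))
# 1 162
# 16 2502
# 32 4998
```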

Run	User experience time (s)	Translation time (s)	Input snapshot size (MB)	Output snapshot size (MB)
1	3.630771399	0.6768629946	2.497	1.379
2	5.789293289	1.566540955	3.459	2.708
3	8.204482794	2.510457065	4.423	4.04
4	10.23336458	3.319648577	5.387	5.372
5	10.37551022	4.527299008	6.351	6.704
6	17.11259079	5.73963795	7.315	8.037
7	14.73311281	6.281648328	8.279	9.369
8	14.95479655	8.694260323	9.242	10.701
9	19.2776525	8.221313328	10.207	12.033
10	19.3824265	9.931342876	11.17	13.366
11	19.0717845	10.84953012	12.134	14.698
12	38.0724237	13.18595717	13.098	16.03
13	34.86034393	14.60871485	14.062	17.362
14	25.97134995	15.3668277	15.025	18.695
15	30.40479279	16.72147813	15.966	20.027
16	37.39088464	19.83334754	16.954	21.359
17	41.77419281	20.65257513	17.89	22.695
18	41.59029174	21.78748936	18.887	24.034
19	57.85606718	23.83600715	19.854	25.374
20	39.10190034	25.32169496	20.737	26.713
21	37.00853896	24.74112544	21.692	28.053
22	64.06909776	26.40160928	22.737	29.392
23	55.11466908	28.49822162	23.724	30.731
24	64.04015565	26.24104102	24.629	32.071
25	71.12533164	31.3953137	25.59	33.41
26	54.75058913	36.37591711	26.513	34.75
27	54.98820114	37.08584548	27.49	36.089
28	70.60482359	32.99328406	28.491	37.429
29	79.73433065	37.73269123	29.419	38.768
30	73.30540466	47.46800078	30.392	40.108
31	97.94193006	44.47086639	31.359	41.447
32	96.01194096	62.545793	32.327	42.786
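
To relate these measurements to the scalability thresholds, the following Python sketch finds the first run in which a given column exceeds a limit. The data is copied from the preceding table. The helper name `first_run_exceeding` and the column constants are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# Column indexes into each row of RESULTS.
RUN, USER_EXPERIENCE, TRANSLATION, INPUT_MB, OUTPUT_MB = range(5)

# (run, user experience time s, translation time s, input MB, output MB)
RESULTS: List[Tuple[float, ...]] = [
    (1, 3.630771399, 0.6768629946, 2.497, 1.379),
    (2, 5.789293289, 1.566540955, 3.459, 2.708),
    (3, 8.204482794, 2.510457065, 4.423, 4.04),
    (4, 10.23336458, 3.319648577, 5.387, 5.372),
    (5, 10.37551022, 4.527299008, 6.351, 6.704),
    (6, 17.11259079, 5.73963795, 7.315, 8.037),
    (7, 14.73311281, 6.281648328, 8.279, 9.369),
    (8, 14.95479655, 8.694260323, 9.242, 10.701),
    (9, 19.2776525, 8.221313328, 10.207, 12.033),
    (10, 19.3824265, 9.931342876, 11.17, 13.366),
    (11, 19.0717845, 10.84953012, 12.134, 14.698),
    (12, 38.0724237, 13.18595717, 13.098, 16.03),
    (13, 34.86034393, 14.60871485, 14.062, 17.362),
    (14, 25.97134995, 15.3668277, 15.025, 18.695),
    (15, 30.40479279, 16.72147813, 15.966, 20.027),
    (16, 37.39088464, 19.83334754, 16.954, 21.359),
    (17, 41.77419281, 20.65257513, 17.89, 22.695),
    (18, 41.59029174, 21.78748936, 18.887, 24.034),
    (19, 57.85606718, 23.83600715, 19.854, 25.374),
    (20, 39.10190034, 25.32169496, 20.737, 26.713),
    (21, 37.00853896, 24.74112544, 21.692, 28.053),
    (22, 64.06909776, 26.40160928, 22.737, 29.392),
    (23, 55.11466908, 28.49822162, 23.724, 30.731),
    (24, 64.04015565, 26.24104102, 24.629, 32.071),
    (25, 71.12533164, 31.3953137, 25.59, 33.41),
    (26, 54.75058913, 36.37591711, 26.513, 34.75),
    (27, 54.98820114, 37.08584548, 27.49, 36.089),
    (28, 70.60482359, 32.99328406, 28.491, 37.429),
    (29, 79.73433065, 37.73269123, 29.419, 38.768),
    (30, 73.30540466, 47.46800078, 30.392, 40.108),
    (31, 97.94193006, 44.47086639, 31.359, 41.447),
    (32, 96.01194096, 62.545793, 32.327, 42.786),
]


def first_run_exceeding(rows, column: int, limit: float) -> Optional[int]:
    """Return the first run number whose value in `column` is greater than `limit`."""
    for row in rows:
        if row[column] > limit:
            return int(row[RUN])
    return None


print(first_run_exceeding(RESULTS, TRANSLATION, 60.0))       # 32
print(first_run_exceeding(RESULTS, USER_EXPERIENCE, 120.0))  # None
print(first_run_exceeding(RESULTS, OUTPUT_MB, 20.0))         # 15
```

In this data set, the translation time first exceeds 60 seconds in run 32, the user experience time stays below 120 seconds in every run, and the output snapshot first exceeds 20 MB in run 15.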

If you plan to create thousands of workloads in your Gloo Mesh Enterprise environment, contact your account representative so that Solo engineers can help with replicating your production environment and running performance tests to verify the maximum number of workloads and resources that Gloo Mesh supports for your environment.
