Architecture

A Gloo Mesh setup consists of one management cluster, in which the Gloo Mesh Enterprise management components are installed, and one or more workload clusters that run service meshes. The workload clusters are registered with and managed by the management cluster. The management cluster serves as the management plane, and the workload clusters serve as the data plane.

As shown in the following figure, you can think of Gloo Mesh as a management plane for multiple service mesh control planes. When a workload cluster is registered with Gloo Mesh, the management plane discovers and creates configurations for mesh-enabled workloads in the cluster, unifies the trust model across clusters, scrapes metrics, and more.

Figure: Gloo Mesh Enterprise provides multicluster, multimesh management to any Kubernetes platform for simple Istio service discovery, traffic shaping, secured app traffic like external auth, failover, locality-aware load balancing, and more.

Components

Gloo Mesh Enterprise consists of a management server (sometimes called the “relay server”) for the management plane and relay agents for each workload cluster in the data plane.

Management server on the management cluster

After installation, a deployment named gloo-mesh-mgmt-server runs the management server. The management server is exposed by the gloo-mesh-mgmt-server service on a default port of 9900/TCP. You must configure an ingress point for workload clusters to communicate with the management server. To do so, use an ingress gateway such as Istio or Gloo Edge, or change the gloo-mesh-mgmt-server service type to LoadBalancer.
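As a minimal sketch of the last option, the following Python snippet builds the JSON merge patch that switches the service type, along with an argument list for kubectl patch. The service name comes from the text above. The gloo-mesh namespace and the context name that you pass in are assumptions, so substitute the values for your installation.

```python
import json

# Service name is taken from the text above; the namespace is an assumption.
SERVICE_NAME = "gloo-mesh-mgmt-server"
NAMESPACE = "gloo-mesh"  # assumed, adjust to your installation


def load_balancer_patch() -> str:
    """Return a JSON merge patch that sets the service type to LoadBalancer."""
    return json.dumps({"spec": {"type": "LoadBalancer"}})


def kubectl_patch_args(context: str) -> list[str]:
    """Build the kubectl argument list for patching the service.

    The list can be passed to subprocess.run(...) against the
    management cluster's kubeconfig context.
    """
    return [
        "kubectl", "--context", context,
        "patch", "service", SERVICE_NAME,
        "--namespace", NAMESPACE,
        "--type", "merge",
        "--patch", load_balancer_patch(),
    ]
```

Building the argument list separately from running it makes the command easy to inspect before it touches a cluster.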

Relay agents on workload clusters

After registration, a deployment named gloo-mesh-agent runs the relay agent on each workload cluster. The relay agent is exposed by the gloo-mesh-agent service on the default port 9977. The Envoy proxy sidecars for cluster workloads use this port to send metrics to the agent. The relay agent does not receive inbound requests from outside the workload cluster. Unlike the management server, you do not need to configure ingress for the relay agent.

Agent-server communication

Communication between the management and data planes is initiated by relay agents, which run in the workload clusters, to the management server, which runs in the management cluster. The following figures outline the general flow of how the relay agents and server communicate to keep your multimesh environment up to date.

Note that these steps outline the relay process in a multicluster setup.

Step 1: Management server and relay agent registration

A workload cluster is registered with the Gloo Mesh management plane. The relay agent in the workload cluster establishes two-way, mTLS-secured gRPC communication with the management server in the management cluster.

Note that for high availability, multiple instances of the gloo-mesh-mgmt-server pod run the management server. Each relay agent connects and sends its data to only one of the instances. To translate the resources that a server instance receives from the agents that are connected to it, the server instance receives:

Figure: Relay agents on workload clusters are registered with the Gloo Mesh management server on the management cluster.

Step 2: Istio mesh and resource discovery

The relay agent in the workload cluster sends a snapshot of its state to the management server. The following types of resources are included in the snapshot:

For more details on the types of resources that are included in each snapshot, see Mesh discovery by relay agents. Additionally, the agent reports the clusters that each resource is deployed to.

Figure: Relay agents discover resources in the workload cluster, and send a snapshot of resource and service mesh states to the management server.

Step 3: Gloo Mesh resources to manage Istio meshes across clusters

The translation components of the management server translate the snapshot into internal custom resources, which form a complete view of the meshes and mesh resources across all workload clusters. These discovered resources are represented in the management cluster for observability and configuration purposes.

Additionally, the management server translates the custom Gloo Mesh resources that you create in your workload clusters into Istio resources. For more information about how Gloo Mesh resources are translated to corresponding Istio resources, see Custom resource translation.

In a multicluster setup, the relay agents in workload clusters pull translated Istio resources as a snapshot, and apply the generated resources to the workload cluster. In a single cluster setup, those resources are written directly to the cluster without relay.

As you create and update more Gloo Mesh custom resources in workspaces in your workload clusters, the resource translation process occurs automatically to apply your changes. For example, when you update a Gloo Mesh custom resource, Gloo Mesh also automatically updates the generated Istio resources for you.

Figure: The management server translates the resource snapshots into Istio configurations, and relay agents pull the Istio configurations into each service mesh.

More details about agent-server communication

Now that you have reviewed the overall flow of Gloo Mesh's relay architecture, learn more about how Gloo Mesh secures communication, translates mesh resources, and reconciles configuration updates.

Secure communication between relay agents and management server

Communication between the relay server and agent uses the gRPC protocol, secured by mTLS. When you install Gloo Mesh in the management cluster, a relay-identity-token-secret is created for you. You copy this secret as part of registering a workload cluster. The relay-identity-token-secret on each workload cluster must match the secret in the management cluster. To validate authenticity during registration, the agent uses simple TLS to send the token to the management server. After validating the token, the management server creates a TLS certificate for the relay agent. Then, all future communication from relay agents to the server uses this certificate for mTLS.

Note that you can use self-signed certificates or certificates from your PKI to secure server-agent communication. For more information, see the certificate management strategies for proof-of-concept or production.
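The registration handshake described in this section can be pictured with a small simulation. This is a toy model under stated assumptions, not Gloo Mesh's implementation: the ManagementServer class, its register and accept methods, and the string that stands in for a certificate are all hypothetical, and no real TLS or gRPC is involved.

```python
import hmac
import secrets
from dataclasses import dataclass, field


@dataclass
class ManagementServer:
    """Hypothetical stand-in for the management server's registration logic."""
    identity_token: str
    issued: dict[str, str] = field(default_factory=dict)

    def register(self, cluster: str, presented_token: str) -> str:
        """Validate the shared token, then issue a per-agent credential."""
        # Constant-time comparison avoids leaking token contents via timing.
        if not hmac.compare_digest(
            self.identity_token.encode(), presented_token.encode()
        ):
            raise PermissionError(f"token mismatch for cluster {cluster!r}")
        # Placeholder string; a real server would sign a TLS client certificate.
        cert = f"cert-{cluster}-{secrets.token_hex(8)}"
        self.issued[cluster] = cert
        return cert

    def accept(self, cluster: str, presented_cert: str) -> bool:
        """Later connections succeed only with the credential issued above."""
        expected = self.issued.get(cluster)
        return expected is not None and hmac.compare_digest(
            expected.encode(), presented_cert.encode()
        )
```

The point of the two-phase design is that the shared token is used only once, at registration. Every later connection is authenticated with a credential that is unique to the agent.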

Mesh discovery by relay agents

Each relay agent performs mesh discovery for the cluster that it is deployed to. The relay agent constructs a snapshot of the actual state of discovered entities in the workload cluster, including the following.

The resources in this snapshot provide the management server with the complete state of the service meshes, workloads, and destinations across the multicluster, multimesh environment. When you make cross-mesh configuration changes, the management server uses this state to create configuration updates for the workload clusters.
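To make the idea of a combined state concrete, the following sketch merges per-cluster snapshots into one view that records which clusters report each resource. The Resource type, the kind strings, and the global_view function are illustrative assumptions and do not reflect Gloo Mesh's internal snapshot format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Identity of a discovered entity; the kinds used here are illustrative."""
    kind: str
    namespace: str
    name: str


def global_view(snapshots: dict[str, set[Resource]]) -> dict[Resource, set[str]]:
    """Combine per-cluster snapshots into a map of resource -> reporting clusters."""
    view: dict[Resource, set[str]] = defaultdict(set)
    for cluster, resources in snapshots.items():
        for resource in resources:
            view[resource].add(cluster)
    return dict(view)
```

For example, a destination that two agents both report appears once in the view, mapped to both cluster names.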

Configuration updates and state reconciliation

The relay agents watch for user-provided configuration updates in your workload clusters. For example, you might create a Gloo Mesh policy in one of your workload clusters. Each relay agent takes a snapshot of the actual state of resources in its workload cluster and sends the snapshot to a management server instance in the management cluster. Using the information that was gathered by mesh discovery from snapshots across workload clusters, the management server components automatically translate your provided Gloo Mesh resources into resources that are specific to each workload mesh. For example, if a relay agent reports that its cluster runs an Istio service mesh, the management server translates your Gloo Mesh resources into VirtualService, DestinationRule, and AuthorizationPolicy Istio resources.

The management server then reconciles this declared state with the actual state of the workload clusters, and creates configuration updates. The relay agents pull these updates in real time from the management server. The agents apply the updated resources to the Istio control planes in each cluster. Note that many service mesh proxies, like Envoy, rely on a polling mechanism between the control plane and the proxy instances. Therefore, changes that agents pull from Gloo Mesh reach the mesh proxy instances only as quickly as the polling cycle within a workload cluster's service mesh allows.
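Reconciliation of declared and actual state can be sketched as a simple diff. The example below is a generic illustration and is not taken from Gloo Mesh: the Plan type, the reconcile function, and the "Kind/name" keys are hypothetical, and the resource specs are plain dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Configuration updates needed to move actual state to declared state."""
    create: dict[str, dict] = field(default_factory=dict)
    update: dict[str, dict] = field(default_factory=dict)
    delete: list[str] = field(default_factory=list)


def reconcile(declared: dict[str, dict], actual: dict[str, dict]) -> Plan:
    """Diff declared against actual state, keyed by a resource identifier."""
    plan = Plan()
    for key, spec in declared.items():
        if key not in actual:
            plan.create[key] = spec      # declared but missing in the cluster
        elif actual[key] != spec:
            plan.update[key] = spec      # present but drifted from declared
    # Present in the cluster but no longer declared.
    plan.delete = sorted(k for k in actual if k not in declared)
    return plan
```

When the declared and actual states already match, the plan is empty, so an agent that pulls it has nothing to apply.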