To support multicluster setups and provide exceptional multicluster routing, discovery, security, scalability, and observability capabilities across clusters, Gloo Mesh Gateway, Gloo Mesh Enterprise, Gloo Mesh Core, and Gloo Network for Cilium share the same relay architecture. The relay architecture consists of a Gloo management server and agents. The management server is commonly installed in a standalone management cluster. The agents are installed in each workload cluster. Workload clusters host your apps, and are registered with the management cluster.

Relay components

Gloo consists of a management server (sometimes called the “relay server”) for the management plane and relay agents for each workload cluster in the data plane.

Management server

After installation, a deployment named gloo-mesh-mgmt-server runs the management server. For your workload clusters to communicate with the management server, the gloo-mesh-mgmt-server service is automatically set up as a service of type LoadBalancer on a default port of 9900/TCP. The server is responsible for configuring the Gloo agents in your workload cluster and maintaining the desired state of your environment. When you create Gloo custom resources, the server translates these to the appropriate open source custom resources that your Gloo product is based on, such as Istio, Envoy, or Cilium. Then, the server pushes config changes to the agents to apply in the workload clusters.

Management server replicas and clusters

By default, the management server is deployed with one replica. To increase availability, you can increase the number of replicas that you deploy in the management cluster.

Relay agents

After registration, a deployment named gloo-mesh-agent runs the relay agent on each workload cluster. The relay agent is exposed by the gloo-mesh-agent service on the default port 9977. The agents send snapshots of the resources from each workload cluster to the management server.

Agent replicas

You can add replicas of the agent for higher availability. In such case, leader election affects which processes the agents handle. Logging and metric processes use the most resources, and scale as the number of services within the cluster grows.

The leader agent handles the following processes:

  • Relay
  • Discovery
  • Certificate management
  • Pod bouncing
  • API discovery for Gloo Portal
  • Schema reporter for Gloo Portal

All agent replicas, including the leader, handle the following processes:

  • Access log sink
  • xDS agent for WebAssembly (Wasm)
  • Verifier cache clearing for CRD watch management

Relay communication

Communication between the management server and agents is initiated by the Gloo relay agents, which run in the workload clusters. The following figures outline the general flow of how the relay agents and server communicate to keep your configurations and environment up to date.

Note that these steps outline the relay process in a multicluster setup.

Workload cluster registration

The following image shows the process for registering a workload cluster with the management cluster.

Figure: Registration of Gloo agents with the Gloo management server
  1. During the workload cluster registration process, the Gloo agent in the workload cluster establishes a simple TLS connection to one of the Gloo management server replicas and sends a relay identity token. The relay identity token is a string value that is either generated during the installation of the Gloo management server and copied to the workload cluster, or provided by you in the form of a Kubernetes secret (mTLS setup) or environment variable (TLS setup) to the Gloo management server and agent.
  2. The Gloo management server verifies the relay identity token value. If the agent’s relay identity token matches the management server’s token, the management server registers the agent. In mTLS setups, the management server also issues a client TLS certificate and sends it to the agent.
  3. The Gloo agent initiates a TLS handshake. In mTLS setups, the Gloo agent verifies the identity of the Gloo management server and presents the client TLS certificate to prove its own identity. Only if both parties can be verified successfully, the TLS handshake is complete and the mTLS connection is established.

After the connection is established, the Gloo agent gathers and sends its first discovery input snapshot. For more information about this process, see Custom resource translation.

Management server and agent communication

The relay architecture provides several options for how you can secure the connection between the Gloo management server and agents. By default, Gloo generates self-signed TLS certificates that can be used to establish a simple or mutual TLS connection. However, you can choose to provide your own TLS certificates.

For an overview of supported options, see the Setup options.

Gloo custom resource translation and reconciliation

Learn how Gloo custom resources are translated into Istio, Envoy, or Cilium resources and applied in each workload cluster.

Translation process

The Gloo management server and agents use relay input and output snapshots to exchange information and reconcile the resources that must be applied in each cluster. The input snapshot includes all Gloo resources that the Gloo agent discovered on its local cluster as illustrated in the Discovery input snapshot tab. The Gloo management server translates these resources into Istio, Envoy, or Cilium resources and sends them as an output snapshot to the Gloo agent. To learn more about this process, see the Translation and output snapshot tab.

Translation process with cluster context

Review the following flow diagrams to learn more about how Gloo custom resources are translated when you apply Gloo resources in the management versus the workload clusters.

For more information about how Gloo translates custom resources, see Custom resource translation.

Translation failure scenarios

Translation can fail when individual components of the Gloo relay architecture become unavailable, such as during a restart. Review the following scenarios to understand the impact.

Redis becomes unavailable

Redis is a key component of the Gloo management plane and serves as the single source of truth for translating Gloo custom resources into Istio, Envoy, or Cilium resources. In the event that Redis becomes unavailable, translation continues as the Gloo management server uses the input snapshots that are stored in its local memory.

Flip through the following cards to learn more about this failure scenario.

Gloo agent disconnects

To translate Gloo custom resources based on the complete context of all workload clusters, each connected Gloo agent must send a discovery input snapshot with all discovered entities to the Gloo management server.

Flip through the cards to learn how translation continues if one of the Gloo agents disconnects.

Gloo management server replica unavailable

When the Gloo agent connects to the Gloo management server, a management server replica is assigned and used to establish the simple TLS or mutual TLS connection with the Gloo agent. The Gloo agent then sends an input snapshot that contains all discovered entities from its local cluster.

Flip through the cards to learn how translation continues if the assigned Gloo management server replica becomes unavailable.

Redis and Gloo management server restart

In the following scenario, Redis and the Gloo management server become unavailable at the same time, which removes all input snapshots from the Redis cache and the management server’s local memory.

Flip through the cards to learn more about this failure scenario.