Relay architecture
The Gloo architecture consists of a Gloo management server that is commonly installed in a management cluster, and Gloo agents that are deployed to workload clusters. The workload cluster is registered with the management cluster and hosts your apps and services. All Gloo Platform products share this architecture to provide exceptional routing, discovery, security, scalability, and observability capabilities for your apps across clusters, clouds, and environments.
In testing or single-cluster environments, you can also set up the Gloo management components in a cluster and register the management cluster as a workload cluster at the same time.
Gloo components
Gloo consists of a management server (sometimes called the “relay server”) for the management plane and relay agents for each workload cluster in the data plane.
Gloo management server
After installation, a deployment named gloo-mesh-mgmt-server runs the management server. For your workload clusters to communicate with the management server, the gloo-mesh-mgmt-server service is automatically set up as a service of type LoadBalancer on a default port of 9900/TCP. The server is responsible for configuring the Gloo agents in your workload clusters and maintaining the desired state of your environment. When you create Gloo custom resources, the server translates them into the custom resources of the open source project that your Gloo product is based on, such as Istio, Envoy, or Cilium. Then, the server pushes config changes to the agents to apply in the workload clusters.
Gloo relay agents
After registration, a deployment named gloo-mesh-agent runs the relay agent on each workload cluster. The relay agent is exposed by the gloo-mesh-agent service on the default port 9977. The agents send snapshots of the Gloo resources from each workload cluster to the management server. Because the relay agent does not serve external requests directly, you do not need to configure an ingress gateway for the workload cluster.
Agent replicas
You can add replicas of the agent for higher availability. In that case, leader election determines which processes each agent replica handles. Logging and metric processes use the most resources, and their resource use scales as the number of services within the cluster grows.
The leader agent handles the following processes:
- Relay
- Discovery
- Certificate management
- Pod bouncing
- API discovery for Gloo Platform Portal
- Schema reporter for Gloo Platform Portal
All agent replicas, including the leader, handle the following processes:
- Access log sink
- xDS agent for WebAssembly (Wasm)
- Verifier cache clearing for CRD watch management
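The split between leader-only and shared processes can be pictured with a small model. The following is a minimal sketch, not the agent's implementation: the process names come from the two lists above, while the function name processes_for and the replica names are hypothetical.

```python
from __future__ import annotations

# Process names copied from the lists above; this is an illustrative model only.
LEADER_ONLY = (
    "relay",
    "discovery",
    "certificate management",
    "pod bouncing",
    "API discovery for Gloo Platform Portal",
    "schema reporter for Gloo Platform Portal",
)
ALL_REPLICAS = (
    "access log sink",
    "xDS agent for WebAssembly (Wasm)",
    "verifier cache clearing for CRD watch management",
)


def processes_for(is_leader: bool) -> list[str]:
    """Return the processes that a single agent replica handles."""
    shared = list(ALL_REPLICAS)
    return shared + list(LEADER_ONLY) if is_leader else shared


if __name__ == "__main__":
    # Hypothetical replica names; exactly one replica is the leader.
    for name, leader in [("agent-0", True), ("agent-1", False)]:
        print(name, processes_for(leader))
```

In this model, adding replicas spreads only the shared processes across pods, while the leader-only processes still run in a single replica.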
Agent-server communication
Communication between the management server and agents is initiated by the Gloo relay agents, which run in the workload clusters. The following figures outline the general flow of how the relay agents and server communicate to keep your configurations and environment up to date.
Note that these steps outline the relay process in a multicluster setup.
Management server and relay agent registration
A workload cluster is registered with the Gloo management plane. The relay agent in the workload cluster establishes two-way, mTLS-secured gRPC communication with the management server in the management cluster.
Note that for high availability, you might configure multiple replicas of the gloo-mesh-mgmt-server pod to run the management server. Each relay agent connects and sends its data to only one of the instances. To translate resources, each server instance uses the following two sets of resources:
- The resources that you deploy to each registered cluster. The agents send these resources to the management server instance that they are connected to. The server instance stores the resources in its local pod memory, which cannot be accessed by any other management server instance.
- The resources that other agents send to other management server instances. Each server instance writes certain resources, such as Gloo policies, services, and deployments, to one gloo-mesh-redis pod for persistent storage. Other management server instances can access these resources in persistent storage to create a global view of all workloads and destinations, and to apply the appropriate policies to these workloads and destinations.
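The interplay between per-instance memory and shared persistent storage can be sketched as follows. This is a simplified model under stated assumptions, not how the management server is implemented: the classes SharedStore and ServerInstance, the SHARED_KINDS set, and all cluster and resource names are hypothetical, and a plain dictionary stands in for the gloo-mesh-redis pod.

```python
from __future__ import annotations

# Hypothetical set of resource kinds that an instance writes to shared storage.
SHARED_KINDS = {"Policy", "Service", "Deployment"}


class SharedStore:
    """Stand-in for the single gloo-mesh-redis pod."""

    def __init__(self) -> None:
        # cluster name -> {resource name -> kind}
        self.by_cluster: dict[str, dict[str, str]] = {}


class ServerInstance:
    """Stand-in for one gloo-mesh-mgmt-server replica."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store
        # Private pod memory: not visible to other instances.
        self.local: dict[str, dict[str, str]] = {}

    def receive_snapshot(self, cluster: str, resources: dict[str, str]) -> None:
        """Keep the full snapshot locally and persist only the shared kinds."""
        self.local[cluster] = dict(resources)
        self.store.by_cluster[cluster] = {
            name: kind for name, kind in resources.items() if kind in SHARED_KINDS
        }

    def global_view(self) -> dict[str, dict[str, str]]:
        """Merge shared storage with local memory; local data wins."""
        view = {c: dict(r) for c, r in self.store.by_cluster.items()}
        view.update({c: dict(r) for c, r in self.local.items()})
        return view


if __name__ == "__main__":
    store = SharedStore()
    a, b = ServerInstance("mgmt-0", store), ServerInstance("mgmt-1", store)
    a.receive_snapshot("cluster-1", {"reviews": "Service", "debug-cm": "ConfigMap"})
    b.receive_snapshot("cluster-2", {"ratings": "Service"})
    print(a.global_view())
    print(b.global_view())
```

In this model, each instance sees the full snapshot only for the agents connected to it, and sees the persisted subset for every other cluster.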
Agent snapshots
The relay agent in the workload cluster sends a snapshot of its state to the management server. The following types of resources are included in the snapshot:
- Discovered Kubernetes resources, such as Kubernetes services and deployments for your apps.
- Gloo custom resources that you create.
- Cilium resources that you manually create or that are created for you.
Gloo custom resource translation
When you create a configuration, the management server translates the configuration into other Gloo custom resources or resources that are specific to the open source project that the Gloo product is based on, such as Istio, Envoy, or Cilium. These resources form a complete view of your Gloo environment and are stored in the built-in Redis database.
The Gloo agent pulls translated custom resources as a snapshot from the Gloo management server, and applies the resources in the workload cluster. In a single-cluster setup, these resources are written directly to the cluster without relay.
More details about agent-server communication
Now that you reviewed the overall flow of how Gloo's relay architecture works, learn more about how Gloo secures communication, translates mesh resources, and reconciles configuration updates.
Secure communication between relay agents and management server
Communication between the relay server and agent uses the gRPC protocol, and is secured by mTLS. When you install Gloo in the management cluster, a relay-identity-token-secret is created for you. You copy this secret as part of registering a workload cluster. The relay-identity-token-secret on each workload cluster must match the secret in the management cluster. To validate authenticity during registration, the agent uses simple TLS to send the token to the management server. After validating the token, the management server creates a TLS certificate for the relay agent. Then, all future communication from relay agents to the server uses this certificate for mTLS.
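The two phases described above, token validation followed by certificate-based communication, can be modeled in a few lines. This is a conceptual sketch only and performs no real TLS: the ManagementServer class, its register and accept methods, and the placeholder certificate strings are all hypothetical, and string comparison stands in for certificate verification.

```python
import hmac
import secrets


class ManagementServer:
    """Toy model of the registration handshake; no real TLS is involved."""

    def __init__(self, identity_token: str) -> None:
        self._token = identity_token.encode()
        self._issued = {}  # agent name -> placeholder certificate

    def register(self, agent_name: str, presented_token: str) -> str:
        """Phase 1: validate the identity token, then issue a certificate."""
        # Constant-time comparison avoids leaking token contents via timing.
        if not hmac.compare_digest(self._token, presented_token.encode()):
            raise PermissionError(f"token mismatch for agent {agent_name!r}")
        cert = f"cert-{agent_name}-{secrets.token_hex(8)}"
        self._issued[agent_name] = cert
        return cert

    def accept(self, agent_name: str, presented_cert: str) -> bool:
        """Phase 2: accept only the certificate issued to this agent."""
        expected = self._issued.get(agent_name)
        if expected is None:
            return False
        return hmac.compare_digest(expected.encode(), presented_cert.encode())


if __name__ == "__main__":
    server = ManagementServer("example-token")
    cert = server.register("cluster-1", "example-token")
    print(server.accept("cluster-1", cert))
    print(server.accept("cluster-1", "forged"))
```

The sketch shows why the token on the workload cluster must match the one in the management cluster: without a match, registration fails and no certificate is ever issued for later communication.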
Resource discovery by relay agents
Each relay agent performs resource discovery for the cluster that it is deployed to. The relay agent constructs a snapshot of the actual state of discovered entities in the workload cluster, including the following:
- Discovered Kubernetes resources, such as Kubernetes services, deployments, replicasets, daemonsets, and statefulsets.
- Gloo custom resources that you create. The management server translates Gloo resources into Cilium resources and displays them in the Gloo UI.
- Cilium resources, such as Cilium endpoints and network policies.
- Internal resources that are computed in memory by each agent and pushed to the management server without being persisted in the workload cluster, such as IssuedCertificate and CertificateRequest resources.
The resources in this snapshot provide the management server with the complete state of the cluster, workloads, and destinations.
Configuration updates and state reconciliation
The relay agents watch for user-provided configuration updates in your workload clusters. For example, you might create a Gloo policy in one of your workload clusters. Relay agents then create a snapshot of all Gloo and open source custom resources in the workload cluster, and send the snapshot to the management server in the management cluster. The management server uses this information to decide what custom resources must be created. For example, if you create a Gloo access policy, the policy is translated into a Cilium network policy.
The management server then reconciles this declared state with the actual state of the workload clusters, and creates configuration updates. The relay agents pull these updates in real time from the management server. The agents apply the updated resources in each cluster.
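Reconciling a declared state with an actual state amounts to computing a difference between two sets of resources. The following is a minimal sketch of that idea, not the management server's algorithm: the functions reconcile and apply_changes, the dictionary layout, and the policy names are hypothetical.

```python
from __future__ import annotations


def reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> dict:
    """Compute the changes that turn the actual state into the desired state."""
    return {
        "create": {n: s for n, s in desired.items() if n not in actual},
        "update": {
            n: s for n, s in desired.items() if n in actual and actual[n] != s
        },
        "delete": sorted(n for n in actual if n not in desired),
    }


def apply_changes(actual: dict[str, dict], changes: dict) -> dict[str, dict]:
    """Return a new state with the changes applied, as an agent would in its cluster."""
    result = dict(actual)
    result.update(changes["create"])
    result.update(changes["update"])
    for name in changes["delete"]:
        del result[name]
    return result


if __name__ == "__main__":
    # Hypothetical translated network policies, keyed by name.
    desired = {"np-a": {"allow": ["frontend"]}, "np-b": {"allow": ["api"]}}
    actual = {"np-b": {"allow": []}, "np-old": {"allow": ["*"]}}
    changes = reconcile(desired, actual)
    print(changes)
    print(apply_changes(actual, changes) == desired)
```

Computing explicit create, update, and delete sets keeps the applied changes small and makes repeated reconciliation a no-op once the two states agree.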