The Gloo architecture consists of a Gloo management server, which is commonly installed in a management cluster, and Gloo agents, which are deployed to workload clusters. Each workload cluster is registered with the management cluster and hosts your apps and services. All Gloo Platform products share this architecture to provide exceptional routing, discovery, security, scalability, and observability capabilities for your apps across clusters, clouds, and environments.
In testing or single-cluster environments, you can also run the Gloo control plane and data plane components in the same cluster.
Gloo consists of a management server (sometimes called the “relay server”) for the control plane and a relay agent in each workload cluster in the data plane.
After installation, a deployment named gloo-mesh-mgmt-server runs the management server. For your workload clusters to communicate with the management server, the gloo-mesh-mgmt-server service is automatically set up as a service of type LoadBalancer on a default port of 9900/TCP. The server is responsible for configuring the Gloo agents in your workload clusters and maintaining the desired state of your environment. When you create Gloo custom resources, the server translates them into the appropriate open source custom resources that your Gloo product is based on, such as Istio, Envoy, or Cilium. Then, the server pushes config changes to the agents to apply in the workload clusters.
Management server replicas and clusters
By default, the management server is deployed with one replica. To increase availability, you can increase the number of replicas that you deploy in the management cluster. Additionally, you can create multiple management clusters and deploy one or more replicas of the management server to each cluster. For more information, see High availability and disaster recovery.
After registration, a deployment named gloo-mesh-agent runs the relay agent on each workload cluster. The relay agent is exposed by the gloo-mesh-agent service on the default port 9977. The relay agent does not receive inbound requests from outside the workload cluster. Unlike the management server, you do not need to configure ingress for the relay agent.
You can add replicas of the agent for higher availability. In that case, leader election determines which processes each agent replica handles. Logging and metrics processes use the most resources, and their resource usage grows as the number of services in the cluster grows.
The leader agent handles the following processes:
- Certificate management
- Pod bouncing
- API discovery for Gloo Platform Portal
- Schema reporter for Gloo Platform Portal
All agent replicas, including the leader, handle the following processes:
- Access log sink
- xDS agent for WebAssembly (Wasm)
- Verifier cache clearing for CRD watch management
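The following Python sketch models the split between the leader and the other agent replicas. It assumes that leader election has already picked a replica and only shows which processes each replica would run; the constants, the function name, and the replica names are hypothetical and are not part of Gloo.

```python
# Hypothetical model of how processes are split across agent replicas.
# The process names mirror the lists above; everything else is illustrative.

LEADER_ONLY = (
    "certificate management",
    "pod bouncing",
    "API discovery for Gloo Platform Portal",
    "schema reporter for Gloo Platform Portal",
)

ALL_REPLICAS = (
    "access log sink",
    "xDS agent for WebAssembly (Wasm)",
    "verifier cache clearing for CRD watch management",
)


def processes_for_replica(is_leader: bool) -> list:
    """Return the processes that a single agent replica runs."""
    processes = list(ALL_REPLICAS)  # every replica runs these, including the leader
    if is_leader:
        processes.extend(LEADER_ONLY)  # only the elected leader runs these
    return processes


# Example: three replicas, where the first one is assumed to be the leader.
for index in range(3):
    name = f"agent-replica-{index}"  # hypothetical replica names
    print(name, len(processes_for_replica(is_leader=(index == 0))))
# agent-replica-0 7
# agent-replica-1 3
# agent-replica-2 3
```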
Communication between the control and data planes is initiated by the relay agents, which run in the workload clusters, toward the management server, which runs in the management cluster. The following steps outline the general flow of how the relay agents and server communicate to keep your multimesh environment up to date.
Note that these steps outline the relay process in a multicluster setup.
Management server and relay agent registration
A workload cluster is registered with the Gloo management server in the control plane. The relay agent in the workload cluster establishes two-way, mTLS-secured gRPC communication with the management server in the management cluster.
Note that for high availability, you might configure multiple replicas of the gloo-mesh-mgmt-server pod to run the management server. Each relay agent connects and sends its data to only one of the instances. To translate resources for the agents that are connected to it, each server instance uses the following data:
- The resources that you deploy to each registered cluster. The agents send these resources to the management server instance. The server instance stores the resources in its local pod memory, which cannot be accessed by any other management server instance.
- The resources that other agents send to other management server instances. Each server instance writes certain resources, such as Gloo policies, services, and deployments, to one gloo-mesh-redis pod for persistent storage. Other management server instances can access these resources in persistent storage to create a global view of all workloads and destinations, and to apply the appropriate policies to these workloads and destinations.
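The following Python sketch illustrates these two data sources under simplifying assumptions: a plain dict stands in for the gloo-mesh-redis pod, resources are (kind, name) tuples, and the set of persisted kinds is a hypothetical subset. The class and method names are illustrative and do not describe Gloo's internal implementation.

```python
from dataclasses import dataclass, field

# Hypothetical subset of resource kinds that instances persist to shared storage.
PERSISTED_KINDS = {"policy", "service", "deployment"}


@dataclass
class ServerInstance:
    name: str
    shared_store: dict  # stand-in for the gloo-mesh-redis pod, shared by all instances
    local_memory: dict = field(default_factory=dict)  # private to this instance

    def receive_snapshot(self, cluster: str, resources: list) -> None:
        # Everything that a connected agent sends stays in local pod memory.
        self.local_memory[cluster] = resources
        # Only certain kinds are also written to the shared, persistent store.
        self.shared_store[cluster] = [r for r in resources if r[0] in PERSISTED_KINDS]

    def global_view(self) -> dict:
        # Start from what all instances persisted, then overlay the full local
        # data for clusters whose agents are connected to this instance.
        view = dict(self.shared_store)
        view.update(self.local_memory)
        return view


redis_stand_in = {}
mgmt_0 = ServerInstance("mgmt-0", redis_stand_in)
mgmt_1 = ServerInstance("mgmt-1", redis_stand_in)

# Each agent connects to exactly one instance.
mgmt_0.receive_snapshot("cluster-1", [("service", "reviews"), ("configmap", "settings")])
mgmt_1.receive_snapshot("cluster-2", [("deployment", "ratings")])

print(sorted(mgmt_0.global_view()))       # ['cluster-1', 'cluster-2']
print(mgmt_1.global_view()["cluster-1"])  # [('service', 'reviews')]
```

In this model, mgmt-1 sees only the persisted service from cluster-1 and not the configmap, because that data lives only in the local memory of mgmt-0.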
Istio mesh and resource discovery
The relay agent in the workload cluster sends a snapshot of its state to the management server. The following types of resources are included in the snapshot:
- Discovery resources, such as Kubernetes services and deployments for your apps.
- Gloo custom resources that you create.
- Istio resources that you manually create or that are created for you.
- Internal resources that describe the Istio mesh, gateway endpoints, and more.
Gloo custom resource translation
The translation components of the management server translate the snapshot into internal custom resources, which form a complete view of the meshes and mesh resources across all workload clusters. These discovered resources are represented in the management cluster for observability and configuration purposes.
Additionally, the management server translates the custom Gloo resources that you create in your workload clusters into Istio resources. For more information about how Gloo resources are translated to corresponding Istio resources, see Custom resource translation.
In a multicluster setup, the relay agents in workload clusters pull translated Istio resources as a snapshot, and apply the generated resources to the workload cluster. In a single cluster setup, those resources are written directly to the cluster without relay.
As you create and update more Gloo custom resources in workspaces in your workload clusters, the resource translation process occurs automatically to apply your changes. For example, when you update a Gloo custom resource, Gloo also automatically updates the generated Istio resources for you.
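As a rough illustration of translation, the following Python sketch converts a simplified, hypothetical Gloo-style policy into an Istio AuthorizationPolicy manifest represented as a dict. The output uses real AuthorizationPolicy fields (selector, action, rules), but the input fields appLabels and allowedPrincipals, the function name, and the -generated naming scheme are assumptions for illustration only; real Gloo resources and their translation are more involved.

```python
import copy


def translate_access_policy(gloo_policy: dict) -> dict:
    """Translate a simplified, hypothetical Gloo-style policy into an
    Istio AuthorizationPolicy manifest, represented as a dict."""
    meta = gloo_policy["metadata"]
    spec = gloo_policy["spec"]
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        "metadata": {
            "name": f"{meta['name']}-generated",  # hypothetical naming scheme
            "namespace": meta["namespace"],
        },
        "spec": {
            # Apply the policy to workloads with these labels.
            "selector": {"matchLabels": dict(spec["appLabels"])},
            "action": "ALLOW",
            # Allow requests only from the listed workload identities.
            "rules": [{"from": [{"source": {"principals": list(spec["allowedPrincipals"])}}]}],
        },
    }


policy = {
    "metadata": {"name": "reviews-access", "namespace": "bookinfo"},
    "spec": {
        "appLabels": {"app": "reviews"},
        "allowedPrincipals": ["cluster.local/ns/bookinfo/sa/productpage"],
    },
}
generated = translate_access_policy(policy)

# Updating the source resource and translating again yields the refreshed output.
updated = copy.deepcopy(policy)
updated["spec"]["allowedPrincipals"].append("cluster.local/ns/bookinfo/sa/ratings")
regenerated = translate_access_policy(updated)

principals = regenerated["spec"]["rules"][0]["from"][0]["source"]["principals"]
print(len(principals))  # 2
```

Because the sketch's translation depends only on its input, re-running it after an update produces the refreshed Istio resource, which mirrors how generated resources follow changes to the Gloo resources they are derived from.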
More details about agent-server communication
Now that you reviewed the overall flow of how Gloo's relay architecture works, learn more about how Gloo secures communication, translates mesh resources, and reconciles configuration updates.
Secure communication between relay agents and management server
Communication between the relay server and agent uses the gRPC protocol, secured by mTLS. When you install Gloo in the management cluster, a relay-identity-token-secret is created for you. You copy this secret as part of registering a workload cluster. The relay-identity-token-secret on each workload cluster must match the secret in the management cluster. To prove its authenticity during registration, the agent sends the token to the management server over simple (one-way) TLS. After validating the token, the management server creates a TLS certificate for the relay agent. Then, all future communication from the relay agent to the server uses this certificate for mTLS.
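The two phases of this handshake can be sketched with Python's standard hmac and ssl modules. This is an illustration only, not how Gloo implements relay security: the function names and the token value are hypothetical, certificate issuance is not shown, and the certificate and CA files that a real server would load are omitted.

```python
import hmac
import ssl


def token_is_valid(presented: bytes, expected: bytes) -> bool:
    """Compare the token that an agent presents with the server's copy.

    hmac.compare_digest avoids short-circuiting on the first differing byte,
    which makes timing attacks on the token harder.
    """
    return hmac.compare_digest(presented, expected)


def registration_context() -> ssl.SSLContext:
    # Phase 1, server side: simple TLS. The server authenticates itself but
    # does not ask for a client certificate, because the agent proves its
    # identity with the token instead.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def mtls_context() -> ssl.SSLContext:
    # Phase 2, server side: mutual TLS. Each agent must present a client
    # certificate, such as the one issued to it after token validation.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # A real server would also call ctx.load_cert_chain() with its own
    # certificate and ctx.load_verify_locations() with the CA that signs agent
    # certificates; those files are omitted from this sketch.
    return ctx


secret = b"example-token"  # hypothetical relay identity token value
print(token_is_valid(b"example-token", secret))  # True
print(token_is_valid(b"wrong-token", secret))    # False
print(mtls_context().verify_mode == ssl.CERT_REQUIRED)  # True
```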
Note that you can use self-signed certificates or certificates from your PKI to secure server-agent communication. For more information, see the certificate management strategies for proof-of-concept or production.
Mesh discovery by relay agents
Each relay agent performs mesh discovery for the cluster that it is deployed to. The relay agent constructs a snapshot of the actual state of discovered entities in the workload cluster, including the following:
- Discovered Kubernetes resources, such as Kubernetes services, deployments, replicasets, daemonsets, and statefulsets. The management server translates discovered resources into Istio resources and displays them in the Gloo UI. Note that you can use Istio discovery selectors to ignore certain Kubernetes resources. Ignored resources are not included in the snapshot that is sent from the agent to the management server.
- Gloo custom resources that you create. The management server translates Gloo resources into Istio resources and displays them in the Gloo UI.
- Istio resources, including:
  - Istio resources that, after initial server-agent setup, the management server automatically translates from your Gloo resources and writes back to the workload cluster. These resources are included in the snapshot to avoid accidentally deleting them from the workload cluster if an agent disconnects and reconnects, and to display them in the Gloo UI.
  - Any Istio resources that you manually created, so that they can be displayed in the Gloo UI.
- Internal resources that are computed in memory by each agent and pushed to the management server without being persisted in the workload cluster. Internal resources include:
  - Gateway resources, which contain information about gateway endpoints in the cluster.
  - CertificateRequest resources, which are used in internal multi-step workflows that involve both the agent and the management server.
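The following Python sketch shows how an agent might assemble such a snapshot and leave out resources in namespaces that discovery selectors ignore. The Snapshot fields mirror the categories above, while the function names, the namespace label, and the matchLabels-only selector check are simplifying assumptions; Istio discovery selectors also support matchExpressions, which the sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    discovered: list = field(default_factory=list)  # Kubernetes services, deployments, ...
    gloo: list = field(default_factory=list)        # Gloo custom resources that you create
    istio: list = field(default_factory=list)       # generated and manually created Istio resources
    internal: list = field(default_factory=list)    # in-memory only, such as Gateway resources


def namespace_selected(ns_labels: dict, selectors: list) -> bool:
    """Simplified discovery selector check that supports matchLabels only.

    No selectors means every namespace is selected; otherwise a namespace is
    selected if its labels match at least one selector.
    """
    if not selectors:
        return True
    return any(all(ns_labels.get(k) == v for k, v in sel.items()) for sel in selectors)


def build_snapshot(k8s_resources, namespace_labels, selectors, gloo, istio, internal) -> Snapshot:
    # Resources in ignored namespaces are dropped here, so they never reach
    # the management server.
    discovered = [
        r for r in k8s_resources
        if namespace_selected(namespace_labels.get(r["namespace"], {}), selectors)
    ]
    return Snapshot(discovered, list(gloo), list(istio), list(internal))


namespace_labels = {"bookinfo": {"discovery": "enabled"}, "scratch": {}}  # hypothetical label
selectors = [{"discovery": "enabled"}]
resources = [
    {"kind": "Service", "name": "reviews", "namespace": "bookinfo"},
    {"kind": "Deployment", "name": "debug-tools", "namespace": "scratch"},
]
snapshot = build_snapshot(
    resources, namespace_labels, selectors,
    gloo=[], istio=[], internal=[{"kind": "Gateway", "name": "ingress"}],
)
print([r["name"] for r in snapshot.discovered])  # ['reviews']
```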
The resources in this snapshot provide the management server with the complete state of the service meshes, workloads, and destinations across the multicluster, multimesh environment. When you make cross-mesh configuration changes, the management server uses this state to create configuration updates for the workload clusters.
Configuration updates and state reconciliation
The relay agents watch for user-provided configuration updates in your workload clusters. For example, you might create a Gloo policy in one of your workload clusters. Each relay agent sends a snapshot of the actual state of resources in its workload cluster to a management server instance in the management cluster. Using the information that mesh discovery gathered from snapshots across workload clusters, the management server components automatically translate your Gloo resources into resources that are specific to each workload mesh. For example, if a relay agent reports that its cluster runs an Istio service mesh, the management server translates your Gloo resources into AuthorizationPolicy Istio resources.
The management server then reconciles this declared state with the actual state of the workload clusters and creates configuration updates. The relay agents pull these updates in real time from the management server and apply the updated resources to the Istio control planes in each cluster. Note that many service mesh proxies, like Envoy, rely on a polling mechanism between the control plane and the proxy instances. Therefore, when the changes that agents pull from Gloo take effect in the proxies depends on the polling cycle between the mesh control plane and its proxy instances in each workload cluster.
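The reconciliation step can be pictured as a generic desired-versus-actual diff, as in the following Python sketch. It is not Gloo's algorithm: the resource keys, the function name, and the rule that only generated (managed) resources are deleted are assumptions made for illustration.

```python
def reconcile(desired: dict, actual: dict, managed: set) -> dict:
    """Compute the changes that bring `actual` in line with `desired`.

    Both dicts map a resource key ("namespace/kind/name") to that resource's
    spec. `managed` holds the keys of resources that were generated earlier.
    """
    create = sorted(k for k in desired if k not in actual)
    update = sorted(k for k in desired if k in actual and actual[k] != desired[k])
    # Assumption: only generated resources are candidates for deletion, so
    # manually created resources are left alone.
    delete = sorted(k for k in actual if k not in desired and k in managed)
    return {"create": create, "update": update, "delete": delete}


desired = {
    "bookinfo/AuthorizationPolicy/reviews": {"action": "ALLOW"},
    "bookinfo/AuthorizationPolicy/ratings": {"action": "ALLOW"},
}
actual = {
    "bookinfo/AuthorizationPolicy/reviews": {"action": "DENY"},
    "bookinfo/AuthorizationPolicy/stale": {"action": "ALLOW"},
    "bookinfo/VirtualService/manual": {"hosts": ["reviews"]},
}
managed = {"bookinfo/AuthorizationPolicy/reviews", "bookinfo/AuthorizationPolicy/stale"}

result = reconcile(desired, actual, managed)
print(result["create"])  # ['bookinfo/AuthorizationPolicy/ratings']
print(result["update"])  # ['bookinfo/AuthorizationPolicy/reviews']
print(result["delete"])  # ['bookinfo/AuthorizationPolicy/stale']
```

In this model, the manually created VirtualService survives reconciliation because it is not in the managed set.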
For more information and diagrams of the configuration update flow, see Gloo Mesh scalability threshold definition.