About ambient mesh
Explore an Istio service mesh without sidecars by running an ambient mesh with Gloo Mesh.
An ambient mesh removes the requirement of running a sidecar alongside each app in your mesh. Instead, you use node-level ztunnels to route Layer 4 traffic between apps, and waypoint proxies to enforce Layer 7 traffic policies whenever needed.
What is ambient mesh?
Solo collaborated with Google to develop ambient mesh, a new “sidecarless” architecture for the Istio service mesh. This architecture reduces the complexity of adopting a service mesh. You no longer have to inject a sidecar into your apps, but you still get the benefits of Istio, such as secure mutual TLS (mTLS) for pod-to-pod communication.
Ambient mesh removes the sidecar from each pod for your apps. Instead, Istio uses node-level ztunnels to route Layer 4 traffic between apps. Waypoint proxies enforce Layer 7 traffic policies whenever needed. To onboard apps into the mesh, you simply label the namespace the app belongs to. Because no sidecar must be injected, you don’t need to worry about restarting or reconfiguring your apps. Your apps automatically become part of the ambient mesh.
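As a minimal sketch of the namespace-label onboarding described above, the following Python snippet builds a Namespace manifest. The label `istio.io/dataplane-mode: ambient` is the one that upstream Istio uses to enroll a namespace in ambient mode; this page does not name the label, so treat it as an assumption for your distribution. The namespace name `bookinfo` is a placeholder.

```python
import json


def ambient_namespace(name: str) -> dict:
    """Build a Namespace manifest whose pods are enrolled in the ambient mesh."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            # Assumption: upstream Istio label for ambient enrollment.
            "labels": {"istio.io/dataplane-mode": "ambient"},
        },
    }


if __name__ == "__main__":
    # "bookinfo" is a placeholder namespace name.
    print(json.dumps(ambient_namespace("bookinfo"), indent=2))
```

Because `kubectl apply -f -` accepts JSON on standard input, the printed manifest could be piped straight to the cluster. No pod restart is part of this flow, which matches the onboarding claim above.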
What is the difference between a sidecarless and sidecar architecture?
You might wonder if you should use a sidecar or sidecarless architecture. Your approach depends on the requirements that your apps need to meet. These requirements include security, visibility, and lifecycle operations. You can even use both modes of Istio in the same service mesh. With this interoperable approach, you can use the mode that is best for your app. Let’s take a look at the benefits each architecture offers.
For detailed information about ambient mesh, see this Istio blog.
Ambient (sidecarless) architecture
The sidecarless architecture in an ambient mesh means that you do not need a sidecar in the same pod as your app. Communication between pods is still secured via mutual TLS (mTLS). This approach reduces your lifecycle operations in several ways, including:
- Fewer cluster resources to run your workloads
- Easier onboarding with no app configuration changes
- Simpler lifecycle operations by not having to restart the app
By default, ambient mesh routes traffic over Layer 4 of the OSI networking stack. Waypoint proxies are used for Layer 7 only when needed. This networking approach also reduces complexity, including:
- Simpler service mesh architecture
- Smaller risk of CVEs on Layer 4 vs. Layer 7
- Greater network performance in the service mesh
Sidecar architecture
A sidecar architecture, on the other hand, uses sidecars that run in each app’s pod. This approach gives several security benefits, such as:
- Stronger pod identity that enforces mTLS encryption per pod
- Reduced vulnerability surface, because any compromise impacts only that app
Network traffic is always on Layer 7 of the OSI networking stack. This way, you get greater visibility into the service mesh to help with things such as:
- Detecting bottlenecks
- Troubleshooting issues
- Improving workload parameters, such as resource requests and limits
Why ambient and Gloo Mesh?
Running ambient workloads in Gloo Mesh provides the following benefits:
- Observability with the Gloo UI and built-in Prometheus: Get instant access to L4 and L7 metrics for ambient workloads and visualize them with the Gloo UI. Metrics are automatically collected by the ztunnels and waypoint proxies, and are scraped by the built-in Prometheus server. You can run PromQL queries in Prometheus to analyze the metrics and monitor the traffic in your ambient mesh.
- Support for any CNI: You can run Gloo Mesh in ambient mode on any CNI, such as Cilium, Calico, or cloud provider-specific CNIs.
- Ambient insights: Gloo Mesh comes with an insights engine that automatically analyzes your Istio setups for health issues. These issues are displayed in the UI along with recommendations to harden your Istio setups. The insights give you a checklist to address issues that might otherwise be hard to detect across your environment. For more information, see [Insights](/gloo-mesh/2.7.x//setup/insights/).
Ready to move to an ambient mesh? Check out the following guides and resources to get started.
- Quickly install a demo deployment of an ambient mesh with the Gloo Operator.
- If you want to plan a migration from your existing sidecar mesh to an ambient mesh, review the ambient migration guide. This guide uses the Solo.io ambient migration tool to provide a prescriptive migration path based on your existing environment.
- Check out the free Ambient Estimator Tool, which assesses your Istio environment to estimate potential cost savings from migrating from sidecars to a sidecarless mesh architecture.
Components
Review the key components that make up the ambient mesh architecture. You can also watch Solo’s Christian Posta explain key ambient mesh features in a demo video.
ztunnel
The ztunnel is a zero trust, lightweight proxy that handles only Layer 4 traffic in the ambient mesh. It is deployed as a daemon set on every node of the cluster. All incoming and outgoing traffic for the pods is automatically intercepted and secured by the ztunnel socket that is exposed on the pod. The socket is configured by the ztunnel that is co-located on the same node as the pod. The ztunnel socket forwards traffic to the ztunnel socket of the target pod. If the target pod is located on a different node, its ztunnel socket is configured by the ztunnel instance that is co-located on the same node as the target pod. The communication between the ztunnel sockets is secured via mutual TLS.
Waypoint proxy
The waypoint proxy is a Layer 7 proxy that is shared between apps in the same service account. To enforce a Layer 7 policy for an app, you must manually deploy a waypoint proxy to the cluster. Then, when a request is sent to a target app that has L7 policies applied, the request is forwarded from the client pod’s ztunnel socket to the waypoint proxy. The waypoint proxy enforces the L7 policy and collects L7 metrics before the request is forwarded to the ztunnel socket of the target app. Traffic between the ztunnel socket and the waypoint proxy is secured via mTLS by default.
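The manual waypoint deployment mentioned above can be sketched as follows. This assumes the upstream Istio convention of declaring a waypoint as a Kubernetes Gateway API resource with the `istio-waypoint` gateway class and an HBONE listener on port 15008, plus the `istio.io/use-waypoint` label to point workloads at it. The names `waypoint` and `bookinfo` are placeholders, and your distribution might expect a different scope or additional labels.

```python
import json


def waypoint_gateway(name: str, namespace: str) -> dict:
    """Build a Gateway manifest that asks Istio to deploy a waypoint proxy."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "Gateway",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumption: gateway class that upstream Istio reserves for waypoints.
            "gatewayClassName": "istio-waypoint",
            "listeners": [{"name": "mesh", "port": 15008, "protocol": "HBONE"}],
        },
    }


def use_waypoint_label(waypoint_name: str) -> dict:
    """Label that tells ztunnel to send traffic through the named waypoint."""
    return {"istio.io/use-waypoint": waypoint_name}


if __name__ == "__main__":
    print(json.dumps(waypoint_gateway("waypoint", "bookinfo"), indent=2))
    print(json.dumps(use_waypoint_label("waypoint")))
```

Only workloads that carry the `istio.io/use-waypoint` label (directly or through their namespace) take the extra Layer 7 hop; all other traffic stays on the ztunnel-to-ztunnel Layer 4 path.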
Istio CNI plug-in
The Istio CNI plug-in is deployed as a daemon set on every node of the cluster and monitors all pods that are created or removed from the ambient mesh. For all pods that participate in the ambient mesh, the CNI plug-in configures the redirect from the app to the ztunnel.
Istio control plane
The Istio control plane, `istiod`, rolls out the ambient mesh configuration to the ztunnels and waypoint proxies in the cluster and keeps this configuration up-to-date. To enable mTLS connections between ztunnel sockets and waypoint proxies, the control plane generates the TLS certificates. At the same time, the control plane acts as a Certificate Authority (CA) to sign the certificates. The certificates are used by the ztunnel sockets and waypoint proxies to do mutual TLS authentication.
East-west gateway
This feature requires your mesh to be installed with the Solo distribution of Istio and an Enterprise-level license for Gloo Mesh. Contact your account representative to obtain a valid license.
In a multicluster ambient mesh, one east-west gateway exists on each cluster in the setup. When a request from an ingress gateway or in-mesh app in one cluster must be sent to an in-mesh app in another cluster, the request is first sent to the east-west gateway of the target app’s cluster. The east-west gateway is implemented as a ztunnel, and uses “double HBONE” to facilitate routing. In ambient mode, HBONE is used as the transport protocol. When a request is sent to the east-west gateway, double HBONE opens a tunnel that uses an outer mTLS layer and inner mTLS layer. The east-west gateway terminates the outer layer of the tunnel, along with the outer mTLS connection from the app or gateway that sent the request, and performs authentication. The east-west gateway then opens an inner mTLS connection to the ztunnel socket of the target app to securely forward the request.
Enterprise features
The Solo distribution of Istio includes basic built-in features by default, and numerous other features that you can unlock with an Enterprise-level license for Gloo Mesh. Review the following features that are supported when you install your ambient mesh with the Solo distribution of Istio and a Gloo Mesh Enterprise license.
Multicluster mesh
Create an ambient mesh setup across multiple clusters. In Gloo Mesh, you can deploy an ambient mesh to each workload cluster, create an east-west gateway in each cluster, and link the istiod control planes across cluster networks by using peering gateways. This feature allows you to easily route between ambient mesh services across clusters.
- To get started, you can install and link new ambient meshes. Alternatively, you can use automated cluster peering with the Gloo Mesh management plane to link multiple clusters.
- You can then make services available across clusters with simple global service naming and routing.
Node-based Layer 7 monitoring
Extract Layer 7 attributes from traffic requests that are routed through ztunnels. These L7 attributes can be used for:
- L7 metrics, such as `istio_requests_total` and `istio_request_duration_milliseconds`, with labels that are based on the L7 attributes (such as Result Code).
- Access logs, based on L7 attributes in addition to the existing access log information.
- Distributed tracing, such as to manage information about spans and their tags.
For example, you can check out Gloo Mesh observability guides that include ztunnel steps, such as the Add Istio request traces guide.
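To illustrate how the L7 metrics above might be queried, the following sketch builds a PromQL expression over `istio_requests_total` and the matching Prometheus HTTP API URL. The `response_code` label is the standard Istio label name and is assumed here to correspond to the Result Code attribute; the base URL `http://localhost:9090` is a hypothetical port-forwarded Prometheus address, not one taken from this page.

```python
from urllib.parse import urlencode


def request_rate_query(window: str = "5m") -> str:
    """PromQL: per-second request rate, grouped by HTTP response code."""
    # Assumption: the metric carries the standard Istio `response_code` label.
    return f"sum by (response_code) (rate(istio_requests_total[{window}]))"


def prometheus_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # Hypothetical address, for example after port-forwarding Prometheus.
    print(prometheus_query_url("http://localhost:9090", request_rate_query()))
```

The same expression can be pasted into the Prometheus expression browser instead of being sent through the API.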
SPIRE integration
SPIRE offers robust workload attestation capabilities that provide significantly more controls around how, when, and if identities are granted to workloads. The Solo distribution of Istio includes Enterprise support for using SPIRE node agents (over an Envoy SDS socket) to attest and grant identities to the ambient mesh workloads they proxy. This allows Istio to use these identities for mTLS connections between the ambient mesh workloads.
With the SPIRE integration, the ztunnel can act as a trusted `spire-agent` delegate on the node by using the SPIRE DelegatedIdentity API. Ztunnel can integrate with SPIRE to leverage SPIRE’s existing node and workload attestation plugin framework directly, as well as request workload certificates that are issued by SPIRE on the basis of those attestations.
To get started, check out Secure workload identities with SPIRE.
Ambient-sidecar interoperability
In some cases, you might want to run a hybrid architecture in which you use ambient mesh components, such as waypoints and ztunnels, to provide mesh capabilities broadly across your environment while continuing to run sidecars for some of your pods, depending on your app’s requirements. This interoperable architecture is often implemented only temporarily or for specific use cases.
Waypoint interoperability for ingress and sidecars
In the community distribution of ambient mesh, Envoy-based proxies such as ingress gateways and sidecar-enabled workloads can communicate with ambient mesh components via the HBONE protocol by default. However, Envoy-based proxies cannot utilize waypoint proxies. This means that traffic between Envoy-based proxies and ambient mesh components that requires policy application either fails or does not behave correctly, because the policies that the waypoint proxy is expected to enforce are not applied to requests from the Envoy-based proxies.
In the Solo distribution of Istio, interoperability between waypoints and sidecars and between waypoints and ingress proxies is supported.
- Sidecars: Client policies, such as routing rules, are applied either at the client sidecar, or at one waypoint if a waypoint exists. When a sidecar detects that a request must be sent to a service that uses a waypoint, the sidecar disables any client-side policies (for example, it skips virtual services), and instead sends the request directly to the waypoint. Note that policy enforcement for some sidecar-inbound policies that can also be applied at the service’s waypoint might behave differently.
- Ingress: By default, ingress proxies cannot determine the destination service for a request without applying the route. For this reason, the `istio.io/ingress-use-waypoint: true` label must be applied to any services that should respect waypoint proxies, so that the ingress proxy can send requests for the service’s route directly to the waypoint.
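The ingress label requirement above can be sketched as a small helper that adds `istio.io/ingress-use-waypoint: "true"` to a Service manifest held as a Python dictionary. The service name `reviews` is a placeholder, and the helper itself is hypothetical; only the label key and value come from this page.

```python
import copy

INGRESS_USE_WAYPOINT = "istio.io/ingress-use-waypoint"


def respect_waypoint_from_ingress(service: dict) -> dict:
    """Return a copy of a Service manifest labeled so ingress uses its waypoint."""
    labeled = copy.deepcopy(service)  # leave the caller's manifest untouched
    labels = labeled.setdefault("metadata", {}).setdefault("labels", {})
    # Kubernetes label values are strings, so "true" is quoted.
    labels[INGRESS_USE_WAYPOINT] = "true"
    return labeled


if __name__ == "__main__":
    # "reviews" is a placeholder service name.
    svc = {"apiVersion": "v1", "kind": "Service", "metadata": {"name": "reviews"}}
    print(respect_waypoint_from_ingress(svc)["metadata"]["labels"])
```

Services without the label keep the default behavior, in which the ingress proxy applies the route itself and does not send the request through the waypoint.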
Interoperability scenarios
Because interoperability is supported, you can use the Solo distribution of Istio to run both sidecar and ambient components in your cluster. Typically, interoperability facilitates hybrid mesh setups for temporary reasons, such as during migration. However, some more permanent solutions might require a hybrid mesh as well. You might leverage interoperability in one of the following scenarios:
- Migration: You can perform a zero-downtime migration from a sidecar mesh to an ambient mesh, in which ambient waypoints can route to sidecars during the migration process.
- Multicluster mesh: In some cases, interoperability can be used intentionally, such as in multicluster sidecar mesh setups. In this setup, you install the Istio ambient components in each workload cluster to successfully create east-west gateways and establish multicluster peering, even if you plan to use a sidecar mesh. However, sidecar mesh setups continue to use sidecar injection for your workloads, and your workloads are not added to an ambient mesh.
Considerations for policy enforcement
Policy enforcement is an integral consideration when planning a sidecar-ambient architecture. Traffic policies that are applied at the sidecar-inbound level in a sidecar mesh are typically instead applied at the waypoint level in an ambient mesh. Because of this, in a setup in which your workload uses both a waypoint and a sidecar, the same policy might apply to traffic requests twice, such as in the following scenarios.
- When you use the migration tool to move workloads from sidecar to ambient, you migrate to an ambient setup in multiple phases. One phase includes activating waypoints while retaining sidecars in workloads, thereby starting a brief period of time in which traffic from a waypoint is sent to both sidecar- and ambient-enabled workloads. The tool then analyzes your sidecar-inbound policies, and makes suggestions for adjusting the policies to apply them at the waypoint. The period of overlap time ends when you remove the sidecars from workloads in the final phase, and remove the old sidecar-inbound policies that are now enforced only at the waypoint.
- You might have multiple clusters in a mesh setup, in which services in an ambient-mesh cluster make requests to services in a sidecar-mesh cluster. If both clusters define a global service, and a waypoint is defined in the ambient cluster, a request from the ambient cluster might pass through its local waypoint for policy enforcement, before being routed to the workload on the remote cluster where the policy is enforced again at the sidecar.
A policy that is applied at both the waypoint and sidecar-inbound level, thereby being applied to traffic requests twice, might have the following levels of issues:
- None: Policy is fully idempotent and causes no issues or changes to the traffic flow, such as matching a header for routing.
- Potential: Policy might cause issues, but does not necessarily break the traffic flow. For example, a rate limit policy might count requests twice and allow for half of the effective rate limit quota.
- Breaking: Policy completely breaks the traffic flow. For example, a PeerAuthentication policy that disables mTLS for a workload in a sidecar setup does not function in an ambient mesh, in which encryption with mTLS is always enabled through the HBONE communication protocol.
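The "Potential" rate limit example in the preceding list can be illustrated with a toy simulation. This is a simplified model, not how the rate limit server is implemented: it assumes a single shared counter with a fixed quota per window, and that each enforcement point (waypoint, then sidecar) consumes one unit per request.

```python
class GlobalRateLimiter:
    """Toy shared counter standing in for a global rate limit server."""

    def __init__(self, quota: int) -> None:
        self.quota = quota
        self.used = 0

    def allow(self) -> bool:
        if self.used >= self.quota:
            return False
        self.used += 1
        return True


def admitted_requests(quota: int, enforcement_points: int, offered: int) -> int:
    """Count requests that pass every enforcement point within one window."""
    limiter = GlobalRateLimiter(quota)
    admitted = 0
    for _ in range(offered):
        # all() stops at the first rejection, like a waypoint denying the
        # request before it ever reaches the sidecar.
        if all(limiter.allow() for _ in range(enforcement_points)):
            admitted += 1
    return admitted


if __name__ == "__main__":
    print(admitted_requests(quota=100, enforcement_points=1, offered=200))
    print(admitted_requests(quota=100, enforcement_points=2, offered=200))
    print(admitted_requests(quota=200, enforcement_points=2, offered=200))
```

With one enforcement point, a quota of 100 admits 100 requests. With two, the same quota admits only 50, and doubling the quota to 200 restores the intended 100.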
Review the following table to understand the effects of double policy enforcement for each resource type.
If you plan to use the migration tool to move to an ambient mesh, note that the tool automatically detects any problems that double policy enforcement might cause, and makes suggestions for how to adjust the resources accordingly. This table is provided for your general awareness of the effects of policy enforcement at both the waypoint and sidecar, or to help you plan for more permanent hybrid mesh setups.
Configuration | Issue level | Effects |
---|---|---|
AuthorizationPolicy | None | Generally, double authorization policy application causes no issues. For specific changes you might need to make during migration from sidecar to ambient, see authorization policy migration. |
AuthorizationPolicy with a custom ExtensionProvider | Breaking | Only one extension provider is permitted per workload. In a sidecar setup, you might have multiple policies with different providers, that are individually applied to different workloads. However, when you apply an AuthorizationPolicy with a custom ExtensionProvider to a waypoint, you can define only one provider for all workloads that the waypoint serves. This can cause an issue in which only one provider, which is potentially the incorrect provider for a workload, is applied to traffic that is routed through the waypoint for that workload. |
Global rate limit | Potential | Because both the waypoint and sidecar send rate limit requests to the global rate limiting server, both affect its state. This can result in counting requests twice and allowing for only half of the effective rate limit quota. Update the quotas to account for double the number of requests. |
Local rate limit | None | Local rate limiting is evaluated independently by Envoy on both the waypoint and sidecar, so requests are not counted twice. |
PeerAuthentication with DISABLE | Breaking | In ambient mode, traffic between ztunnel node agents uses the HBONE protocol, which includes encryption with mTLS. Because of this, the DISABLE mTLS mode is not supported in ambient and cannot be applied in a PeerAuthentication policy on the waypoint. You can alternatively use STRICT mode to ensure that connections cannot bypass the mesh. |
RequestAuthentication with forwardOriginalToken | Breaking | If forwardOriginalToken is set to false (default setting), Envoy removes the token header after successful validation. This can cause requests that successfully authenticate at the waypoint to fail authentication at the sidecar, because the authentication token is removed before it is forwarded from the waypoint to the sidecar. Set forwardOriginalToken to true to ensure the token is forwarded. |
RequestAuthentication with outputClaimToHeaders | None | If outputClaimToHeaders is set, the header is overwritten if it already exists in the request. |
RequestAuthentication with outputPayloadToHeader | Potential | If outputPayloadToHeader is set, the payload is added twice, which results in a multi-value header. This can cause an issue if the upstream service that consumes the header cannot handle multi-value headers. |
Trace sampling percentage | None | The trace sampling rate set in the randomSamplingPercentage can increase because the sampling decision is made by both the waypoint and sidecar. If a request is sampled for tracing by the waypoint, the x-b3-sampled header is added, and the request is also sampled by the sidecar to emit a trace. If a request is not sampled by the waypoint, it might still be sampled at the sidecar according to the configured rate. |
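
The trace sampling row in the table can be worked through numerically. The sketch below assumes that the waypoint and sidecar make independent random decisions, and that a request sampled at the waypoint is always sampled at the sidecar, as the row describes. Under those assumptions, the effective rate is p_w + (1 - p_w) * p_s.

```python
def effective_sampling_percentage(waypoint_pct: float, sidecar_pct: float) -> float:
    """Chance (in percent) that a request is traced by the waypoint or sidecar."""
    for pct in (waypoint_pct, sidecar_pct):
        if not 0.0 <= pct <= 100.0:
            raise ValueError("sampling percentages must be between 0 and 100")
    p_w = waypoint_pct / 100.0
    p_s = sidecar_pct / 100.0
    # Sampled at the waypoint, or missed there and then sampled at the sidecar.
    # Assumption: the two sampling decisions are independent.
    return 100.0 * (p_w + (1.0 - p_w) * p_s)


if __name__ == "__main__":
    # Both proxies configured with the same randomSamplingPercentage of 10.
    print(round(effective_sampling_percentage(10.0, 10.0), 2))
```

With a `randomSamplingPercentage` of 10 on both proxies, roughly 19% of requests end up traced rather than 10%.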