OpenTelemetry tracing
Enable OpenTelemetry (OTel) tracing capabilities to obtain visibility and track requests as they pass through your API gateway to distributed backends.
OTel provides a standardized protocol for reporting traces, and a standardized collector through which to recieve metrics. Additionally, OTel supports exporting metrics to several types of distributed tracing platforms. For the full list of supported platforms, see the OTel GitHub respository.
Set up OpenTelemetry tracing
To get started, deploy an OTel collector and agents to your Gloo Gateway cluster to trace requests, and modify your gateway proxy with the OTel tracing configuration. Then, use a tracing provider to collect and visualize the sampled spans.
The OTel integration is supported as a beta feature in Gloo Gateway 1.13.0 and later. This guide uses the Zipkin tracing platform as an example to show how to set up tracing with OTel in Gloo Gateway. To set up other tracing platforms, refer to the platform-specific documentation.
Before you begin: Create or update your Gloo Gateway installation to version 1.13.0 or later.
-
Download the otel-config.yaml file, which contains the configmaps, daemonset, deployment, and service for the OTel collector and agents. You can optionally check out the contents to see the OTel collector configuration.
- For example, in the
otel-collector-conf
configmap that begins on line 92, thedata.otel-agent-config.receivers
section enables gRPC and HTTP protocols for data collection. Thedata.otel-agent-config.exporters
section enables logging data to Zipkin for tracing and to the Gloo Gateway console for debugging. - In the
otel-collector
deployment, you can comment out the ports that begin on line 194 so that only the tracing platform you want to use is enabled, such as Zipkin for this guide. - For more information about this configuration, see the OTel documentation. For more information and examples about the exporters you can configure, see the OTel GitHub repo.
cd ~/Downloads open otel-config.yaml
- For example, in the
-
Install the OTel collector and agents into your cluster.
kubectl apply -n gloo-system -f otel-config.yaml
-
Verify that the OTel collector and agents are deployed in your cluster. Because the agents are deployed as a daemonset, the number of agent pods equals the number of worker nodes in your cluster.
kubectl get pods -n gloo-system
Example output:
NAME READY STATUS RESTARTS AGE discovery-db9fbdd-4wsg8 1/1 Running 0 3h gateway-proxy-b5995db59-bvl9d 1/1 Running 0 3h gloo-56c78bb857-6v7vz 1/1 Running 0 3h otel-agent-dpdmd 3/3 Running 0 35s otel-collector-64d8c966c5-ptpfn 1/1 Running 0 35s
-
Install Zipkin, which receives tracing data from the Zipkin exporter in your OTel setup.
kubectl -n gloo-system create deployment --image openzipkin/zipkin zipkin kubectl -n gloo-system expose deployments/zipkin --port 9411 --target-port 9411
-
Create the following Gloo Gateway
Upstream
,Gateway
, andVirtualService
custom resources.- The
Upstream
defines the OTel network address and port that Envoy reports data to. - The
Gateway
resource modifies your default HTTP gateway proxy with the OTel tracing configuration, which references the OTel upstream. - The
VirtualService
defines a direct response action so that requests to the/
path respond withhello world
for testing purposes.
kubectl apply -f- <<EOF apiVersion: gloo.solo.io/v1 kind: Upstream metadata: name: "opentelemetry-collector" namespace: gloo-system spec: # REQUIRED FOR OPENTELEMETRY COLLECTION useHttp2: true static: hosts: - addr: "otel-collector" port: 4317 --- apiVersion: gateway.solo.io/v1 kind: Gateway metadata: labels: app: gloo name: gateway-proxy namespace: gloo-system spec: bindAddress: '::' bindPort: 8080 httpGateway: options: httpConnectionManagerSettings: tracing: openTelemetryConfig: collectorUpstreamRef: namespace: "gloo-system" name: "opentelemetry-collector" --- apiVersion: gateway.solo.io/v1 kind: VirtualService metadata: name: default namespace: gloo-system spec: virtualHost: domains: - '*' routes: - matchers: - prefix: / directResponseAction: status: 200 body: 'hello world' EOF
- The
-
In three separate terminals, port-forward and view logs for the deployed services.
- Port-forward the gateway proxy on port 8080.
kubectl -n gloo-system port-forward deployments/gateway-proxy 8080
- Port-forward the Zipkin service on port 9411.
kubectl -n gloo-system port-forward deployments/zipkin 9411
- Open the logs for the OTel collector.
kubectl -n gloo-system logs deployments/otel-collector -f
- Port-forward the gateway proxy on port 8080.
-
In your original terminal, send a request to
http://localhost:8080
.curl http://localhost:8080
-
In the OTel collector logs, notice the trace for your request that was printed to the log, such as the following example.
2023-01-13T17:20:20.907Z info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "logging", "#spans": 1} 2023-01-13T17:20:20.907Z info ResourceSpans #0 Resource SchemaURL: ScopeSpans #0 ScopeSpans SchemaURL: InstrumentationScope Span #0 Trace ID : 64dbb2328b98cc8d74dcc9be575ff8cb Parent ID : ID : 53e50df3bd752d67 Name : ingress Kind : Server Start time : 2023-01-13 17:20:20.479033 +0000 UTC End time : 2023-01-13 17:20:20.479216 +0000 UTC Status code : Unset Status message : Attributes: -> node_id: Str(gateway-proxy-b5995db59-bvl9d.gloo-system) -> zone: Str() -> guid:x-request-id: Str(b843b297-848a-95f0-b824-099a521f5c84) -> http.url: Str(http://localhost:8080/) -> http.method: Str(GET) -> downstream_cluster: Str(-) -> user_agent: Str(curl/7.79.1) -> http.protocol: Str(HTTP/1.1) -> peer.address: Str(127.0.0.1) -> request_size: Str(0) -> response_size: Str(11) -> component: Str(proxy) -> http.status_code: Str(200) -> response_flags: Str(-) {"kind": "exporter", "data_type": "traces", "name": "logging"}
Note that the
Status code
in theSpan
of the above trace isUnset
. This is the recommended value in accordance with the OpenTelemetry semantic conventions, which state:Span Status MUST be left unset if HTTP status code was in the 1xx, 2xx or 3xx ranges, unless there was another error (e.g., network error receiving the response body; or 3xx codes with max redirects exceeded), in which case status MUST be set to Error. … For HTTP status codes in the 5xx range, as well as any other code the client failed to interpret, span status MUST be set to Error.
To observe this behavior, update the
status
that is returned by thedirectResponseAction
to500
. Subsequent requests that are sent to this endpoint receive aStatus code: Error
instead ofStatus code: Unset
. -
In the Zipkin web interface, click Run query to view traces for your requests, and click Show to review the details of the trace.
Customize the span name
Gloo supports changing the default span name by using the transformation filter. The following steps show an example name change.
-
Change the span name by modifying your Virtual Service. The following sample configuration in a Virtual Service changes the name to the host header in the
text: '{{header("Host")}}'
field. Note because thespanTransformer.name
field is an Inja template, you can use header values and any other macro logic that is supported in transformations. For more information, see the transformation documentation.apiVersion: gateway.solo.io/v1 kind: VirtualService metadata: name: default namespace: gloo-system spec: virtualHost: domains: - '*' routes: - matchers: - prefix: / directResponseAction: status: 200 body: 'hello world' options: stagedTransformations: regular: requestTransforms: - requestTransformation: transformationTemplate: spanTransformer: name: text: '{{header("Host")}}'
-
After you apply the updated Virtual Service, you can verify that the span name in the OpenTelemetry collector now is set to the value of the
Host
header by sending a curl request.curl -H "Host: my-hostname" http://localhost:8080
In the output, verify that the following span appears:
2024-11-04T20:18:08.656Z info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "logging", "#spans": 1} 2024-11-04T20:18:08.656Z info ResourceSpans #0 Resource SchemaURL: Resource attributes: -> service.name: Str(gateway-proxy) ScopeSpans #0 ScopeSpans SchemaURL: InstrumentationScope Span #0 Trace ID : e42f4fa3ba873eefeb26b3b16112dc18 Parent ID : ID : 21547202617791ee Name : my-hostname Kind : Server Start time : 2024-11-04 20:18:06.697813 +0000 UTC End time : 2024-11-04 20:18:06.699024 +0000 UTC Status code : Unset Status message : Attributes: -> node_id: Str(gateway-proxy-577544cdcd-c6rvm.gloo-system) -> zone: Str() -> guid:x-request-id: Str(d1c5b217-e87b-95ae-b449-56c40eaa879c) -> http.url: Str(http://echo-server/) -> http.method: Str(GET) -> downstream_cluster: Str(-) -> user_agent: Str(curl/7.88.1) -> http.protocol: Str(HTTP/1.1) -> peer.address: Str(127.0.0.1) -> request_size: Str(0) -> response_size: Str(313) -> component: Str(proxy) -> upstream_cluster: Str(echo-server_gloo-system) -> upstream_cluster.name: Str(echo-server_gloo-system) -> http.status_code: Str(200) -> response_flags: Str(-) {"kind": "exporter", "data_type": "traces", "name": "logging"}
Note that in this example, the span name was modified at the level of the virtual host, so all routes under the virtual host will exhibit the same span naming pattern. However, it is also possible to override this logic on a per-route basis using the routeDescriptor
field:
routes:
- matchers:
- prefix: /route2
options:
autoHostRewrite: true
tracing:
routeDescriptor: CUSTOM_ROUTE_DESCRIPTOR
routeAction:
single:
upstream:
name: echo-server
namespace: gloo-system
When this configuration is applied, requests for the /route2
endpoint will result in spans being reported to the OpenTelemetry collector with the span name set to "CUSTOM_ROUTE_DESCRIPTOR"
. However, it is also important to note that routeDescriptor
can only set a static override value and does not support Inja transformation templates.