OpenTelemetry tracing
Enable OpenTelemetry (OTel) tracing capabilities to obtain visibility and track requests as they pass through your API gateway to distributed backends.
OTel provides a standardized protocol for reporting traces, and a standardized collector through which to recieve metrics. Additionally, OTel supports exporting metrics to several types of distributed tracing platforms. For the full list of supported platforms, see the OTel GitHub respository.
To get started, deploy an OTel collector and agents to your Gloo Gateway cluster to trace requests, and modify your gateway proxy with the OTel tracing configuration. Then, use a tracing provider to collect and visualize the sampled spans.
The OTel integration is supported as a beta feature in Gloo Gateway 1.13.0 and later. This guide uses the Zipkin tracing platform as an example to show how to set up tracing with OTel in Gloo Gateway. To set up other tracing platforms, refer to the platform-specific documentation.
Before you begin: Create or update your Gloo Gateway installation to version 1.13.0 or later.
-
Download the otel-config.yaml file, which contains the configmaps, daemonset, deployment, and service for the OTel collector and agents. You can optionally check out the contents to see the OTel collector configuration.
- For example, in the
otel-collector-conf
configmap that begins on line 92, thedata.otel-agent-config.receivers
section enables gRPC and HTTP protocols for data collection. Thedata.otel-agent-config.exporters
section enables logging data to Zipkin for tracing and to the Gloo Gateway console for debugging. - In the
otel-collector
deployment, you can comment out the ports that begin on line 194 so that only the tracing platform you want to use is enabled, such as Zipkin for this guide. - For more information about this configuration, see the OTel documentation. For more information and examples about the exporters you can configure, see the OTel GitHub repo.
cd ~/Downloads open otel-config.yaml
- For example, in the
-
Install the OTel collector and agents into your cluster.
kubectl apply -n gloo-system -f otel-config.yaml
-
Verify that the OTel collector and agents are deployed in your cluster. Because the agents are deployed as a daemonset, the number of agent pods equals the number of worker nodes in your cluster.
kubectl get pods -n gloo-system
Example output:
NAME READY STATUS RESTARTS AGE discovery-db9fbdd-4wsg8 1/1 Running 0 3h gateway-proxy-b5995db59-bvl9d 1/1 Running 0 3h gloo-56c78bb857-6v7vz 1/1 Running 0 3h otel-agent-dpdmd 3/3 Running 0 35s otel-collector-64d8c966c5-ptpfn 1/1 Running 0 35s
-
Install Zipkin, which receives tracing data from the Zipkin exporter in your OTel setup.
kubectl -n gloo-system create deployment --image openzipkin/zipkin zipkin kubectl -n gloo-system expose deployments/zipkin --port 9411 --target-port 9411
-
Create the following Gloo Gateway
Upstream
,Gateway
, andVirtualService
custom resources.- The
Upstream
defines the OTel network address and port that Envoy reports data to. - The
Gateway
resource modifies your default HTTP gateway proxy with the OTel tracing configuration, which references the OTel upstream. - The
VirtualService
defines a direct response action so that requests to the/
path respond withhello world
for testing purposes.
kubectl apply -f- <<EOF apiVersion: gloo.solo.io/v1 kind: Upstream metadata: name: "opentelemetry-collector" namespace: gloo-system spec: # REQUIRED FOR OPENTELEMETRY COLLECTION useHttp2: true static: hosts: - addr: "otel-collector" port: 4317 --- apiVersion: gateway.solo.io/v1 kind: Gateway metadata: labels: app: gloo name: gateway-proxy namespace: gloo-system spec: bindAddress: '::' bindPort: 8080 httpGateway: options: httpConnectionManagerSettings: tracing: openTelemetryConfig: collectorUpstreamRef: namespace: "gloo-system" name: "opentelemetry-collector" --- apiVersion: gateway.solo.io/v1 kind: VirtualService metadata: name: default namespace: gloo-system spec: virtualHost: domains: - '*' routes: - matchers: - prefix: / directResponseAction: status: 200 body: 'hello world' EOF
- The
-
In three separate terminals, port-forward and view logs for the deployed services.
- Port-forward the gateway proxy on port 8080.
kubectl -n gloo-system port-forward deployments/gateway-proxy 8080
- Port-forward the Zipkin service on port 9411.
kubectl -n gloo-system port-forward deployments/zipkin 9411
- Open the logs for the OTel collector.
kubectl -n gloo-system logs deployments/otel-collector -f
- Port-forward the gateway proxy on port 8080.
-
In your original terminal, send a request to
http://localhost:8080
.curl http://localhost:8080
-
In the OTel collector logs, notice the trace for your request that was printed to the log, such as the following example.
2023-01-13T17:20:20.907Z info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "logging", "#spans": 1} 2023-01-13T17:20:20.907Z info ResourceSpans #0 Resource SchemaURL: ScopeSpans #0 ScopeSpans SchemaURL: InstrumentationScope Span #0 Trace ID : 64dbb2328b98cc8d74dcc9be575ff8cb Parent ID : ID : 53e50df3bd752d67 Name : ingress Kind : Server Start time : 2023-01-13 17:20:20.479033 +0000 UTC End time : 2023-01-13 17:20:20.479216 +0000 UTC Status code : Unset Status message : Attributes: -> node_id: Str(gateway-proxy-b5995db59-bvl9d.gloo-system) -> zone: Str() -> guid:x-request-id: Str(b843b297-848a-95f0-b824-099a521f5c84) -> http.url: Str(http://localhost:8080/) -> http.method: Str(GET) -> downstream_cluster: Str(-) -> user_agent: Str(curl/7.79.1) -> http.protocol: Str(HTTP/1.1) -> peer.address: Str(127.0.0.1) -> request_size: Str(0) -> response_size: Str(11) -> component: Str(proxy) -> http.status_code: Str(200) -> response_flags: Str(-) {"kind": "exporter", "data_type": "traces", "name": "logging"}
Note that the
Status code
in theSpan
of the above trace isUnset
. This is the recommended value in accordance with the OpenTelemetry semantic conventions, which state:Span Status MUST be left unset if HTTP status code was in the 1xx, 2xx or 3xx ranges, unless there was another error (e.g., network error receiving the response body; or 3xx codes with max redirects exceeded), in which case status MUST be set to Error. … For HTTP status codes in the 5xx range, as well as any other code the client failed to interpret, span status MUST be set to Error.
To observe this behavior, update the
status
that is returned by thedirectResponseAction
to500
. Subsequent requests that are sent to this endpoint receive aStatus code: Error
instead ofStatus code: Unset
. -
In the Zipkin web interface, click Run query to view traces for your requests, and click Show to review the details of the trace.