Collect compute instance metadata

Allow OTel collector agents to gather metadata about the compute instances that the workload cluster is deployed to, and add the metadata as labels on the metrics it scrapes. This compute instance metadata helps you better visualize your Gloo Mesh setup across your cloud provider infrastructure network. For example, if you deploy workload clusters across multiple cloud providers, or add a virtual machine to your Gloo Mesh setup, you can more easily see how your Gloo resources are deployed across your compute instances in the Gloo UI.
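For example, if the collector pipeline uses the OpenTelemetry resourcedetection processor's gcp detector (an assumption here; the exact label set depends on your Gloo Platform version), scraped metrics can carry GCP resource attributes such as the following. All values are placeholders.

```yaml
# Illustrative GCP resource attributes (OTel semantic conventions);
# all values below are placeholders, not output from a real cluster.
cloud.provider: gcp
cloud.platform: gcp_compute_engine
cloud.account.id: my-project
cloud.region: us-central1
cloud.availability_zone: us-central1-a
host.id: "1234567890123456789"
host.name: gke-workload-node-1
```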

Step 1: Enable infrastructure settings in your cloud provider

In GCP, enable Workload Identity and grant the OTel collector's service account permission to read compute instance metadata. No other cloud provider permissions are required.
  1. Enable Workload Identity for the workload cluster. Workload Identity allows the Kubernetes service account for the OTel collector to act as a GCP IAM service account, which you assign the necessary permissions to.
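As a quick sanity check, the Workload Identity pool name is always derived from the project ID. The following sketch uses a placeholder project ID; the commented gcloud command (with hypothetical CLUSTER and REGION values) shows how you could enable the pool on an existing cluster.

```shell
# The Workload Identity pool for a project is always <PROJECT_ID>.svc.id.goog.
PROJECT="${PROJECT:-my-project}"          # placeholder project ID
WORKLOAD_POOL="${PROJECT}.svc.id.goog"

# Hypothetical example of enabling the pool on an existing cluster:
#   gcloud container clusters update CLUSTER --region REGION \
#     --workload-pool "${WORKLOAD_POOL}"

echo "${WORKLOAD_POOL}"
```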

  2. Save your GCP project ID in an environment variable.

    export PROJECT=<gcp_project_id>
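If you are not sure of the project ID, you can look it up with gcloud; the lookup command is commented below because it requires an authenticated session, and my-project is a placeholder.

```shell
# Look up the active project ID (requires an authenticated gcloud session):
#   gcloud config get-value project
export PROJECT="${PROJECT:-my-project}"   # my-project is a placeholder
echo "Using project: ${PROJECT}"
```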
    
  3. Create an IAM service account in GCP for the OTel collector in the workload cluster, and grant IAM permissions so that the collector can access metadata about the compute instances that the workload cluster is deployed to.

    1. Create an IAM service account in GCP named otel-collector. Note that GCP service account IDs must be 6-30 characters of lowercase letters, digits, and hyphens.
      gcloud iam service-accounts create otel-collector --project $PROJECT
      
    2. Create an IAM role that gives the permission to describe the VM instances in your project.
      gcloud iam roles create OTelComputeViewer \
        --project $PROJECT \
        --title "OTel compute viewer" \
        --permissions compute.instances.get
      
    3. Grant the role to the OTel GCP IAM service account at the project level.
      gcloud projects add-iam-policy-binding $PROJECT \
        --role "projects/$PROJECT/roles/OTelComputeViewer" \
        --member "serviceAccount:otel-collector@$PROJECT.iam.gserviceaccount.com"
      
    4. Allow the Kubernetes service account for the OTel collector to impersonate the GCP IAM service account.
      gcloud iam service-accounts add-iam-policy-binding otel-collector@$PROJECT.iam.gserviceaccount.com \
        --project $PROJECT \
        --role "roles/iam.workloadIdentityUser" \
        --member "serviceAccount:$PROJECT.svc.id.goog[gloo-mesh/gloo-telemetry-collector]"
      
  4. Annotate the Kubernetes service account for the OTel collector with the GCP IAM service account that it impersonates.

    kubectl annotate serviceaccount gloo-telemetry-collector \
      --context $REMOTE_CONTEXT -n gloo-mesh \
      iam.gke.io/gcp-service-account=otel-collector@$PROJECT.iam.gserviceaccount.com
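The member string used in the IAM binding follows a fixed format: the project's Workload Identity pool, plus the namespace and Kubernetes service account in brackets. A minimal sketch, with placeholder project and GCP service account names:

```shell
# Construct the two principals that Workload Identity ties together.
PROJECT="${PROJECT:-my-project}"          # placeholder project ID
GSA_NAME="${GSA_NAME:-otel-collector}"    # placeholder for the GCP IAM service account you created
NAMESPACE="gloo-mesh"
KSA="gloo-telemetry-collector"

GSA="${GSA_NAME}@${PROJECT}.iam.gserviceaccount.com"
MEMBER="serviceAccount:${PROJECT}.svc.id.goog[${NAMESPACE}/${KSA}]"

echo "${GSA}"
echo "${MEMBER}"
```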
    
  5. Restart the OTel collector daemonset to apply the change.

    kubectl --context $REMOTE_CONTEXT rollout restart daemonset/gloo-telemetry-collector-agent -n gloo-mesh
    

Step 2: Enable metadata collection in the Gloo telemetry pipeline

  1. For a single-cluster setup, get your current installation Helm values, and save them in a file.

    helm get values gloo-platform -n gloo-mesh -o yaml > gloo-mesh-enterprise-single.yaml
    open gloo-mesh-enterprise-single.yaml
    
  2. Add the following Gloo telemetry collector agent settings.

    telemetryCollector:
      enabled: true
      resources:
        limits:
          cpu: 1000m
          memory: 2Gi
        requests:
          cpu: 500m
          memory: 1Gi
      config:
        exporters:
          otlp:
            endpoint: gloo-telemetry-gateway.gloo-mesh:4317
    telemetryCollectorCustomization:
      telemetry:
        logs:
          level: "debug"
      enableCloudMetadataProcessing: true
      pipelines:
        metrics/otlp_relay:
          enabled: true
    
  3. Upgrade your installation by using your updated values file.

    helm upgrade gloo-platform gloo-platform/gloo-platform \
      --namespace gloo-mesh \
      -f gloo-mesh-enterprise-single.yaml \
      --version $GLOO_VERSION
    
  4. Verify that your custom settings were added to the Gloo telemetry collector configmap.

    kubectl get configmap gloo-telemetry-collector-config -n gloo-mesh -o yaml
    
  5. Perform a rollout restart of the telemetry collector daemon set to force your configmap changes to be applied to the telemetry collector agent pod.

    kubectl rollout restart -n gloo-mesh daemonset/gloo-telemetry-collector-agent
    
  1. For a multicluster setup, get the Helm values file for your workload cluster.

    helm get values gloo-platform -n gloo-mesh -o yaml --kube-context $REMOTE_CONTEXT > agent.yaml
    open agent.yaml
    
  2. Add the following Gloo telemetry collector agent settings.

    telemetryCollector:
      enabled: true
      resources:
        limits:
          cpu: 1000m
          memory: 2Gi
        requests:
          cpu: 500m
          memory: 1Gi
      config:
        exporters:
          otlp:
            endpoint: ${TELEMETRY_GATEWAY_ADDRESS}
    telemetryCollectorCustomization:
      telemetry:
        logs:
          level: "debug"
      enableCloudMetadataProcessing: true
      pipelines:
        metrics/otlp_relay:
          enabled: true
    
  3. Upgrade the workload cluster.

    helm upgrade gloo-platform gloo-platform/gloo-platform \
      --kube-context $REMOTE_CONTEXT \
      --namespace gloo-mesh \
      -f agent.yaml \
      --version $GLOO_VERSION
    
  4. Verify that your settings are applied in the workload cluster.

    1. Verify that your custom settings were added to the Gloo telemetry collector configmap.

      kubectl get configmap gloo-telemetry-collector-config -n gloo-mesh -o yaml --context $REMOTE_CONTEXT
      
    2. Perform a rollout restart of the telemetry collector daemon set to force your configmap changes to be applied to the telemetry collector agent pods.

      kubectl rollout restart -n gloo-mesh daemonset/gloo-telemetry-collector-agent --context $REMOTE_CONTEXT
      

Step 3: Visualize your setup

Launch the Gloo UI to view compute metadata.

  1. Access the Gloo UI.
    meshctl dashboard --kubecontext $MGMT_CONTEXT
    
  2. Click the Graph tab to open the network visualization graph for your Gloo Mesh setup.
  3. From the footer toolbar, click Layout Settings.
  4. Toggle Group By to INFRA to review the clusters, virtual machines, and Kubernetes namespaces that your app nodes are organized in. This view also shows details for the cloud provider infrastructure, such as the VPCs and subnets that your resources are deployed to. You can see more compute network details by clicking on resource icons, which opens the resource's details pane.