Manage Istio certificates with Vault

Vault is a popular open source secret management tool, including for public key infrastructure (PKI). With Vault, you can securely store your private keys and create new intermediate and leaf certificates.

For multicluster traffic, you can establish a shared root of trust by using a single root CA, along with an intermediate CA for each cluster that is signed by that same root CA. This guide shows you how to configure Istio and Gloo Mesh to use Vault to store the root CA and to generate the intermediate CA that Istio uses on each cluster to sign its workload certificates, as shown in the following figure.

Figure: Using Gloo Mesh to configure Istio to use Vault for the intermediate CA across clusters.

In addition to using Vault for the intermediate CA, you can use Gloo Mesh Enterprise for added security benefits. The Gloo Mesh Enterprise integration with Vault uses the istiod-agent, which runs as a sidecar in the istiod pod and communicates with Vault to request private keys and sign certificates. In this setup, Gloo Mesh loads the private key directly into the pod filesystem, which adds a layer of security because the key is never saved to etcd or any other permanent storage. When the pod is deleted, the private key is deleted with it.
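
For example, after you complete the setup in this guide, you can see this design in the istiod pod spec: the cacerts volume is an in-memory emptyDir instead of a Kubernetes secret. The following check is a sketch, and assumes that your workload cluster's context is stored in ${REMOTE_CONTEXT1}.

# Print the cacerts volume from the patched istiod deployment.
# Expect an emptyDir with "medium":"Memory" rather than a secret reference.
kubectl --context="${REMOTE_CONTEXT1}" -n istio-system get deploy istiod \
  -o jsonpath='{.spec.template.spec.volumes[?(@.name=="cacerts")]}'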

Note that while this guide describes how to use Vault for the Istio intermediate CA, you can also use Vault to generate and manage the CAs for the Gloo Mesh Enterprise relay server and agent certificates. For example, you can follow the example in the Generating relay certificates guide to set up a relay intermediate CA.

Before you begin

To create a three-cluster setup, complete the Getting started tutorials.

Install Vault

If you already have Vault installed, you can use your existing deployment.
  1. If you have not done so already, add the HashiCorp Helm repository to your local Helm client.

    helm repo add hashicorp https://helm.releases.hashicorp.com
    
  2. Generate a root CA certificate and key for Vault.

    openssl req -new -newkey rsa:4096 -x509 -sha256 \
        -days 3650 -nodes -out root-cert.pem -keyout root-key.pem \
        -subj "/O=my-org"
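    # Optional sanity check: confirm the subject and 10-year validity window of the new root CA.
    openssl x509 -in root-cert.pem -noout -subject -dates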
    
  3. Install Vault and add the root CA to the Vault deployment in each cluster, such as with the following script.

    for cluster in ${REMOTE_CONTEXT1} ${REMOTE_CONTEXT2}; do
    
      # For more info about Vault in Kubernetes, see the Vault docs: https://learn.hashicorp.com/tutorials/vault/kubernetes-cert-manager
    
      # Install Vault in dev mode
      helm install -n vault vault hashicorp/vault --set "injector.enabled=false" --set "server.dev.enabled=true" --kube-context="${cluster}" --create-namespace
    
      # Wait for Vault to come up.
      # Don't use 'kubectl rollout' because Vault is a statefulset without a rolling deployment.
      kubectl --context="${cluster}" wait --for=condition=Ready -n vault pod/vault-0
    
      # Enable Vault auth for Kubernetes.
      kubectl --context="${cluster}" exec -n vault vault-0 -- /bin/sh -c 'vault auth enable kubernetes'
    
      # Set the Kubernetes Auth config for Vault to the mounted token.
      kubectl --context="${cluster}" exec -n vault vault-0 -- /bin/sh -c 'vault write auth/kubernetes/config \
        token_reviewer_jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
        kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443" \
        kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
    
      # Bind the istiod service account to the PKI policy.
      kubectl --context="${cluster}" exec -n vault vault-0 -- /bin/sh -c 'vault write auth/kubernetes/role/gen-int-ca-istio \
        bound_service_account_names=istiod-service-account \
        bound_service_account_namespaces=istio-system \
        policies=gen-int-ca-istio \
        ttl=2400h'
    
      # Initialize the Vault PKI.
      kubectl --context="${cluster}" exec -n vault vault-0 -- /bin/sh -c 'vault secrets enable pki'
    
      # Set the Vault CA to the pem_bundle.
      kubectl --context="${cluster}" exec -n vault vault-0 -- /bin/sh -c "vault write -format=json pki/config/ca pem_bundle=\"$(cat root-key.pem root-cert.pem)\""
    
      # Initialize the Vault intermediate cert path.
      kubectl --context="${cluster}" exec -n vault vault-0 -- /bin/sh -c 'vault secrets enable -path pki_int pki'
    
      # Set the policy for the intermediate cert path.
      kubectl --context="${cluster}" exec -n vault vault-0 -- /bin/sh -c 'vault policy write gen-int-ca-istio - <<EOF
    path "pki_int/*" {
    capabilities = ["create", "read", "update", "delete", "list"]
    }
    path "pki/cert/ca" {
    capabilities = ["read"]
    }
    path "pki/root/sign-intermediate" {
    capabilities = ["create", "read", "update", "list"]
    }
    EOF'
    
    done
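
To spot-check the Vault setup before you continue, you can list the enabled secrets engines and read back the root CA from the pki mount. This check is optional, and the exact output depends on your Vault version.

for cluster in ${REMOTE_CONTEXT1} ${REMOTE_CONTEXT2}; do
  # Expect both the 'pki/' and 'pki_int/' mounts in the list.
  kubectl --context="${cluster}" exec -n vault vault-0 -- /bin/sh -c 'vault secrets list'
  # Expect the root certificate that you generated with openssl.
  kubectl --context="${cluster}" exec -n vault vault-0 -- /bin/sh -c 'vault read pki/cert/ca'
done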
    

Now that Vault is set up in your clusters, you can use Vault as an intermediate CA provider.

Enable Vault as an intermediate CA provider

Now, federate identity across the two meshes by using Gloo Mesh Enterprise with Vault. Create or edit a RootTrustPolicy that includes the new Vault shared mTLS config.

This example includes the autoRestartPods: true setting. With this setting, Gloo Mesh restarts all of the Istio workloads in all of the clusters to speed up certificate rotation for the workloads. To avoid downtime, do NOT set this field to true in production environments.

The agentCa.vault section is the new config that the workload clusters use to authenticate and communicate with Vault. For more information, see the API docs.

cat << EOF | kubectl apply --context=${MGMT_CONTEXT} -f -
apiVersion: admin.gloo.solo.io/v2
kind: RootTrustPolicy
metadata:
  name: north-south-gw
  namespace: gloo-mesh
spec:
  config:
    agentCa:
      vault:
        caPath: pki/root/sign-intermediate
        csrPath: pki_int/intermediate/generate/exported
        kubernetesAuth:
          role: gen-int-ca-istio
        server: http://vault.vault:8200
    autoRestartPods: true
EOF
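
After you apply the policy, you can check that it was accepted and review the status that Gloo Mesh reports, such as with the following command.

kubectl --context=${MGMT_CONTEXT} -n gloo-mesh get roottrustpolicy north-south-gw -o yaml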

Update Gloo Mesh RBAC

The istiod-agent sidecar that you install in the next section must read and modify Gloo Mesh resources. To grant the necessary RBAC permissions, update the gloo-mesh-agent Helm release on both workload clusters. You can update the Helm release by adding to the YAML configuration file in your GitOps pipeline, or directly with the helm upgrade command.

If your Enterprise agents were installed via Helm and those manifests are applied with GitOps, you can add the following values to your values.yaml file.


istiodSidecar:
  createRoleBinding: true

Alternatively, update each Helm release directly, such as with the following script.

for cluster in ${REMOTE_CONTEXT1} ${REMOTE_CONTEXT2}; do
  helm get values -n gloo-mesh gloo-mesh-agent --kube-context="${cluster}" > $cluster-values.yaml
  echo "istiodSidecar:" >> $cluster-values.yaml
  echo "  createRoleBinding: true" >> $cluster-values.yaml
  helm upgrade -n gloo-mesh gloo-mesh-agent gloo-mesh-agent/gloo-mesh-agent --kube-context="${cluster}" --version=$GLOO_MESH_VERSION -f $cluster-values.yaml
  rm $cluster-values.yaml
done
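
To confirm that the RBAC update took effect, you can look for the cluster role binding that grants the istiod service account access to Gloo Mesh resources. The binding name varies by chart version, so this sketch uses a loose match.

for cluster in ${REMOTE_CONTEXT1} ${REMOTE_CONTEXT2}; do
  kubectl --context="${cluster}" get clusterrolebinding | grep -i istiod
done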

Modify istiod

Now that your root trust policy is set up to use Vault for the intermediate CA, modify your Istio installation to support fetching and dynamically reloading the intermediate CA from Vault.

  1. Get the Gloo Mesh version that runs in your management cluster.

    export MGMT_PLANE_VERSION=$(meshctl version | jq -r '.server[].components[] | select(.componentName == "gloo-mesh-mgmt-server") | .images[] | select(.name == "gloo-mesh-mgmt-server") | .version')
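    # Confirm that a plain version string was captured, such as 2.1.0.
    echo "${MGMT_PLANE_VERSION}"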
    
  2. Update the istiod deployment with the istiod-agent sidecar to load and store the Vault certificates. For most installations, you can use the IstioOperator configuration with istioctl, as in the first example. If you use local Kind clusters for quick testing, you can instead apply the manual JSON patch in the second example.

    
    # Set your workload cluster name and the Istio version to install.
    CLUSTER_NAME=cluster-1
    ISTIO_VERSION=1.13.4
    cat << EOF | istioctl manifest install -y -f -
    apiVersion: install.istio.io/v1alpha1
    kind: IstioOperator
    metadata:
      name: gloo-mesh-demo
      namespace: istio-system
    spec:
      # only the control plane components are installed (https://istio.io/latest/docs/setup/additional-setup/config-profiles/)
      profile: minimal
      # Solo.io Istio distribution repository
      hub: gcr.io/istio-enterprise
      # Solo.io Gloo Mesh Istio tag
      tag: ${ISTIO_VERSION}
    
      meshConfig:
        # enable access logging to standard output
        accessLogFile: /dev/stdout
    
        defaultConfig:
          # wait for the istio-proxy to start before application pods
          holdApplicationUntilProxyStarts: true
          # enable Gloo Mesh metrics service (required for Gloo Mesh UI)
          envoyMetricsService:
            address: gloo-mesh-agent.gloo-mesh:9977
          # enable Gloo Mesh access log service (required for Gloo Mesh access logging)
          envoyAccessLogService:
            address: gloo-mesh-agent.gloo-mesh:9977
          proxyMetadata:
            # Enable Istio agent to handle DNS requests for known hosts
            # Unknown hosts will automatically be resolved using upstream dns servers in resolv.conf
            # (for proxy-dns)
            ISTIO_META_DNS_CAPTURE: "true"
            # Enable automatic address allocation (for proxy-dns)
            ISTIO_META_DNS_AUTO_ALLOCATE: "true"
            # Used for gloo mesh metrics aggregation
            # should match trustDomain (required for Gloo Mesh UI)
            GLOO_MESH_CLUSTER_NAME: ${CLUSTER_NAME}
    
        # Set the default behavior of the sidecar for handling outbound traffic from the application.
        outboundTrafficPolicy:
          mode: ALLOW_ANY
        # The trust domain corresponds to the trust root of a system.
        # For Gloo Mesh, this should be the name of the cluster that corresponds with the CA certificate CommonName identity.
        trustDomain: ${CLUSTER_NAME}
    
      values:
        # https://istio.io/v1.5/docs/reference/config/installation-options/#global-options
        global:
          # needed for connecting VirtualMachines to the mesh
          network: ${CLUSTER_NAME}
          # needed for annotating istio metrics with cluster (should match trust domain and GLOO_MESH_CLUSTER_NAME)
          multiCluster:
            clusterName: ${CLUSTER_NAME}
    
      components:
        pilot:
          k8s:
            env:
              # Allow multiple trust domains (required for Gloo Mesh east/west routing)
              - name: PILOT_SKIP_VALIDATE_TRUST_DOMAIN
                value: "true"
            overlays:
            - apiVersion: apps/v1
              kind: Deployment
              name: istiod
              patches:
              # override istiod cacerts volume
              - path: spec.template.spec.volumes[name:cacerts]
                value: 
                  name: cacerts
                  secret: null
                  emptyDir:
                    medium: Memory
              # add the Solo.io istiod-agent build as a second container in the istiod deployment
              - path: spec.template.spec.containers[1]
                value: 
                  name: istiod-agent
                  image: gcr.io/gloo-mesh/istiod-agent:$MGMT_PLANE_VERSION
                  imagePullPolicy: IfNotPresent
                  volumeMounts:
                  - mountPath: /etc/cacerts
                    name: cacerts
                  args: 
                  - sidecar
                  env:
                  - name: PILOT_CERT_PROVIDER
                    value: istiod
                  - name: POD_NAME
                    valueFrom:
                      fieldRef:
                        apiVersion: v1
                        fieldPath: metadata.name
                  - name: POD_NAMESPACE
                    valueFrom:
                      fieldRef:
                        apiVersion: v1
                        fieldPath: metadata.namespace
                  - name: SERVICE_ACCOUNT
                    valueFrom:
                      fieldRef:
                        apiVersion: v1
                        fieldPath: spec.serviceAccountName
              # add the Solo.io istiod-agent-init build as an init container in the istiod deployment
              - path: spec.template.spec.initContainers
                value: 
                - name: istiod-agent-init
                  image: gcr.io/gloo-mesh/istiod-agent:$MGMT_PLANE_VERSION
                  imagePullPolicy: IfNotPresent
                  volumeMounts:
                  - mountPath: /etc/cacerts
                    name: cacerts
                  args: 
                  - init-container
                  env:
                  - name: PILOT_CERT_PROVIDER
                    value: istiod
                  - name: POD_NAME
                    valueFrom:
                      fieldRef:
                        apiVersion: v1
                        fieldPath: metadata.name
                  - name: POD_NAMESPACE
                    valueFrom:
                      fieldRef:
                        apiVersion: v1
                        fieldPath: metadata.namespace
                  - name: SERVICE_ACCOUNT
                    valueFrom:
                      fieldRef:
                        apiVersion: v1
                        fieldPath: spec.serviceAccountName
        # Istio Gateway feature
        ingressGateways:
        # enable the default ingress gateway
        - name: istio-ingressgateway
          enabled: true
          k8s:
            env:
              # Required by Gloo Mesh for east/west routing
              - name: ISTIO_META_ROUTER_MODE
                value: "sni-dnat"
            service:
              type: LoadBalancer
              ports:
                # health check port (required to be first for aws elbs)
                - name: status-port
                  port: 15021
                  targetPort: 15021
                # main http ingress port
                - port: 80
                  targetPort: 8080
                  name: http2
                # main https ingress port
                - port: 443
                  targetPort: 8443
                  name: https
                # Port for gloo-mesh multi-cluster mTLS passthrough (Required for Gloo Mesh east/west routing)
                - port: 15443
                  targetPort: 15443
                  # Gloo Mesh looks for this default name 'tls' on an ingress gateway
                  name: tls
    EOF
    
    
    # For local test clusters (for example, Kind), patch the istiod deployment directly.
    # Use a here-doc rather than single quotes so that $MGMT_PLANE_VERSION expands.
    kubectl patch -n istio-system deploy/istiod --patch "$(cat << EOF
    {
    	"spec": {
    			"template": {
    				"spec": {
    						"initContainers": [
    							{
    									"args": [
    										"init-container"
    									],
    									"env": [
    										{
    												"name": "PILOT_CERT_PROVIDER",
    												"value": "istiod"
    										},
    										{
    												"name": "POD_NAME",
    												"valueFrom": {
    													"fieldRef": {
    															"apiVersion": "v1",
    															"fieldPath": "metadata.name"
    													}
    												}
    										},
    										{
    												"name": "POD_NAMESPACE",
    												"valueFrom": {
    													"fieldRef": {
    															"apiVersion": "v1",
    															"fieldPath": "metadata.namespace"
    													}
    												}
    										},
    										{
    												"name": "SERVICE_ACCOUNT",
    												"valueFrom": {
    													"fieldRef": {
    															"apiVersion": "v1",
    															"fieldPath": "spec.serviceAccountName"
    													}
    												}
    										}
    									],
    									"volumeMounts": [
    										{
    												"mountPath": "/etc/cacerts",
    												"name": "cacerts"
    										}
    									],
    									"imagePullPolicy": "IfNotPresent",
    									"image": "gcr.io/gloo-mesh-istiod-agent:$MGMT_PLANE_VERSION",
    									"name": "istiod-agent-init"
    							}
    						],
    						"containers": [
    							{
    									"args": [
    										"sidecar"
    									],
    									"env": [
    										{
    												"name": "PILOT_CERT_PROVIDER",
    												"value": "istiod"
    										},
    										{
    												"name": "POD_NAME",
    												"valueFrom": {
    													"fieldRef": {
    															"apiVersion": "v1",
    															"fieldPath": "metadata.name"
    													}
    												}
    										},
    										{
    												"name": "POD_NAMESPACE",
    												"valueFrom": {
    													"fieldRef": {
    															"apiVersion": "v1",
    															"fieldPath": "metadata.namespace"
    													}
    												}
    										},
    										{
    												"name": "SERVICE_ACCOUNT",
    												"valueFrom": {
    													"fieldRef": {
    															"apiVersion": "v1",
    															"fieldPath": "spec.serviceAccountName"
    													}
    												}
    										}
    									],
    									"volumeMounts": [
    										{
    												"mountPath": "/etc/cacerts",
    												"name": "cacerts"
    										}
    									],
    									"imagePullPolicy": "IfNotPresent",
    									"image": "gcr.io/gloo-mesh-istiod-agent:$MGMT_PLANE_VERSION",
    									"name": "istiod-agent"
    							}
    						],
    						"volumes": [
    							{
    									"name": "cacerts",
    									"secret": null,
    									"emptyDir": {
    										"medium": "Memory"
    									}
    							}
    						]
    				}
    			}
    	}
    }
    EOF
    )"
    
  3. Repeat the previous step for each workload cluster where Istio is installed. To confirm the result, you can run the check shown after this step.
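
To confirm the modification on each cluster, you can check that the istiod deployment now includes the istiod-agent sidecar and init container, such as with the following sketch.

for cluster in ${REMOTE_CONTEXT1} ${REMOTE_CONTEXT2}; do
  # Expect 'discovery istiod-agent' on the first line and 'istiod-agent-init' on the second.
  kubectl --context="${cluster}" -n istio-system get deploy istiod \
    -o jsonpath='{.spec.template.spec.containers[*].name}{"\n"}{.spec.template.spec.initContainers[*].name}{"\n"}'
done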

Verify traffic uses the root CA

Now that Istio is patched with the Gloo Mesh istiod-agent sidecar, you can verify that all of the service mesh traffic is secured by using the root CA that you generated for Vault in the previous steps.

One way is to check the root-cert.pem value in the istio-ca-root-cert config map, which Istio propagates to workload namespaces for the initial TLS connection. The following example checks the propagated root-cert.pem against the local certificate that you supplied to Vault in a previous step. Note: If you use an existing Vault deployment, make sure to save the Vault root certificate as a root-cert.pem file in your current directory before running the command.

If installed correctly, the output from the following command is empty.

for cluster in ${REMOTE_CONTEXT1} ${REMOTE_CONTEXT2}; do
  # The istio-ca-root-cert config map is created in each namespace with Istio-managed workloads;
  # this example checks the bookinfo namespace.
  kubectl --context="${cluster}" get cm -n bookinfo istio-ca-root-cert -o json | jq -r '.data["root-cert.pem"]' | diff root-cert.pem -
done
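
You can also inspect the root certificate that a workload sidecar actually trusts. The following sketch assumes that the bookinfo sample app is deployed, and uses istioctl to read the ROOTCA secret from the productpage proxy.

POD=$(kubectl --context="${REMOTE_CONTEXT1}" -n bookinfo get pod -l app=productpage -o jsonpath='{.items[0].metadata.name}')
istioctl --context="${REMOTE_CONTEXT1}" proxy-config secret "${POD}" -n bookinfo -o json | \
  jq -r '.dynamicActiveSecrets[] | select(.name=="ROOTCA") | .secret.validationContext.trustedCa.inlineBytes' | \
  base64 -d | openssl x509 -noout -subject -issuer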

Rotate certificates for Istio workloads

When certificates are issued, pods that are managed by Istio must be restarted to ensure they pick up the new certificates. The certificate issuer creates a PodBounceDirective, which contains the namespaces and labels of the pods that must be restarted. For more information about how certificate rotation works in Istio, review the video series in this blog post.
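
To see the directives that the issuer created, you can list them in a workload cluster. This is a hedged example; PodBounceDirective is an internal resource, and whether it is registered and which API group it uses can differ by Gloo Mesh version.

kubectl --context=${REMOTE_CONTEXT1} get podbouncedirectives -A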

Note: To avoid potential downtime for your apps in production, disable the PodBounceDirective feature. Then, control pod restarts in another way, such as a rolling update.

  1. Get your root trust policies.

    kubectl get roottrustpolicy --context ${MGMT_CONTEXT} -A
    
  2. In the root trust policy, set the autoRestartPods field to false.

    kubectl edit roottrustpolicy --context ${MGMT_CONTEXT} -n NAMESPACE ROOT_TRUST_POLICY
    
    apiVersion: admin.gloo.solo.io/v2
    kind: RootTrustPolicy
    metadata:
      name: north-south-gw
      namespace: gloo-mesh
    spec:
      config:
        autoRestartPods: false
        ...
    
  3. To ensure pods pick up the new certificates, restart the istiod pod in each remote cluster.

    kubectl --context ${REMOTE_CONTEXT} -n istio-system patch deployment istiod \
        -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"`date +'%s'`\"}}}}}"
    
  4. Restart your app pods that are managed by Istio, such as by using a rolling update strategy, as in the following example.
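
    # A minimal sketch, assuming your Istio-managed apps run in the bookinfo namespace.
    # Trigger a rolling restart so that new pods mount the rotated certificates.
    for cluster in ${REMOTE_CONTEXT1} ${REMOTE_CONTEXT2}; do
      kubectl --context="${cluster}" -n bookinfo rollout restart deployment
    done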