Manage Istio certificates with Vault

Vault is a popular open source secrets management tool that also supports public key infrastructure (PKI). With Vault, you can securely store your private keys, and create new intermediate and leaf certificates.

For multicluster traffic, you can establish a shared root of trust by using a single root CA, with an intermediate CA for each cluster that is signed by that same root CA. This guide shows you how to configure Istio and Gloo Mesh to use Vault to store the root CA and to generate the intermediate CA that Istio uses on each cluster to sign its workload certificates, as shown in the following figure.

Figure: Using Gloo Mesh to configure Istio to use Vault for the intermediate CA across clusters.
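
To illustrate the shared root of trust: intermediate certificates that are signed by the same root all verify against that single root certificate. In the following sketch, int-cert-cluster1.pem and int-cert-cluster2.pem are hypothetical placeholders for per-cluster intermediate certificates, and root-cert.pem is the root certificate that you generate later in this guide.

# Hypothetical example: both per-cluster intermediates chain to the same root,
# so each one verifies against the single root certificate.
openssl verify -CAfile root-cert.pem int-cert-cluster1.pem
openssl verify -CAfile root-cert.pem int-cert-cluster2.pem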


In addition to using Vault for the intermediate CA, you can use Gloo Mesh Enterprise to get added security benefits. The Gloo Mesh Enterprise integration with Vault uses the istiod-agent, which runs as a sidecar in the istiod pod and communicates with Vault to request private keys and to sign certificates. In this setup, Gloo Mesh loads the private key directly into the pod filesystem, which adds a layer of security because the key is never saved to etcd or any permanent storage. When the pod is deleted, the private key is also deleted.

Note that while this guide describes how to use Vault for the Istio intermediate CA, you can also use Vault to generate and manage the CAs for the Gloo Mesh Enterprise relay server and agent certificates. For example, you can follow the example in the Generating relay certificates guide to set up a relay intermediate CA.

Before you begin

  1. To try out the steps in this guide, create a management and workload cluster setup with Istio and Bookinfo installed. For example steps, see the Getting Started or demo setup.
  2. The default openssl binary that is included in macOS is LibreSSL, which does not work with these instructions.

    Make sure that you have the OpenSSL distribution of openssl, at version 1.1 or later, and not LibreSSL.

    1. Check the openssl version that is installed. If you see LibreSSL in the output, continue to the next step.
      openssl version
      
    2. Install the OpenSSL version (not LibreSSL). For example, you might use Homebrew.
      brew install openssl
      
    3. Review the output of the OpenSSL installation for the path of the binary file. You can add the binary's directory to your PATH, or call the full path whenever the following steps use an openssl command, as shown in the sketch after these bullets.
      • For example, openssl might be installed along the following path: /usr/local/opt/openssl@3/bin/
      • To run commands with this installed version of OpenSSL instead of the default LibreSSL, prefix the command with the full path: /usr/local/opt/openssl@3/bin/openssl req -new -newkey rsa:4096 -x509 -sha256 -days 3650...
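      The following sketch shows one way to update your PATH for the current shell session instead. The path is the default Homebrew location on Intel macOS; on Apple silicon, Homebrew typically installs under /opt/homebrew/opt/openssl@3/bin.

        # Put the installed OpenSSL directory first on the PATH for this session.
        export PATH="/usr/local/opt/openssl@3/bin:$PATH"
        # Confirm that openssl now reports OpenSSL, not LibreSSL.
        openssl version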
  3. Save the kubeconfig contexts for your clusters. Run kubectl config get-contexts, look for your cluster in the CLUSTER column, and get the context name in the NAME column.
    export MGMT_CLUSTER=<mgmt-cluster-name>
    export REMOTE_CLUSTER=<remote-cluster-name>
    export MGMT_CONTEXT=<management-cluster-context>
    export REMOTE_CONTEXT=<remote-cluster-context>
    

Install Vault

If you already have Vault installed, you can use your existing deployment.
  1. If not added already, add the HashiCorp Helm repository to your Helm client.

    helm repo add hashicorp https://helm.releases.hashicorp.com --kube-context ${MGMT_CONTEXT}
    
  2. Generate a root CA certificate and key for Vault.

    openssl req -new -newkey rsa:4096 -x509 -sha256 \
        -days 3650 -nodes -out root-cert.pem -keyout root-key.pem \
        -subj "/O=my-org"
    
  3. Install Vault and add the root CA to the Vault deployment, such as with the following script. For more information about setting up Vault in Kubernetes, see the Vault docs. Make sure to replace the following variables with the values that you previously retrieved.

    # Install Vault in dev mode
    helm install -n vault vault hashicorp/vault --version=0.20.1 --set "injector.enabled=false" --set "server.dev.enabled=true" --kube-context="${REMOTE_CONTEXT}" --create-namespace
       
    # Wait for Vault to come up.
    # Don't use 'kubectl rollout' because Vault is a statefulset without a rolling deployment.
    kubectl --context="${REMOTE_CONTEXT}" wait --for=condition=Ready -n vault pod/vault-0
       
    # Enable Vault auth for Kubernetes.
    kubectl --context="${REMOTE_CONTEXT}" exec -n vault vault-0 -- /bin/sh -c 'vault auth enable kubernetes'
       
    # Set the Kubernetes Auth config for Vault to the mounted token.
    kubectl --context="${REMOTE_CONTEXT}" exec -n vault vault-0 -- /bin/sh -c 'vault write auth/kubernetes/config \
     token_reviewer_jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
     kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443" \
     kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
       
    # Bind the istiod service account to the PKI policy.
    kubectl --context="${REMOTE_CONTEXT}" exec -n vault vault-0 -- /bin/sh -c 'vault write \
     auth/kubernetes/role/gen-int-ca-istio \
     bound_service_account_names=istiod-service-account \
     bound_service_account_namespaces=istio-system \
     policies=gen-int-ca-istio \
     ttl=2400h'
       
    # Initialize the Vault PKI.
    kubectl --context="${REMOTE_CONTEXT}" exec -n vault vault-0 -- /bin/sh -c 'vault secrets enable pki'
       
    # Set the Vault CA to the pem_bundle.
    kubectl --context="${REMOTE_CONTEXT}" exec -n vault vault-0 -- /bin/sh -c "vault write -format=json pki/config/ca pem_bundle=\"$(cat root-key.pem root-cert.pem)\""
       
    # Initialize the Vault intermediate cert path.
    kubectl --context="${REMOTE_CONTEXT}" exec -n vault vault-0 -- /bin/sh -c 'vault secrets enable -path pki_int pki'
       
    # Set the policy for the intermediate cert path.
    kubectl --context="${REMOTE_CONTEXT}" exec -n vault vault-0 -- /bin/sh -c 'vault policy write gen-int-ca-istio - <<EOF
    path "pki_int/*" {
    capabilities = ["create", "read", "update", "delete", "list"]
    }
    path "pki/cert/ca" {
    capabilities = ["read"]
    }
    path "pki/root/sign-intermediate" {
    capabilities = ["create", "read", "update", "list"]
    }
    EOF'
    

    Example output: If you see any errors, review the troubleshooting section.

    Your release is named vault. To learn more about the release, try:
     
    $ helm status vault
    $ helm get manifest vault
    pod/vault-0 condition met
    Success! Enabled kubernetes auth method at: kubernetes/
    Success! Data written to: auth/kubernetes/config
    Success! Data written to: auth/kubernetes/role/gen-int-ca-istio
    Success! Enabled the pki secrets engine at: pki/
    Success! Enabled the pki secrets engine at: pki_int/
    Success! Uploaded policy: gen-int-ca-istio
    
  4. Repeat the previous step in each workload cluster.

Now that Vault is set up in your clusters, you can use Vault as an intermediate CA provider.
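Optionally, you can spot-check the setup by reading the root CA back from Vault. This is just a sanity check; the pki/cert/ca path is the same path that the gen-int-ca-istio policy grants read access to.

# Read back the root CA that you configured in Vault.
kubectl --context="${REMOTE_CONTEXT}" exec -n vault vault-0 -- /bin/sh -c 'vault read pki/cert/ca'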

Update Gloo Mesh RBAC

The istiod-agent sidecar in each cluster needs to read and modify Gloo Mesh resources. To enable the necessary RBAC permissions, update the gloo-mesh-agent Helm release with the following values, either by adding the snippet to the YAML configuration file in your GitOps pipeline or directly with the helm upgrade command as shown in the following steps.

istiodSidecar:
  createRoleBinding: true
  1. Set the Gloo Mesh Enterprise version as an environment variable.
    export GLOO_MESH_VERSION=2.1.0-beta8
    
  2. Make sure that you have the Helm repo for the Gloo Mesh agent.
    helm repo add gloo-mesh-agent https://storage.googleapis.com/gloo-mesh-enterprise/gloo-mesh-agent --kube-context ${REMOTE_CONTEXT}
    helm repo update --kube-context ${REMOTE_CONTEXT}
    
  3. Upgrade the Helm chart with the required RBAC permission.
    helm get values -n gloo-mesh gloo-mesh-agent --kube-context=${REMOTE_CONTEXT} > ${REMOTE_CLUSTER}-values.yaml
    echo "istiodSidecar:" >> ${REMOTE_CLUSTER}-values.yaml
    echo "  createRoleBinding: true" >> ${REMOTE_CLUSTER}-values.yaml
    helm upgrade -n gloo-mesh gloo-mesh-agent gloo-mesh-agent/gloo-mesh-agent --kube-context="${REMOTE_CONTEXT}" --version=$GLOO_MESH_VERSION -f ${REMOTE_CLUSTER}-values.yaml
    rm ${REMOTE_CLUSTER}-values.yaml
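
To confirm that the new value is set on the release, you can read the values back, such as with the following command.

    helm get values -n gloo-mesh gloo-mesh-agent --kube-context=${REMOTE_CONTEXT}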
    

Modify istiod

So far, you set up the Gloo Mesh agent on each cluster to use Vault for the intermediate CA. Now, you can modify your Istio installation to support fetching and dynamically reloading the intermediate CA from Vault.

  1. Get the Gloo Mesh version that runs in your management cluster.

    export MGMT_PLANE_VERSION=$(meshctl version --kubecontext ${MGMT_CONTEXT} | jq -r '.server[].components[] | select(.componentName == "gloo-mesh-mgmt-server") | .images[] | select(.name == "gloo-mesh-mgmt-server") | .version')
    echo $MGMT_PLANE_VERSION
    
  2. Get your istiod deployment. Choose from the following options.

    1. If you did not deploy Istio yet, you can use an example Istio operator deployment as described in Deploy Istio in production.
    2. If you already deployed Istio, get your current deployment configuration. For example, you might check your GitOps configuration or run the following command to review the Istio operator configuration.
      kubectl get istiooperator -n istio-system --context $REMOTE_CONTEXT -o yaml > istio-operator.yaml
      
  3. Update the istiod deployment with the gloo-mesh-istiod-agent sidecar to load and store the Vault certificates. For most installations, you use an Istio operator to manage the istiod deployment. You can add an overlay section to the Istio operator configuration. If you did not use an Istio operator to manage istiod, such as for quick testing in local Kind clusters, you can patch the deployment.

    1. In the spec.components.pilot.k8s section of your Istio operator configuration file, add the following overlay. Replace $MGMT_PLANE_VERSION with the version that you got in the previous step.
      apiVersion: install.istio.io/v1alpha1
      kind: IstioOperator
      metadata:
        name: production-istio
        namespace: istio-system
      spec:
        components:
          pilot:
            k8s:
              overlays:
              - apiVersion: apps/v1
                kind: Deployment
                name: istiod
                patches:
                # override istiod cacerts volume
                - path: spec.template.spec.volumes[name:cacerts]
                  value: 
                    name: cacerts
                    secret: null
                    emptyDir:
                      medium: Memory
                # override istiod istiod-agent container to use Solo.io istiod-agent build
                - path: spec.template.spec.containers[1]
                  value: 
                    name: istiod-agent
                    image: gcr.io/gloo-mesh/gloo-mesh-istiod-agent:$MGMT_PLANE_VERSION
                    imagePullPolicy: IfNotPresent
                    volumeMounts:
                    - mountPath: /etc/cacerts
                      name: cacerts
                    args: 
                    - sidecar
                    env:
                    - name: PILOT_CERT_PROVIDER
                      value: istiod
                    - name: POD_NAME
                      valueFrom:
                        fieldRef:
                          apiVersion: v1
                          fieldPath: metadata.name
                    - name: POD_NAMESPACE
                      valueFrom:
                        fieldRef:
                          apiVersion: v1
                          fieldPath: metadata.namespace
                    - name: SERVICE_ACCOUNT
                      valueFrom:
                        fieldRef:
                          apiVersion: v1
                          fieldPath: spec.serviceAccountName
                # override istiod istiod-agent-init init-container to use Solo.io istiod-agent-init build
                - path: spec.template.spec.initContainers
                  value: 
                  - name: istiod-agent-init
                    image: gcr.io/gloo-mesh/gloo-mesh-istiod-agent:$MGMT_PLANE_VERSION
                    imagePullPolicy: IfNotPresent
                    volumeMounts:
                    - mountPath: /etc/cacerts
                      name: cacerts
                    args: 
                    - init-container
                    env:
                    - name: PILOT_CERT_PROVIDER
                      value: istiod
                    - name: POD_NAME
                      valueFrom:
                        fieldRef:
                          apiVersion: v1
                          fieldPath: metadata.name
                    - name: POD_NAMESPACE
                      valueFrom:
                        fieldRef:
                          apiVersion: v1
                          fieldPath: metadata.namespace
                    - name: SERVICE_ACCOUNT
                      valueFrom:
                        fieldRef:
                          apiVersion: v1
                          fieldPath: spec.serviceAccountName 
      
    2. Apply the updated Istio operator configuration to your cluster. Your GitOps workflow might apply changes automatically.
      kubectl apply --context ${REMOTE_CONTEXT} -f istio-operator.yaml
      
    
    If you did not use an Istio operator to manage istiod, you can instead patch the istiod deployment directly, such as with the following command. Because the patch is wrapped in single quotes, the shell does not expand $MGMT_PLANE_VERSION; replace it manually with the version that you got in step 1.

    kubectl patch --context ${REMOTE_CONTEXT} -n istio-system deploy/istiod --patch '{
    	"spec": {
    			"template": {
    				"spec": {
    						"initContainers": [
    							{
    									"args": [
    										"init-container"
    									],
    									"env": [
    										{
    												"name": "PILOT_CERT_PROVIDER",
    												"value": "istiod"
    										},
    										{
    												"name": "POD_NAME",
    												"valueFrom": {
    													"fieldRef": {
    															"apiVersion": "v1",
    															"fieldPath": "metadata.name"
    													}
    												}
    										},
    										{
    												"name": "POD_NAMESPACE",
    												"valueFrom": {
    													"fieldRef": {
    															"apiVersion": "v1",
    															"fieldPath": "metadata.namespace"
    													}
    												}
    										},
    										{
    												"name": "SERVICE_ACCOUNT",
    												"valueFrom": {
    													"fieldRef": {
    															"apiVersion": "v1",
    															"fieldPath": "spec.serviceAccountName"
    													}
    												}
    										}
    									],
    									"volumeMounts": [
    										{
    												"mountPath": "/etc/cacerts",
    												"name": "cacerts"
    										}
    									],
    									"imagePullPolicy": "IfNotPresent",
    									"image": "gcr.io/gloo-mesh/gloo-mesh-istiod-agent:$MGMT_PLANE_VERSION",
    									"name": "istiod-agent-init"
    							}
    						],
    						"containers": [
    							{
    									"args": [
    										"sidecar"
    									],
    									"env": [
    										{
    												"name": "PILOT_CERT_PROVIDER",
    												"value": "istiod"
    										},
    										{
    												"name": "POD_NAME",
    												"valueFrom": {
    													"fieldRef": {
    															"apiVersion": "v1",
    															"fieldPath": "metadata.name"
    													}
    												}
    										},
    										{
    												"name": "POD_NAMESPACE",
    												"valueFrom": {
    													"fieldRef": {
    															"apiVersion": "v1",
    															"fieldPath": "metadata.namespace"
    													}
    												}
    										},
    										{
    												"name": "SERVICE_ACCOUNT",
    												"valueFrom": {
    													"fieldRef": {
    															"apiVersion": "v1",
    															"fieldPath": "spec.serviceAccountName"
    													}
    												}
    										}
    									],
    									"volumeMounts": [
    										{
    												"mountPath": "/etc/cacerts",
    												"name": "cacerts"
    										}
    									],
    									"imagePullPolicy": "IfNotPresent",
    									"image": "gcr.io/gloo-mesh/gloo-mesh-istiod-agent:$MGMT_PLANE_VERSION",
    									"name": "istiod-agent"
    							}
    						],
    						"volumes": [
    							{
    									"name": "cacerts",
    									"secret": null,
    									"emptyDir": {
    										"medium": "Memory"
    									}
    							}
    						]
    				}
    			}
    	}
    }'
    
  4. Repeat the previous step for each workload cluster with Istio installed.
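
After the configuration is applied, you can optionally verify that istiod restarts cleanly with the new istiod-agent sidecar and init container. The app=istiod label is the standard label on istiod pods.

    kubectl --context ${REMOTE_CONTEXT} -n istio-system rollout status deploy/istiod
    kubectl --context ${REMOTE_CONTEXT} -n istio-system get pods -l app=istiod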

Enable Vault as an intermediate CA provider

Now, federate the meshes by using Gloo Mesh Enterprise with Vault to establish trusted communication across the service meshes.

Create or edit a RootTrustPolicy and add the shared mTLS configuration for Vault. The following example highlights the fields that the workload clusters use to authenticate and communicate with Vault. For more information, see the API docs.

cat << EOF | kubectl apply --context=${MGMT_CONTEXT} -f -   
apiVersion: admin.gloo.solo.io/v2
kind: RootTrustPolicy
metadata:
  name: north-south-gw
  namespace: gloo-mesh
spec:
  config:
    agentCa:
      vault:
        caPath: pki/root/sign-intermediate
        csrPath: pki_int/intermediate/generate/exported
        kubernetesAuth:
          role: gen-int-ca-istio
        server: http://vault.vault:8200
EOF
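
To confirm that the policy is created, you can list the root trust policies in all namespaces.

kubectl get roottrustpolicy --context ${MGMT_CONTEXT} -A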

Verify traffic uses the root CA

Now that Istio is patched with the gloo-mesh-istiod-agent sidecar, you can verify that all of the service mesh traffic is secured by using the root CA that you generated for Vault earlier in this guide.

To verify, you can check the root-cert.pem in the istio-ca-root-cert config map that Istio propagates for the initial TLS connection. The following example checks the propagated root-cert.pem against the local certificate that you supplied to Vault.

  1. From your terminal, navigate to the same directory as the root-cert.pem file that you previously created. Or if you are using an existing Vault deployment, save the root certificate as root-cert.pem.
  2. Check the difference between the root certificate that istiod uses and the Vault root certificate. If installed correctly, the output from the following command is empty.
    kubectl --context=${REMOTE_CONTEXT} get cm -n bookinfo istio-ca-root-cert -ojson | jq -r  '.data["root-cert.pem"]' | diff root-cert.pem -
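
    For a deeper check, you can also inspect the certificate chain that a workload presents, following the standard Istio plug-in CA verification flow. The following sketch assumes the Bookinfo productpage and ratings workloads; adjust the deployment and service names for your setup.

    # From the productpage sidecar, dump the certificate chain that ratings presents.
    kubectl --context ${REMOTE_CONTEXT} -n bookinfo exec deploy/productpage-v1 -c istio-proxy -- \
      openssl s_client -showcerts -connect ratings:9080 < /dev/null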
    

Rotating certificates for Istio workloads

When certificates are issued, pods that are managed by Istio must be restarted to ensure they pick up the new certificates. The certificate issuer creates a PodBounceDirective, which contains the namespaces and labels of the pods that must be restarted. For more information about how certificate rotation works in Istio, review the video series in this blog post.

Note: To avoid potential downtime for your apps in production, disable the PodBounceDirective feature. Then, control pod restarts in another way, such as a rolling update.

  1. Get your root trust policies.

    kubectl get roottrustpolicy --context ${MGMT_CONTEXT} -A
    
  2. In the root trust policy, remove or set the autoRestartPods field to false.

    kubectl edit roottrustpolicy --context ${MGMT_CONTEXT} -n <namespace> <root-trust-policy>
    
    apiVersion: admin.gloo.solo.io/v2
    kind: RootTrustPolicy
    metadata:
      name: north-south-gw
      namespace: gloo-mesh
    spec:
      config:
        autoRestartPods: false
        ...
    
  3. To ensure pods pick up the new certificates, restart the istiod pod in each remote cluster.

    kubectl --context ${REMOTE_CONTEXT} -n istio-system patch deployment istiod \
        -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"`date +'%s'`\"}}}}}"
    
  4. Restart your app pods that are managed by Istio, such as by using a rolling update strategy.
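
    For example, you might use a rolling restart of all deployments in a namespace, which recreates pods gradually. The bookinfo namespace is an example; repeat for each namespace that has Istio-managed apps.

    kubectl --context ${REMOTE_CONTEXT} -n bookinfo rollout restart deploy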

Troubleshoot errors with the Vault script

If you have errors with the script to install Vault, review the following common errors.

  • Error: Error from server (NotFound): pods "vault-0" not found or Error from server (BadRequest): pod vault-0 does not have a host assigned
    Description: The Vault pod might not be running. Check the pod status, troubleshoot any issues, wait for the pod to start, and try again.
  • Error: * path is already in use
    Description: You already set up that path. If you already ran the script, you can ignore this message.
  • Error: Error writing data to pki/config/ca: Error making API request. Code: 400. Errors: * the given certificate is not marked for CA use and cannot be used with this backend command terminated with exit code 2
    Description: If you are using macOS, you might have the default LibreSSL version of openssl. Set up OpenSSL instead. For more information, see Before you begin.
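
For the pod status errors, you can check the Vault pod directly, such as with the following commands.

kubectl --context="${REMOTE_CONTEXT}" get pods -n vault
kubectl --context="${REMOTE_CONTEXT}" describe pod -n vault vault-0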