AWS
Set up the relay root and intermediate certificate authorities (CAs) to generate the relay server certificate and relay agent client certificates. These certificates are required to secure communication between the Gloo Mesh management and data planes.
About this approach
AWS Private CA is a managed CA that you can use to secure your apps and devices with private TLS certificates. In this setup, you use AWS Private CA to generate and store the root CA and subordinate CAs that issue the client and server TLS certificates for your relay architecture. To manage the lifecycle of the server and agent certificates, you also install cert-manager. Cert-manager is a Kubernetes controller that helps you automate the process of obtaining and renewing certificates from various PKI providers, such as AWS Private CA or Vault.
With this approach, you get the following benefits:
- Secure storage of root certificates and keys.
- AWS subordinate CA serves as an additional security layer for your root CA.
- Automatically obtain and renew server and client TLS certificates with cert-manager.
Before you begin
Save the kubeconfig contexts for your clusters. Run kubectl config get-contexts, look for your cluster in the CLUSTER column, and get the context name in the NAME column. Note: Do not use context names with underscores. The generated certificate that connects workload clusters to the management cluster uses the context name as a SAN specification, and underscores in SAN are not FQDN compliant. You can rename a context by running kubectl config rename-context "<oldcontext>" <newcontext>.
export MGMT_CLUSTER=<mgmt-cluster-name>
export REMOTE_CLUSTER=<remote-cluster-name>
export MGMT_CONTEXT=<management-cluster-context>
export REMOTE_CONTEXT=<remote-cluster-context>
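The underscore restriction can be checked before you export the variables. The following is a minimal sketch and not part of the Gloo tooling: context_name_ok is a hypothetical helper name, the context names in the loop are illustrative, and the check only looks for underscores rather than validating full FQDN compliance.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the context name contains no underscore.
# It does not check any other FQDN rule.
context_name_ok() {
  case "$1" in
    *_*) return 1 ;;
    *)   return 0 ;;
  esac
}

# Illustrative names, not real contexts.
for ctx in "mgmt-cluster" "gke_project_zone_cluster"; do
  if context_name_ok "$ctx"; then
    echo "$ctx: ok"
  else
    echo "$ctx: contains an underscore, rename it with kubectl config rename-context"
  fi
done
```

To check your own values, you could call context_name_ok "$MGMT_CONTEXT" and context_name_ok "$REMOTE_CONTEXT" after the exports.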
Step 1: Install cert-manager
In your management cluster, install cert-manager. For more information about installation options and versions, see the cert-manager documentation.
- kubectl installation:
kubectl apply --context $MGMT_CONTEXT -f https://github.com/jetstack/cert-manager/releases/download/v1.5.4/cert-manager.yaml
- Helm installation:
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.5.4 \
  --set installCRDs=true
Verify that cert-manager was successfully installed.
kubectl get pod -n cert-manager --context $MGMT_CONTEXT
Example output:
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-7c6f78c46d-247br              1/1     Running   0          17s
cert-manager-cainjector-668d9c86df-7cqb8   1/1     Running   0          17s
cert-manager-webhook-764b556954-2m4zf      1/1     Running   0          17s
Step 2: Install the AWS Private CA issuer plug-in
Install the AWS Private CA issuer plug-in for cert-manager. For more information, see the aws-privateca-issuer plug-in documentation.
helm repo add awspca https://cert-manager.github.io/aws-privateca-issuer
helm repo update
helm upgrade pca-issuer \
  --install awspca/aws-privateca-issuer \
  --namespace cert-manager \
  --set image.tag=v1.2.1
Verify that the plug-in was successfully installed.
kubectl get pod -n cert-manager --context $MGMT_CONTEXT
Example output:
NAME                                               READY   STATUS    RESTARTS   AGE
...
pca-issuer-aws-privateca-issuer-6768d7454b-w8d4l   1/1     Running   0          15s
Step 3: Generate the relay root CA
Create and securely store the relay root CA in AWS Certificate Manager Private Certificate Authority (AWS ACM PCA).
If it doesn't already exist, create the gloo-mesh namespace.
kubectl create namespace gloo-mesh --context $MGMT_CONTEXT
Create the root CA in ACM PCA.
# Generate CA config file
echo '''
{
  "KeyAlgorithm":"RSA_2048",
  "SigningAlgorithm":"SHA256WITHRSA",
  "Subject":{
    "CommonName":"relay-root-ca"
  }
}
''' > ca_config.json

# Create CA in ACM PCA
REGION=us-east-1
CA_ARN=$(aws acm-pca create-certificate-authority \
  --certificate-authority-configuration file://ca_config.json \
  --certificate-authority-type "ROOT" \
  --idempotency-token 01234567 \
  --region $REGION \
  --tags Key=Name,Value=relay-root-ca | jq -r '.CertificateAuthorityArn')

# Example response payload
# {
#   "CertificateAuthorityArn": "arn:aws:acm-pca:us-east-1:123456789:certificate-authority/123456789-debf-4513-89f7-c1834d5ffbd5"
# }

# Download root CA CSR from AWS
aws acm-pca get-certificate-authority-csr \
  --region $REGION \
  --certificate-authority-arn $CA_ARN \
  --output text > relay-root-ca.csr

# Issue root certificate
ISSUE_CERTIFICATE_RESPONSE=$(aws acm-pca issue-certificate \
  --certificate-authority-arn $CA_ARN \
  --csr fileb://relay-root-ca.csr \
  --region $REGION \
  --signing-algorithm "SHA256WITHRSA" \
  --template-arn arn:aws:acm-pca:::template/RootCACertificate/V1 \
  --validity Value=3650,Type="DAYS" \
  --idempotency-token 1234567 \
  --output json | jq -r '.CertificateArn')
CERTARN=$ISSUE_CERTIFICATE_RESPONSE

# Download certificate
aws acm-pca get-certificate \
  --certificate-authority-arn $CA_ARN \
  --certificate-arn $CERTARN \
  --region $REGION \
  --output text > relay-root-ca.pem

# Upload certificate to AWS
aws acm-pca import-certificate-authority-certificate \
  --certificate-authority-arn $CA_ARN \
  --region $REGION \
  --certificate fileb://relay-root-ca.pem
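Every later command in this step depends on CA_ARN. If the create call fails or the response lacks the expected key, the variable can end up empty or set to the string null. The following guard is a minimal sketch under that assumption: is_pca_arn is a hypothetical helper name, and the pattern is only a loose shape check, not a full ARN validation.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeeds only if the value has the general shape of an
# ACM PCA certificate authority ARN. It does not contact AWS.
is_pca_arn() {
  case "$1" in
    arn:aws*:acm-pca:*:*:certificate-authority/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Illustrative values: the example ARN from the response payload, and "null".
for value in \
  "arn:aws:acm-pca:us-east-1:123456789:certificate-authority/123456789-debf-4513-89f7-c1834d5ffbd5" \
  "null"; do
  if is_pca_arn "$value"; then
    echo "looks like a CA ARN: $value"
  else
    echo "not a CA ARN: $value"
  fi
done
```

In your own shell, you could run is_pca_arn "$CA_ARN" right after the create command and stop if it fails.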
Create an AWS user for the relay root CA.
aws iam create-user --user-name gloo-mesh-acm --region $REGION
Create an IAM policy for the relay root CA.
# The quoting around $CA_ARN closes the single-quoted JSON so that the shell expands the variable.
POLICY_ARN=$(aws iam create-policy \
  --policy-name GlooMeshRelayCA \
  --policy-document \
  '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "awspcaissuer",
        "Action": [
          "acm-pca:DescribeCertificateAuthority",
          "acm-pca:GetCertificate",
          "acm-pca:IssueCertificate"
        ],
        "Effect": "Allow",
        "Resource": "'"$CA_ARN"'"
      }
    ]
  }' | jq -r '.Policy.Arn')
- Note: For cross-account ACM management, see Resource-based policies in the AWS docs.
- For an example template, download this PCACrossAccountPolicy.json file and apply it via the CLI.
aws acm-pca put-policy --region ${region} --resource-arn arn:aws:acm-pca:us-east-1:${control_plane_account}:certificate-authority/${cert_authority_id} --policy file://PCACrossAccountPolicy.json
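For the IAM policy in this step, the Resource field must contain the actual CA ARN. Text inside single quotes is not expanded by the shell, so one alternative is to render the document with an unquoted heredoc. This is a sketch, not the documented procedure: render_pca_policy is a hypothetical helper name, and the ARN in the demo call is the example value from the response payload.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the IAM policy document with the CA ARN filled in.
# The unquoted heredoc lets the shell expand $1 inside the JSON.
render_pca_policy() {
  cat << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "awspcaissuer",
      "Action": [
        "acm-pca:DescribeCertificateAuthority",
        "acm-pca:GetCertificate",
        "acm-pca:IssueCertificate"
      ],
      "Effect": "Allow",
      "Resource": "$1"
    }
  ]
}
EOF
}

# Demo with an illustrative ARN: show that the Resource line carries the value.
render_pca_policy "arn:aws:acm-pca:us-east-1:123456789:certificate-authority/123456789-debf-4513-89f7-c1834d5ffbd5" \
  | grep '"Resource"'
```

You could redirect the output of render_pca_policy "$CA_ARN" to a file such as policy.json (a name chosen here for illustration) and pass it to the AWS CLI with --policy-document file://policy.json.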
Add the policy to the gloo-mesh-acm user.
aws iam attach-user-policy --policy-arn $POLICY_ARN --user-name gloo-mesh-acm
Generate an access key pair for the user, and add the key pair to your cluster. The following example uses a Kubernetes secret for the AWS credentials. For security reasons, you might prefer to use a different way to authorize Kubernetes resources to AWS, such as IAM roles for service accounts (IRSA) or EKS pod identities.
aws iam create-access-key --user-name gloo-mesh-acm

# Example response
# {
#   "AccessKey": {
#     "UserName": "Bob",
#     "Status": "Active",
#     "CreateDate": "2015-03-09T18:39:23.411Z",
#     "SecretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYzEXAMPLEKEY",
#     "AccessKeyId": "AKIAIOSFODNN7EXAMPLE"
#   }
# }

# Upload secrets to Kubernetes
AWS_ACCESS_KEY_ID=<key_id>
SECRET_ACCESS_KEY=<secret>
kubectl create secret generic gloo-mesh-acm-credentials \
  --namespace gloo-mesh \
  --from-literal=AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
  --from-literal=AWS_SECRET_ACCESS_KEY=$SECRET_ACCESS_KEY \
  --context $MGMT_CONTEXT
Create a cert-manager issuer for the CA.
cat << EOF | kubectl apply --context $MGMT_CONTEXT -f -
apiVersion: awspca.cert-manager.io/v1beta1
kind: AWSPCAIssuer
metadata:
  name: relay-root-ca
  namespace: gloo-mesh
spec:
  arn: $CA_ARN
  region: $REGION
  secretRef:
    namespace: gloo-mesh
    name: gloo-mesh-acm-credentials
EOF
Step 4: Create the server TLS certificate
To generate the gloo-mesh-mgmt-server server certificate, create a cert-manager certificate and refer to the ACM PCA issuer.
cat << EOF | kubectl apply --context $MGMT_CONTEXT -f -
kind: Certificate
apiVersion: cert-manager.io/v1
metadata:
  name: gloo-mesh-mgmt-server
  namespace: gloo-mesh
spec:
  commonName: gloo-mesh-mgmt-server
  dnsNames:
    - "*.gloo-mesh"
  # 1 year life
  duration: 8760h0m0s
  issuerRef:
    group: awspca.cert-manager.io
    kind: AWSPCAIssuer
    name: relay-root-ca
  renewBefore: 8736h0m0s
  secretName: relay-server-tls-secret
  usages:
    - server auth
    - client auth
  privateKey:
    algorithm: "RSA"
    size: 4096
EOF
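The duration and renewBefore values in the certificate spec together determine when renewal happens. The arithmetic below is a worked sketch, assuming cert-manager's documented behavior of scheduling renewal at the certificate's expiry time minus renewBefore, and assuming the issuer honors the requested duration. The variable names are illustrative.

```shell
#!/usr/bin/env bash
# Values copied from the Certificate spec, in hours.
duration_h=8760
renew_before_h=8736

# Certificate lifetime in days.
lifetime_days=$(( duration_h / 24 ))

# Renewal is scheduled at (expiry - renewBefore), so the time between
# issuance and the first renewal attempt is (duration - renewBefore).
hours_until_renewal=$(( duration_h - renew_before_h ))

echo "lifetime: ${lifetime_days} days"
echo "first renewal attempt: about ${hours_until_renewal}h after issuance"
```

With these values, the certificate is valid for 365 days and a renewal is attempted roughly every 24 hours. A smaller renewBefore would space renewals further apart.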
Step 5: Create the client TLS certificate
Generate a gloo-mesh-agent client certificate for each workload cluster. Be sure to repeat these steps for each workload cluster that you plan to register with Gloo Network.
Create a cert-manager certificate and refer to the ACM PCA issuer.
CLUSTER_NAME=$REMOTE_CLUSTER
CLUSTER_CONTEXT=$REMOTE_CONTEXT
cat << EOF | kubectl apply --context $MGMT_CONTEXT -f -
kind: Certificate
apiVersion: cert-manager.io/v1
metadata:
  name: gloo-mesh-agent-$CLUSTER_NAME
  namespace: gloo-mesh
spec:
  commonName: gloo-mesh-agent-$CLUSTER_NAME
  dnsNames:
    # Must match the cluster name used in the helm chart install
    - "$CLUSTER_NAME"
  # 1 year life
  duration: 8760h0m0s
  issuerRef:
    group: awspca.cert-manager.io
    kind: AWSPCAIssuer
    name: relay-root-ca
  renewBefore: 8736h0m0s
  secretName: gloo-mesh-agent-$CLUSTER_NAME-tls-cert
  usages:
    - server auth
    - client auth
  privateKey:
    algorithm: "RSA"
    size: 4096
EOF
Copy the TLS secret to the workload cluster.
kubectl get secret gloo-mesh-agent-$CLUSTER_NAME-tls-cert \
  --namespace gloo-mesh \
  --output json \
  --context $MGMT_CONTEXT \
  | jq 'del(.metadata.creationTimestamp,.metadata.resourceVersion,.metadata.uid)' \
  | kubectl apply --context $CLUSTER_CONTEXT -f -
Verify the cert-manager resources
For clusters that have cert-manager installed, verify that your cert-manager issuer and certificate resources are ready. If the READY column says False for any of the following resources, describe the resource for more details and resolve the issue before continuing.
kubectl get issuer -n gloo-mesh --context $MGMT_CONTEXT
kubectl get certificates -n gloo-mesh --context $MGMT_CONTEXT
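Scanning the READY column by eye gets tedious with many workload clusters. The following filter is a minimal sketch: not_ready is a hypothetical helper name, it assumes the default table output with a header row that contains a READY column, and the sample table is illustrative rather than captured from a real cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads kubectl table output on stdin, locates the READY
# column from the header row, and prints the name of each row that is not "True".
not_ready() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "READY") col = i; next }
    col && NF && $col != "True" { print $1 }
  '
}

# Illustrative input in the shape of "kubectl get certificates" output.
not_ready << 'EOF'
NAME                        READY   SECRET                               AGE
gloo-mesh-mgmt-server       True    relay-server-tls-secret              5m
gloo-mesh-agent-cluster1    False   gloo-mesh-agent-cluster1-tls-cert    5m
EOF
```

Against a live cluster, you could pipe kubectl get certificates -n gloo-mesh --context $MGMT_CONTEXT into not_ready. One assumption worth checking: because the issuer created earlier is of kind AWSPCAIssuer, it may be listed under that resource type rather than under issuer.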
Now that your custom certificates are created, continue to the next section to modify your Gloo Mesh deployment to use these certificates.
Step 6: Install the Gloo management server and agents
Set up Gloo Network to use the client and server TLS certificates that you created earlier.
Prepare the Helm installation settings for the Gloo management server.
glooMgmtServer:
  relay:
    disableCa: true
    disableCaCertGeneration: true
    disableTokenGeneration: true
    # Secret containing server TLS certs used to secure the management server.
    tlsSecret:
      name: relay-server-tls-secret
Install a new or upgrade an existing Gloo management server with the Helm settings from the previous step.
Prepare the Helm installation settings for the Gloo agent.
glooAgent:
  relay:
    # gloo-mesh-mgmt-server IP address
    serverAddress: $MGMT_SERVER_NETWORKING_ADDRESS
    # Custom certs: Secret containing client TLS certs used to identify the Gloo agent to the management server. If you do not specify a clientTlsSecret, you must specify a tokenSecret and a rootTlsSecret.
    clientTlsSecret:
      name: gloo-mesh-agent-$CLUSTER_NAME-tls-cert
Register the workload cluster or upgrade an existing Gloo agent with the Helm settings from the previous step.
Verifying your relay certificate setup
- Check that the relay connection between the management server and workload agents is healthy.
  - Forward port 9091 of the gloo-mesh-mgmt-server pod to your localhost.
    kubectl port-forward -n gloo-mesh --context $MGMT_CONTEXT deploy/gloo-mesh-mgmt-server 9091
  - In your browser, connect to http://localhost:9091/metrics.
  - In the metrics UI, look for the following lines. If the values are 1, the agents in the workload clusters are successfully registered with the management server. If the values are 0, the agents are not successfully connected.
    relay_pull_clients_connected{cluster="cluster1"} 1
    relay_pull_clients_connected{cluster="cluster2"} 1
    # HELP relay_push_clients_connected Current number of connected Relay push clients (Relay Agents).
    # TYPE relay_push_clients_connected gauge
    relay_push_clients_connected{cluster="cluster1"} 1
    relay_push_clients_connected{cluster="cluster2"} 1
- Review the Gloo UI. Check that the Overall Mesh Status is healthy and that your remote clusters are registered without any configuration issues.
meshctl dashboard --kubecontext $MGMT_CONTEXT
- If the setup is unsuccessful, continue to Troubleshooting.
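The metrics check in the first step can also be done from a terminal instead of a browser. The filter below is a sketch under stated assumptions: disconnected_clusters is a hypothetical helper name, it expects cluster to be the only label on each sample as in the example lines, and the cluster2 value of 0 in the demo input is made up to show a failing case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads Prometheus-format metrics on stdin and prints the
# cluster label of every relay_push_clients_connected sample whose value is not 1.
disconnected_clusters() {
  sed -n 's/^relay_push_clients_connected{cluster="\([^"]*\)"} \(.*\)$/\1 \2/p' \
    | awk '$2 != 1 { print $1 }'
}

# Illustrative input: cluster2 is shown as disconnected.
disconnected_clusters << 'EOF'
# HELP relay_push_clients_connected Current number of connected Relay push clients (Relay Agents).
# TYPE relay_push_clients_connected gauge
relay_push_clients_connected{cluster="cluster1"} 1
relay_push_clients_connected{cluster="cluster2"} 0
EOF
```

While the port-forward is running, you could feed the live endpoint to the filter with curl -s http://localhost:9091/metrics | disconnected_clusters. No output means every listed agent reports a value of 1.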
Troubleshooting relay certificates
Review the health of your Gloo pods in the management and remote clusters.
Check that the gloo-mesh-mgmt-server and gloo-mesh-agent pods are running.
kubectl get pods -n gloo-mesh --context ${MGMT_CONTEXT}
kubectl get pods -n gloo-mesh --context ${REMOTE_CONTEXT}
If the pods are not running, describe the pods and check the State and Last State sections for error messages and reasons why the pod might not be healthy. For example, the following error messages in the gloo-mesh-mgmt-server and gloo-mesh-agent pods indicate that the secret is misnamed or missing. Check the secrets and names, upgrade your Helm installation, and try again.
- Example error message for gloo-mesh-mgmt-server pod:
Message: 3 errors occurred:
  * no tls secret found for grpc server: Secret "relay-server-tls-secret" not found
  * could not find forwarding server token: no token secret found: Timeout: failed waiting for *v1.Secret Informer to sync
  * no tls secret found for grpc server: Secret "relay-server-tls-secret" not found
- Example error message for gloo-mesh-agent pod:
Events:
  Type     Reason       Age                    From     Message
  ----     ------       ----                   ----     -------
  Normal   Created      84m (x25 over 3h28m)   kubelet  Created container gloo-mesh-agent
  Normal   Pulled       84m (x24 over 3h28m)   kubelet  Container image "gcr.io/gloo-mesh/gloo-mesh-agent:1.2.3" already present on machine
  Normal   Started      84m (x25 over 3h28m)   kubelet  Started container gloo-mesh-agent
  Warning  FailedMount  84m                    kubelet  MountVolume.SetUp failed for volume "kube-api-access-zlr9b" : [failed to fetch token: Post "https://kind2-control-plane:6443/api/v1/namespaces/gloo-mesh/serviceaccounts/gloo-mesh-agent/token": read tcp 172.18.0.5:47314->172.18.0.5:6443: use of closed network connection, failed to sync configmap cache: timed out waiting for the condition]
  Warning  FailedMount  84m                    kubelet  MountVolume.SetUp failed for volume "kube-api-access-zlr9b" : [failed to fetch token: Post "https://kind2-control-plane:6443/api/v1/namespaces/gloo-mesh/serviceaccounts/gloo-mesh-agent/token": read tcp 172.18.0.5:57262->172.18.0.5:6443: use of closed network connection, failed to sync configmap cache: timed out waiting for the condition]
  Warning  FailedMount  83m                    kubelet  MountVolume.SetUp failed for volume "kube-api-access-zlr9b" : [failed to fetch token: Post "https://kind2-control-plane:6443/api/v1/namespaces/gloo-mesh/serviceaccounts/gloo-mesh-agent/token": http2: client connection force closed via ClientConn.Close, failed to sync configmap cache: timed out waiting for the condition]
  Warning  BackOff      72s (x522 over 3h28m)  kubelet  Back-off restarting failed container
Check the Kubernetes logs for the gloo-mesh-mgmt-server and gloo-mesh-agent pods in each cluster for errors. Look for errors during the grpc connection.
- For example, the following error message indicates that the gloo-mesh-mgmt-server load balancer IP address was set incorrectly for the agent during the Helm installation.
{"level":"warn","ts":"2021-11-02T19:56:42.197Z"caller":"zap/ grpclogger.go:85","msg":"[core]grpcaddrConn.createTransport failed to connect to {34.145.18106:9900:9900 gloo-mesh-mgmt-server.gloo-mesh <nil> <nil>}. Err: connection error: desc = \"transport: Error while dialing dial tcp: address 34.145.184.106:9900:9900 too many colons in address\".
- The following gloo-mesh-agent pod error indicates that you need to follow the steps in the ca.crt known issue.
{"level":"fatal","ts":1640102555.6522746,"msg":"secrets \"relay-root-tls-secret\" not found","version":"1.3.0-beta6","stacktrace":"runtime.main\n\t/usr/local/go/src/runtime/proc.go:255"}
- The following errors indicate that the server or client TLS certificate is expired. Regenerate the certificate, restart the pods, and try again.
{"level":"error","ts":1650047047.6682806,"logger":"translator.reconcile-42","caller":"translator/reconciler.go:195","msg":"translation for parent object failed","parent":"istio-ingressgateway-istio-system-cluster1~gloo-mesh~cluster1~internal.gloo.solo.io/v2, Kind=DiscoveredGateway","err":"Gateway istio-ingressgateway.istio-system in cluster cluster1 not found in snapshot.","errVerbose":"Gateway istio-ingressgateway.istio-system in cluster cluster1 not found in snapshot.\n\ttranslator.(*translator).TranslateOutputs.func1:/src/pkg/translator/translator.go:163\n\ttranslator.(*translator).translateParallel:/src/pkg/translator/translator.go:189\n\tsets.(*discoveredGatewaySet).UnsortedList:/src/pkg/api/internal.gloo.solo.io/v2/sets/sets.go:999\n\tsets.(*resourceSet).UnsortedList:/go/pkg/mod/github.com/solo-io/skv2@v0.22.11/contrib/pkg/sets/sets.go:118\n\tsets.(*discoveredGatewaySet).UnsortedList.func1:/src/pkg/api/internal.gloo.solo.io/v2/sets/sets.go:994\n\ttranslator.(*translator).translateParallel.func1:/src/pkg/translator/translator.go:191\n\ttranslator.getValidEastWestIngressGateway:/src/pkg/translator/translator.go:426","stacktrace":"github.com/solo-io/gloo-mesh-enterprise/pkg/translator.(*reconciler).reconcilePrimary.func1\n\t/src/pkg/translator/reconciler.go:195\ngithub.com/solo-io/gloo-mesh-enterprise/pkg/utils/syncutils.(*workQueue).Execute.func1\n\t/src/pkg/utils/syncutils/parallel.go:52"}
{"level":"info","ts":1650046690.815508,"caller":"grpclog/grpclog.go:37","msg":"[core]pickfirstBalancer: UpdateSubConnState: 0xc00111c9d0, {TRANSIENT_FAILURE connection error: desc = \"transport: authentication handshake failed: x509: certificate has expired or is not yet valid: current time 2022-04-15T18:18:10Z is after 2022-04-15T14:28:30Z\"}","system":"grpc","grpc_log":true}
For gloo-mesh-agent pods, make sure that the cluster name matches the registered cluster name.
- Check the KubernetesCluster resources in the management cluster to get registered cluster names.
kubectl get kubernetesclusters --context $MGMT_CONTEXT
- Check that the registered cluster name matches the name in the client certificate that is issued by the root CA, specifically the DNS SAN extension.
- If the cluster names do not match, update the KubernetesCluster to have the same name, or re-issue the client certificate with the same name.
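To compare a registered cluster name with the DNS SAN of the client certificate, you can inspect the decoded certificate text. The helper below is a sketch: san_has_dns is a hypothetical name, it expects the text form printed by openssl x509 -noout -text on stdin, and the sample text in the demo is illustrative rather than taken from a real certificate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "openssl x509 -noout -text" output on stdin and
# succeeds if the given name is listed as a DNS entry in the SAN extension.
san_has_dns() {
  grep -A1 'Subject Alternative Name' \
    | tr -d ' ' \
    | tr ',' '\n' \
    | grep -qxF "DNS:$1"
}

# Illustrative excerpt of decoded certificate text.
sample='            X509v3 Subject Alternative Name:
                DNS:cluster1'

if printf '%s\n' "$sample" | san_has_dns "cluster1"; then
  echo "SAN contains cluster1"
fi
```

To produce the input from the secret created in Step 5, you could decode it with kubectl get secret gloo-mesh-agent-$CLUSTER_NAME-tls-cert -n gloo-mesh --context $MGMT_CONTEXT -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -text and pipe the result into san_has_dns "$CLUSTER_NAME".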
If you still have issues, review the Known issues.
Known issues
ca.crt
Although the ca.crt is included in the gloo-mesh-agent certificate secret, the gloo-mesh-agent still expects it to exist separately in the remote cluster. To copy it from the management cluster into the remote clusters, you can run the following command. Make sure to update $CLUSTER_NAME with your remote cluster name.
CLUSTER_NAME=$REMOTE_CLUSTER
CLUSTER_CONTEXT=$REMOTE_CONTEXT
kubectl get secret gloo-mesh-agent-$CLUSTER_NAME-tls-cert \
  --namespace gloo-mesh \
  --output json \
  --context $CLUSTER_CONTEXT \
  | jq 'del(.metadata.creationTimestamp,.metadata.resourceVersion,.metadata.uid,.data."tls.key",.data."tls.crt",.metadata.annotations)' \
  | sed "s/gloo-mesh-agent-$CLUSTER_NAME-tls-cert/relay-root-tls-secret/" \
  | kubectl apply --context $CLUSTER_CONTEXT -f -