Creating the cluster mesh
On this page, we will create a Calico Enterprise cluster mesh by connecting clusters together. Once created, the cluster mesh enables multi-cluster networking, network policy for cross-cluster connections, cross-cluster services, and encryption via WireGuard.
Requirements
Calico Enterprise multi-cluster networking provides routing between clusters that preserves pod IPs. This section outlines the requirements for this routing to be established. If your network already provides routing between clusters that preserves pod IPs, you can skip this section.
Prerequisites for Calico Enterprise multi-cluster networking:
- All nodes participating in the cluster mesh must be able to establish connections to each other via their private IP.
- All nodes participating in the cluster mesh must have unique node names.
- Pod CIDRs between clusters must not overlap.
- All clusters must have at least one overlay network in common (VXLAN and/or WireGuard).
- All clusters must have the same routeSource setting on FelixConfiguration (a way to check these settings is sketched below, after the note).
If using VXLAN:
- The vxlan* settings on FelixConfiguration must be the same across clusters participating in the mesh.
- The underlying network must allow traffic on vxlanPort between clusters participating in the mesh.
- All clusters must use Calico CNI.
If using WireGuard:
- The wireguard* settings on FelixConfiguration must be the same across clusters participating in the mesh.
- The underlying network must allow traffic on wireguardListeningPort between clusters participating in the mesh.
- All clusters must use Calico CNI, or all clusters must use a non-Calico CNI (mixing non-Calico CNI types is supported).
Note: much like intra-cluster routing in Calico Enterprise, cross-cluster routing can utilize both VXLAN and WireGuard at the same time. If both are enabled and a WireGuard peer is not ready, communication with that peer will fall back to VXLAN.
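To compare the routeSource, vxlan*, and wireguard* settings across clusters, the following is a minimal sketch; it assumes the FelixConfiguration instance is named default and that kubectl can reach the projectcalico.org API (available with a Calico Enterprise installation).
# Run in each cluster and compare the output. Unset fields fall back to defaults
# and may not appear in the output.
kubectl get felixconfiguration default -o yaml | grep -iE 'routesource|vxlan|wireguard'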
Setup
Generate credentials for cross-cluster resource synchronization
The basis of the cluster mesh is the ability for a cluster to connect to a remote cluster and sync data from it. This enables each Calico Enterprise cluster to have a view into the datastore that includes both local and remote cluster pods.
In this section, we will create a kubeconfig for each cluster. This kubeconfig is what other clusters will use to connect to a given cluster and synchronize data from it.
For each cluster in the cluster mesh, utilizing an existing kubeconfig with administrative privileges, follow these steps:
- Create the ServiceAccount used by remote clusters for authentication:
kubectl apply -f https://downloads.tigera.io/ee/v3.21.0-2.0/manifests/federation-remote-sa.yaml
- Create the ClusterRole and ClusterRoleBinding used by remote clusters for authorization:
kubectl apply -f https://downloads.tigera.io/ee/v3.21.0-2.0/manifests/federation-rem-rbac-kdd.yaml
- Create the ServiceAccount token that will be used in the kubeconfig:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: tigera-federation-remote-cluster
  namespace: calico-system
  annotations:
    kubernetes.io/service-account.name: "tigera-federation-remote-cluster"
EOF
- Use the following command to create your kubeconfig at $KUBECONFIG_NAME:
echo "apiVersion: v1
kind: Config
users:
- name: tigera-federation-remote-cluster
  user:
    token: $(kubectl get secret tigera-federation-remote-cluster -n calico-system -o=jsonpath='{.data.token}' | base64 -d)
clusters:
- name: tigera-federation-remote-cluster
  cluster:
    certificate-authority-data: $(kubectl config view --raw -o=jsonpath='{.clusters[0].cluster.certificate-authority-data}')
    server: $(kubectl config view --raw -o=jsonpath='{.clusters[0].cluster.server}')
contexts:
- name: tigera-federation-remote-cluster-ctx
  context:
    cluster: tigera-federation-remote-cluster
    user: tigera-federation-remote-cluster
current-context: tigera-federation-remote-cluster-ctx" > $KUBECONFIG_NAME
- Verify that the kubeconfig file works:
Issue the following command to validate that the kubeconfig file can be used to connect to the current cluster and access resources:
kubectl --kubeconfig=$KUBECONFIG_NAME get nodes
Once you've created a kubeconfig for each cluster, you can proceed to the next section to establish the cluster connections that form the mesh.
Establish cross-cluster resource synchronization
The cluster mesh is formed when each cluster connects to every other cluster to synchronize data. A cluster connects to another cluster using a RemoteClusterConfiguration, which references a kubeconfig created for the remote cluster.
In this section, within each cluster, we will create a RemoteClusterConfiguration for each other cluster in the mesh. Each RemoteClusterConfiguration instructs the local cluster to connect to one remote cluster using that cluster's kubeconfig.
With each cluster being connected to each other cluster, a full cluster mesh will be formed.
The steps differ depending on whether cross-cluster routing is provided by Calico Enterprise multi-cluster routing or by your network; follow the set of steps that matches your setup.
Calico Enterprise multi-cluster routing
Calico Enterprise achieves cross-cluster routing by extending the overlay network of a cluster to include nodes from remote clusters. This is made possible by each cluster having a view into the datastore that now includes remote pods and nodes.
For each pair of clusters in the cluster mesh (e.g. {A,B}, {A,C}, {B,C} for clusters A,B,C):
- In cluster 1, create a secret that contains the kubeconfig for cluster 2.
Determine the namespace (<secret-namespace>) for the secret and substitute it in all steps. The simplest method to create the secret for a remote cluster is to use the kubectl command, because it correctly encodes the data and formats the file.
kubectl create secret generic remote-cluster-secret-name -n <secret-namespace> \
  --from-literal=datastoreType=kubernetes \
  --from-file=kubeconfig=<kubeconfig file>
- If RBAC is enabled in cluster 1, create a Role and RoleBinding for Calico Enterprise to use to access the secret that contains the kubeconfig for cluster 2:
kubectl create -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: remote-cluster-secret-access
  namespace: <secret-namespace>
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["watch", "list", "get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: remote-cluster-secret-access
  namespace: <secret-namespace>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: remote-cluster-secret-access
subjects:
- kind: ServiceAccount
  name: calico-typha
  namespace: calico-system
EOF
- Create the RemoteClusterConfiguration in cluster 1:
Within the RemoteClusterConfiguration, we specify the secret used to access cluster 2 and the overlay routing mode, which toggles the establishment of cross-cluster overlay routes.
kubectl create -f - <<EOF
apiVersion: projectcalico.org/v3
kind: RemoteClusterConfiguration
metadata:
  name: cluster-2
spec:
  clusterAccessSecret:
    name: remote-cluster-secret-name
    namespace: <secret-namespace>
    kind: Secret
  syncOptions:
    overlayRoutingMode: Enabled
EOF
- Validate that the remote cluster connection can be established.
- Repeat the above steps, switching cluster 1 and cluster 2.
Network-provided routing
In this setup, the cluster mesh relies on the underlying network to provide cross-cluster routing that preserves pod IPs.
For each pair of clusters in the cluster mesh (e.g. {A,B}, {A,C}, {B,C} for clusters A,B,C):
- In cluster 1, create a secret that contains the kubeconfig for cluster 2.
Determine the namespace (<secret-namespace>) for the secret and substitute it in all steps. The simplest method to create the secret for a remote cluster is to use the kubectl command, because it correctly encodes the data and formats the file.
kubectl create secret generic remote-cluster-secret-name -n <secret-namespace> \
  --from-literal=datastoreType=kubernetes \
  --from-file=kubeconfig=<kubeconfig file>
- If RBAC is enabled in cluster 1, create a Role and RoleBinding for Calico Enterprise to use to access the secret that contains the kubeconfig for cluster 2:
kubectl create -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: remote-cluster-secret-access
  namespace: <secret-namespace>
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["watch", "list", "get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: remote-cluster-secret-access
  namespace: <secret-namespace>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: remote-cluster-secret-access
subjects:
- kind: ServiceAccount
  name: calico-typha
  namespace: calico-system
EOF
- Create the RemoteClusterConfiguration in cluster 1:
Within the RemoteClusterConfiguration, we specify the secret used to access cluster 2 and the overlay routing mode, which toggles the establishment of cross-cluster overlay routes.
kubectl create -f - <<EOF
apiVersion: projectcalico.org/v3
kind: RemoteClusterConfiguration
metadata:
  name: cluster-2
spec:
  clusterAccessSecret:
    name: remote-cluster-secret-name
    namespace: <secret-namespace>
    kind: Secret
  syncOptions:
    overlayRoutingMode: Disabled
EOF
- If you have no IP pools in cluster 1 with NAT-outgoing enabled, skip this step.
Otherwise, if you have IP pools in cluster 1 with NAT-outgoing enabled, and workloads in those pools will egress to workloads in cluster 2, you need to instruct Calico Enterprise not to perform NAT on traffic destined for IP pools in cluster 2.
You can achieve this by creating a disabled IP pool in cluster 1 for each CIDR in cluster 2. This IP pool should have NAT-outgoing disabled. For example:
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: cluster2-main-pool
spec:
  cidr: <Cluster 2 CIDR>
  disabled: true
- Validate that the remote cluster connection can be established.
- Repeat the above steps, switching cluster 1 and cluster 2.
🎉 Done!
After completing the above steps for all cluster pairs in the cluster mesh, your clusters should now be forming a cluster mesh! You should now be able to route traffic between clusters, and write policy that can select remote workloads.
A cluster in the mesh can write policy rules that select pods from other clusters in the mesh. This is because traffic from remote clusters has pod IPs preserved, and the local cluster can associate remote pod IPs with the pod specs it synchronized from remote clusters.
How to
Switch to multi-cluster networking
The steps above assume that you are configuring both federated endpoint identity and multi-cluster networking for the first time. If you already have federated endpoint identity, and want to use multi-cluster networking, follow these steps:
- Validate that all requirements for multi-cluster networking have been met.
- Update the ClusterRole in each cluster in the cluster mesh using the RBAC manifest found in Generate credentials for cross-cluster resource synchronization.
- In all RemoteClusterConfigurations, set Spec.OverlayRoutingMode to Enabled (see the sketch after this list).
- Verify that all RemoteClusterConfigurations are bidirectional (one exists in each direction for each cluster pair) using these instructions.
- If you had previously created disabled IP pools to prevent NAT outgoing from applying to remote cluster destinations, those disabled IP pools are no longer needed when using multi-cluster networking and must be deleted.
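As a sketch of the overlayRoutingMode change above, you can patch each RemoteClusterConfiguration in place. The name cluster-2 below is a placeholder, and the commands assume kubectl can reach the projectcalico.org API in each cluster.
# List the RemoteClusterConfigurations in this cluster, then enable overlay routing on each.
kubectl get remoteclusterconfigurations
kubectl patch remoteclusterconfiguration cluster-2 --type=merge \
  -p '{"spec":{"syncOptions":{"overlayRoutingMode":"Enabled"}}}'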
Validate federated endpoint identity & multi-cluster networking
Validate RemoteClusterConfiguration and federated endpoint identity
Check remote cluster connection
You can validate that Typha in the local cluster has synced with a remote cluster by checking Typha's Prometheus metrics.
Alternatively, you can check the Typha logs for remote cluster connection status. Run the following command:
kubectl logs deployment/calico-typha -n calico-system | grep "Sending in-sync update"
You should see an entry for each RemoteClusterConfiguration in the local cluster.
If either output contains unexpected results, proceed to the troubleshooting section.
Validate multi-cluster networking
If all requirements were met for Calico Enterprise to establish multi-cluster networking, you can test the functionality by establishing a connection from a pod in a local cluster to the IP of a pod in a remote cluster. Ensure that there is no policy in either cluster that may block this connection.
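As an example of such a test, the sketch below assumes a local pod named netshoot-client in namespace demo that has curl available, and a hypothetical remote pod and IP; substitute real names, a real pod IP from the remote cluster, and the correct port.
# In the remote cluster: look up a target pod IP (names here are hypothetical).
kubectl --kubeconfig=$REMOTE_KUBECONFIG get pod my-remote-pod -n demo -o jsonpath='{.status.podIP}'
# In the local cluster: connect to that pod IP directly from a local pod.
# Replace 10.245.0.12 with the IP returned above and 8080 with the target port.
kubectl exec -n demo netshoot-client -- curl --max-time 5 http://10.245.0.12:8080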
If the connection fails, proceed to the troubleshooting section.
Create remote-identity-aware network policy
With federated endpoint identity and routing between clusters established, you can now use labels to reference endpoints on a remote cluster in local policy rules, rather than referencing them by IP address.
The main policy selector still refers only to local endpoints; that selector determines which local endpoints the policy applies to. However, rule selectors can now refer to both local and remote endpoints.
In the following example, cluster A (an application cluster) has a network policy that governs outbound connections to cluster B (a database cluster).
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: default.app-to-db
  namespace: myapp
spec:
  # The main policy selector selects endpoints from the local cluster only.
  selector: app == 'backend-app'
  tier: default
  egress:
  - destination:
      # Rule selectors can select endpoints from local AND remote clusters.
      selector: app == 'postgres'
      ports: [5432]
    protocol: TCP
    action: Allow
Troubleshoot
Troubleshoot RemoteClusterConfiguration and federated endpoint identity
Verify configuration
For each impacted remote cluster pair (between cluster A and cluster B):
- Retrieve the kubeconfig from the secret stored in cluster A. Manually verify that it can be used to connect to cluster B:
kubectl get secret -n <secret-namespace> remote-cluster-secret-name -o=jsonpath="{.data.kubeconfig}" | base64 -d > verify_kubeconfig_b
kubectl --kubeconfig=verify_kubeconfig_b get nodes
This validates that the credentials used by Typha to connect to cluster B's API server are stored in the correct location and provide sufficient access.
The command above should yield a result like the following:
NAME                STATUS   ROLES    AGE   VERSION
clusterB-master     Ready    master   7d    v1.27.0
clusterB-worker-1   Ready    worker   7d    v1.27.0
clusterB-worker-2   Ready    worker   7d    v1.27.0
If you do not see the nodes of cluster B listed in response to the command above, verify that you created the kubeconfig for cluster B correctly, and that you stored it in cluster A correctly.
If you do see the nodes of cluster B listed, you can run this test (or a similar test) on a node in cluster A to verify that cluster A nodes can connect to the API server of cluster B.
- Validate that the Typha service account in cluster A is authorized to retrieve the kubeconfig secret for cluster B:
kubectl auth can-i list secrets --namespace <secret-namespace> --as=system:serviceaccount:calico-system:calico-typha
This command should yield the following output:
yes
If the command does not return this output, verify that you correctly configured RBAC in cluster A.
- Repeat the above steps, switching cluster A and cluster B.
Check logs
Validate that querying the Typha logs yields the expected result outlined in the validation section.
If the Typha logs do not yield the expected result, review the warning and error logs in typha or calico-node for insights.
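For example, one way to surface warning and error entries from both components is sketched below; it assumes the operator-managed calico-system namespace and the k8s-app=calico-node label.
# Typha warnings and errors.
kubectl logs deployment/calico-typha -n calico-system | grep -iE 'warn|error'
# calico-node warnings and errors across nodes (label assumed; adjust if yours differs).
kubectl logs -n calico-system -l k8s-app=calico-node --prefix --tail=500 | grep -iE 'warn|error'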
calicoq
calicoq can be used to emulate the connection that Typha will make to remote clusters. Use the following command:
calicoq eval "all()"
If all remote clusters are accessible, calicoq returns something like the following. Note the remote cluster prefixes: there should be endpoints prefixed with the name of each RemoteClusterConfiguration in the local cluster.
Endpoints matching selector all():
Workload endpoint remote-cluster-1/host-1/k8s/kube-system.kube-dns-5fbcb4d67b-h6686/eth0
Workload endpoint remote-cluster-1/host-2/k8s/kube-system.cnx-manager-66c4dbc5b7-6d9xv/eth0
Workload endpoint host-a/k8s/kube-system.kube-dns-5fbcb4d67b-7wbhv/eth0
Workload endpoint host-b/k8s/kube-system.cnx-manager-66c4dbc5b7-6ghsm/eth0
If this command fails, the error messages returned by the command may provide insight into where issues are occurring.
Troubleshoot multi-cluster networking
Basic validation
- Ensure that RemoteClusterConfiguration and federated endpoint identity are functioning correctly
- Verify that you have met the prerequisites for multi-cluster networking
- If you had previously set up RemoteClusterConfigurations without multi-cluster networking, and are upgrading to use the feature, review the switching considerations
- Verify that traffic between clusters is not being denied by network policy
Check overlayRoutingMode
Ensure that overlayRoutingMode is set to Enabled on all RemoteClusterConfigurations.
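One quick way to review this setting across a cluster's RemoteClusterConfigurations is with custom columns; this sketch assumes kubectl can reach the projectcalico.org API.
# Show the overlay routing mode for every RemoteClusterConfiguration in this cluster.
kubectl get remoteclusterconfigurations \
  -o custom-columns='NAME:.metadata.name,OVERLAY_ROUTING:.spec.syncOptions.overlayRoutingMode'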
If overlay routing is successfully enabled, you can view the logs of a Typha instance using:
kubectl logs deployment/calico-typha -n calico-system
You should see an output for each connected remote cluster that looks like this:
18:49:35.394 [INFO][14] wrappedcallbacks.go 443: Creating syncer for RemoteClusterConfiguration(my-cluster)
18:49:35.394 [INFO][14] watchercache.go 186: Full resync is required ListRoot="/calico/ipam/v2/assignment/"
18:49:35.395 [INFO][14] watchercache.go 186: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/workloadendpoints"
18:49:35.396 [INFO][14] watchercache.go 186: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/hostendpoints"
18:49:35.396 [INFO][14] watchercache.go 186: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/profiles"
18:49:35.396 [INFO][14] watchercache.go 186: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
18:49:35.397 [INFO][14] watchercache.go 186: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/ippools"
If you do not see each of the resource types above, overlay routing was not successfully enabled in your cluster. Verify that you followed the setup steps for overlay routing correctly, and that the cluster is running a version of Calico Enterprise that supports multi-cluster networking.
Check logs
Warning or error logs in typha or calico-node may provide insight into where issues are occurring.