Creating the cluster mesh

On this page, we will create a Calico Enterprise cluster mesh by connecting clusters together. Once created, Calico Enterprise cluster mesh enables multi-cluster networking, network policy for cross-cluster connections, cross-cluster services, and encryption via WireGuard.

Requirements

Calico Enterprise multi-cluster networking provides routing between clusters that preserves pod IPs. This section outlines the requirements for this routing to be established. If your network already provides routing between clusters that preserves pod IPs, you can skip this section.

Prerequisites for Calico Enterprise multi-cluster networking:

  • All nodes participating in the cluster mesh must be able to establish connections to each other via their private IP.
  • All nodes participating in the cluster mesh must have unique node names.
  • Pod CIDRs between clusters must not overlap.
  • All clusters must have at least one overlay network in common (VXLAN and/or WireGuard).
  • All clusters must have the same routeSource setting on FelixConfiguration.

If using VXLAN:

  • The vxlan* settings on FelixConfiguration must be the same across clusters participating in the mesh.
  • The underlying network must allow traffic on vxlanPort between clusters participating in the mesh.
  • All clusters must use Calico CNI.

If using WireGuard:

  • The wireguard* settings on FelixConfiguration must be the same across clusters participating in the mesh.
  • The underlying network must allow traffic on wireguardListeningPort between clusters participating in the mesh.
  • All clusters must use Calico CNI, or all clusters must use non-Calico CNIs (different non-Calico CNI types can be mixed).

Note: much like intra-cluster routing in Calico Enterprise, cross-cluster routing can utilize both VXLAN and WireGuard at the same time. If both are enabled and a WireGuard peer is not ready, communication with that peer will fall back to VXLAN.
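
You can spot-check these requirements in each cluster and compare the output. A minimal sketch, assuming the active FelixConfiguration is named default and that the projectcalico.org API is reachable through kubectl:

# Compare routing-related Felix settings across clusters.
kubectl get felixconfiguration default -o yaml | grep -iE 'routesource|vxlan|wireguard'

# List pod CIDRs so you can confirm they do not overlap between clusters.
kubectl get ippools -o custom-columns='NAME:.metadata.name,CIDR:.spec.cidr'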

Setup

Generate credentials for cross-cluster resource synchronization

Mental model

The basis of the cluster mesh is the ability for a cluster to connect to a remote cluster and sync data from it. This enables each Calico Enterprise cluster to have a view into the datastore that includes both local and remote cluster pods.

In this section, we will create a kubeconfig for each cluster. This kubeconfig is what other clusters will use to connect to a given cluster and synchronize data from it.

For each cluster in the cluster mesh, utilizing an existing kubeconfig with administrative privileges, follow these steps:

  1. Create the ServiceAccount used by remote clusters for authentication:

    kubectl apply -f https://downloads.tigera.io/ee/v3.21.0-2.0/manifests/federation-remote-sa.yaml
  2. Create the ClusterRole and ClusterRoleBinding used by remote clusters for authorization:

    kubectl apply -f https://downloads.tigera.io/ee/v3.21.0-2.0/manifests/federation-rem-rbac-kdd.yaml
  3. Create the ServiceAccount token that will be used in the kubeconfig:

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Secret
    type: kubernetes.io/service-account-token
    metadata:
      name: tigera-federation-remote-cluster
      namespace: calico-system
      annotations:
        kubernetes.io/service-account.name: "tigera-federation-remote-cluster"
    EOF
  4. Use the following command to create your kubeconfig at $KUBECONFIG_NAME:

    echo "apiVersion: v1
    kind: Config
    users:
    - name: tigera-federation-remote-cluster
    user:
    token: $(kubectl get secret tigera-federation-remote-cluster -n calico-system -o=jsonpath='{.data.token}' | base64 -d)
    clusters:
    - name: tigera-federation-remote-cluster
    cluster:
    certificate-authority-data: $(kubectl config view --raw -o=jsonpath='{.clusters[0].cluster.certificate-authority-data}')
    server: $(kubectl config view --raw -o=jsonpath='{.clusters[0].cluster.server}')
    contexts:
    - name: tigera-federation-remote-cluster-ctx
    context:
    cluster: tigera-federation-remote-cluster
    user: tigera-federation-remote-cluster
    current-context: tigera-federation-remote-cluster-ctx" > $KUBECONFIG_NAME
  5. Verify that the kubeconfig file works:

    Issue the following command to validate that the kubeconfig file can be used to connect to the current cluster and access resources:

    kubectl --kubeconfig=$KUBECONFIG_NAME get nodes
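
If you are building a mesh of several clusters, the same steps can be scripted. A sketch, assuming your local kubeconfig has one context per cluster (the context names below are placeholders):

for ctx in cluster-a cluster-b cluster-c; do
  kubectl --context "$ctx" apply -f https://downloads.tigera.io/ee/v3.21.0-2.0/manifests/federation-remote-sa.yaml
  kubectl --context "$ctx" apply -f https://downloads.tigera.io/ee/v3.21.0-2.0/manifests/federation-rem-rbac-kdd.yaml
  # Then repeat steps 3 and 4 against each context to generate one kubeconfig per cluster.
done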

Once you've created a kubeconfig for each cluster, you can proceed to the next section to establish the cluster connections that form the mesh.

Establish cross-cluster resource synchronization

Mental model

The cluster mesh is formed when each cluster connects to every other cluster to synchronize data. A cluster connects to another cluster using a RemoteClusterConfiguration, which references a kubeconfig created for the remote cluster.

In this section, within each cluster, we will create a RemoteClusterConfiguration for each other cluster in the mesh. Each RemoteClusterConfiguration instructs the local cluster to connect to one remote cluster using its kubeconfig.

With each cluster being connected to each other cluster, a full cluster mesh will be formed.

Mental model

Calico Enterprise achieves cross-cluster routing by extending the overlay network of a cluster to include nodes from remote clusters. This is made possible by each cluster having a view into the datastore that now includes remote pods and nodes.

For each pair of clusters in the cluster mesh (e.g. {A,B}, {A,C}, {B,C} for clusters A,B,C):

  1. In cluster 1, create a secret that contains the kubeconfig for cluster 2:

    Choose the namespace (<secret-namespace>) for the secret, and substitute it in all of the following steps. The simplest method to create the secret for a remote cluster is the kubectl command below, because it correctly encodes the data and formats the file.

    kubectl create secret generic remote-cluster-secret-name -n <secret-namespace> \
    --from-literal=datastoreType=kubernetes \
    --from-file=kubeconfig=<kubeconfig file>
  2. If RBAC is enabled in cluster 1, create a Role and RoleBinding for Calico Enterprise to use to access the secret that contains the kubeconfig for cluster 2:

    kubectl create -f - <<EOF
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: remote-cluster-secret-access
      namespace: <secret-namespace>
    rules:
    - apiGroups: [""]
      resources: ["secrets"]
      verbs: ["watch", "list", "get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: remote-cluster-secret-access
      namespace: <secret-namespace>
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: remote-cluster-secret-access
    subjects:
    - kind: ServiceAccount
      name: calico-typha
      namespace: calico-system
    EOF
  3. Create the RemoteClusterConfiguration in cluster 1:

    Within the RemoteClusterConfiguration, we specify the secret used to access cluster 2, and the overlay routing mode, which toggles the establishment of cross-cluster overlay routes.

    kubectl create -f - <<EOF
    apiVersion: projectcalico.org/v3
    kind: RemoteClusterConfiguration
    metadata:
      name: cluster-2
    spec:
      clusterAccessSecret:
        name: remote-cluster-secret-name
        namespace: <secret-namespace>
        kind: Secret
      syncOptions:
        overlayRoutingMode: Enabled
    EOF
  4. Validate that the remote cluster connection can be established.

  5. Repeat the above steps, switching cluster 1 and cluster 2.

🎉 Done!

After completing the above steps for all cluster pairs in the cluster mesh, your clusters should now be forming a cluster mesh! You should now be able to route traffic between clusters, and write policy that can select remote workloads.

Mental model

A cluster in the mesh can write policy rules that select pods from other clusters in the mesh. This is because traffic from remote clusters has pod IPs preserved, and the local cluster can associate remote pod IPs with the pod specs it synchronized from remote clusters.

How to

Switch to multi-cluster networking

The steps above assume that you are configuring both federated endpoint identity and multi-cluster networking for the first time. If you already have federated endpoint identity, and want to use multi-cluster networking, follow these steps:

  1. Validate that all requirements for multi-cluster networking have been met.
  2. Update the ClusterRole in each cluster in the cluster mesh using the RBAC manifest found in Generate credentials for cross-cluster resource synchronization.
  3. In all RemoteClusterConfigurations, set spec.syncOptions.overlayRoutingMode to Enabled (a patch sketch follows this list).
  4. Verify that all RemoteClusterConfigurations are bidirectional (in both directions for each cluster pair) using these instructions.
  5. If you had previously created disabled IP pools to prevent NAT outgoing from applying to remote cluster destinations, those disabled IP pools are no longer needed when using multi-cluster networking and must be deleted.
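
For step 3, a minimal sketch of the update, assuming the projectcalico.org API is served through kubectl and a RemoteClusterConfiguration named cluster-2 as in the setup above:

kubectl patch remoteclusterconfiguration cluster-2 --type=merge \
  -p '{"spec":{"syncOptions":{"overlayRoutingMode":"Enabled"}}}'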

Validate federated endpoint identity & multi-cluster networking

Validate RemoteClusterConfiguration and federated endpoint identity

Check remote cluster connection

In a local cluster, you can validate that Typha has synced with each remote cluster by checking the Prometheus metrics for Typha.

Alternatively, you can check the Typha logs for remote cluster connection status. Run the following command:

kubectl logs deployment/calico-typha -n calico-system | grep "Sending in-sync update"

You should see an entry for each RemoteClusterConfiguration in the local cluster.

If either output contains unexpected results, proceed to the troubleshooting section.

Validate multi-cluster networking

If all requirements were met for Calico Enterprise to establish multi-cluster networking, you can test the functionality by establishing a connection from a pod in a local cluster to the IP of a pod in a remote cluster. Ensure that there is no policy in either cluster that may block this connection.
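
For example, a minimal sketch of such a test, assuming a pod in the remote cluster serves HTTP on port 80 and $REMOTE_KUBECONFIG points at that cluster (the pod and namespace names are placeholders):

# Look up the IP of a pod in the remote cluster.
kubectl --kubeconfig=$REMOTE_KUBECONFIG get pod <remote-pod> -n <namespace> -o jsonpath='{.status.podIP}'

# From a pod in the local cluster, attempt a connection to that IP.
kubectl exec -n <namespace> <local-pod> -- curl -m 5 http://<remote-pod-ip>:80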

If the connection fails, proceed to the troubleshooting section.

Create remote-identity-aware network policy

With federated endpoint identity and routing between clusters established, you can now use labels to reference endpoints on a remote cluster in local policy rules, rather than referencing them by IP address.

The main policy selector still refers only to local endpoints; that selector chooses the local endpoints the policy applies to. However, rule selectors can now refer to both local and remote endpoints.

In the following example, cluster A (an application cluster) has a network policy that governs outbound connections to cluster B (a database cluster).

apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: default.app-to-db
  namespace: myapp
spec:
  # The main policy selector selects endpoints from the local cluster only.
  selector: app == 'backend-app'
  tier: default
  egress:
  - destination:
      # Rule selectors can select endpoints from local AND remote clusters.
      selector: app == 'postgres'
      ports: [5432]
    protocol: TCP
    action: Allow
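
Apply the policy in cluster A as you would any other policy; for example, with the manifest saved as app-to-db.yaml (a placeholder file name):

kubectl apply -f app-to-db.yaml

No change is needed in cluster B for its pods to match the rule selector; cluster A matches them using the pod data it synchronizes from cluster B.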

Troubleshoot

Troubleshoot RemoteClusterConfiguration and federated endpoint identity

Verify configuration

For each impacted remote cluster pair (between cluster A and cluster B):

  1. Retrieve the kubeconfig from the secret stored in cluster A. Manually verify that it can be used to connect to cluster B.

    kubectl get secret -n <secret-namespace> remote-cluster-secret-name -o=jsonpath="{.data.kubeconfig}" | base64 -d > verify_kubeconfig_b
    kubectl --kubeconfig=verify_kubeconfig_b get nodes

    This validates that the credentials used by Typha to connect to cluster B's API server are stored in the correct location and provide sufficient access.

    The command above should yield a result like the following:

     NAME                STATUS   ROLES    AGE   VERSION
     clusterB-master     Ready    master   7d    v1.27.0
     clusterB-worker-1   Ready    worker   7d    v1.27.0
     clusterB-worker-2   Ready    worker   7d    v1.27.0

    If you do not see the nodes of cluster B listed in response to the command above, verify that you created the kubeconfig for cluster B correctly, and that you stored it in cluster A correctly.

    If you do see the nodes of cluster B listed in response to the command above, you can run this test (or a similar test) on a node in cluster A to verify that cluster A nodes can connect to the API server of cluster B.

  2. Validate that the Typha service account in Cluster A is authorized to retrieve the kubeconfig secret for cluster B.

     kubectl auth can-i list secrets --namespace <secret-namespace> --as=system:serviceaccount:calico-system:calico-typha

    This command should yield the following output:

    yes

    If the command does not return this output, verify that you correctly configured RBAC in cluster A.

  3. Repeat the above steps, switching cluster A and cluster B.

Check logs

Validate that querying the Typha logs yields the expected result outlined in the validation section.

If the Typha logs do not yield the expected result, review the warning or error-related logs in typha or calico-node for insights.
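
A sketch of commands for scanning those logs, using the component names that appear elsewhere on this page:

kubectl logs deployment/calico-typha -n calico-system | grep -iE 'warn|error'
kubectl logs daemonset/calico-node -n calico-system -c calico-node | grep -iE 'warn|error'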

calicoq

calicoq can be used to emulate the connection that Typha will make to remote clusters. Use the following command:

calicoq eval "all()"

If all remote clusters are accessible, calicoq returns something like the following. Note the remote cluster prefixes: there should be endpoints prefixed with the name of each RemoteClusterConfiguration in the local cluster.

Endpoints matching selector all():
Workload endpoint remote-cluster-1/host-1/k8s/kube-system.kube-dns-5fbcb4d67b-h6686/eth0
Workload endpoint remote-cluster-1/host-2/k8s/kube-system.cnx-manager-66c4dbc5b7-6d9xv/eth0
Workload endpoint host-a/k8s/kube-system.kube-dns-5fbcb4d67b-7wbhv/eth0
Workload endpoint host-b/k8s/kube-system.cnx-manager-66c4dbc5b7-6ghsm/eth0

If this command fails, the error messages returned by the command may provide insight into where issues are occurring.

Troubleshoot multi-cluster networking

Basic validation
  • Ensure that RemoteClusterConfiguration and federated endpoint identity are functioning correctly
  • Verify that you have met the prerequisites for multi-cluster networking
  • If you had previously set up RemoteClusterConfigurations without multi-cluster networking, and are upgrading to use the feature, review the switching considerations
  • Verify that traffic between clusters is not being denied by network policy

Check overlayRoutingMode

Ensure that overlayRoutingMode is set to "Enabled" on all RemoteClusterConfigurations.
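
One way to check all RemoteClusterConfigurations at once, assuming the projectcalico.org API is served through kubectl:

kubectl get remoteclusterconfigurations -o custom-columns='NAME:.metadata.name,OVERLAY:.spec.syncOptions.overlayRoutingMode'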

If overlay routing is successfully enabled, you can view the logs of a Typha instance using:

kubectl logs deployment/calico-typha -n calico-system

You should see an output for each connected remote cluster that looks like this:

18:49:35.394 [INFO][14] wrappedcallbacks.go 443: Creating syncer for RemoteClusterConfiguration(my-cluster)
18:49:35.394 [INFO][14] watchercache.go 186: Full resync is required ListRoot="/calico/ipam/v2/assignment/"
18:49:35.395 [INFO][14] watchercache.go 186: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/workloadendpoints"
18:49:35.396 [INFO][14] watchercache.go 186: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/hostendpoints"
18:49:35.396 [INFO][14] watchercache.go 186: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/profiles"
18:49:35.396 [INFO][14] watchercache.go 186: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
18:49:35.397 [INFO][14] watchercache.go 186: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/ippools"

If you do not see each of the resource types above, overlay routing was not successfully enabled in your cluster. Verify that you followed the setup correctly for overlay routing, and that the cluster is using a version of Calico Enterprise that supports multi-cluster networking.

Check logs

Warning or error logs in typha or calico-node may provide insight into where issues are occurring.

Next steps

Configure federated services