Multus is the way to go for Rook Ceph networking

Have you ever wanted to add more than one network interface to your Pods? Need a safer way to connect a legacy application to multiple network VLANs in Kubernetes? Let’s see how we can achieve this for a Rook Ceph cluster.

Normally in Kubernetes, a Pod has only a single interface to communicate on the cluster network. This is, and should be, enough for most applications, but for production Rook Ceph deployments it isn’t good enough. Ceph recommends using two separate networks: a public network, on which clients talk to all the Ceph cluster components, and a cluster network. The cluster network, as the name implies, carries Ceph-internal traffic, specifically the OSD data replication traffic.

A long time ago, when kube-proxy was still using iptables (the default at the time) to route traffic to service IPs, the hostNetwork option was a common way to increase network “throughput”. hostNetwork: true does that by exposing the node’s network stack to the Pod’s containers. At first glance this might sound great, but it comes with some drawbacks in the security department. I can still remember jokingly running shutdown in a hostNetwork Pod and, let’s just say, I was thankfully able to power the server on again through the good old remote management interface (IPMI). So with hostNetwork mode, you can weaken the isolation of containers by a lot.
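For reference, this is roughly what opting into the node's network stack looks like on a plain Pod. It is a minimal sketch: the Pod name, container name, and image are hypothetical placeholders, not something taken from a Rook deployment.

```yaml
# Illustrative only: a Pod that shares the node's network namespace.
apiVersion: v1
kind: Pod
metadata:
  name: hostnetwork-example      # hypothetical name
spec:
  hostNetwork: true              # containers see the node's interfaces and ports
  containers:
    - name: example              # hypothetical container
      image: busybox             # hypothetical image
      command: ["sleep", "3600"]
```

Every interface of the node is visible inside such a Pod, which is exactly the isolation trade-off described above.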

You will end up with more traffic on the cluster network than on the public network for a simple reason: a client needs to “write” data only once to an OSD, but that OSD needs to talk to a number of other OSDs to fulfill the replication requirement of the storage pool.
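As a rough back-of-the-envelope estimate, assume a replicated pool that keeps r copies of the data and a client that writes B bytes (both symbols are introduced here for illustration only). Ignoring protocol overhead and recovery traffic, and leaving erasure-coded pools aside, the traffic per write splits up approximately as:

$$
T_{\text{public}} = B, \qquad T_{\text{cluster}} = (r - 1)\,B
$$

For example, with r = 3 and B = 1 GiB, the client sends 1 GiB over the public network, while the receiving OSD forwards about (3 - 1) × 1 GiB = 2 GiB over the cluster network.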

A simplified diagram of this flow of data from a client:

---
title: Ceph Data Replication Flow
---
sequenceDiagram
    Client->>OSD1: Write data to volume
    OSD1->>OSD2: Store one replica of this data
    OSD1->>OSD3: Store one replica of this data
    OSD1-->>OSD1: Write data to disk
    OSD2-->>OSD2: Write data to disk
    OSD2->>OSD1: Data has been saved
    OSD3-->>OSD3: Write data to disk
    OSD3->>OSD1: Data has been saved
    OSD1->>Client: Write Confirmation

Sidenote: This is one reason why Ceph can appear slower in direct comparisons with other storage projects: a “data write” operation is only confirmed after the data has been fully replicated.


So where does Multus come into play here?

Multus allows you to attach one or more (specific) network interfaces to your Pods, which streamlines the whole ordeal of configuring a Pod’s network interfaces. There are still going to be some security implications when you, e.g., attach a node’s network interface to a Pod, but at least it is made transparent through Multus’ Custom Resource Definitions.

A security team could simply restrict access to these “network definitions” using RBAC in Kubernetes. This, in combination with a policy agent (e.g., Open Policy Agent (OPA)), can help enforce certain “network access policies”. For monitoring and auditing, you can likewise keep an eye on which network definitions are used by which Pods.
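As a sketch of what such an RBAC restriction could look like, the Role below grants read-only access to the Multus network definitions in a single namespace. The Role name and the rook-ceph namespace are assumptions made for this example; the API group and resource name are the ones Multus registers for NetworkAttachmentDefinitions.

```yaml
# Hypothetical Role: read-only access to NetworkAttachmentDefinitions.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: network-definition-viewer   # hypothetical name
  namespace: rook-ceph              # assumed namespace
rules:
  - apiGroups: ["k8s.cni.cncf.io"]
    resources: ["network-attachment-definitions"]
    verbs: ["get", "list", "watch"] # no create, update, or delete
```

A subject bound to only this Role through a RoleBinding can inspect the network definitions but cannot create or change them.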

Let’s assume we have a Kubernetes node with two physically connected network interfaces. Let’s stick to the “good old” interface naming scheme to keep it simple: eth0 and eth1. 😉

eth0 is the “default” interface of the node. We will be using eth0 for the Ceph public network (client traffic) and eth1 for Ceph’s OSD replication traffic. For simplicity, we’ll assume both networks have a DHCP server running.

To get started, we need to create two NetworkAttachmentDefinitions:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ceph-public-net
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "host-device",
      "device": "eth0",
      "ipam": {
        "type": "dhcp"
      }
    }'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ceph-cluster-net
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "host-device",
      "device": "eth1",
      "ipam": {
        "type": "dhcp"
      }
    }'

A short explanation of what the .spec.config means here:

  • "type": "host-device" uses the host-device CNI plugin to “Move an already-existing device into a container.”
  • In the ipam section, "type": "dhcp" selects the dhcp IPAM plugin, which tells the CNI to get an IP address from a DHCP server.

You can run kubectl get network-attachment-definitions to confirm that both NetworkAttachmentDefinitions have been created.

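Warning: For existing clusters, you currently can’t easily switch from, e.g., the “container network” to hostNetwork mode or Multus.

With both network definitions in place, reference them in the network section of the CephCluster object: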

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  [...]
  network:
    provider: multus
    selectors:
      public: ceph-public-net
      cluster: ceph-cluster-net
  [...]

(Documentation: Ceph Cluster CRD - Multus Configuration - Rook Ceph v1.11)

This tells the Rook Ceph operator to “attach” the Multus network annotations to the Ceph components. There is no need to add anything else to the CephCluster object.
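To give an idea of what a Multus network annotation looks like on a Pod, here is a minimal sketch. The Pod name, container name, and image are hypothetical, the network definitions are assumed to live in the same namespace as the Pod, and the exact annotation value the Rook operator writes may differ from this hand-written one.

```yaml
# Illustrative only: a Pod requesting both networks through the Multus annotation.
apiVersion: v1
kind: Pod
metadata:
  name: multus-annotation-example   # hypothetical name
  annotations:
    # Comma-separated list of NetworkAttachmentDefinition names
    k8s.v1.cni.cncf.io/networks: ceph-public-net, ceph-cluster-net
spec:
  containers:
    - name: example                 # hypothetical container
      image: busybox                # hypothetical image
      command: ["sleep", "3600"]
```

With Rook you do not write this by hand for the Ceph Pods; the snippet only illustrates the mechanism the operator relies on. Keep in mind that the host-device plugin moves the named interface into the Pod, so this is not something to try on a node that still needs that interface.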

To summarize, Multus gives us a more explicit and easier way to have a Rook Ceph cluster use two different networks for performance reasons.


Looking back at the time I implemented hostNetwork mode in Rook, it is still the simplest way to “skip the container network”, either to gain more performance (depending on the CNI encapsulation, etc., used by your Kubernetes cluster network) or to expose a service to other clusters or servers that “can’t just be ‘loadbalanced’ or proxied”.

We are looking into improving the existing documentation and examples to make it easier for people to use Multus, instead of hostNetwork mode, with their Rook Ceph clusters.

If you want a more in-depth look at what Multus can do, be sure to check out this great post by devopstales: Use Multus CNI in Kubernetes - devopstales. To see what other plugins and config options the CNI project offers, check out CNI Documentation - Plugins Overview.

Thanks for reading!

Alexander Trost May 10, 2023