Kubernetes deserves more than ephemeral data, persist it with Rook

Alexander Trost, August 23, 2022

tl;dr - With Rook and Ceph, data can finally be safely migrated to Kubernetes. The combination allows for stateful workloads, dynamic storage provisioning, object storage, and much more.

Stateless workloads on Kubernetes are easy to run, but the most important applications in your stack are probably stateful. At the end of the day, persisting data is crucial to doing useful work.

Kubernetes lets us pool compute from a given set of machines, but pooling storage safely, durably, and with high availability is a bit more difficult.

In the beginning: Manually provisioning Persistent Volumes for Kubernetes

Early on, Kubernetes administrators had to statically allocate the drives and PersistentVolumes they intended to use, an approach that is hard to administer at scale.
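
As a reminder of what that looked like, here is a minimal sketch of a hand-written PersistentVolume backed by an NFS export (the server address and path are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-pv-01
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    # Placeholder NFS export; an administrator had to create and track each of these by hand
    server: 10.0.0.5
    path: /exports/data-pv-01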

While cluster worker node configuration is often static, machines can and do leave clusters all the time – scaling, real outages, and chaos engineering can all lead to fluctuation in worker nodes.

Kubernetes needed a dynamic, robust, and highly available solution for dealing with storage attached to Deployments, StatefulSets, and other workloads.

Dynamic provisioning for Kubernetes

At first, dynamic provisioning for Kubernetes was easiest to get from the hyperscalers: storage services built into their platforms could be requested and consumed on demand.

Alongside the hyperscalers’ storage, proprietary solutions such as Portworx and F/OSS projects like Cinder and GlusterFS became available, with their volume plugins shipped “in-tree” (inside the Kubernetes source code).
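
As a rough sketch of what dynamic provisioning looks like in practice, a StorageClass names a provisioner (here the old in-tree AWS EBS plugin) and a PersistentVolumeClaim simply references it; the class and claim names below are illustrative:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ebs
# In-tree provisioner used here for illustration; CSI drivers replace these today
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ebs
  resources:
    requests:
      storage: 20Gi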

It was only a matter of time until more highly available and robust storage systems got ported to run on Kubernetes and dynamically shepherd bits and bytes:

Ceph is an industry-leading, F/OSS solution for storage – trusted by large organizations that handle enormous data volumes, like CERN.

But before we dive into why Ceph is probably the storage solution you should be using, how do workloads even talk to storage on Kubernetes?

Ways workloads and storage communicate on Kubernetes

The Kubernetes ecosystem moves fast, and Kubernetes storage has been no exception over the last few years.

From the days of statically pre-allocating PersistentVolumes, storage in Kubernetes has seen roughly three paradigm shifts: in-tree volume plugins, the out-of-tree FlexVolume plugin system, and the Container Storage Interface (CSI).

The Kubernetes ecosystem has sought to build pluggable storage for workloads that is:

  • Dynamic (can be created on-demand)
  • Easy to extend (such that new storage providers can be easily added)
  • Standardized and performant in the way it interacts with workloads

With FlexVolume now deprecated, the Container Storage Interface (CSI) is what the ecosystem has converged on.

CSI is a standard (the spec is available online) for exposing block and file storage systems to containerized workloads. Built on gRPC and the same controllers and reconciliation loops that have made Kubernetes a success, it is a stable and trusted way to expose storage systems to Kubernetes.
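
To make this concrete, here is a trimmed-down sketch of a StorageClass backed by Rook's Ceph RBD CSI driver, based on the Rook example manifests (the CSI secret parameters those examples include are omitted for brevity, and the pool and cluster names depend on your setup):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
# The Ceph RBD CSI driver deployed by the Rook operator
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph        # namespace of the Rook/Ceph cluster
  pool: replicapool           # CephBlockPool to carve volumes out of
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true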

CSI is the right solution for getting storage to talk to containers, but we still need something to manage the storage itself!

Why Ceph?

While other storage providers have many options and features, Ceph is the clear choice for distributed stateful workloads on Kubernetes.

Ceph is Featureful

Ceph is a distributed storage system that provides:

  • Block storage (RBD)
  • A POSIX-compliant distributed file system (CephFS)
  • S3- and Swift-compatible object storage (RGW)

Ceph is Robust

Ceph is robust: it is used by high-performing organizations like CERN to handle massive amounts of data.

Ceph at CERN: A Year in the Life of a Petabyte-Scale Block Storage Service

Ceph is Progressive

In addition to being very robust and capable of handling large data volumes, Ceph is progressive.

Just like compute, the storage landscape has seen massive change in recent decades:

  • Popularization of SATA over IDE (PATA)
  • Emergence of the Solid State Drive (SSD)
  • Invention and spread of NVMe drives
  • Widespread use of checksumming, often involving previously research-grade tools like Merkle trees

Ceph has kept pace with the broader storage landscape, adapting where necessary to take advantage of new technology.

Ceph offers advanced filesystem features

Ceph originally shipped with a storage engine named FileStore, which stored objects as files on top of an existing local filesystem (typically XFS). These days, Ceph uses a storage engine called BlueStore.

BlueStore has a dizzying array of features, some of which previously required more advanced filesystems (like ZFS):

  • Direct management of storage devices. BlueStore consumes raw block devices or partitions. This avoids intervening layers of abstraction (such as local file systems like XFS) that can limit performance or add complexity. (See the Rook sketch after this list for how raw devices are handed to Ceph.)
  • Metadata management with RocksDB. RocksDB’s key/value database is embedded in order to manage internal metadata, including the mapping of object names to block locations on disk.
  • Full data and metadata checksumming. By default, all data and metadata written to BlueStore is protected by one or more checksums. No data or metadata is read from disk or returned to the user without being verified.
  • Inline compression. Data can be optionally compressed before being written to disk.
  • Multi-device metadata tiering. BlueStore allows its internal journal (write-ahead log) to be written to a separate, high-speed device (like an SSD, NVMe, or NVDIMM) for increased performance. If a significant amount of faster storage is available, internal metadata can be stored on the faster device.
  • Efficient copy-on-write. RBD and CephFS snapshots rely on a copy-on-write clone mechanism that is implemented efficiently in BlueStore. This results in efficient I/O both for regular snapshots and for erasure-coded pools (which rely on cloning to implement efficient two-phase commits).
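
With Rook, handing raw devices to Ceph (and therefore to BlueStore) is a matter of declaring them in the CephCluster resource. A minimal sketch of the storage section, with an illustrative device filter:

# Excerpt of a CephCluster spec (ceph.rook.io/v1)
storage:
  useAllNodes: true
  useAllDevices: false
  # Illustrative regex: only hand NVMe devices to Ceph; BlueStore manages them directly
  deviceFilter: "^nvme"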

Ceph is well documented

How Ceph works in detail is beyond the scope of this post, but it is worth pointing out an interesting design decision you might arrive at yourself if you thought about storage systems from first principles:

Ceph architecture diagram - https://docs.ceph.com/en/latest/architecture

Ceph delivers object storage, a distributed file system, and block storage by layering them all on top of a single distributed object store (RADOS).

Distributed object stores are often the easiest to scale and manage (given intelligent clients and metadata), so Ceph makes the right choice there as well.

Rook is well integrated with Kubernetes

Rook goes above and beyond in its integration with Kubernetes, providing a full suite of CustomResourceDefinitions that make it easy and consistent to use in a Kubernetes cluster or across clusters.
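
For example, declaring a replicated block pool takes a single small resource (the pool name and replica count below are taken from the Rook example manifests and can be adjusted):

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host   # spread replicas across hosts
  replicated:
    size: 3             # keep three copies of each object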

Want to provision an object storage bucket? The relevant custom resource looks like the following:

apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: ceph-bucket
  namespace: rook-ceph
spec:
  # Either set an explicit bucketName or let Rook generate one from the prefix below
  generateBucketName: photo-booth
  storageClassName: rook-ceph-bucket
  additionalConfig:
    maxObjects: "1000"
    maxSize: "2G"
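
Block storage works the same way: a workload claims a volume from a Ceph-backed StorageClass with an ordinary PersistentVolumeClaim. A minimal sketch, assuming the rook-ceph-block StorageClass shown earlier (the claim name is illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rook-ceph-block
  resources:
    requests:
      storage: 10Gi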

Ceph and Kubernetes, held together with Rook

With Ceph’s robust storage and Kubernetes to manage, distribute, and orchestrate workloads, we’ve got a winning combination.

By using Ceph and Kubernetes, we can achieve a few different goals:

  • Networked, distributed, and fault-tolerant storage for our workloads
  • Freedom from vendor lock-in, while still being able to use vendor-specific storage features when needed (storage methods can be freely mixed)
  • Well-known, well-documented tooling with enterprise support available

Ceph is also flexible enough to be used even in hybrid cloud deployments.

How Koor can help

Until now, operators ran Ceph outside their Kubernetes clusters with separate control and data planes. Expensive support contracts for Ceph and the cutting-edge nature of Kubernetes made this a thorny problem.

At Koor, we help you provide robust, stable, scalable self-service Kubernetes storage to your team, offering support and advanced features in the form of the Koor Storage Distribution.

Koor is commercial support for the Koor Storage Distribution, based on Rook, and we’re excited to start on our mission to bring robust, scalable, and featureful storage to the world.

Ensure your company’s adoption of dynamic, on-demand storage for Kubernetes is successful with Koor.