We put Ceph on Nitro. It’s faster, but how much faster?

tl;dr: We put Ceph on ⚡AWS Nitro⚡ (on Kubernetes) to see how fast it would go. The untuned setup runs ~44% faster, in fact. Skip to the results to find out more.
Ceph is at the cutting edge of open source distributed storage, and AWS has been at the cutting edge of data center operations for a long time.
What’s Ceph?
Ceph is a fully F/OSS, robust, highly available, and highly durable distributed storage system.
If you’re running it on Kubernetes, you’re going to want to use Rook (and keep your operations load light with Koor).
What’s Nitro?
AWS Nitro is a “quiescent” solution for hypervisor-controlled multi-tenancy on AWS’s custom hardware.
Put simply, Nitro makes it possible for AWS-provided virtual machines to deliver near-bare-metal performance, and on bare metal machines Nitro gets out of the way as much as possible to unlock even more performance.
AWS has great talks on Nitro going back to 2017 (the system has been in development since at least 2013):
AWS re:Invent 2017: C5 Instances and the Evolution of Amazon EC2 Virtualization (CMP332)
AWS re:Invent 2018: Powering Next-Gen EC2 Instances: Deep Dive into the Nitro System (CMP303-R1)
AWS re:Invent 2019: Powering next-gen Amazon EC2: Deep dive into the Nitro system (CMP303-R2)
The question we want to answer
Well, how do we know how much faster Nitro is? We’ll have to get a realistic storage setup running on AWS and test a Nitro setup versus one not running on AWS’s next-gen hypervisor architecture.
First, the results
We’ve posed the burning question, and we won’t keep you waiting for the answers.
Graph
Here’s the TPS we observed across the runs:
[Figure: TPS results from runs]
Tables
In tabular form:
| Run # | Stock TPS | Nitro TPS |
|---|---|---|
| 1 | 369.90 | 545.16 |
| 2 | 383.34 | 535.89 |
How much did Nitro add?
As this setup has not been tuned (we’re not using provisioned IOPS, Rook is not tuned and neither is Postgres), the base level of transactions per second is quite low.
That said, the important bit here is that “simply” switching the instance type (m4.xlarge to the Nitro-enabled m5.xlarge) yielded a ~44% increase in performance (as measured by TPS)!
With Rook managing storage and Nitro boosting performance, we’ve got an easy-to-use, production-grade storage system deployed and running a database workload (albeit at a modest TPS, since nothing is tuned yet).
The setup
Interested in digging into how we set up the clusters and got those numbers? Continue reading. Alternatively, you can dive into the code and run it yourself.
Get the code
We’ve built a fully infrastructure-as-code repository that makes it easy to replicate our results!
Try the experiment yourself at opencoreventures/experiments-ceph-on-nitro
Hardware provisioning with Pulumi
Pulumi is a powerful solution for provisioning cloud resources as code. You can use its custom resources to create abstractions (ex. an ObjectStorage resource that works across AWS, GCP, and Azure).
Here’s the gist of our code to provision SSH keys and an instance:
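A minimal sketch with @pulumi/aws is shown below; the key path, AMI filter, and instance size are illustrative assumptions rather than the repo’s exact values:

```typescript
import * as fs from "fs";
import * as aws from "@pulumi/aws";

// Read the public key from the local filesystem (path comes from an ENV variable).
const publicKey = fs.readFileSync(process.env.SSH_PUBLIC_KEY_PATH!, "utf-8");

// Register the key pair with AWS so instances can be reached over SSH.
const keyPair = new aws.ec2.KeyPair("cluster-key", { publicKey });

// Look up a recent Ubuntu AMI published by Canonical.
const ubuntu = aws.ec2.getAmiOutput({
    mostRecent: true,
    owners: ["099720109477"], // Canonical
    filters: [
        { name: "name", values: ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"] },
    ],
});

// Provision one instance; switching m5.xlarge <-> m4.xlarge toggles Nitro.
const worker = new aws.ec2.Instance("worker-0", {
    instanceType: "m5.xlarge",
    ami: ubuntu.id,
    keyName: keyPair.keyName,
    rootBlockDevice: { volumeSize: 100 },
});

export const workerPublicIp = worker.publicIp;
```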
This code does a bunch of things that are hard to do with other solutions:
- Makes use of ENV variables seamlessly
- Exposes the full power of the NodeJS ecosystem
- Reads in files from the local filesystem easily
An alternative to Pulumi is Crossplane, which manages all your compute, storage, and other resources from inside your existing Kubernetes cluster.
Kubernetes for workload orchestration, provided by k0s
Since we’re using Ceph, we’re building a storage cluster, and managing a cluster of machines is pretty easy these days with Kubernetes.
Running Kubernetes is made even easier by the k0s project (created by the folks over at Mirantis), so we’ll be using that to build our cluster (over an alternative like kubeadm).
Here’s how easy it is to start a Kubernetes cluster with k0s:
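A k0sctl configuration template along these lines; the cluster name, hosts, and SSH details are placeholders to be interpolated, not our exact values:

```yaml
apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: ceph-on-nitro
spec:
  hosts:
    # Placeholders like ${CONTROLLER_IP} get interpolated before k0sctl runs.
    - role: controller
      ssh:
        address: ${CONTROLLER_IP}
        user: ubuntu
        keyPath: ~/.ssh/id_rsa
    - role: worker
      ssh:
        address: ${WORKER_IP}
        user: ubuntu
        keyPath: ~/.ssh/id_rsa
```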
Of course, this is a template, which hasn’t been interpolated yet, but it’s that easy!
To make this run we need to interpolate the template first, so the Makefile target looks a little like this:
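A sketch of such a target; using envsubst for interpolation and these file names are assumptions, not necessarily what the repo does:

```make
# Render the k0sctl template with the current environment, then apply it.
k0s-cluster:
	envsubst < k0sctl.yaml.tmpl > k0sctl.yaml
	k0sctl apply --config k0sctl.yaml
```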
Rook for Ceph cluster management
Rook makes running Ceph clusters a breeze on Kubernetes.
Installing Rook on Kubernetes is as easy as pie:
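One common path follows the Rook quickstart; the manifest names below come from Rook’s example directory, and the exact paths depend on the Rook release you use:

```bash
# Install the Rook operator and its CRDs from the example manifests.
kubectl create -f crds.yaml -f common.yaml -f operator.yaml

# Then declare the Ceph cluster itself; the Rook operator brings up the OSDs.
kubectl create -f cluster.yaml
```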
Testing Methodology
The methodology is pretty simple – as stated in the README, we’re going to:
- Provision compute resources on AWS
- Set up a k8s cluster on those machines
- Install Rook
- Run some workload simulations (ex. pgbench; a sample invocation is sketched below)
Then, we’re going to do the same thing again, but the second time will be ⚡supercharged by AWS Nitro⚡.
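For reference, a pgbench run looks roughly like this; the scale factor, client count, and duration are illustrative, not the exact parameters we used:

```bash
# Initialize the pgbench tables in the target database.
pgbench -i -s 50 -h $POSTGRES_HOST -U postgres bench

# Run the benchmark: 10 clients, 2 worker threads, 60 seconds; reports TPS.
pgbench -c 10 -j 2 -T 60 -h $POSTGRES_HOST -U postgres bench
```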
Huge thanks to Alexander, Founding Engineer @ Koor, for flexing his expertise and helping resolve Rook cluster setup issues!
Wrapup
Well, clearly AWS’s Nitro system is very impressive at increasing the throughput of I/O-bound workloads! It’s not as dramatic as a change from HDD to SSD, or SSD to NVMe, but it’s certainly a huge step up, with not much more than an instance type change (and a few code changes).