Rook

Data and Persistence

Don’t we all love the comfort of the cloud? Take simple backups and sharing of pictures as an example. (Let’s ignore, for now, the privacy concerns of handing that to a company instead of, e.g., self-hosting; that would be a whole other topic.) I love being able to take pictures of my cats, the landscape and my food and share them. Sharing a picture with the world or just your family is only a few clicks away. The best part: even my mother can do it.

Imagine the following situation: your phone has been stolen, and all your pictures in the cloud have been deleted due to a software bug.

Personally, I would probably have a heart attack just thinking about that, as I am someone who likes to look at old pictures from time to time to remember past events and the friends I made along the way.

You may ask yourself what this has to do with “Data and Persistence”. The answer is simple: pictures are data, and their persistence is, well, in this case gone, because your data has been deleted.

Persistence of data has a different level of importance to each of us. A student in America may hope that the records of their student debt turn out not to be so persistent after all, while someone running a job agency relies on keeping their clients’ data not only available and intact but also secure.

Data Principles

TODO Is this needed? GDPR principles or the BSI principles I had in my apprenticeship time.

Storage: What is the right one?

Block storage

Block storage gives you block devices which you can format as you need, just like a “normal” disk attached to your system. It is used by applications such as MySQL, PostgreSQL and more, which need the “raw” performance of block devices and the caching that comes with it.
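
In Kubernetes terms, requesting such a device usually means creating a PersistentVolumeClaim. The sketch below builds one as a Python dict and prints it as JSON (which `kubectl apply -f` accepts). It assumes a StorageClass backed by block storage; the name `rook-ceph-block` and the claim name are hypothetical. `volumeMode: Block` is shown because it mirrors “format as you need”: the pod receives an unformatted device. Most databases are just as happy with the default `Filesystem` mode on top of a block-backed volume.

```python
import json


def block_pvc(name: str, size: str, storage_class: str) -> dict:
    """Build a PersistentVolumeClaim manifest that asks for a raw block device."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # A block volume is typically attached to one node at a time.
            "accessModes": ["ReadWriteOnce"],
            # "Block" hands the pod an unformatted device; the default,
            # "Filesystem", would format and mount it for you instead.
            "volumeMode": "Block",
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # "rook-ceph-block" is a hypothetical StorageClass name.
    manifest = block_pvc("postgres-data", "20Gi", "rook-ceph-block")
    print(json.dumps(manifest, indent=2))  # JSON is accepted by `kubectl apply -f`
```

A pod consumes a `Block` claim through `volumeDevices` (with a `devicePath`) rather than `volumeMounts`.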

Filesystem storage

Filesystem storage is basically a “normal” filesystem which can be consumed directly. It is a good way to share data between multiple applications in a read-write-many manner, and is commonly used to share AI models or scientific data between multiple running jobs/applications.
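
A minimal sketch of that sharing pattern in Kubernetes: one claim with the `ReadWriteMany` access mode, and several pods mounting the same claim. All names (`rook-cephfs`, `ai-models`, `trainer-N`, the `busybox` image and the `/models` path) are hypothetical placeholders, and the storage class must actually support `ReadWriteMany`.

```python
import json


def shared_pvc(name: str, size: str, storage_class: str) -> dict:
    """PersistentVolumeClaim for a shared filesystem that many pods can mount read-write."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],  # many nodes may mount it read-write
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def job_pod(pod_name: str, claim_name: str, image: str) -> dict:
    """Pod that mounts the shared claim at /models."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [{
                "name": "worker",
                "image": image,
                "command": ["sleep", "3600"],
                "volumeMounts": [{"name": "models", "mountPath": "/models"}],
            }],
            "volumes": [{"name": "models", "persistentVolumeClaim": {"claimName": claim_name}}],
        },
    }


if __name__ == "__main__":
    # "rook-cephfs" and the pod/claim names are hypothetical.
    docs = [shared_pvc("ai-models", "50Gi", "rook-cephfs")]
    docs += [job_pod(f"trainer-{i}", "ai-models", "busybox") for i in range(2)]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": docs}, indent=2))
```

Both pods reference the same `claimName`, so they read and write the same files.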

Technical side note: if you have very old/legacy applications which are not 64-bit compatible, you might run into problems with the stat syscall when the filesystem uses 64-bit inode numbers.
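
The failure mode is that a 32-bit program built without large-file support calls stat() and gets `EOVERFLOW` when an inode number does not fit its 32-bit `ino_t`. The sketch below scans a directory tree and reports entries whose inode number needs more than 32 bits. The 2^32 - 1 limit is an assumption about the legacy `struct stat` layout; Python’s own `os.lstat` handles 64-bit inodes fine, which is why it can be used as the checker.

```python
import os
from typing import Iterator, Tuple

# Largest inode number a legacy 32-bit `struct stat` (32-bit ino_t) can hold.
MAX_32BIT_INODE = 2**32 - 1


def inode_fits_32bit(ino: int) -> bool:
    return 0 <= ino <= MAX_32BIT_INODE


def find_large_inodes(root: str) -> Iterator[Tuple[str, int]]:
    """Yield (path, inode) for entries under root whose inode needs more than 32 bits."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                ino = os.lstat(path).st_ino  # lstat: don't follow symlinks
            except OSError:
                continue  # vanished or unreadable entry; skip it
            if not inode_fits_32bit(ino):
                yield path, ino


if __name__ == "__main__":
    import sys
    for path, ino in find_large_inodes(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{ino}\t{path}")
```

Running it against a mount of the shared filesystem before migrating a legacy application gives a quick hint whether that application may trip over `EOVERFLOW`.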

Object storage

Object storage is, depending on how you see it, a very cloud-native approach to storing data. Instead of storing data on a block device and/or filesystem, you use an HTTP API. The best-known example in the object storage field is Amazon Web Services S3. There are also open source projects implementing (parts of) the S3 API to act as drop-in replacements for AWS S3. Besides S3, there are other object store APIs/protocols, such as OpenStack Swift, Ceph RADOS and more.
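
To show what “you use an HTTP API” means in practice, the sketch below builds (but does not send) path-style S3 requests with Python’s standard library: an object is addressed as `<endpoint>/<bucket>/<key>` and written with `PUT`, read with `GET`. The endpoint, bucket and key are hypothetical, and the requests are deliberately unsigned: real S3 and S3-compatible servers expect AWS Signature Version 4 authentication, which in practice an SDK such as boto3 adds for you.

```python
import urllib.parse
import urllib.request


def s3_object_url(endpoint: str, bucket: str, key: str) -> str:
    """Path-style S3 URL: <endpoint>/<bucket>/<key> (key is percent-encoded, '/' kept)."""
    return f"{endpoint.rstrip('/')}/{bucket}/{urllib.parse.quote(key)}"


def put_object_request(endpoint: str, bucket: str, key: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) an *unsigned* S3 PutObject request."""
    return urllib.request.Request(
        s3_object_url(endpoint, bucket, key),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def get_object_request(endpoint: str, bucket: str, key: str) -> urllib.request.Request:
    """Build (but do not send) an *unsigned* S3 GetObject request."""
    return urllib.request.Request(s3_object_url(endpoint, bucket, key), method="GET")


if __name__ == "__main__":
    # Hypothetical endpoint of a local S3-compatible server.
    req = put_object_request("http://localhost:9000", "pictures", "cats/mia.jpg", b"<jpeg bytes>")
    print(req.get_method(), req.full_url)
```

There is no device to format and no mount point: the bucket and key in the URL are the whole “address” of the data.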

In the end it boils down to the needs of your applications, but I would definitely keep in mind what the different storage types can offer. Once you have narrowed down which storage type fits, look into the storage software market to see which “additional” features each product can give you for your storage needs.

Storage in a Cloud-Native world

In a Cloud-Native world where everything is dynamic, distributed and must be resilient, it is more important than ever that the storage underneath your customer data has the same properties: it must be highly available (all the time), resilient to the failure of a server and/or application, and able to scale to the needs of your application(s).
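
Resilience to server failure usually comes from keeping several copies of the data, and that has a price in capacity. The back-of-the-envelope sketch below assumes equal-sized servers and full replication with every copy on a different server; it ignores free-space headroom for rebalancing, and the class name is hypothetical. Many systems also require a minimum number of copies before accepting writes (Ceph calls this `min_size`), so availability can end before the last copy is gone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicatedStorage:
    servers: int                   # storage servers of equal size
    capacity_tb_per_server: float
    replicas: int                  # copies of each piece of data, each on a different server

    def __post_init__(self) -> None:
        if not 1 <= self.replicas <= self.servers:
            raise ValueError("need 1 <= replicas <= servers to keep copies on distinct servers")

    @property
    def raw_tb(self) -> float:
        return self.servers * self.capacity_tb_per_server

    @property
    def usable_tb(self) -> float:
        # Every byte is stored `replicas` times.
        return self.raw_tb / self.replicas

    @property
    def server_failures_without_data_loss(self) -> int:
        # With each copy on a different server, losing replicas - 1 servers still leaves one copy.
        return self.replicas - 1


if __name__ == "__main__":
    s = ReplicatedStorage(servers=5, capacity_tb_per_server=4.0, replicas=3)
    print(f"raw {s.raw_tb} TB, usable {s.usable_tb:.2f} TB, "
          f"survives {s.server_failures_without_data_loss} server failures")
```

Five 4 TB servers with three replicas give 20 TB raw but only about 6.67 TB usable, in exchange for surviving two server failures without losing data.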

This might seem like an easy task if you are in the cloud, but even the cloud has limits at a certain point. If you have special needs for anything in the cloud you are using, talking to your cloud provider will definitely help resolve problems. Talking to your cloud provider(s) is important both before and while you are using them. For example, if you experience problems with the platform itself or scaling issues with, let’s say, their block storage, you can give them direct feedback and possibly work with them on a fix for the issue or on another product that can scale to your current and future needs.

Depending on the solution you are running/using, storage can be especially problematic when it comes to scaling. Assume your application itself can scale without issues, but the storage runs into performance issues: in most cases you can’t just add ten more storage servers and make the problem go away. “Zooming out” to persistence as a topic, one must accept that there are always certain limits.

A good example of such scalability limits is Facebook. To keep it short, Facebook at one point simply “admitted” that there will always be a delay when replicating data. They accept that when a user in Germany updates their profile, it can/will take up to 3-5 minutes before users in, e.g., Seattle, USA, are able to see those changes.
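
To make that replication delay tangible, the toy model below simulates a remote replica that only sees a write a fixed number of seconds after it happened. It is a simplified sketch of eventual consistency with a constant lag, not how Facebook actually replicates data; the class name, the 180-second lag (taken from the lower end of the 3-5 minutes above) and the profile values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class LaggingReplica:
    """A read replica that sees each write only `lag_s` seconds after it happened."""
    lag_s: float
    data: Dict[str, Any] = field(default_factory=dict)
    _pending: List[Tuple[float, str, Any]] = field(default_factory=list, init=False, repr=False)

    def replicate(self, written_at: float, key: str, value: Any) -> None:
        # Queue the write; it becomes visible once the replication delay has passed.
        self._pending.append((written_at + self.lag_s, key, value))

    def read(self, now: float, key: str) -> Optional[Any]:
        # Apply every queued write that has "arrived" by `now`, oldest first.
        still_pending = []
        for visible_at, k, v in sorted(self._pending, key=lambda w: w[0]):
            if visible_at <= now:
                self.data[k] = v
            else:
                still_pending.append((visible_at, k, v))
        self._pending = still_pending
        return self.data.get(key)


if __name__ == "__main__":
    # Times in seconds; the user in Germany edits the profile at t=0.
    seattle = LaggingReplica(lag_s=180.0, data={"bio": "Cat lover"})
    seattle.replicate(0.0, "bio", "Cat lover, Berlin")
    print(seattle.read(60.0, "bio"))   # Cat lover  (update not visible yet)
    print(seattle.read(180.0, "bio"))  # Cat lover, Berlin
```

Readers in Seattle are never wrong forever, only late; accepting that window is the trade-off described above.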

To summarize this section: your storage should be as Cloud-Native as your application. Talk with your cloud provider during testing and usage, and keep them in the loop when you run into issues. Also, don’t try to push limits which can’t be pushed with the current state of technology.

What can Rook offer for your Kubernetes cluster?

Rook lets you use the otherwise wasted storage of the nodes your Kubernetes cluster runs on as storage for your applications in Kubernetes.

Rook’s native Kubernetes integration

At the time of writing, Rook uses the so-called FlexVolume driver to mount storage into your containers.

Rook Ceph Operator

With the Rook Ceph operator running, deploying a Ceph cluster on the nodes of your Kubernetes cluster comes down to creating a single Cluster object.
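
As a minimal sketch of such an object, the Python below builds a Ceph cluster manifest and prints it as JSON for `kubectl apply -f`. It assumes a current Rook release with the `ceph.rook.io/v1` API and `CephCluster` kind (the API group and kind have changed across Rook releases); the Ceph image tag is a placeholder to replace with a version your Rook release supports. `useAllNodes` and `useAllDevices` are what turn the nodes’ otherwise unused, empty disks into Ceph storage.

```python
import json


def ceph_cluster(
    name: str = "rook-ceph",
    namespace: str = "rook-ceph",
    ceph_image: str = "quay.io/ceph/ceph:v18",  # placeholder: pick a version your Rook release supports
    mon_count: int = 3,
) -> dict:
    """Minimal CephCluster manifest that lets Rook use the nodes' spare disks."""
    if mon_count < 1 or mon_count % 2 == 0:
        raise ValueError("use an odd number of monitors so they can form a majority quorum")
    return {
        "apiVersion": "ceph.rook.io/v1",
        "kind": "CephCluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "cephVersion": {"image": ceph_image},
            # Host directory where Rook/Ceph keep configuration and state.
            "dataDirHostPath": "/var/lib/rook",
            "mon": {"count": mon_count, "allowMultiplePerNode": False},
            "storage": {
                # Consider every node, and every empty disk on it (no partitions
                # or filesystem), as storage for the Ceph cluster.
                "useAllNodes": True,
                "useAllDevices": True,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(ceph_cluster(), indent=2))
```

The operator watches for this object and takes care of starting the monitors and preparing the disks; the user only describes the desired cluster.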

 


Rook is more than just Ceph

Besides Ceph, Rook also supports Minio, CockroachDB and NFS. A few other storage backends are a work in progress.

Summary

To sum up, let’s take a look at Rook’s roadmap:

  • Update project governance policies #1445
  • Add Core Infrastructure Initiative (CII) Best Practices #1440
  • Integrate a more robust controller framework (e.g., CoreOS Operator SDK or Kubebuilder) #1981
  • Build and integration testing improvements
    • Increase PR quality gates (e.g., vendoring verification, license scanning, etc.)
    • Update promotion and release channels to align with storage provider specific statuses #1885
    • Refactor test framework and helpers to support multiple storage providers #1788
    • Isolate and parallelize storage provider testing #1218
    • Longhaul testing pipeline #1847
  • Custom resource validation, progress, status #1539
  • Design for Volume Snapshotting and policies (consider aligning with SIG-storage) #1552
  • Support for dynamic provisioning of new storage types
    • Dynamic bucket provisioning #1705
    • Dynamic database provisioning #1704
  • New storage providers
    • NFS operator and CRDs (backed by arbitrary PVs) #1551
    • Cassandra design #1910
  • CockroachDB
    • Secure deployment using certificates #1809
    • Helm chart deployment #1810
    • Run on arbitrary PVs #919
  • Minio
    • Helm chart deployment #1814
    • Run on arbitrary PVs #919
  • Ceph
    • Added support for Mimic and later versions in addition to Luminous #2004
    • Automated upgrade support (initial) #997
    • Manage an existing Ceph cluster (basic) #1868
    • OSDs
      • Run on arbitrary PVs (local storage) as an alternative to host path #796 #919
      • Minimize or eliminate running with privileged containers (e.g., reduced access to /dev) #1944
      • Disk management (adding, removing, and replacing disks) #1435
      • ceph-volume is used for provisioning #1342
    • Mgr and plugins
      • Placement group balancer support (enable the mgr module)
    • File
      • NFS Ganesha CRD #1799
      • Dynamic Volume Provisioning for CephFS #1125
    • Object
      • Multi-site configuration #1584
      • CRD for object store users #1583

How to get involved?

If you are interested in Rook, don’t hesitate to connect with the Rook community and project.

For questions, just hop on the Rook.io Slack and ask in the #general channel.

written by Alexander Trost, DevOps Engineer at Cloudical Deutschland GmbH