The following article describes how a real-world “as a Service” infrastructure was designed and deployed from scratch. It covers different aspects of running multiple assets as service offerings, e.g. virtual machines, containers or microservices. EWERK describes their journey from a blank page to production and how they solved challenges like scalability, reliability and performance. The article is meant for everyone who wants to understand platform design principles and what to consider before getting started as well as after take-off.
Infrastructure as a Service
The world is changing at a pace that has never been seen before. To keep up, companies should always think of ways to adapt to those changes by optimizing their processes, products and services. This also includes the creative art of forging new services and products to help customers solve challenges in their daily business.
One way to do this is by providing ways to “consume” assets more efficiently “as a Service”, no matter if that’s a server, a service or software. The following article covers “Infrastructure as a Service” (IaaS) and how the Leipzig-based service provider “EWERK” leverages state-of-the-art technologies to thrive and disrupt.
IaaS is a concept of cloud computing that delivers compute, storage, and networking resources. It refers to a combination of hosting, hardware provisioning and the essential services needed to run a cloud-like platform at the premises of choice. For a customer the platform should “feel” like a cloud, even though this does not necessarily mean running services in a public cloud like AWS, Azure or Google.
By consuming infrastructure services you can quickly scale resources to match constantly changing demands, while you are charged only for the resources you actually use. This concept is called “pay as you go”; it helps you keep track of your costs while providing a highly customizable environment to work with.
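The “pay as you go” idea can be illustrated with a few lines of Python. This is only a sketch: the resource names and hourly rates below are hypothetical and do not reflect EWERK's actual pricing or metering.

```python
from dataclasses import dataclass

# Hypothetical hourly unit prices, for illustration only.
HOURLY_RATES = {"vcpu": 0.02, "ram_gb": 0.005, "storage_gb": 0.0001}


@dataclass(frozen=True)
class UsageRecord:
    resource: str    # key into HOURLY_RATES, e.g. "vcpu"
    quantity: float  # how many units were in use
    hours: float     # for how long they were in use


def invoice(records):
    """Sum up the cost of everything that was actually consumed."""
    total = sum(HOURLY_RATES[r.resource] * r.quantity * r.hours for r in records)
    return round(total, 2)
```

Resources that were never provisioned simply never appear as a usage record, so they never show up on the invoice.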
While service providers like EWERK manage the underlying infrastructure, the user installs, configures, and manages software including applications, middleware, and operating systems.
Companies’ reasons for choosing an IaaS environment are different, depending on the size of the organization and the industry of the company. Cost reduction often is a key driver, but there is more.
Time to market is getting more important due to a growing number of competitors and possible disruption from players that have not been in the game before (and therefore are unknown and unpredictable). That makes flexibility and cost management key differentiators between failure and success.
Another challenge is the pooling of skills and resources: the number of highly educated people is finite while the number of potential employers is increasing. Requirements are also changing, which raises the demand for professionals even further. We are all constrained by 24 hours a day, so the logical consequence is to enable people to make the most of their working day. Using IaaS offerings frees up resources on the customer side because customers don’t need to bother with infrastructure components such as racks, power, cables and so on. They consume a well-prepared environment for their specific purposes and leave the maintenance tasks to the IaaS provider, which helps them focus on adding value for their customers rather than maintaining hardware.
EWERK is a leading IT Service Provider based in Germany with more than 25 years of experience and over 500 clients across different industries (energy, mobility, healthcare, education and public sector) trusting in their project management and consulting expertise. They support their clients in sustainable growth – e.g. through more efficient digitized processes, smart online portals or interactive brands.
The target group is highly specialized regarding requirements for security, compliance, performance as well as automation and usability – therefore EWERK needs to explore new and innovative paths in order to design, operate and provide new types of “cloud-like” service offerings.
On this path they created a solution stack to address the challenges raised by the field and by their engineers. This stack involves the infrastructure services mentioned earlier as well as platform services based on container applications (especially Kubernetes provided by Red Hat OpenShift, https://www.openshift.com).
Running on top of commodity servers, Apache CloudStack (http://cloudstack.apache.org) forms the foundation of the IaaS platform. Everything that is related to data management and SLAs in the storage backend is covered by an HCI solution from NetApp (https://www.netapp.com, fig. 1).
The solution provided by EWERK enables their customers to achieve their business goals while providing the expertise to support every single phase of a project. They focus on building strong partnerships within the region and are working on supporting KRITIS-agnostic workloads.
Figure 1: current solution
Apache CloudStack is open source software designed to deploy and manage large numbers of virtual machines as a highly available, highly scalable Infrastructure as a Service (IaaS) cloud computing platform. CloudStack is used by a large number of service providers to offer public cloud services and by many companies to provide an on-premises (private) cloud offering. It is also often used as part of a hybrid cloud solution.
Using CloudStack, EWERK is able to address (almost) all customer-specific workloads. Different hypervisors can be used, e.g. KVM, Xen or VMware. You can then create isolated tenants to which resources (like CPUs, memory and storage) are assigned. Think of it as a pool out of which assets get provisioned by the tenant / customer.
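The pool idea can be sketched in a few lines of Python. This is a conceptual model only, not CloudStack's API or data model: the class `TenantPool` and its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TenantPool:
    """Resources assigned to one isolated tenant (hypothetical model)."""
    cpus: int
    memory_gb: int
    storage_gb: int
    used_cpus: int = 0
    used_memory_gb: int = 0
    used_storage_gb: int = 0

    def provision(self, cpus, memory_gb, storage_gb):
        """Reserve resources for a new asset; return False if the pool is exhausted."""
        fits = (
            self.used_cpus + cpus <= self.cpus
            and self.used_memory_gb + memory_gb <= self.memory_gb
            and self.used_storage_gb + storage_gb <= self.storage_gb
        )
        if not fits:
            return False
        self.used_cpus += cpus
        self.used_memory_gb += memory_gb
        self.used_storage_gb += storage_gb
        return True
```

A tenant with `TenantPool(cpus=8, memory_gb=32, storage_gb=500)` could provision two machines with 4 CPUs each; a third request would be rejected until resources are released or the pool is enlarged.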
EWERK is an active supporter of and contributor to the CloudStack community. Open source software provides various development options to tailor the software to specific needs. Another important point is the flexibility in the storage backend and its scaling possibilities. CloudStack has an active worldwide community that supports and steers the progress of the development. This also includes the integration of plug-ins for third-party vendors like NetApp. Using the APIs provided by the different solutions accelerates processes within the whole ecosystem (e.g. providing storage without the need to manage the storage at all). Datastores that need to be provided can be provisioned by the IaaS software itself, which again frees up resources and time to focus on more important tasks.
EWERK is constantly developing the CloudStack project and contributes to the open source community. The primary purpose is to further tailor the software to their specific needs while giving something back to the community. Due to these efforts the Lead Cloud Architect of EWERK – Sven Vogel – has recently become VP and Chairman of the Apache CloudStack project.
As mentioned before, CloudStack is a great foundation for virtual machines and almost every use case you might think of. However, some areas remain uncovered: what if one simply wants to provide applications rather than install virtual machines on one's own? In recent years more and more container workloads have emerged, which have made VMs and servers largely invisible (from an end-user perspective). Of course, this does not mean that there are no servers behind the curtain anymore. It simply means that the way we interact with services has partially changed from bare metal to full VM to an API or a microservice.
OpenShift is a so-called “Platform as a Service” (PaaS) solution developed by Red Hat. It adopts Kubernetes to offer a platform on which you can run container workloads and microservice deployments. The application (or service) itself gets packed into container images rather than deployed on full VMs. Doing so reduces the overall overhead and speeds up deployments based on microservices.
OpenShift itself uses master nodes and worker nodes to build the container environment: master nodes take care of metadata and worker nodes run the actual containers. This implies that there are still servers running behind the scenes, and that’s where CloudStack plays an essential role by providing the virtual machines that are still needed.
Since every customer gets its own isolated tenant, you also get a fully isolated container environment, where you can manage your own user accounts and grant access based on your architecture requirements. By providing a DevOps-like approach, the application development process is accelerated; for those with specific needs, a dedicated private registry is available to store modified and optimized container images. The overall experience also includes a monitoring solution, so that issues are reported immediately.
Besides the container deployment itself, storage is (as always) a topic to be mentioned, and that’s where Trident helps.
Usually data in containers is not meant to be persistent. Whenever a container is deleted or lost, all its data is gone. However, this is not acceptable in every use case: there are plenty of applications whose information we would like to store persistently so that we can reuse the data in a later deployment.
“Trident” (https://github.com/NetApp/trident) is an open source storage orchestrator for Kubernetes (and OpenShift) provided by NetApp. It uses the Container Storage Interface (CSI) API (https://github.com/container-storage-interface/spec/blob/master/spec.md) to provide persistent volumes within container environments.
Every OpenShift cluster gets its own Trident instance. It comes with pods running as a ReplicaSet on the master nodes (controller server) as well as a DaemonSet on the worker nodes (node server). With Trident running in the cluster, the administrator can create storage classes to build different service levels, e.g. a class “gold” with a certain amount of IOPS, “silver” with a lower IOPS value, and so on. Those storage classes can then be used in deployments in which the “user” (software developer, application owner, …) claims a persistent volume for his or her application(s). The PVC (persistent volume claim) triggers Trident to create a persistent volume (if it does not yet exist) and mount it to the requesting pod.
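As a rough sketch of what the administrator and the user each create, the following Python helpers build a StorageClass and a PersistentVolumeClaim as plain dictionaries, which can be serialized to JSON or YAML and applied with kubectl. The class name “gold”, the claim name, the size and the backend type `solidfire-san` are assumptions for illustration; the IOPS settings that distinguish “gold” from “silver” depend on the backend configuration and are deliberately left out here.

```python
def storage_class(name, backend_type):
    """Manifest for a storage class served by Trident's CSI provisioner."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "csi.trident.netapp.io",
        # Assumption: the service level is selected via the backend type only.
        "parameters": {"backendType": backend_type},
    }


def volume_claim(name, storage_class_name, size):
    """Manifest for a claim that requests a volume from the given class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }
```

For example, `volume_claim("db-data", "gold", "10Gi")` describes a 10 GiB volume from the “gold” class; a pod that references this claim gets the volume mounted once it has been provisioned.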
However, simply providing storage does not solve every challenge in a container-based environment. What about creating backups (snapshots) of your data yourself, or cloning volumes to reuse the same data for parallel deployments? Trident can already do this. You don’t need to ask the administrator or anyone else; you just use the deployment workflow you already know. With the introduction of CSI, Trident started to bring its own CRDs (Custom Resource Definitions) to Kubernetes to add even more capabilities in the form of self-service and reliability. Using this functionality, one can create on-demand snapshots of every Trident-managed persistent volume (PV) using YAML manifests and Kubernetes tools (like kubectl). It is also possible to clone a volume, either from live data or from a previously created snapshot, to get identical copies of the data at a given point in time with no additional storage space claimed. Imagine AI pipelines or databases where you need instant copies of your data to parallelize deployments and training runs; the use cases for snapshots and cloning are nearly unlimited.
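The snapshot and clone requests described above can be sketched as Kubernetes manifests built in Python. All object names are hypothetical, and the snapshot API version shown (`snapshot.storage.k8s.io/v1`) is an assumption: clusters from the time of writing may still expose a beta version of this API.

```python
def volume_snapshot(name, source_pvc, snapshot_class):
    """Manifest for an on-demand snapshot of an existing claim."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",  # assumption, see lead-in
        "kind": "VolumeSnapshot",
        "metadata": {"name": name},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": source_pvc},
        },
    }


def clone_claim(name, storage_class_name, size, source_kind, source_name):
    """Manifest for a new claim pre-populated from a claim or a snapshot."""
    data_source = {"kind": source_kind, "name": source_name}
    if source_kind == "VolumeSnapshot":
        # Snapshots live in their own API group; plain claims do not need one.
        data_source["apiGroup"] = "snapshot.storage.k8s.io"
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
            "dataSource": data_source,
        },
    }
```

Passing `source_kind="PersistentVolumeClaim"` requests a clone of the live data, while `source_kind="VolumeSnapshot"` requests a copy of the data as it was when the snapshot was taken.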
Figure 2: Trident Storage class example
Sidenote: Trident is going to be described comprehensively in the next cloud report (Q4 – 2020), elaborating on topics like the general architecture, potential use cases, upcoming features and how it differs from similar applications.
HCI / SolidFire
One of the most important challenges to solve within “as a Service” environments is meeting SLAs and performance guarantees. When you “guarantee” a certain level of performance and availability, and therefore charge for it, your customers might want you to prove your claims.
As a consequence you should deliver what you promise, and this is where NetApp HCI comes into play. It is built to provide resources under any given circumstances, even if an SSD fails or a node gets lost. Performance (and therefore SLAs) is always guaranteed. The platform does this by running an operating system called “Element OS”, which is able to pool resources like IOPS or capacity and divide them between the workloads on the system.
By adding linear scalability, it also provides a way to grow with the requirements on top of the provided platforms. Whenever you run low on either capacity or performance you can simply add nodes to grow the resource pools based on your actual demands.
CloudStack, as the layer above, integrates with the NetApp APIs. This means that by installing a plugin you can create and manage persistent volumes from the CloudStack interface without touching the underlying environment (HCI) at all. This includes the step of adding quality of service to the volumes created (e.g. min. / max. IOPS).
Doing so reduces the time spent on maintenance, because there are fewer systems to manage, and shortens the path to production through automation and integration.
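To illustrate why minimum IOPS settings make a guarantee possible, here is a conceptual Python sketch of an admission check against a shared IOPS pool. It is not how Element OS or the CloudStack plugin is implemented; the classes `VolumeQoS` and `IopsPool` and all numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VolumeQoS:
    name: str
    min_iops: int  # guaranteed floor
    max_iops: int  # ceiling that protects neighbouring workloads


@dataclass
class IopsPool:
    total_iops: int
    volumes: list = field(default_factory=list)

    def guaranteed(self):
        """IOPS already promised to existing volumes."""
        return sum(v.min_iops for v in self.volumes)

    def admit(self, volume):
        """Accept a volume only if its minimum can still be guaranteed."""
        if volume.min_iops > volume.max_iops:
            raise ValueError("min_iops must not exceed max_iops")
        if self.guaranteed() + volume.min_iops > self.total_iops:
            return False
        self.volumes.append(volume)
        return True
```

In this simplified model a rejected volume is the signal to add nodes: growing `total_iops` is the counterpart of the linear scalability described above.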
Since there are sometimes requirements to mirror data between datacenters, the solution also provides synchronous and/or asynchronous data replication. Platforms that are meant to be always-on with the highest availability SLAs can be spread across three datacenters so that any one site can fail at any time without disrupting production.
By using different state-of-the-art technologies EWERK designed a solution to provide IaaS- and PaaS-offerings to their respective customers. They combine open-source software with enterprise-grade storage solutions to get the most out of every component necessary to run the platform.
As a consequence, they have dramatically reduced the resources needed to maintain the whole stack and accelerated their time-to-market, in part by adopting paradigms like automation.
Every single challenge they had before building the platform is solved by the mentioned components with the potential to scale up to infinity – and beyond.
Sven Vogel is the Lead Cloud Solution Architect at EWERK and an Apache CloudStack committer. With 10+ years of experience in infrastructure projects he knows how to provide platforms to a variety of customers.
Contact: mail firstname.lastname@example.org or via phone 0341/4264999
Timo Oswald is a cloud-minded Enterprise Account Manager with the imagination and will to move mountains. As Enterprise Account Manager, he is proud and passionate about supporting customers to innovate their business and better serve their customers every single day. He believes in adopting new technologies like Kubernetes, Ansible, CloudStack and Cloud Native Designs to understand and solve actual challenges of today.
Contact: mail Timo.Oswald@netapp.com or phone +49 151 12055885
Steve Guhr is a Solutions Engineer at NetApp who grew up building datacenters. After 10 years as an infrastructure guy, he started digging into DevOps and AI topics “by accident”. The daily business is about to change dramatically, and since data is getting more and more important, everybody needs to adopt new technologies. That’s why he decided to work with containers, automation and neural networks rather than building racks and virtual servers.
Contact: mail: email@example.com phone +49 151 12055679 LinkedIn: https://www.linkedin.com/in/steveguhr/