Building A Niche Cloud: A Pragmatic Approach
source: https://innolectinc.com/how-smart-is-your-team/teamwork-ants-building-a-house/
Overview
Before getting started you might want to read about the birth of the niche cloud in the first part of this two part series.
We have all heard the saying that you cannot teach an old dog new tricks. That may be true, but thankfully we aren't dogs. Learning to do something new requires an open mindset and a desire for change. Many organizations are getting left out of digital disruption these days because they keep falling back on old, outdated ideas, behaviors and habits. Our minds are so full, so occupied and so tired that we simply cannot grasp anything new, or don't have the energy for it. We spend our time applying what is new to what we know, which is old.
In this article, let's reset our minds and look at an approach to building a niche cloud from the ground up. Instead of peeling back the onion, we will apply layer after layer until we have the onion itself. Of course there is a lot more to it and this article is just scratching the surface. Nevertheless it is an approach, a basic rule-set and a guideline for getting started.
Define your Cloud Layers
Each layer must be independent of the others, and all layers should be abstracted away behind our onion. When imagining a cloud, the first thing to do is imagine the user, and imagine him or her seeing the onion. Rule #1: The user doesn't see any layers, just the onion, but there are many, many layers.
Layers are necessary to abstract complexity. If you achieve a high enough level of abstraction, you have essentially created a cloud. Congratulations, it really is that easy, and yet also so complex. That is why I think simplexity is a great way to understand the goal.
Every layer obviously requires automation. Ideally you should settle on an automation language and build a platform around your automation. Writing a script, playbook, recipe, package or whatever is not automation. It also isn't automation if individuals in different groups can merely perform their own tasks automatically. Rule #2: Automation is about abstracting complexity, process, individual domain knowledge and making it standard or easily consumable across the cloud. Automation is important at every layer. It should be used to construct the layers, one by one, but it also plays a key role at the business process layer and customer portal layer. This is the area where you will want to invest the most time and create the most value for your customers. As such, every task, or anything else running on the cloud, must be implemented in the automation layer.
Below is a diagram illustrating the key cloud layers. The entry point for customers is the portal. Customers are, of course, developers, users and anyone else who will consume a service.
Automation Layer
Responsible for the deployment of layers and the tooling used within them. Each layer should be deployed individually with no coupling to other layers. Layers should be orchestrated into an overall workflow, enabling end-to-end cloud deployment.
Infrastructure Layer
Provides an abstraction on top of bare metal. Abstracts physical compute, storage and network resources, though it is possible to move those abstractions to another layer, for example the PaaS layer.
PaaS Layer
Provides abstraction for services, applications and their runtimes. Ensures applications are portable and decoupled from infrastructure. The PaaS layer provides a consistent application layer that is independent of individual clouds and may span clouds or cloud constructs.
Management Layer
Provides patching, security vulnerability detection, governance, chargeback, automation platform, analytics and more across cloud(s). Management tooling should be cloud agnostic and not coupled to underlying layers. Management layer should provide interfaces for business process layer.
Business Process Layer
Integration with backend systems, responsible for defining services, logic, billing and other such rule sets. Responsible for handling customer accounts and customer related activities.
Customer Portal Layer
Interface and API exposed to end customer. Abstraction of all complexity to provide a service which is simple to understand and consume. This is all about customer experience.
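To make the layer model above a bit more concrete, here is a minimal sketch in Python of the automation layer deploying each of the other layers individually and then chaining them into one end-to-end workflow. Everything in it is an assumption for illustration: the `Layer` class, the `deploy_cloud` function and the stub deployers are hypothetical, and in a real cloud each `deploy` callable would trigger that layer's own automation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Layer:
    """One cloud layer. It only knows how to deploy itself (hypothetical type)."""
    name: str
    deploy: Callable[[], str]


def deploy_cloud(layers: List[Layer]) -> Dict[str, str]:
    """Deploy the layers one by one, in order, and collect each result."""
    results: Dict[str, str] = {}
    for layer in layers:
        # No layer receives a reference to another layer: no coupling.
        results[layer.name] = layer.deploy()
    return results


def make_stub(name: str) -> Callable[[], str]:
    """Stand-in deployer; a real one would call the layer's automation."""
    return lambda: f"{name} deployed"


# Bottom-up order, as in the article: we apply layer after layer.
LAYER_ORDER = [
    "infrastructure",
    "paas",
    "management",
    "business_process",
    "customer_portal",
]
LAYERS = [Layer(name, make_stub(name)) for name in LAYER_ORDER]

if __name__ == "__main__":
    for name, status in deploy_cloud(LAYERS).items():
        print(name, "->", status)
```

Because each layer is just a name plus a deploy callable, a single layer can be redeployed or swapped out without touching the others.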
Products and Technologies
After a discussion around the various layers and their decoupling it is important to decide on technology and products. This is obviously where most people want to start, but doing so without understanding the layers and having a strategy around automation, platform capabilities and a technology roadmap would be a major mistake. Rule #3: Don't lead with technology and don't just choose technology vendors solely because they exist or you have good experience with them. You should be building your Cloud from the ground up. Choose your technology based on how well it complements the required capabilities and, of course, the desired outcomes. Envision where you want to be several years from now.
I am going to provide examples using Red Hat products and technologies to build the layers and show a Red Hat approach. I think it provides a good example for how to create a technology map and map technologies to the various layers. Rule #4: Technology and products can and should be replaced down the road, don't get attached to them.
Automation Layer
This is the most important layer to get right. I think it is a no-brainer actually. Ansible provides a simple, easy language for automation. Ansible Tower provides a powerful platform around Ansible that, among other things, gives Ansible an API. Using Ansible Tower would allow automation to be built, consumed and reused among various teams. Another thing I really like about Ansible is that it is simple: everyone can take part and understand it, and no heroics are needed to operate it. Rule #5: Anything that is dependent on individuals and heroes should be re-designed and replaced with something standard that is not.
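As a rough illustration of what "giving Ansible an API" means for consumers, the sketch below builds (but does not send) an HTTP request that launches a Tower job template through its REST API. The `/api/v2/job_templates/<id>/launch/` endpoint is part of the Tower API; the host name, template ID, token and variable names are made-up placeholders, and `build_launch_request` is a hypothetical helper, not something Tower ships.

```python
import json
from urllib.request import Request


def build_launch_request(tower_url: str, template_id: int,
                         token: str, extra_vars: dict) -> Request:
    """Build a POST request that would launch a Tower job template."""
    url = f"{tower_url.rstrip('/')}/api/v2/job_templates/{template_id}/launch/"
    body = json.dumps({"extra_vars": extra_vars}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values, for illustration only.
req = build_launch_request(
    "https://tower.example.com", 42, "TOKEN", {"project_name": "demo"}
)

if __name__ == "__main__":
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```

Any team or layer that can make an HTTP call can now consume the same automation, which is the point of having a platform around it. Note that Tower generally only accepts `extra_vars` at launch if the job template is configured to prompt for variables.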
Infrastructure Layer
This layer is responsible for the foundation of your Cloud. It runs on metal and needs to provide the same capabilities you would get from Amazon, Google or Microsoft. OpenStack is a no-brainer, but I also think Red Hat OpenStack is a no-brainer. Red Hat OpenStack uses Director, which builds an undercloud, manages metal, and deploys, manages and upgrades the OpenStack overcloud. Director is built around Ironic and TripleO (OpenStack on OpenStack). Why require separate tools that need to be learned when you can simply use OpenStack to deploy OpenStack? Another critical piece is upgrades. Red Hat offers not only version-to-version upgrades but also a fast-forward upgrade that lets you skip versions. Red Hat also provides support for 5 years on a single release. This layer needs to be rock solid, and the Achilles heel of OpenStack is upgrade and lifecycle management. The reason you go with OpenStack is to abstract technology and allow you to change technologies without impacting the APIs. If done right, OpenStack is maybe the one thing you won't want to change, and as such a long support lifecycle is crucial.
PaaS Layer
While there are many, many options, I think the industry has settled on Kubernetes. Besides Google, which announced the project, Red Hat is the only company that has been there since the beginning. Everyone is obviously jumping on the bandwagon and for good reason: every application should and eventually will run in containers. Portability of applications and a much, much faster, more streamlined release cycle alone are driving container adoption. OpenShift is by far the most advanced and enterprise-ready Kubernetes. OpenShift is also a platform, and this is why it makes a great choice. Kubernetes is just an orchestration layer; you need to build an ecosystem around it (container registry, SDN, security, governance, CI/CD, IaaS plugins and much more).
Management Layer
This layer is a little less straightforward and there is more room to arrive at other technology decisions. I think Ansible Tower is a given to provide Ansible-as-a-Platform. Other capabilities needed are patching, vulnerability detection, monitoring, governance and predictive systems management, going in the direction of AI. Clearly this is just a starting point and a lot more will fall into this layer.
Satellite would be used for patching Red Hat Enterprise Linux and potentially other RPM-based Linux distros. If you have Windows you will likely need additional patching tooling there. I am not a fan of single tools that do everything in this area. I would go with the best tools for the operating systems you want to offer and abstract them away through Ansible primitives.
Monitoring and governance could be provided by CloudForms. It also supports other clouds, so if you are a multi-cloud organization, it can provide a layer to manage those various clouds. Governance is about policies. You need a rule-set, and if those rules are violated then at a minimum people need to be notified. Taking action as well, such as shutting off non-compliant systems, is also something to consider, depending on your requirements.
For monitoring you really need something predictive that is going in the direction of AI. What you know won't kill you, it is what you don't know. Insights looks at security, configuration and even the platform level against a rule-set that is generated using an AI approach. The rule-sets come from support cases, knowledge base articles and common best practice at Red Hat. The intelligence lies in the fact that these rules are constantly created, enhanced and updated. Therefore if another customer hits an issue, you may not have to, since a rule based on an issue someone else experienced could warn you ahead of time. Pretty cool, right?
How does this all come together at the consumer level? Think about all the services and additional capabilities of value that you could provide, besides giving someone a dumb VM connected to the latest RPMs like a generic public cloud does. You could provide lifecycle management, patch stages and so on. Governance could be used to enforce specific requirements in a standardized way. You could start to tailor your requirements to fit those of a niche cloud. Insights could be provided as a higher-level service, maybe even a higher SLA level. There are many, many possibilities that go well beyond just being boring and doing what public cloud does.
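The governance idea of a rule-set, with a notification at minimum and an optional shutdown on violation, can be sketched in a few lines of Python. This is not how CloudForms implements policies; the `System` and `Policy` types, the two example rules and the sample systems are all hypothetical and only illustrate the concept.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class System:
    name: str
    tags: Dict[str, str]
    days_since_patch: int


@dataclass
class Policy:
    name: str
    is_compliant: Callable[[System], bool]
    shut_off: bool = False  # False: only notify; True: also shut the system off


def evaluate(systems: List[System],
             policies: List[Policy]) -> List[Tuple[str, str, str]]:
    """Return (system, policy, action) for every policy violation found."""
    actions = []
    for system in systems:
        for policy in policies:
            if not policy.is_compliant(system):
                action = "shut_off" if policy.shut_off else "notify"
                actions.append((system.name, policy.name, action))
    return actions


# Example rule-set and inventory (made up for illustration).
POLICIES = [
    Policy("patched-within-30-days", lambda s: s.days_since_patch <= 30),
    Policy("owner-tag-present", lambda s: "owner" in s.tags, shut_off=True),
]
SYSTEMS = [
    System("web-1", {"owner": "team-a"}, 12),
    System("db-1", {"owner": "team-b"}, 45),
    System("tmp-7", {}, 3),
]

if __name__ == "__main__":
    for violation in evaluate(SYSTEMS, POLICIES):
        print(violation)
```

Keeping the action (notify or shut off) as a property of the policy rather than of the system lets the requirements decide how strict each rule is.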
Business Process Layer
This is your key value add: how you do billing, what processes you are able to expose, and which capabilities and features you enable. It is basically the logic for what gets exposed through the customer portal. Likely you will need something more sophisticated than Ansible. Business rules are also typically understood by business analysts, not developers. Red Hat Decision Manager is based on Drools and allows business analysts to implement rule-sets, for example, without needing to change code or be programmers. Think about how you offer services and what they cost. This is quite dynamic and you definitely need something that business people can change, tweak and understand.
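To show why keeping business rules outside of code matters, here is a small sketch where pricing lives in a data table that an analyst could edit, while the code that applies it never changes. This is plain Python and not Drools or Decision Manager syntax; the rule format, service names and prices are invented for illustration.

```python
from typing import Dict, List

# Hypothetical pricing rule-set. Prices are in cents per month.
# "when" lists the order attributes that must match for the rule to apply.
RULES: List[Dict] = [
    {"service": "instance", "when": {"size": "small"}, "monthly_cents": 1500},
    {"service": "instance", "when": {"size": "large"}, "monthly_cents": 6000},
    {"service": "database", "when": {}, "monthly_cents": 4000},
]


def price(order: Dict, rules: List[Dict]) -> int:
    """Return the monthly price of the first rule that matches the order."""
    for rule in rules:
        if rule["service"] != order["service"]:
            continue
        if all(order.get(key) == value for key, value in rule["when"].items()):
            return rule["monthly_cents"]
    raise LookupError(f"no pricing rule matches {order}")


if __name__ == "__main__":
    print(price({"service": "instance", "size": "small"}, RULES))
    print(price({"service": "database"}, RULES))
```

Changing a price or adding a new size is then an edit to `RULES`, not a code change, which is the property a real rules engine gives business people at a much larger scale.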
Customer Portal Layer
Finally we are at the onion. This is what the customer sees. This layer should not have much logic; the intelligence should be in the business process layer. The portal is responsible for allowing customers to consume the services that are offered. It is your storefront, and one that always has enough supply to satisfy every customer. It provides such a good abstraction that most think it is just magic and cannot begin to understand what they are seeing or experiencing. If only they knew what mess lies below. You may think these are lofty or impossible goals, but if you believe that, my advice is simply not to start, because it won't work. You can always be a consumer and have someone else do it for you, no shame in that.
Where to get started with customer portal? How about an innovation lab?
Building your customer portal is a great opportunity to work with, as well as learn from, some of the best and brightest open source developers in the world. The idea is similar to cooking. Sure, you can get a recipe, even watch someone cook on YouTube. Following such recipes will probably result in something decent or even good. But what if you sent yourself or your team to a five-star kitchen to cook with master chefs? What do you think would happen when you got back to your kitchen?
I will tell you. You will cook quite differently. You will change your tools and your approach. You may even make changes to the kitchen itself. What would the long-term results be? Which method would lead to your team cooking better meals? I think it's obvious.
It is the same thing with software development and coding. Red Hat offers something unique here and I think it is perfect for building a cloud portal prototype while learning with Red Hat's top engineers and software developers at the core of open source innovation. If this sounds interesting, here is a lightning talk on the subject from Jeremy Brown (https://www.youtube.com/watch?v=9BVaTx_RJ9U).
The next diagram shows the key layers and technologies used in those layers.
The key thing to remember is that each of these layers, just like those of the onion, is independent and not coupled. Rule #6: Building loosely coupled layers is key to future cloud longevity.
Building the Niche Cloud Layer by Layer
What I am going to provide are some basic guides to get going using Red Hat products and technologies. This should serve as a prototype or demo. It is by no means meant to be anything more than a way to get your feet wet and understand the concepts above.
Infrastructure Layer
For getting started, I recommend getting a Hetzner Root Server. These are physical servers in the cloud that are very inexpensive. Hetzner is also what I refer to as a niche cloud. They provide metal-as-a-service and a marketplace where you can auction or purchase used hardware on a monthly basis, which is why it is so inexpensive. It is great for prototyping, demos or conceptual work.
To build OpenStack layer using RDO on RHEL or CentOS follow my blog here.
PaaS Layer
The PaaS can either be built using OpenShift Community (OKD) or OpenShift Enterprise. I have provided Ansible playbooks and documentation for deploying OpenShift on OpenStack in GitHub.
Management Layer
The management layer consists of several technologies and products. I have also provided playbooks and documentation to deploy Red Hat Satellite and Ansible Tower on OpenStack in GitHub. I have not documented Red Hat CloudForms because an image exists for OpenStack. This is pretty easy to get going using documentation provided by Red Hat.
Business Process Layer
The business process layer is of course highly specific. Nevertheless, working with some of my colleagues, I have provided some use cases to help with the business process layer. The use cases are implemented in Ansible. They can of course also be consumed via the API from Ansible Tower. Below are some of the use cases.
Order Infrastructure Project with Quota
Order Instance
Order Application or Service
Order Database
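The use cases listed above could be exposed to a portal as a small service catalog. The sketch below maps each order type to the automation that fulfils it and validates the order before anything is launched. The catalog entries, template names and parameter names are all hypothetical; they are not taken from the actual playbooks.

```python
from typing import Dict

# Hypothetical catalog: order type -> job template and required parameters.
CATALOG: Dict[str, Dict] = {
    "project": {
        "template": "order-project",
        "required": ["project_name", "quota_vcpus"],
    },
    "instance": {
        "template": "order-instance",
        "required": ["project_name", "flavor", "image"],
    },
    "application": {
        "template": "order-application",
        "required": ["project_name", "app_name"],
    },
    "database": {
        "template": "order-database",
        "required": ["project_name", "engine"],
    },
}


def build_order(kind: str, params: Dict) -> Dict:
    """Validate an order and return what the automation layer should run."""
    if kind not in CATALOG:
        raise ValueError(f"unknown order type: {kind}")
    entry = CATALOG[kind]
    missing = [key for key in entry["required"] if key not in params]
    if missing:
        raise ValueError(f"missing parameters for {kind}: {missing}")
    return {"template": entry["template"], "extra_vars": dict(params)}


if __name__ == "__main__":
    order = build_order("project", {"project_name": "demo", "quota_vcpus": 8})
    print(order)
```

The returned dictionary is the kind of payload that could then be handed to the automation platform, keeping the portal itself free of fulfilment logic.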
Customer Portal Layer
Hopefully you like the idea of running an innovation lab to build a prototype for your cloud portal. CloudForms could also be used initially to provide a generic portal. You definitely want to build your own, but this could at least provide value in the beginning by getting you up and running fast while you take the time to build a real portal. Below is an example of the CloudForms customer portal.
Summary
In this article we discussed how to approach building your own niche cloud, the importance of decoupled layers and how the layers build upon one another. We discussed technology and products that could be used to build the layers from a Red Hat perspective. Finally, guidelines and ideas were provided to help you get started building your niche cloud. Below are the important rules mentioned throughout this article, which are worth repeating.
- Rule #1: The user doesn't see any layers, just the onion, but there are many, many layers.
- Rule #2: Automation is about abstracting complexity, process, individual domain knowledge and making it standard or easily consumable across the cloud.
- Rule #3: Don't lead with technology and don't just choose technology vendors solely because they exist or you have good experience with them.
- Rule #4: Technology and products can and should be replaced down the road, don't get attached to them.
- Rule #5: Anything that is dependent on individuals and heroes should be re-designed and replaced with something standard that is not.
- Rule #6: Building loosely coupled layers is key to future cloud longevity.
Happy Niche Clouding!
(c) 2018 Keith Tenzer