Enterprise OpenStack: RHEL OSP
Overview
As of today there are more than eleven OpenStack services, and more are coming. Each service is completely isolated from the others, which allows OpenStack to scale far beyond the reach of traditional computing platforms. However, because of all these independent services, OpenStack can be very complicated to operationalize in enterprise environments.
Each service has a database or other means of storing stateful data, requires load balancing and can scale at a different pace from other services. In addition, OpenStack has over 1000 different configuration parameters, making it impossible to deploy and manage without complete automation through configuration management.
Enterprise OpenStack is about high availability, scalability, automation, configuration management, life-cycles and of course support. Choosing the right distribution is the most important decision you will ever make regarding OpenStack. Below are some points to consider:
- Support - How much experience does the company have with Open Source, Linux, Ceph and KVM?
- Vision - How complete is the OpenStack story? Does it extend to applications?
- Enterprise Features - High availability, scalability and support for diversified hardware?
- Configuration Management - Can OpenStack be managed centrally and deployment be 100% automated?
- Enterprise Management - Charge-back, governance policies, single-pane-of-glass, hybrid cloud and support for traditional platforms (VMware, Microsoft Hyper-V, RHEV)?
- Linux Containers - How will OpenStack provide infrastructure for DevOps and containerized next-gen applications?
- Lock-in - Are you free to choose your underlying hardware vendor for compute, network and storage?
I know at least one company that can talk about all of these points and that is Red Hat. In this post we will focus on the Red Hat OpenStack Platform (RHEL OSP).
Node Types
RHEL OSP defines four types of nodes: admin, controller, compute and Ceph storage.
Admin Node
The admin node is responsible for management of an OpenStack environment. It provides configuration management through puppet and automated installation through foreman. The admin node is constantly evolving and mixes best-of-breed open source technologies together. It allows an OpenStack administrator to deploy OpenStack with 100% automation on bare-metal or virtual infrastructure. Because the admin node maintains the configuration for a deployment centrally, you can not only deploy the initial environment but also grow and scale out the environment in an automated manner. The admin node provides the following additional services for an OpenStack deployment:
- DHCP
- DNS
- PXE
- TFTP
These services enable automated provisioning and configuration management. The admin node can provision two types of OpenStack nodes today: controller and compute.
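A quick sanity check after installing the admin node is to confirm these services are actually running and listening. The unit names below are assumptions based on a default rhel-osp-installer setup (dhcpd for DHCP, named for DNS, foreman-proxy fronting TFTP/PXE); adjust them to your environment.
#systemctl status dhcpd named foreman-proxy httpd
#ss -lnu | grep -E ':(53|67|69)'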
Controller Node
The controller node runs all OpenStack services except for Nova (compute). The Red Hat best practice is to run three controller nodes. The admin node and the RHEL OSP installer will configure a pacemaker cluster that uses corosync for the cluster network and fences failed nodes. All OpenStack services are individually clustered, with HAProxy providing load balancing. The main reason for running three controller nodes, aside from scalability and availability, is fencing: three nodes give the pacemaker cluster quorum and clear ownership, which is why two nodes is not a valid configuration. A quick way to verify the cluster on a deployed controller is shown after the service list below.
The controller node runs the following OpenStack services:
- Horizon (dashboard)
- Keystone (identity)
- Nova scheduler (compute)
- Neutron (network)
- Neutron metadata (metadata - cloud-init)
- Neutron L3 Agent (L3 networking)
- Cinder (block storage)
- Swift (object storage)
- Heat (orchestration)
- Ceilometer (telemetry)
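Once the controllers have been deployed, you can verify the cluster and where services are running from any controller node. These are standard pacemaker and HAProxy checks; the exact resource names will differ per deployment.
#pcs status
#pcs resource show
#systemctl status haproxy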
Compute Node
The compute node runs Nova (compute) and the Neutron openvswitch agent. Compute nodes do not provide any high availability; if a compute node goes down, all instances hosted on that node are down. The point of OpenStack, though, is horizontal scaling: if you need more compute resources you simply add more compute nodes. Applications must be resilient and able to handle instances going down. This requirement is often misunderstood when discussing cloud computing.
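A quick way to confirm a compute node has joined the environment is to check the service and agent lists from a controller after sourcing your admin credentials (keystonerc file):
nova service-list
neutron agent-list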
Ceph Storage Node
The Ceph storage node runs Ceph services such as the Object Storage Daemon (OSD) that provide storage to a Ceph cluster. These nodes are not at this time provisioned by the admin node and must be configured separately. The admin node will let the administrator configure Ceph as a storage back-end for Glance or Cinder, but the Ceph cluster must already be available. Ceph also requires monitor services, and it is ideal to run these on the controller nodes; other than the Ceph monitors, nothing else Ceph related should run on the OpenStack controller nodes.
Ceph is the de facto storage for OpenStack because it meets the requirements for OpenStack storage very well. Ceph can scale far beyond other storage systems and, like OpenStack, it abstracts the hardware, leaving you free to choose your hardware vendors.
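Before pointing Glance or Cinder at Ceph, it is worth confirming the cluster itself is healthy. A minimal check from any host with a Ceph admin keyring:
#ceph -s
#ceph osd tree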
Installing Admin Node
The first step to deploying enterprise OpenStack is to install and configure the admin node. The admin node must at a minimum be connected to the provisioning network. If the admin node goes down you will be unable to provision. In addition, if the admin node is providing DNS or DHCP to the OpenStack environment, those services will be offline. Infrastructure decisions regarding the admin node require thoughtful planning.
Prepare
Install RHEL 7.1
#subscription-manager register
#subscription-manager list --available
#subscription-manager attach --pool=<pool id>
#systemctl disable NetworkManager.service
#systemctl stop NetworkManager.service
#yum remove dnsmasq
#subscription-manager repos --disable=*
#subscription-manager repos --enable=rhel-7-server-rpms
#subscription-manager repos --enable=rhel-7-server-openstack-6.0-installer-rpms
#subscription-manager repos --enable=rhel-server-rhscl-7-rpms
#yum update -y
#yum install -y rhel-osp-installer
Setup Provisioning Network
Before starting the installer ensure your network interface is set up correctly. Below is an example.
DEVICE=eth0
BOOTPROTO=none
HWADDR=52:54:00:a5:fe:26
ONBOOT=yes
HOTPLUG=yes
TYPE=Ethernet
IPADDR=192.168.122.99
NETMASK=255.255.255.0
PEERDNS=yes
DNS1=192.168.122.99
DNS2=192.168.122.1
NM_CONTROLLED=no
Install and Configure
#rhel-osp-installer
Please select NIC on which you want provisioning enabled:
1. eth0
2. eth1
? 1
Networking setup:
Network interface: 'eth0'
IP address: '192.168.122.99'
Network mask: '255.255.255.0'
Network address: '192.168.122.0'
Host Gateway: '192.168.122.1'
DHCP range start: '192.168.122.100'
DHCP range end: '192.168.122.254'
DHCP Gateway: '192.168.122.99'
DNS forwarder: '192.168.122.1'
Domain: 'lab.local'
NTP sync host: '0.rhel.pool.ntp.org'
Timezone: 'Europe/Berlin'
Configure networking on this machine: ✓
Configure firewall on this machine: ✓
Ensure your DNS forwarder is a system that can provide external DNS. The DHCP gateway should be the admin node itself since it provides DHCP, unless of course you want to handle this externally.
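You can quickly confirm the forwarder resolves external names from the admin node (the address below matches the example configuration above):
#dig @192.168.122.1 redhat.com +short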
How would you like to proceed?:
1. Proceed with the above values
2. Change Network interface
3. Change IP address
4. Change Network mask
5. Change Network address
6. Change Host Gateway
7. Change DHCP range start
8. Change DHCP range end
9. Change DHCP Gateway
10. Change DNS forwarder
11. Change Domain
12. Change NTP sync host
13. Change Timezone
14. Do not configure networking
15. Do not configure firewall
16. Cancel Installation
9
new value for DHCP Gateway 192.168.122.1
Change the NTP server. It is critical you provide an internal NTP server, or an external one that is reachable, as OpenStack relies heavily on NTP.
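To confirm the NTP server is reachable from the admin node before continuing (assuming the ntpdate package is installed):
#ntpdate -q clock.redhat.com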
How would you like to proceed?:
1. Proceed with the above values
2. Change Network interface
3. Change IP address
4. Change Network mask
5. Change Network address
6. Change Host Gateway
7. Change DHCP range start
8. Change DHCP range end
9. Change DHCP Gateway
10. Change DNS forwarder
11. Change Domain
12. Change NTP sync host
13. Change Timezone
14. Do not configure networking
15. Do not configure firewall
16. Cancel Installation
12
Enter a list of NTP hosts, separated by commas. First in the list will be the default.
clock.redhat.com
Configure client authentication
SSH public key: ''
Root password: '*******************************************'
Please set a default root password for newly provisioned machines. If you choose not to set a password, it will be generated randomly. The password must be a minimum of 8 characters. You can also set a public ssh key which will be deployed to newly provisioned machines.
How would you like to proceed?:
1. Proceed with the above values
2. Change SSH public key
3. Change Root password
4. Toggle Root password visibility
3
new value for root password ********
enter new root password again to confirm ********
Now you should configure installation media which will be used for provisioning. Note that if you don't configure it properly, host provisioning won't work until you configure installation media manually.
Enter RHEL repo path:
1. Set RHEL repo path (http or https URL): http://
2. Proceed with configuration
3. Skip this step (provisioning won't work)
1
Path: http://192.168.122.99:8120/RHEL7
Enter RHEL repo path:
1. Set RHEL repo path (http or https URL): http://192.168.122.99:8120/RHEL7
2. Proceed with configuration
3. Skip this step (provisioning won't work)
2
Enter your subscription manager credentials:
1. Subscription manager username: myuser
2. Subscription manager password: ********
3. Comma or Space separated repositories: rhel-7-server-openstack-6.0-rpms rhel-7-server-openstack-6.0-installer-rpms rhel-7-server-rh-common-rpms
4. Subscription manager pool (recommended): mypool
5. Subscription manager proxy hostname:
6. Subscription manager proxy port:
7. Subscription manager proxy username:
8. Subscription manager proxy password:
9. Proceed with configuration
10. Skip this step (provisioning won't subscribe your machines)
9
Starting to seed provisioning data
Use 'base_RedHat_7' hostgroup for provisioning
Success!
* Foreman is running at https://admin.lab.local
  Initial credentials are admin / 7wHcE3YZYHSRffmh
* Foreman Proxy is running at https://admin.lab.local:8443
* Puppetmaster is running at port 8140
The full log is at /var/log/rhel-osp-installer/rhel-osp-installer.log
Configure Provisioning Media
The admin node needs to install RHEL on nodes it provisions. To do this we need to expose the RHEL install media through HTTP. Below is the process:
Mount the RHEL 7.1 install media in the cdrom
#mkdir /RHEL7
#mount -o ro /dev/cdrom /RHEL7
#cp -dpR /RHEL7 /var/www/html/.
#chmod -R 755 /var/www/html/RHEL7
#semanage port -a -t http_port_t -p tcp 8120
#vi /etc/httpd/conf.d/medium.conf
Listen 8120
NameVirtualHost *:8120
<VirtualHost *:8120>
  DocumentRoot /var/www/html/
  ServerName 192.168.122.99
  <Directory "/var/www/html/">
    Options All Indexes FollowSymLinks
    Order allow,deny
    Allow from all
  </Directory>
</VirtualHost>
#iptables -I INPUT 1 -p tcp -m multiport --ports 8120 -m comment --comment "8120 accept - medium" -j ACCEPT
#iptables-save > /etc/sysconfig/iptables
#systemctl restart network.service
#systemctl restart httpd
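Before provisioning any hosts, verify the media is actually being served over HTTP; you should get a 200 response:
#curl -I http://192.168.122.99:8120/RHEL7/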
Host Discovery
Now that the admin node has been installed and configured we can start our OpenStack deployment. The first step is host discovery. As mentioned, OpenStack nodes can run on bare-metal or virtual machines. In either case you must configure the hosts to boot via PXE. At minimum the controller node should have three NICs connecting to the provisioning / management, external and tenant networks. The compute node should have a minimum of two NICs for provisioning / management and tenant. You also generally want an additional network for the public API and of course storage. Configure host networking properly before booting.
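If hosts fail to PXE boot, a useful check is whether the admin node is serving the bootloader over TFTP. This assumes the default foreman-proxy TFTP layout with pxelinux.0 at the TFTP root and requires the tftp client package:
#yum install -y tftp
#tftp 192.168.122.99 -c get pxelinux.0
#ls -l pxelinux.0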
Once hosts are booted foreman will discover them and they will show up in the RHEL OSP installer under Hosts->Discovered Hosts.
Create networks in the RHEL OSP installer under Infrastructure->Subnets. You should have at minimum three subnets (provisioning / management, external and tenant). It is also recommended to separate the public and storage networks.
Below both hosts have been discovered based on MAC address (names can be changed later).
Create an external subnet (you need to do this for every network).
Deploying OpenStack
The first step to deploying OpenStack is to create a new deployment by going to OpenStack Installer->Deployments. Creating a new deployment is a four step process that can be observed below.
Glance requires a storage back-end, in this case NFS was chosen.
Cinder also requires a storage back-end, in this case NFS was chosen. Note NetApp as being an option. Red Hat partners such as NetApp have started integrating into the RHEL OSP installer.
Once the deployment has been created we can assign hosts. In this configuration we have one controller and one compute node (the minimum setup). Below we will assign the controller and compute node to the deployment. Keep in mind the beauty of the installer is that it allows you to grow the environment and scale out more controllers or compute nodes as required. I would recommend starting small until the kinks are worked out (see the troubleshooting section).
Once our hosts have been assigned to a deployment we can update the hosts. In this case I changed the hostname but you can also configure network, domain and realm information for every host.
Finally we are ready for deployment. Under OpenStack Installer->Deployments select the deployment and deploy. The progress can be followed from the UI. It typically takes around two hours to deploy an OpenStack environment, and periodically you will want to check on the progress. Once RHEL is installed you can log into the controller and compute hosts to follow progress at a more granular level. The install log is located under /var/log/foreman-installer/foreman-installer.log.
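To watch the puppet-driven configuration on a controller or compute host once RHEL is installed, tail the installer log mentioned above:
#tail -f /var/log/foreman-installer/foreman-installer.log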
Congrats you have deployed an Enterprise OpenStack environment!
Monitoring
One of the great features of the RHEL OSP installer is monitoring. You get dashboards, reports and detailed host information that allow administrators to proactively monitor their deployments. Below are a few dashboards to give you an idea of the capability.
Neutron Networking
At this point we have a running OpenStack environment. The last step is to set up OpenStack networking. In this example we will use vxlan for tenant tunneling traffic and a flat network for external access via floating IPs.
Configure internal tenant network using vxlan.
neutron net-create internal --provider:network_type vxlan
neutron subnet-create internal --name internal_subnet --allocation-pool start=10.10.1.100,end=10.10.1.200 10.10.1.0/24
Configure external flat provider network
neutron net-create external --provider:network_type flat --provider:physical_network physnet-external --router:external=True
neutron subnet-create external --name external_subnet --allocation-pool start=192.168.123.100,end=192.168.123.200 --disable-dhcp --gateway 192.168.123.1 192.168.123.0/24
Configure an OpenStack router without HA
neutron router-create prod-router --ha False
Set the router gateway to our external provider network and add an interface to the tenant network
neutron router-gateway-set prod-router external
neutron router-interface-add prod-router internal_subnet
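To verify networking end to end, boot a test instance on the internal network and attach a floating IP from the external network. The image, flavor and instance names below are examples; substitute whatever you have loaded into Glance and use the network ID and floating IP returned by the first two commands.
neutron net-list
nova boot --flavor m1.small --image rhel7 --nic net-id=<internal net id> test-vm
neutron floatingip-create external
nova floating-ip-associate test-vm <floating ip>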
When complete, the network topology should look something like the diagram below.
Troubleshooting
OpenStack is not for the faint of heart. If you are expecting to click next and grab a coffee, you are in for a rude awakening. OpenStack requires a certain level of Open Source and Linux knowledge. You should understand how puppet and foreman work; these skills are recommended for troubleshooting (especially puppet). You also need decent skills in OpenStack networking (openvswitch) in general. Just because the RHEL OSP installer automates everything doesn't mean it is autopilot.
Before we get into troubleshooting, let's understand the basic workflow of a RHEL OSP deployment.
- provision nodes and install base RHEL
- register nodes with subscription manager
- download packages from appropriate channels
- configure base networking
- puppet run on all controller nodes (openstack, openvswitch and pacemaker cluster configured)
- puppet run on all compute nodes (openstack nova and openvswitch configured)
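If a deployment stalls, a few commands run on the affected controller or compute node help identify how far it got through the workflow above (steps 2 and 3 in particular):
#subscription-manager status
#yum repolist enabled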
Here are some common problems that I have run into, hopefully they are helpful.
- Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Local ip for ovs agent must be set when tunneling is enabled at
/etc/puppet/environments/production/modules/neutron/manifests/agents/ovs.pp:32 on node controller1.lab.local
Solution: this is a configuration problem. It will occur on the controller node and means that the controller does not have access to a particular network. In this case it was due to only giving the compute node access to the tenant network. This error requires re-configuring the network and deployment in the admin node and starting over.
- Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed when searching for node ostack-ctr1.lab.local: Failed to find ostack-ctr1.lab.local via exec: Execution of '/etc/puppet/node.rb ostack-ctr1.lab.local' returned 1:
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Solution: This error was caused mainly by a bug fixed in RHEL OSP 6.0.1. Ensure you are running RHEL OSP 6.0.1 and RHEL 7.1 and have performed a yum update. This error requires a re-install of the admin node itself to update to the latest version.
- Error: Deployment completes but br-ex is missing.
Solution: br-ex is an openvswitch bridge that handles external access for instances via floating IPs. If br-ex is missing you won't have the ability to assign floating IPs to instances for external access. This could be a configuration problem in the deployment. Edit the controller host and ensure the external network has the correct subnet.
To create the br-ex openvswitch bridge manually, follow these steps:
- Ensure the physical NIC, in this case eth1, is configured correctly
#cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
DEVICETYPE=ovs
TYPE=OVSPort
OVS_BRIDGE=br-ex
ONBOOT=yes
BOOTPROTO=none
- Ensure the br-ex interface for openvswitch is configured correctly
#cat /etc/sysconfig/network-scripts/ifcfg-br-ex
IPADDR="192.168.123.43"
NETMASK="255.255.255.0"
GATEWAY="192.168.123.1"
ONBOOT=yes
PEERROUTES=no
NM_CONTROLLED=no
DEFROUTE=no
PEERDNS=no
DEVICE=br-ex
DEVICETYPE=ovs
OVSBOOTPROTO="none"
TYPE=OVSBridge
- Create br-ex bridge
#ovs-vsctl add-br br-ex
- Add the physical NIC to the bridge as an openvswitch port
#ovs-vsctl add-port br-ex eth1
- Create a patch port on br-int that patches over to br-ex
#ovs-vsctl add-port br-int br-int-ex -- set Interface br-int-ex type=patch options:peer=phy-br-ex
- Create an openvswitch patch port on br-ex that patches over to br-int
#ovs-vsctl add-port br-ex phy-br-ex -- set Interface phy-br-ex type=patch options:peer=br-int-ex
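Verify the wiring with ovs-vsctl show; the br-ex section of the output should look similar to the following:
#ovs-vsctl show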
Bridge br-ex
    Port "eth1"
        Interface "eth1"
    Port phy-br-ex
        Interface phy-br-ex
            type: patch
            options: {peer=br-int-ex}
    Port br-ex
        Interface br-ex
            type: internal
- Error: registering host with foreman (https://admin.osp.lab.com) could not send facts to foreman: connection refused - connect(2)
Solution: this error occurs during discovery of hosts by foreman. It indicates a DNS or firewall problem. RHEL OSP 6.0.1 fixed a lot of problems with discovery, but it is important to ensure DNS and DHCP are working correctly from the admin node.
- Error: Openvswitch interfaces show down when issuing "ip a" command
Solution: this is generally not a problem. Openvswitch creates a Linux interface for every bridge for compatibility reasons; these interfaces are not otherwise used. The kernel does not recognize these devices correctly and hence reports them as down.
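To confirm the bridges and ports are actually wired correctly, ignore "ip a" and query openvswitch itself:
#ovs-vsctl show
#ovs-ofctl show br-ex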
- Error: Puppet throws error no certificate found and waitforcert is disabled
Solution: this problem is fairly uncommon; however, if it happens you need to regenerate certificates. Below is the process.
On puppet master (admin node)
#puppet cert sign --all
#puppet cert clean --all
On puppet agent (controller or compute)
#rm -rf /var/lib/puppet/ssl/*
#vi /etc/puppet/puppet.conf
certificate_revocation = false
#puppet agent --no-daemonize --server admin.local.lab --onetime --verbose
On puppet master (admin node)
#puppet cert --list
#puppet cert sign "controller or compute hostname"
Restarting Puppet
If puppet fails for any reason, run the following command on the controller or compute node where the error occurred in order to restart puppet:
#puppet agent -td
Summary
Enterprise OpenStack is not just about support; it is so much more. As we have seen, operationalizing OpenStack in enterprise environments requires automation, provisioning, central management / monitoring, declarative configuration management, Linux experience / expertise and of course world-class support from a premier Open Source company like Red Hat. The biggest difference between OpenStack distributions, the feature disparity if you will, lies within the admin node. OpenStack is OpenStack, but how it is deployed and maintained will determine your success. This is just the beginning: if you like the current capabilities you are going to love what is coming down the pipe with TripleO (OpenStack on OpenStack) and the overcloud / undercloud model. Things are just getting started!
As always if you have feedback, ideas or suggestions I would love to hear about them.
Happy Stacking!
(c) 2015 Keith Tenzer