Overview
Intelligently and automatically scaling applications based on resource requirements is at the heart of the OpenStack value proposition. It is one of the key capabilities that differentiate cloud vs traditional infrastructure such as VMware. For those of us who have been in IT a while auto scaling was always a white unicorn, often discussed but never actually seen. In this article we will talk about how to do this in OpenStack using Heat and Ceilometer.
Heat
Orchestration and automation within OpenStack is handled by Heat. It is the brains of your cloud. Heat provides a declarative structure for defining IT processes using YAML. You could say the value of OpenStack is implemented by Heat as this is where your business processes exist. Heat will let you automatically provision infrastructure (compute, network and storage) based on YAML templates. In addition Heat also lets you create policies around running infrastructure. One such policy is around auto scaling.
Ceilometer
Collecting and persisting utilization measurements within OpenStack is handled by Ceilometer. OpenStack attempts to handle IT infrastructure as a utility and as such metering is a critical aspect. Furthermore, enabling billing systems to provide pay-as-you-go consumption models, requires exact metering. Beyond billing, such metering is also key for auto scaling. Decisions made by Heat on when to scale applications, are based on data collected by Ceilometer.
Auto Scaling Heat Templates
A lot of the configuration information in this article comes from article written by Christian Berendt. In this auto scaling example, Heat and Ceilometer will be used to scale CPU bound virtual machines. Heat has the concept of a stack which is simply the environment itself. The Heat stack template describes the process or logic around how a Heat stack will be built and managed. This is where you can create an auto scaling group and configure Ceilometer thresholds. The environment template explains how to create the stack itself, what image or volume to use, network configuration, software to install and everything an instance or instances need to properly function. You can put everything into the Heat stack template, but separating the Heat stack template from the environment is much cleaner, at least in more complex configurations such as auto scaling.
Environment Template
Below we will create an environment template for a cirros image. The instance template will create an instance based on Cirros image, configure a cinder volume, add IP from private network, add floating IP from public network, add security group, private ssh-key and generate 100% CPU load through user-data. Note: you will need to make changes below depending on your OpenStack configuration. The OpenStack installation and configuration used for these examples can be found in this article.
#vi /etc/heat/templates/cirros.yaml
heat_template_version: 2014-10-16 description: A base Cirros 0.3.4 server resources: server: type: OS::Nova::Server properties: block_device_mapping: - device_name: vda delete_on_termination: true volume_id: { get_resource: volume } flavor: m1.nano key_name: admin networks: - port: { get_resource: port } port: type: OS::Neutron::Port properties: network: private security_groups: - all floating_ip: type: OS::Neutron::FloatingIP properties: floating_network: public floating_ip_assoc: type: OS::Neutron::FloatingIPAssociation properties: floatingip_id: { get_resource: floating_ip } port_id: { get_resource: port } volume: type: OS::Cinder::Volume properties: image: 'Cirros 0.3.4' size: 1
Now that we have an environment template, we need to create a Heat resource type and link it to above file /etc/heat/templates/cirros.yaml.
#vi /root/environment.yaml
resource_registry: "OS::Nova::Server::Cirros": "file:///etc/heat/templates/cirros.yaml"
Stack Template
Below we will define our Heat stack template. We will create the following resources: scaleup_group, scaleup_policy and cpu_alarm_high. The scaleup_group explains how the instance should be scaled and also defines the environment (OS::Nova::Server::Cirros) that points to the environment yaml file “/etc/heat/templates/cirros.yaml”. The scaleup_policy defines how to handle a scale-up event. Finally we have a threshold, the cpu_alarm_high resource is used to trigger a scale-up event. Here we define the threshold and actions provided by Ceilometer.
#vi /root/example.yaml
heat_template_version: 2014-10-16 description: Example auto scale group, policy and alarm resources: scaleup_group: type: OS::Heat::AutoScalingGroup properties: cooldown: 60 desired_capacity: 1 max_size: 3 min_size: 1 resource: type: OS::Nova::Server::Cirros scaleup_policy: type: OS::Heat::ScalingPolicy properties: adjustment_type: change_in_capacity auto_scaling_group_id: { get_resource: scaleup_group } cooldown: 60 scaling_adjustment: 1 scaledown_policy: type: OS::Heat::ScalingPolicy properties: adjustment_type: change_in_capacity auto_scaling_group_id: { get_resource: scaleup_group } cooldown: 60 scaling_adjustment: -1 cpu_alarm_high: type: OS::Ceilometer::Alarm properties: meter_name: cpu_util statistic: avg period: 60 evaluation_periods: 1 threshold: 50 alarm_actions: - {get_attr: [scaleup_policy, alarm_url]} comparison_operator: gt cpu_alarm_low: type: OS::Ceilometer::Alarm properties: meter_name: cpu_util statistic: avg period: 60 evaluation_periods: 1 threshold: 10 alarm_actions: - {get_attr: [scaledown_policy, alarm_url]} comparison_operator: lt
Update Ceilometer Collection Interval
By default Ceilometer will collect CPU data from instances every 10 minutes. For this example we want to change that to 60 seconds. Change the interval to 60 in the pipeline.yaml file and restart OpenStack services.
#vi /etc/ceilometer/pipeline.yaml
- name: cpu_source interval: 60 meters: - "cpu" sinks: - cpu_sink
#openstack-service restart
Running Heat Stack
At this point we are ready to run our auto scaling Heat stack. The expected results should be that a single Cirros instance is launched. It should have private and floating IPs.
#heat stack-create example -f /root/example.yaml -e /root/environment.yaml +--------------------------------------+------------+--------------------+----------------------+ | id | stack_name | stack_status | creation_time | +--------------------------------------+------------+--------------------+----------------------+ | 6fca513c-25a1-4849-b7ab-909e37f52eca | example | CREATE_IN_PROGRESS | 2015-08-31T16:18:02Z | +--------------------------------------+------------+--------------------+----------------------+
Heat will create the stack and launch the one cirros instance.
#nova list +--------------------------------------+-------------------------------------------------------+--------+------------+-------------+--------------------------------------+ | ID | Name | Status | Task State | Power State | Networks | +--------------------------------------+-------------------------------------------------------+--------+------------+-------------+--------------------------------------+ | 3f627c84-06aa-4782-8c12-29409964cc73 | ex-qeki-3azno6me5gvm-pqmr5zd6kuhm-server-gieck7uoyrwc | ACTIVE | - | Running | private=10.10.1.156, 192.168.122.234 | +--------------------------------------+-------------------------------------------------------+--------+------------+-------------+--------------------------------------+
Heat will also create two cpu alarms which are used to trigger scale-up or scale-down events.
ceilometer alarm-list +--------------------------------------+-------------------------------------+-------------------+----------+---------+------------+--------------------------------+------------------+ | Alarm ID | Name | State | Severity | Enabled | Continuous | Alarm condition | Time constraints | +--------------------------------------+-------------------------------------+-------------------+----------+---------+------------+--------------------------------+------------------+ | 04b4f845-f5b6-4c5a-8af0-59e03c22e6fa | example-cpu_alarm_high-rd5kysmlahvx | ok | low | True | True | cpu_util > 50.0 during 1 x 60s | None | | ac81cd81-20b3-45f9-bea4-e51f00499602 | example-cpu_alarm_low-6t65kswutupz | ok | low | True | True | cpu_util < 10.0 during 1 x 60s | None | +--------------------------------------+-------------------------------------+-------------------+----------+---------+------------+--------------------------------+------------------+
Automatically Scaling Up
Heat will scale instances up based on the cpu_alarm_high threshold. Once CPU utilization is above 50% instances will be scaled up. In order to generate CPU load, log into the instance and run the “dd” command.
$ssh -i admin.pem cirros@192.168.122.232 $sudo -i #dd if=/dev/zero of=/dev/null & #dd if=/dev/zero of=/dev/null & #dd if=/dev/zero of=/dev/null &
Upon running “dd” commands we should have close to 100% CPU utilization in our cirros instance. After 60 seconds we should see that Heat has scaled and we have two instances.
#ceilometer alarm-list +--------------------------------------+-------------------------------------+-------+----------+---------+------------+--------------------------------+------------------+ | Alarm ID | Name | State | Severity | Enabled | Continuous | Alarm condition | Time constraints | +--------------------------------------+-------------------------------------+-------+----------+---------+------------+--------------------------------+------------------+ | 04b4f845-f5b6-4c5a-8af0-59e03c22e6fa | example-cpu_alarm_high-rd5kysmlahvx | ok | low | True | True | cpu_util > 50.0 during 1 x 60s | None | | ac81cd81-20b3-45f9-bea4-e51f00499602 | example-cpu_alarm_low-6t65kswutupz | alarm | low | True | True | cpu_util < 10.0 during 1 x 60s | None | +--------------------------------------+-------------------------------------+-------+----------+---------+------------+--------------------------------+------------------+
#nova list +--------------------------------------+-------------------------------------------------------+--------+------------+-------------+--------------------------------------+ | ID | Name | Status | Task State | Power State | Networks | +--------------------------------------+-------------------------------------------------------+--------+------------+-------------+--------------------------------------+ | 3f627c84-06aa-4782-8c12-29409964cc73 | ex-qeki-3azno6me5gvm-pqmr5zd6kuhm-server-gieck7uoyrwc | ACTIVE | - | Running | private=10.10.1.156, 192.168.122.234 | | 0f69dfbe-4654-474f-9308-1b64de3f5c18 | ex-qeki-qmvor5rkptj7-krq7i66h6n7b-server-b4pk3dzjvbpi | ACTIVE | - | Running | private=10.10.1.157, 192.168.122.235 | +--------------------------------------+-------------------------------------------------------+--------+------------+-------------+--------------------------------------+
After additional 60 seconds we should see that Heat has scaled again and we have three instances. Since three is the maximum for this configuration, we will not scale any higher.
#nova list +--------------------------------------+-------------------------------------------------------+--------+------------+-------------+--------------------------------------+ | ID | Name | Status | Task State | Power State | Networks | +--------------------------------------+-------------------------------------------------------+--------+------------+-------------+--------------------------------------+ | 3f627c84-06aa-4782-8c12-29409964cc73 | ex-qeki-3azno6me5gvm-pqmr5zd6kuhm-server-gieck7uoyrwc | ACTIVE | - | Running | private=10.10.1.156, 192.168.122.234 | | 0e805e75-aa6f-4375-b057-2c173b68f172 | ex-qeki-gajdwmu2cgm2-vckf4g2gpwis-server-r3smbhtqij76 | ACTIVE | - | Running | private=10.10.1.158, 192.168.122.236 | | 0f69dfbe-4654-474f-9308-1b64de3f5c18 | ex-qeki-qmvor5rkptj7-krq7i66h6n7b-server-b4pk3dzjvbpi | ACTIVE | - | Running | private=10.10.1.157, 192.168.122.235 | +--------------------------------------+-------------------------------------------------------+--------+------------+-------------+--------------------------------------+
Automatically Scaling Down
Heat will scale instances down based on the cpu_alarm_low threshold. Once CPU utilization is below 10% instances will be scaled down. We can simply kill the “dd” processes and watch Heat scale instances back down.
After stopping “”dd” processes we should see that the cpu_alarm_low event is triggered. This will cause Heat to scale down and remove instance.
#ceilometer alarm-list +--------------------------------------+-------------------------------------+-------+----------+---------+------------+--------------------------------+------------------+ | Alarm ID | Name | State | Severity | Enabled | Continuous | Alarm condition | Time constraints | +--------------------------------------+-------------------------------------+-------+----------+---------+------------+--------------------------------+------------------+ | 04b4f845-f5b6-4c5a-8af0-59e03c22e6fa | example-cpu_alarm_high-rd5kysmlahvx | ok | low | True | True | cpu_util > 50.0 during 1 x 60s | None | | ac81cd81-20b3-45f9-bea4-e51f00499602 | example-cpu_alarm_low-6t65kswutupz | alarm | low | True | True | cpu_util < 10.0 during 1 x 60s | None | +--------------------------------------+-------------------------------------+-------+----------+---------+------------+--------------------------------+------------------+
After a few minutes we should be back to a single instance.
Summary
In this article we talked about how Heat and Ceilometer work together within OpenStack, providing the brains behind your cloud. We looked at a typical cloud use case around auto scaling and how to configure auto scaling through Heat. Hopefully you found this article informative. As always any feedback is greatly appreciated and as we say at Red Hat, sharing is caring.
Happy Auto Scaling!
(c) 2015 Keith Tenzer
Thank you for the article. I am currently dealing with using Autoscaling in Openstack, but I have issue here. I can see in heat-engine.log that alarm is in new state, and that the autoscaling group is being adjusted by 1, but after this there is an authentication error and as a result no new virtual machine is spawned. Do you have an idea what could be wrong?
LikeLike
Did you make sure all the correct components are installed? You need Heat_CloudWatch and Heat_CFN. If you are using RDO you can follow this documentation to properly prepare OpenStack environment: https://keithtenzer.com/2016/01/04/openstack-liberty-lab-installation-and-configuration-guide/
LikeLike
Actually I was not owner of the Open stack cloud, but they moved to another Red Hat Open Stack, where the autoscaling is working using webhook. However, cpu util alarm is not working well, the alarm is still in “insufficient data” state. I configured the “cpu_alarm_low” and “cpu_alarm_high” configuration period to 60, and ceilometer interval in pipeline.yaml to 60, as you say in the article. Do you know what could be causing the problem?
LikeLike
Hi Michal,
The issue you refer to I have also experienced. It was caused by Heat CFN missing. You Need Heat, Ceilometer, Heat CloudWatch and Heat CFN in order for scaling policies to work. Let me know if this helps?
Regards,
Keith
LikeLike
Could you tell me how can I install/enable the Heat CFN, and what it actually does?
Regards,
Michal
LikeLike
Hi
I try to implement the Autoscaling using with Ceilometer. I am able to create stack with heat template and I stree the CPU untilization morethan 90% . but still scaling is not happening.
Also I see one issue Ceilometer alarm state is “insufficient data”
[root@icmcs ceilometer]# ceilometer alarm-list
+————————————–+————————————————-+——————-+———-+———+————+—————————————————————–+——————+
| Alarm ID | Name | State | Severity | Enabled | Continuous | Alarm condition | Time constraints |
+————————————–+————————————————-+——————-+———-+———+————+————————————————–
| cacbac69-14eb-49c1-b1b1-042f8e0d3e52 | demo-autoscaling1-cpu_alarm_low-imzkpfqywqvn | insufficient data | low | True | True | cpu-util 50.0 during 1 x 50s | None |
+————————————–+————————————————-+——————-+———-+———+————+————————————————–
Ceilometer cpu_util alos not capturing.
[root@icmcs ceilometer]# ceilometer statistics -m cpu_util
+——–+————–+————+—–+—–+—–+—–+——-+———-+—————-+————–+
| Period | Period Start | Period End | Max | Min | Avg | Sum | Count | Duration | Duration Start | Duration End |
+——–+————–+————+—–+—–+—–+—–+——-+———-+—————-+————–+
I appreciate help on this. thanks.
LikeLike
autoscaling Expecting to find username or userId in passwordCredentials – the server could not comply with the request since it is either malformed or otherwise incorrect
I faced above nova error..
what’s the matter?..
Help me
LikeLike
DOn’t quite have enough information to help, is the instance not starting or is this an error with cloudinit? What image are you using?
Keith
LikeLike
Is this based on liberty RDO ?
LikeLike
I did this originally on kilo but it should work in liberty and even newer releases. The only thing that might change is heat template syntax but you newer osp versions support older heat templates which is why you have template version.
LikeLike
It can take a while for ceilometer to gather data so I wonder if that is issue? You can change default polling to something more aggressive. If you look at my blog on autoscaling this is documented. Let me know how it goes?
LikeLike
Hi Keith,
I was trying to set autoscaling in RHOSP 8 (packstack based setup) i was facing issue where instance is not getting created even if the CPU load is high.
——+——————————–+——————+
| Alarm ID | Name | State | Enabled | Continuous | Alarm condition | Time constraints |
+————————————–+————————————–+——-+———+————+——————————–+——————+
| 559ab131-ad12-4d40-b68d-bf8169011eec | etisalat-cpu_alarm_high-xr4b22tok2f3 | ok | True | True | cpu_util > 60.0 during 1 x 30s | None |
| f77648f3-12f4-4e4d-b0e1-86e77872d478 | etisalat-cpu_alarm_low-uy3fjjc6mcyq | alarm | True | True | cpu_util < 30.0 during 1 x 30s | None |
+————————————–+————————————–+——-+———+————+——————————–+——————+
== Ceilometer services ==
openstack-ceilometer-api: active
openstack-ceilometer-central: active
openstack-ceilometer-compute: active
openstack-ceilometer-collector: active
openstack-ceilometer-alarm-notifier: active
openstack-ceilometer-alarm-evaluator: active
openstack-ceilometer-notification: active
== Heat services ==
openstack-heat-api: active
openstack-heat-api-cfn: active
openstack-heat-api-cloudwatch: active
openstack-heat-engine: active
Please suggest.
Thanks in advance 🙂
LikeLike