Pacemaker – The Open Source, High Availability Cluster

Overview

Pacemaker is an open source, high availability cluster. Pacemaker, and Linux clustering in general, have been around for a very long time, and both have matured greatly over the past 10 to 15 years. Today Pacemaker is a simple, streamlined bundle that includes the Pacemaker cluster manager, fencing agents, resource agents and the heartbeat layer (Corosync). Pacemaker is available in Red Hat Enterprise Linux 7 as the High Availability Add-On. In this article I will provide an overview of Pacemaker and a tutorial on how to set up a two-node Pacemaker cluster for Apache using shared storage.

[Figure: fencing example]

Pacemaker Basics

As mentioned, Pacemaker has a few components: clustering, fence agents, resource agents and Corosync.

Clustering

Pacemaker provides all the packages to configure and manage a high availability cluster using the CLI or GUI.

Fence Agents

Fencing is about shutting off a node that has become unstable or unresponsive so that it cannot damage the cluster or any cluster resources. The main reason we have fencing is to eliminate the possibility of a split-brain, where multiple nodes access the same resources at the same time. A split-brain can lead to data corruption and general cluster malfunction.

There are two types of fencing agents in Pacemaker: power and storage. Power fencing is the most commonly used. These agents connect to hardware such as a UPS, blade chassis or iLO card and are responsible for powering off a node in the event that it becomes unresponsive. The other type is storage-based fencing. Typically it uses SCSI-3 PR (Persistent Reservations) so that only registered nodes can write to the shared storage; fencing a node removes its registration and cuts it off from the disk. This of course requires shared storage that supports SCSI-3 PR. The fencing service in Pacemaker is called STONITH (Shoot The Other Node In The Head).

Open source clustering differs slightly from commercial clustering in its approach to fencing. Open source has always taken a very conservative approach, and IMHO that is a good thing: the last thing you want is data corruption. If fencing does not work, Pacemaker will not recover resources from the failed node, so they stay unavailable until someone intervenes manually, but your data is safe. Commercial solutions have very elaborate and complex fencing procedures and try to ensure failover is always automated, even when a problem occurs. I am not saying commercial software isn't bullet-proof, just that there is a design difference in this regard.
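To make the storage-fencing idea concrete, here is a toy model in shell. It is not how fence_scsi is implemented (the real agent sends SCSI PR commands to the disk, for example through sg_persist); a plain text file stands in for the disk's list of registered keys, and the function names are made up for this illustration. Only registered nodes may write, and fencing a node simply removes its key.

```shell
# Toy model of SCSI-3 PR based fencing (illustration only, not fence_scsi).
# A text file stands in for the disk's list of registered keys, one per line.

# A node registers its key on the shared disk when it joins ("unfencing").
pr_register() {   # pr_register KEYFILE KEY
    grep -qxF "$2" "$1" 2>/dev/null || echo "$2" >> "$1"
}

# Fencing removes the victim's key; the victim can no longer write.
pr_preempt() {    # pr_preempt KEYFILE VICTIM_KEY
    # grep exits 1 when no keys are left, which is not an error here
    grep -vxF "$2" "$1" > "$1.tmp" || true
    mv "$1.tmp" "$1"
}

# Only registered nodes may write to the disk.
pr_can_write() {  # pr_can_write KEYFILE KEY
    grep -qxF "$2" "$1"
}
```

In this model fencing pm-node2 is `pr_preempt disk.keys node2key`; afterwards `pr_can_write disk.keys node2key` fails. That is the property that protects the data: even if pm-node2 is still running, its writes are refused.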

Resource Agents

Pacemaker resource agents are scripts that integrate applications with the cluster. A resource agent understands a specific application and its dependencies, so using a resource agent (if one exists) makes configuring an application much simpler. It also ensures that clustering best practices for that application are enforced.
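Most Pacemaker resource agents follow the OCF (Open Cluster Framework) convention: the cluster calls the agent with an action such as start, stop or monitor, passes parameters as OCF_RESKEY_<name> environment variables and interprets the exit code. The sketch below is a hypothetical agent for an imaginary application, toy_app, whose "running" state is just a file named by a made-up state_file parameter. A real agent would also implement meta-data and validate-all and would `exit` with these codes rather than return them from functions.

```shell
# Hypothetical minimal OCF-style agent for an imaginary "toy_app".
# Pacemaker would pass the (made-up) parameter as OCF_RESKEY_state_file.

# Standard OCF exit codes used below
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

toy_app_monitor() {
    # "running" means the state file exists; cleanly stopped is not an error
    if [ -f "$OCF_RESKEY_state_file" ]; then return $OCF_SUCCESS; fi
    return $OCF_NOT_RUNNING
}

toy_app_start() {
    # start must be idempotent: starting a running resource is a success
    toy_app_monitor && return $OCF_SUCCESS
    touch "$OCF_RESKEY_state_file" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

toy_app_stop() {
    # stop must be idempotent too: stopping a stopped resource is a success
    rm -f "$OCF_RESKEY_state_file" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

toy_app_dispatch() {  # toy_app_dispatch ACTION
    case "$1" in
        start)   toy_app_start ;;
        stop)    toy_app_stop ;;
        monitor) toy_app_monitor ;;
        *)       return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```

The key design point is idempotency: start on a running resource and stop on a stopped one both succeed, and monitor distinguishes "not running" (7) from a real failure, which is what lets the cluster decide between recovery and doing nothing.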

Corosync

Pacemaker requires a heartbeat for internal communication. The Corosync daemon provides communication between the cluster nodes, and it is also responsible for quorum if quorum is used.
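Quorum is majority arithmetic: with N nodes of one vote each, a partition needs floor(N/2)+1 votes. That is awkward for two nodes, where losing either one drops below a majority, so Corosync's votequorum has a two_node option that lets a single surviving node keep quorum (it also turns on wait_for_all by default, so both nodes must be seen once at startup). This is also why fencing matters so much in a two-node cluster. The functions below are only an illustration of the arithmetic, not something Corosync runs; for a two-node cluster created with pcs, the two_node setting is normally written to corosync.conf for you.

```shell
# Illustration of Corosync-style vote arithmetic (one vote per node).

# Votes needed for quorum: a strict majority, floor(N/2)+1.
# With two_node=1 on a two-node cluster, a single vote is enough.
quorum_votes() {  # quorum_votes EXPECTED_VOTES [TWO_NODE]
    if [ "$1" -eq 2 ] && [ "${2:-0}" -eq 1 ]; then
        echo 1
    else
        echo $(( $1 / 2 + 1 ))
    fi
}

# Succeeds if a partition holding ONLINE_VOTES votes is quorate.
has_quorum() {    # has_quorum EXPECTED_VOTES ONLINE_VOTES [TWO_NODE]
    [ "$2" -ge "$(quorum_votes "$1" "${3:-0}")" ]
}
```

For example, `has_quorum 3 2` succeeds (2 of 3 is a majority), `has_quorum 2 1` fails, and `has_quorum 2 1 1` succeeds because of two_node mode.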

Pacemaker Tutorial

Now that we have a high-level understanding of Pacemaker, it is time to get our hands a bit dirty and configure a cluster. For this tutorial we will use a simple example of a two-node cluster for Apache. We will configure the cluster, set up storage-based fencing and configure a resource group for Apache.

Install Pacemaker

Perform the following steps on both cluster nodes:

  • Install RHEL / CentOS 7.1 (minimal)
  • Configure subscription and repos (RHEL 7)
#subscription-manager register
#subscription-manager list --available
#subscription-manager attach --pool=<pool id>
#subscription-manager repos --enable=rhel-ha-for-rhel-7-server-rpms
  • Install Pacemaker packages
#yum update -y
#yum install -y pcs fence-agents-all
  • Open firewall ports
#firewall-cmd --permanent --add-service=high-availability
#firewall-cmd --reload
  • Set hacluster password
#echo CHANGEME | passwd --stdin hacluster
  • Enable services
#systemctl start pcsd.service
#systemctl enable pcsd.service
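  • Configure the iSCSI client (replace <hostname> with the node's short name, e.g. pm-node1, so it matches the ACLs created on the target)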
#yum install -y iscsi-initiator-utils
#vi /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2015-06.com.lab:<hostname>
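The initiator name on each node must match one of the ACLs created on the target (iqn.2015-06.com.lab:pm-node1 and iqn.2015-06.com.lab:pm-node2). A small, hypothetical helper keeps the two in sync by deriving the name from the host name and dropping the domain part:

```shell
# Hypothetical helper: print the /etc/iscsi/initiatorname.iscsi line for a host.
# The domain is stripped so pm-node1.lab.com becomes ...:pm-node1, matching
# the ACL names used on the target.
initiator_line() {  # initiator_line HOSTNAME
    short=${1%%.*}
    echo "InitiatorName=iqn.2015-06.com.lab:$short"
}
```

On each node this could be used as `initiator_line "$(hostname)" > /etc/iscsi/initiatorname.iscsi`, followed by `systemctl restart iscsid` if iscsid is already running, since it reads the name at startup.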

Set Up Shared iSCSI Storage

These steps are optional if you already have iSCSI storage configured. In these steps we will configure a third RHEL / CentOS system to provide shared storage using iSCSI.

  • Install RHEL / CentOS 7.1 (minimal)
  • Install iSCSI packages
#yum install -y targetcli
  • Enable iSCSI service
#systemctl enable target
  • Create LVM disk (this will be the shared storage device)
#fdisk /dev/vdb (create new partition of type LVM)
#pvcreate /dev/vdb1
#vgcreate cluster_vg /dev/vdb1
#lvcreate -L 990M cluster_vg -n cluster_disk1
#mkfs -t ext4 /dev/cluster_vg/cluster_disk1
  • Configure iSCSI target
# targetcli
/> backstores/block create disk1 /dev/cluster_vg/cluster_disk1
/> iscsi/ create iqn.2015-06.com.lab:rhel7
/> /iscsi/iqn.2015-06.com.lab:rhel7/tpg1/portals/ create
/> iscsi/iqn.2015-06.com.lab:rhel7/tpg1/luns create /backstores/block/disk1
/> iscsi/iqn.2015-06.com.lab:rhel7/tpg1/acls create iqn.2015-06.com.lab:pm-node1
/> iscsi/iqn.2015-06.com.lab:rhel7/tpg1/acls create iqn.2015-06.com.lab:pm-node2
/> exit
  • Open firewall ports
#firewall-cmd --permanent --add-port=3260/tcp
#firewall-cmd --reload
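The cluster nodes still need to discover and log in to the new target before the shared disk shows up on them. The sketch below wraps the two iscsiadm commands for that. The portal address is an assumption (use your storage server's IP), the IQN is the target created above, and the hypothetical maybe_run wrapper only prints the commands unless DRY_RUN=0 is set.

```shell
# Sketch: discover and log in to the shared iSCSI target from a cluster node.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
maybe_run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

iscsi_login() {  # iscsi_login PORTAL TARGET_IQN
    # ask the target server which targets it offers
    maybe_run iscsiadm -m discovery -t sendtargets -p "$1"
    # log in so the target's LUN appears as a local block device
    maybe_run iscsiadm -m node -T "$2" -p "$1" --login
}
```

On each cluster node, something like `DRY_RUN=0 iscsi_login 192.168.122.10 iqn.2015-06.com.lab:rhel7` (hypothetical address) should make the new disk visible in `lsblk`.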

Create Cluster

At this point both cluster nodes have all the cluster packages and have access to the shared iSCSI storage. In this section we will configure the cluster from one of the nodes.

  • Authorize cluster nodes
#pcs cluster auth pm-node1.lab.com pm-node2.lab.com
Username: hacluster
Password:
pm-node1.lab.com: Authorized
pm-node2.lab.com: Authorized
  • Set up the cluster
#pcs cluster setup --start --name mycluster pm-node1.lab.com pm-node2.lab.com
  • Enable services
#pcs cluster enable --all
  • Check cluster status
# pcs cluster status
Cluster Status:
Last updated: Fri Jun 19 14:10:24 2015
Last change: Fri Jun 19 14:09:15 2015
Stack: corosync
Current DC: pm-node1.lab.com (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
0 Resources configured
PCSD Status:
pm-node1.lab.com: Online
pm-node2.lab.com: Online
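When scripting the setup, it can help to check the status output rather than eyeball it. These hypothetical helpers read `pcs cluster status` output on stdin and check for quorum and for the number of nodes whose pcsd reports Online. They depend on the output format shown above, which can change between Pacemaker versions.

```shell
# Hypothetical checks over `pcs cluster status` output read from stdin.

# Succeeds if the DC line reports quorum. Without quorum the status shows
# "partition WITHOUT quorum", which this case-sensitive match does not accept.
cluster_quorate() {
    grep -q 'partition with quorum'
}

# Prints how many "<node>: Online" lines the PCSD Status section contains.
pcsd_online_count() {
    grep -cE '^[[:space:]]*[^[:space:]]+: Online[[:space:]]*$'
}
```

For example, `pcs cluster status | cluster_quorate && echo "quorum OK"`.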

Storage Fencing

Before creating resource groups, fencing needs to be configured. In this example we will use storage fencing with fence_scsi. Fencing only needs to be configured on one of the nodes.

  • Configure stonith
#pcs stonith create scsi fence_scsi pcmk_host_list="pm-node1.lab.com pm-node2.lab.com" pcmk_monitor_action="metadata" pcmk_reboot_action="off" devices="/dev/mapper/cluster_vg-disk1" meta provides="unfencing"
  • Check status of fencing
#pcs stonith show
 scsi (stonith:fence_scsi): Started
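In the same spirit, here is a hypothetical check that every stonith device is started. It accepts both the `pcs stonith show` format above and the `pcs status` format, where the node name follows Started. It fails if no stonith device is listed at all, since a cluster without fencing should not be trusted with shared storage.

```shell
# Hypothetical check over `pcs stonith show` or `pcs status` output (stdin):
# succeed only if at least one stonith device is listed and all are Started.
stonith_all_started() {
    awk '
        /\(stonith:/ {
            n++
            if ($0 !~ /\):[ \t]*Started/) bad++
        }
        END { if (n == 0 || bad) exit 1; exit 0 }'
}
```

For example, `pcs stonith show | stonith_all_started || echo "fencing is not ready"`.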

Resource Group

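Now that the cluster is running and fencing has been configured, we can set up the resource group. A resource group defines application dependencies and ensures the application is started correctly in the event of a failover. Because either node may end up running Apache, the Apache packages, firewall rule, httpd.conf change and LVM configuration below are needed on both nodes; preparing the content on the shared storage and creating the pcs resources only needs to be done on one node.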
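A resource group behaves like an ordered, colocated list: its members run on the same node, start in the order they were added and stop in reverse order. The toy function below only prints that sequence for a planned move of a group (for example with `pcs resource move`); it is an illustration, not Pacemaker's scheduler, and after a node failure the failed node is fenced instead of having its resources stopped cleanly.

```shell
# Toy model (illustration only): the order in which a simple resource group
# is moved between nodes. Stop in reverse order on the old node, then start
# in the listed order on the new node.
group_failover_plan() {  # group_failover_plan FROM_NODE TO_NODE MEMBER...
    from=$1; to=$2; shift 2
    reversed=""
    for r in "$@"; do reversed="$r $reversed"; done
    for r in $reversed; do echo "stop $r on $from"; done
    for r in "$@"; do echo "start $r on $to"; done
}
```

With the members in the order this tutorial adds them, `group_failover_plan pm-node1.lab.com pm-node2.lab.com disk1 apache_fs VirtualIP Website` stops Website first and disk1 last, then starts disk1 first and Website last: the volume group must be active before the file system is mounted, and Apache comes up only once its content and IP address are in place.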

  • Install application packages (Apache)
#yum install -y httpd wget
  • Open firewall ports
#firewall-cmd --permanent --add-service=http
#firewall-cmd --reload
  • Configure Apache
#vi /etc/httpd/conf/httpd.conf
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 127.0.0.1
</Location>
  • Mount shared storage
# mount /dev/cluster_vg/disk1 /var/www/
# mkdir /var/www/html
# mkdir /var/www/cgi-bin
# mkdir /var/www/error
# restorecon -R /var/www
# cat <<-END >/var/www/html/index.html
<html>
<body>Hello</body>
</html>
END
#umount /var/www
  • Configure LVM so it only activates volume groups not owned by the cluster

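Note: it is important not to allow LVM to activate the cluster-owned volume group, in this case cluster_vg. volume_list should only contain the local volume groups, such as the root volume group (rhel on a default RHEL 7 install; check yours with vgs).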

#vi /etc/lvm/lvm.conf
volume_list = [ "rhel" ]
use_lvmetad=0
  • Ensure the boot image (initramfs) does not try to activate the cluster volume group
#dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)
  • Create resource for LVM disk
#pcs resource create disk1 LVM volgrpname=cluster_vg exclusive=true --group apachegroup
  • Create resource for filesystem
#pcs resource create apache_fs Filesystem device="/dev/cluster_vg/disk1" directory="/var/www" fstype="ext4" --group apachegroup
  • Create resource for virtual IP address
#pcs resource create VirtualIP IPaddr2 ip=192.168.122.52 cidr_netmask=24 --group apachegroup
  • Create resource for website
#pcs resource create Website apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" --group apachegroup
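Before relying on the group, it is worth double-checking the lvm.conf change, because a volume group that LVM activates on its own at boot bypasses the cluster. This hypothetical check reads lvm.conf on stdin and reports whether a volume group would be auto-activated; if volume_list is not set at all, every volume group is activated. It is deliberately simple and only handles a volume_list written on a single line with plain volume group names.

```shell
# Hypothetical check: would LVM auto-activate volume group VG_NAME, given an
# lvm.conf on stdin? Simplified: handles volume_list written on one line with
# plain VG names only (no "vg/lv" entries or @tags).
vg_auto_activated() {  # vg_auto_activated VG_NAME < lvm.conf
    awk -v vg="\"$1\"" '
        # an uncommented volume_list setting
        /^[ \t]*volume_list[ \t]*=/ { set = 1; if (index($0, vg)) hit = 1 }
        # no volume_list at all means every VG is activated
        END { if (!set || hit) exit 0; exit 1 }'
}
```

On each node, `vg_auto_activated cluster_vg < /etc/lvm/lvm.conf && echo "WARNING: cluster_vg is not excluded"` should print nothing.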

At this point the cluster should be configured and look similar to our example.

#pcs status
Cluster name: mycluster
Last updated: Mon Jun 22 14:47:49 2015
Last change: Mon Jun 22 12:25:14 2015
Stack: corosync
Current DC: pm-node2.lab.com (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
5 Resources configured
Online: [ pm-node1.lab.com pm-node2.lab.com ]
Full list of resources:
Resource Group: apachegroup
 disk1 (ocf::heartbeat:LVM): Started pm-node1.lab.com 
 VirtualIP (ocf::heartbeat:IPaddr2): Started pm-node1.lab.com 
 apache_fs (ocf::heartbeat:Filesystem): Started pm-node1.lab.com 
 Website (ocf::heartbeat:apache): Started pm-node1.lab.com 
 scsi (stonith:fence_scsi): Started pm-node2.lab.com
PCSD Status:
 pm-node1.lab.com: Online
 pm-node2.lab.com: Online
Daemon Status:
 corosync: active/enabled
 pacemaker: active/enabled
 pcsd: active/enabled
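Because group members must run together, a quick sanity check on output like the above is that every member of apachegroup is Started on the same node. The hypothetical function below does that with awk. It assumes the layout shown above (a "Resource Group:" header followed by member lines containing "(ocf::"), so it may need adjusting for other pcs versions.

```shell
# Hypothetical check over `pcs status` output (stdin): succeed only if every
# member of GROUP_NAME is "Started" on one and the same node.
group_on_one_node() {  # group_on_one_node GROUP_NAME < pcs-status.txt
    awk -v grp="$1" '
        $0 ~ ("^[ \t]*Resource Group: " grp "[ \t]*$") { in_grp = 1; next }
        in_grp && /\(ocf::/ {
            n++
            # member lines end in "Started <node>"
            if ($(NF-1) != "Started") bad++
            else if (node == "") node = $NF
            else if ($NF != node) bad++
            next
        }
        # the first non-member line ends the group section
        in_grp { in_grp = 0 }
        END { if (n == 0 || bad) exit 1; exit 0 }'
}
```

For example, `pcs status | group_on_one_node apachegroup || echo "apachegroup is not healthy"`.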

If there are problems, the “pcs resource debug-start <resource>” command can be used for troubleshooting.

#pcs resource debug-start disk1

Pacemaker GUI

I was pleasantly surprised by the new Pacemaker GUI. The upstream community has done a terrific job, and the GUI can even be used to manage multiple clusters. Below are some screenshots to give you a better idea.

To access the GUI, use the following URL and log in as hacluster.

https://pm-node1:2224

The screenshot below shows the interface for managing clusters.

[Screenshot: Pacemaker GUI, cluster management]

The screenshot below shows the interface for managing nodes.

[Screenshot: Pacemaker GUI, node management]

The screenshot below shows the interface for managing resources.

[Screenshot: Pacemaker GUI, resource management]

Summary

Pacemaker is the result of open source innovation and long maturity. We have learned about Pacemaker basics and even configured a cluster for Apache using shared storage fencing. There is no doubt that Pacemaker is a viable alternative to commercial clustering from HP, IBM, Oracle and Microsoft. If you are interested in traditional clustering, I highly recommend giving Pacemaker a chance; you won’t be disappointed.

Happy Clustering!

(c) 2015 Keith Tenzer

9 thoughts on “Pacemaker – The Open Source, High Availability Cluster”

  1. Nice job! Just one remark.
    Noticed that in first you created iscsi-target. It’s OK, but for what you created filesystem for shared block storage ?
    I suppose you presented to both cluster nodes the raw block device. Only after this you created VG, LV and Filesystem on one of the nodes. Also you didn’t describe login proccess with iscsiadm command on cluster node.

    • Yes, I exposed a disk from a server to both nodes of the cluster using iSCSI. I then created the file system, VG and LV on one node. I also configured LVM not to import that VG, since it goes under Pacemaker control. The reason I needed shared storage is for the website, but I also wanted to show how to set up fencing using fence_scsi. I understand this requires support for SCSI-3 PR; I just did it as an example.

    • Hi Abdur,

      I don't have a guide on exactly how to do this, but it is supported with RHEL 7 and Pacemaker. In Pacemaker there are resource agents, and Oracle is one of them, so you should be able to get an active/passive setup going pretty easily using the built-in Oracle resource agent.

      Regards,

      Keith

  2. Hi Keith.

    Do you think that it is possible to add other filesystem like /dev/cluster_vg/disk2
    that works on pm-node2.lab.com at the same time whit this configuration I mean whit that
    LVM?

    I’m think on a cluster with 2 nodes, 3 filesystem on each nodes and
    the same LVM resource….

    thks

    • Should be possible since you can create multiple LVMs within VG and then mount each of those to filesystem. Important is that you exclude these from normal bootstrap process as I mention in guide since they fall under cluster control.

      • Hi Ketith.
        First off all thank you for your inestimable help on this.

        After follow all the steps of your cluster example,
        I introduce this logical volume
        #lvcreate -L 990M cluster_vg -n cluster_disk2

        So we have 1 volume group: cluster_vg
        and 2 logical volume: cluster_disk1 and cluster_disk2

        We create this resources by group: apachegroup that goes to pm-node1.lab.com

        pcs resource create disk1 LVM volgrpname= cluster_vg exclusive= true --group apachegroup
        pcs resource create apache_fs Filesystem device="/dev/cluster_vg/disk1" directory="/var/www" fstype="ext4 " --group apachegroup

        And now I’ve mysql group:
        pcs resource create apache_fs Filesystem device="/dev/cluster_vg/disk2" directory="/var/mysql" fstype="ext4 " --group mysql

        But it fails because LVM cluster_vg has starts on pm-node1.lab.com and It seems that can’t do it on pm-node2.lab.com
        So It’s a mandatory one LVM cluster resource for node?
        If not, how can I use it for the Filesystem resource on pm-node2.lab.com node?

  3. Hi Keith, after 2 days of searching, your syntax on creating the iSCSI stonith resource FINALLY allowed me to bring it online. No other guide or man page I’d read included the syntax for generating the key files (done by pcmk_host_list I assume). I’d been chasing “fence_scsi.key” errors when I came across your blog. Thank you.
