Pacemaker on Ubuntu 16.04
Prerequisites
Two Ubuntu 16.04 nodes with the initial configuration below.
Node    /etc/hostname       /etc/hosts entry                        IP
Node1   node1.hatest.com    192.168.1.191 node1.hatest.com node1    192.168.1.191
Node2   node2.hatest.com    192.168.1.192 node2.hatest.com node2    192.168.1.192
Alias (virtual) IP: 192.168.1.190
Preparations
Before proceeding to the main installation phase, please go through the points below; they will help during the whole process.
You may change the APT sources from the bd (Bangladesh) mirror to the us mirror.
apt-get update
apt-get upgrade
Enable root login (root SSH access will be needed later when copying the Corosync key between nodes); see the example after this list.
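A minimal sketch of the mirror change and root login steps, assuming the default bd.archive.ubuntu.com entries in /etc/apt/sources.list and the OpenSSH server (adjust to your environment):
sed -i 's|bd.archive.ubuntu.com|us.archive.ubuntu.com|g' /etc/apt/sources.list
passwd root
sed -i 's/^PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
service ssh restart
Root SSH access is what allows the scp of the Corosync authkey between the nodes later on.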
Installation:
For now, keep a continuous ping running against the alias IP (192.168.1.190) from another host on the LAN, as shown below. After the installation, we will check whether it starts responding.
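For example, from any other machine on the same LAN:
ping 192.168.1.190
At this point the pings will fail; they should start succeeding once the cluster resource is configured at the end of this guide.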
We can set the same time zone on both nodes with the step below.
root@node1:/home/sharif# dpkg-reconfigure tzdata
[Follow the interactive prompts to get the output below on both nodes]
Current default time zone: 'Asia/Dhaka'
Local time is now: Tue Jun 19 15:19:53 +06 2018.
Universal Time is now: Tue Jun 19 09:19:53 UTC 2018.
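As an alternative to the interactive dialog, the time zone can be set non-interactively with timedatectl (available on Ubuntu 16.04, which uses systemd):
timedatectl set-timezone Asia/Dhaka
timedatectl status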
Perform apt update & upgrade on both nodes.
Install pacemaker on both nodes.
root@node1:/home/sharif# apt install pacemaker
root@node2:/home/sharif# apt install pacemaker
Corosync is installed as a dependency of the Pacemaker package, so we do not need to install it manually.
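To confirm that both packages landed on each node, a quick check such as the following should do:
dpkg -l pacemaker corosync
corosync -v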
Follow the steps below on Node1 only.
root@node1:/home/sharif# apt install haveged
root@node1:/home/sharif# corosync-keygen
The haveged package allows us to easily increase the amount of entropy on our server, which is required by the corosync-keygen script.
corosync-keygen will generate a 128-byte cluster authorization key and write it to /etc/corosync/authkey.
Now copy the authkey from Node1 to Node2 as shown below.
root@node1:/home/sharif# scp /etc/corosync/authkey root@192.168.1.192:/tmp
The authenticity of host '192.168.1.192 (192.168.1.192)' can't be established.
ECDSA key fingerprint is SHA256:fnsDOzHiX3MHlhsqr97PKZqgwBKG0gmCOnbKUDpzDYE.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.192' (ECDSA) to the list of known hosts.
root@192.168.1.192's password:
authkey 100% 128 0.1KB/s 00:00
On Node2, move the authkey file to the proper location (/etc/corosync/) and restrict its permissions to root, as shown in the steps below:
root@node2:/home/sharif# mv /tmp/authkey /etc/corosync
root@node2:/home/sharif# chown root: /etc/corosync/authkey
root@node2:/home/sharif# chmod 400 /etc/corosync/authkey
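To verify that the key is in place on both nodes, with the expected owner (root), mode (400) and size (128 bytes):
root@node2:/home/sharif# ls -l /etc/corosync/authkey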
On both nodes, open the /etc/corosync/corosync.conf file and modify/add entries so that it matches the example file below.
totem {
  version: 2
  cluster_name: lbcluster
  transport: udpu
  interface {
    ringnumber: 0
    bindnetaddr: 192.168.1.190
    broadcast: yes
    mcastport: 5405
  }
}
quorum {
  provider: corosync_votequorum
  two_node: 1
}
nodelist {
  node {
    ring0_addr: 192.168.1.191
    name: node1.hatest.com
    nodeid: 1
  }
  node {
    ring0_addr: 192.168.1.192
    name: node2.hatest.com
    nodeid: 2
  }
}
logging {
  to_logfile: yes
  logfile: /var/log/corosync/corosync.log
  to_syslog: yes
  timestamp: on
}
*** Some of the lines above may not exist in the default file; add them under the corresponding existing entries. If anything else in the file differs, change it to match the example above exactly.
Now, we need to configure Corosync to allow the Pacemaker service. We will do this on both nodes.
root@node1:/home/sharif# mkdir -p /etc/corosync/service.d
root@node1:/home/sharif# vim /etc/corosync/service.d/pcmk
service {
  name: pacemaker
  ver: 1
}
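Here ver: 1 tells Corosync not to launch Pacemaker itself; Pacemaker runs as its own service, which is why we start it separately later. To confirm the file is identical on both nodes:
root@node1:/home/sharif# cat /etc/corosync/service.d/pcmk
root@node2:/home/sharif# cat /etc/corosync/service.d/pcmk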
Open the file /etc/default/corosync and set START=yes (if there is already a line START=no, change it to yes; otherwise add the line START=yes). Do this on both nodes, as in the example file below.
root@node1:/home/sharif# cat /etc/default/corosync
# Corosync runtime directory
#COROSYNC_RUN_DIR=/var/lib/corosync
# Path to corosync.conf
#COROSYNC_MAIN_CONFIG_FILE=/etc/corosync/corosync.conf
# Path to authfile
#COROSYNC_TOTEM_AUTHKEY_FILE=/etc/corosync/authkey
# Command line options
#OPTIONS=""
START=yes
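If you prefer to make this change from the shell, something like the following should work on both nodes (assuming the file already contains a START= line):
sed -i 's/^START=no/START=yes/' /etc/default/corosync
grep ^START /etc/default/corosync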
Now restart the corosync service on both nodes.
root@node1:/home/sharif# service corosync stop
root@node1:/home/sharif# service corosync start
root@node2:/home/sharif# service corosync stop
root@node2:/home/sharif# service corosync start
Once Corosync is running on both nodes, they should be clustered together. We can verify this by running the command below; the output should look like the following:
root@node1:/home/sharif# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.1.191)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.1.192)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
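Quorum and membership can also be checked with corosync-quorumtool, which ships with Corosync; with both nodes joined it should report two votes and a quorate partition:
root@node1:/home/sharif# corosync-quorumtool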
With Corosync set up properly, we can now configure Pacemaker. Pacemaker, which depends on the messaging capabilities of Corosync, is now ready to be started and to have its basic properties configured.
The Pacemaker service requires Corosync to be running, so it is disabled by default.
Enable Pacemaker to start on system boot with this command: (On both nodes)
root@node1:/home/sharif# update-rc.d pacemaker defaults 20 01
root@node2:/home/sharif# update-rc.d pacemaker defaults 20 01
Here we set Pacemaker's start priority to 20. It is important to specify a start priority that is higher than Corosync's (which is 19 by default), so that Pacemaker starts after Corosync.
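To see the resulting boot ordering, you can list the SysV runlevel links on either node (the exact names can vary, since Ubuntu 16.04 also manages these services through systemd):
ls /etc/rc2.d/ | grep -E 'corosync|pacemaker'
With the values above this should show links such as S19corosync and S20pacemaker, i.e. Pacemaker starting after Corosync.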
Restart the pacemaker service on both nodes.
root@node1:/home/sharif# service pacemaker stop
root@node1:/home/sharif# service pacemaker start
root@node2:/home/sharif# service pacemaker stop
root@node2:/home/sharif# service pacemaker start
Check the Pacemaker status on both nodes with the crm utility. The output should look like the following.
root@node1:/home/sharif# crm status
Last updated: Tue Jun 19 16:33:44 2018 Last change: Tue Jun 19 16:32:17 2018 by hacluster via crmd on node1.hatest.com
Stack: corosync
Current DC: node1.hatest.com (version 1.1.14-70404b0) - partition with quorum
2 nodes and 0 resources configured
Online: [ node1.hatest.com node2.hatest.com ]
Full list of resources:
Key points
Current DC = Current Designated Coordinator.
2 Nodes and 0 resources configured
Online = Both nodes should be online; for us they are node1 and node2.
We can also use crm_mon as an alternative to crm status; crm_mon refreshes the output continuously (see the example below).
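For a one-off snapshot instead of the continuously refreshing view, crm_mon can be run with -1:
root@node1:/home/sharif# crm_mon -1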
Note that we are still not getting any response from 192.168.1.190.
One last task remains. Run the step below on Node1 only.
root@node1:/home/sharif# crm configure primitive virtual_public_ip \
  ocf:heartbeat:IPaddr2 params ip="192.168.1.190" \
  cidr_netmask="32" op monitor interval="10s" \
  meta migration-threshold="2" failure-timeout="60s" resource-stickiness="100"
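To review the resource definition that was just created (and double-check the parameters), the configuration can be printed back on either node:
root@node1:/home/sharif# crm configure show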
Now check the ping running against 192.168.1.190; the alias IP should finally be responding.
Check the crm status on either node and compare it with the previous output.
root@node1:/home/sharif# crm status
Last updated: Tue Jun 19 16:58:38 2018 Last change: Tue Jun 19 16:55:07 2018 by root via cibadmin on node1.hatest.com
Stack: corosync
Current DC: node1.hatest.com (version 1.1.14-70404b0) - partition with quorum
2 nodes and 1 resource configured
Online: [ node1.hatest.com node2.hatest.com ]
Full list of resources:
virtual_public_ip (ocf::heartbeat:IPaddr2): Started node1.hatest.com
Check the IP configuration of Node1 with either of the following commands:
ip -4 addr ls
ip a s
root@node1:/home/sharif# ip -4 addr ls
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.1.191/24 brd 192.168.1.255 scope global ens33
valid_lft forever preferred_lft forever
inet 192.168.1.190/32 brd 192.168.1.255 scope global ens33
valid_lft forever preferred_lft forever
Notice that the alias IP is assigned on Node1, since it is the active node for now. If we shut down Node1 or put it into standby, the alias IP will move to Node2. Check the steps below.
Perform the following steps on Node1 and Node2 as indicated.
root@node1:/home/sharif# crm node standby node1.hatest.com
Check the crm status.
root@node1:/home/sharif# crm status
Last updated: Tue Jun 19 17:05:33 2018 Last change: Tue Jun 19 17:05:25 2018 by root via crm_attribute on node1.hatest.com
Stack: corosync
Current DC: node1.hatest.com (version 1.1.14-70404b0) - partition with quorum
2 nodes and 1 resource configured
Node node1.hatest.com: standby
Online: [ node2.hatest.com ]
Full list of resources:
virtual_public_ip (ocf::heartbeat:IPaddr2): Started node2.hatest.com
Check the IP addresses of Node2.
root@node2:/home/sharif# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:78:28:85 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.192/24 brd 192.168.1.255 scope global ens33
valid_lft forever preferred_lft forever
inet 192.168.1.190/32 brd 192.168.1.255 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe78:2885/64 scope link
valid_lft forever preferred_lft forever
And we are still getting responses from the alias IP address.
Now bring Node1 back online with the step below.
root@node1:/home/sharif# crm node online node1.hatest.com
Check the crm status now.
root@node1:/home/sharif# crm status
Last updated: Tue Jun 19 17:14:27 2018 Last change: Tue Jun 19 17:12:34 2018 by root via crm_attribute on node1.hatest.com
Stack: corosync
Current DC: node1.hatest.com (version 1.1.14-70404b0) - partition with quorum
2 nodes and 1 resource configured
Online: [ node1.hatest.com node2.hatest.com ]
Full list of resources:
virtual_public_ip (ocf::heartbeat:IPaddr2): Started node2.hatest.com
Here is a point of confusion. We have brought Node1 back online, so both nodes are online. One might expect the resource to run from Node1 again, but we still see it running on Node2 (Started node2.hatest.com). This is because of the resource-stickiness setting in our earlier command:
crm configure primitive virtual_public_ip \
  ocf:heartbeat:IPaddr2 params ip="192.168.1.190" \
  cidr_netmask="32" op monitor interval="10s" \
  meta migration-threshold="2" failure-timeout="60s" resource-stickiness="100"
To move the resource back to Node1, we can manually put Node2 into standby and then bring it back online. See the steps below.
root@node2:/home/sharif# crm node standby node2.hatest.com
root@node2:/home/sharif# crm node online node2.hatest.com
Now again check the crm status.
root@node2:/home/sharif# crm status
Last updated: Tue Jun 19 17:22:55 2018 Last change: Tue Jun 19 17:16:16 2018 by root via crm_attribute on node2.hatest.com
Stack: corosync
Current DC: node1.hatest.com (version 1.1.14-70404b0) - partition with quorum
2 nodes and 1 resource configured
Online: [ node1.hatest.com node2.hatest.com ]
Full list of resources:
virtual_public_ip (ocf::heartbeat:IPaddr2): Started node1.hatest.com
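As an alternative to the standby/online cycle, crmsh can usually move the resource directly; note that migrate places a location constraint that should be cleared afterwards with unmigrate (a sketch, command names can differ slightly between crmsh versions):
root@node2:/home/sharif# crm resource migrate virtual_public_ip node1.hatest.com
root@node2:/home/sharif# crm resource unmigrate virtual_public_ip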
Now consider the following scenario:
Node1 and Node2 are both online, and the crm status on Node2 is:
root@node2:/home/sharif# crm status
Last updated: Tue Jun 19 19:07:24 2018 Last change: Tue Jun 19 17:16:16 2018 by root via crm_attribute on node2.hatest.com
Stack: corosync
Current DC: node1.hatest.com (version 1.1.14-70404b0) - partition with quorum
2 nodes and 1 resource configured
Online: [ node1.hatest.com node2.hatest.com ]
Full list of resources:
virtual_public_ip (ocf::heartbeat:IPaddr2): Started node1.hatest.com
Shut down Node1 and check the crm status on Node2.
root@node2:/home/sharif# crm status
Last updated: Tue Jun 19 19:07:33 2018 Last change: Tue Jun 19 17:16:16 2018 by root via crm_attribute on node2.hatest.com
Stack: corosync
Current DC: node2.hatest.com (version 1.1.14-70404b0) - partition with quorum
2 nodes and 1 resource configured
Online: [ node2.hatest.com ]
OFFLINE: [ node1.hatest.com ]
Full list of resources:
virtual_public_ip (ocf::heartbeat:IPaddr2): Started node2.hatest.com
We are still getting uninterrupted ICMP replies from the alias IP (192.168.1.190).