Pacemaker in Ubuntu 16.04

Prerequisite

Two Ubuntu 16.04 nodes with the following initial configuration.

Node1
  /etc/hostname : node1.hatest.com
  /etc/hosts    : 192.168.1.191 node1.hatest.com node1
  IP            : 192.168.1.191

Node2
  /etc/hostname : node2.hatest.com
  /etc/hosts    : 192.168.1.192 node2.hatest.com node2
  IP            : 192.168.1.192

Alias (virtual) IP : 192.168.1.190


Preparations

Before proceeding to the main installation phase, please go through the points below; they will help during the whole process.

You may change the APT sources from the bd (Bangladesh) mirror to the us mirror.
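A minimal sketch of that change, assuming the default entries in /etc/apt/sources.list point at bd.archive.ubuntu.com (adjust the hostnames if your mirror differs):

sed -i 's/bd\.archive\.ubuntu\.com/us.archive.ubuntu.com/g' /etc/apt/sources.list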

apt-get update

apt-get upgrade

Enable root login
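One common way to do this (a sketch, assuming the OpenSSH server is installed and root SSH access is acceptable in this lab setup) is to set a root password, permit root login in sshd, and restart the SSH service:

sudo passwd root

sudo sed -i 's/^PermitRootLogin .*/PermitRootLogin yes/' /etc/ssh/sshd_config

sudo service ssh restart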


Installation

For now, keep a continuous ping running against the alias IP (192.168.1.190) from another host on the LAN. After the installation we will check whether we are getting responses from it or not.
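For example, from any other machine on the 192.168.1.0/24 network:

ping 192.168.1.190

At this point the pings should get no reply, since nothing is holding that address yet.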

We can set the same time zone on both nodes with the steps below.


root@node1:/home/sharif# dpkg-reconfigure tzdata 

[Follow the interactive prompts to get the output below on both nodes]


Current default time zone: 'Asia/Dhaka'

Local time is now:      Tue Jun 19 15:19:53 +06 2018.

Universal Time is now:  Tue Jun 19 09:19:53 UTC 2018.
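Note that dpkg-reconfigure tzdata only sets the time zone. If the clocks themselves need to stay synchronized, one option (an extra step, not strictly required by this guide) is to install an NTP client on both nodes:

root@node1:/home/sharif# apt install ntp

root@node2:/home/sharif# apt install ntp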


Perform apt update & upgrade on both nodes.

Install pacemaker on both nodes. 


root@node1:/home/sharif# apt install pacemaker

root@node2:/home/sharif# apt install pacemaker


Corosync is installed as a dependency of the Pacemaker package. So we do not need to install it manually.
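If you want to double-check (an optional verification step), confirm that the corosync package is installed on both nodes:

root@node1:/home/sharif# dpkg -l corosync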

Follow the steps below only on Node1.


root@node1:/home/sharif# apt install haveged

root@node1:/home/sharif# corosync-keygen


The haveged package increases the amount of entropy available on the server, which the corosync-keygen script needs.


corosync-keygen generates a 128-byte cluster authorization key and writes it to /etc/corosync/authkey.
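As an optional sanity check, confirm that the key file exists on Node1 and is readable only by root:

root@node1:/home/sharif# ls -l /etc/corosync/authkey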


Now copy the authkey from Node1 to Node2 as shown below.


root@node1:/home/sharif# scp /etc/corosync/authkey root@192.168.1.192:/tmp

The authenticity of host '192.168.1.192 (192.168.1.192)' can't be established.

ECDSA key fingerprint is SHA256:fnsDOzHiX3MHlhsqr97PKZqgwBKG0gmCOnbKUDpzDYE.

Are you sure you want to continue connecting (yes/no)? yes


Warning: Permanently added '192.168.1.192' (ECDSA) to the list of known hosts.

root@192.168.1.192's password: 

authkey                                                                                100% 128     0.1KB/s   00:00    


On Node2, move the authkey file to its proper location (/etc/corosync/) and restrict its permissions to root, as in the steps below:

root@node2:/home/sharif# mv /tmp/authkey /etc/corosync

root@node2:/home/sharif# chown root: /etc/corosync/authkey

root@node2:/home/sharif# chmod 400 /etc/corosync/authkey


On both nodes, open /etc/corosync/corosync.conf and modify or add entries so that it matches the example file below.


totem {

  version: 2

  cluster_name: lbcluster

  transport: udpu

  interface {

    ringnumber: 0

    bindnetaddr: 192.168.1.190

    broadcast: yes

    mcastport: 5405

  }

}


quorum {

  provider: corosync_votequorum

  two_node: 1

}


nodelist {

  node {

    ring0_addr: 192.168.1.191

    name: node1.hatest.com

    nodeid: 1

  }

  node {

    ring0_addr: 192.168.1.192

    name: node2.hatest.com

    nodeid: 2

  }

}


logging {

  to_logfile: yes

  logfile: /var/log/corosync/corosync.log

  to_syslog: yes

  timestamp: on

}


*** Some of these lines (highlighted in the original document) may not be present in the default file; add them under the corresponding existing line/section. If anything else differs, make the file match the example above exactly.



Now, we need to configure Corosync to allow the Pacemaker service. We will do this on both nodes.

root@node1:/home/sharif# mkdir -p /etc/corosync/service.d

root@node1:/home/sharif# vim /etc/corosync/service.d/pcmk


service {

  name: pacemaker

  ver: 1

}



Open /etc/default/corosync and enable startup: if there is already a line START=no, change it to START=yes as below, or add a line START=yes. Do this on both nodes, as in the example file below.


root@node1:/home/sharif# cat /etc/default/corosync 

# Corosync runtime directory

#COROSYNC_RUN_DIR=/var/lib/corosync


# Path to corosync.conf

#COROSYNC_MAIN_CONFIG_FILE=/etc/corosync/corosync.conf


# Path to authfile

#COROSYNC_TOTEM_AUTHKEY_FILE=/etc/corosync/authkey


# Command line options

#OPTIONS=""


START=yes
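As a shortcut (a sketch that assumes the file already contains a START=no line), the same change can be made non-interactively on both nodes:

root@node1:/home/sharif# sed -i 's/^START=no$/START=yes/' /etc/default/corosync

root@node2:/home/sharif# sed -i 's/^START=no$/START=yes/' /etc/default/corosync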




Now restart the corosync service on both nodes.

root@node1:/home/sharif# service corosync stop

root@node1:/home/sharif# service corosync start

root@node2:/home/sharif# service corosync stop

root@node2:/home/sharif# service corosync start


Once Corosync is running on both nodes, they should be clustered together. We can verify this by running the following command; the output should look like this:


root@node1:/home/sharif# corosync-cmapctl | grep members

runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0

runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.1.191)

runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1

runtime.totem.pg.mrp.srp.members.1.status (str) = joined

runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0

runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.1.192)

runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1

runtime.totem.pg.mrp.srp.members.2.status (str) = joined


With Corosync set up properly, we can now configure Pacemaker. Pacemaker, which depends on the messaging capabilities of Corosync, is ready to be started and to have its basic properties configured.

The Pacemaker service requires Corosync to be running, so it is disabled by default.

Enable Pacemaker to start on system boot with this command (on both nodes):


root@node1:/home/sharif# update-rc.d pacemaker defaults 20 01

root@node2:/home/sharif# update-rc.d pacemaker defaults 20 01


Here we set Pacemaker's start priority to 20. It is important to specify a start priority higher than Corosync's (which is 19 by default), so that Pacemaker starts after Corosync.


Restart the pacemaker service on both nodes.

root@node1:/home/sharif# service pacemaker stop

root@node1:/home/sharif# service pacemaker start

root@node2:/home/sharif# service pacemaker stop

root@node2:/home/sharif# service pacemaker start


Check the Pacemaker status on both nodes with the crm utility. The output will look like the one below.

root@node1:/home/sharif# crm status

Last updated: Tue Jun 19 16:33:44 2018 Last change: Tue Jun 19 16:32:17 2018 by hacluster via crmd on node1.hatest.com

Stack: corosync

Current DC: node1.hatest.com (version 1.1.14-70404b0) - partition with quorum

2 nodes and 0 resources configured

Online: [ node1.hatest.com node2.hatest.com ]

Full list of resources:

Key points

Current DC = Current Designated Coordinator.

2 Nodes and 0 resources configured

Online = both nodes should be listed as online; for us they are node1 and node2.


We can also use crm_mon as an alternative to crm status; crm_mon refreshes the output continuously.
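If you only want a single snapshot rather than the continuously refreshing view, crm_mon can also be run in one-shot mode:

root@node1:/home/sharif# crm_mon -1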


We are still not getting any response from 192.168.1.190.

One last task remains. Follow the step below on Node1.


root@node1:/home/sharif# crm configure primitive virtual_public_ip \
  ocf:heartbeat:IPaddr2 params ip="192.168.1.190" \
  cidr_netmask="32" op monitor interval="10s" \
  meta migration-threshold="2" failure-timeout="60s" resource-stickiness="100"
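This defines an ocf:heartbeat:IPaddr2 resource that manages the floating IP. As an optional check, the stored cluster configuration can be printed back:

root@node1:/home/sharif# crm configure show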


Now check the ping status of 192.168.1.190


This time we should be getting replies from the alias IP.


Check crm status on either node. Compare it with previous crm status. 


root@node1:/home/sharif# crm status

Last updated: Tue Jun 19 16:58:38 2018 Last change: Tue Jun 19 16:55:07 2018 by root via cibadmin on node1.hatest.com

Stack: corosync

Current DC: node1.hatest.com (version 1.1.14-70404b0) - partition with quorum

2 nodes and 1 resource configured

Online: [ node1.hatest.com node2.hatest.com ]

Full list of resources:

virtual_public_ip (ocf::heartbeat:IPaddr2): Started node1.hatest.com


Check the IP configuration of Node1. For that, use either:

ip -4 addr ls

ip a s


root@node1:/home/sharif# ip -4 addr ls

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1

    inet 127.0.0.1/8 scope host lo

    valid_lft forever preferred_lft forever

2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    inet 192.168.1.191/24 brd 192.168.1.255 scope global ens33

    valid_lft forever preferred_lft forever

    inet 192.168.1.190/32 brd 192.168.1.255 scope global ens33

    valid_lft forever preferred_lft forever


Notice that the alias IP is assigned on Node1, as it is currently the active node. If we shut down or standby Node1, the alias IP will move to Node2. Check the steps below.


Put Node1 into standby mode with the following command on Node1.

root@node1:/home/sharif# crm node standby node1.hatest.com

Check the crm status.


root@node1:/home/sharif# crm status

Last updated: Tue Jun 19 17:05:33 2018 Last change: Tue Jun 19 17:05:25 2018 by root via crm_attribute on node1.hatest.com

Stack: corosync

Current DC: node1.hatest.com (version 1.1.14-70404b0) - partition with quorum

2 nodes and 1 resource configured


Node node1.hatest.com: standby

Online: [ node2.hatest.com ]


Full list of resources:


virtual_public_ip (ocf::heartbeat:IPaddr2): Started node2.hatest.com


Check IP address of Node2.


root@node2:/home/sharif# ip a s

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

       valid_lft forever preferred_lft forever

    inet6 ::1/128 scope host 

       valid_lft forever preferred_lft forever

2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    link/ether 00:0c:29:78:28:85 brd ff:ff:ff:ff:ff:ff

    inet 192.168.1.192/24 brd 192.168.1.255 scope global ens33

    valid_lft forever preferred_lft forever

    inet 192.168.1.190/32 brd 192.168.1.255 scope global ens33

    valid_lft forever preferred_lft forever

    inet6 fe80::20c:29ff:fe78:2885/64 scope link 

    valid_lft forever preferred_lft forever



And we are still getting responses from the alias IP address.


Now make Node1 active again with the step below.


root@node1:/home/sharif# crm node online node1.hatest.com


Check the crm status now.


root@node1:/home/sharif# crm status

Last updated: Tue Jun 19 17:14:27 2018 Last change: Tue Jun 19 17:12:34 2018 by root via crm_attribute on node1.hatest.com

Stack: corosync

Current DC: node1.hatest.com (version 1.1.14-70404b0) - partition with quorum

2 nodes and 1 resource configured


Online: [ node1.hatest.com node2.hatest.com ]


Full list of resources:


 virtual_public_ip (ocf::heartbeat:IPaddr2): Started node2.hatest.com


Here is a point of confusion: we have set Node1 online again and both nodes are now online, so the resource should have moved back to Node1, yet it is still running on Node2 (Started node2.hatest.com). This is because of the resource-stickiness="100" meta attribute from one of our previous commands:


crm configure primitive virtual_public_ip \
  ocf:heartbeat:IPaddr2 params ip="192.168.1.190" \
  cidr_netmask="32" op monitor interval="10s" \
  meta migration-threshold="2" failure-timeout="60s" resource-stickiness="100"


Now, to move the resource back to Node1, we manually put Node2 into standby and then bring it online again. See the steps below.


root@node2:/home/sharif# crm node standby node2.hatest.com

root@node2:/home/sharif# crm node online node2.hatest.com
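An alternative (a sketch using standard crmsh resource commands, not part of the original steps) is to migrate the resource directly instead of cycling Node2 through standby. Note that migrate works by adding a temporary location constraint, which unmigrate removes afterwards:

root@node2:/home/sharif# crm resource migrate virtual_public_ip node1.hatest.com

root@node2:/home/sharif# crm resource unmigrate virtual_public_ip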


Now again check the crm status.


root@node2:/home/sharif# crm status

Last updated: Tue Jun 19 17:22:55 2018 Last change: Tue Jun 19 17:16:16 2018 by root via crm_attribute on node2.hatest.com

Stack: corosync

Current DC: node1.hatest.com (version 1.1.14-70404b0) - partition with quorum

2 nodes and 1 resource configured


Online: [ node1.hatest.com node2.hatest.com ]


Full list of resources:

virtual_public_ip (ocf::heartbeat:IPaddr2): Started node1.hatest.com


Now consider the following scenario:


Node1 and Node2 are both online. The crm status on Node2 is:


root@node2:/home/sharif# crm status

Last updated: Tue Jun 19 19:07:24 2018 Last change: Tue Jun 19 17:16:16 2018 by root via crm_attribute on node2.hatest.com

Stack: corosync

Current DC: node1.hatest.com (version 1.1.14-70404b0) - partition with quorum

2 nodes and 1 resource configured


Online: [ node1.hatest.com node2.hatest.com ]


Full list of resources:


virtual_public_ip (ocf::heartbeat:IPaddr2): Started node1.hatest.com


Shut down Node1 and check the crm status on Node2.
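To simulate the failure, Node1 can simply be powered off, for example:

root@node1:/home/sharif# shutdown -h now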


root@node2:/home/sharif# crm status

Last updated: Tue Jun 19 19:07:33 2018 Last change: Tue Jun 19 17:16:16 2018 by root via crm_attribute on node2.hatest.com

Stack: corosync

Current DC: node2.hatest.com (version 1.1.14-70404b0) - partition with quorum

2 nodes and 1 resource configured


Online: [ node2.hatest.com ]

OFFLINE: [ node1.hatest.com ]


Full list of resources:

virtual_public_ip (ocf::heartbeat:IPaddr2): Started node2.hatest.com


Still, we are getting uninterrupted ICMP replies from the alias IP (192.168.1.190).
