Deploying High-Availability MAAS
1. What to Prepare Before Starting
- Create three VMs with the following specification:
  - 4 vCPUs
  - 4 GB RAM
  - 100 GB storage disk
  - Operating system: Ubuntu 18.04 LTS
  - 6 network interfaces
- Networking map:
interface    network name  ip address      mtu   bonding
ens3, ens4   oam           10.101.1.xx/24  1500  bondm
ens5, ens6   bond0         -               9000  bond0
ens7, ens8   bond1         -               1500  bond1

interface  vlan id  cidr            mtu   bond master  purpose
bond0.5    5        192.168.5.0/24  9000  bond0        internal
bond0.6    6        192.168.6.0/24  9000  bond0        ceph replication
bond0.8    8        192.168.8.0/24  9000  bond0        overlay
bond0.10   10       10.11.12.0/24   9000  bond0        external
bond0.11   11       10.11.12.0/24   9000  bond0        dns
bond1.7    7        7.8.9.0/24      9000  bond1        ceph access
2. MaaS: Install & Configure
2.1. Add Necessary Repository and Install the Required Packages.
- Map the cluster hostnames in /etc/hosts and generate an SSH key pair.
cat << EOF >> /etc/hosts
# MAAS Cluster
10.101.1.6 sofyan01-maas01-rack01.cloud.sofyan.dev sofyan01-maas01-rack01
10.101.1.8 sofyan02-maas02-rack02.cloud.sofyan.dev sofyan02-maas02-rack02
10.101.1.10 sofyan03-maas03-rack03.cloud.sofyan.dev sofyan03-maas03-rack03
10.101.1.5 maas-vip
EOF
ssh-keygen -q -t rsa -N '' -f ~/.ssh/id_rsa <<<y >/dev/null 2>&1
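Before moving on, it can help to confirm that every cluster name actually landed in the hosts file. The following is a minimal sketch, not part of the original procedure: check_hosts_entries is a hypothetical helper that takes a hosts file and a list of names, and prints any name that does not appear on an uncommented line.

```shell
# Hypothetical helper (not shipped with MAAS): report names missing from a hosts file.
# Usage: check_hosts_entries <hosts-file> <name>...
check_hosts_entries() {
    local file=$1 missing=0 name
    shift
    for name in "$@"; do
        # A name counts as present when it is a whole field (after the
        # address column) on a line that is not a comment.
        if ! awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

For this cluster the call would be: check_hosts_entries /etc/hosts sofyan01-maas01-rack01 sofyan02-maas02-rack02 sofyan03-maas03-rack03 maas-vip. It prints nothing and returns 0 when all four names are mapped.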
- Install MAAS and the required packages.
sudo apt-add-repository -y ppa:maas/stable
sudo apt update
sudo apt-get install maas jq wget sshpass -y
2.2. Configure PostgreSQL Replicated Cluster
- Install PostgreSQL Automatic Failover (PAF).
wget https://github.com/ClusterLabs/PAF/releases/download/v2.3.0/resource-agents-paf_2.3.0-1_all.deb
sudo dpkg -i resource-agents-paf_2.3.0-1_all.deb
sudo bash -c "cat << EOF > /etc/tmpfiles.d/postgresql-part.conf
## Directory for PostgreSQL temp stat files
d /run/postgresql/10-main.pg_stat_tmp 0700 postgres postgres - -
EOF"
systemd-tmpfiles --create /etc/tmpfiles.d/postgresql-part.conf
- Configure PostgreSQL.
su - postgres -c "cat << EOF >> /etc/postgresql/10/main/postgresql.conf
listen_addresses = '*'
max_connections = 300
wal_level = hot_standby
synchronous_commit = on
archive_mode = off
max_wal_senders = 10
wal_keep_segments = 256
hot_standby = on
restart_after_crash = off
hot_standby_feedback = on
EOF"
# Comment out the default replication entries; the cluster-specific ones are appended in the next step.
sed -i -E 's/^(local|host)[[:space:]]+replication/#&/' /etc/postgresql/10/main/pg_hba.conf
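Because the settings are appended to postgresql.conf rather than edited in place, earlier definitions of the same parameter stay in the file, and PostgreSQL honors the last occurrence. The sketch below shows one way to check which value wins. pg_conf_value is a hypothetical helper, and it assumes the simple "name = value" form with no "#" inside quoted values.

```shell
# Hypothetical helper: print the last uncommented value of a parameter in a
# postgresql.conf-style file (the last occurrence is the effective one).
# Usage: pg_conf_value <conf-file> <parameter>
pg_conf_value() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }                  # skip full-line comments
        {
            line = $0
            sub(/#.*/, "", line)             # drop a trailing comment
            n = index(line, "=")
            if (n == 0) next                 # not a "name = value" line
            k = substr(line, 1, n - 1)
            v = substr(line, n + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim whitespace around the name
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim whitespace around the value
            if (k == key) last = v
        }
        END { if (last != "") print last }
    ' "$1"
}
```

For example, pg_conf_value /etc/postgresql/10/main/postgresql.conf max_connections should print 300 once the block above has been appended.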
- Edit pg_hba.conf (HBA stands for host-based authentication) to allow replication and MAAS database connections from the VIP and from each node.
su - postgres -c "cat << EOF >> /etc/postgresql/10/main/pg_hba.conf
host replication postgres 10.101.1.5/32 trust
host replication postgres 10.101.1.6/32 trust
host replication postgres 10.101.1.8/32 trust
host replication postgres 10.101.1.10/32 trust
host maasdb maas 10.101.1.5/32 md5
host maasdb maas 10.101.1.6/32 md5
host maasdb maas 10.101.1.8/32 md5
host maasdb maas 10.101.1.10/32 md5
EOF"
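The eight entries above follow one pattern per address, so they can also be generated from the address list. This is only a sketch: gen_hba_entries is a hypothetical helper, and the user, database, and authentication methods are copied from the entries above.

```shell
# Hypothetical helper: emit pg_hba.conf entries for each cluster address.
# Usage: gen_hba_entries <ip>...
gen_hba_entries() {
    local ip
    # Replication connections for the postgres user.
    for ip in "$@"; do
        printf 'host replication postgres %s/32 trust\n' "$ip"
    done
    # MAAS application connections to the maasdb database.
    for ip in "$@"; do
        printf 'host maasdb maas %s/32 md5\n' "$ip"
    done
}
```

To append the output instead of using the heredoc: gen_hba_entries 10.101.1.5 10.101.1.6 10.101.1.8 10.101.1.10 | sudo -u postgres tee -a /etc/postgresql/10/main/pg_hba.conf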
- Create the recovery.conf.pcmk file and assign a temporary IP address to broam9 (maas01 only).
su - postgres -c "cat << EOF > /etc/postgresql/10/main/recovery.conf.pcmk
standby_mode = on
primary_conninfo = 'host=10.101.1.5 port=5432 user=postgres application_name=sofyan01-maas01-rack01 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = ''
recovery_target_timeline = 'latest'
EOF"
systemctl restart postgresql
ip address add 10.101.1.5/24 dev broam9
- Take a base backup of a running PostgreSQL database cluster (maas02 & maas03).
systemctl stop postgresql
su - postgres -c "rm -rf ~/10/main/"
su - postgres -c "pg_basebackup -h maas-vip -D ~postgres/10/main/ -U postgres -v -X stream -P"
su - postgres -c "cp /usr/share/postgresql/10/recovery.conf.sample /var/lib/postgresql/10/main/recovery.conf"
- Create the recovery.conf and recovery.conf.pcmk files (maas02 only).
su - postgres -c "cat << EOF > /etc/postgresql/10/main/recovery.conf.pcmk
standby_mode = on
primary_conninfo = 'host=10.101.1.5 port=5432 user=postgres application_name=sofyan02-maas02-rack02 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = ''
recovery_target_timeline = 'latest'
EOF"
su - postgres -c "cat << EOF > /var/lib/postgresql/10/main/recovery.conf
standby_mode = on
primary_conninfo = 'host=10.101.1.5 port=5432 user=postgres application_name=sofyan02-maas02-rack02 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = ''
recovery_target_timeline = 'latest'
EOF"
systemctl start postgresql
- Create the recovery.conf and recovery.conf.pcmk files (maas03 only).
su - postgres -c "cat << EOF > /etc/postgresql/10/main/recovery.conf.pcmk
standby_mode = 'on'
primary_conninfo = 'host=10.101.1.5 port=5432 user=postgres application_name=sofyan03-maas03-rack03 keepalives_interval=5 keepalives_count=5'
restore_command = ''
recovery_target_timeline = 'latest'
EOF"
su - postgres -c "cat << EOF > /var/lib/postgresql/10/main/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=10.101.1.5 port=5432 user=postgres application_name=sofyan03-maas03-rack03 keepalives_interval=5 keepalives_count=5'
restore_command = ''
recovery_target_timeline = 'latest'
EOF"
systemctl start postgresql
- Verify that the PostgreSQL cluster is replicating (maas01). The query output is saved to /root/postgresql_replica.log.
su - postgres -c 'psql -c "select client_addr,sync_state from pg_stat_replication;"' > /root/postgresql_replica.log
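With one primary and two standbys, the saved query output should contain two data rows. The sketch below counts them. count_standbys is a hypothetical helper, and it assumes psql's default aligned output, where each data row starts with an IPv4 client address followed by a "|" separator.

```shell
# Hypothetical helper: count standby rows in saved pg_stat_replication output.
# Assumes psql's default aligned format: " <ipv4-address> | <sync_state>".
# Usage: count_standbys <log-file>
count_standbys() {
    # Header, separator, and "(N rows)" lines do not start with an IPv4
    # address, so only data rows are counted.
    grep -cE '^ *[0-9]+(\.[0-9]+){3} *\|' "$1"
}
```

Running count_standbys /root/postgresql_replica.log on maas01 should print 2 when both maas02 and maas03 are streaming.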
- Stop PostgreSQL on all nodes.
systemctl disable --now postgresql@10-main
2.3. Setup HA for PostgreSQL
- Install HAProxy, Pacemaker, Corosync, and the required packages (all nodes).
apt-get install haproxy pacemaker corosync -y
apt-get install pcs crmsh -y
- Configure Corosync.
cat << EOF > /etc/corosync/corosync.conf
totem {
version: 2
token: 3000
token_retransmits_before_loss_const: 10
join: 60
consensus: 3600
vsftype: none
max_messages: 20
clear_node_high_bit: yes
secauth: off
threads: 0
ip_version: ipv4
rrp_mode: none
transport: udpu
}
quorum {
provider: corosync_votequorum
}
nodelist {
node {
ring0_addr: sofyan01-maas01-rack01
nodeid: 1000
}
node {
ring0_addr: sofyan02-maas02-rack02
nodeid: 1001
}
node {
ring0_addr: sofyan03-maas03-rack03
nodeid: 1002
}
}
logging {
fileline: off
to_stderr: yes
to_logfile: no
to_syslog: yes
syslog_facility: daemon
debug: off
logger_subsys {
subsys: QUORUM
debug: off
}
}
EOF
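Corosync addresses its peers by the ring0_addr names, so each name in the nodelist has to resolve on every node. The sketch below lists the node IDs and addresses from the file so they can be checked. corosync_nodes is a hypothetical helper that assumes the one-setting-per-line layout used above.

```shell
# Hypothetical helper: print "<nodeid> <ring0_addr>" for each node block.
# Usage: corosync_nodes <corosync.conf>
corosync_nodes() {
    awk '
        $1 == "ring0_addr:" { addr = $2 }
        $1 == "nodeid:"     { id = $2 }
        # A closing brace ends the current node block, if one was open.
        $1 == "}" && addr != "" { print id, addr; addr = ""; id = "" }
    ' "$1"
}
```

One possible check: corosync_nodes /etc/corosync/corosync.conf | while read -r id addr; do getent hosts "$addr" >/dev/null || echo "unresolved: $addr"; done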
- Setup Pacemaker.
pcs cluster auth sofyan01-maas01-rack01 sofyan02-maas02-rack02 sofyan03-maas03-rack03 -u hacluster -p P@ssw0rd
pcs cluster setup --name ha-pgsql-maas sofyan01-maas01-rack01 sofyan02-maas02-rack02 sofyan03-maas03-rack03 --force
pcs cluster disable --all
pcs cluster start --all
- Create pacemaker config file.
pcs cluster cib /root/pgsql_cfg
pcs -f /root/pgsql_cfg property set no-quorum-policy="ignore"
pcs -f /root/pgsql_cfg property set stonith-enabled="false"
pcs -f /root/pgsql_cfg resource defaults resource-stickiness="INFINITY"
pcs -f /root/pgsql_cfg resource defaults migration-threshold="1"
pcs -f /root/pgsql_cfg resource create pgsql ocf:heartbeat:pgsqlms \
bindir="/usr/lib/postgresql/10/bin" \
pgdata="/etc/postgresql/10/main" \
datadir="/var/lib/postgresql/10/main" \
op start on-fail="restart" \
op monitor interval="3s" on-fail="restart" role="Master" \
op monitor interval="4s" on-fail="restart" role="Slave" \
op promote on-fail="restart" \
op demote on-fail="stop" \
op stop on-fail="block" \
op notify
pcs -f /root/pgsql_cfg resource master ms_pgsql pgsql notify=true
pcs -f /root/pgsql_cfg resource create res_pgsql_vip ocf:heartbeat:IPaddr2 \
nic="broam9" \
ip=10.101.1.5 \
cidr_netmask=24 \
op start interval="0s" on-fail="restart" \
op monitor interval="4s" on-fail="restart" \
op stop interval="0s" on-fail="block"
pcs -f /root/pgsql_cfg resource create res_maas_vip ocf:heartbeat:IPaddr2 \
nic="broam9" \
ip=10.101.1.11 \
cidr_netmask=24 \
op start interval="0s" on-fail="restart" \
op monitor interval="4s" on-fail="restart" \
op stop interval="0s" on-fail="block"
pcs -f /root/pgsql_cfg constraint colocation add res_pgsql_vip with master ms_pgsql INFINITY
pcs -f /root/pgsql_cfg constraint order promote ms_pgsql then start res_pgsql_vip symmetrical=false kind=Mandatory
pcs -f /root/pgsql_cfg constraint order demote ms_pgsql then stop res_pgsql_vip symmetrical=false kind=Mandatory
pcs cluster cib-push /root/pgsql_cfg
- Configure HAProxy.
cat << EOF > /etc/haproxy/haproxy.cfg
frontend maas
bind *:80
retries 3
option redispatch
option http-server-close
default_backend maas
backend maas
timeout server 900s
balance source
hash-type consistent
server maas-api-0 10.101.1.6:5240 check
server maas-api-1 10.101.1.8:5240 check
server maas-api-2 10.101.1.10:5240 check
EOF
- Restart the HAProxy service.
systemctl restart haproxy.service
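A quick way to review which MAAS API endpoints HAProxy will balance across is to list the server lines from the configuration. This is a small sketch; haproxy_backends is a hypothetical helper that assumes each backend server sits on its own "server <name> <address:port> ..." line, as in the file above.

```shell
# Hypothetical helper: print "<name> <address:port>" for each server line.
# Usage: haproxy_backends <haproxy.cfg>
haproxy_backends() {
    awk '$1 == "server" { print $2, $3 }' "$1"
}
```

For the configuration above, haproxy_backends /etc/haproxy/haproxy.cfg lists the three region controllers on port 5240. HAProxy can also validate the file syntax itself with haproxy -c -f /etc/haproxy/haproxy.cfg.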
- Download HAProxy OCF Resource Agent.
cd /usr/lib/ocf/resource.d/heartbeat
curl -O https://raw.githubusercontent.com/thisismitch/cluster-agents/master/haproxy
chmod +x haproxy
- Add HAProxy resource.
crm configure primitive haproxy ocf:heartbeat:haproxy \
op start interval="0" on-fail="restart" \
op monitor interval="4s" on-fail="restart" \
op stop interval="0" on-fail="block"
crm configure clone haproxy-clone haproxy
crm configure colocation colocation-res_maas_vip-haproxy-clone inf: res_maas_vip haproxy-clone
2.4. Setup MAAS
- Reconfigure MAAS (on all nodes).
sed -i 's/database_host: .*/database_host: 10.101.1.5/g' /etc/maas/regiond.conf
sed -i 's/maas_url: .*/maas_url: http:\/\/10.101.1.11:80\/MAAS/g' /etc/maas/regiond.conf
sed -i 's/- http.*/- http:\/\/10.101.1.11:80\/MAAS/g' /etc/maas/rackd.conf
systemctl restart maas-regiond.service
systemctl restart maas-rackd.service
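The backslash-escaped URLs in the sed commands above are easy to mistype. As an alternative sketch for the "key: value" lines in regiond.conf, the hypothetical helper set_conf_key below uses "|" as the sed delimiter so URLs can be passed unescaped. It assumes GNU sed and that neither the key nor the value contains "|", "&", or a backslash.

```shell
# Hypothetical helper: replace the value of a "key: value" line in place.
# Uses "|" as the sed delimiter so URLs need no backslash escaping.
# Assumes GNU sed (-i without a suffix) and no "|", "&" or "\" in the arguments.
# Usage: set_conf_key <file> <key> <value>
set_conf_key() {
    # \1 preserves any leading indentation of the original line.
    sed -i "s|^\([[:space:]]*\)$2: .*|\1$2: $3|" "$1"
}
```

For example: set_conf_key /etc/maas/regiond.conf database_host 10.101.1.5 and set_conf_key /etc/maas/regiond.conf maas_url http://10.101.1.11:80/MAAS. The list-style "- http..." entry in rackd.conf is not a "key: value" line, so it still needs the sed command shown above.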
- Initialize MaaS (maas01).
maas init --admin-username root --admin-password P@ssw0rd --admin-email maas@sofyan.dev
- Log in to the MAAS CLI, set the MAAS name, and upload the SSH key (maas01).
maas login root http://localhost:5240/MAAS $(maas apikey --username=root)
maas root maas set-config name=maas_name value="Sofyan Cloud-1"
maas root sshkeys create "key=$(cat /root/.ssh/id_rsa.pub)"
- Create fabrics (maas01).
maas root fabrics create name=default
maas root fabrics create name=vlan7-bondm
maas root fabrics create name=vlan9-bond1
- Create spaces (maas01).
maas root spaces create name=oam-space
maas root spaces create name=internal-space
maas root spaces create name=ceph-replica-space
maas root spaces create name=overlay-space
maas root spaces create name=external-space
maas root spaces create name=ceph-access-space
maas root spaces create name=dns-space
- Set subnet names (maas01). As written, the grep patterns end in a literal "-", which no CIDR contains; substitute the prefix of each target subnet so that every lookup returns exactly one line.
maas root subnets read | jq -r '.[] | [.vlan.fabric_id, .vlan.vid, .id, .cidr|tostring] | join(" ")' > /tmp/subnet-and-id.txt
maas root subnet update "$(cat /tmp/subnet-and-id.txt | grep 10.101.- | awk '{print $3}')" name=oam
maas root subnet update "$(cat /tmp/subnet-and-id.txt | grep 192.168.- | awk '{print $3}')" name=internal
maas root subnet update "$(cat /tmp/subnet-and-id.txt | grep 192.168.- | awk '{print $3}')" name=ceph_replication
maas root subnet update "$(cat /tmp/subnet-and-id.txt | grep 192.168.- | awk '{print $3}')" name=overlay
maas root subnet update "$(cat /tmp/subnet-and-id.txt | grep 10.11.1- | awk '{print $3}')" name=external
maas root subnet update "$(cat /tmp/subnet-and-id.txt | grep 10.11.1- | awk '{print $3}')" name=dns
maas root subnet update "$(cat /tmp/subnet-and-id.txt | grep 192.168.- | awk '{print $3}')" name=ceph_access
- Assign a space to each VLAN (maas01). The grep patterns are the same placeholders as in the previous step; replace each with the prefix of the matching subnet.
maas root subnets read | jq -r '.[] | [.vlan.fabric_id, .vlan.vid, .id, .cidr|tostring] | join(" ")' > /tmp/subnet-and-id.txt
maas root vlan update "$(cat /tmp/subnet-and-id.txt | grep 10.101.- | awk '{print $1}')" "$(cat /tmp/subnet-and-id.txt | grep 10.101.- | awk '{print $2}')" space=oam-space
maas root vlan update "$(cat /tmp/subnet-and-id.txt | grep 192.168.- | awk '{print $1}')" "$(cat /tmp/subnet-and-id.txt | grep 192.168.- | awk '{print $2}')" space=internal-space
maas root vlan update "$(cat /tmp/subnet-and-id.txt | grep 192.168.- | awk '{print $1}')" "$(cat /tmp/subnet-and-id.txt | grep 192.168.- | awk '{print $2}')" space=ceph-replica-space
maas root vlan update "$(cat /tmp/subnet-and-id.txt | grep 192.168.- | awk '{print $1}')" "$(cat /tmp/subnet-and-id.txt | grep 192.168.- | awk '{print $2}')" space=overlay-space
maas root vlan update "$(cat /tmp/subnet-and-id.txt | grep 10.11.1- | awk '{print $1}')" "$(cat /tmp/subnet-and-id.txt | grep 10.11.1- | awk '{print $2}')" space=external-space
maas root vlan update "$(cat /tmp/subnet-and-id.txt | grep 10.11.1- | awk '{print $1}')" "$(cat /tmp/subnet-and-id.txt | grep 10.11.1- | awk '{print $2}')" space=dns-space
maas root vlan update "$(cat /tmp/subnet-and-id.txt | grep 192.168.- | awk '{print $1}')" "$(cat /tmp/subnet-and-id.txt | grep 192.168.- | awk '{print $2}')" space=ceph-access-space
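The grep lookups in the two steps above match by substring, so a loose pattern can return several lines and hand a multi-line argument to the maas command. A stricter lookup, sketched below, matches the full CIDR and fails unless exactly one line is found. subnet_field is a hypothetical helper that reads the "<fabric_id> <vid> <subnet_id> <cidr>" lines written to /tmp/subnet-and-id.txt.

```shell
# Hypothetical helper: look up one column for the subnet with an exact CIDR.
# Input lines have the form "<fabric_id> <vid> <subnet_id> <cidr>".
# Returns non-zero unless exactly one line matches.
# Usage: subnet_field <file> <cidr> <fabric|vid|id>
subnet_field() {
    local col
    case $3 in
        fabric) col=1 ;;
        vid)    col=2 ;;
        id)     col=3 ;;
        *) echo "unknown field: $3" >&2; return 2 ;;
    esac
    awk -v cidr="$2" -v col="$col" '
        $4 == cidr { value = $col; matches++ }
        END {
            if (matches != 1) exit 1   # none or ambiguous: print nothing
            print value
        }
    ' "$1"
}
```

Taking the internal subnet from the networking map as an example: maas root subnet update "$(subnet_field /tmp/subnet-and-id.txt 192.168.5.0/24 id)" name=internal. The fabric and vid fields serve the vlan update commands in the same way.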
- Register the other nodes as rack controllers (maas02 & maas03).
maas-rack register --url http://10.101.1.11:80/MAAS --secret $(cat /home/jujumanage/secret)
3. References
- https://maas.io/docs/how-to-install-maas
- https://maas.io/docs/how-to-manage-controllers
- https://maas.io/docs/how-to-use-the-maas-cli
- https://pgstef.github.io/2018/02/07/introduction_to_postgresql_automatic_failover.html
- https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/s1-resourceoperate-haar
- https://www.digitalocean.com/community/tutorials/how-to-create-a-high-availability-setup-with-corosync-pacemaker-and-reserved-ips-on-ubuntu-14-04
- https://wiki.clusterlabs.org/wiki/PgSQL_Replicated_Cluster