
Multi-Master Database Cluster on OpenStack with Load Balancing

Multi-Master Database Replication

Multi-master database replication allows applications to write to any node in a database cluster, with the data becoming available on the other nodes shortly after. The main advantages are high availability, high read performance, and scalability.

Overall Design

We are aiming to have an application layer accessing a database cluster via a load balancer, as shown in the picture below:

Fig. 1: Load Balancer for a Database Cluster

Trove

For providing database services on OpenStack we considered Trove. However, it is broken on Kilo: there is no easy way to get a 'Trove image' and launch it. There is a nice, automated script on the RDO page that actually creates an image; however, after the image is registered, it errors out when a DB instance is launched. The OpenStack Trove documentation was not helpful either, so there was little motivation for us to debug it further, since maintaining hacked code would be much riskier for us. We wish it worked. Moving on to other options: enter the Galera Cluster and MySQL Cluster products.

Using other options

In the world of MySQL-based multi-master replication clusters, there are a few popular options:

  • MariaDB Galera Cluster
  • Percona XtraDB Cluster
  • MySQL Cluster

Out of the three, we chose Percona XtraDB Cluster (PXC), mainly because of its slightly better support for tables without primary keys [1] [2]. Note that Galera is used in both MariaDB and PXC, and some users have still reported issues with tables lacking a primary key on MariaDB; in general, you should have a primary key for every table. We could have used MariaDB Galera Cluster, but either its documentation is not maintained or it has a fairly strict rule that primary keys are required, and that is a significant restriction. MySQL Cluster, on the other hand, has a huge learning curve for setup and administration; it might be something to consider when scaling up to millions of queries per second. MySQL Cluster bears no resemblance to the MariaDB or Percona cluster products, so it is a completely different mindset.

Instance Preparation

We use CentOS 7.1 instances that create a new volume for the OS disk. The database data lives on a separate volume: vdb.

Swap File Preparation

Normally, the instances do not have a swap file enabled (check with swapon --summary), so prepare one like so:

fallocate -l 1G /swapfile;
dd if=/dev/zero of=/swapfile bs=1M count=1024;
chmod 600 /swapfile;
mkswap /swapfile;
swapon /swapfile
swapon --summary

MySQL data directory preparation

Next, prepare the secondary hard disk that will hold the MySQL data directory.

fdisk /dev/vdb
  n (new partition, extended)
  n (new partition, logical)
  w (write the partition table to disk)

Now make a file system. Ensure you have a valid partition created (vdb5 in this case).

mkfs.ext4 /dev/vdb5

Automount swap and data directory

Create the mysql directory (MySQL is not installed yet) and set up /etc/fstab:

mkdir /var/lib/mysql
echo "/swapfile none swap defaults 0 0" >> /etc/fstab
echo "/dev/vdb5 /var/lib/mysql ext4 defaults 0 2" >> /etc/fstab

Mount everything in fstab and make a subdirectory for the data (I like to use non-default directories so I know what's going on):

mount -av
mkdir /var/lib/mysql/mysql_data
touch /var/lib/mysql/mysql_data/test_file

Finally, restore the SELinux security context on the mysql directory:

restorecon -R /var/lib/mysql

Database Node List

In our case we have 3 database servers all with CentOS 7.1.

DBNode1 - 10.0.32.23
DBNode2 - 10.0.32.24
DBNode3 - 10.0.32.25

Security Groups, Iptables & Selinux

We need to open these ports for each of the database nodes:

 TCP 873 (rsync)
 TCP 3306 (Mysql)
 TCP 4444 (State Transfer)
 TCP 4567 (Group Communication - GComm_port)
 TCP 4568 (Incremental State Transfer port = GComm_port+1)

SELinux was set to Permissive (setenforce 0) temporarily while the installation was done. Ensure the above ports are allowed by a security group applied to the database instances.
For every node, we need to install the PXC database software. Install it, but don't start the mysql service yet.
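If firewalld is running on the nodes, the ports can also be opened locally, as in the sketch below (an assumption on our part: if you rely only on OpenStack security groups or manage iptables directly, adapt accordingly):

# Open the cluster ports on each DB node
for port in 873 3306 4444 4567 4568; do
    firewall-cmd --permanent --add-port=${port}/tcp
done
firewall-cmd --reload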

Installing the Database Percona XtraDB Cluster Software

Before you install, there is a prerequisite: socat. This package should be installed from the base repository. If you have EPEL enabled, remove it (assuming this node is going to be used only for the database).

sudo yum remove epel-release
sudo yum install -y socat;

Install the Percona repo and software itself.

sudo yum install -y http://www.percona.com/downloads/percona-release/redhat/0.1-3/percona-release-0.1-3.noarch.rpm;

sudo yum install Percona-XtraDB-Cluster-56

First Node (Primary) in Cluster setup

In order to start a new cluster, the very first node must be started in a specific way, known as bootstrapping. This causes the node to assume it is the primary component of the DB cluster we are about to bring to life.

First, edit /etc/my.cnf to suit your requirements.

 # Edit to your requirements.
[mysqld]
datadir=/var/lib/mysql/mysql_data
user=mysql
log_bin                        = mysql-bin
binlog_format                  = ROW
innodb_buffer_pool_size        = 200M
innodb_flush_log_at_trx_commit = 0
innodb_flush_method            = O_DIRECT
innodb_log_files_in_group      = 2
innodb_log_file_size           = 20M
innodb_file_per_table          = 1
wsrep_cluster_address          = gcomm://10.0.32.23,10.0.32.24,10.0.32.25
wsrep_provider                 = /usr/lib64/galera3/libgalera_smm.so
wsrep_slave_threads            = 2
wsrep_cluster_name             = SilverSkySoftDBClusterA
wsrep_node_name                = DBNode1
wsrep_node_address             = 10.0.32.23
wsrep_sst_method               = rsync
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode       = 2
[mysqld_safe]
pid-file = /run/mysqld/mysql.pid
syslog

Start the bootstrap service
systemctl start mysql@bootstrap.service

This special service starts the MySQL server as the first node, behaving as if wsrep_cluster_address were an empty gcomm:// (no IPs), which creates a new cluster. Be sure to run this service only when creating the cluster, not when joining a node to an existing one.
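Before configuring the other nodes, you can optionally confirm that the bootstrapped node formed a primary component of size 1 (a quick sanity check using standard wsrep status variables):

mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';"   # should report Primary
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"     # should be 1 at this point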

While this first node is running, log in to each of the other nodes, DBNode2 and DBNode3, and use the my.cnf above as a template. For each node, update wsrep_node_name and wsrep_node_address. Note that wsrep_cluster_address should list the IP addresses of all nodes in the cluster.
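For example, on DBNode2 only the node-specific settings change; a sketch using sed (do the same on DBNode3 with DBNode3 and 10.0.32.25):

# On DBNode2: point the node-specific settings at this host
sed -i -e 's/^wsrep_node_name .*/wsrep_node_name                = DBNode2/' \
       -e 's/^wsrep_node_address .*/wsrep_node_address             = 10.0.32.24/' /etc/my.cnf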

Start the mysql service on each of nodes 2 and 3 while node 1 is still running:
systemctl start mysql

Verify Cluster is up and nodes are joined

Running the query below should show Value: 3, indicating that all 3 nodes have joined:

mysql> select @@hostname\G show global status like 'wsrep_cluster_size' \G
*************************** 1. row ***************************
@@hostname: dbserver1.novalocal
1 row in set (0.00 sec)

*************************** 1. row ***************************
Variable_name: wsrep_cluster_size
Value: 3
1 row in set (0.00 sec)

Start Node 1 back in normal mode

On Node 1, restart in normal mode:
systemctl stop mysql@bootstrap.service; systemctl start mysql
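After the restart, it is worth confirming that node 1 rejoined and is in sync; a quick check using the standard wsrep status variables:

mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"   # should report Synced
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"          # should still be 3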

Verify database and replication actually happens

On one of the nodes, say DBNode3, create a sample database and table.

mysql -u root -p
CREATE DATABASE my_test_db;
USE my_test_db;
CREATE TABLE my_test_table (test_year INT, test_name VARCHAR(255));
INSERT INTO my_test_table (test_year, test_name) values (1998, 'Hello year 1998');

On another node, say DBNode2, check that the table and rows are visible:

 mysql -u root -p 
 SELECT @@hostname\G SELECT * from my_test_db.my_test_table;
 *************************** 1. row ***************************
 @@hostname: dbserver2.novalocal
 1 row in set (0.00 sec)
 +-----------+-----------------+
 | test_year | test_name       |
 +-----------+-----------------+
 | 1998      | Hello year 1998 |
 +-----------+-----------------+
 1 row in set (0.00 sec)

This confirms our cluster is up and running.
Don't forget to enable the mysql service to start automatically (systemctl enable mysql).
Also set the root password for MySQL.
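In shell form, on each node (mysql_secure_installation is the stock helper shipped with the server and is one convenient way to set the root password; adjust to taste):

systemctl enable mysql        # start the database automatically on boot
mysql_secure_installation     # set the MySQL root password and remove insecure defaults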

Managing Users in Clustered Database

In the cluster setup, the mysql.* system tables are not replicated, so manually inserting a user into the mysql.* tables only takes effect on the local node. Instead, use CREATE USER statements, which are replicated across the cluster. A sample:

CREATE USER 'admin'@'%' IDENTIFIED BY 'plainpassword';
GRANT ALL ON *.* TO 'admin'@'%';

You can log into any other node to verify that the new user was created.
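For example, on a different node (a quick check; admin is the user created above):

mysql -u root -p -e "SELECT User, Host FROM mysql.user WHERE User = 'admin';"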

In addition, you can use MySQL Workbench to manage the databases in the cluster.

OpenStack Load Balancer

OpenStack Load Balancer as a Service (LBaaS) is easily enabled in RDO Packstack and other installs. To create a load balancer for the database cluster we created above, click on the Load Balancer menu under Network and click Add Pool, as shown in the figure below:

Fig. 2: Adding a New Load Balancing Pool in OpenStack

Then fill in the pool details as shown in the picture below:

Fig. 3: Setting the details of the Load Balancing Pool

Note that we are using the TCP protocol in this case, as we need to allow MySQL connections. For simplicity of testing, use the ROUND_ROBIN balancing method.

Next, add the VIP for the load balancer from the Actions column. In the VIP setup, choose protocol TCP and port 3306.

Next, add the members of the pool by selecting the 'Members' tab and then selecting the database nodes. For now you can keep the weight at 1.
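If you prefer the command line, the Kilo-era LBaaS v1 neutron client can do the equivalent; a sketch below, where the subnet ID is a placeholder and the pool/VIP names are our own choices:

# Create the pool, its VIP on port 3306, and the three database members (LBaaS v1 CLI)
neutron lb-pool-create --name db-pool --protocol TCP --lb-method ROUND_ROBIN --subnet-id <db-subnet-id>
neutron lb-vip-create db-pool --name db-vip --protocol TCP --protocol-port 3306 --subnet-id <db-subnet-id>
for ip in 10.0.32.23 10.0.32.24 10.0.32.25; do
    neutron lb-member-create --address $ip --protocol-port 3306 db-pool
done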

Get the VIP address by clicking the VIP link on the load balancer pool. Once you have the IP, you can optionally associate a floating IP. This can be done by going to Compute -> Access & Security, allocating an IP to your project, and then clicking Associate. In the drop-down, you should see the VIP's name and the IP you provided.

This completes the Load balancer setup.

Testing the Load Balancer

A simple test is to query the load balancer's VIP with the MySQL client. In our case the VIP is 172.16.99.35 and the result is shown below.

[centos@client1 etc]$ mysql -u root -p -h 172.16.99.35 -e "SHOW VARIABLES LIKE 'wsrep_node_name';"
Enter password: 
+-----------------+---------+
| Variable_name   | Value   |
+-----------------+---------+
| wsrep_node_name | DBNode1 |
+-----------------+---------+
[centos@client1 etc]$ mysql -u root -p -h 172.16.99.35 -e "SHOW VARIABLES LIKE 'wsrep_node_name';"
Enter password: 
+-----------------+---------+
| Variable_name   | Value   |
+-----------------+---------+
| wsrep_node_name | DBNode2 |
+-----------------+---------+

You can see that successive queries are being routed to different nodes.

Simplistic PHP Test App

On another VM, install Apache and PHP. Start Apache and create a PHP file as below; the database is the one we created above, and the host name in the connection string should point at the load balancer VIP (or its floating IP).

<?php
 // Connect through the load balancer; the host name below should resolve to the VIP (or its floating IP)
 $user = "root";
 $pass = "your_password";

 $db_handle = new PDO("mysql:host=dbcluster1.testdomain.com;dbname=my_test_db", $user, $pass);
 print "<pre>";

 // Read back the sample row inserted earlier to confirm the data replicated
 foreach ($db_handle->query("SELECT test_name FROM my_test_table") as $row)
 {
   print "Name from db " . $row['test_name'] . "<br />";
 }
 print "\n";

 // Dump the wsrep_* variables; wsrep_node_name and wsrep_node_address show which node served the request
 foreach ($db_handle->query("SHOW VARIABLES LIKE 'wsrep_%'") as $row) {
   print $row['Variable_name'] . " = " . $row['Value'];
   print "\n";
 }
 print_r ($row);   // debug: dump the last row as an array
 print "</pre>";

 $db_handle = null;
?>

From the browser navigate to the URL where this file is.

This shows the data from the table and the various wsrep variables. Each time you refresh the page you should see wsrep_node_address and wsrep_node_name change, so you know the load balancer is working.

Monitoring

In general, the cluster needs to be monitored for crashed databases and similar failures. The OpenStack load balancer can monitor the members in the pool and set a failed member to an inactive state.
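With LBaaS v1 this is done by attaching a health monitor to the pool; a rough sketch with the neutron client, reusing the hypothetical db-pool name from earlier (the intervals are illustrative):

# A TCP health monitor that marks a member inactive after 3 failed checks
neutron lb-healthmonitor-create --type TCP --delay 5 --timeout 3 --max-retries 3
neutron lb-healthmonitor-associate <healthmonitor-id> db-pool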

Crashed Node Recovery

Recovery of a crashed node with little impact on the overall cluster is one of the main reasons to run a cluster in the first place. A very nice article about the various ways to recover a crashed node is on Percona's site.
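For a single failed node in an otherwise healthy cluster, the usual path is simply to start the node again and let Galera resync it from a donor (an incremental state transfer, or a full rsync SST per the wsrep_sst_method above); a minimal sketch:

# On the recovered node: start the service and watch it catch up with the cluster
systemctl start mysql
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"   # Joined/Donor, then Synced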

Conclusion

We described how to create a database cluster and configure a load balancer on top of it. It is not a very complex process. The entire environment ran on OpenStack Kilo.

The Search for the Ideal Backup Tool Part 2 of 2

In this installment, we publish the results of our comparison of the ZBackup and Attic backup tools.

We put both ZBackup and Attic through two main tests: backup and restore.

The input files were generally QEMU IMG or QCOW2 images containing either a CentOS install or empty data. The disks were all-SSD RAID 1+0. The CPUs were two Haswell Xeons at 2.3 GHz with 6 cores each.

Backup Test

Attic

Backup # | Input Size (GB) | Num Files | Time (hh:mm:ss) | Folder Size (GB) | Effective Compression Ratio | Notes
1        | 50              | 3         | 00:09:54        | 2.1              | 23.81                       |
2        | 50              | 3         | 00:00:18        | 2.1              | 23.81                       | No new files. No updates.
3        | 50              | 3         | 00:01:15        | 2.1              | 23.81                       | No new files, but a minor update to one of the larger files.
4        | 470             | 5         | 00:50:16        | 2.16             | 217.59                      | 2 new files.
5        | 470             | 5         | 00:41:31        | 2.16             | 217.59                      | No new files, but a minor update to one of the larger files.
Total data processed = 1,090 GB.
Total time for data  = 6,194 seconds

Attic takes 5.68 seconds per GB for data that is largely duplicate, such as IMG/QCOW2 files containing a CentOS install.

 

ZBackup

Backup # | Input Size (GB) | Num Files | Time (hh:mm:ss) | Folder Size (GB) | Effective Compression Ratio | Notes
1        | 50              | 3         | 00:45:43        | 1.6              | 31.25                       |
2        | 50              | 3         | 00:08:17        | 1.6              | 31.25                       | No new files. No updates.
3        | 50              | 3         | 00:08:22        | 1.6              | 31.25                       | No new files, but a minor update to one of the larger files.
4        | 470             | 5         | 04:10:13        | 1.6              | 293.75                      | 2 new files.
5        | 470             | 5         | 04:08:00        | 1.6              | 293.75                      | No new files, but a minor update to one of the larger files.
Total data processed = 1,090 GB.
Total time for data  = 33,635 seconds

ZBackup takes 30.86 seconds per GB for data that is largely duplicate, such as IMG/QCOW2 files containing a CentOS install.

Restore Test

For the restore test, every restored file must exactly match the SHA1 fingerprint of its original. Both ZBackup and Attic passed this test.
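The fingerprint check itself was a plain sha1sum comparison (see the script notes at the end); roughly as below, where the restore path is a placeholder:

sha1sum /virtual_machines/images/file1.img    # original
sha1sum /restore_target/file1.img             # restored copy; the two digests must be identical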

Attic

Restore # | Restore Size (GB) | Num Files | Time (hh:mm:ss)
1         | 350               | 1         | 00:39:11
2         | 25                | 1         | 00:00:20
3         | 48                | 2         | 00:05:18
Total data processed = 423 GB.
Total time for data  = 2,689 seconds

 

Attic takes 6.35 seconds per GB to restore data.

ZBackup

Restore # | Restore Size (GB) | Num Files | Time (hh:mm:ss)
1         | 350               | 1         | 00:24:29 (2 GB cache)
2         | 350               | 1         | 00:26:40 (40 MB cache)
3         | 25                | 1         | 00:01:19
4         | 48                | 2         | 00:06:02
Total data processed = 773 GB.
Total time for data  = 3,510 seconds

 

ZBackup takes 4.54 seconds per GB to restore data.

Comparison

Metric               | Attic | ZBackup | Attic vs ZBackup
Backup (seconds/GB)  | 5.68  | 30.86   | -443.31%
Backup compression   | 217   | 293     | 35.02%
Restore (seconds/GB) | 6.35  | 4.54    | -28.50%

 

The final selection depends on which factor carries more weight. For instance, if storing a GB is cheap but you need fast backup times, Attic seems best. If you care about size, ZBackup seems best, at the expense of time. ZBackup has selectable compression algorithms, so it might even be faster if you choose the faster LZO compressor; however, the author mentions LZO comes with caveats. Our quick tests show LZO is definitely faster, but the compression ratio is lower than Attic's.

Do let us know your thoughts in the comments.

Post Script: The Test Script Files
Attic Create Backup Script
run=$1
if [ "$run" == "" ]; then
 echo "Error run number is required."
 exit
fi
 
attic create --stats /vm_backup/atticrepo.attic::$run /virtual_machines/images/file1.img /virtual_machines/images/file2.img . . .
du -h -d 1 /vm_backup/atticrepo.attic
echo "Done"
ZBackup Create Backup Script
. . . Preamble Same as attic . . .
zbackup backup --non-encrypted --threads 8 --cache-size 1024mb \
    /vm_backup/zbak/backups/file1.img.$run < /virtual_machines/images/file1.img

. . . other files . . .

sha1sum was used to calculate SHA1 on restored files.

The Search for the Ideal Backup Tool Part 1 of 2

In the context of virtualization, backing up VM images to storage nodes involves moving very large files. Many VM images are just copies of the OS with data on top, so data deduplication and compression should offer great savings. In our search we found various utilities, which we list further down, but we settled on reviewing two popular ones: zbackup and attic. Another popular tool, bup, was also considered, but a few things, such as not being able to prune old versions, were major drawbacks for us.

The main requirements were data deduplication, compression, easy scripting, and encryption, all in one tool. In this article we give a background on their usage on CentOS 7.1. We do not plan an extensive evaluation of their other capabilities, as we are looking for these basic features to be done well.

ZBackup

ZBackup describes itself as a globally-deduplicating backup tool, drawing its inspiration from the bup and rsync tools. As you add more files to the archive, it stores duplicate regions only once. It also supports AES-encrypted files.

Installing ZBackup on CentOS 7.1

ZBackup is the easiest to install. It is available in the EPEL repos, and you can simply run yum install zbackup.

Usage

The archive is called a repository in ZBackup. It is nothing but a folder created for the tool's use, where it stores its metadata and all the files added to it for backup.

The first step is to initialize the folder, say zbak, with the metadata folders:

zbackup init --non-encrypted /kvmbackup/zbak

If you need encryption, you can enable it with a key file.
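A minimal sketch of an encrypted repository, assuming a key file you generate yourself (zbackup expects either --non-encrypted or --password-file on every invocation; the repository path here is hypothetical):

# Generate a passphrase file and initialize an AES-encrypted repository with it
head -c 32 /dev/urandom | base64 > /root/zbackup.key
chmod 600 /root/zbackup.key
zbackup init --password-file /root/zbackup.key /kvmbackup/zbak_encrypted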

Next, add files to it:

zbackup backup --non-encrypted --threads 8 --cache-size 1024mb /kvmbackup/zbak/backups/centos_nfs.img.20150721040100  < /virtual_machines/images/centos_nfs.img

The first add takes time. Subsequent adds of the same source file name with modified contents are generally faster.

Restore files as follows:

zbackup --non-encrypted restore zbak/backups/centos_nfs.img.20150721040100 > /virtual_machines/images/centos_nfs.img

Attic

Attic describes itself as a deduplicating backup program that provides an efficient and secure way to perform daily backups.

Installing Attic on CentOS 7.1

Install Python 3.4 from EPEL, then unpack the standalone Attic release and link it into the PATH:

yum install python34
curl https://attic-backup.org/downloads/releases/0.14/Attic-0.14-linux-x86_64.tar.gz | tar -C /usr/local/share/ -zxf -
ln -s /usr/local/share/Attic-0.14-linux-x86_64/attic /usr/local/bin/attic

Usage

The archive is also called a repository in Attic.

The first step is to initialize the folder, say atticrepo.attic, with the metadata folders. In this case, Attic can create the folder if it does not exist.

attic init /kvmbackup/atticrepo.attic

Next, add files to it:

attic create --stats /kvmbackup/atticrepo.attic::20150721040100 /virtual_machines/images/centos_nfs.img <other files if necessary>

Restore files as follows:

attic extract atticrepo.attic::20150721040100 virtual_machines/images/centos_nfs.img

One immediate quirk of Attic is that a destination directory cannot be specified as of version 0.14. It extracts to the current directory but maintains the original path.

This makes scripted use of the tool a little inconvenient. The feature seems to be on their to-do list, but we hope it becomes available sooner.
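In the meantime, one workaround is to run the extract from the directory you want to be the root of the restored path, since Attic recreates the archived path relative to the current directory; a sketch based on the example above:

# Restore back into /virtual_machines/images by extracting from /
cd / && attic extract /kvmbackup/atticrepo.attic::20150721040100 virtual_machines/images/centos_nfs.img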

Which One to choose?

This is the subject of our next post, where we compare the speed of both tools on the backup and restore paths.

Other backup utilities we considered

  • bup
  • Duplicity
  • rsync
  • rdiff-backup
  • Bacula
  • ZFS  (filesystem, not tool)

Most were either lacking the features we were looking for or were too complex. Do let us know your thoughts in the comments.

Also see the comparison of backup tools on Ask Ubuntu for a list of desktop/GUI tools.