MariaDB/Galera Cluster
High Availability for database servers is a complicated field, with various considerations based on the usage scenario. As with anything non-trivial, the answer to "What should I use?" and "How should I configure ...?" is "It depends".
Traditional multi-master replication in MySQL is brittle, so several approaches exist for making the service more robust. The one I have spent time with is MariaDB using Galera for clustering, with each node in a load-balanced pool managed by keepalived/LVS.
Installation
MariaDB provides yum repositories (https://yum.mariadb.org) along with repository configuration for other platforms. MariaDB 10.x, which is currently still beta rather than GA, does not yet include support for Galera clustering, so I'm using a 5.5.x version for now.
Configuration
[mysqld]
wsrep_slave_threads=4
innodb_buffer_pool_size=256M
wsrep_cluster_name=nccs-galera
max-allowed-packet=256M
datadir=/var/lib/mysql
wsrep_provider_options=gcache.size=1G
thread_cache_size=32
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
Mandatory settings, required for proper cluster operation:
# Disable Query Cache, not supported with wsrep
# https://dev.mysql.com/doc/refman/5.5/en/query-cache.html
query_cache_size=0
query_cache_type=0
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
# innodb_doublewrite=1 - this is the default and it should stay this way
innodb_file_per_table
Optional mysqld settings: your regular InnoDB tuning, etc.:
innodb_log_file_size=256M
innodb_log_files_in_group=3
innodb_flush_log_at_trx_commit=2
# MyISAM #
key-buffer-size = 32M
myisam-recover = FORCE,BACKUP
# SAFETY #
max-connect-errors = 1000000
innodb = FORCE
sql_mode=NO_ENGINE_SUBSTITUTION
# CACHES AND LIMITS #
tmp-table-size = 32M
max-heap-table-size = 64M
max-connections = 250
open-files-limit = 65535
table-definition-cache = 1024
table-open-cache = 2048
# LOGGING #
log-error = /var/log/mysql/error.log
log-queries-not-using-indexes = 0
slow-query-log = 0
slow-query-log-file = /var/log/mysql/slow.log
wsrep provider configuration: basic wsrep options:
bind-address=0.0.0.0
wsrep_cluster_address=gcomm://node1.example.com,node2.example.com,node3.example.com
#wsrep_node_address=
wsrep_node_name='node1.example.com'
wsrep_sst_method=xtrabackup-v2
# wsrep_sst_auth is only required if using xtrabackup/mysqldump for SST
wsrep_sst_auth=root:'rootpa$'
additional "frequently used" wsrep settings:
#wsrep_node_incoming_address='192.168.10.2'
#wsrep_sst_donor='node3'
Some settings to consider if you're using SST:
[sst]
streamfmt=xbstream

[xtrabackup]
compress
compact
parallel=4
compress-threads=4
rebuild-threads=4

[mysqldump]
max_allowed_packet=512M
You'll want to look at the xtrabackup docs and perhaps this post about xtrabackup-2.1+.
State Snapshot Transfer
This is how MariaDB/Galera nodes catch up to the current state of the cluster, for example when a node has been rebuilt. Initially, I tried rsync as our SST method, with HostbasedAuthentication over SSH, but it became clear that in order to take advantage of our existing load-balancing tooling, we would need to use xtrabackup for SST, because SST via rsync blocks the donor node for too long.
xtrabackup-v2 requires:
- a newer version of percona-xtrabackup, 2.1.6+
- a newer version of MariaDB-Galera-server/galera
- database permissions
grant reload, lock tables, replication client, create tablespace, super on *.* to sst_xtrabackup@'localhost' identified by '...';
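If you create a dedicated SST user like the one granted above instead of using root, wsrep_sst_auth would point at it. The placeholder below stands in for whatever password was chosen in the GRANT:

```
wsrep_sst_auth=sst_xtrabackup:<password>
```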
Storage Note
One subtlety: we have 3 RAID volumes presented to the OS, one for the OS itself, one of SAS for certain databases, one of SSD for databases requiring more performance. We were thinking of:
/var/lib/mysql/foo -> /data/sas/foo
/var/lib/mysql/bar -> /data/ssd/bar
When a CREATE DATABASE foo statement is executed, MariaDB raises an error if the directory it wants to create for the database files cannot be created, for example because it already exists, even as a dangling symlink. Various possible but inelegant solutions involving wrapper scripts sprang to mind. The elegant approach was to put all of /var/lib/mysql on the SAS disk and move a database to the faster SSD only when needed; doing that move by hand is acceptable, since it happens just once, after we have determined that a particular database needs the extra performance.
/var/lib/mysql -> /data/sas/mysql
/var/lib/mysql/bar -> /data/ssd/bar
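The mechanics of the move-and-symlink approach can be sketched with throwaway paths. The directory names below are illustrative stand-ins for the real volumes, and in the real case mysqld would be stopped for the move:

```shell
# Simulate the layout with temporary directories instead of real volumes.
base=$(mktemp -d)
mkdir -p "$base/data/sas/mysql" "$base/data/ssd"

# The datadir lives on the SAS volume, reached via a symlink.
ln -s "$base/data/sas/mysql" "$base/var_lib_mysql"

# A database created under the datadir lands on SAS...
mkdir "$base/var_lib_mysql/bar"
echo "x" > "$base/var_lib_mysql/bar/t1.ibd"

# ...and is later moved to SSD and replaced with a symlink.
mv "$base/data/sas/mysql/bar" "$base/data/ssd/bar"
ln -s "$base/data/ssd/bar" "$base/data/sas/mysql/bar"

# The table file is still reachable through the original path.
cat "$base/var_lib_mysql/bar/t1.ibd"
# prints: x
```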
Benchmarking
Benchmarking methodology is complicated and fraught with the chance of error and inaccuracy; interpreting and understanding the results is even more challenging.
Sysbench
However, sysbench can give us something to go on. You'll need to create a test database and grant access first. Here is an example, by way of a jumping-off point:
sysbench --test=oltp --db-driver=mysql --mysql-table-engine=innodb \
  --oltp-table-size=1000000 --mysql-user=root \
  --mysql-password='foo' --mysql-socket=/var/lib/mysql/mysql.sock \
  prepare

sysbench --test=oltp --db-driver=mysql --num-threads=16 --max-requests=250000 \
  --oltp-table-size=1000000 --mysql-socket=/var/lib/mysql/mysql.sock \
  --oltp-read-only --mysql-user=testuser --mysql-password='foo' run

sysbench --test=oltp --db-driver=mysql --num-threads=16 --max-requests=250000 \
  --oltp-table-size=1000000 --mysql-socket=/var/lib/mysql/mysql.sock \
  --oltp-test-mode=complex --mysql-user=testuser --mysql-password='foo' run
mysqlslap
I used this cronjob to get output from mysql_test.sh, a wrapper I wrote around mysqlslap that runs a series of tests and sends the output to syslog. I then used Splunk to process the syslog entries and generate some pretty pictures. Changing the end of the pipe from '| logger' to '| nc -c mycarbon.example.com 2003' would let us feed the output into graphite instead.
for c in $(seq 5 5 25)
do
  for q in 50 500 5000
  do
    mysql_test.sh -c $c -i 1 -q $q -H mariadb.example.com -f | logger -t dbtest
  done
done
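For the graphite variant, each line sent to carbon's plaintext port (2003) must be a dotted metric path, a value, and a unix timestamp, space-separated. The metric name below is hypothetical; mysql_test.sh would need to emit lines in this shape:

```shell
# Build one metric line in carbon's plaintext protocol:
# <metric.path> <value> <unix_timestamp>
metric="dbtest.mariadb.queries_per_sec"   # hypothetical metric name
value=1234
ts=$(date +%s)
printf '%s %s %s\n' "$metric" "$value" "$ts"
```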
Architecture
Clients connect to the MariaDB cluster through a load balancer using keepalived. There are various strategy options to help ensure your load balancers know which nodes are able to answer queries:
- use a Galera-aware load balancer such as MaxScale or Galera Load Balancer (based on Pen, a generic TCP load-balancer)
- use a load balancer that you already have, such as HAProxy, and make it Galera-aware.
- wsrep_sst_donor_rejects_queries=ON: with this setting, each MariaDB node rejects client queries while it is acting as an SST donor.
- wsrep_notify_cmd: a command to be executed each time the node state or cluster membership changes.
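The wsrep_notify_cmd script receives the new state as command-line options (--status, --uuid, --primary, --members, --index). A minimal sketch of such a handler, written here as a function so the parsing can be shown inline; the logger tag is an assumption:

```shell
# Minimal wsrep_notify_cmd handler: parse the options Galera passes in
# and emit one line per state change.
notify() {
    STATUS="" PRIMARY="" INDEX=""
    while [ $# -gt 0 ]; do
        case $1 in
            --status)  STATUS=$2;  shift 2 ;;
            --uuid)    shift 2 ;;
            --primary) PRIMARY=$2; shift 2 ;;
            --members) shift 2 ;;
            --index)   INDEX=$2;   shift 2 ;;
            *)         shift ;;
        esac
    done
    # In production, send this to syslog instead:
    #   logger -t galera-notify "status=$STATUS primary=$PRIMARY index=$INDEX"
    echo "status=$STATUS primary=$PRIMARY index=$INDEX"
}

notify --status Synced --primary yes --index 0
# prints: status=Synced primary=yes index=0
```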
Management
Bootstrapping a new cluster
Ensure that the first host isn't trying to connect to an existing cluster so it will create a new one:
sed -i '/wsrep_cluster_address/s/gcomm:\/\/.*$/gcomm:\/\//' /etc/my.cnf
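The sed edit can be sanity-checked against a scratch copy of the config before touching the real one (node names below are illustrative):

```shell
# Demonstrate the sed edit on a scratch copy of my.cnf.
cnf=$(mktemp)
echo 'wsrep_cluster_address=gcomm://node1.example.com,node2.example.com,node3.example.com' > "$cnf"
sed -i '/wsrep_cluster_address/s/gcomm:\/\/.*$/gcomm:\/\//' "$cnf"
result=$(cat "$cnf")
echo "$result"
# prints: wsrep_cluster_address=gcomm://
rm -f "$cnf"
```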
Start mysql:
mkdir -p /var/log/mysql
mkdir -p /var/run/mysql
chown mysql:mysql /var/log/mysql
chown mysql:mysql /var/run/mysql
mysql_install_db
service mysql start
mysql_secure_installation
Restore wsrep_cluster_address (perhaps by running your config management tool). Test the first node! Perhaps start with netstat -ntap | grep mysql, then connect, look at status variables, and load/check some data.
Starting the cluster, if all nodes are down
If you can, it's better to first bring up the node that was up last.
Ensure that the first host isn't trying to connect to an existing cluster, so that it creates a new one, then start mysql:
sed -i '/wsrep_cluster_address/s/gcomm:\/\/.*$/gcomm:\/\//' /etc/my.cnf
service mysql start
Verify behaviour (or leave it to monitoring?).
Starting/Restarting an individual node
Verify that the wsrep_cluster_address line in my.cnf is not empty and contains the other nodes of the cluster (this should be controlled by config management), then:
service mysql start
Much of the mariadb configuration came from the codership wiki, /usr/share/mysql/ and the MariaDB/Galera docs.