MariaDB/Galera Cluster
High Availability for database servers is a complicated field, with various considerations based on the usage scenario. As with anything non-trivial, the answer to "What should I use?" and "How should I configure ...?" is "It depends".
Traditional multi-master replication in MySQL is brittle, so several approaches exist for making the service more robust. The one I have spent time with is MariaDB using Galera for clustering, with each node in a load-balanced pool managed by keepalived/LVS.
Installation
MariaDB provides yum repositories (https://yum.mariadb.org) along with repository configuration for other platforms. MariaDB 10.x, which is currently still beta rather than GA, does not yet include support for Galera clustering, so I'm using a 5.5.x version for now.
Configuration
[mysqld]
wsrep_slave_threads=4
innodb_buffer_pool_size=256M
wsrep_cluster_name=nccs-galera
max-allowed-packet=256M
datadir=/var/lib/mysql
wsrep_provider_options=gcache.size=1G
thread_cache_size=32
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
Mandatory settings, required for proper cluster operation:
# Disable Query Cache, not supported with wsrep
# https://dev.mysql.com/doc/refman/5.5/en/query-cache.html
query_cache_size=0
query_cache_type=0
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
# innodb_doublewrite=1 - this is the default and it should stay this way
innodb_file_per_table
Optional mysqld settings: your regular InnoDB tuning, etc.:
innodb_log_file_size=256M
innodb_log_files_in_group=3
innodb_flush_log_at_trx_commit=2
# MyISAM #
key-buffer-size = 32M
myisam-recover = FORCE,BACKUP
# SAFETY #
max-connect-errors = 1000000
innodb = FORCE
sql_mode=NO_ENGINE_SUBSTITUTION
# CACHES AND LIMITS #
tmp-table-size = 32M
max-heap-table-size = 64M
max-connections = 250
open-files-limit = 65535
table-definition-cache = 1024
table-open-cache = 2048
# LOGGING #
log-error = /var/log/mysql/error.log
log-queries-not-using-indexes = 0
slow-query-log = 0
slow-query-log-file = /var/log/mysql/slow.log
wsrep provider configuration: basic wsrep options:
bind-address=0.0.0.0
wsrep_cluster_address=gcomm://node1.example.com,node2.example.com,node3.example.com
#wsrep_node_address=
wsrep_node_name='node1.example.com'
wsrep_sst_method=xtrabackup-v2
# wsrep_sst_auth is only required if using xtrabackup/mysqldump for SST
wsrep_sst_auth=root:'rootpa$'
additional "frequently used" wsrep settings:
#wsrep_node_incoming_address='192.168.10.2'
#wsrep_sst_donor='node3'
Some settings to consider if you're using SST:
[sst]
streamfmt=xbstream

[xtrabackup]
compress
compact
parallel=4
compress-threads=4
rebuild-threads=4

[mysqldump]
max_allowed_packet=512M
You'll want to look at the xtrabackup docs and perhaps this post about xtrabackup-2.1+.
State Snapshot Transfer
This is how MariaDB/Galera nodes catch up to the current state of the cluster, for example when a node has been rebuilt. Initially, I tried rsync as our SST method, with HostbasedAuthentication over SSH, but it became clear that in order to take advantage of our existing load-balancing tooling, we would need to use xtrabackup for SST, because SST via rsync blocks the donor node for too long.
xtrabackup-v2 requires:
- a newer version of percona-xtrabackup, 2.1.6+
- a newer version of MariaDB-Galera-server/galera
- database permissions
grant reload, lock tables, replication client, create tablespace, super on *.* to sst_xtrabackup@'localhost' identified by '...';
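If you create a dedicated SST user like the one granted above instead of using root, wsrep_sst_auth would point at it. The placeholder below stands in for whatever password was chosen in the GRANT:

```
wsrep_sst_auth=sst_xtrabackup:<password>
```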
Storage Note
One subtlety: we have 3 RAID volumes presented to the OS, one for the OS itself, one of SAS for certain databases, one of SSD for databases requiring more performance. We were thinking of:
/var/lib/mysql/foo -> /data/sas/foo
/var/lib/mysql/bar -> /data/ssd/bar
When a CREATE DATABASE foo statement is executed, MariaDB raises an error if the directory it wants to create for the database files cannot be created, for example because it already exists, even as a dangling symlink. Various possible but inelegant solutions involving wrapper scripts sprang to mind. The elegant approach was to put all of /var/lib/mysql on the SAS disk and move a database to the faster SSD only when needed; doing that move by hand is acceptable, since it happens just once, after we have determined that a particular database needs the extra performance.
/var/lib/mysql -> /data/sas/mysql
/var/lib/mysql/bar -> /data/ssd/bar
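The mechanics of the move-and-symlink approach can be sketched with throwaway paths. The directory names below are illustrative stand-ins for the real volumes, and in the real case mysqld would be stopped for the move:

```shell
# Simulate the layout with temporary directories instead of real volumes.
base=$(mktemp -d)
mkdir -p "$base/data/sas/mysql" "$base/data/ssd"

# The datadir lives on the SAS volume, reached via a symlink.
ln -s "$base/data/sas/mysql" "$base/var_lib_mysql"

# A database created under the datadir lands on SAS...
mkdir "$base/var_lib_mysql/bar"
echo "x" > "$base/var_lib_mysql/bar/t1.ibd"

# ...and is later moved to SSD and replaced with a symlink.
mv "$base/data/sas/mysql/bar" "$base/data/ssd/bar"
ln -s "$base/data/ssd/bar" "$base/data/sas/mysql/bar"

# The table file is still reachable through the original path.
cat "$base/var_lib_mysql/bar/t1.ibd"
# prints: x
```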
Benchmarking
Benchmarking methodology is complicated and fraught with the chance of error and inaccuracy; interpreting and understanding the results is even more challenging.
Sysbench
However, sysbench can give us something to go on. You'll need to create a test database and grant access first. Here is an example, by way of a jumping-off point:
sysbench --test=oltp --db-driver=mysql --mysql-table-engine=innodb \
  --oltp-table-size=1000000 --mysql-user=root \
  --mysql-password='foo' --mysql-socket=/var/lib/mysql/mysql.sock \
  prepare

sysbench --test=oltp --db-driver=mysql --num-threads=16 --max-requests=250000 \
  --oltp-table-size=1000000 --mysql-socket=/var/lib/mysql/mysql.sock \
  --oltp-read-only --mysql-user=testuser --mysql-password='foo' run

sysbench --test=oltp --db-driver=mysql --num-threads=16 --max-requests=250000 \
  --oltp-table-size=1000000 --mysql-socket=/var/lib/mysql/mysql.sock \
  --oltp-test-mode=complex --mysql-user=testuser --mysql-password='foo' run
mysqlslap
I used this cronjob to get output from mysql_test.sh, a wrapper I wrote around mysqlslap that runs a series of tests and sends the output to syslog. I then used Splunk to process the syslog entries and generate some pretty pictures. Changing the end of the pipe from '| logger' to '| nc -c mycarbon.example.com 2003' would let us feed the output into graphite instead.
for c in $(seq 5 5 25)
do
  for q in 50 500 5000
  do
    mysql_test.sh -c $c -i 1 -q $q -H mariadb.example.com -f | logger -t dbtest
  done
done
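For the graphite variant, each line sent to carbon's plaintext port (2003) must be a dotted metric path, a value, and a unix timestamp, space-separated. The metric name below is hypothetical; mysql_test.sh would need to emit lines in this shape:

```shell
# Build one metric line in carbon's plaintext protocol:
# <metric.path> <value> <unix_timestamp>
metric="dbtest.mariadb.queries_per_sec"   # hypothetical metric name
value=1234
ts=$(date +%s)
printf '%s %s %s\n' "$metric" "$value" "$ts"
```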
Architecture
Clients connect to the MariaDB cluster through a load balancer using keepalived. There are various strategy options to help ensure your load balancers know which nodes are able to answer queries:
- use a Galera-aware load balancer such as MaxScale or Galera Load Balancer (based on Pen, a generic TCP load-balancer)
- use a load balancer that you already have, such as HAProxy, and make it Galera-aware.
- wsrep_sst_donor_rejects_queries=ON: with this setting, each MariaDB node rejects client queries while it is acting as an SST donor.
- wsrep_notify_cmd: a command to be executed each time the node state or cluster membership changes.
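The wsrep_notify_cmd script receives the new state as command-line options (--status, --uuid, --primary, --members, --index). A minimal sketch of such a handler, written here as a function so the parsing can be shown inline; the logger tag is an assumption:

```shell
# Minimal wsrep_notify_cmd handler: parse the options Galera passes in
# and emit one line per state change.
notify() {
    STATUS="" PRIMARY="" INDEX=""
    while [ $# -gt 0 ]; do
        case $1 in
            --status)  STATUS=$2;  shift 2 ;;
            --uuid)    shift 2 ;;
            --primary) PRIMARY=$2; shift 2 ;;
            --members) shift 2 ;;
            --index)   INDEX=$2;   shift 2 ;;
            *)         shift ;;
        esac
    done
    # In production, send this to syslog instead:
    #   logger -t galera-notify "status=$STATUS primary=$PRIMARY index=$INDEX"
    echo "status=$STATUS primary=$PRIMARY index=$INDEX"
}

notify --status Synced --primary yes --index 0
# prints: status=Synced primary=yes index=0
```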
Management
Bootstrapping a new cluster
Ensure that the first host isn't trying to connect to an existing cluster so it will create a new one:
sed -i '/wsrep_cluster_address/s/gcomm:\/\/.*$/gcomm:\/\//' /etc/my.cnf
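The sed edit can be sanity-checked against a scratch copy of the config before touching the real one (node names below are illustrative):

```shell
# Demonstrate the sed edit on a scratch copy of my.cnf.
cnf=$(mktemp)
echo 'wsrep_cluster_address=gcomm://node1.example.com,node2.example.com,node3.example.com' > "$cnf"
sed -i '/wsrep_cluster_address/s/gcomm:\/\/.*$/gcomm:\/\//' "$cnf"
result=$(cat "$cnf")
echo "$result"
# prints: wsrep_cluster_address=gcomm://
rm -f "$cnf"
```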
Start mysql:
mkdir -p /var/log/mysql
mkdir -p /var/run/mysql
chown mysql:mysql /var/log/mysql
chown mysql:mysql /var/run/mysql
mysql_install_db
service mysql start
mysql_secure_installation
Restore wsrep_cluster_address (perhaps by running your config management tool). Test the first node! Perhaps start with netstat -ntap | grep mysql, then connect, look at status variables, and load/check some data.
Starting the cluster, if all nodes are down
If you can, it's better to first bring up the node that was up last.
Ensure that the first host isn't trying to connect to an existing cluster, so that it creates a new one, then start mysql:
sed -i '/wsrep_cluster_address/s/gcomm:\/\/.*$/gcomm:\/\//' /etc/my.cnf
service mysql start
Verify behaviour (or leave it to monitoring?).
Starting/Restarting an individual node
Verify that the wsrep_cluster_address line in my.cnf is not empty and contains the other nodes of the cluster (this should be controlled by config management), then:
service mysql start
Much of the mariadb configuration came from the codership wiki, /usr/share/mysql/ and the MariaDB/Galera docs.