Jesse's Software Engineering Blog
MySQL Master-Slave Replication
MySQL replication is the act of actively cloning all changes made on one server (master) to another server(s) (slaves). This is commonly done to create backups of data as well as to add multiple servers in which applications can query against. This becomes advantageous in high read environments where there is concern for bogging down the main MySQL server. There are various different replication topologies, this article will focus on the master-slave configuration.
MySQL replication is based on the binary log. The master server writes “events” to it’s binary log which slaves are then configured to read from. Once binary logging has been turned on, the master will write all events and have no interest or knowledge in whom is reading from it’s log. Written events only consist of changes that have been made to tables in databases; therefore, routine selection queries are not recorded. Events are written to the log immediately after a statement or transaction completes but before any locks are released. Although enabling binary logging will cause a slight decrease in performance, the ability to replicate and monitor from the binary log outweighs the downside of performance loss.
One of the most important settings for a binary log is the format in which the log is written: statement vs row formatting.
Statement Based – Statement based formatting works by recording the query that changed the data. Statement based is simple to implement and keeps binary logs relatively small. This also makes auditing easier as all the queries are written directly into the log. The downside is there are A LOT of functions that statement based logging does not support, and complex queries can cause errors on the slave.
Row Based – Row based formatting records the actual data changes to the binary log. Row based logging is a standard approach used by most RDBMS. Row based can replicate all queries correctly and some statements will be replicated more efficiently. Row based also offers fewer locks and higher concurrency. Row based replication is generally a better all around approach. However, since data changes are logged opposed to queries, the binary logs can get very large if not managed correctly which can lead to adverse performance issues.
For a more detailed comparison of statement vs. row based binary log formatting:
There are numerous other configurations that can be used to configure a master’s binary log, such as how big log files are, which databases/tables are written, etc. For a complete list:
MySQL Master Configurations
In order to set up a MySQL server as a master, binary logging will need to be enabled. To see the status of binary logging:
SHOW MASTER STATUS \G
If no data comes back from the query then binary logging has not been turned on. Turning binary logging on will require a server restart. Here are common configurations for a MySQL master server that can be placed in the MySQL conf file (my.cnf):
[mysqld] log-bin = master-bin log-bin-index = master-bin.index server-id = 10 innodb_flush_log_at_trx_commit = 1 innodb_support_xa = 1 sync_binlog = 1 binlog-format = ROW expire_logs_days = 14
log-bin – This is what actually turns MySQL binary logging on. It is important to give a file name value (full path ok too) for the binary log instead of relying on the default which can cause complications in more complex replication strategies.
log-bin-index – The index file for binary log names. Also best to supply a base name opposed to relying on the default
server-id – Every server involved in replication needs it’s own unique server id. Never rely on the defaults
innodb_flush_log_at_trx_commit – Controls how often the log buffer is written to the log. A value of 1 forces a write and flush after every transaction commit.
innodb_support_xa – Ensures that transactions are written to the log in the same order that they are executed in
sync_binlog – How often the log is synced with disk. A value of 1 ensures each transaction is synced to disk
binlog-format – This is the format of the log, STATEMENT or ROW, as discussed earlier. There is also MIXED, which defaults to STATEMENT unless it is a query that only ROW can handle in which case MySQL will switch between the two.
expire_logs_days – Length in which binary logs are stored. By default they are stored indefinitely unless manually cleared. In a situation where the log has been removed and replication dies, replication can be restarted by doing a SQL dump and current log position, don’t need to save all the logs.
After adding the variables to the conf file and restarting the server, the SHOW MASTER STATUS will show data:
... File: master-bin.000001 Position: 120 ...
This shows the current binary log file name and the position within the binary log, both of which are needed when setting up slave servers.
In order for slaves to access a master’s binary log they need to have users. On the master server create the appropriate user. Notice the % can be used as a wildcard. The following creates a slave user from an internal IP address to access the master without a password. Although the REPLICATION CLIENT privilege is not needed by the slave, it is useful to grant the privilege now to allow for monitoring and slave-master swapping if the need arises. NOTE: This is an example, any production set up should be more secure.
CREATE USER 'slave'@'10.2.2.%' IDENTIFIED BY ''; GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'slave'@'10.2.2.%'; FLUSH PRIVILEGES;
To verify slave privileges, ssh onto the slave server and run the following (replace IP with master’s IP):
mysql -h 10.2.2.2 -u slave
MySQL Slave Configurations
At this point the master slave should have binary logging enabled as well as have a slave user in which the slave can use to access the master’s binary log. There are a couple different scenarios when setting up the slave server.
In the simplest scenario the master server will be a fresh install with no existing data nor be actively queried against. First step is to add the correct configurations to the slave’s configuration file (my.cnf):
[mysqld] server-id = 20 skip-slave-start = true read_only = true relay-log = /var/log/mysql/slave-relay-bin slave_parallel_workers = 2
skip-slave-start – This controls whether replication auto starts on server boot. In the event of a server crash it is best to start replication manually when you are ready, not have the server start automatically
read_only – In no situation should a slave be written to, this will break replication. With this configuration SUPER users will still be able to write but no other users will be able to.
relay-log – The relay log is similar to the binary log in the sense that it writes events, however the relay log can not be used by another slave for replication purposes. Like the binary log, the name of the rely log should not be left to default. NOTE: The directory structure must be created and the mysql folder should be owned by the MySQL user.
slave_parallel_workers – As of 5.6.3 MySQL allows for concurrent slave workers. This helps alleviate the replication delay caused by heavy writes and a single slave thread handling replication. This should be used, with an exact number to be determined by the slave server and overall architecture.
NOTE: This slave server does not set up the binary log for itself. In certain circumstances it is beneficial to have a slave write to its own binary log i.e. in case it may need to become a master or have other slaves are reading from it.
Now that the slave has been configured run the following in MySQL:
SHOW SLAVE STATUS \G
There is no data because the slave has not been hooked up to the master yet. Before the slave can be linked to the master, the log file and position is needed from the master. If the master is not currently in use and has no existing data, running SHOW MASTER STATUS, as outlined above, will give the necessary information. Once obtained, run the following on the slave:
CHANGE MASTER TO MASTER_HOST = '10.2.2.2', MASTER_USER = 'slave', MASTER_PASSWORD = '', MASTER_LOG_FILE = 'master-bin.000001', MASTER_LOG_POS = 120;
Running SHOW SLAVE STATUS should now output data about the slave:
... Slave_IO_State: Master_Host: 10.2.2.2 Master_User: slave Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000001 Read_Master_Log_Pos: 120 Slave_IO_Running: No Slave_SQL_Running: No ...
Some important output vars to note:
Master_Host – The master which slave is connected to
Slave_IO_Running – Whether the slave IO thread is on
Slave_SQL_Running – Whether the slave SQL thread is on
Last_Error – Shows last error while processing the relay log, should be blank
Seconds_Behind_Master – How far behind slave is from master
Slave_SQL_Running_State – Current status of the slave
To start the slave:
Running SHOW SLAVE STATUS should now show the slave as running and have no output errors. Slave verification can also be run via the master by logging onto the master server and running:
SHOW PROCESSLIST \G
There should be an entry showing the slave user being connected:
... User: slave Host: slave:48326 Command: Binlog Dump ...
To manually verify replication, run the following on master:
CREATE DATABASE rep; CREATE TABLE rep.test (id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(100)); INSERT INTO rep.test (name) VALUES (‘Jeff’), (‘Joe’);
Then back on the slave server:
SELECT * FROM rep.test;
All data should be present. If you need to stop replication:
Slave to existing Master
A slightly more complex scenario occurs when the master server has existing data and is currently in use. In order to set up replication the master must be locked to determine the log coordinates as well as capture the current data. Once that is completed the master can be unlocked and the data can then be imported into the slave. Finally the slave can start replicating at the point at which the data has stopped.
NOTE: This was written for InnoDB to InnoDB replication. Setting up replication between different database types may require additional steps and/or considerations.
Let’s assume that the above outlined master configurations have been completed: binary logging has been enabled and the slave user has been created. Also assume the following data exists in the master server and that the master server is currently running.
To begin, open two connections to the master server. In the first tab, log into MySQL server with SUPER privileges and run the following command to ensure that no data can be written to any tables:
FLUSH TABLES WITH READ LOCK;
As long as the connection is active, the lock will remain. In the other tab, dump the MySQL data from the server:
mysqldump -u root --all-databases --master-data > ~/master-data.sql
The –master-data option will ensure the the master binary log coordinates are included in the SQL dump. Once the dump is completed, return to the connection that is holding the lock and release:
After unlocking the tables enter some new data. This will demonstrate the master getting new data while you are in the process of setting up replication:
INSERT INTO replication.users (name) VALUES ('Daryl');
Now that the data has been collected and the locks released the second connection tab can be closed. Opening the SQL dump shows the following lines:
... -- -- Position to start replication or point-in-time recovery from -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=1206; ...
These configurations will auto set the slave to the correct position in the master binary log when the SQL dump is imported into the slave server. Next copy the dump to the slave server:
scp ~/master-data.sql email@example.com:~
Now on the slave server, update the my.cnf conf file and restart the server:
[mysqld] server-id = 20 skip-slave-start = true read_only = true relay-log = /var/log/mysql/slave-relay-bin slave_parallel_workers = 2
Next, connect to the master just as before except without specifying the MASTER_LOG_FILE or MASTER_LOG_POS since those are defined in the SQL dump:
CHANGE MASTER TO MASTER_HOST = '10.2.2.2', MASTER_USER = 'slave', MASTER_PASSWORD = '';
At this point SHOW SLAVE STATUS should display no errors and have all the master info except the binary log name and position. Next import the SQL dump:
mysql -u root < ~/master-data.sql
And verify the data was imported:
SELECT * FROM replication.users; +----+--------+ | id | name | +----+--------+ | 1 | Jodi | | 2 | Rick | | 3 | Carl | | 4 | Judith | +----+--------+
As expected we only see the data from the dump, not the data that was added after. Finally start the replication, verify that replication is running, and select the data again:
START SLAVE; SHOW SLAVE STATUS; SELECT * FROM replication.users; +----+--------+ | id | name | +----+--------+ | 1 | Jodi | | 2 | Rick | | 3 | Carl | | 4 | Judith | | 5 | Daryl | +----+--------+
The replication picked up where it should have and the new data has been successfully replicated into the slave. Also notice in the output from the SHOW SLAVE STATUS, the master binary coordinates were correctly imported via the dump:
... Master_Log_File: mysql-bin.000006 Read_Master_Log_Pos: 1887 ...
MySQL replication is essential to any system architecture and can provide numerous performance benefits as well as back up capabilities. It is important to note that replication should not be the only data backup system in place but it can help augment other systems. For large applications, numerous slaves can be set up and connections can be pooled and balanced amongst the various servers to prevent any single server from being overloaded. This will also shift load from the master write server to a distributed slave read network. There are other replication topologies, such as master-master, as well as numerous system set ups to help maximize performance and availability from a system stack.