15.4. Replication Implementation Overview

MySQL replication is based on the master server keeping track of all changes to your databases (updates, deletes, and so on) in its binary logs. Therefore, to use replication, you must enable binary logging on the master server. See Section 5.2.3, “The Binary Log”.

Each slave server receives from the master the saved updates that the master has recorded in its binary log, so that the slave can execute the same updates on its copy of the data.

It is extremely important to realize that the binary log is simply a record starting from the fixed point in time at which you enable binary logging. Any slaves that you set up need copies of the databases on your master as they existed at the moment you enabled binary logging on the master. If you start your slaves with databases that are not in the same state as those on the master when the binary log was started, your slaves are quite likely to fail.

After the slave has been set up with a copy of the master's data, it connects to the master and waits for updates to process. If the master fails, or the slave loses connectivity with your master, the slave keeps trying to connect periodically until it is able to resume listening for updates. The --master-connect-retry option controls the retry interval. The default is 60 seconds.
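
The reconnect behavior described above can be pictured with a minimal Python sketch. It is an illustration only, not the server's implementation: the function name connect_with_retry and its parameters are hypothetical, and the only detail taken from this section is the 60-second default interval.

```python
import time


def connect_with_retry(connect, retry_interval=60, max_attempts=None,
                       sleep=time.sleep):
    """Call connect() until it succeeds.

    retry_interval plays the role of --master-connect-retry (seconds).
    max_attempts and sleep are hypothetical conveniences so the sketch
    can be bounded and exercised without real waiting.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect()
        except ConnectionError:
            # Give up only if the caller asked for a bounded number of tries.
            if max_attempts is not None and attempts >= max_attempts:
                raise
            sleep(retry_interval)
```

Passing a recording function as sleep makes it possible to check how often, and for how long, the loop would have paused.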

Each slave keeps track of where it left off when it last read from its master server. The master has no knowledge of how many slaves it has or which ones are up to date at any given time.

15.4.1. Replication Implementation Details

MySQL replication capabilities are implemented using three threads (one on the master server and two on the slave). When a START SLAVE statement is issued on a slave server, the slave creates an I/O thread, which connects to the master and asks it to send the updates recorded in its binary logs. The master creates a thread to send the binary log contents to the slave. This thread can be identified as the Binlog Dump thread in the output of SHOW PROCESSLIST on the master. The slave I/O thread reads the updates that the master Binlog Dump thread sends and copies them to local files, known as relay logs, in the slave's data directory. The third thread is the SQL thread, which the slave creates to read the relay logs and to execute the updates they contain.

In the preceding description, there are three threads per master/slave connection. A master that has multiple slaves creates one thread for each currently-connected slave, and each slave has its own I/O and SQL threads.

The slave uses two threads so that reading updates from the master and executing them can be separated into two independent tasks. Thus, the task of reading statements is not slowed down if statement execution is slow. For example, if the slave server has not been running for a while, its I/O thread can quickly fetch all the binary log contents from the master when the slave starts, even if the SQL thread lags far behind. If the slave stops before the SQL thread has executed all the fetched statements, the I/O thread has at least fetched everything so that a safe copy of the statements is stored locally in the slave's relay logs, ready for execution the next time that the slave starts. This enables the master server to purge its binary logs sooner because it no longer needs to wait for the slave to fetch their contents.
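
The benefit of separating fetching from execution can be sketched in Python as a producer and a consumer that share a queue. This is a toy model under stated assumptions: the queue stands in for the relay logs, run_slave and apply_delay are hypothetical names, and no real network or SQL work takes place.

```python
import queue
import threading
import time


def run_slave(master_events, apply_delay=0.01):
    """Simulate a slave with a fast I/O thread and a slow SQL thread."""
    relay_log = queue.Queue()  # stands in for the relay log files
    fetched, applied = [], []

    def io_thread():
        # Copy every event from the master into the relay log.
        for event in master_events:
            relay_log.put(event)
            fetched.append(event)
        relay_log.put(None)  # sentinel: nothing more to fetch

    def sql_thread():
        # Execute events from the relay log, one at a time.
        while True:
            event = relay_log.get()
            if event is None:
                break
            time.sleep(apply_delay)  # simulate slow statement execution
            applied.append(event)

    io = threading.Thread(target=io_thread)
    sql = threading.Thread(target=sql_thread)
    io.start()
    sql.start()
    io.join()
    # At this point everything has been fetched, whether or not it
    # has been executed yet.
    fetched_when_io_done = len(fetched)
    applied_when_io_done = len(applied)
    sql.join()
    return fetched_when_io_done, applied_when_io_done, applied
```

With a noticeable apply_delay, the first returned count typically exceeds the second, which mirrors the case in which the SQL thread lags behind while all updates are already stored locally.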

The SHOW PROCESSLIST statement provides information that tells you what is happening on the master and on the slave regarding replication. See Section 7.5.5, “Examining Thread Information”, for descriptions of all replication-related states.

The following example illustrates how the three threads show up in the output from SHOW PROCESSLIST.

On the master server, the output from SHOW PROCESSLIST looks like this:

mysql> SHOW PROCESSLIST\G
*************************** 1. row ***************************
     Id: 2
   User: root
   Host: localhost:32931
     db: NULL
Command: Binlog Dump
   Time: 94
  State: Has sent all binlog to slave; waiting for binlog to
         be updated
   Info: NULL

Here, thread 2 is a Binlog Dump replication thread for a connected slave. The State information indicates that all outstanding updates have been sent to the slave and that the master is waiting for more updates to occur. If you see no Binlog Dump threads on a master server, replication is not running; that is, no slaves are currently connected.

On the slave server, the output from SHOW PROCESSLIST looks like this:

mysql> SHOW PROCESSLIST\G
*************************** 1. row ***************************
     Id: 10
   User: system user
   Host:
     db: NULL
Command: Connect
   Time: 11
  State: Waiting for master to send event
   Info: NULL
*************************** 2. row ***************************
     Id: 11
   User: system user
   Host:
     db: NULL
Command: Connect
   Time: 11
  State: Has read all relay log; waiting for the slave I/O
         thread to update it
   Info: NULL

This information indicates that thread 10 is the I/O thread that is communicating with the master server, and thread 11 is the SQL thread that is processing the updates stored in the relay logs. At the time that the SHOW PROCESSLIST was run, both threads were idle, waiting for further updates.

The value in the Time column can indicate how far the slave lags behind the master. See Section 15.3.4, “Replication FAQ”.

15.4.2. Replication Relay and Status Files

By default, relay log filenames have the form host_name-relay-bin.nnnnnn, where host_name is the name of the slave server host and nnnnnn is a sequence number. Successive relay log files are created using successive sequence numbers, beginning with 000001. The slave uses an index file to track the relay log files currently in use. The default relay log index filename is host_name-relay-bin.index. By default, the slave server creates relay log files in its data directory. The default filenames can be overridden with the --relay-log and --relay-log-index server options. See Section 15.1.2, “Replication Startup Options and Variables”.
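
As a small illustration of the default naming scheme, the following Python sketch builds the filenames described above. The function names are hypothetical, and the host name in the usage note is a made-up example.

```python
def relay_log_name(host_name, sequence):
    """Default relay log filename: host_name-relay-bin.nnnnnn."""
    # The sequence number is zero-padded to six digits.
    return f"{host_name}-relay-bin.{sequence:06d}"


def relay_log_index_name(host_name):
    """Default relay log index filename."""
    return f"{host_name}-relay-bin.index"
```

For a slave host named slave1, relay_log_name("slave1", 1) gives slave1-relay-bin.000001.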

Relay logs have the same format as binary logs and can be read using mysqlbinlog. The SQL thread automatically deletes each relay log file as soon as it has executed all events in the file and no longer needs it. There is no explicit mechanism for deleting relay logs because the SQL thread takes care of doing so. However, FLUSH LOGS rotates relay logs, which influences when the SQL thread deletes them.

A slave server creates a new relay log file under the following conditions:

  • Each time the I/O thread starts.

  • When the logs are flushed; for example, with FLUSH LOGS or mysqladmin flush-logs.

  • When the size of the current relay log file becomes too large. The meaning of “too large” is determined as follows:

    • If the value of max_relay_log_size is greater than 0, that is the maximum relay log file size.

    • If the value of max_relay_log_size is 0, max_binlog_size determines the maximum relay log file size.
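
The size rule above reduces to a one-line decision, sketched here in Python. The function name is hypothetical; the two parameters correspond to the system variables of the same names, with sizes in bytes.

```python
def effective_max_relay_log_size(max_relay_log_size, max_binlog_size):
    """Return the size limit that applies to relay log files."""
    # A positive max_relay_log_size takes precedence; a value of 0
    # means the limit is taken from max_binlog_size instead.
    if max_relay_log_size > 0:
        return max_relay_log_size
    return max_binlog_size
```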

A slave replication server creates two additional small files in the data directory. These status files are named master.info and relay-log.info by default. Their names can be changed by using the --master-info-file and --relay-log-info-file options. See Section 15.1.2, “Replication Startup Options and Variables”.

The two status files contain information like that shown in the output of the SHOW SLAVE STATUS statement, which is discussed in Section 12.6.2, “SQL Statements for Controlling Slave Servers”. Because the status files are stored on disk, they survive a slave server's shutdown. The next time the slave starts up, it reads the two files to determine how far it has proceeded in reading binary logs from the master and in processing its own relay logs.

The I/O thread updates the master.info file. The following table shows the correspondence between the lines in the file and the columns displayed by SHOW SLAVE STATUS.

Line  Description
1     Number of lines in the file
2     Master_Log_File
3     Read_Master_Log_Pos
4     Master_Host
5     Master_User
6     Password (not shown by SHOW SLAVE STATUS)
7     Master_Port
8     Connect_Retry
9     Master_SSL_Allowed
10    Master_SSL_CA_File
11    Master_SSL_CA_Path
12    Master_SSL_Cert
13    Master_SSL_Cipher
14    Master_SSL_Key

The SQL thread updates the relay-log.info file. The following table shows the correspondence between the lines in the file and the columns displayed by SHOW SLAVE STATUS.

Line  Description
1     Relay_Log_File
2     Relay_Log_Pos
3     Relay_Master_Log_File
4     Exec_Master_Log_Pos

The contents of the relay-log.info file and the values shown by the SHOW SLAVE STATUS statement may not match if the relay-log.info file has not been flushed to disk. Ideally, you should view relay-log.info only on a slave that is offline (that is, mysqld is not running). For a running system, use SHOW SLAVE STATUS.

When you back up the slave's data, you should back up these two status files as well, along with the relay log files. They are needed to resume replication after you restore the slave's data. If you lose the relay logs but still have the relay-log.info file, you can check it to determine how far the SQL thread has executed in the master binary logs. Then you can use CHANGE MASTER TO with the MASTER_LOG_FILE and MASTER_LOG_POS options to tell the slave to re-read the binary logs from that point. Of course, this requires that the binary logs still exist on the master server.
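
The recovery step described above can be sketched in Python as follows. The sketch assumes the four-line relay-log.info layout listed in this section and uses lines 3 and 4 (Relay_Master_Log_File and Exec_Master_Log_Pos) as the restart coordinates. The function name is hypothetical, and the generated statement should be reviewed before it is run on a slave.

```python
def change_master_from_relay_info(text):
    """Build a CHANGE MASTER TO statement from relay-log.info contents."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("expected at least 4 lines in relay-log.info")
    master_log_file = lines[2].strip()   # line 3: Relay_Master_Log_File
    master_log_pos = int(lines[3])       # line 4: Exec_Master_Log_Pos
    # Double any single quotes so the filename is a valid SQL string.
    quoted = master_log_file.replace("'", "''")
    return (
        f"CHANGE MASTER TO MASTER_LOG_FILE='{quoted}', "
        f"MASTER_LOG_POS={master_log_pos};"
    )
```

The resulting statement tells the slave to re-read the master's binary logs from the position that the SQL thread had reached.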

If your slave is subject to replicating LOAD DATA INFILE statements, you should also back up any SQL_LOAD-* files that exist in the directory that the slave uses for this purpose. The slave needs these files to resume replication of any interrupted LOAD DATA INFILE operations. The directory location is specified using the --slave-load-tmpdir option. If this option is not specified, the directory location is the value of the tmpdir system variable.

15.4.3. How Servers Evaluate Replication Rules

If a master server does not write a statement to its binary log, the statement is not replicated. If the server does log the statement, the statement is sent to all slaves and each slave determines whether to execute it or ignore it.

On the master side, decisions about which statements to log are based on the --binlog-do-db and --binlog-ignore-db options that control binary logging. For a description of the rules that servers use in evaluating these options, see Section 5.2.3, “The Binary Log”.

On the slave side, decisions about whether to execute or ignore statements received from the master are made according to the --replicate-* options that the slave was started with. (See Section 15.1.2, “Replication Startup Options and Variables”.) The slave evaluates these options using the following procedure, which first checks the database-level options and then the table-level options.

In the simplest case, when there are no --replicate-* options, the procedure yields the result that the slave executes all statements that it receives from the master. Otherwise, the result depends on the particular options given. In general, to make it easier to determine what effect an option set will have, it is recommended that you avoid mixing “do” and “ignore” options, or wildcard and non-wildcard options.

Stage 1. Check the database options.

At this stage, the slave checks whether there are any --replicate-do-db or --replicate-ignore-db options that specify database-specific conditions:

  • No: Permit the statement and proceed to the table-checking stage.

  • Yes: Test the options using the same rules as for the --binlog-do-db and --binlog-ignore-db options to determine whether to permit or ignore the statement. What is the result of the test?

    • Permit: Do not execute the statement immediately. Defer the decision and proceed to the table-checking stage.

    • Ignore: Ignore the statement and exit.

This stage can permit a statement for further option-checking, or cause it to be ignored. However, statements that are permitted at this stage are not actually executed yet. Instead, they pass to the following stage that checks the table options.
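
The database-checking stage can be sketched in Python as shown here. The matching rules themselves are not spelled out in this section (they are described in Section 5.2.3), so the logic below rests on an assumption: “do” options are tested first against the default database, and “ignore” options are consulted only when no “do” options exist. The function name is hypothetical, and the case in which no default database is selected is deliberately not covered.

```python
def database_stage(default_db, do_dbs=(), ignore_dbs=()):
    """Return 'permit' or 'ignore' for the database-checking stage.

    default_db is the statement's default database (assumed to be set).
    do_dbs and ignore_dbs hold the --replicate-do-db and
    --replicate-ignore-db values, respectively.
    """
    if not do_dbs and not ignore_dbs:
        # No database options: permit and go on to the table stage.
        return "permit"
    if do_dbs:
        # Assumption: with "do" options, only a match is permitted.
        return "permit" if default_db in do_dbs else "ignore"
    # Only "ignore" options remain: a match is ignored.
    return "ignore" if default_db in ignore_dbs else "permit"
```

A result of permit means only that the statement moves on to the table-checking stage, not that it is executed.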

Stage 2. Check the table options.

First, as a preliminary condition, the slave checks whether the statement occurs within a stored function or (prior to MySQL 5.0.12) a stored procedure. If so, execute the statement and exit. (Stored procedures are exempt from this test as of MySQL 5.0.12 because procedure logging occurs at the level of statements that are executed within the routine rather than at the CALL level.)

Next, the slave checks for table options and evaluates them. If the server reaches this point, it executes all statements if there are no table options. If there are “do” table options, the statement must match one of them if it is to be executed; otherwise, it is ignored. If there are any “ignore” options, all statements are executed except those that match any ignore option. The following steps describe how this evaluation occurs in more detail.

  1. Are there any --replicate-*-table options?

    • No: There are no table restrictions, so all statements match. Execute the statement and exit.

    • Yes: There are table restrictions. Evaluate the tables to be updated against them. There might be multiple tables to update, so loop through the following steps for each table looking for a matching option (first the non-wild options, and then the wild options). Only tables that are to be updated are compared to the options. For example, if the statement is INSERT INTO sales SELECT * FROM prices, only sales is compared to the options. If several tables are to be updated (multiple-table statement), the first table that matches “do” or “ignore” wins. That is, the server checks the first table against the options. If no decision can be made, it checks the second table against the options, and so on.

  2. Are there any --replicate-do-table options?

    • No: Proceed to the next step.

    • Yes: Does the table match any of them?

      • No: Proceed to the next step.

      • Yes: Execute the statement and exit.

  3. Are there any --replicate-ignore-table options?

    • No: Proceed to the next step.

    • Yes: Does the table match any of them?

      • No: Proceed to the next step.

      • Yes: Ignore the statement and exit.

  4. Are there any --replicate-wild-do-table options?

    • No: Proceed to the next step.

    • Yes: Does the table match any of them?

      • No: Proceed to the next step.

      • Yes: Execute the statement and exit.

  5. Are there any --replicate-wild-ignore-table options?

    • No: Proceed to the next step.

    • Yes: Does the table match any of them?

      • No: Proceed to the next step.

      • Yes: Ignore the statement and exit.

  6. No --replicate-*-table option was matched. Is there another table to test against these options?

    • No: We have now tested all tables to be updated and could not match any option. Are there --replicate-do-table or --replicate-wild-do-table options?

      • No: There were no “do” table options, so no explicit “do” match is required. Execute the statement and exit.

      • Yes: There were “do” table options, so the statement is executed only with an explicit match to one of them. Ignore the statement and exit.

    • Yes: Loop.
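
Steps 1 through 6 can be sketched in Python as shown here. This is an illustration under stated assumptions, not the server's code: tables are given as db_name.tbl_name strings, wildcard patterns treat % and _ as in LIKE (escaped wildcards and case-sensitivity rules are not handled), the preliminary stored-routine check is omitted, and all function and parameter names are hypothetical.

```python
import re


def _wild_match(pattern, name):
    """Match name against a pattern in which % and _ are wildcards."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, name) is not None


def table_stage(updated_tables, do=(), ignore=(), wild_do=(), wild_ignore=()):
    """Return 'execute' or 'ignore' for the table-checking stage."""
    # Step 1: with no table options at all, every statement is executed.
    if not (do or ignore or wild_do or wild_ignore):
        return "execute"
    # Loop over the tables to be updated; the first match wins.
    for table in updated_tables:
        if table in do:                                      # step 2
            return "execute"
        if table in ignore:                                  # step 3
            return "ignore"
        if any(_wild_match(p, table) for p in wild_do):      # step 4
            return "execute"
        if any(_wild_match(p, table) for p in wild_ignore):  # step 5
            return "ignore"
    # Step 6: no option matched any table. If "do" options exist,
    # an explicit match was required, so the statement is ignored.
    return "ignore" if (do or wild_do) else "execute"
```

For example, table_stage(["db1.tmp_a"], wild_ignore=["db1.tmp%"]) yields ignore, whereas the same call with only ignore=["db1.prices"] yields execute because nothing matches and no “do” options are present.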

Examples:

  • No --replicate-* options at all

    The slave executes all statements that it receives from the master.

  • --replicate-*-db options, but no table options

    The slave permits or ignores statements using the database options. Then it executes all statements permitted by those options because there are no table restrictions.

  • --replicate-*-table options, but no database options

    All statements are permitted at the database-checking stage because there are no database conditions. The slave executes or ignores statements based on the table options.

  • A mix of database and table options

    The slave permits or ignores statements using the database options. Then it evaluates all statements permitted by those options according to the table options. In some cases, this process can yield what might seem a counterintuitive result. Consider the following set of options:

    [mysqld]
    replicate-do-db    = db1
    replicate-do-table = db2.mytbl2
    

    Suppose that db1 is the default database and the slave receives this statement:

    INSERT INTO mytbl1 VALUES(1,2,3);
    

    The database is db1, which matches the --replicate-do-db option at the database-checking stage. The algorithm then proceeds to the table-checking stage. If there were no table options, the statement would be executed. However, because the options include a “do” table option, the statement must match if it is to be executed. The statement does not match, so it is ignored. (The same would happen for any table in db1.)