Synchronization of slaves in MySQL replication

When I started to use replication (on few tables) I encountered some problems with discrepancy on slave – it was mainly due to filters on slaves. I had to change some queries and everything was almost fine. Soon, I expanded my topology and started to replicate all databases from master. Simultenously, I read in the book
“High Performance MySQL” that after setting up replication the tasks related with monitoring and administrating are common occupation. It is true.

I decided to use free solution for synchronization but now I would prefere to write my own solution for this task. I try to describe my idea:
– technicaly, written in c# (due to excperience in this technology)  working as multithreaded windows service.
– service gets slaves from command “SHOW SLAVE HOSTS”;
– service gets area of replication from slaves (tables/databases);
– service analyses tables and choose tables which can be checked continuously. Such option is available for each table having timestamp collumn with MySQL’s trigger “on update CURRENT_TIMESTAMP” and default value on update CURRENT_TIMESTAMP.
– service analyses rest of tables and qualify them to daily, weekly synchronization, also sends report to DBA which tables cannot by checked / synchronize.
– service analyses count of rows for tables qualified for continuous synchronization.
– service finds the difference and synchronizes each slave separately (not through replication but directly on concrete slave).

But let’s go back to the subject. I divided my process into two threads.
First thread is continuous, second daily sychronization. Obviously, it depends on your needs which tables are to be monitored non-stop. Each thread constists of three phases:

Phase 1: Tests

    /usr/bin/perl mk-table-checksum --set-vars="@REPLICATION_SYS:=1" \
    --replicate=replman.checksum_continuous --arg-table=replman.checksum_continuous_args \
    --quiet localhost -S /tmp/mysql.sock  \
    -u SomeUser -p SomePassword

    As you see I used app mk-table-checksum for this phase. This app executes SQL commands in order to reveal possible inconsistency on slaves (it runs command on master so it is replicated to slaves, then slaves’ this_* columns are updated).
    Options:

    • –arg-table: indicates the table which contains tables and conditions which are to be used.
    • –replicate: table used as result set of SQL commands;
    • –set-vars: this is only to set SQL variable – but is very important. My problem was that when I put into –arg-table where condition e.g.:
      WHERE `update_timestamp` > DATE_FORMAT( current_timestamp - interval 5 MINUTE, '%Y-%m-%d %H:%i:00');

      the condition is not copied into –replicate table and when some unconformity is revealed the app from phase 3 tries to synchronize whole table. That is why I put the trigger which fulfills “where” column which is later used in phase 3.

    Phase 2: Check
    The only difference from phase 1 is option –replicate-check=1. Thanks to that I receive information which parts of tables are different (count or/and CRC). I save the result of this file and send it to my email (If there is no difference the result is empty). You can ommit this phase – it is only information. I would like to add that I check if there is no lag on slave. In case the lag exists I wait till there is no lag – this is my example:

    ../bin/replication.sh "Slave01.domain" "3306";
    if [ "$INT_RES" -eq "1" ]; then
        echo "Slave does not work [$INT_RES]";
    else
    if [ "$INT_RES" -gt "1" ]; then
        echo "Slave has lags [$INT_RES] [sleeping]";
        while [ "$INT_RES" -gt "1" ]; do
            sleep $INT_RES;
            ../bin/replication.sh "Slave01.domain" "3306";
            INT_RES=$?;
        done;
        echo "["`date`"] No more lags";
    fi
    

    I used my script which is responsible for checking of replication status:

    #!/bin/bash
    
    HOST="$1";
    PORT="$2";
    
    if [ -z $HOST ]; then
        /bin/echo "[Error] no host given (first argument)";
        exit 9;
    fi
    if [ -z $PORT ]; then
        PORT='3306';
    fi
    
    PRINT_RESULT=0;
    RESULT="";
    if [ "$3" = "print" ]; then
        PRINT_RESULT=1;
    fi
    SCRIPT_NAME=`basename ${0%.*}`;
    
    TMP_ERROR_DIR="/tmp/";
    DIR_REPORT="/var/log/overseer/";
    ERR_FILE="$DIR_REPORT""Mysql_"$HOST".Replication";
    LAG_ERR_FILE="$DIR_REPORT""Mysql_"$HOST".ReplicationLag";
    
    MYSQL_HOMEDIR="/usr/local/services/mysql/bin/";
    MYSQL_CMD="mysql";
    
    MY_PASSWD="YourPassword";
    MY_USER="YourUser";
    
    SLAVE_INFO=$($MYSQL_HOMEDIR$MYSQL_CMD -u$MY_USER --password=$MY_PASSWD -h $HOST --port=$PORT -e "show slave status\G" 2>$TMP_ERROR_DIR$SCRIPT_NAME$HOST);
    SLAVE_ERR=$(/bin/cat < $TMP_ERROR_DIR$SCRIPT_NAME$HOST);
    
    if [ -n "$SLAVE_ERR" ]; then
            /bin/echo $SLAVE_ERR  > $ERR_FILE;
            exit 1;
    fi
    
    SLAVE_IO=$(/bin/echo $SLAVE_INFO  | /bin/grep -o -i -e '\(Slave_IO_Running:[[:space:]]\)\([A-Za-z]\{3\}\)'  | /bin/awk -F " "  '{print $2 }');
    SLAVE_SQL=$(/bin/echo $SLAVE_INFO | /bin/grep -o -i -e '\(Slave_SQL_Running:[[:space:]]\)\([A-Za-z]\{3\}\)' | /bin/awk -F " "  '{print $2 }');
    
    SLAVE_IO2=$(/bin/echo $SLAVE_INFO  | /bin/grep -o -i -e "\(Slave_IO_Running:[[:space:]]\)");
    
    if  [ "$SLAVE_IO" = "Yes" ] && [ "$SLAVE_SQL" = "Yes" ]; then
            SLAVE_LAG=$(/bin/echo $SLAVE_INFO  | /bin/grep -o -i -e '\(Seconds_Behind_Master:[[:space:]]\)\([0-9]\{0,10\}\)' | /bin/awk -F " "  '{print $2 }');
            SLAVE_LAG=$((0 +  $SLAVE_LAG));
            if [ -f "$ERR_FILE" ]; then
                /bin/rm "$ERR_FILE";
            fi
            if [ "$SLAVE_LAG" -gt "0" ]; then
                /bin/echo $SLAVE_LAG  > $LAG_ERR_FILE;
                exit $SLAVE_LAG;
            fi
            RESULT="Slave is working correctly.";
            # echo "Status:  1|MySQL Replication SQL [OK]" > $ERR_FILE;
        elif [ "$SLAVE_IO" = "Yes" ] && [ "$SLAVE_SQL" = "No" ]; then
            RESULT="SQL thread does not work.";
            /bin/echo "$RESULT" > $ERR_FILE;
            exit 1;
        elif [ "$SLAVE_IO" = "No" ] && [ "$SLAVE_SQL" = "Yes" ]; then
            RESULT="IO thread does not work.";
            /bin/echo "$RESULT" > $ERR_FILE;
            exit 1;
        else
            RESULT="Neither SQL nor IO thread does work.";
            /bin/echo "$RESULT" > $ERR_FILE;
            exit 1;
    fi;
    
    exit 0;
    
    

    Phase 3: Synchronization of data.
    I use app mk-table-sync with option –sync-to-master. My example is as folows:

    /usr/bin/perl ./mk-table-sync --print --pid /tmp/sync.cont.pid 
    --charset utf8 --nobuffer-to-client --noindex-hint \
    --nocheck-privileges --replicate replman.checksum_continuous \
    --set-vars="@REPLMAN_CONTINUOUS:=1" \
    --sync-to-master Slave01.host:3307 -u YourReplmanUser -p "YourReplmanPass" \
    > ./SomeFile.tmp;
    

    Please, take attention to 2 options which are the basis for operation (and not standard):
    a) –print: it means I only print SQL command and direct’em to file ./SomeFile.tmp;
    Firstly I describe the problem that I had to solve, i.e. in some circumstances standard way of using this script caused unwanted DELETE operations (in my continuous replication) – it was due to the fact that app uses replication to propagate changes.
    My solution was to create trigger BEFORE DELETE on MASTER which checks whether variable @REPLMAN_CONTINUOS is set to 1. If it is true, trigger tries to update non existing update – it sounds like madness? Not exactly, thanks to this in case a record exists it will not delete it. In case there is no such record on master the DELETE command will replicate to SLAVE.
    Unfortunately, the synchronization of data is executed through REPLACE command and problem is that REPLACE is for triggers nothing more then sequence of INSERT and DELETE commands – considering the fact thath BEFORE DELETE is trigger which do not allow for such operation – so I had to divide my data into REPLACE AND DELETE commands. All DELETEs are preceded @REPLMAN_CONTINUOS variable set to 1.

    Now it works fine.

  1. No comments yet.

  1. No trackbacks yet.