{"id":223,"date":"2010-11-19T15:14:21","date_gmt":"2010-11-19T15:14:21","guid":{"rendered":"http:\/\/10sa.com\/sql_stories\/?p=223"},"modified":"2011-02-19T13:38:50","modified_gmt":"2011-02-19T13:38:50","slug":"synchronization-of-slaves-in-mysql-replication","status":"publish","type":"post","link":"http:\/\/10sa.com\/sql_stories\/?p=223","title":{"rendered":"Synchronization of slaves in MySQL replication"},"content":{"rendered":"<p>When I started to use replication (on\u00a0few tables)\u00a0I encountered some problems with discrepancy on slave &#8211; it was mainly due to filters on slaves. I had to change some queries and everything was almost fine. Soon, I expanded my topology and started to replicate all databases from master. Simultenously, I read in the book<br \/>\n&#8220;High Performance MySQL&#8221; that after setting\u00a0up replication the tasks related with monitoring and administrating are common occupation.\u00a0It\u00a0is\u00a0true.<\/p>\n<p>I decided to use\u00a0free solution for synchronization but now\u00a0I\u00a0would prefere to write my own\u00a0solution for\u00a0this task.\u00a0I\u00a0try to describe my idea:<br \/>\n&#8211; technicaly, written\u00a0in c# (due to excperience in this technology)\u00a0 working as multithreaded windows service.<br \/>\n&#8211; service gets slaves from command &#8220;SHOW SLAVE HOSTS&#8221;;<br \/>\n&#8211; service gets area of replication from slaves (tables\/databases);<br \/>\n&#8211; service analyses tables and choose tables which can be checked continuously. Such option is available for each table having <bdo dir=\"ltr\" xml:lang=\"en\">timestamp<\/bdo>\u00a0collumn with MySQL&#8217;s trigger &#8220;on update CURRENT_TIMESTAMP&#8221; and default value on update CURRENT_TIMESTAMP.<br \/>\n&#8211; service analyses rest of tables and qualify them to daily, weekly synchronization, also sends report to DBA which tables cannot by checked \/ synchronize.<br \/>\n&#8211; service analyses count of rows for tables qualified for continuous synchronization.<br \/>\n&#8211; service finds the difference and synchronizes each slave separately (not through replication but directly on concrete slave).<\/p>\n<p>But let&#8217;s go back to the subject. I divided my process into two threads.<br \/>\nFirst thread is continuous, second daily sychronization. Obviously, it depends on your needs which tables are to be monitored non-stop. Each thread constists of three phases:<\/p>\n<p>Phase 1: Tests<\/p>\n<ol>\n<pre lang=\"bash\">\/usr\/bin\/perl mk-table-checksum --set-vars=\"@REPLICATION_SYS:=1\" \\\r\n--replicate=replman.checksum_continuous --arg-table=replman.checksum_continuous_args \\\r\n--quiet localhost -S \/tmp\/mysql.sock  \\\r\n-u SomeUser -p SomePassword<\/pre>\n<p>As you see I used app mk-table-checksum for this phase. This app executes SQL commands in order to reveal possible inconsistency on slaves (it runs command on master so it is replicated to slaves, then slaves&#8217; this_* columns are updated).<br \/>\nOptions:<\/p>\n<ul>\n<li>\n&#8211;arg-table: indicates the table which contains tables and conditions which are to be used.<\/li>\n<li>&#8211;replicate: table used as result set of SQL commands;<\/li>\n<li>&#8211;set-vars: this is only to set SQL variable &#8211; but is very important. My problem was that when I put into &#8211;arg-table where condition e.g.:\n<pre lang=\"SQL\">WHERE `update_timestamp` > DATE_FORMAT( current_timestamp - interval 5 MINUTE, '%Y-%m-%d %H:%i:00');<\/pre>\n<p>the condition is not copied into &#8211;replicate table and when some unconformity is revealed the app from phase 3 tries to synchronize whole table. That is why I put the trigger which fulfills &#8220;where&#8221; column which is later used in phase 3.\n<\/li>\n<\/ul>\n<p>Phase 2: Check<br \/>\nThe only difference from phase 1 is option &#8211;replicate-check=1. Thanks to that I receive information which parts of tables are different (count or\/and CRC). I save the result of this file and send it to my email (If there is no difference the result is empty). You can ommit this phase &#8211; it is only information. I would like to add that I check if there is no lag on slave. In case the lag exists I wait till there is no lag &#8211; this is my example:<\/p>\n<pre lang=\"bash\">\r\n..\/bin\/replication.sh \"Slave01.domain\" \"3306\";\r\nif [ \"$INT_RES\" -eq \"1\" ]; then\r\n    echo \"Slave does not work [$INT_RES]\";\r\nelse\r\nif [ \"$INT_RES\" -gt \"1\" ]; then\r\n    echo \"Slave has lags [$INT_RES] [sleeping]\";\r\n    while [ \"$INT_RES\" -gt \"1\" ]; do\r\n        sleep $INT_RES;\r\n        ..\/bin\/replication.sh \"Slave01.domain\" \"3306\";\r\n        INT_RES=$?;\r\n    done;\r\n    echo \"[\"`date`\"] No more lags\";\r\nfi\r\n<\/pre>\n<p>I used my script which is responsible for checking of replication status:<\/p>\n<pre lang=\"bash\">\r\n#!\/bin\/bash\r\n\r\nHOST=\"$1\";\r\nPORT=\"$2\";\r\n\r\nif [ -z $HOST ]; then\r\n    \/bin\/echo \"[Error] no host given (first argument)\";\r\n    exit 9;\r\nfi\r\nif [ -z $PORT ]; then\r\n    PORT='3306';\r\nfi\r\n\r\nPRINT_RESULT=0;\r\nRESULT=\"\";\r\nif [ \"$3\" = \"print\" ]; then\r\n    PRINT_RESULT=1;\r\nfi\r\nSCRIPT_NAME=`basename ${0%.*}`;\r\n\r\nTMP_ERROR_DIR=\"\/tmp\/\";\r\nDIR_REPORT=\"\/var\/log\/overseer\/\";\r\nERR_FILE=\"$DIR_REPORT\"\"Mysql_\"$HOST\".Replication\";\r\nLAG_ERR_FILE=\"$DIR_REPORT\"\"Mysql_\"$HOST\".ReplicationLag\";\r\n\r\nMYSQL_HOMEDIR=\"\/usr\/local\/services\/mysql\/bin\/\";\r\nMYSQL_CMD=\"mysql\";\r\n\r\nMY_PASSWD=\"YourPassword\";\r\nMY_USER=\"YourUser\";\r\n\r\nSLAVE_INFO=$($MYSQL_HOMEDIR$MYSQL_CMD -u$MY_USER --password=$MY_PASSWD -h $HOST --port=$PORT -e \"show slave status\\G\" 2>$TMP_ERROR_DIR$SCRIPT_NAME$HOST);\r\nSLAVE_ERR=$(\/bin\/cat < $TMP_ERROR_DIR$SCRIPT_NAME$HOST);\r\n\r\nif [ -n \"$SLAVE_ERR\" ]; then\r\n        \/bin\/echo $SLAVE_ERR  > $ERR_FILE;\r\n        exit 1;\r\nfi\r\n\r\nSLAVE_IO=$(\/bin\/echo $SLAVE_INFO  | \/bin\/grep -o -i -e '\\(Slave_IO_Running:[[:space:]]\\)\\([A-Za-z]\\{3\\}\\)'  | \/bin\/awk -F \" \"  '{print $2 }');\r\nSLAVE_SQL=$(\/bin\/echo $SLAVE_INFO | \/bin\/grep -o -i -e '\\(Slave_SQL_Running:[[:space:]]\\)\\([A-Za-z]\\{3\\}\\)' | \/bin\/awk -F \" \"  '{print $2 }');\r\n\r\nSLAVE_IO2=$(\/bin\/echo $SLAVE_INFO  | \/bin\/grep -o -i -e \"\\(Slave_IO_Running:[[:space:]]\\)\");\r\n\r\nif  [ \"$SLAVE_IO\" = \"Yes\" ] && [ \"$SLAVE_SQL\" = \"Yes\" ]; then\r\n        SLAVE_LAG=$(\/bin\/echo $SLAVE_INFO  | \/bin\/grep -o -i -e '\\(Seconds_Behind_Master:[[:space:]]\\)\\([0-9]\\{0,10\\}\\)' | \/bin\/awk -F \" \"  '{print $2 }');\r\n        SLAVE_LAG=$((0 +  $SLAVE_LAG));\r\n        if [ -f \"$ERR_FILE\" ]; then\r\n            \/bin\/rm \"$ERR_FILE\";\r\n        fi\r\n        if [ \"$SLAVE_LAG\" -gt \"0\" ]; then\r\n            \/bin\/echo $SLAVE_LAG  > $LAG_ERR_FILE;\r\n            exit $SLAVE_LAG;\r\n        fi\r\n        RESULT=\"Slave is working correctly.\";\r\n        # echo \"Status:  1|MySQL Replication SQL [OK]\" > $ERR_FILE;\r\n    elif [ \"$SLAVE_IO\" = \"Yes\" ] && [ \"$SLAVE_SQL\" = \"No\" ]; then\r\n        RESULT=\"SQL thread does not work.\";\r\n        \/bin\/echo \"$RESULT\" > $ERR_FILE;\r\n        exit 1;\r\n    elif [ \"$SLAVE_IO\" = \"No\" ] && [ \"$SLAVE_SQL\" = \"Yes\" ]; then\r\n        RESULT=\"IO thread does not work.\";\r\n        \/bin\/echo \"$RESULT\" > $ERR_FILE;\r\n        exit 1;\r\n    else\r\n        RESULT=\"Neither SQL nor IO thread does work.\";\r\n        \/bin\/echo \"$RESULT\" > $ERR_FILE;\r\n        exit 1;\r\nfi;\r\n\r\nexit 0;\r\n\r\n<\/pre>\n<p>Phase 3: Synchronization of data.<br \/>\nI use app mk-table-sync with option &#8211;sync-to-master. My example is as folows:<\/p>\n<pre lang=\"bash\">\r\n\/usr\/bin\/perl .\/mk-table-sync --print --pid \/tmp\/sync.cont.pid \r\n--charset utf8 --nobuffer-to-client --noindex-hint \\\r\n--nocheck-privileges --replicate replman.checksum_continuous \\\r\n--set-vars=\"@REPLMAN_CONTINUOUS:=1\" \\\r\n--sync-to-master Slave01.host:3307 -u YourReplmanUser -p \"YourReplmanPass\" \\\r\n> .\/SomeFile.tmp;\r\n<\/pre>\n<p>Please, take attention to 2 options which are the basis for operation (and not standard):<br \/>\na) &#8211;print: it means I only print SQL command and direct&#8217;em to file .\/SomeFile.tmp;<br \/>\nFirstly I describe the problem that I had to solve, i.e. in some circumstances standard way of using this script caused unwanted DELETE operations (in my continuous replication) &#8211; it was due to the fact that app uses replication to propagate changes.<br \/>\nMy solution was to create trigger BEFORE DELETE on MASTER which checks whether variable @REPLMAN_CONTINUOS is set to 1. If it is true, trigger tries to update non existing update &#8211; it sounds like madness? Not exactly, thanks to this in case a record exists it will not delete it. In case there is no such record on master the DELETE command will replicate to SLAVE.<br \/>\nUnfortunately, the synchronization of data is executed through REPLACE command and problem is that REPLACE is for triggers nothing more then sequence of INSERT and DELETE commands &#8211; considering the fact thath BEFORE DELETE is trigger which do not allow for such operation &#8211; so I had to divide my data into REPLACE AND DELETE commands. All DELETEs are preceded @REPLMAN_CONTINUOS variable set to 1.<\/p>\n<p>Now it works fine.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When I started to use replication (on\u00a0few tables)\u00a0I encountered some problems with discrepancy on slave &#8211; it was mainly due to filters on slaves. I had to change some queries and everything was almost fine. Soon, I expanded my topology and started to replicate all databases from master. Simultenously, I read in the book &#8220;High [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[7,3],"tags":[13],"_links":{"self":[{"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=\/wp\/v2\/posts\/223"}],"collection":[{"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=223"}],"version-history":[{"count":32,"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=\/wp\/v2\/posts\/223\/revisions"}],"predecessor-version":[{"id":351,"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=\/wp\/v2\/posts\/223\/revisions\/351"}],"wp:attachment":[{"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=223"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=223"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=223"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}