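I. MySQL Group Replication inherently has to deal with two problems:

  1. How to recover when the primary node goes down.

  2. How the remaining nodes can keep serving traffic when a majority of the nodes are offline.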

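  Here we only discuss the first problem: after the primary node crashes, how do we add it back into the high-availability cluster? This problem can be further divided into two cases: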

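    1. Mild damage: the primary's data is still there, and the binlogs written by the other nodes while it was down are all still available.

          In this case, restarting MySQL Group Replication is enough to fix the problem.

    2. Destructive damage: all of the primary's data is gone.

          In this case, restore the failed node from a backup taken on one of the remaining nodes, then restart MySQL Group Replication.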

  The detailed recovery steps are shown in the examples below.
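Whether a crashed node can simply rejoin (case 1) or has to be rebuilt from a backup (case 2) comes down to one question: are the transactions it is missing still present in a donor's binary logs? In GTID terms, every GTID the donor has already purged must already be in the rejoining node's gtid_executed. The Python sketch below illustrates that check under stated assumptions: you have copied @@GLOBAL.gtid_executed from the rejoining node and @@GLOBAL.gtid_purged from a donor as strings, the function names are hypothetical, and Group Replication's distributed recovery makes its own decision, so this is only a planning aid.

```python
from typing import Dict, List, Tuple

# A GTID set maps each source UUID to a sorted list of closed intervals (start, end).
GtidSet = Dict[str, List[Tuple[int, int]]]


def parse_gtid_set(text: str) -> GtidSet:
    """Parse a MySQL 5.7 GTID set such as 'uuid:1-5:7,uuid2:1-3'.

    Whitespace and newlines (as printed by the mysql client) are ignored.
    """
    result: GtidSet = {}
    for chunk in "".join(text.split()).split(","):
        if not chunk:
            continue
        uuid, *ranges = chunk.split(":")
        intervals = result.setdefault(uuid.lower(), [])
        for r in ranges:
            start, _, end = r.partition("-")
            intervals.append((int(start), int(end or start)))
    for intervals in result.values():
        intervals.sort()
    return result


def _covered(interval: Tuple[int, int], intervals: List[Tuple[int, int]]) -> bool:
    """True if every number in `interval` lies in the union of the sorted `intervals`."""
    start, end = interval
    for lo, hi in intervals:
        if lo > start:
            return False  # gap at `start` that no later interval can fill
        if hi >= start:
            start = hi + 1
            if start > end:
                return True
    return False


def is_subset(a: GtidSet, b: GtidSet) -> bool:
    """True if every GTID in `a` is also contained in `b`."""
    return all(_covered(iv, b.get(uuid, [])) for uuid, ivs in a.items() for iv in ivs)


def can_rejoin_from_binlogs(joiner_executed: str, donor_purged: str) -> bool:
    """Case 1 works only if the donor has purged nothing the joiner still lacks."""
    return is_subset(parse_gtid_set(donor_purged), parse_gtid_set(joiner_executed))


# Hypothetical group UUID; real values come from SELECT @@GLOBAL.gtid_executed / gtid_purged.
G = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
print(can_rejoin_from_binlogs(f"{G}:1-100", f"{G}:1-80"))  # True: donor still has 81 onward
print(can_rejoin_from_binlogs("", f"{G}:1-80"))            # False: empty datadir, restore a backup
```

An empty data directory has an empty gtid_executed, so unless the donors have kept every binlog since the group was created, the check fails and you are in case 2.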

II. Environment

  Hosts and roles:

Hostname     IP address       MGR role
mtls17       10.186.19.17     primary
mtls18       10.186.19.18     secondary
mtls19       10.186.19.19     secondary

  Cluster status:

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)

  Notes:

  The output above shows that the MySQL instance on mtls17 is the current primary of the group, and that every member is ONLINE.
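group_replication_primary_member only reports a UUID, so you have to match it against MEMBER_ID in replication_group_members to see which host is the primary. The hypothetical helpers below do that matching on text pasted from the mysql client, assuming the default table output format shown above (border lines shortened in the sample); the same lookup could also be done with a single SQL query instead.

```python
from typing import Dict, List, Optional


def _cells(line: str) -> List[str]:
    """Split one '| a | b |' row of mysql client output into stripped cells."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_mysql_table(text: str) -> List[Dict[str, str]]:
    """Turn an ASCII table printed by the mysql client into a list of row dicts."""
    rows = [line for line in text.splitlines() if line.strip().startswith("|")]
    if not rows:
        return []
    header = _cells(rows[0])
    return [dict(zip(header, _cells(row))) for row in rows[1:]]


def primary_host(members: List[Dict[str, str]], primary_uuid: str) -> Optional[str]:
    """Map the UUID reported by group_replication_primary_member to MEMBER_HOST."""
    for member in members:
        if member.get("MEMBER_ID") == primary_uuid:
            return member.get("MEMBER_HOST")
    return None


# Output copied from the query above (border lines shortened).
OUTPUT = """
+---+---+---+---+---+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---+---+---+---+---+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
+---+---+---+---+---+
"""
members = parse_mysql_table(OUTPUT)
print(primary_host(members, "12b6f8d9-d655-11e7-936a-9a17854b700d"))  # mtls17
print(all(m["MEMBER_STATE"] == "ONLINE" for m in members))  # True
```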

III. Case 1: simulating the failure and fixing it

  1. Simulate a crash of the mtls17 node

ps -ef | grep mysql
mysql : ? :: /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root : pts/ :: grep --color=auto mysql
[root@mtls17 data]# kill -
[root@mtls17 data]# ps -ef | grep mysql
root : pts/ :: grep --color=auto mysql

  

  2. Check the state of the two remaining nodes

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12bfe200-d655-11e7-a264-1e1b3511358e |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)

  As shown above, after the MySQL instance on mtls17 was killed, the two remaining nodes formed a new group, and the MySQL instance on mtls18 became the primary.
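Why mtls18 and not mtls19? For single-primary mode, the MySQL 5.7 reference manual describes the election as follows (from 5.7.20): the surviving members are ranked by group_replication_member_weight, highest first, and ties are broken by taking the lowest server_uuid in lexicographic order. Below is a minimal Python sketch of that documented rule, assuming every member runs with the default weight of 50 unless a weight is passed in; it models the rule and is not the plugin's actual code, and elect_primary is a hypothetical name.

```python
from typing import Dict, Optional

DEFAULT_WEIGHT = 50  # default value of group_replication_member_weight


def elect_primary(members: Dict[str, str], weights: Optional[Dict[str, int]] = None) -> str:
    """Return the host expected to become primary in single-primary mode.

    `members` maps server_uuid -> MEMBER_HOST for the members in the new view.
    Highest weight wins; ties go to the lexicographically lowest server_uuid.
    """
    weights = weights or {}
    winner = min(members, key=lambda uuid: (-weights.get(uuid, DEFAULT_WEIGHT), uuid))
    return members[winner]


after_mtls17_crash = {
    "12bfe200-d655-11e7-a264-1e1b3511358e": "mtsl18",
    "1453bcac-d655-11e7-a503-8a7c439b72d9": "mtls19",
}
print(elect_primary(after_mtls17_crash))  # mtsl18, as in the output above
```

With equal weights, "12bfe200..." sorts before "1453bcac...", which matches the UUID reported by group_replication_primary_member after the failover.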

  

  3. Recover the failed primary

systemctl start mysql
[root@mtls17 data]# mysql -uroot -pmtls0352
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is
Server version: 5.7.-log MySQL Community Server (GPL)

Copyright (c) , , Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> start group_replication;
Query OK, 0 rows affected (4.03 sec)

  4. Verify that the problem is fixed

select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12bfe200-d655-11e7-a264-1e1b3511358e |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)
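The check in step 4 boils down to two conditions: every expected host is in the view, and every member reports ONLINE. A hypothetical helper is sketched below, assuming you transcribe (MEMBER_HOST, MEMBER_STATE) pairs from replication_group_members, using host names as MEMBER_HOST prints them (the output shows mtsl18). A member that has just run START GROUP_REPLICATION may show RECOVERING for a while before it turns ONLINE, so a non-empty result right after a restart is not necessarily a failure.

```python
from typing import Dict, Iterable, List, Tuple


def group_problems(view: Iterable[Tuple[str, str]], expected_hosts: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the group looks healthy.

    `view` holds (MEMBER_HOST, MEMBER_STATE) pairs read from replication_group_members.
    """
    problems: List[str] = []
    states: Dict[str, str] = {}
    for host, state in view:
        states[host] = state
        if state != "ONLINE":
            problems.append(f"{host} is {state}")
    for host in expected_hosts:
        if host not in states:
            problems.append(f"{host} is missing from the group")
    return problems


EXPECTED = ["mtls17", "mtsl18", "mtls19"]
after_crash = [("mtsl18", "ONLINE"), ("mtls19", "ONLINE")]
after_rejoin = [("mtls17", "ONLINE"), ("mtsl18", "ONLINE"), ("mtls19", "ONLINE")]
print(group_problems(after_crash, EXPECTED))   # ['mtls17 is missing from the group']
print(group_problems(after_rejoin, EXPECTED))  # []
```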

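  Conclusion: after the former primary crashed, restarting the MySQL service and then restarting MySQL Group Replication brought it back into the group. It rejoined as a secondary; mtls18 remains the primary.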

IV. Case 2: recovering the node when the primary's data has been lost

  1. Stop the service and delete the data

[root@mtsl18 ~]# ps -ef | grep mysql
mysql : ? :: /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root : pts/ :: grep --color=auto mysql
[root@mtsl18 ~]# kill -
[root@mtsl18 ~]# rm -rf /database/mysql/data/
[root@mtsl18 ~]# ps -ef | grep mysql
root : pts/ :: grep --color=auto mysql

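  This experiment continues from case 1, so the primary is currently on mtls18; that is why we stop the service and delete the data on mtls18.

  2. Check the cluster status: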

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.01 sec)

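  Note: after mtls18 went down, the primary role moved from mtls18 to mtls17.

  3. Use MEB (MySQL Enterprise Backup) to back up mtls19, so the backup can be used to restore the failed mtls18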

mysqlbackup --defaults-file=/etc/my.cnf --with-timestamp \
--host=localhost --user=root --password=mtls0352 \
--backup-dir=/tmp/ --backup-image=/tmp/2017-12-01T12:30:00.mbi --no-history-logging \
backup-to-image
MySQL Enterprise Backup version 4.1. Linux-2.6.-400.215..el5uek-x86_64 [//]
Copyright (c) , , Oracle and/or its affiliates. All Rights Reserved.

:: MAIN INFO: A thread created with Id ''
:: MAIN INFO: Starting with following command line ...
mysqlbackup --defaults-file=/etc/my.cnf --with-timestamp --host=localhost
--user=root --password=xxxxxxxx --backup-dir=/tmp/
--backup-image=/tmp/--01T12::.mbi --no-history-logging
backup-to-image
:: MAIN INFO:
:: MAIN INFO: MySQL server version is '5.7.20-log'
.......
........
:: MAIN INFO: Full Image Backup operation completed successfully.
:: MAIN INFO: Backup image created successfully.
:: MAIN INFO: Image Path = /tmp/--01T12::.mbi
:: MAIN INFO: MySQL binlog position: filename mysql-bin., position
-------------------------------------------------------------
Parameters Summary
-------------------------------------------------------------
Start LSN :
End LSN :
-------------------------------------------------------------
mysqlbackup completed OK!

  4. Copy the backup image to mtls18

scp /tmp/2017-12-01T12:30:00.mbi mtls18:/tmp/
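Before restoring, it is worth confirming that the image arrived intact. Running sha256sum on both hosts and comparing the digests is the simplest option; the Python sketch below does the same thing, streaming the file in chunks so a multi-gigabyte .mbi image is never loaded into memory at once. The function name sha256_of is hypothetical, and the demo uses a throwaway temporary file rather than the real image.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it 1 MiB at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file; on the real hosts you would compare
# sha256_of("/tmp/2017-12-01T12:30:00.mbi") computed on mtls19 and on mtls18.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example backup image bytes")
try:
    demo_digest = sha256_of(tmp.name)
finally:
    os.remove(tmp.name)
print(demo_digest == hashlib.sha256(b"example backup image bytes").hexdigest())  # True
```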

  5. Restore the backup

mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/tmp/2017-12-01T12:30:00.mbi \
--backup-dir=/tmp/ --datadir=/database/mysql/data/3306/ \
copy-back-and-apply-log
MySQL Enterprise Backup version 4.1. Linux-2.6.-400.215..el5uek-x86_64 [//]
Copyright (c) , , Oracle and/or its affiliates. All Rights Reserved.

:: MAIN INFO: A thread created with Id ''
:: MAIN INFO: Starting with following command line ...
mysqlbackup --defaults-file=/etc/my.cnf
--backup-image=/tmp/--01T12::.mbi --backup-dir=/tmp/
--datadir=/database/mysql/data// copy-back-and-apply-log
:: MAIN INFO:
IMPORTANT: Please check that mysqlbackup run completes successfully.
.....
.....
:: PCR1 INFO: The first data file is '/database/mysql/data/3306/ibdata1'
and the new created log files are at '/database/mysql/data/3306/'
:: MAIN INFO: MySQL server version is '5.7.20-log'
:: MAIN INFO: Restoring ...5.7.-log version
:: MAIN INFO: Apply-log operation completed successfully.
:: MAIN INFO: Full Backup has been restored successfully.
mysqlbackup completed OK!
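The restore output itself says "IMPORTANT: Please check that mysqlbackup run completes successfully." When the backup and restore are scripted, that check can be automated. The sketch below treats a run as successful only if the process exits with status 0 and its output contains the final "mysqlbackup completed OK!" line seen in both runs above. The helper names are hypothetical, the zero-exit-status convention is an assumption, and both stdout and stderr are searched because this sketch does not assume which stream mysqlbackup writes its messages to.

```python
import subprocess
from typing import List

SUCCESS_MARKER = "mysqlbackup completed OK!"


def backup_log_ok(log_text: str) -> bool:
    """True if mysqlbackup's output contains its final success line."""
    return SUCCESS_MARKER in log_text


def run_mysqlbackup(argv: List[str]) -> bool:
    """Run mysqlbackup; require exit status 0 (assumed) and the success line."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode == 0 and backup_log_ok(proc.stdout + proc.stderr)


# Hypothetical call mirroring the restore command above (not executed here):
# run_mysqlbackup(["mysqlbackup", "--defaults-file=/etc/my.cnf",
#                  "--backup-image=/tmp/2017-12-01T12:30:00.mbi", "--backup-dir=/tmp/",
#                  "--datadir=/database/mysql/data/3306/", "copy-back-and-apply-log"])
restore_tail = "MAIN INFO: Full Backup has been restored successfully.\nmysqlbackup completed OK!"
print(backup_log_ok(restore_tail))  # True
```

Only after this returns True would you move on to fixing ownership and starting mysqld in step 6.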

  6. Restart MySQL on mtls18

[root@mtsl18 tmp]# chown -R mysql:mysql /database/mysql/data/
[root@mtsl18 tmp]# systemctl start mysql
[root@mtsl18 tmp]# ps -ef | grep mysql
mysql : ? :: /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root : pts/ :: grep --color=auto mysql

  7. Restart MySQL Group Replication

mysql -uroot -pmtls0352
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.20-log MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> reset master;
Query OK, 0 rows affected (0.10 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.00 sec)

mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

mysql> source /database/mysql/data/3306/backup_gtid_executed.sql ;
Query OK, 0 rows affected (0.10 sec)

mysql> set sql_log_bin=1;
Query OK, 0 rows affected (0.00 sec)

mysql> change master to
    -> master_user='mgr_usr',
    -> master_password='mgr10352'
    -> for channel 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.21 sec)

mysql> start group_replication;
Query OK, 0 rows affected (3.46 sec)
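The order of the statements in step 7 matters. In MySQL 5.7, gtid_purged can only be set while gtid_executed is empty, which is why RESET MASTER comes before sourcing backup_gtid_executed.sql from the restored data directory; the credentials for the group_replication_recovery channel then have to be set before START GROUP_REPLICATION. The Python sketch below simply emits the same sequence as a text script, assuming the user, password, and file path from this example; rejoin_script is a hypothetical name, and because source is a mysql client command rather than SQL, the script is only meant to be fed to the mysql client.

```python
def _quote(value: str) -> str:
    """Quote a string literal for MySQL (default sql_mode: backslash escapes)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def rejoin_script(recovery_user: str, recovery_password: str, gtid_file: str) -> str:
    """Build the step-7 statements, in order, as input for the mysql client."""
    statements = [
        "RESET MASTER;",        # empties gtid_executed so gtid_purged can be set
        "RESET SLAVE;",
        "SET sql_log_bin=0;",
        f"source {gtid_file}",  # mysql client command, not SQL
        "SET sql_log_bin=1;",
        "CHANGE MASTER TO"
        f" MASTER_USER={_quote(recovery_user)},"
        f" MASTER_PASSWORD={_quote(recovery_password)}"
        " FOR CHANNEL 'group_replication_recovery';",
        "START GROUP_REPLICATION;",
    ]
    return "\n".join(statements) + "\n"


script = rejoin_script("mgr_usr", "mgr10352",
                       "/database/mysql/data/3306/backup_gtid_executed.sql")
print(script)
```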

  8. Check whether the cluster status is normal

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
| group_replication_applier | 85f82fce-d65e-11e7-9e92-1e1b3511358e | mtsl18 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.01 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.01 sec)

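V. Summary

  How to fix the two kinds of primary failure:

    1. Neither the data nor the binlogs are lost: just restart MySQL Group Replication, and the rejoining node catches up on the missing transactions automatically through distributed recovery.

    2. The data is lost: restore from a backup first, then restart MySQL Group Replication.

  On the operational complexity of maintaining MySQL Group Replication:

    Overall, MySQL Group Replication is fairly DBA-friendly: a few simple operations are enough to recover a failed cluster.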

VI. Other articles I have written about MySQL Group Replication

  1. MySQL Group Replication installation and configuration in detail: http://www.cnblogs.com/JiangLe/p/6727281.html#3849996

  2. An availability report on MySQL Group Replication in mysql-5.7.20: http://www.cnblogs.com/JiangLe/p/7809229.html

  3. MySQL Group Replication: recovering a crashed primary node

  4. MySQL Group Replication: recovering after losing a majority of data nodes

  5. mysqltools, my open-source tool for fully automated installation of mysql-group-replication: https://github.com/Neeky/mysqltools
