Common MySQL high-availability solutions include MHA, Galera, Orchestrator, and the official Group Replication (MGR). This post records setting up and maintaining an MGR multi-primary cluster.
MGR features
- Supports single-primary and multi-primary modes
- Based on the Paxos algorithm, giving consistent data replication
- Plugin-based design with version compatibility checks: a joining node whose major version is lower than the group's current members is refused; one with a higher major version may join, but cannot act as a primary
- No third-party component dependencies
- Supports full SSL encryption on all communication paths
- Supports IP allowlists
- Does not depend on network multicast
Setup
Three VMs can be started with Vagrant, with IPs assigned as follows:
itop-mgr-1 | 192.168.10.101
itop-mgr-2 | 192.168.10.102
itop-mgr-3 | 192.168.10.103
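The Vagrant provisioning itself is omitted here. One optional convenience (not in the original scripts, just a sketch) is making the three hostnames resolvable on each VM; report_host below uses IPs, so this is purely for comfort:
# Optional sketch: map the cluster hostnames on every VM
cat >> /etc/hosts <<EOF
192.168.10.101 itop-mgr-1
192.168.10.102 itop-mgr-2
192.168.10.103 itop-mgr-3
EOF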
Installation
Pick the latest MySQL 8.x and install it with the following commands:
yum install -y https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
yum install -y mysql-community-server mysql-shell mysql-router
Manual configuration
An MGR cluster can be configured by hand, or with mysql-shell. This post records the manual procedure first, then hands the cluster over to mysql-shell.
my.cnf
A configuration example:
[mysqldump]
max_allowed_packet = 64M
[mysqld]
server-id=$ID
innodb_buffer_pool_size = 768M
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
# Binary logging and Replication
log_bin = mysql-bin
binlog_format = ROW
binlog_checksum = NONE # or CRC32
master_verify_checksum = OFF # ON if binlog_checksum = CRC32
slave_sql_verify_checksum = OFF # ON if binlog_checksum = CRC32
binlog_cache_size = 1M
binlog_stmt_cache_size = 3M
max_binlog_size = 512M
sync_binlog = 1
expire_logs_days = 7
log_slave_updates = 1
relay_log = mysql-relay-bin
relay_log_purge = 1
# Group Replication parameter
gtid_mode = ON
enforce_gtid_consistency = ON
master_info_repository = TABLE
relay_log_info_repository = TABLE
slave_parallel_workers = 10
slave_preserve_commit_order = ON
slave_parallel_type = LOGICAL_CLOCK
# Collect write sets on the server and record them in the binary log. A write set is based on each row's primary key and uniquely identifies the changed row; it is used to detect conflicts.
transaction_write_set_extraction = XXHASH64
# The group name can be anything, but must not be any server's UUID! It must be identical on every node!
loose-group_replication_group_name = "abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
# OFF: do not start Group Replication automatically when the server boots.
loose-group_replication_start_on_boot = OFF
loose-group_replication_local_address = "$MYIP:33061"
loose-group_replication_group_seeds = "$NODE1:33061,$NODE2:33061,$NODE3:33061"
# OFF to avoid bootstrapping a second group with the same name on every restart.
loose-group_replication_bootstrap_group = OFF
# Disable single-primary mode (this walkthrough runs multi-primary) and enable the multi-primary safety checks
loose-group_replication_single_primary_mode = OFF # = multi-primary
loose-group_replication_enforce_update_everywhere_checks=ON # = multi-primary
report_host=$MYIP
report_port=3306
# Allowlist of client IPs that are permitted to join the group
loose-group_replication_ip_whitelist="192.168.10.0/24,127.0.0.1/8"
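The $ID, $MYIP and $NODE1..$NODE3 placeholders above are substituted per node by the provisioning script. A minimal sketch of how that could be done (the eth1 interface name and the /vagrant/my.cnf.tpl template path are assumptions, not from the original):
# Derive this node's IP from the host-only NIC (eth1 is an assumption
# for the Vagrant box used here)
MYIP=$(ip -4 addr show eth1 | awk '/inet /{split($2,a,"/"); print a[1]}')
# server-id suffix: last two octets joined, e.g. 192.168.10.101 -> 10101
ID=$(echo "$MYIP" | awk -F. '{print $3$4}')
NODE1=192.168.10.101 NODE2=192.168.10.102 NODE3=192.168.10.103
export ID MYIP NODE1 NODE2 NODE3
# /vagrant/my.cnf.tpl is a hypothetical template holding the config above
envsubst '$ID $MYIP $NODE1 $NODE2 $NODE3' < /vagrant/my.cnf.tpl > /etc/my.cnf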
Account setup
To make scripting inside Vagrant easier, first re-initialize MySQL in insecure mode (root with no password).
rm -fr /var/lib/mysql
mysqld --initialize-insecure --user=mysql
systemctl start mysqld
# If you see ERROR (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2), restart so the socket file is regenerated
systemctl restart mysqld
systemctl status mysqld
mysql -uroot -e "show databases;"
Then change the root account's password.
cat > /tmp/init.sql <<EOF
ALTER USER 'root'@'localhost' IDENTIFIED BY '$MYSQL_ROOT' PASSWORD EXPIRE NEVER;
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY '$MYSQL_ROOT';
create user root@'%' identified WITH mysql_native_password BY '$MYSQL_ROOT';
grant all privileges on *.* to root@'%' with grant option;
flush privileges;
EOF
mysql -uroot < /tmp/init.sql
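A quick sanity check (a sketch) that both root accounts now exist with the expected authentication plugin:
mysql -uroot -p$MYSQL_ROOT -e "SELECT user, host, plugin FROM mysql.user WHERE user='root';"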
Create the replication account.
cat > /tmp/rep.sql <<EOF
SET SQL_LOG_BIN=0;
CREATE USER IF NOT EXISTS repl@'%' IDENTIFIED WITH 'mysql_native_password' BY 'repl';
GRANT SUPER,REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO repl@'%';
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;
CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='repl' FOR CHANNEL 'group_replication_recovery';
EOF
# $LOCK is a marker file set by the provisioning script so this block runs only once
if [ ! -f "$LOCK" ]; then
  mysql -uroot -p$MYSQL_ROOT < /tmp/rep.sql
  touch "$LOCK"
fi
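One way to confirm the credentials were stored for the recovery channel (possible here because master_info_repository=TABLE above); a sketch:
mysql -uroot -p$MYSQL_ROOT -e "SELECT Channel_name, Host, User_name FROM mysql.slave_master_info;"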
Install the MGR plugin
# install mgr
cat > /tmp/mgr.sql <<EOF
INSTALL PLUGIN group_replication SONAME 'group_replication.so';
SHOW PLUGINS;
EOF
mysql -uroot -p$MYSQL_ROOT -e "SHOW PLUGINS" |grep -q "group_replication" || mysql -uroot -p$MYSQL_ROOT < /tmp/mgr.sql
Start MGR
Bootstrap MGR on the first node, then join the other two nodes to it.
# start mgr
cat > /tmp/start.sql <<EOF
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;
SELECT * FROM performance_schema.replication_group_members;
EOF
# run only on the first node
[ "$ID"x == "10101"x ] && mysql -uroot -p$MYSQL_ROOT < /tmp/start.sql
# join
cat > /tmp/join.sql <<EOF
SELECT * FROM performance_schema.replication_group_members;
show global variables like '%seed%';
START GROUP_REPLICATION;
SELECT * FROM performance_schema.replication_group_members;
SHOW STATUS LIKE 'group_replication_primary_member';
show global variables like 'group_replication_single%';
EOF
# run on the other two nodes
[ "$ID"x == "10101"x ] || mysql -uroot -p$MYSQL_ROOT < /tmp/join.sql
The group_replication_bootstrap_group option needs to stay OFF so that a later restart does not bootstrap a second group with the same name; it is therefore switched back off right after START GROUP_REPLICATION.
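A provisioning script may also want to wait until the local member is actually ONLINE before moving on; a minimal sketch (assumes MYSQL_ROOT is set as in the scripts above):
# Poll until this node's own member row reports ONLINE
until mysql -uroot -p$MYSQL_ROOT -N -e "SELECT MEMBER_STATE FROM performance_schema.replication_group_members WHERE MEMBER_ID=@@server_uuid;" | grep -q ONLINE; do
  echo "waiting for this node to come ONLINE ..."
  sleep 2
done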
Taking over MGR with mysql-shell
To create an InnoDB Cluster on top of an already-configured replication group, pass the adoptFromGR option to dba.createCluster(). The resulting InnoDB Cluster matches whether the replication group runs in single-primary or multi-primary mode.
# Manage the cluster with mysqlsh
# Adopting an existing MGR group: connect to a group member and pass adoptFromGR
# to dba.createCluster() (see above)
# Run this on the last node, i.e. only after all nodes are up
JSDIR="/vagrant/js"
MYSQLSHLOG=/tmp/mysql.log
if [ "$ID"x == "10103"x ];then
echo "Run Get Status"
mysqlsh --js --file=$JSDIR/status.js > $MYSQLSHLOG 2>&1
grep "RuntimeError" $MYSQLSHLOG && r=0 || r=1
if [ $r -eq 0 ];then
grep "but GR is not active" $MYSQLSHLOG && r=0 || r=1
if [ $r -eq 0 ];then
echo "Cluster Need Reboot"
mysqlsh --js --file=$JSDIR/reboot.js
else
echo "Init Cluster"
mysqlsh --js --file=$JSDIR/init.js
fi
else
cat $MYSQLSHLOG
fi
fi
status.js
shell.connect('root@192.168.10.101', 'root');
var cluster = dba.getCluster('itopMgr');
cluster.status();
init.js
shell.connect('root@192.168.10.101', 'root');
var cluster = dba.createCluster('itopMgr', {adoptFromGR: true});
cluster.status();
reboot.js
shell.connect('root@192.168.10.101', 'root');
dba.rebootClusterFromCompleteOutage('itopMgr')
var cluster = dba.getCluster('itopMgr');
cluster.status();
If you use mysql-shell to create and manage the cluster from the very beginning, the following commands serve as a reference.
## Connect to the primary node
MySQL JS >\connect root@192.168.10.101:3306
## Create the cluster
MySQL 192.168.10.101:3306 ssl JS > dba.createCluster('mgrCluster')
MySQL 192.168.10.101:3306 ssl JS > var cluster = dba.getCluster()
## Add instances
MySQL 192.168.10.101:3306 ssl JS > cluster.addInstance('root@192.168.10.102:3306')
MySQL 192.168.10.101:3306 ssl JS > cluster.addInstance('root@192.168.10.103:3306')
## Check cluster status
MySQL 192.168.10.101:3306 ssl JS > cluster.status()
## Dissolve the InnoDB Cluster
MySQL 192.168.10.101:3306 ssl JS > cluster.dissolve()
## Set election weights for choosing a new primary
MySQL 192.168.10.101:3306 ssl JS > var mycluster = dba.getCluster()
MySQL 192.168.10.101:3306 ssl JS > mycluster.addInstance('root@192.168.10.102:3306', {memberWeight:25})
MySQL 192.168.10.101:3306 ssl JS > mycluster.addInstance('root@192.168.10.103:3306', {memberWeight:50})
## Designate a new primary node
MySQL 192.168.10.101:3306 ssl JS > cluster.setPrimaryInstance('192.168.10.102:3306')
## Switch to multi-primary mode with Cluster.switchToMultiPrimaryMode():
MySQL 192.168.10.101:3306 ssl JS > cluster.switchToMultiPrimaryMode()
## Switch back to single-primary mode with Cluster.switchToSinglePrimaryMode(), optionally naming the new primary
MySQL 192.168.10.101:3306 ssl JS > cluster.switchToSinglePrimaryMode('192.168.10.102:3306')
Operations and troubleshooting
Node states
- ONLINE - the node is healthy.
- OFFLINE - the instance is running but has not joined any cluster.
- RECOVERING - the instance has joined the cluster and is synchronizing data.
- ERROR - an error occurred during data synchronization.
- UNREACHABLE - communication with the other nodes is broken; possibly a network problem or a node crash.
- MISSING - the node belongs to the cluster, but Group Replication is not running on it.
Cluster states
- OK - all nodes are ONLINE and there are redundant nodes.
- OK_PARTIAL - some nodes are unavailable, but redundancy remains.
- OK_NO_TOLERANCE - enough nodes are ONLINE, but without redundancy; e.g. in a two-node cluster, if one node dies the cluster becomes unavailable.
- NO_QUORUM - some nodes are ONLINE but quorum cannot be reached; the cluster cannot accept writes, only reads.
- UNKNOWN - the node is neither ONLINE nor RECOVERING; try connecting to another instance to check the state.
- UNAVAILABLE - every member of the group is OFFLINE, although the instances may be running, e.g. just restarted and not yet rejoined.
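For scripting, the overall state string above can be pulled out of cluster.status(); a sketch (embedding the password in the URI is insecure and only acceptable in a lab):
mysqlsh --js root:root@192.168.10.101 -e 'print(dba.getCluster("itopMgr").status().defaultReplicaSet.status)'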
Rejoining the cluster
If a node restarts after a failure and its state shows as MISSING, it has to be rejoined to the cluster manually.
var cluster = dba.getCluster("itopMgr")
cluster.rejoinInstance('root@192.168.10.102:3306')
Rebooting the cluster
When every node is down, for instance after all machines have been powered off and brought back, the cluster can be rebooted with mysql-shell.
shell.connect('root@192.168.10.101', 'root');
dba.rebootClusterFromCompleteOutage('itopMgr')
var cluster = dba.getCluster('itopMgr');
When the VMs are restarted with vagrant reload, VM 1 shuts down first; if nodes 2 and 3 are still taking writes, their data ends up newer than node 1's. After the reload, bootstrapping from node 1 by default with plain SQL produces the following error:
Rejoining instance to the cluster ...
ERROR: A GTID set check of the MySQL instance at '192.168.10.102:3306' determined that it contains transactions that do not originate from the cluster, which must be discarded before it can join the cluster.
192.168.10.102:3306 has the following errant GTIDs that do not exist in the cluster:
abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1002598-1002690
Having extra GTID events is not expected, and it is recommended to investigate this further and ensure that the data can be removed prior to rejoining the instance to the cluster.
Discarding these extra GTID events can either be done manually or by completely overwriting the state of 192.168.10.102:3306 with a physical snapshot from an existing cluster member. To achieve this remove the instance from the cluster and add it back using <Cluster>.addInstance() and setting the 'recoveryMethod' option to 'clone'.
Cluster.rejoinInstance: The instance '192.168.10.102:3306' contains errant transactions that did not originate from the cluster. (RuntimeError)
So in this situation it is better to let mysql-shell do the reboot.
If you are already in this state, list all members with the following command.
[root@itop-mgr-1 ~]# ./run.sh "select * from performance_schema.replication_group_members;"
RUN on 192.168.10.101
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | | | NULL | OFFLINE | | |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+
RUN on 192.168.10.102
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 4dc877f0-becf-11ea-bf0b-5254004d77d3 | 192.168.10.102 | 3306 | OFFLINE | | |
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
RUN on 192.168.10.103
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | | | NULL | OFFLINE | | |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+
Then bootstrap group_replication on the node whose MEMBER_ID and MEMBER_HOST are not empty.
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;
Then start node 3:
START GROUP_REPLICATION;
Node 1 cannot rejoin directly; it must be removed and then added back, recovering its data via clone.
MySQL 192.168.10.102:33060+ ssl JS > c.addInstance('root@192.168.10.101:3306', {recoveryMethod:'clone'})
WARNING: A GTID set check of the MySQL instance at '192.168.10.101:3306' determined that it contains transactions that do not originate from the cluster, which must be discarded before it can join the cluster.
192.168.10.101:3306 has the following errant GTIDs that do not exist in the cluster:
abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:4218-8557
WARNING: Discarding these extra GTID events can either be done manually or by completely overwriting the state of 192.168.10.101:3306 with a physical snapshot from an existing cluster member. To use this method by default, set the 'recoveryMethod' option to 'clone'.
Having extra GTID events is not expected, and it is recommended to investigate this further and ensure that the data can be removed prior to choosing the clone recovery method.
Clone based recovery selected through the recoveryMethod option
NOTE: Group Replication will communicate with other members using '192.168.10.101:33061'. Use the localAddress option to override.
Validating instance configuration at 192.168.10.101:3306...
This instance reports its own address as 192.168.10.101:3306
Instance configuration is suitable.
A new instance will be added to the InnoDB cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.
Adding instance to the cluster...
ERROR: Unable to enable clone on the instance '192.168.10.103:3306': Recovery user 'repl' not created by InnoDB Cluster
Monitoring recovery process of the new cluster member. Press ^C to stop monitoring and let it continue in background.
Clone based state recovery is now in progress.
NOTE: A server restart is expected to happen as part of the clone process. If the
server does not support the RESTART command or does not come back after a
while, you may need to manually start it back.
* Waiting for clone to finish...
NOTE: 192.168.10.101:3306 is being cloned from 192.168.10.103:3306
** Stage DROP DATA: Completed
** Clone Transfer
FILE COPY ############################################################ 100% Completed
PAGE COPY ############################################################ 100% Completed
REDO COPY ############################################################ 100% Completed
** Stage RECOVERY: \
NOTE: 192.168.10.101:3306 is shutting down...
* Waiting for server restart... ready
* 192.168.10.101:3306 has restarted, waiting for clone to finish...
* Clone process has finished: 109.54 MB transferred in 4 sec (27.38 MB/s)
State recovery already finished for '192.168.10.101:3306'
The instance '192.168.10.101:3306' was successfully added to the cluster.
Once this completes, SELECT @@GTID_EXECUTED should return the same value on all three nodes:
[root@itop-mgr-1 ~]# ./run.sh "SELECT @@GTID_EXECUTED;"
RUN on 192.168.10.101
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------------------------------------------------------+
| @@GTID_EXECUTED |
+---------------------------------------------------------------------+
| abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-4686:1001062-1002871:2001062 |
+---------------------------------------------------------------------+
RUN on 192.168.10.102
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------------------------------------------------------+
| @@GTID_EXECUTED |
+---------------------------------------------------------------------+
| abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-4686:1001062-1002871:2001062 |
+---------------------------------------------------------------------+
RUN on 192.168.10.103
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------------------------------------------------------+
| @@GTID_EXECUTED |
+---------------------------------------------------------------------+
| abcdaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-4686:1001062-1002871:2001062 |
+---------------------------------------------------------------------+
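The run.sh helper used above is not shown in the original post; it simply runs the same SQL statement on every node. A minimal sketch of what it might look like:
#!/usr/bin/env bash
# run.sh - sketch: execute the same SQL statement on every cluster node
SQL="$1"
for ip in 192.168.10.101 192.168.10.102 192.168.10.103; do
  echo "RUN on $ip"
  mysql -uroot -p"$MYSQL_ROOT" -h "$ip" -e "$SQL"
done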
Non-interactive mysql-shell
Use the 'interactive': false option.
dba.configureLocalInstance('root@$MYIP:3306', {'password': 'root', 'interactive': false})
dba.rebootClusterFromCompleteOutage prompts for confirmation (y/N) before rejoining each instance. To force a yes in a script, pass the rejoinInstances list explicitly as below; note that the list must not include the IP of the node you are running the command on.
dba.rebootClusterFromCompleteOutage('itopMgr',{rejoinInstances:["192.168.10.102:3306","192.168.10.103:3306"]})
Pass the recoveryMethod option to addInstance to avoid the interactive prompt.
cluster.addInstance('root@192.168.10.203:3306', {'localAddress': '192.168.10.203', 'password': 'root','recoveryMethod':'clone'})
Common commands
SQL statements for verifying the cluster
select * from performance_schema.replication_group_members; #All members should be online.
select instance_name, mysql_server_uuid, addresses from mysql_innodb_cluster_metadata.instances; # All instances should return same value for mysql_server_uuid
SELECT @@GTID_EXECUTED; #All nodes should return same value
Frequently used statements (e.g. for wiping local replication state and cluster metadata)
mysql> SET SQL_LOG_BIN = 0;
mysql> stop group_replication;
mysql> set global super_read_only=0;
mysql> drop database mysql_innodb_cluster_metadata;
mysql> RESET MASTER;
mysql> RESET SLAVE ALL;
JS > var cluster = dba.getCluster()
JS > var cluster = dba.getCluster("<Cluster_name>")
JS > var cluster = dba.createCluster('name')
JS > cluster.removeInstance('root@<IP_Address>:<Port_No>',{force: true})
JS > cluster.addInstance('root@<IP_Address>:<Port_No>')
JS > dba.getCluster()
JS > dba.getCluster().status()
JS > dba.getCluster().checkInstanceState('root@<IP_Address>:<Port_No>')
JS > dba.getCluster().rejoinInstance('root@<IP_Address>:<Port_No>')
JS > dba.getCluster().describe()
Check cluster status
mysql-js> dba.getCluster().status()
Note: if you get the following message, it is time to reboot the cluster from a complete outage:
Dba.getCluster: This function is not available through a session to a standalone instance (metadata exists, but GR is not active) (RuntimeError)
"Reboot from complete outage" refers to the following command.
dba.rebootClusterFromCompleteOutage('<cluster_name>')
Recovering a cluster from quorum loss
When the cluster is in the NO_QUORUM state, re-establish quorum using cluster.forceQuorumUsingPartitionOf().
JS > cluster.forceQuorumUsingPartitionOf("root@<IP address>:<port>")
Debugging
Specify the log level when starting mysql-shell.
mysqlsh --log-level=DEBUG3
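The shell writes its log to the user configuration path, typically ~/.mysqlsh/mysqlsh.log on Linux, so a debugging session might look like (password-in-URI again only for a lab):
mysqlsh --log-level=DEBUG3 --js root:root@192.168.10.101 -e 'dba.getCluster("itopMgr").status()'
tail -n 50 ~/.mysqlsh/mysqlsh.log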
unmanaged replication group
MySQL JS > dba.getCluster()
Dba.getCluster: This function is not available through a session to an instance belonging to an unmanaged replication group (RuntimeError)
The InnoDB Cluster status can differ between mysqlsh and performance_schema.replication_group_members: performance_schema shows the member as ONLINE while mysqlsh reports it as MISSING (Cluster.status() is out of sync with the Group Replication view replication_group_members).
This usually means the cluster has not yet been adopted by mysql-shell; run the adoption commands shown earlier. Alternatively, restore auto.cnf from a backup and restart mysqld; if required, remove the instance and add it back.
About auto.cnf and server_uuid: auto.cnf is a small file in the data directory, similar in format to my.cnf, that holds the automatically generated server_uuid. On startup, MySQL reads and uses the server_uuid from auto.cnf, and replication relies on it. server_uuid is a true UUID, complementing the user-supplied server_id system variable.