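This afternoon, in the middle of the weekly meeting, I got an SMS alert saying the machine at X.X.X.X no longer answered ping. After a round of checks it turned out that one of the database servers was down. Rather than rebooting right away, I asked Tencent support first...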

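Well, what do you know: the motherboard of the host machine had failed... An hour later the host was back up, so let's see whether the database came up too.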

[root@VM_145_57_tlinux ~]# mysql -uroot -p1234
ERROR (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[root@VM_145_57_tlinux ~]# mysql -uroot -p
Enter password:
ERROR (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[root@VM_145_57_tlinux ~]# ps -ef |grep mysql
root : ? :: /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql : ? :: /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=
root : pts/ :: grep mysql
[root@VM_145_57_tlinux ~]# mysql -uroot -p1234 -S /tmp/mysql.sock
ERROR (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[root@VM_145_57_tlinux ~]# ls /tmp/mys^C
[root@VM_145_57_tlinux ~]# /etc/init.d/mysqld restart
Stopping mysqld: [ OK ]
Starting mysqld: [ OK ]
[root@VM_145_57_tlinux ~]# ps -ef | msyql
-bash: msyql: command not found
[root@VM_145_57_tlinux ~]# ps -ef |grep mysql
root : ? :: /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql : ? :: /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=
root : pts/ :: /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql : pts/ :: /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=
root : pts/ :: grep mysql

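Now this is interesting: there are two sets of mysqld processes, and I can't explain it... [In hindsight, I think the next step was a mistake]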

[root@VM_145_57_tlinux ~]# kill -
[root@VM_145_57_tlinux ~]# kill -
[root@VM_145_57_tlinux ~]# ps -ef |grep mysql
root : pts/ :: /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql : pts/ :: /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=
root : pts/ :: grep mysql

After I killed the earlier processes with kill -9, I could connect to the database, but the InnoDB storage engine had not come up.

mysql> show engines;
+------------+---------+-----------------------------------------------------------+--------------+------+------------+
| Engine     | Support | Comment                                                   | Transactions | XA   | Savepoints |
+------------+---------+-----------------------------------------------------------+--------------+------+------------+
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                     | NO           | NO   | NO         |
| CSV        | YES     | CSV storage engine                                        | NO           | NO   | NO         |
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance    | NO           | NO   | NO         |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables | NO           | NO   | NO         |
+------------+---------+-----------------------------------------------------------+--------------+------+------------+
rows in set (0.00 sec)

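The picture is getting clearer: the process was up, but the socket had not been created and the storage engine had not started --> InnoDB was busy working in a background thread!! Let's check the error log to find out: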

 :: mysqld_safe Starting mysqld daemon with databases from /data/mysql/var
:: InnoDB: Initializing buffer pool, size = .0G
:: InnoDB: Completed initialization of buffer pool
InnoDB: Log scan progressed past the checkpoint lsn
:: InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number
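
The "process is up but the socket is not there yet" state described above can also be checked directly instead of retrying the mysql client. The following is only a minimal sketch: `probe_unix_socket` is a hypothetical helper name, and the path you pass in is whatever your `--socket` option points to (`/tmp/mysql.sock` in this incident). It only tells you whether something is listening on the socket; it says nothing about whether InnoDB has finished recovery.

```python
import os
import socket
import stat


def probe_unix_socket(path, timeout=2.0):
    """Return 'missing', 'not-a-socket', 'refused' or 'accepting' for a Unix socket path.

    Hypothetical helper for illustration; not part of MySQL or its tools.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # Same condition as the "(2)" at the end of the client error: errno 2 is ENOENT.
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
        return "accepting"
    except OSError:
        # The socket file exists but nobody is listening (e.g. a stale file).
        return "refused"
    finally:
        client.close()


if __name__ == "__main__":
    print(probe_unix_socket("/tmp/mysql.sock"))
```

A result of "missing" while mysqld is visible in `ps` is a hint to go read the error log before touching the process.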

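In fact, the process that had not yet created its socket was busy with crash recovery: restoring half-written pages from the doublewrite buffer and scanning the redo log. Paging further down: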

InnoDB: Doing recovery: scanned up to log sequence number
:: mysqld_safe Starting mysqld daemon with databases from /data/mysql/var
:: InnoDB: Initializing buffer pool, size = .0G
:: InnoDB: Error: cannot allocate bytes of
InnoDB: memory with malloc! Total allocated memory
InnoDB: by InnoDB bytes. Operating system errno:
InnoDB: Check if you should increase the swap file or
InnoDB: ulimits of your operating system.
InnoDB: On FreeBSD check you have compiled the OS with
InnoDB: a big enough maximum process size.
InnoDB: Note that in most -bit computers the process
InnoDB: memory space is limited to GB or GB.
InnoDB: We keep retrying the allocation for seconds...
InnoDB: Doing recovery: scanned up to log sequence number

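This is the error from when I forced a second MySQL start while InnoDB was still running recovery: it reports that memory cannot be allocated, presumably because the first mysqld was already holding its buffer pool. Paging further down: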

InnoDB: Doing recovery: scanned up to log sequence number
::03InnoDB: Fatal error: cannot allocate the memory for the buffer pool
:: [ERROR] Plugin 'InnoDB' init function returned error.
:: [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
:: [Note] Event Scheduler: Loaded events
:: [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.61' socket: '/tmp/mysql.sock' port: Source distribution
InnoDB: Doing recovery: scanned up to log sequence number 3612187648
[several lines omitted]
InnoDB: Doing recovery: scanned up to log sequence number
:: [Note] /usr/libexec/mysqld: Normal shutdown
:: [Note] Event Scheduler: Purging the queue. events
:: [Note] /usr/libexec/mysqld: Shutdown complete
:: mysqld_safe mysqld from pid file /data/mysql/mysqld/mysqld.pid ended
:: mysqld_safe Starting mysqld daemon with databases from /data/mysql/var
:: InnoDB: Initializing buffer pool, size = .0G
:: InnoDB: Completed initialization of buffer pool
InnoDB: Log scan progressed past the checkpoint lsn 1808 1368235506 [the result of kill -9]
:: InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number

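Only after seeing the errors above did it dawn on me how dangerous my earlier actions were. If MySQL's recovery were not so robust, I would probably have lost the data this way... a close call.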

InnoDB: Doing recovery: scanned up to log sequence number
InnoDB: Doing recovery: scanned up to log sequence number
:: InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents:
InnoDB: Apply batch completed
:: InnoDB: Started; log sequence number
:: [Note] Event Scheduler: Loaded events
:: [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.61' socket: '/tmp/mysql.sock' port: Source distribution

Recovery complete, and the service started up happily.
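
Looking back, the safer move would have been to read the error log before restarting or killing anything. Here is a minimal sketch of that check, assuming the log contains the same marker strings as the excerpts above (they are copied from this 5.1.61 log; other versions may word them differently). `last_innodb_state` is a hypothetical helper name. Note that when two mysqld instances write to the same log, as happened here, their lines interleave and the result only reflects the last marker seen.

```python
# Marker substrings copied from the error-log excerpts in this post.
RECOVERY_MARKERS = (
    ("InnoDB: Starting crash recovery", "recovering"),
    ("InnoDB: Started; log sequence number", "started"),
    ("InnoDB: Fatal error: cannot allocate the memory for the buffer pool", "failed"),
)


def last_innodb_state(log_lines):
    """Return the state named by the last marker found, or 'unknown' if none is found."""
    state = "unknown"
    for line in log_lines:
        for marker, name in RECOVERY_MARKERS:
            if marker in line:
                state = name
    return state


sample = [
    ":: InnoDB: Database was not shut down normally!",
    "InnoDB: Starting crash recovery.",
    "InnoDB: Doing recovery: scanned up to log sequence number",
]
print(last_innodb_state(sample))  # recovering
```

If this reports "recovering", the thing to do is wait and watch the log, not to run the init script again or reach for kill -9.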

Next, a few words on how innodb_log_file_size relates to InnoDB's recovery after an abnormal shutdown:

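Following the expert's blog post at http://www.cnblogs.com/zuoxingyu/archive/2012/10/25/2738864.html , let's work out what innodb_log_file_size is most appropriate. [As a rough rule of thumb, make the log large enough to hold about one hour of log writes at most.]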

mysql> pager grep sequence
PAGER set to 'grep sequence'
mysql> show engine innodb status\G select sleep(); show engine innodb status\G
Log sequence number
row in set (0.00 sec)
row in set ( min 0.00 sec)
Log sequence number
row in set (0.00 sec)

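With the two readings taken one minute apart: (446445181 - 411580730) / 1024 / 1024 × 60 ≈ 1995 MB of log per hour, which rounds up to 2 GB. Since there are 2 log files by default, 1 GB per file is the best value based on the current peak load.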
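
The arithmetic can be written out as a small sketch. `redo_log_sizing` is a hypothetical helper name, the 60-second interval between the two "Log sequence number" readings is an assumption (the sleep argument is missing from the transcript above), and rounding up to a whole GB simply mirrors the rule of thumb used in this post.

```python
import math


def redo_log_sizing(lsn_start, lsn_end, interval_seconds, files_in_group=2):
    """Estimate hourly redo volume (MB) and a per-file size (MB) from two LSN readings."""
    bytes_per_hour = (lsn_end - lsn_start) * 3600 / interval_seconds
    mb_per_hour = bytes_per_hour / 1024 / 1024
    # Round the hourly volume up to a whole GB, then split it across the log files.
    total_gb = math.ceil(mb_per_hour / 1024)
    per_file_mb = total_gb * 1024 / files_in_group
    return mb_per_hour, per_file_mb


# The two LSN values from the calculation above, assumed to be 60 seconds apart.
mb_per_hour, per_file_mb = redo_log_sizing(411580730, 446445181, 60)
print(round(mb_per_hour), per_file_mb)  # 1995 1024.0
```

The measurement should be taken at peak load; a sample from a quiet period would suggest a log that is too small.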
