从错误日志中检查,有两个信号量等待时间超长,MySQL自动crash了:

# grep " 600 " 31-226-mysql-error.log 
2021-12-27T02:25:15.086231Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2021-12-27T03:22:41.057752Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
# grep "ready for connections" 31-226-mysql-error.log 
2021-12-27T02:57:54.430405Z 0 [Note] /usr/sbin/mysqld: ready for connections.
2021-12-27T04:01:35.813766Z 0 [Note] /usr/sbin/mysqld: ready for connections. 

srv_error_monitor_thread发现存在阻塞超过600s的latch锁时,如果连续多次检测该锁仍没有释放,就会触发自动crash,避免MySQL持续hung住。
检查等待信息量时间长的线程有3个:

# grep "Thread 140487646721792 " 31-226-mysql-error.log 
--Thread 140487646721792 has waited at buf0buf.cc line 3975 for 241  seconds the semaphore:
--Thread 140487646721792 has waited at buf0buf.cc line 3975 for 244  seconds the semaphore:
--Thread 140487646721792 has waited at buf0buf.cc line 3975 for 264  seconds the semaphore:
......

--Thread 140487646721792 has waited at buf0buf.cc line 3975 for 923  seconds the semaphore:
--Thread 140487646721792 has waited at buf0buf.cc line 3975 for 924  seconds the semaphore:
--Thread 140487646721792 has waited at buf0buf.cc line 3975 for 944  seconds the semaphore:
# grep "Thread 140487338841856" 31-226-mysql-error.log 
--Thread 140487338841856 has waited at buf0buf.cc line 3975 for 241  seconds the semaphore:
--Thread 140487338841856 has waited at buf0buf.cc line 3975 for 244  seconds the semaphore:
--Thread 140487338841856 has waited at buf0buf.cc line 3975 for 264  seconds the semaphore:
--Thread 140487338841856 has waited at buf0buf.cc line 3975 for 272  seconds the semaphore:
.......

--Thread 140487338841856 has waited at buf0buf.cc line 3975 for 923  seconds the semaphore:
--Thread 140487338841856 has waited at buf0buf.cc line 3975 for 924  seconds the semaphore:
--Thread 140487338841856 has waited at buf0buf.cc line 3975 for 944  seconds the semaphore:
# grep "Thread 140487643309824" 31-226-mysql-error.log 
--Thread 140487643309824 has waited at buf0buf.cc line 3975 for 241  seconds the semaphore:
--Thread 140487643309824 has waited at buf0buf.cc line 3975 for 244  seconds the semaphore:
--Thread 140487643309824 has waited at buf0buf.cc line 3975 for 264  seconds the semaphore:
......
--Thread 140487643309824 has waited at buf0buf.cc line 3975 for 923  seconds the semaphore:
--Thread 140487643309824 has waited at buf0buf.cc line 3975 for 924  seconds the semaphore:
--Thread 140487643309824 has waited at buf0buf.cc line 3975 for 944  seconds the semaphore:

这3个线程都出现了58次等待信号量,查询系统的信号量:

# cat /proc/sys/kernel/sem
250	32000	32	128

说明:
第一列,表示每个信号集中的最大信号量数目。
第二列,表示系统范围内的最大信号量总数目。
第三列,表示每个信号发生时的最大系统操作数目。
第四列,表示系统范围内的最大信号集总数目。

增大信号量只能治标,不治本,实际上还是数据库中有效率底下的SQL语句,优化SQL才是正解!

文章下方是我的微信,欢迎加我。👇

Logo

技术共进,成长同行——讯飞AI开发者社区

更多推荐