問題描述:Linux VM異常重啓,須要排查問題緣由centos
排查結果:服務器
查詢Messages日誌獲取到的信息ssh
虛擬機內核版本:ui
Jun 20 03:34:51 test01 kernel: Linux version 2.6.32-642.1.1.el6.x86_64 (mockbuild@worker1.bsys.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Tue May 31 21:57:07 UTC 2016
查看到虛擬機重啓時間約爲:2019/6/20 03:34 CSTthis
Jun 20 03:31:04 test01 kernel: hv_utils: Shutdown request received - graceful shutdown initiated Jun 20 03:31:04 test01 init: tty (/dev/tty1) main process (2703) killed by TERM signal Jun 20 03:31:04 test01 init: tty (/dev/tty2) main process (2705) killed by TERM signal Jun 20 03:31:04 test01 init: tty (/dev/tty3) main process (2707) killed by TERM signal Jun 20 03:31:04 test01 init: tty (/dev/tty4) main process (2709) killed by TERM signal Jun 20 03:31:04 test01 init: tty (/dev/tty5) main process (2711) killed by TERM signal Jun 20 03:31:04 test01 init: tty (/dev/tty6) main process (2713) killed by TERM signal Jun 20 03:31:04 test01 init: serial (ttyS0) main process (2723) killed by TERM signal Jun 20 03:31:05 test01 abrtd: Got signal 15, exiting Jun 20 03:31:10 test01 dnsmasq[1710]: exiting on receipt of SIGTERM Jun 20 03:31:13 test01 acpid: exiting Jun 20 03:31:13 test01 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w" Jun 20 03:31:13 test01 init: Disconnected from system bus Jun 20 03:31:14 test01 auditd[1424]: The audit daemon is exiting. Jun 20 03:31:14 test01 kernel: type=1305 audit(1560972674.940:131977): audit_pid=0 old=1424 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1 Jun 20 03:31:15 test01 kernel: type=1305 audit(1560972675.039:131978): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditctl_t:s0 res=1 Jun 20 03:31:15 test01 kernel: Kernel logging (proc) stopped. Jun 20 03:31:15 test01 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1458" x-info="http://www.rsyslog.com"] exiting on signal 15. Jun 20 03:34:51 test01 kernel: imklog 5.8.10, log source = /proc/kmsg started. Jun 20 03:34:51 test01 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1457" x-info="http://www.rsyslog.com"] start Jun 20 03:34:51 test01 kernel: Initializing cgroup subsys cpuset Jun 20 03:34:51 test01 kernel: Initializing cgroup subsys cpu Jun 20 03:34:51 test01 kernel: Linux version 2.6.32-642.1.1.el6.x86_64 (mockbuild@worker1.bsys.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Tue May 31 21:57:07 UTC 2016 Jun 20 03:34:51 test01 kernel: Command line: ro root=UUID=adc76f7c-fef6-4075-941e-e7ce50fb3e50 rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 rd_NO_LVM rd_NO_DM Jun 20 03:34:51 test01 kernel: KERNEL supported cpus: Jun 20 03:34:51 test01 kernel: Intel GenuineIntel Jun 20 03:34:51 test01 kernel: AMD AuthenticAMD Jun 20 03:34:51 test01 kernel: Centaur CentaurHauls
查詢到虛擬機重啓以前出現了Kernel Panic的錯誤,提示:INFO: task jbd2/sda1-8:540 blocked for more than 120 seconds. 備註:jdb2進程是一個文件系統的寫journal的進程spa
Jun 19 00:27:01 test01 auditd[1424]: Audit daemon rotating log files Jun 20 03:28:27 test01 kernel: INFO: task jbd2/sda1-8:540 blocked for more than 120 seconds. Jun 20 03:28:27 test01 kernel: Not tainted 2.6.32-642.1.1.el6.x86_64 #1 Jun 20 03:28:27 test01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 20 03:28:27 test01 kernel: jbd2/sda1-8 D 0000000000000000 0 540 2 0x00000000 Jun 20 03:28:27 test01 kernel: ffff880433257b30 0000000000000046 0000000000000000 ffff880433336d80 Jun 20 03:28:27 test01 kernel: 000000002a5c72b4 ffffffffa307745d ffff880433257ae0 ffff880433257ad0 Jun 20 03:28:27 test01 kernel: ffffffffa0045958 0000000000000000 ffff8804316505f8 ffff880433257fd8 Jun 20 03:28:27 test01 kernel: Call Trace: Jun 20 03:28:27 test01 kernel: [<ffffffffa0045958>] ? read_hv_clock_tsc+0x38/0x80 [hv_vmbus] Jun 20 03:28:27 test01 kernel: [<ffffffff8112e610>] ? sync_page+0x0/0x50 Jun 20 03:28:27 test01 kernel: [<ffffffff81547db3>] io_schedule+0x73/0xc0 Jun 20 03:28:27 test01 kernel: [<ffffffff8112e64d>] sync_page+0x3d/0x50 Jun 20 03:28:27 test01 kernel: [<ffffffff8154889f>] __wait_on_bit+0x5f/0x90 Jun 20 03:28:27 test01 kernel: [<ffffffff8112e883>] wait_on_page_bit+0x73/0x80 Jun 20 03:28:27 test01 kernel: [<ffffffff810a6b40>] ? wake_bit_function+0x0/0x50 Jun 20 03:28:27 test01 kernel: [<ffffffff811449c5>] ? pagevec_lookup_tag+0x25/0x40 Jun 20 03:28:27 test01 kernel: [<ffffffff8112ecab>] wait_on_page_writeback_range+0xfb/0x190 Jun 20 03:28:27 test01 kernel: [<ffffffff8112ed6f>] filemap_fdatawait+0x2f/0x40 Jun 20 03:28:27 test01 kernel: [<ffffffffa00bfe59>] jbd2_journal_commit_transaction+0x7e9/0x14f0 [jbd2] Jun 20 03:28:27 test01 kernel: [<ffffffff8108fd4b>] ? try_to_del_timer_sync+0x7b/0xe0 Jun 20 03:28:27 test01 kernel: [<ffffffffa00c5a38>] kjournald2+0xb8/0x220 [jbd2] Jun 20 03:28:27 test01 kernel: [<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40 Jun 20 03:28:27 test01 kernel: [<ffffffffa00c5980>] ? kjournald2+0x0/0x220 [jbd2] Jun 20 03:28:27 test01 kernel: [<ffffffff810a662e>] kthread+0x9e/0xc0 Jun 20 03:28:27 test01 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20 Jun 20 03:28:27 test01 kernel: [<ffffffff810a6590>] ? kthread+0x0/0xc0 Jun 20 03:28:27 test01 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
查詢了secure日誌,僅僅發現有關機信號的記錄,沒有更加詳細的信息了 備註:SIGTERM 15 A 終止信號 操作系統
Jun 20 03:31:05 test01 sshd[1811]: Received signal 15; terminating. Jun 20 03:34:59 test01 sshd[1808]: Server listening on 0.0.0.0 port 22. Jun 20 03:34:59 test01 sshd[1808]: Server listening on :: port 22.
查詢了Linux問題虛擬機所在的物理節點,發現早在6/20 03:10 CST左右,物理節點已經出現了系統故障,虛擬機被自動遷移至其餘可用節點,此過程形成了虛擬機的自動重啓3d
總結:在某些狀況下,服務器系統日誌出現了call trace的報錯,也不必定是操做系統層面引發的問題,還多是物理節點(針對虛擬機)或硬件故障。日誌