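On Linux, a process that dies unexpectedly has usually been killed in one of two ways: by the OOM killer when the system ran out of memory, or because the process accessed memory out of bounds. This article shows how to tell which of the two caused the operating system to kill the process.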

Troubleshooting processes killed by the operating system

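    # Gather information from dmesg and /var/log/messages.
    # /var/log/messages holds kernel and user-space messages and is persisted to disk (the main source).
    grep -i "dump\|kill" /var/log/messages
    # dmesg prints kernel messages only; they are lost after a reboot.
    dmesg -T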
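As a rough sketch of the triage above, the function below tags log lines read from stdin as either oom or coredump. The name classify_kill_lines and the match patterns are assumptions made for illustration; kernel and systemd-coredump wording varies between versions, so adjust the patterns to what your logs actually contain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and print
# "<reason><TAB><original line>" for lines that look like an OOM kill
# or a core dump. Lines that match neither pattern are skipped.
classify_kill_lines() {
    local line lower
    while IFS= read -r line; do
        # Lowercase a copy so the match is case-insensitive.
        lower=$(printf '%s' "$line" | tr '[:upper:]' '[:lower:]')
        case "$lower" in
            *"killed process"*|*"out of memory"*)
                printf 'oom\t%s\n' "$line" ;;
            *"segfault"*|*"dumped core"*|*"core dumped"*)
                printf 'coredump\t%s\n' "$line" ;;
        esac
    done
}
```

Typical use would be classify_kill_lines < /var/log/messages, or dmesg -T | classify_kill_lines for the kernel messages only.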

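OOM Killer

    # Look for OOM kills with either of the following commands.
    dmesg -T | grep -i 'killed process'
    grep -i 'killed process' /var/log/messages
    # A matching line carries memory figures such as:
    #   total-vm:19058680kB, anon-rss:14610188kB, file-rss:76kB

    # Confirm with sar -r, which reports memory utilization.
    # sar shows historical hardware utilization:
    #   1. Samples are recorded every 10 minutes.
    #   2. It can monitor in real time or replay past data.
    #   3. Data files live under /var/log/sa; the file's creation date tells you which day it covers.
    sar -r -f /var/log/sa/sa23

    # Fix: add memory or enlarge swap.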
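The memory figures in a "Killed process" line are in kB, which is hard to read at a glance. The sketch below pulls one field out of such a line and converts it to MiB. The function name oom_field_mib is hypothetical, and the pattern assumes the field:NNNkB layout shown in the sample line above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an OOM "Killed process" line on stdin and print
# the named field (e.g. total-vm, anon-rss, file-rss) in MiB.
# Uses integer division, so the result is rounded down.
oom_field_mib() {
    local field=$1 kb
    # Extract the digits between "<field>:" and "kB" from the first matching line.
    kb=$(sed -n "s/.*${field}:\([0-9][0-9]*\)kB.*/\1/p" | head -n 1)
    # Fail if the field was not found.
    [ -n "$kb" ] || return 1
    echo $(( kb / 1024 ))
}
```

For example, grep -i 'killed process' /var/log/messages | oom_field_mib anon-rss prints the resident anonymous memory of the first killed process in MiB, which can then be compared with the installed RAM.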

Core Dumped

    # ======= Symptoms =======
    # The system log contains dump entries:
    grep -i "dump\|kill" /var/log/messages
    # or the process prints:
    #   Segmentation fault (core dumped)
    # Cause: the program tried to access a memory address outside its own range.

    # ======= Solution =======
    # List past core dumps to find the PID of the crashed process.
    coredumpctl list
    # Analyze the core dump with gdb (important). The info subcommand prints a summary;
    # the gdb subcommand prints the same summary and then opens the dump in gdb for debugging.
    # If the operating system did not generate a core dump, nothing more can be inspected.
    coredumpctl gdb "$pid"
    # Note: you can also locate the core dump file yourself and analyze it with gdb -c.
    coredumpctl info "$pid"

    # Sample output
               PID: 16042 (npc)
               UID: 1000 (yzy)
               GID: 100 (users)
            Signal: 6 (ABRT)
         Timestamp: Sun 2018-06-17 10:52:15 +08 (7h ago)
      Command Line: ./npc
        Executable: /home/yzy/Repos/naive-pascal-compiler/cmake-build-debug/npc
     Control Group: /user.slice/user-1000.slice/session-c2.scope
              Unit: session-c2.scope
             Slice: user-1000.slice
           Session: c2
         Owner UID: 1000 (yzy)
           Boot ID: 6d54757c981742b6984d6d89a59ba2a3
        Machine ID: adbc8ab8645546ffb9ef66ec15b02081
          Hostname: yzy-arch
           Storage: /var/lib/systemd/coredump/core.npc.1000.6d54757c981742b6984d6d89a59ba2a3.16042.1529203935000000.lz4
           Message: Process 16042 (npc) of user 1000 dumped core.

                    Stack trace of thread 16042:
                    #0  0x00007f8df8f4986b raise (libc.so.6)
                    #1  0x00007f8df8f3440e abort (libc.so.6)
                    #2  0x00007f8df8f342e0 __assert_fail_base.cold.0 (libc.so.6)
                    #3  0x00007f8df8f42112 __assert_fail (libc.so.6)
                    #4  0x0000563a871686f3 n/a (/home/yzy/Repos/naive-pascal-compiler/cmake-build-debug/npc)
                    #5  0x0000563a87166c3c n/a (/home/yzy/Repos/naive-pascal-compiler/cmake-build-debug/npc)
                    #6  0x0000563a8716afe3 n/a (/home/yzy/Repos/naive-pascal-compiler/cmake-build-debug/npc)
                    #7  0x0000563a8716582b n/a (/home/yzy/Repos/naive-pascal-compiler/cmake-build-debug/npc)
                    #8  0x0000563a8715e499 n/a (/home/yzy/Repos/naive-pascal-compiler/cmake-build-debug/npc)
                    #9  0x00007f8df8f3606b __libc_start_main (libc.so.6)
                    #10 0x0000563a8715e36a n/a (/home/yzy/Repos/naive-pascal-compiler/cmake-build-debug/npc)

    GNU gdb (GDB) 8.1
    Copyright (C) 2018 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-pc-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:.
    Find the GDB manual and other documentation resources online at:.
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from /home/yzy/Repos/naive-pascal-compiler/cmake-build-debug/npc...done.
    [New LWP 16042]
    Core was generated by `./npc'.
    Program terminated with signal SIGABRT, Aborted.
    #0  0x00007f8df8f4986b in raise () from /usr/lib/libc.so.6
    (gdb)
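When scanning several crashes it can help to pull single fields, such as the signal or the storage path, out of the coredumpctl info summary. The following is a minimal sketch; coredump_field is a hypothetical helper name, and it assumes the "Label: value" layout shown in the sample output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `coredumpctl info` output on stdin and print the
# value of one field, e.g. Signal, PID, Executable, or Storage.
# The label must match from the start of the line (after leading spaces),
# so "UID" does not match the "Owner UID" line.
coredump_field() {
    local field=$1
    sed -n "s/^[[:space:]]*${field}:[[:space:]]*//p" | head -n 1
}
```

For instance, coredumpctl info "$pid" | coredump_field Signal prints 6 (ABRT) for the sample above. That matches the stack trace, where __assert_fail calls abort: the process stopped on a failed assertion. An out-of-bounds access would normally show signal 11 (SEGV) instead.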

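Core dump not generated

    # Kernel settings that define where core dumps are written and how they are named:
    #   /proc/sys/kernel/core_pattern
    #   /proc/sys/kernel/core_pipe_limit
    #   /proc/sys/kernel/core_uses_pid

    # Troubleshooting a missing core dump
    # Check the resource limits:
    ulimit -a
    # A core file size of 0 prevents core dump files from being written, e.g.:
    #   core file size          0
    # Set it to unlimited; this only affects the current session.
    ulimit -c unlimited
    # To make it permanent, edit /etc/security/limits.conf and log in again.
    # The soft limit is the one that takes effect; the hard limit is only its ceiling.
    vi /etc/security/limits.conf
    * soft core unlimited
    * hard core unlimited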
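The two checks above (the core size limit and core_pattern) can be turned into a quick report. This is only a sketch: core_limit_status and core_pattern_kind are hypothetical names. It relies on the fact that a core_pattern beginning with | pipes the dump to a program (this is how systemd-coredump receives dumps), while any other value is a file name template.

```shell
#!/usr/bin/env bash
# Hypothetical helper: interpret the value printed by `ulimit -c`.
core_limit_status() {
    case "$1" in
        0)         echo "disabled" ;;
        unlimited) echo "enabled (no size cap)" ;;
        *)         echo "enabled (capped)" ;;
    esac
}

# Hypothetical helper: interpret the contents of /proc/sys/kernel/core_pattern.
# A leading "|" means the kernel pipes the dump to a program; anything else
# is treated as a file name template.
core_pattern_kind() {
    case "$1" in
        "|"*) echo "pipe" ;;
        *)    echo "file" ;;
    esac
}
```

On a live system the helpers would be called as core_limit_status "$(ulimit -c)" and core_pattern_kind "$(cat /proc/sys/kernel/core_pattern)". If the first prints disabled, no core file is written for that session.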