Performance Analysis: Debugging a C++ Application with GDB to Analyze a Core Dump
Background
This post is just a record for future reference.
A core dump occurred in one of our projects.
Problem Analysis
First, load the program and its core file into GDB.
[app@hostA bin]$ gdb PROGRAM core.31018
GDB then prints a series of messages.
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
The gist of this banner: it is free software, so use it as you like.
Reading symbols from /bin/PROGRAM...done.
[New LWP 31018]
[New LWP 31027]
[New LWP 31022]
[New LWP 31036]
[New LWP 31038]
[New LWP 31041]
[New LWP 31044]
[New LWP 31047]
[New LWP 31042]
[New LWP 31032]
[New LWP 31033]
[New LWP 31034]
[New LWP 31035]
[New LWP 31037]
[New LWP 31020]
[New LWP 31026]
[New LWP 31031]
[New LWP 31030]
[New LWP 31040]
[New LWP 31039]
[New LWP 31046]
[New LWP 31045]
[New LWP 31043]
[New LWP 31019]
[New LWP 31025]
[New LWP 31024]
[New LWP 31023]
[New LWP 31021]
[New LWP 31029]
[New LWP 31028]
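Those are LWP IDs, which are what we usually call thread IDs. On Linux a thread is an LWP. Some people object that an LWP is a process, not a thread, because the name stands for light-weight process. The name is indeed "light-weight process" and not "thread", but on Linux every thread created with pthreads maps one-to-one onto a kernel-scheduled LWP, so there is no other kind of thread to speak of. Here is what Wikipedia says: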
In computer operating systems, a light-weight process (LWP) is a means of achieving multitasking. In the traditional meaning of the term, as used in Unix System V and Solaris, a LWP runs in user space on top of a single kernel thread and shares its address space and system resources with other LWPs within the same process. Multiple user level threads, managed by a thread library, can be placed on top of one or many LWPs - allowing multitasking to be done at the user level, which can have some performance benefits.
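If you read that and still cannot tell what it really means, that is fine. Don't agonize over it. Say it with me: why fuss over so many concepts, for our purposes this thing is a thread!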
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
This tells us which library GDB uses for thread debugging: libthread_db.
Core was generated by `PROGRAM -g 1 -i 3006 -u VM_16_46_centos -U /data/app/log/LOG -m 0 -A'. Program terminated with signal 6, Ab
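This line shows how the core was produced: the process was terminated by signal 6, which means it was aborted. How many signals does the system have?
Quite a few; `kill -l` lists them all. Signal tables usually annotate each signal with one or more action letters.
What do those action letters mean?
A: the default action is to terminate the process
B: the default action is to ignore the signal
C: the default action is to terminate the process and dump core
D: the default action is to stop the process
E: the signal cannot be caught
F: the signal cannot be ignored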
#0 0x00007fa1fef385f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-19.2.el7.x86_64 elfutils-libelf-0.163-3.el7.x86_64 glibc-2.17-106.el7_2.4.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-10.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64 libcurl-7.29.0-25.el7.centos.x86_64 libgcc-4.8.5-4.el7.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libssh2-1.4.3-10.el7.x86_64 libstdc++-4.8.5-4.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 nspr-4.10.8-2.el7_1.x86_64 nss-3.19.1-18.el7.x86_64 nss-softokn-freebl-3.16.2.3-13.el7_1.x86_64 nss-util-3.19.1-4.el7_1.x86_64 openldap-2.4.40-8.el7.x86_64 openssl-libs-1.0.1e-42.el7.9.x86_64 pcre-8.32-15.el7.x86_64 readline-6.2-9.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
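The list above names the libraries this program uses for which separate debuginfo packages are not installed; GDB suggests installing them with debuginfo-install so that library frames can be resolved in more detail. If the core were opened on a different machine, its contents might not be readable at all (that is my guess; I do not have the spare time to actually try it on another machine).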
Look at the call stack with bt.
(gdb) bt
#0 0x00007fa1fef385f7 in raise () from /lib64/libc.so.6
#1 0x00007fa1fef39ce8 in abort () from /lib64/libc.so.6
#2 0x00007fa1fef78317 in __libc_message () from /lib64/libc.so.6
#3 0x00007fa1fef7e184 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007fa1fef818e7 in _int_malloc () from /lib64/libc.so.6
#5 0x00007fa1fef828dc in malloc () from /lib64/libc.so.6
#6 0x000000000043a147 in CMemPool::frealloc (ud=0x0, ptr=0x0, osize=0, nsize=64, p=0x1a8a450) at MemPool.h:266
#7 0x0000000000434898 in luaM_realloc_ (L=0x1b344e0, block=0x0, osize=0, nsize=64) at lmem.cpp:79
#8 0x000000000043b481 in luaH_new (L=0x1b344e0, narray=0, nhash=0) at ltable.cpp:359
#9 0x000000000042cbf8 in lua_createtable (L=0x1b344e0, narray=0, nrec=0) at lapi.cpp:582
#10 0x00007fa1fecf0f76 in getMessage (l=0x1b344e0, pMessage=0x7fa1bc0008c0) at message.h:218
#11 0x00007fa1fecf3af6 in getResponse (l=0x1b344e0, res=0x1b0d6d0) at service.cpp:28
#12 0x00007fa1fecf3d3b in sendM (l=0x1b344e0) at service.cpp:59
#13 0x0000000000430dc0 in luaD_precall (L=0x1b344e0, func=0x1b247b0, nresults=2) at ldo.cpp:319
#14 0x000000000043faad in luaV_execute (L=0x1b344e0, nexeccalls=1) at lvm.cpp:590
#15 0x0000000000431092 in luaD_call (L=0x1b344e0, func=0x1b24740, nResults=-1) at ldo.cpp:377
#16 0x000000000042d420 in f_call (L=0x1b344e0, ud=0x7ffeb1c9db20) at lapi.cpp:801
#17 0x000000000042ffed in luaD_rawrunprotected (L=0x1b344e0, f=0x42d3eb
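This output tells you where the process was when the core was dumped; the call chain reads from bottom to top. The frames at the top are all low-level calls: abort() was reached from malloc_printerr() inside malloc(), which is how glibc reports an inconsistency it has detected in the heap. These low-level calls are only the symptom. What matters most is how the variables were passed down from the upper layers.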
While we are here, let's look at where every thread currently is.
(gdb) info threads
Id Target Id Frame
30 Thread 0x7fa1f5365700 (LWP 31028) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
29 Thread 0x7fa1f4b64700 (LWP 31029) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
28 Thread 0x7fa1f8b6c700 (LWP 31021) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
27 Thread 0x7fa1f7b6a700 (LWP 31023) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
26 Thread 0x7fa1f7369700 (LWP 31024) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
25 Thread 0x7fa1f6b68700 (LWP 31025) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
24 Thread 0x7fa1f9b6e700 (LWP 31019) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
23 Thread 0x7fa1edb56700 (LWP 31043) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
22 Thread 0x7fa1ecb54700 (LWP 31045) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
21 Thread 0x7fa1ec353700 (LWP 31046) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
20 Thread 0x7fa1efb5a700 (LWP 31039) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
19 Thread 0x7fa1ef359700 (LWP 31040) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
18 Thread 0x7fa1f4363700 (LWP 31030) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
17 Thread 0x7fa1f3b62700 (LWP 31031) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
16 Thread 0x7fa1f6367700 (LWP 31026) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
15 Thread 0x7fa1f936d700 (LWP 31020) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
14 Thread 0x7fa1f0b5c700 (LWP 31037) 0x00007fa1feff09b3 in select () from /lib64/libc.so.6
13 Thread 0x7fa1f1b5e700 (LWP 31035) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
12 Thread 0x7fa1f235f700 (LWP 31034) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
11 Thread 0x7fa1f2b60700 (LWP 31033) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
10 Thread 0x7fa1f3361700 (LWP 31032) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
9 Thread 0x7fa1ee357700 (LWP 31042) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
8 Thread 0x7fa1ebb52700 (LWP 31047) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7 Thread 0x7fa1ed355700 (LWP 31044) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
6 Thread 0x7fa1eeb58700 (LWP 31041) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x7fa1f035b700 (LWP 31038) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x7fa1f135d700 (LWP 31036) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
3 Thread 0x7fa1f836b700 (LWP 31022) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x7fa1f5b66700 (LWP 31027) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x7fa2009b0740 (LWP 31018) 0x00007fa1fef385f7 in raise () from /lib64/libc.so.6
(gdb)
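Most of them are sitting in pthread_cond_wait or pthread_cond_timedwait (one is in select), so nothing looks wrong there. Only thread 1, the one in raise(), is the thread that crashed.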
Try printing a variable:
(gdb) p req
No symbol "req" in current context.
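Why is there no such symbol? The selected frame is still #0, inside raise() in libc, and req is not in scope there.
Switch to a frame that has it.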
(gdb) frame 29
#29 0x00000000004268ac in PROGRAM (req=0x7ffeb1c9e340) at srv.cpp:107
(gdb) p req
= (SVCINFO *) 0x7ffeb1c9e340
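Now the variable's type and value are visible. You might ask: this is just an address, so how do you look at what is behind it?
Printing *req dereferences the pointer and shows the fields. And with the source code available you could see everything in context; it simply has not been loaded here.
By default GDB looks in the compilation directory recorded in the debug info and in the current directory, and it found nothing in either.
The source location is recorded at compile time, but the sources do not exist on this host, so they cannot be displayed.
If you feel like experimenting, write a small program yourself, keep the source next to the binary, and see what it looks like.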