微吼云上線多路互動直播服務 加速多場景互動直播落地
1318
2025-04-01
前言
在日常數據庫的使用中,難免會遇到一些內存問題。此次博文主要向大家分享一些華為云數倉GaussDB(DWS)內存的基本框架以及基本視圖的使用,以便遇到內存問題后可以有一個基本的判斷。
注意,本篇博文基于華為云數倉GaussDB(DWS) 8.0版本,其他版本細節上或許稍有不同。
內存常用視圖
1.?? PV_TOTAL_MEMORY_DETAIL視圖
該視圖會展示當前數據庫節點的內存使用信息,單位為MB。
視圖中個字段的含義:nodename:節點名稱,memorytype:內存類型,memorymbytes:對應內存類型的大小。
常用的內存類型有以下幾種:
max_process_memory:取自GUC參數max_process_memory的配置,表示一個數據庫節點最大可使用的物理內存。
process_used_memory:取自/proc/pid/statm(第二個值) * pagesize,pid替換為當前節點所在的進程號。表示當前節點所處進程已使用的內存。
max_dynamic_memory:由下面公式計算而來,表示Gaussdb內核所能使用的最大內存。
max_dynamic_memory = max_process_memory- max_cstore_memory - udf_reserved_memory - max_shared_memory ;
dynamic_used_memory:GaussDB內核已使用內存,由GaussDB內存管理在申請內存時統計而來。
dynamic_peak_memory:GaussDB內核使用內存峰值,由GaussDB內存管理在申請內存時統計而來。
dynamic_used_shrctx:GaussDB內核已使用線程間共享內存上下文內存大小,由GaussDB內存管理在申請內存時統計而來。
dynamic_peak_shrctx:GaussDB內核已使用線程間共享內存上下文內存峰值,由GaussDB內存管理在申請內存時統計而來。
max_shared_memory:進程間最大共享內存大小
shared_used_memory:進程間已使用共享內存大小,由/proc/pid/statm(第三個值) * pagesize值統計而來。
max_cstore_memory:列存允許的最大使用內存,由GUC參數cstore_buffers配置。
cstore_used_memory:列存已使用內存,一般包含列存或HDFS使用過程中所消耗的內存。
other_used_memory:通常表示除去GaussDB內核使用的內存以外的內存使用,通常是三方庫使用所消耗的內存,例如LLVM,Kerberos等。
postgres=#?select?*?from??PV_TOTAL_MEMORY_DETAIL; ???nodename???|???????memorytype????????|?memorymbytes --------------+-------------------------+-------------- ?coordinator1?|?max_process_memory??????|????????12288 ?coordinator1?|?process_used_memory?????|??????????240 ?coordinator1?|?max_dynamic_memory??????|????????11564 ?coordinator1?|?dynamic_used_memory?????|??????????229 ?coordinator1?|?dynamic_peak_memory?????|??????????234 ?coordinator1?|?dynamic_used_shrctx?????|????????????1 ?coordinator1?|?dynamic_peak_shrctx?????|????????????1 ?coordinator1?|?max_shared_memory???????|??????????211 ?coordinator1?|?shared_used_memory??????|??????????139 ?coordinator1?|?max_cstore_memory???????|??????????512 ?coordinator1?|?cstore_used_memory??????|????????????0 ?coordinator1?|?max_sctpcomm_memory?????|????????????0 ?coordinator1?|?sctpcomm_used_memory????|????????????0 ?coordinator1?|?sctpcomm_peak_memory????|????????????0 ?coordinator1?|?other_used_memory???????|????????????0 ?coordinator1?|?gpu_max_dynamic_memory??|????????????0 ?coordinator1?|?gpu_dynamic_used_memory?|????????????0 ?coordinator1?|?gpu_dynamic_peak_memory?|????????????0 ?coordinator1?|?pooler_conn_memory??????|????????????0 ?coordinator1?|?pooler_freeconn_memory??|????????????0 ?coordinator1?|?storage_compress_memory?|????????????0 ?coordinator1?|?udf_reserved_memory?????|????????????0 (22?rows)
2. PV_SESSION_MEMORY_DETAIL視圖
華為云數倉GaussDB(DWS)的內存管理框架沿用了之前的內存上下文的思路。在PV_SESSION_MEMORY_DETAIL的視圖中,將會統計各線程的內存上下文維度統計的內存使用情況。
視圖中個字段的含義如下:
Sessid:表示Session ID,由線程啟動時間+線程標識拼接而來。
Sesstype:線程名稱
Contextname:內存上下文名稱。
Level:內存上下文層級。
Parent:父內存上下文名稱。
Totalsize:當前內存上下文內存大小
Freesize:當前內存上下文已釋放內存大小
Usedsize:當前內存上下文已使用大小。
postgres=#?select?*?from?PV_SESSION_MEMORY_DETAIL?order?by?totalsize?desc; ???????????sessid???????????|????????sesstype?????????|?????????contextname??????????|?level?|????????????parent????????????|?totalsize?|?freesize?|?usedsize ----------------------------+-------------------------+------------------------------+-------+------------------------------+-----------+----------+---------- ?0.140169093357952??????????|?postmaster??????????????|?Postmaster???????????????????|?????1?|?TopMemoryContext?????????????|??26566912?|????23912?|?26543000 ?0.140169093357952??????????|?postmaster??????????????|?gs_signal????????????????????|?????1?|?TopMemoryContext?????????????|???4272464?|??2050224?|??2222240 ?1594694378.140168361137920?|?WLMCollectWorker????????|?CacheMemoryContext???????????|?????1?|?TopMemoryContext?????????????|???1455032?|???268488?|??1186544 ?1594694378.140168296134400?|?WLMarbiter??????????????|?CacheMemoryContext???????????|?????1?|?TopMemoryContext?????????????|???1455032?|???266696?|??1188336 ?1594694378.140168465999616?|?JobScheduler????????????|?CacheMemoryContext???????????|?????1?|?TopMemoryContext?????????????|???1455032?|???239584?|??1215448 ?1594708276.140168270964480?|?postgres????????????????|?CacheMemoryContext???????????|?????1?|?TopMemoryContext?????????????|???1455032?|???316392?|??1138640 ?1594694438.140168207001344?|?postgres????????????????|?CacheMemoryContext???????????|?????1?|?TopMemoryContext?????????????|???1455032?|???329944?|??1125088 ?1594694378.140168344356608?|?WLMmonitor??????????????|?CacheMemoryContext???????????|?????1?|?TopMemoryContext?????????????|???1455032?|???269528?|??1185504 ?1594708276.140168270964480?|?postgres????????????????|?TempSmallContextGroup????????|?????0?|??????????????????????????????|????550592?|???160320?|??????114 ?1594694438.140168207001344?|?postgres????????????????|?TempSmallContextGroup????????|?????0?|??????????????????????????????|????530816?|???148256?|??????107 ?1594708276.140168270964480?|?postgres????????????????|?SRF?multi-call?context???????|?????5?|?FunctionScan_140168270964480?|????496704?|?????8032?|???488672 ?1594694378.140168465999616?|?JobScheduler????????????|?TempSmallContextGroup????????|?????0?|??????????????????????????????|????489152?|???120672?|??????102 ?1594694378.140168361137920?|?WLMCollectWorker????????|?TempSmallContextGroup????????|?????0?|??????????????????????????????|????477184?|???118168?|???????98 ?1594694378.140168344356608?|?WLMmonitor??????????????|?TempSmallContextGroup????????|?????0?|??????????????????????????????|????477184?|???118280?|???????98 ?1594694378.140168296134400?|?WLMarbiter??????????????|?TempSmallContextGroup????????|?????0?|??????????????????????????????|????477184?|???118136?|???????98 ?1594694438.140168207001344?|?postgres????????????????|?TopMemoryContext?????????????|?????0?|??????????????????????????????|????460808?|????24728?|???436080 ?1594708276.140168270964480?|?postgres????????????????|?TopMemoryContext?????????????|?????0?|??????????????????????????????|????452616?|?????5352?|???447264 ?1594694378.140168361137920?|?WLMCollectWorker????????|?TopMemoryContext?????????????|?????0?|??????????????????????????????|????411528?|?????3272?|???408256 ?1594694378.140168296134400?|?WLMarbiter??????????????|?TopMemoryContext?????????????|?????0?|??????????????????????????????|????411528?|?????3032?|???408496 ?1594694378.140168344356608?|?WLMmonitor??????????????|?TopMemoryContext?????????????|?????0?|??????????????????????????????|????411528?|?????3272?|???408256 ?1594694378.140168465999616?|?JobScheduler????????????|?TopMemoryContext?????????????|?????0?|??????????????????????????????|????255240?|????10552?|???244688 ?0.140169093357952??????????|?postmaster??????????????|?TopMemoryContext?????????????|?????0?|??????????????????????????????|????140680?|?????7856?|???132824 ?1594694438.140168207001344?|?postgres????????????????|?VecFuncHash??????????????????|?????1?|?TopMemoryContext?????????????|????122272?|????20928?|???101344 ?1594708276.140168270964480?|?postgres????????????????|?VecFuncHash??????????????????|?????1?|?TopMemoryContext?????????????|????122272?|????20928?|???101344 ?1594694378.140168539399936?|?CheckPointer?thread?????|?TopMemoryContext?????????????|?????0?|??????????????????????????????|????116104?|?????5944?|???110160 ?1594694378.140168482780928?|?WalWriter?thread????????|?TopMemoryContext?????????????|?????0?|??????????????????????????????|????107848?|?????7224?|???100624 ?1594694378.140168505317120?|?BackgroundWriter?thread?|?TopMemoryContext?????????????|?????0?|??????????????????????????????|????107848?|?????7144?|???100704 ?1594694378.140168394700544?|?TwoPhaseCleaner?thread??|?TopMemoryContext?????????????|?????0?|??????????????????????????????|????107848?|?????8280?|????99568 ?1594694378.140168377919232?|?FaultMonitor?thread?????|?TopMemoryContext?????????????|?????0?|??????????????????????????????|????107848?|?????8280?|????99568 ?1594694378.140168442930944?|?PgstatCollector?????????|?TopMemoryContext?????????????|?????0?|??????????????????????????????|????107848?|?????6696?|???101152 ?1594694378.140168482780928?|?WalWriter?thread????????|?Timezones????????????????????|?????1?|?TopMemoryContext?????????????|?????83488?|?????2832?|????80656 ?1594694378.140168465999616?|?JobScheduler????????????|?Timezones????????????????????|?????1?|?TopMemoryContext?????????????|?????83488?|?????2832?|????80656 ?1594694378.140168505317120?|?BackgroundWriter?thread?|?Timezones????????????????????|?????1?|?TopMemoryContext?????????????|?????83488?|?????2832?|????80656 ?0.140169093357952??????????|?postmaster??????????????|?Timezones????????????????????|?????1?|?TopMemoryContext?????????????|?????83488?|?????2832?|????80656 ?1594694378.140168296134400?|?WLMarbiter??????????????|?Timezones????????????????????|?????1?|?TopMemoryContext?????????????|?????83488?|?????2832?|????80656 ?1594694438.140168207001344?|?postgres????????????????|?Timezones????????????????????|?????1?|?TopMemoryContext?????????????|?????83488?|?????2832?|????80656 ?1594694378.140168377919232?|?FaultMonitor?thread?????|?Timezones????????????????????|?????1?|?TopMemoryContext?????????????|?????83488?|?????2832?|????80656 ?1594694378.140168539399936?|?CheckPointer?thread?????|?Timezones????????????????????|?????1?|?TopMemoryContext?????????????|?????83488?|?????2832?|????80656 ?1594694378.140168442930944?|?PgstatCollector?????????|?Timezones????????????????????|?????1?|?TopMemoryContext?????????????|?????83488?|?????2832?|????80656 ?1594694378.140168361137920?|?WLMCollectWorker????????|?Timezones????????????????????|?????1?|?TopMemoryContext?????????????|?????83488?|?????2832?|????80656
3. PG_SHARED_MEMORY_DETAIL視圖
華為云數倉GaussDB(DWS)除了通用內存上下文以外,還包含共享內存上下文類型用于線程間共享數據。由于共享內存上下文是屬于一個進程的,故該視圖相比PV_SESSION_MEMORY_DETAIL,不存在sessid,其他的字段含義相同。
postgres=#?select?*?from?PG_SHARED_MEMORY_DETAIL?order?by?totalsize?desc; ??????????????contextname???????????????|?level?|?????????????????parent?????????????????|?totalsize?|?freesize?|?usedsize ----------------------------------------+-------+----------------------------------------+-----------+----------+---------- ?Workload?manager?memory?context????????|?????1?|?ProcessMemory??????????????????????????|???1056832?|?????6080?|??1050752 ?PoolerAgentContext?????????????????????|?????2?|?PoolerMemoryContext????????????????????|?????57344?|????36000?|????21344 ?PoolerCoreContext??????????????????????|?????2?|?PoolerMemoryContext????????????????????|?????57344?|????30544?|????26800 ?ProcessMemory??????????????????????????|?????0?|????????????????????????????????????????|?????57344?|????28304?|????29040 ?wlm?iostat?info?hash?table?????????????|?????2?|?Workload?manager?memory?context????????|?????24576?|????10832?|????13744 ?WaitCountGlobalContext?????????????????|?????1?|?ProcessMemory??????????????????????????|?????24576?|?????9984?|????14592 ?wlm?user?info?hash?table???????????????|?????2?|?Workload?manager?memory?context????????|?????24576?|????10832?|????13744 ?OBS?connector?cache????????????????????|?????1?|?ProcessMemory??????????????????????????|?????24576?|????15056?|?????9520 ?Resource?pool?hash?table???????????????|?????2?|?Workload?manager?memory?context????????|?????17984?|?????2704?|????15280 ?Dummy?server?cache?????????????????????|?????1?|?ProcessMemory??????????????????????????|??????8192?|?????2832?|?????5360 ?Node?Pool??????????????????????????????|?????3?|?PoolerCoreContext??????????????????????|??????8192?|??????768?|?????7424 ?dywlm?register?hash?table??????????????|?????2?|?Workload?manager?memory?context????????|??????8192?|?????2704?|?????5488 ?node?group?hash?table??????????????????|?????2?|?Workload?manager?memory?context????????|??????8192?|?????2832?|?????5360 ?sql?count?lookup?hash??????????????????|?????2?|?WaitCountGlobalContext?????????????????|??????8192?|?????8000?|??????192 ?wlm?session?info?hash?table????????????|?????2?|?Workload?manager?memory?context????????|??????8192?|?????2704?|?????5488 ?wlm?collector?hash?table???????????????|?????2?|?Workload?manager?memory?context????????|??????8192?|?????2704?|?????5488 ?DFS?connector?cache????????????????????|?????1?|?ProcessMemory??????????????????????????|??????8192?|??????768?|?????7424 ?PoolerMemoryContext????????????????????|?????1?|?ProcessMemory??????????????????????????|??????8192?|?????5456?|?????2736 ?operator?collector?hash?table??????????|?????3?|?Operator?resource?track?memory?context?|??????8192?|??????640?|?????7552 ?operator?running?hash?table????????????|?????3?|?Operator?resource?track?memory?context?|??????8192?|??????640?|?????7552 ?Query?resource?track?memory?context????|?????2?|?Workload?manager?memory?context????????|??????8192?|?????8144?|???????48 ?bad?block?stat?global?hash?table???????|?????2?|?bad?block?stat?global?memory?context???|??????8192?|?????2704?|?????5488 ?global?info?hash?table?????????????????|?????2?|?Workload?manager?memory?context????????|??????8192?|?????2832?|?????5360 ?StreamInfoContext??????????????????????|?????1?|?ProcessMemory??????????????????????????|?????????0?|????????0?|????????0 ?Operator?resource?track?memory?context?|?????2?|?Workload?manager?memory?context????????|?????????0?|????????0?|????????0 ?bad?block?stat?global?memory?context???|?????1?|?ProcessMemory??????????????????????????|?????????0?|????????0?|????????0 ?ProcSubXidCacheContext?????????????????|?????1?|?ProcessMemory??????????????????????????|?????????0?|????????0?|????????0 (27?rows)
內存相關數據收集
通過前面講述的幾個內存視圖,我們可以對華為云數倉GaussDB(DWS)內存有一個整體的理解。下面將分享幾個內存相關數據收集的功能。
注意: 鑒于論壇中的問題多是release版本,故debug版本的各種內存相關功能將不再此次介紹以免混淆。同時收集數據就會帶來一些消耗,避免長期大規模的使用下面的方案,僅用作問題診斷數據分析使用。
1.?? pv_session_memctx_detail函數
通過上面的視圖介紹我們了解到了PV_SESSION_MEMORY_DETAIL視圖的作用。我們可以通過pv_session_memctx_detail打印出該線程內存上下文的詳細信息。注意第一個參數表示線程ID,我們根據上線的介紹得知sessid的后半部分就是線程ID。第二個參數表示需要打印內存上下文的名稱,在release為空才可以生效即由TopMemoryContext開始遞歸打印內存上下文信息。Release版本不包含chunk的詳細信息。
例如:
select?*?from?pv_session_memctx_detail(140168207001344,'');
生成的文件默認在/tmp/dumpmem下,文件中三列分別表示內存上下文名稱,總大小,剩余大小。
文件內容樣例:
140168207001344_1594695418.log
TopMemoryContext,?460808,?24728 Record?information?cache,?24576,?14928 TableSpace?cache,?8192,?2304 set?params?hash?table,?8192,?2832 VecFuncHash,?122272,?20928 MaskPasswordCtx,?8192,?8144 RowDescriptionContext,?8192,?7104 MessageContext,?8192,?7104 Operator?class?cache,?8192,?768 smgr?relation?table,?24576,?8880 tokenize?file?cxt,?0,?0 hba?parser?context,?3072,?480 TransactionAbortContext,?32768,?32720 bad?block?stat?thread?hash?table,?8192,?1664 bad?block?thread?memory?context,?0,?0 Portal?hash,?8192,?768 PortalMemory,?8192,?8144 Partcache?by?OID,?8192,?2832 Relcache?by?OID,?24576,?11904 CacheMemoryContext,?1455032,?329944 pg_index_indrelid_index,?3776,?280 pg_toast_2618_index,?5824,?1760 pg_prepared_xacts,?31744,?3488 pg_db_role_setting_databaseid_rol_index,?5824,?1808 pg_opclass_am_name_nsp_index,?5824,?1768 pg_directory_name_index,?3776,?328 pg_foreign_data_wrapper_name_index,?3776,?328 pg_enum_oid_index,?3776,?328 pg_class_relname_nsp_index,?5824,?1760 pg_foreign_server_oid_index,?3776,?328 pg_statistic_relid_kind_att_inh_index,?5184,?760 pg_cast_source_target_index,?5824,?1808 pg_language_name_index,?3776,?328 pg_collation_oid_index,?3776,?328 pg_amop_fam_strat_index,?5184,?760 pg_index_indexrelid_index,?3776,?280 pg_ts_template_tmplname_index,?5824,?1808 pg_ts_config_map_index,?5824,?1768 pg_partition_partoid_index,?5824,?1768
2.?? memory_tracking_mode參數
除了上面的內存上下文數據統計,我們還可以通過memory_tracking_mode設置內存信息統計的模式,共支持四種模式:
none:不啟動內存統計功能。
normal:僅做內存實時統計,不生成文件。
executor:生成統計文件,包含執行層使用過的所有已分配內存的上下文信息。當為executor模式時,將在GaussDB進程(取決于在哪個數據節點最終執行了該算子)的pg_log目錄下生成cvs格式文件,命名方式為:memory_track_
fullexec:生成文件包含執行層申請過的所有內存上下文信息。當設置為fullexec模式時,輸出信息和executor模式相同,但可能增加部分內存上
下文分配信息,因為所有申請內存(無論是否申請成功)相關的信息都會被打印出來。由于僅記錄內存申請信息,故記錄中內存上下文使用的峰值均為0。
csv文件內容樣例:
memory_track_datanode1_query_72339069014639220.csv
0,?0,?ExecutorState,?0,?(null),?0,?8K,?656K,?4 1,?4,?CStoreScan_139944754403072,?0,?ExecutorState,?1,?272K,?625K,?4 2,?9,?cstore?scan?per?scan?memory?context,?1,?CStoreScan_139944754403072,?2,?24K,?24K,?4 3,?8,?cstore?scan?memory?context,?1,?CStoreScan_139944754403072,?2,?328K,?328K,?4 4,?2,?VecToRow_139944754403072,?0,?ExecutorState,?1,?23K,?23K,?4 0,?0,?ExecutorState,?0,?(null),?0,?8K,?144K,?0 1,?13,?Stream_72339069014639220_4,?0,?ExecutorState,?1,?72K,?72K,?0 2,?10,?Sort_72339069014639220_3,?0,?ExecutorState,?1,?8K,?40K,?0 3,?16,?TupleSort,?2,?Sort_72339069014639220_3,?2,?32K,?32K,?0 4,?2,?Agg_72339069014639220_2,?0,?ExecutorState,?1,?24K,?24K,?0
一些診斷方案:
1.?? 內存膨脹
在release版本調試工具以及信息比較受限,基本都是先通過前面介紹的三種視圖初步定位大致功能。首先查看PV_TOTAL_MEMORY_DETAIL視圖確定是哪一塊內存出現了膨脹或者泄露。若是other_used_memory則要考慮三方倉的場景。若是dynamic_used_memory較大,則要查PV_SESSION_MEMORY_DETAIL視圖,查看哪個線程,哪個內存上下文占用內存過多。根據這些信息推斷出大致問題場景。
2.?? 內存不足
在內核發現內存不足的時候會有memory is temporarily unavailable的日志提示。首先觀察日志,若日志里是reaching the database memory limitation則說明內核使用內存到達了max_dynamic_memory,則需要根據PV_SESSION_MEMORY_DETAIL視圖分析是那個內存上下文占用內存較多,分析出業務場景。若日志里是reaching the OS memory limitation,則表示是操作系統分配內存失敗導致,需要看操作系統參數配置以及內存硬件等情況。
小結:
生產環境出現內存問題一般會比較棘手,而且release版本內存檢測工具以及數據信息使用都比較受限,遇到問題需要通過上述的方案以及手段快速定位出出現內存問題的相關業務場景。有了業務場景,后面通過debug版本使用ASAN地址消毒技術以及Jemalloc Profiling便可以較快的定位出來。
DWS EI企業智能
版權聲明:本文內容由網絡用戶投稿,版權歸原作者所有,本站不擁有其著作權,亦不承擔相應法律責任。如果您發現本站中有涉嫌抄襲或描述失實的內容,請聯系我們jiasou666@gmail.com 處理,核實后本網站將在24小時內刪除侵權內容。
版權聲明:本文內容由網絡用戶投稿,版權歸原作者所有,本站不擁有其著作權,亦不承擔相應法律責任。如果您發現本站中有涉嫌抄襲或描述失實的內容,請聯系我們jiasou666@gmail.com 處理,核實后本網站將在24小時內刪除侵權內容。