CPU Bottlenecks
-------------------
This section briefly describes how to use the vmstat, tprof, and netpmon commands to check whether a system has a CPU bottleneck.
1. vmstat
Run the command:
# vmstat 1 10
P650A:/#vmstat 1 10
System configuration: lcpu=16 mem=15744MB
kthr  memory        page                    faults        cpu
----- ------------- ----------------------- ------------- -----------
 r  b     avm   fre  re  pi  po  fr  sr  cy  in   sy   cs us sy id wa
 0  0 3208684 10343   0   0   0   0   0   0  19 1447  290  0  0 99  0
 0  0 3208686 10341   0   0   0   0   0   0   2 1268  248  0  0 99  0
 0  0 3208686 10341   0   0   0   0   0   0   1 1265  246  0  0 99  0
 0  0 3208687 10340   0   0   0   0   0   0   3 1260  254  0  0 99  0
 0  0 3208687 10340   0   0   0   0   0   0   1 1320  264  0  0 99  0
 0  0 3208687 10337   0   0   0   0   0   0  24 4145  321  0  3 97  0
 0  0 3208687 10337   0   0   0   0   0   0   9 1438  313  0  0 99  0
 0  0 3208687 10334   0   3   0   0   0   0  40 2348 1110  0  0 99  0
 0  0 3208687 10334   0   0   0   0   0   0   1 1323  257  0  0 99  0
 0  0 3208687 10334   0   0   0   0   0   0   5 1251  242  0  0 99  0
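Note: when processes are waiting in the run queue, the system slows down.
id: percentage of time the CPU is idle with no pending I/O;
wa: percentage of time the CPU is idle while waiting for I/O;
r: number of threads in the run queue.
If id and wa both stay close to 0, the CPU is busy. In the output above, id stays at 97 to 99, so this system is almost entirely idle.
Next, look at the r field (the number of threads in the run queue).
The more threads waiting in the run queue, the greater the impact on system performance. As a rule of thumb, an r value that stays above the number of logical CPUs (lcpu=16 here) suggests that threads are queuing for CPU time.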
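
The checks described above can be sketched in a few lines of Python. This is only an illustration, not part of vmstat itself: the function names, the renamed `syscalls` column, and the 5% idle threshold are assumptions chosen for the example, and the sample rows are copied from the output above.

```python
from typing import Dict, List

# Column order of the vmstat run shown above. vmstat prints "sy" twice:
# the first (system calls) is renamed "syscalls" here, the second
# (system CPU %) keeps the name "sy". The renaming is an assumption.
COLUMNS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
           "in", "syscalls", "cs", "us", "sy", "id", "wa"]


def parse_vmstat(text: str) -> List[Dict[str, int]]:
    """Return one dict per data row; banner and header lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has exactly 17 fields, all of them integers.
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, (int(f) for f in fields))))
    return rows


def cpu_busy(rows: List[Dict[str, int]], idle_threshold: int = 5) -> bool:
    """True if id + wa stays at or below the (assumed) threshold in every sample."""
    return bool(rows) and all(r["id"] + r["wa"] <= idle_threshold for r in rows)


def max_run_queue(rows: List[Dict[str, int]]) -> int:
    """Largest r value seen; compare it with lcpu from the vmstat banner."""
    return max((r["r"] for r in rows), default=0)


SAMPLE = """\
 r b avm fre re pi po fr sr cy in sy cs us sy id wa
 0 0 3208684 10343 0 0 0 0 0 0 19 1447 290 0 0 99 0
 0 0 3208687 10337 0 0 0 0 0 0 24 4145 321 0 3 97 0
"""

if __name__ == "__main__":
    rows = parse_vmstat(SAMPLE)
    print("CPU busy:", cpu_busy(rows))
    print("max run queue:", max_run_queue(rows))
```

To use it on a live system, the text of `vmstat 1 10` could be captured to a file and passed to `parse_vmstat`; the threshold should be tuned to the workload.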
2. tprof
The tprof command reports CPU usage for each process.
Run the following command as root to find out how much CPU time each process uses:
# tprof -x sleep 30
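This command runs for 30 seconds and creates a report file in the current directory (sleep.prof in the example below). For illustration, suppose the CPU is sampled about 3000 times in those 30 seconds (the example run below reports Total Samples = 8113).
The Total field in the report is the share of those samples in which the process was on a CPU. If a process accounts for 1500 of the 3000 samples,
it used half of the CPU time. In the report below, Total is printed as a percentage, so that case would appear as 50.00.
The tprof output therefore shows exactly which processes are consuming CPU time.
Example: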
P650A:/#tprof -x sleep 30
Mon May 16 16:08:54 2011
System: AIX 5.3 Node: P650A Machine: 00C3EE9E4C00
Starting Command sleep 30
stopping trace collection.
Generating sleep.prof
P650A:/#more sleep.prof
Configuration information
=========================
System: AIX 5.3 Node: P650A Machine: 00C3EE9E4C00
Tprof command was:
tprof -x sleep 30
Trace command was:
/usr/bin/trace -ad -M -L 283203993 -T 500000 -j 000,00A,001,002,003,38F,005,006,134,139,5A2,5A5,465,234, -o -
Total Samples = 8113
Traced Time = 30.01s (out of a total execution time of 30.01s)
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Process              Freq  Total Kernel   User Shared  Other
=======              ====  ===== ======   ==== ======  =====
wait                   16  99.68  99.68   0.00   0.00   0.00
/usr/bin/ps             4   0.18   0.18   0.00   0.00   0.00
swapper                 1   0.11   0.11   0.00   0.00   0.00
/usr/sbin/syncd         1   0.01   0.01   0.00   0.00   0.00
ora_mrp0_standby        1   0.01   0.00   0.01   0.00   0.00
=======              ====  ===== ======   ==== ======  =====
Total                  23 100.00  99.99   0.01   0.00   0.00
Process PID TID Total Kernel User Shared Other
======= === === ===== ====== ==== ====== =====
wait 65568 90157 18.80 18.80 0.00 0.00 0.00
wait 8196 8197 15.88 15.88 0.00 0.00 0.00
wait 296 309 7.25 7.25 0.00 0.00 0.00
wait 33080 33093 7.08 7.08 0.00 0.00 0.00
wait 73764 98353 7.06 7.06 0.00 0.00 0.00
wait 24884 24897 6.64 6.64 0.00 0.00 0.00
wait 16688 16701 3.70 3.70 0.00 0.00 0.00
wait 69666 94255 3.70 3.70 0.00 0.00 0.00
wait 37178 37191 3.70 3.70 0.00 0.00 0.00
wait 77862 102451 3.70 3.70 0.00 0.00 0.00
wait 12590 12603 3.70 3.70 0.00 0.00 0.00
wait 57372 77863 3.70 3.70 0.00 0.00 0.00
wait 28982 28995 3.70 3.70 0.00 0.00 0.00
wait 20786 20799 3.70 3.70 0.00 0.00 0.00
wait 53274 73765 3.70 3.70 0.00 0.00 0.00
wait 61470 86059 3.70 3.70 0.00 0.00 0.00
swapper 0 3 0.11 0.11 0.00 0.00 0.00
/usr/bin/ps 9646118 10195121 0.07 0.07 0.00 0.00 0.00
/usr/bin/ps 9580580 10076383 0.05 0.05 0.00 0.00 0.00
/usr/bin/ps 9646122 10195125 0.05 0.05 0.00 0.00 0.00
ora_mrp0_standby 364696 860273 0.01 0.00 0.01 0.00 0.00
/usr/sbin/syncd 242164 442539 0.01 0.01 0.00 0.00 0.00
/usr/bin/ps 9580582 10076385 0.01 0.01 0.00 0.00 0.00
======= === === ===== ====== ==== ====== =====
Total 100.00 99.99 0.01 0.00 0.00
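
As a rough sketch of how the per-process summary in sleep.prof could be read programmatically, the Python below extracts the Process, Freq, and Total columns and ranks the processes. The function names are hypothetical, the parser assumes process names contain no spaces, and the wait entries are excluded on the assumption that they represent the AIX idle kernel process rather than real work.

```python
from typing import List, Tuple

Entry = Tuple[str, int, float]  # (process name, Freq, Total %)


def parse_tprof_summary(text: str) -> List[Entry]:
    """Extract rows from the first 'Process Freq Total ...' table of a report."""
    entries: List[Entry] = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:3] == ["Process", "Freq", "Total"]:
            in_table = True          # header of the summary table
            continue
        if not in_table or not fields:
            continue
        if fields[0].startswith("="):
            continue                 # separator line made of '=' characters
        if fields[0] == "Total":
            break                    # the totals row ends the table
        if len(fields) == 7:         # name, Freq, Total, Kernel, User, Shared, Other
            entries.append((fields[0], int(fields[1]), float(fields[2])))
    return entries


def top_consumers(entries: List[Entry], exclude=("wait",)) -> List[Entry]:
    """Sort by Total %, highest first, skipping the excluded process names."""
    kept = [e for e in entries if e[0] not in exclude]
    return sorted(kept, key=lambda e: e[2], reverse=True)


# Summary table copied from the sleep.prof report above.
SAMPLE = """\
Process Freq Total Kernel User Shared Other
======= ==== ===== ====== ==== ====== =====
wait 16 99.68 99.68 0.00 0.00 0.00
/usr/bin/ps 4 0.18 0.18 0.00 0.00 0.00
swapper 1 0.11 0.11 0.00 0.00 0.00
/usr/sbin/syncd 1 0.01 0.01 0.00 0.00 0.00
ora_mrp0_standby 1 0.01 0.00 0.01 0.00 0.00
======= ==== ===== ====== ==== ====== =====
Total 23 100.00 99.99 0.01 0.00 0.00
"""

if __name__ == "__main__":
    for name, freq, total in top_consumers(parse_tprof_summary(SAMPLE)):
        print(f"{name:<20} {freq:>4} {total:>6.2f}")
```

With wait at 99.68 in this report, nothing else is competing for the CPU; on a busy system the same ranking would point to the processes worth investigating.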
3. netpmon
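The netpmon command monitors network-related I/O and CPU usage.
Run the following command as root to find out how much CPU time each process uses and how much of that time is spent in network-related code: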
# netpmon -o /tmp/netpmon.out -O cpu -v; sleep 30; trcstop
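This command traces for 30 seconds and writes the report to /tmp/netpmon.out. In the report, CPU Time is the total CPU time used by the process,
CPU % is that time as a percentage, and Network CPU % is the percentage of CPU used by the network-related code in the process.
Example: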
P650A:/#netpmon -o /tmp/netpmon.out -O cpu -v; sleep 30; trcstop
Mon May 16 16:13:36 2011
System: AIX 5.3 Node: P650A Machine: 00C3EE9E4C00
/usr/bin/trace -ad -L 283203993 -T 1000000 -j 000,00A,001,002,003,38F,005,006,106,10C,4B0,210,139,134,135,100,200,102,103,101,104,465,467,46A,419,256,255,262,26A,26B,32D,32E,2A7,2A8,351,352,320,321,30A,30B,330,331,334,335,2C3,2C4,2A4,2A5,2E6,2E7,2DA,2DB,2EA,2EB,473,474,470,471,252,216,211, -o -
Run trcstop command to signal end of trace.
P650A:/#more /tmp/netpmon.out
Process CPU Usage Statistics:
-----------------------------
Network
Process PID CPU Time CPU % CPU %
----------------------------------------------------------
netpmon 7995686 27.3816 5.876 0.003
xmwlm 360786 15.0879 3.238 0.000
UNKNOWN 9637940 14.7336 3.162 0.000
aioserver 1536082 6.0583 1.300 0.000
ora_p010_standby 401556 0.6207 0.133 0.000
sched 4394 0.2329 0.050 0.000
ps 1511926 0.0678 0.015 0.000
ps 9625748 0.0631 0.014 0.000
oraclestandby 405822 0.0468 0.010 0.000
dtgreet 233958 0.0338 0.007 0.000
syncd 242164 0.0315 0.007 0.000
swapper 0 0.0269 0.006 0.000
sh 9666742 0.0155 0.003 0.000
ora_mrp0_standby 364696 0.0130 0.003 0.000
wrapper-aix-ppc-32 7827780 0.0107 0.002 0.000
ora_dbw0_standby 373114 0.0095 0.002 0.000
java 9584824 0.0085 0.002 0.000
ora_p001_standby 376966 0.0069 0.001 0.000
init 1 0.0058 0.001 0.000
ora_dbw1_standby 328002 0.0047 0.001 0.000
grep 9666746 0.0047 0.001 0.000
ora_p005_standby 389262 0.0046 0.001 0.000
dsmrecalld 254234 0.0041 0.001 0.000
ora_p004_standby 385164 0.0041 0.001 0.000
gil 45374 0.0039 0.001 0.001
ora_p002_standby 381064 0.0036 0.001 0.000
ora_p014_standby 413852 0.0036 0.001 0.000
ora_p009_standby 368772 0.0033 0.001 0.000
oraclestandby 7852356 0.0031 0.001 0.000
ora_pmon_standby 241746 0.0030 0.001 0.000
grep 9625744 0.0027 0.001 0.000
ora_p003_standby 422282 0.0026 0.001 0.000
ora_p000_standby 372868 0.0025 0.001 0.000
ora_ckpt_standby 381286 0.0022 0.000 0.000
ora_p006_standby 393360 0.0021 0.000 0.000
aioserver 8032318 0.0020 0.000 0.000
----------------------------------------------------------
Total (all processes) 64.5909 13.862 0.004
Idle time 431.2581 92.550
========================================================================
First Level Interrupt Handler CPU Usage Statistics:
---------------------------------------------------
Network
FLIH CPU Time CPU % CPU %
----------------------------------------------------------
data page fault 0.2858 0.061 0.000
PPC decrementer 0.1983 0.043 0.000
external device 0.1106 0.024 0.000
queued interrupt 0.0045 0.001 0.000
instruction page fault 0.0001 0.000 0.000
----------------------------------------------------------
Total (all FLIHs) 0.5993 0.129 0.000
========================================================================
Second Level Interrupt Handler CPU Usage Statistics:
----------------------------------------------------
Network
SLIH CPU Time CPU % CPU %
----------------------------------------------------------
goentdd64 0.0080 0.002 0.002
sisraid_dd64 0.0063 0.001 0.000
----------------------------------------------------------
Total (all SLIHs) 0.0143 0.003 0.002
========================================================================
Detailed Second Level Interrupt Handler CPU Usage Statistics:
-------------------------------------------------------------
SLIH: goentdd64
count: 112
cpu time (msec): avg 0.072 min 0.003 max 0.199 sdev 0.057
SLIH: sisraid_dd64
count: 860
cpu time (msec): avg 0.007 min 0.005 max 0.032 sdev 0.002
COMBINED (All SLIHs)
count: 972
cpu time (msec): avg 0.015 min 0.003 max 0.199 sdev 0.028
P650A:/#
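
The "Process CPU Usage Statistics" section of netpmon.out can be parsed in the same spirit. The sketch below is an assumption-laden illustration: `ProcCpu` and the function names are hypothetical, and the parser relies on the layout shown above (five whitespace-separated fields per process, with a numeric PID in the second field).

```python
from typing import List, NamedTuple


class ProcCpu(NamedTuple):
    name: str
    pid: int
    cpu_time: float     # "CPU Time" column
    cpu_pct: float      # "CPU %" column
    net_cpu_pct: float  # "Network CPU %" column


def parse_netpmon_processes(text: str) -> List[ProcCpu]:
    """Read rows between the section title and 'Total (all processes)'."""
    procs: List[ProcCpu] = []
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Process CPU Usage Statistics"):
            in_section = True
            continue
        if not in_section:
            continue
        if stripped.startswith("Total (all processes)"):
            break
        fields = stripped.split()
        # Skip rulers and column headers; keep "name pid time pct netpct".
        if len(fields) == 5 and fields[1].isdigit():
            name, pid, cpu_time, cpu_pct, net_pct = fields
            procs.append(ProcCpu(name, int(pid), float(cpu_time),
                                 float(cpu_pct), float(net_pct)))
    return procs


def busiest(procs: List[ProcCpu], count: int = 3) -> List[ProcCpu]:
    """Return the processes with the highest CPU %, highest first."""
    return sorted(procs, key=lambda p: p.cpu_pct, reverse=True)[:count]


# First rows of the report above, plus the lines that frame the section.
SAMPLE = """\
Process CPU Usage Statistics:
-----------------------------
Network
Process PID CPU Time CPU % CPU %
----------------------------------------------------------
netpmon 7995686 27.3816 5.876 0.003
xmwlm 360786 15.0879 3.238 0.000
UNKNOWN 9637940 14.7336 3.162 0.000
gil 45374 0.0039 0.001 0.001
----------------------------------------------------------
Total (all processes) 64.5909 13.862 0.004
Idle time 431.2581 92.550
"""

if __name__ == "__main__":
    for p in busiest(parse_netpmon_processes(SAMPLE)):
        print(p.name, p.pid, p.cpu_pct, p.net_cpu_pct)
```

Note that netpmon itself is the largest consumer in this capture, so the tracing overhead should be kept in mind when interpreting the figures.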
This article is from the “麥地塢” blog. Please retain this source attribution: http://yunlongzheng.blog.51cto.com/788996/566538