Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.
Don't overcommit. The total address space commit for the system is not permitted to exceed swap + a configurable percentage (default is 50) of physical RAM. Depending on the percentage you use, in most situations this means a process will not be killed while accessing pages but will receive errors on memory allocation as appropriate.
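As a quick reference, a minimal sketch of inspecting and switching modes via sysctl (mode meanings follow the kernel documentation quoted above; values shown are only illustrative):
sysctl vm.overcommit_memory        # 0 = heuristic (default), 1 = always overcommit, 2 = don't overcommit
sysctl -w vm.overcommit_memory=2   # switch to strict commit accounting
sysctl vm.overcommit_ratio         # percentage of physical RAM counted into the mode-2 limit (default 50)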
Nov 8 18:07:04 dp3 /usr/sbin/gmond[1664]: [PYTHON] Can't call the metric handler function for [diskstat_sdd_reads] in the python module [diskstat].#012
Nov 8 18:07:04 dp3 /usr/sbin/gmond[1664]: [PYTHON] Can't call the metric handler function for [diskstat_sdd_writes] in the python module [diskstat].#012
Nov 8 18:07:28 dp3 console-kit-daemon[1760]: WARNING: Error writing state file: No space left on device
Nov 8 18:07:28 dp3 console-kit-daemon[1760]: WARNING: Cannot write to file /var/run/ConsoleKit/database~
Nov 8 18:07:28 dp3 console-kit-daemon[1760]: WARNING: Unable to spawn /usr/lib/ConsoleKit/run-session.d/pam-foreground-compat.ck: Failed to fork (Cannot allocate memory)
Nov 8 18:07:28 dp3 console-kit-daemon[1760]: WARNING: Error writing state file: No space left on device
Nov 8 18:07:28 dp3 console-kit-daemon[1760]: WARNING: Cannot write to file /var/run/ConsoleKit/database~
Nov 8 18:07:28 dp3 console-kit-daemon[1760]: WARNING: Cannot unlink /var/run/ConsoleKit/database: No such file or directory
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: slurpfile() open() error on file /proc/stat: Too many open files
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: update_file() got an error from slurpfile() reading /proc/stat
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: slurpfile() open() error on file /proc/stat: Too many open files
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: update_file() got an error from slurpfile() reading /proc/stat
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: slurpfile() open() error on file /proc/stat: Too many open files
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: update_file() got an error from slurpfile() reading /proc/stat
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: slurpfile() open() error on file /proc/stat: Too many open files
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: update_file() got an error from slurpfile() reading /proc/stat
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: slurpfile() open() error on file /proc/stat: Too many open files
Nov 8 18:08:12 dp3 /usr/sbin/gmond[1664]: update_file() got an error from slurpfile() reading /proc/stat
Nov 8 18:08:12 dp3 kernel: [4319715.969327] gmond[1664]: segfault at ffffffffffffffff ip 00007f52e0066f34 sp 00007fff4e428620 error 4 in libganglia-3.1.2.so.0.0.0[7f52e0060000+13000]
Nov 8 18:10:01 dp3 cron[1637]: (CRON) error (can't fork)
Nov 8 18:13:53 dp3 init: tty1 main process (2341) terminated with status 1
Nov 8 18:13:53 dp3 init: tty1 main process ended, respawning
Nov 8 18:13:53 dp3 init: Temporary process spawn error: Cannot allocate memory
In the hadoop datanode logs there were the following errors (only some of the exceptions are shown):
2012-11-08 18:07:01,283 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.18.10.56:50010, storageID=DS-1599419066-10.18.10.47-50010-1329122718923, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:290)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:334)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:398)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:480)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:171)
2012-11-08 18:07:02,163 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.18.10.56:50010, storageID=DS-1599419066-10.18.10.47-50010-1329122718923, infoPort=50075, ipcPort=50020):DataXceiverServer: Exiting due to:java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
at java.lang.Thread.run(Thread.java:662)
2012-11-08 18:07:04,964 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.18.10.56:50010, storageID=DS-1599419066-10.18.10.47-50010-1329122718923, infoPort=50075, ipcPort=50020):DataXceiver
java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[closed]. 0 millis timeout left.
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:287)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:334)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:398)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:480)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:171)
2012-11-08 18:07:04,965 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-1079258682690587867_32990729 1 Exception java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:937)
at java.lang.Thread.run(Thread.java:662)
2012-11-08 18:07:05,057 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_1523791863488769175_32972264 1 Exception java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at java.io.DataOutputStream.flush(DataOutputStream.java:106)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1047)
at java.lang.Thread.run(Thread.java:662)
2012-11-08 18:07:04,972 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.18.10.56:50010, storageID=DS-1599419066-10.18.10.47-50010-1329122718923, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Interrupted receiveBlock
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:622)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:480)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:171)
2012-11-08 18:08:02,003 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 1
2012-11-08 18:08:02,025 WARN org.apache.hadoop.util.Shell: Could not get disk usage information
java.io.IOException: Cannot run program "du": java.io.IOException: error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:200)
at org.apache.hadoop.util.Shell.run(Shell.java:182)
at org.apache.hadoop.fs.DU.access$200(DU.java:29)
at org.apache.hadoop.fs.DU$DURefreshThread.run(DU.java:84)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
After that it kept printing the following log line and hung:
2012-11-08 18:08:52,015 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 1
This enables or disables panic on out-of-memory feature.
If this is set to 0, the kernel will kill some rogue process, called oom_killer. Usually, oom_killer can kill rogue processes and system will survive.
If this is set to 1, the kernel panics when out-of-memory happens. However, if a process limits using nodes by mempolicy/cpusets, and those nodes become memory exhaustion status, one process may be killed by oom-killer. No panic occurs in this case. Because other nodes' memory may be free. This means system total status may be not fatal yet.
If this is set to 2, the kernel panics compulsorily even on the above-mentioned.
The default value is 0. 1 and 2 are for failover of clustering. Please select either according to your policy of failover.
note(dirlt): I don't fully understand values 1 and 2; they are probably policies meant for failover in clustered Linux deployments
1.1.5 /proc/sys/net
1.1.5.1 /proc/sys/net/ipv4/ip_local_port_range
Local port allocation range.
1.1.5.2 /proc/sys/net/ipv4/tcp_tw_reuse
Reuse sockets that are in the TIME_WAIT state.
Allow to reuse TIME_WAIT sockets for new connections when it is safe from protocol viewpoint.
This specifies a limit on the total number of file descriptors that a user can register across all epoll instances on the system. The limit is per real user ID. Each registered file descriptor costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes on a 64-bit kernel. Currently, the default value for max_user_watches is 1/25 (4%) of the available low memory, divided by the registration cost in bytes.
1.1.7 /proc/sys/kernel
1.1.7.1 /proc/sys/kernel/hung_task_timeout_secs
Detecting hung tasks in Linux
Sometimes tasks under Linux are blocked forever (essentially hung). Recent Linux kernels have an infrastructure to detect hung tasks. When this infrastructure is active it will periodically get activated to find out hung tasks and present a stack dump of those hung tasks (and maybe locks held). Additionally we can choose to panic the system when we detect at least one hung task in the system. I will try to explain how khungtaskd works.
The infrastructure is based on a single kernel thread named "khungtaskd". So if you do a ps in your system and see that there is an entry like [khungtaskd] you know it is there. I have one in my system: "136 root SW [khungtaskd]"
The loop of the khungtaskd daemon is a call to the scheduler for waking it up after every 120 seconds (default value). The core algorithm is like this:
Iterate over all the tasks in the system which are marked as TASK_UNINTERRUPTIBLE (additionally it does not consider UNINTERRUPTIBLE frozen tasks & UNINTERRUPTIBLE tasks that are newly created and never been scheduled out).
If a task has not been switched out by the scheduler at least once in the last 120 seconds it is considered as a hung task and its stack dump is displayed. If CONFIG_LOCKDEP is defined then it will also show all the locks the hung task is holding.
One can change the sampling interval of khungtaskd through the sysctl interface /proc/sys/kernel/hung_task_timeout_secs.
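For reference, a minimal sketch of the related knobs (assumes a kernel built with hung-task detection enabled):
cat /proc/sys/kernel/hung_task_timeout_secs       # warning threshold, default 120 seconds
echo 0 > /proc/sys/kernel/hung_task_timeout_secs  # disable the warnings entirely
sysctl -w kernel.hung_task_panic=1                # optionally panic instead of only warning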
Earlier, a disk failure occurred on one of the hdfs datanodes, and the following log appeared in syslog:
May 14 00:02:50 dp46 kernel: INFO: task jbd2/sde1-8:3411 blocked for more than 120 seconds.
May 14 00:02:50 dp46 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 14 00:02:50 dp46 kernel: jbd2/sde1-8 D 0000000000000000 0 3411 2 0x00000000
May 14 00:02:50 dp46 kernel: ffff880817a71a80 0000000000000046 ffff880096d12f00 0000000000000441
May 14 00:02:50 dp46 kernel: ffff880818052938 ffff880818052848 ffff88081805c3b8 ffff88081805c3b8
May 14 00:02:50 dp46 kernel: ffff88081b22e6b8 ffff880817a71fd8 000000000000f4e8 ffff88081b22e6b8
May 14 00:02:50 dp46 kernel: Call Trace:
May 14 00:02:50 dp46 kernel: [<ffffffff8109b809>] ? ktime_get_ts+0xa9/0xe0
May 14 00:02:50 dp46 kernel: [<ffffffff81110b10>] ? sync_page+0x0/0x50
May 14 00:02:50 dp46 kernel: [<ffffffff814ed1e3>] io_schedule+0x73/0xc0
May 14 00:02:50 dp46 kernel: [<ffffffff81110b4d>] sync_page+0x3d/0x50
May 14 00:02:50 dp46 kernel: [<ffffffff814eda4a>] __wait_on_bit_lock+0x5a/0xc0
May 14 00:02:50 dp46 kernel: [<ffffffff81110ae7>] __lock_page+0x67/0x70
May 14 00:02:50 dp46 kernel: [<ffffffff81090c30>] ? wake_bit_function+0x0/0x50
May 14 00:02:50 dp46 kernel: [<ffffffff811271a5>] ? pagevec_lookup_tag+0x25/0x40
May 14 00:02:50 dp46 kernel: [<ffffffff811261f2>] write_cache_pages+0x392/0x4a0
May 14 00:02:50 dp46 kernel: [<ffffffff81124c80>] ? __writepage+0x0/0x40
May 14 00:02:50 dp46 kernel: [<ffffffff81126324>] generic_writepages+0x24/0x30
May 14 00:02:50 dp46 kernel: [<ffffffffa00774d7>] journal_submit_inode_data_buffers+0x47/0x50 [jbd2]
May 14 00:02:50 dp46 kernel: [<ffffffffa00779e5>] jbd2_journal_commit_transaction+0x375/0x14b0 [jbd2]
May 14 00:02:50 dp46 kernel: [<ffffffff8100975d>] ? __switch_to+0x13d/0x320
May 14 00:02:50 dp46 kernel: [<ffffffff8107c0ec>] ? lock_timer_base+0x3c/0x70
May 14 00:02:50 dp46 kernel: [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
May 14 00:02:50 dp46 kernel: [<ffffffffa007d928>] kjournald2+0xb8/0x220 [jbd2]
May 14 00:02:50 dp46 kernel: [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
May 14 00:02:50 dp46 kernel: [<ffffffffa007d870>] ? kjournald2+0x0/0x220 [jbd2]
May 14 00:02:50 dp46 kernel: [<ffffffff81090886>] kthread+0x96/0xa0
May 14 00:02:50 dp46 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
May 14 00:02:50 dp46 kernel: [<ffffffff810907f0>] ? kthread+0x0/0xa0
May 14 00:02:50 dp46 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
The JBD is the journaling block device that sits between the file system and the block device driver. The jbd2 version is for ext4.
[dirlt@localhost.localdomain]$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 4 45752 33460 99324 0 0 1 1 1 9 0 0 99 0 0
0 0 4 45752 33460 99324 0 0 0 0 1 8 0 0 100 0 0
[zhangyan04@tc-hpc-dev.tc.baidu.com]$ vmstat -m
Cache Num Total Size Pages
nfs_direct_cache 0 0 168 24
nfs_write_data 69 69 704 23
Num: how many objects are currently in use
Total: how many objects are available in total
Size: the size of each object
Pages: how many pages are occupied (each such page contains at least one object in use)
[zhangyan04@tc-hpc-dev.tc.baidu.com]$ vmstat -s
8191996 total memory
4519256 used memory
1760044 active memory
2327204 inactive memory
3672740 free memory
76200 buffer memory
3935788 swap cache
1020088 total swap
0 used swap
1020088 free swap
423476 non-nice user cpu ticks
91 nice user cpu ticks
295803 system cpu ticks
70621941 idle cpu ticks
39354 IO-wait cpu ticks
800 IRQ cpu ticks
52009 softirq cpu ticks
317179 pages paged in
54413375 pages paged out
0 pages swapped in
0 pages swapped out
754373489 interrupts
500998741 CPU context switches
1323083318 boot time
418742 forks
taskset is used to set or retrieve the CPU affinity of a running process given its PID or to launch a new COMMAND with a given CPU affinity. CPU affinity is a scheduler property that "bonds" a process to a given set of CPUs on the system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs.
Note that the Linux scheduler also supports natural CPU affinity: the scheduler attempts to keep processes on the same CPU as long as practical for performance reasons. Therefore, forcing a specific CPU affinity is useful only in certain applications.
1.2.6 lsof
todo(dirlt):
1.2.7 hdparm
hdparm - get/set hard disk parameters
Usage is as follows:
/sbin/hdparm [ flags ] [device] ..
The device can be found via mount:
[dirlt@localhost.localdomain]$ mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
NOTE
This program is obsolete. Replacement for netstat is ss. Replacement
for netstat -r is ip route. Replacement for netstat -i is ip -s link.
Replacement for netstat -g is ip maddr.
[zhangyan04@tc-hpc-dev.tc.baidu.com]$ netstat -s
Ip:
322405625 total packets received
0 forwarded
0 incoming packets discarded
322405625 incoming packets delivered
369134846 requests sent out
33 dropped because of missing route
Icmp:
30255 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
echo requests: 30170
echo replies: 83
timestamp request: 2
30265 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 10
echo request: 83
echo replies: 30170
timestamp replies: 2
IcmpMsg:
InType0: 83
InType8: 30170
InType13: 2
OutType0: 30170
OutType3: 10
OutType8: 83
OutType14: 2
Tcp:
860322 active connections openings
199165 passive connection openings
824990 failed connection attempts
43268 connection resets received
17 connections established
322306693 segments received
368937621 segments send out
56075 segments retransmited
0 bad segments received.
423873 resets sent
Udp:
68643 packets received
10 packets to unknown port received.
0 packet receive errors
110838 packets sent
UdpLite:
TcpExt:
1999 invalid SYN cookies received
5143 resets received for embryonic SYN_RECV sockets
2925 packets pruned from receive queue because of socket buffer overrun
73337 TCP sockets finished time wait in fast timer
85 time wait sockets recycled by time stamp
4 delayed acks further delayed because of locked socket
Quick ack mode was activated 7106 times
5141 times the listen queue of a socket overflowed
5141 SYNs to LISTEN sockets ignored
81288 packets directly queued to recvmsg prequeue.
297394763 packets directly received from backlog
65102525 packets directly received from prequeue
180740292 packets header predicted
257396 packets header predicted and directly queued to user
5983677 acknowledgments not containing data received
176944382 predicted acknowledgments
2988 times recovered from packet loss due to SACK data
Detected reordering 9 times using FACK
Detected reordering 15 times using SACK
Detected reordering 179 times using time stamp
835 congestion windows fully recovered
1883 congestion windows partially recovered using Hoe heuristic
TCPDSACKUndo: 1806
1093 congestion windows recovered after partial ack
655 TCP data loss events
TCPLostRetransmit: 6
458 timeouts after SACK recovery
7 timeouts in loss state
3586 fast retransmits
178 forward retransmits
425 retransmits in slow start
51048 other TCP timeouts
37 sack retransmits failed
1610293 packets collapsed in receive queue due to low socket buffer
7094 DSACKs sent for old packets
14430 DSACKs received
4358 connections reset due to unexpected data
12564 connections reset due to early user close
29 connections aborted due to timeout
TCPDSACKIgnoredOld: 12177
TCPDSACKIgnoredNoUndo: 347
TCPSackShifted: 6421
TCPSackMerged: 5600
TCPSackShiftFallback: 119131
IpExt:
InBcastPkts: 22
InOctets: 167720101517
OutOctets: 169409102263
InBcastOctets: 8810
[zhangyan04@tc-hpc-dev.tc.baidu.com]$ netstat --ip --tcp -a -e -p
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 tc-hpc-dev.tc.baidu.c:19870 *:* LISTEN zhangyan04 30549010 28965/echo_server
tcp 1024 0 tc-hpc-dev.tc.baidu.c:19870 tc-com-test00.tc.baid:60746 ESTABLISHED zhangyan04 30549012 28965/echo_server
tcp 0 1024 tc-hpc-dev.tc.baidu.c:19870 tc-com-test00.tc.baid:60745 ESTABLISHED zhangyan04 30549011 28965/echo_server
Simple answer: you cannot. Longer answer: the uninterruptable sleep means the process will not be woken up by signals. It can be only woken up by what it's waiting for. When I get such situations eg. with CD-ROM, I usually reset the computer by using suspend-to-disk and resuming.
The D state basically means that the process is waiting for disk I/O, or other block I/O that can't be interrupted. Sometimes this means the kernel or device is feverishly trying to read a bad block (especially from an optical disk). Sometimes it means there's something else. The process cannot be killed until it gets out of the D state. Find out what it is waiting for and fix that. The easy way is to reboot. Sometimes removing the disk in question helps, but that can be rather dangerous: unfixable catastrophic hardware failure if you don't know what you're doing (read: smoke coming out).
Server Software: nginx/1.2.1
Server Hostname: localhost
Server Port: 80
Document Path: /
Document Length: 1439 bytes
Concurrency Level: 100
Time taken for tests: 0.760 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 16500000 bytes
HTML transferred: 14390000 bytes
Requests per second: 13150.09 [#/sec] (mean)
Time per request: 7.605 [ms] (mean)
Time per request: 0.076 [ms] (mean, across all concurrent requests)
Transfer rate: 21189.11 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 1.4 0 18
Processing: 2 7 1.8 7 20
Waiting: 1 7 1.8 7 20
Total: 5 7 2.0 7 20
Percentage of the requests served within a certain time (ms)
50% 7
66% 7
75% 8
80% 8
90% 9
95% 10
98% 14
99% 19
100% 20 (longest request)
#echo set terminal postscript color > gnuplot.cmd
echo set terminal png xffffff > gnuplot.cmd
#echo set data style linespoints >> gnuplot.cmd
echo set style data linespoints >> gnuplot.cmd
The kernel hands the packet to the IP layer for processing. The IP layer assembles the data into an ip packet. If the ip packet carries tcp, it is placed on the socket backlog; if the socket backlog is full, the ip packet is dropped. copy packet data to ip buffer to form ip packet
note(dirlt): once this step is done the IP layer can release the sk_buff structure
The tcp layer takes the tcp packet off the socket backlog. copy ip packet to tcp recv buffer to form tcp packet
The tcp recv buffer is handed to the application layer. copy tcp recv buffer to app buffer to form app packet
The application layer copies data into the tcp send buffer; if there is not enough space, the call blocks. copy app buffer to tcp send buffer as app packet
When the tcp send buffer has data or an ack needs to be sent, the tcp layer assembles an ip packet and pushes it down to the IP layer. copy tcp send buffer to ip send buffer as tcp packet
The IP layer allocates an sk_buff from kernel memory, wraps the ip data into packet data, and enqueues it onto the qdisc (queue length controlled by txqueuelen). If the queue is full, the operation blocks and the back-pressure propagates up to the tcp layer. copy ip send buffer to packet data as ip packet
When the NIC driver detects data on the qdisc, it moves the packet data from the qdisc into the ring buffer and invokes the NIC DMA engine to send the packet out. todo(dirlt): my understanding here may be off
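The buffers mentioned above can be inspected and sized through sysctl and ifconfig; a minimal sketch (parameter names are standard, the interface name is only an example):
sysctl net.ipv4.tcp_rmem              # min / default / max size of the tcp recv buffer, in bytes
sysctl net.ipv4.tcp_wmem              # min / default / max size of the tcp send buffer, in bytes
sysctl net.core.netdev_max_backlog    # receive-side backlog between the NIC and the protocol stack
ifconfig eth0 | grep -i txqueuelen    # transmit-side qdisc queue length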
a 16-bit number, whose low byte is the signal number that killed the process, and whose high byte is the exit status (if the signal number is zero); the high bit of the low byte is set if a core file was produced.
dp@dp8:~$ dmesg | grep eth0
[ 15.635160] eth0: Broadcom NetXtreme II BCM5716 1000Base-T (C0) PCI Express f
[ 15.736389] bnx2: eth0: using MSIX
[ 15.738263] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 37.848755] bnx2: eth0 NIC Copper Link is Up, 100 Mbps full duplex
[ 37.850623] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1933.934668] bnx2: eth0: using MSIX
[ 1933.936960] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 1956.130773] bnx2: eth0 NIC Copper Link is Up, 100 Mbps full duplex
[ 1956.132625] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[4804526.542976] bnx2: eth0 NIC Copper Link is Down
[4804552.008858] bnx2: eth0 NIC Copper Link is Up, 100 Mbps full duplex
[4804552.008858] bnx2: eth0 NIC Copper Link is Up, 100 Mbps full duplex
The NIC speed on dp8 was recognized as 100 Mbps. The possible causes are as follows:
1 linux
1.1 proc filesystem
1.1.1 /proc
1.1.1.1 /proc/meminfo
Information about the current memory utilization of the system, commonly consumed by the free command; the file can also be read directly with any file-viewing command. Its content is displayed in two columns: the statistic name and its corresponding value.
1.1.1.2 /proc/stat
Tracks, in real time, a variety of statistics accumulated since the system last booted, as shown below.
1.1.1.3 /proc/swaps
The swap areas currently in use on the system and their space utilization. If there are multiple swap areas, each one is listed separately, and areas with a higher priority are used first.
1.1.1.4 /proc/cmdline
The parameters passed to the kernel at boot time, typically supplied by a boot manager such as lilo or grub.
1.1.1.5 /proc/uptime
The time since the system last booted, as shown below: the first number is the total uptime and the second is the idle time, both in seconds.
1.1.1.6 /proc/version
The version of the kernel the system is currently running.
1.1.1.7 /proc/mounts
All file systems currently mounted on the system. The first column is the mounted device, the second the mount point in the directory tree, the third the file system type, the fourth the mount attributes (ro or rw), and the fifth and sixth columns match the dump/pass fields of /etc/mtab.
1.1.1.8 /proc/modules
A list of all modules currently loaded into the kernel, usable by the lsmod command or readable directly. The first column is the module name, the second the amount of memory it occupies, the third how many instances are loaded, the fourth which other modules it depends on, the fifth its load state (Live: loaded; Loading: being loaded; Unloading: being unloaded), and the sixth its offset in kernel memory.
1.1.1.9 /proc/diskstats
Disk I/O statistics for each disk device.
1.1.1.10 /proc/cpuinfo
1.1.1.11 /proc/crypto
A list of the cryptographic algorithms available to the installed kernel, with details for each algorithm.
1.1.1.12 /proc/loadavg
Holds the load averages for CPU and disk I/O. The first three columns are the load averages over the last 1, 5 and 15 minutes, similar to the output of the uptime command. The fourth column contains two values separated by a slash: the number of kernel scheduling entities (processes and threads) currently being scheduled, and the total number of scheduling entities currently alive on the system. The fifth column is the PID of the process most recently created by the kernel before this file was read.
1.1.1.13 /proc/locks
Holds information about files currently locked by the kernel, including kernel-internal debugging data. Each lock occupies one line and has a unique number. The second column of each line is the lock class: POSIX denotes the newer type of file lock created by the lockf system call, while FLOCK is the traditional UNIX file lock created by the flock system call. The third column usually has one of two values: ADVISORY means other users are not allowed to lock the file but may still read it, MANDATORY means no access of any kind by other users is allowed while the file is locked.
1.1.1.14 /proc/slabinfo
Objects used frequently inside the kernel (such as inode and dentry) have their own caches, the slab pools; /proc/slabinfo lists the slab information for these objects. See the slabinfo man page in the kernel documentation for details.
1.1.1.15 /proc/vmstat
Various statistics about the system's virtual memory. The amount of information can be large and differs between systems, but it is fairly readable.
1.1.1.16 /proc/zoneinfo
Detailed information about each memory zone.
1.1.2 /proc/<pid>
Here pid is the process ID of the corresponding process; the directory contains the information for that process.
1.1.2.1 fd
todo(zhangyan04):
1.1.2.2 io
TODO:
1.1.2.3 limits
TODO:
1.1.2.4 maps
A list of the memory mappings of each executable and library file associated with the process, together with their access permissions.
1.1.2.5 mount
TODO:
1.1.2.6 net
todo(zhangyan04):
1.1.2.7 sched
todo(zhangyan04):
1.1.2.8 status
TODO:
1.1.2.9 statm
Provides information about memory usage, measured in pages. The columns are:
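A minimal sketch of reading it (the column names follow proc(5)):
cat /proc/self/statm
# size resident shared text lib data dt   -- all counts are in pages
getconf PAGESIZE                          # multiply by this to convert to bytes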
1.1.3 /proc/sys
Under /proc/sys there are kernel parameters that can be changed dynamically; there are two ways to change them.
First, the sysctl tool can be used, for example to set vm.swappiness to 0.
That kind of change is temporary; for a permanent change, edit the /etc/sysctl.conf file,
and after a reboot the setting takes effect permanently. Both methods are sketched below.
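A minimal sketch of both methods, using vm.swappiness as the example:
sysctl -w vm.swappiness=0                      # temporary, lost after reboot
echo 0 > /proc/sys/vm/swappiness               # equivalent to the line above
echo "vm.swappiness = 0" >> /etc/sysctl.conf   # permanent
sysctl -p                                      # reload /etc/sysctl.conf without rebooting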
1.1.4 /proc/sys/vm
1.1.4.1 /proc/sys/vm/overcommit_memory
Overcommit means committing more memory than is actually available.
In the afternoon, after overcommit_memory on dp3 was changed to 2, the first problem was that no shell command could be executed any more; the error was "fork: cannot allocate memory", i.e. fork had no memory available. After exiting the session it was impossible to log in to dp3 again. The main reason is that the jvm had essentially used up physical memory, overcommit_ratio was 0.5, and there was no swap space, so no more memory could be allocated.
From /var/log/syslog you can see that after the parameter was changed many programs were affected (ganglia died, cron could no longer fork processes, init could not allocate more ttys, so we had no way to log in). In ganglia the memory and CPU graphs became flat lines, not because the system was stable but because gmond had died.
In the hadoop datanode logs there were the following errors (only some of the exceptions are shown):
After that it kept printing the following log line and hung:
The hdfs web page showed the node as a dead node, although the datanode process was actually still alive. Presumably these problems were also caused by not being able to allocate enough memory.
As for why we could eventually log in again, my guess is that the datanode had died and the regionserver on the machine was temporarily not allocating memory, so there was enough free memory for init to open a tty.
The value has now been set back to its original value, namely 0. Fortunately, during this period the change had no real impact on the execution of online jobs.
1.1.4.2 /proc/sys/vm/overcommit_ratio
If overcommit_memory is 2, this parameter determines the size of the system's <available memory> (the commit limit). It is computed as (Physical-RAM-Size) * ratio / 100 + (Swap-Size).
So for this system of mine the available virtual memory is (491*50/100)+509=754M. note(dirlt): this is only the <available memory> as estimated when overcommit_memory=2; in the other modes the usable memory is still (Physical-RAM-Size) + (Swap-Size).
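A quick way to check the commit limit the kernel has derived from overcommit_ratio, as a sketch:
grep -E 'CommitLimit|Committed_AS' /proc/meminfo   # the limit and the amount currently committed
cat /proc/sys/vm/overcommit_ratio                  # percentage of physical RAM used in the formula (default 50)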
1.1.4.3 /proc/sys/vm/swappiness
This parameter determines how willing the system is to use swap. It does not forbid use of the swap partition; it only expresses the degree of reliance on it. If it is set to 0, the system will reduce page swap in/out as much as possible and keep operations in physical memory.
1.1.4.4 /proc/sys/vm/dirty_*
These parameters control the policy for flushing dirty pages back to disk. See the "file IO/write" section for how dirty pages are written back.
note(dirlt)@2013-05-25: I copied a piece of that content over here
The write-back policy for these dirty pages is:
Note that a pdflush daemon may be started here to flush dirty pages in the background. In addition, every dirty_writeback_centisecs the system wakes a pdflush daemon to flush dirty pages to disk; pdflush works by checking whether any dirty pages have been dirty for longer than dirty_expire_centisecs, and if so it flushes them in the background.
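For reference, the tunables involved can be listed directly; a minimal sketch (the names are standard, the comments describe the common defaults):
sysctl vm.dirty_background_ratio     # % of memory dirty before background flushing starts
sysctl vm.dirty_ratio                # % of memory dirty before writers are forced to flush synchronously
sysctl vm.dirty_writeback_centisecs  # wake-up interval of the flusher threads, in 1/100 s
sysctl vm.dirty_expire_centisecs     # age after which a dirty page must be written back, in 1/100 s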
1.1.4.5 /proc/sys/vm/drop_caches
Can be used to release the buffers and cached memory held by the kernel; the buffers hold directory and file inodes, while the cached memory holds the page cache used when files are accessed.
To avoid losing data, call sync first to force dirty data to disk before writing to this file, for example:
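A minimal sketch of the usual invocation (run as root):
sync                                 # flush dirty data first to avoid losing it
echo 1 > /proc/sys/vm/drop_caches    # free the page cache only
echo 2 > /proc/sys/vm/drop_caches    # free dentries and inodes
echo 3 > /proc/sys/vm/drop_caches    # free page cache, dentries and inodes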
1.1.4.6 /proc/sys/vm/panic_on_oom
This enables or disables panic on out-of-memory feature.
If this is set to 0, the kernel will kill some rogue process, called oom_killer. Usually, oom_killer can kill rogue processes and system will survive.
If this is set to 1, the kernel panics when out-of-memory happens. However, if a process limits using nodes by mempolicy/cpusets, and those nodes become memory exhaustion status, one process may be killed by oom-killer. No panic occurs in this case. Because other nodes' memory may be free. This means system total status may be not fatal yet.
If this is set to 2, the kernel panics compulsorily even on the above-mentioned.
The default value is 0. 1 and 2 are for failover of clustering. Please select either according to your policy of failover.
note(dirlt): I don't fully understand values 1 and 2; they are probably policies meant for failover in clustered Linux deployments
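For reference, a minimal sketch of checking and changing it:
cat /proc/sys/vm/panic_on_oom    # 0 = let the oom-killer pick a process (default)
sysctl -w vm.panic_on_oom=1      # panic on OOM unless it is confined to a cpuset/mempolicy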
1.1.5 /proc/sys/net
1.1.5.1 /proc/sys/net/ipv4/ip_local_port_range
Local port allocation range.
1.1.5.2 /proc/sys/net/ipv4/tcp_tw_reuse
Reuse sockets that are in the TIME_WAIT state.
Allow to reuse TIME_WAIT sockets for new connections when it is safe from protocol viewpoint.
1.1.5.3 /proc/sys/net/ipv4/tcp_tw_recycle
Fast recycling of sockets in the TIME_WAIT state.
Enable fast recycling of TIME_WAIT sockets.
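A minimal sketch of inspecting the two settings (note: tcp_tw_recycle is known to break clients behind NAT and was removed in later kernels, so treat it with care):
sysctl net.ipv4.tcp_tw_reuse      # 1 = allow reusing TIME_WAIT sockets for new outgoing connections
sysctl net.ipv4.tcp_tw_recycle    # 1 = fast recycling; unsafe behind NAT, removed in newer kernels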
1.1.5.4 /proc/sys/net/ipv4/tcp_max_syn_backlog
Upper limit on the number of connections still waiting for the client's ACK (i.e. half-open connections that have not completed the handshake).
1.1.5.5 /proc/sys/net/core/somaxconn
Maximum length of the listen queue for each port.
1.1.5.6 /proc/sys/net/core/netdev_max_backlog
When a network device receives packets faster than the kernel can process them, the maximum number of packets allowed onto the queue.
1.1.6 /proc/sys/fs
1.1.6.1 /proc/sys/fs/file-max
Maximum number of open files allowed across all processes. note(dirlt): this should be something different from file descriptors.
1.1.6.2 /proc/sys/fs/epoll/max_user_instances
Per-user upper limit on the number of epoll instances (epoll file descriptors) that can be created. Exceeding the limit returns EMFILE. note(dirlt): this entry does not seem to exist on my file system, though.
1.1.6.3 /proc/sys/fs/epoll/max_user_watches
Per-user upper limit on the number of file descriptors that can be watched via epoll. note(dirlt): this should be particularly useful for servers, since it bounds memory usage.
This specifies a limit on the total number of file descriptors that a user can register across all epoll instances on the system. The limit is per real user ID. Each registered file descriptor costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes on a 64-bit kernel. Currently, the default value for max_user_watches is 1/25 (4%) of the available low memory, divided by the registration cost in bytes.
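A minimal sketch of inspecting and raising the limit (the concrete value is only an example; size it from the per-watch cost quoted above):
cat /proc/sys/fs/epoll/max_user_watches
sysctl -w fs.epoll.max_user_watches=1048576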
1.1.7 /proc/sys/kernel
1.1.7.1 /proc/sys/kernel/hung_task_timeout_secs
Detecting hung tasks in Linux
Sometimes tasks under Linux are blocked forever (essentially hung). Recent Linux kernels have an infrastructure to detect hung tasks. When this infrastructure is active it will periodically get activated to find out hung tasks and present a stack dump of those hung tasks (and maybe locks held). Additionally we can choose to panic the system when we detect at least one hung task in the system. I will try to explain how khungtaskd works.
The infrastructure is based on a single kernel thread named "khungtaskd". So if you do a ps in your system and see that there is an entry like [khungtaskd] you know it is there. I have one in my system: "136 root SW [khungtaskd]"
The loop of the khungtaskd daemon is a call to the scheduler for waking it up after every 120 seconds (default value). The core algorithm is like this:
One can change the sampling interval of khungtaskd through the sysctl interface /proc/sys/kernel/hung_task_timeout_secs.
Earlier, a disk failure occurred on one of the hdfs datanodes, and the following log appeared in syslog:
The JBD is the journaling block device that sits between the file system and the block device driver. The jbd2 version is for ext4.
1.1.8 /proc/net
1.1.8.1 /proc/net/tcp
Records all tcp connections; both netstat and lsof read this file. We once ran into a problem where netstat/lsof were extremely slow, and strace showed that reading this file was where the time went. The two links below give some related information.
todo(dirlt):
1.2 system utility
1.2.1 SYS DEV
1.2.2 mpstat
mpstat - Report processors related statistics.
Typical usage is "mpstat -P ALL 1".
The meaning of each field is:
1.2.3 vmstat
1.2.4 free
1.2.5 taskset
Can be used to get and set the CPU affinity of a process.
If -c is not given it retrieves the affinity. Programmatically, the sched_setaffinity/sched_getaffinity calls can be used to set and get the CPU affinity of a process.
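A minimal usage sketch (the PID 1234, the CPU lists and my_server are only examples):
taskset -p 1234             # show the affinity mask of PID 1234
taskset -c -p 0,2 1234      # pin PID 1234 to CPUs 0 and 2
taskset -c 0-3 ./my_server  # launch a command restricted to CPUs 0-3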
1.2.6 lsof
todo(dirlt):
1.2.7 hdparm
hdparm - get/set hard disk parameters
Usage is as follows:
The device can be found via mount.
We care about the directory we actually read and write, usually under /home; here the device in use is /dev/mapper/VolGroup00-LogVol00.
todo(dirlt): many of the options are still unclear to me.
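A couple of commonly used invocations, as a sketch (the device name is only an example; -T/-t are read benchmarks, best run on an otherwise idle system):
hdparm -I /dev/sda     # identify the drive: model, features, supported modes
hdparm -Tt /dev/sda    # -T: cached read timing, -t: buffered (device) read timing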
1.2.8 pmap
todo(dirlt):
1.2.9 strace
todo(dirlt):
1.2.10 iostat
iostat is mainly used to observe the load on io devices. First let's look at some sample iostat output.
The first line shows the average CPU load, and the information that follows is the iostat average since the last reboot. If iostat is run with an interval, each subsequent report is relative to the previous one. The CPU states are:
Next, the iostat command-line parameters.
interval means the output refreshes every x seconds and count is how many reports to print. The meaning of each parameter is explained below:
iostat can also be told which block devices to report on.
The usual command is iostat -d -k -x 1. Let's look at the sample output.
Then the fields:
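For reference, a sketch of the usual invocation and a device-restricted variant (sda and the counts are only examples); the fields worth watching most are await, avgqu-sz and %util:
iostat -d -k -x 1          # -d devices only, -k KB units, -x extended stats, refresh every second
iostat -d -k -x sda 1 5    # limit the report to sda and print 5 reports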
1.2.11 vmtouch
https://github.com/hoytech/vmtouch
note(dirlt): can be used to warm up data; its options also seem fairly simple
It makes a few system calls that are worth noting and learning from:
1.2.12 latencytop
todo(dirlt): https://latencytop.org/
1.2.13 iotop
Can be used to observe the IO usage of individual processes
todo(dirlt):
1.2.14 dstat
todo(dirlt): https://github.com/dagwieers/dstat
http://weibo.com/1840408525/AdGkO3uEL dstat -lamps
1.2.15 slurm
Simple Linux Utility for Resource Management todo(dirlt): https://computing.llnl.gov/linux/slurm/
1.2.16 sar
sar - Collect, report, or save system activity information.
All the options are listed below
For network interface data, the fields available with DEV are
The fields available with EDEV are
The fields available with SOCK are
There are very many options, but most of them are not worth turning on. For network programs, the options we typically use include
The command we usually run is sar -n DEV -P ALL -u 1 0 (1 means refresh every second, 0 means keep printing indefinitely)
1.2.17 netstat
netstat - Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships
netstat can display a lot of information: network connections, routing tables, interface statistics, masquerade connections and multicast memberships. According to the documentation, though, part of this work can be done with /sbin/ip.
Here we limit our use of netstat to viewing network connections and the per-protocol statistics.
First, how to view the statistics for each protocol.
We can view data related to tcp, udp and raw sockets; delay is the refresh interval.
There is a lot of output, so it is not analyzed in detail here.
Next, the connection-listing part of the functionality.
address_family selects the protocol family; typically we might use
And the remaining options
Let's look at a usage example
Below is an explanation of the fields for tcp sockets; the fields for unix domain sockets differ, but they are not written out here.
1.2.18 tcpdump
todo(zhangyan04):
1.2.19 iftop
todo(dirlt): http://www.ex-parrot.com/~pdw/iftop/
1.2.20 iftraf
todo(dirlt): http://iptraf.seul.org/
1.2.21 rsync
Common options:
Common commands:
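A typical invocation, as a sketch (the paths and host are placeholders):
rsync -avz --progress --delete /data/src/ user@remotehost:/data/dst/
# -a archive mode (recursion, permissions, times), -v verbose, -z compress in transit,
# --delete remove files on the target that no longer exist on the source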
1.2.22 iodump
1.2.23 iopp
1.2.24 nethogs
todo(dirlt):
1.2.25 slabtop
slabtop - display kernel slab cache information in real time
1.2.26 nmon
nmon - systems administrator, tuner, benchmark tool.
http://nmon.sourceforge.net/pmwiki.php Nigel's performance Monitor for Linux
1.2.27 collectl
collectl http://collectl.sourceforge.net/ todo(dirlt): looks quite good; it collects and consolidates a lot of the key information
1.2.28 numactl
todo(dirlt):
1.2.29 jnettop
todo(dirlt):
1.2.30 glances
http://nicolargo.github.io/glances/
todo(dirlt):
1.2.31 ifconfig
ifconfig - configure a network interface
/sbin/ifconfig can be used to configure and inspect network interfaces, though according to the documentation /sbin/ip is the recommended tool.
Here we are not going to learn how to configure a network interface, only how to inspect one. /sbin/ifconfig -a shows every interface, even those that are down.
Let's look a bit more closely at the information for eth1.
ifconfig can also create a virtual (alias) interface bound to an extra IP, for example:
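A minimal sketch (the interface name and address are only examples):
ifconfig eth0:0 192.168.1.100 netmask 255.255.255.0 up   # bind an extra IP to eth0
ifconfig eth0:0 down                                     # remove the alias again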
1.2.32 ps(process snapshot)
Process states are the following:
When using apt-get on ubuntu, something can go wrong and we may simply kill apt-get. At that point the apt-get software itself is left in an abnormal state, and apt-get cannot be started afterwards. Looking at the processes, some suspicious ones like the following show up.
The parent of these processes is init and their state is uninterruptible sleep; even kill -9 cannot terminate them, and the only way out is to reboot the machine. On this problem see the stackoverflow answer: How to stop 'uninterruptible' process on Linux? - Stack Overflow http://stackoverflow.com/questions/767551/how-to-stop-uninterruptible-process-on-linux
1.2.33 ulimit
todo(dirlt)
1.2.34 sysprof
Sysprof - Statistical, system-wide Profiler for Linux : http://sysprof.com/
1.2.35 ss
1.2.36 SYS ADMIN
1.2.37 uptime
1.2.38 top
1.2.39 htop
1.2.40 ttyload
1.2.41 dmesg
Shows the messages printed during boot (the boot messages are saved in the file /var/log/dmesg)
1.2.42 quota
http://blog.itpub.net/post/7184/488931
quota is used to edit per-user disk quotas.
Edit /etc/fstab and add usrquota and grpquota.
To set quotas for a user, use /usr/sbin/edquota -u testuser; for a group, /usr/sbin/edquota -g testgrp.
The fields mean the following:
/usr/sbin/edquota -t can be used to change the grace period of the soft limits.
Disk quotas can be used to limit WWW space, FTP space and Email space. Quota can only manage quotas per disk partition, not per directory, so the data has to be stored on a quota-enabled partition and symlinked into the directory the application actually uses.
1.2.43 crontab
crontab exists to automate work: it runs specific programs at specific times or intervals. crontab -e edits the crontab configuration file (the default editor is vim). In the crontab file you can define variables just as in a shell, followed by the job entries; each job consists of 6 fields: minute hour day month week command
Each field can be written in 3 ways
The system-level crontab configuration file is /etc/crontab, which appears to have an extra user field. A few example entries follow:
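A couple of illustrative entries (the scripts are placeholders):
# minute hour day month week command
*/5 * * * *  /usr/local/bin/collect_stats.sh      # every 5 minutes
0 3 * * 0    /usr/local/bin/weekly_backup.sh      # 03:00 every Sunday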
1.2.44 ntp
ntp (network time protocol) is used to synchronize machine clocks and consists of the following components:
One of the most important questions is at what interval the daemon synchronizes with the configured servers.
ntp chooses the synchronization interval between the bounds given by minpoll and maxpoll; by default it starts from minpoll, i.e. 64 seconds.
In fact, if you do not need to serve time to other machines, you can simply run ntpdate from cron to synchronize, for example:
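A sketch of such a cron entry (the server name is a placeholder):
# sync the clock once an hour against an NTP server
0 * * * *  /usr/sbin/ntpdate -u ntp.example.com >/dev/null 2>&1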
1.2.45 cssh
todo(dirlt):
1.2.46 iptables
View the current filter rules with iptables -L/-S
Rules can be added with iptables -A [chain] [chain-specification]
Here chain is e.g. INPUT, and everything that follows is the chain-specification: -s filters on the source address, -d on the destination address, and -j specifies the action.
Rules can be deleted with iptables -D; the rule can be referenced either by its rule-num or by repeating the chain-specification, for example:
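A minimal sketch (the addresses are only examples):
iptables -L -n                                    # list current rules, numeric output
iptables -A INPUT -s 10.1.2.3 -j DROP             # drop everything coming from 10.1.2.3
iptables -A INPUT -s 10.0.0.0/8 -d 192.168.1.10 -j ACCEPT
iptables -D INPUT -s 10.1.2.3 -j DROP             # delete by repeating the specification
iptables -D INPUT 1                               # or delete rule number 1 in the chain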
1.2.47 HTTP BENCH TOOL
1.2.48 httperf
download http://www.hpl.hp.com/research/linux/httperf/
paper http://www.hpl.hp.com/research/linux/httperf/wisp98/httperf.pdf
httperf is a tool for benchmarking HTTP server performance; it supports HTTP 1.0 and 1.1. Its command-line parameters are listed below.
httperf supports several different workload models:
note(dirlt): the session-oriented concept only became clear to me after reading the paper. It models the real-world browsing scenario: a requested page usually embeds many objects such as js or css. One visit is a session, and a session contains many requests; typically the first request must finish (the browser parses the page) and then the remaining requests are issued concurrently.
Common options
note(dirlt): httperf uses the select model, however, so there is an upper bound on the maximum number of connections.
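As a sketch, a typical connection-oriented run might look like this (host, URI, rate and counts are example values):
# 10 calls on each of 1000 connections, opening 100 connections per second
httperf --server localhost --port 80 --uri /index.html \
        --num-conns 1000 --num-calls 10 --rate 100 --timeout 5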
Result analysis
The Connection section
The Request section
The Reply section
If session mode is used, the results additionally contain
1.2.49 ab
ab (apache benchmarking) is an HTTP server benchmarking tool that ships with apache httpd. Its command-line parameters are listed below.
It has fewer features than httperf but should be sufficient most of the time.
note(dirlt): ab and httperf have different working models. httperf specifies how many connections to establish and how many calls to issue on each connection, while ab specifies the total number of requests and how many to send per batch, then computes per-batch timing statistics; ab must wait for the whole batch to return, fail, or time out. The two complement each other. nice!!!
The meaning of each parameter:
We can run it as ab -c 100 -n 10000 -r localhost/ and the output is quite easy to understand. Note that the percentile times at the end are results under the 100-way concurrency.
1.2.50 autobench
http://www.xenoclast.org/autobench/
autobench is a wrapper around httperf and also provides tooling for distributed load testing.
Here we first cover single-machine usage. The autobench manpage gives a very clear description: http://www.xenoclast.org/autobench/man/autobench.html. As it shows, autobench can compare the performance of two sites.
The default configuration file is ~/.autobench.conf, which is convenient for regular use. The usual invocation is
Once the tsv file is produced, bench2graph can convert it to png format. bench2graph needs a few modifications
Run bench2graph bench.tsv bench.png, enter a title when prompted, and the comparison graph is generated.
todo(dirlt): later I may need to learn how to run autobench distributed tests, because of httperf's damned select model.
1.3 kernel
1.3.1 vmlinuz
vmlinuz is the bootable, compressed kernel. "vm" stands for "Virtual Memory". Linux supports virtual memory, unlike old operating systems such as DOS with its 640KB memory limit; Linux can use disk space as virtual memory, hence the name "vm". vmlinuz is the executable Linux kernel, located at /boot/vmlinuz, and is usually a symlink. vmlinux is the uncompressed kernel; vmlinuz is the compressed form of vmlinux.
vmlinuz can be built in two ways. One is to build the kernel with "make zImage" and then create it via "cp /usr/src/linux-2.4/arch/i386/linux/boot/zImage /boot/vmlinuz". zImage is suited to small kernels and exists for backwards compatibility. The other is to build with "make bzImage" and then "cp /usr/src/linux-2.4/arch/i386/linux/boot/bzImage /boot/vmlinuz". bzImage is a compressed kernel image; note that bzImage is not compressed with bzip2 — the "bz" is easily misread, it means "big zImage", where b stands for "big".
Both zImage (vmlinuz) and bzImage (vmlinuz) are compressed with gzip. They are not merely compressed files: the beginning of each embeds gzip decompression code, so you cannot unpack vmlinuz with gunzip or gzip -dc. The kernel file contains a miniature gzip used to decompress the kernel and boot it. The difference is that the old zImage decompresses the kernel into low memory (the first 640K), while bzImage decompresses it into high memory (above 1M). If the kernel is small, either can be used and the resulting running system is identical; large kernels must use bzImage, not zImage.
1.3.2 tcp IO
http://www.ece.virginia.edu/cheetah/documents/papers/TCPlinux.pdf
note(dirlt): I have not read the later part on congestion effects
packet reception
The overall flow is roughly as follows:
The relevant kernel parameters are
packet transmission
The overall flow is roughly as follows:
The relevant kernel parameters are:
note(dirlt): with wangyx's help I found this setting under ifconfig
txqueuelen = 1000
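A minimal sketch of inspecting and adjusting the two queues mentioned above (the interface name and sizes are only examples):
ifconfig eth0 | grep -i txqueuelen             # transmit side: qdisc queue length
ifconfig eth0 txqueuelen 10000                 # enlarge the transmit queue
sysctl -w net.core.netdev_max_backlog=30000    # receive side: backlog before the protocol stack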
1.3.3 tcp congestion control
1.3.4 kernel panic
todo(dirlt):
1.4 application
1.4.1 Exit status problem
First look at the following Java program
This Java program is then invoked from Python, which checks the value it returns
The return value is not 1 but 256; the explanation for this is as follows
And for the Python program below, checking with echo $? gives a return value of 0 rather than 256
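A minimal shell sketch of the wait-status behaviour described above: os.system() returns the raw 16-bit status, so an exit code of 1 shows up as 1<<8 = 256, while the shell's $? is already the decoded exit code.
python -c "import os; print(os.system('exit 1'))"                   # prints 256 (exit code 1 in the high byte)
python -c "import os; print(os.WEXITSTATUS(os.system('exit 1')))"   # prints 1
bash -c 'exit 1'; echo $?                                           # prints 1: the shell decodes the status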
1.4.2 dp8 NIC problem
At the time dp8's network traffic dropped from a very large value to a very small one. Checking /proc/net/netstat, the following counters on dp8 differed from other machines by 1-2 orders of magnitude:
Afterwards the following clue turned up in dmesg:
[4804552.008858] bnx2: eth0 NIC Copper Link is Up, 100 Mbps full duplex
The NIC speed on dp8 was recognized as 100 Mbps. The possible causes are as follows:
Our network cables are all supplied by 世xx聯 and should be of decent quality; two situations should be ruled out first.
1.4.3 Changing resource limits
Temporary changes can be made with ulimit; permanent changes can be made by editing /etc/security/limits.conf, for example:
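A minimal sketch (the user name and the value 65536 are only examples):
ulimit -n                  # show the current open-file limit for this shell
ulimit -n 65536            # raise it for the current shell session only
# permanent: add lines like these to /etc/security/limits.conf and log in again
#   dirlt  soft  nofile  65536
#   dirlt  hard  nofile  65536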
1.4.4 CPU temperature too high
I ran into this on an Ubuntu PC; the obvious symptom was that everything got slower. Then the following messages appeared in syslog:
1.4.5 sync hangup
1.4.6 Replacing glibc
@2013-05-23 https://docs.google.com/a/umeng.com/document/d/12dzJ3OhVlrEax3yIdz0k08F8tM8DDQva1wdrD3K49PI/edit We suspected the glibc version was the problem and operated on dp45, but ran into trouble.
My planned sequence of operations was:
But after step 2, cp stopped working, and commands such as ls could no longer be used either. The reason is simple: after step 2 libc.so.6 no longer pointed at a file, and basic commands like cp and ls depend on that shared library.
todo(dirlt): what is the correct way to do it(change glibc)
@2013-08-03
A copy of the C library was found in an unexpected directory | Blog : http://blog.i-al.net/2013/03/a-copy-of-the-c-library-was-found-in-an-unexpected-directory/
The link above describes how to upgrade glibc.
1.4.7 Allowing sudo without a tty
Edit the /etc/sudoers file and comment out the requiretty line (typically "Defaults requiretty").
1.4.8 ssh proxy
http://serverfault.com/questions/37629/how-do-i-do-multihop-scp-transfers
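One common approach, as a sketch (the hosts are placeholders; -W requires a reasonably recent OpenSSH):
# copy through a gateway host without landing the file on the gateway
scp -o ProxyCommand="ssh -W %h:%p user@gateway" bigfile user@target:/tmp/
# or make it permanent in ~/.ssh/config
#   Host target
#       ProxyCommand ssh -W %h:%p user@gateway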
Date: 2014-06-17T10:30+0800