Sunday, March 3, 2013

Load issue related to Memory shortage


When a server's load climbs without any obviously heavy process or an unusual number of connections, the cause may be a shortage of memory. On the server in this example, the load rises suddenly, and when we stop the Apache process it drops back to normal. At first glance Apache looks like the culprit. But stopping MySQL or any other heavy process brings the load down in the same way, so the problem is not tied to one particular process. Next, check the number of connections to the server.

root@server [/home]# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
1 CLOSE_WAIT
16 ESTABLISHED
4 FIN_WAIT2
55 LISTEN
9 SYN_RECV
506 TIME_WAIT
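
If you run this check often, the same tally can be wrapped in a small shell function. This is only a sketch: the name tcp_state_counts is made up for illustration, and it matches lines that start with "tcp" rather than lines that contain it anywhere, which is slightly stricter than the one-liner used here.

```shell
#!/bin/sh
# tcp_state_counts (hypothetical helper name): read `netstat -an`-style
# text on stdin and print one "STATE COUNT" line per TCP state,
# sorted by state name.
tcp_state_counts() {
    # Field 6 of a tcp line in `netstat -an` output is the connection state.
    awk '/^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage on a live server (assumes netstat is installed):
#   netstat -an | tcp_state_counts
```

Reading from stdin means the function can also be run against saved netstat output when comparing the server at different times.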

The netstat output shows a large number of connections in TIME_WAIT and comparatively few in ESTABLISHED. This suggests the server is not able to finish closing connections in a timely manner. Why is this happening? See below.
root@server [/]# top -c
top - 15:14:27 up 8:11, 5 users, load average: 154.59, 78.14, 43.09
Tasks: 536 total, 1 running, 534 sleeping, 1 stopped, 0 zombie
Cpu(s): 6.1% us, 2.1% sy, 0.0% ni, 0.0% id, 91.5% wa, 0.0% hi, 0.3% si
Mem: 2021396k total, 1958896k used, 62500k free, 2524k buffers
Swap: 2008120k total, 1889540k used, 118580k free, 39304k cached

Here you can see that memory is almost completely used up (only 62500k free) and very little is left for buffers (2524k). So the server has started using the swap partition. Normally some swap use is harmless, because swap is needed only briefly and in small amounts when memory runs short for a moment. In the above case, however, about 94% of swap is in use (1889540k of 2008120k). This is not normal.

When this happens, the system has to go to the hard disk, because the swap partition lives on the hard disk, and retrieving data from disk is far slower than retrieving it from memory. The result is heavy I/O wait (91.5% wa): the CPU spends 91.5% of its time idle, waiting for disk I/O to complete, while processes queue up for the disk. This in turn increases the server load. The effect is cumulative: as more and more processes join the queue, the number of tasks grows (Tasks: 536 total) and eventually the server freezes. When a process such as Apache is stopped and started in this state, memory is freed and things run smoothly for some time. But when the server runs short of memory again, it returns to the state described above.
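
The swap percentage can be worked out from the SwapTotal and SwapFree fields that Linux exposes in /proc/meminfo. The function below is a minimal sketch, and the name swap_used_pct is made up for illustration. It reads meminfo-style text on stdin, so it can also be fed the figures from the top output above.

```shell
#!/bin/sh
# swap_used_pct (hypothetical helper name): read /proc/meminfo-style
# text on stdin and print the percentage of swap in use, to one
# decimal place. Prints 0.0 if no swap is configured.
swap_used_pct() {
    awk '$1 == "SwapTotal:" { total = $2 }
         $1 == "SwapFree:"  { free = $2 }
         END {
             if (total > 0) printf "%.1f\n", (total - free) * 100 / total
             else print "0.0"
         }'
}

# Usage on a live Linux server:
#   swap_used_pct < /proc/meminfo
# To watch pages being swapped in and out (the si/so columns) every 5 seconds:
#   vmstat 5
```

Steady non-zero values in the si and so columns of vmstat, together with a high wa figure, would point to the same swapping problem described here.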

In this situation we can recommend a RAM upgrade. The server above was upgraded from 2 GB to 4 GB of RAM. See the difference in performance: the load average is back to around 1 to 2, swap is no longer used, and I/O wait has dropped to 0.2%.

root@server [~]# top -c
top - 16:34:59 up 1:08, 4 users, load average: 1.19, 2.02, 2.01
Tasks: 258 total, 2 running, 256 sleeping, 0 stopped, 0 zombie
Cpu(s): 73.6% us, 4.8% sy, 0.0% ni, 21.0% id, 0.2% wa, 0.0% hi, 0.5% si
Mem: 4081964k total, 3193060k used, 888904k free, 148040k buffers
Swap: 2008120k total, 0k used, 2008120k free, 1077372k cached
