Diagnose Slow Server

Topic locked

#1 Wed, 10/23/2013 - 16:14

webwzrd

I have a CentOS server that seems to be getting quite slow for certain tasks. I don't think this has anything to do with Webmin or Virtualmin, but thought it might be helpful to get feedback from this group.

The server is a Core 2 Quad with 8 GB of RAM. It runs VM-Pro and hosts about 180 virtual servers. For the most part the server performs fine; loads are generally below 1 throughout the day, with occasional jumps.

The slowness is observed while doing backups to the same drive, uploading backups from another server, doing system updates, and rebooting. The server is slower than I remember it being, and I also have another CentOS server to compare against, which has the same 8 GB of RAM but a slower dual-core CPU. I'll call the slow server "server 1" and the second server "server 2".

Server 1 takes 3.25 hours to back up 25 GB to the same drive (the load stays around 3, with maybe a spike to 6); server 2 backs up 45 GB in 1.75 hours. Server 1 takes about 3 times as long to do system updates and to come back online from a reboot after a kernel update. Lastly, it takes about 13 hours to back up server 2 to server 1, while I can back up server 1 to server 2 in about 3 hours.
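To put the two local backup times on the same scale, here is a small sketch that converts them to average throughput. It uses only the sizes and durations quoted above; treating 1 GB as 1024 MB is an assumption, and the function name is just for illustration.

```python
def throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average throughput in MB/s, assuming 1 GB = 1024 MB."""
    return size_gb * 1024 / (hours * 3600)


# Sizes and durations as quoted in the post above.
runs = {
    "server 1, local backup (25 GB in 3.25 h)": (25, 3.25),
    "server 2, local backup (45 GB in 1.75 h)": (45, 1.75),
}

for label, (size_gb, hours) in runs.items():
    print(f"{label}: {throughput_mb_per_s(size_gb, hours):.1f} MB/s")
```

That works out to roughly 2.2 MB/s for server 1 against 7.3 MB/s for server 2, so server 1 is writing its backup at about a third of the rate despite the faster CPU.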

Server 1 has high loads when rebooting, typically hitting 20+ within a couple of minutes of coming online. If Apache needs to restart while 550 or so processes are running, the load will also hit 20+. Server 2 doesn't come anywhere close to this.

This is certainly not an emergency. I'm just hoping for some friendly conversation to point me in the right direction to analyze this and get a handle on it.

#2 Wed, 10/23/2013 - 22:36

andreychek

Well, for backups themselves -- one thing I'd suggest is making sure that they're configured to use gzip, and not bzip2. You can verify that in System Settings -> Virtualmin Config -> Backup and Restore, and there, verify the setting of the "Backup compression format" option.

You might be able to speed them up just a bit by having it pass in the "--fast" option during the backup.

As for all the other issues you're describing though -- the first thought that pops into mind is some sort of disk IO issue. How do the disks here compare to disks in the other systems you're running?

-Eric

#3 Thu, 10/24/2013 - 09:59

webwzrd

Thanks Eric,

gzip was selected and I am now using "--fast". I have submitted some SMART reports to the data center for their review. I'll report back their thoughts.

Brian

#4 Thu, 10/24/2013 - 10:23 (Reply to #3)

webwzrd

The data center said the hard drive is fine. So I'm still looking for the reason why some tasks are so slow and why the load spikes to 20+ when rebooting or restarting Apache.

#5 Thu, 10/24/2013 - 11:12

Locutus

550 processes is quite a lot! You might want to use the utility "atop" to monitor processes and performance of the system.

#6 Thu, 10/24/2013 - 13:59 (Reply to #5)

webwzrd

550 is a lot? That's been pretty normal for midday. If I restart httpd, it will usually drop to the low 300s and over the next several hours work its way up to the mid 500s again.

There are 180 sites, a high percentage of them running CMSs.

#7 Thu, 10/24/2013 - 16:11

Locutus

Okay, you can still use "atop" to monitor and record which processes use how many resources (CPU, RAM, disk I/O, network). :)

#8 Thu, 10/24/2013 - 19:00 (Reply to #7)

webwzrd

I have atop installed, but may not have fully appreciated what it can do for me. Where does it record information?

#9 Fri, 10/25/2013 - 02:07

Locutus

In its default installation it writes logfiles in /var/log/atop.log.*, which you can view with the "atop -r" option. To change the snapshot interval, you need to edit its start script in "/etc/init.d".

#10 Fri, 10/25/2013 - 09:04

webwzrd

atop -r produces:

"/var/log/atop//atop_20131025 - open raw file: No such file or directory"

The directory /var/log/atop exists, but it's empty.

#11 Fri, 10/25/2013 - 11:16

Locutus

For me, the logfiles are named /var/log/atop.log (latest), /var/log/atop.log.1, /var/log/atop.log.2, and so on. So you run atop -r /var/log/atop.log to get the latest recorded data.
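Since the raw log location evidently differs between installs, a quick check of both layouts mentioned in this thread might save some guessing. This is only a sketch: the two paths come from posts #10 and #11, the function name is made up for illustration, and other packages may well use yet another location.

```python
from datetime import date
from pathlib import Path


def atop_log_candidates(log_root: Path, day: date) -> list:
    """Return the atop raw-log paths named in this thread that exist under log_root."""
    candidates = [
        log_root / "atop.log",                     # layout described in post #11
        log_root / "atop" / f"atop_{day:%Y%m%d}",  # layout implied by the error in post #10
    ]
    return [path for path in candidates if path.is_file()]


if __name__ == "__main__":
    found = atop_log_candidates(Path("/var/log"), date.today())
    for path in found:
        print(f"atop -r {path}")
    if not found:
        print("no atop raw log found in either location")
```

If neither file exists, atop is probably installed but not running in recording mode, which would match an empty /var/log/atop directory.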

#12 Fri, 10/25/2013 - 11:51

webwzrd

Thanks,

Looks like mine was set up differently. I have the stock CentOS yum install from the atomic repo. Guess I'll need to read up on it and get it logging properly.

#13 Wed, 10/30/2013 - 07:30

webwzrd

As best I can tell, I'm maxing out my 8 GB of RAM, and I suspect that is causing the server to be slow at times. atop shows 1 GB of swap in use, and the DSK "busy" figure was red a lot while uploading a weekly 45 GB backup from another server, which took 16 hours.
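For cross-checking the swap figure outside of atop, the same number can be derived from /proc/meminfo. A minimal sketch, assuming a Linux system where the SwapTotal and SwapFree fields are reported in kB; the function name is illustrative only.

```python
from pathlib import Path


def swap_used_mb(meminfo_text: str) -> float:
    """Swap in use, in MB, from the SwapTotal and SwapFree lines of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # first field is the amount in kB
    return (values["SwapTotal"] - values["SwapFree"]) / 1024


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        print(f"swap in use: {swap_used_mb(meminfo.read_text()):.0f} MB")
```

Running it a few times during a backup upload would show whether swap use grows while the disk is busy, which would point at memory pressure adding to the disk load.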

I have adjusted httpd.conf to:

StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 50
MaxClients 50
MaxRequestsPerChild 4000

This seems to help a little in lowering the number of processes, but I've still seen them creep up into the mid 500s. When I restart Apache, I tend to drop 150-250 processes and free up 5 GB of RAM.

Obviously moving to another server could be an option, but I would appreciate any fine-tuning tips to help this box run smoother.
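As a rough sanity check on those numbers, the memory freed by a restart can be divided by the number of processes that disappear. This is only back-of-the-envelope arithmetic on the figures above: it assumes all of the freed memory belonged to the processes that went away, which may not hold if some of them are non-httpd children, and the helper names are made up for illustration.

```python
def avg_mb_per_process(freed_gb: float, processes_dropped: int) -> float:
    """Average memory per dropped process in MB, assuming 1 GB = 1024 MB."""
    return freed_gb * 1024 / processes_dropped


def worst_case_mb(max_clients: int, per_child_mb: float) -> float:
    """Memory needed if every allowed child is running at the given size."""
    return max_clients * per_child_mb


# 5 GB freed while dropping 150 to 250 processes, as reported above.
low = avg_mb_per_process(5, 250)
high = avg_mb_per_process(5, 150)
print(f"per process: {low:.1f} to {high:.1f} MB")

# MaxClients 50 from the httpd.conf settings above.
print(f"50 children: {worst_case_mb(50, low):.0f} to {worst_case_mb(50, high):.0f} MB")
```

That gives somewhere between 20 and 34 MB per process. At MaxClients 50 that is well under 2 GB for httpd children alone, so if 5 GB really is being freed, a good share of it is probably held by other processes started under Apache.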

#14 Fri, 01/24/2014 - 14:47

webwzrd

Just wanted to add a quick follow-up to the issue I was having. I finally put together some compelling evidence of the issue and gave it to my datacenter. Here's what they ended up doing...

Your server had encountered the same issues after the cable and port change. I then attempted reloading your networking driver, and the speeds increased back to where they should be.

To hopefully prevent this from happening again, I have updated your e1000e driver to the latest available version. Your connection is still working as expected after this upgrade.
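For anyone wanting to check which network driver and version their own box is running, `ethtool -i` reports it. Below is a small sketch that parses that output; the interface name "eth0" is an assumption, and ethtool has to be installed and on the PATH.

```python
import subprocess


def parse_ethtool_info(output: str) -> dict:
    """Turn the 'key: value' lines printed by `ethtool -i` into a dict."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(interface: str) -> dict:
    """Run `ethtool -i <interface>` and return the parsed fields."""
    result = subprocess.run(
        ["ethtool", "-i", interface],
        capture_output=True, text=True, check=True,
    )
    return parse_ethtool_info(result.stdout)


# Example, with "eth0" as an assumed interface name:
#   info = driver_info("eth0")
#   print(info.get("driver"), info.get("version"))
```

Noting the driver and version before and after an update makes it easier to tell later whether a slowdown lines up with a driver change.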
