load average very high with peaks, server far too slow

#1 Mon, 11/15/2010 - 06:33
lex

Hi, I don't know where to look anymore. Sometimes the load on the server shoots up; just now it went to 40 and stayed there for a while. Then, normally, it drops back to normal levels (2 or 3 or so).

I ran netstat and will attach the output here.

It looks like there are a lot of attempts to connect to dovecot.

If I 'disable' a certain 'server' (domain) on the server, then the load goes back to normal.

At least it seems it does, it's hard to tell really as it's not very predictable when the load will go up or down, so you never know for sure what caused the change.

I've blocked a few more IP addresses via iptables, but I'm not convinced that really does the trick.

My question is, what is the best way to troubleshoot this and make sure the server responds in a normal way, always?

Thanks,

lex

Mon, 11/15/2010 - 07:26
Locutus

What "load" exactly are we talking about here? CPU? Network traffic? Connections per second? What unit exactly is "40" and "2 to 3"?

Mon, 11/15/2010 - 09:13
andreychek

I suspect the units he's referring to are the output of "uptime"... so 40 is pretty high in that case.

When the load spikes, try running "top"... are there some processes that just seem to sit there at the top of the list, consuming a lot of resources?

Also, review the output of "mailq" -- are there a lot of emails in the queue? A lot of emails sitting there could mean someone is making heavy use of a newsletter. It could also mean a spammer figured out how to send email through someone's account :-) So you'd want to make sure the mail in the queue seems legitimate.

What also might be helpful is to run "ps auxw", and attach that output here. That will show a list of running processes, and we might be able to figure out if any seem out of the ordinary.
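For a long process list, it can help to add up the usage per program instead of reading every row. Here is a minimal Python sketch, assuming the standard "ps auxw" column layout (USER, PID, %CPU, %MEM, ..., COMMAND). The sample rows and the function names are made up for illustration, not taken from the attached output.

```python
from collections import defaultdict

# Hypothetical sample in the usual `ps auxw` layout (not the real attachment).
SAMPLE_PS = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
www-data  1201  4.0  1.5  30000 15000 ?        S    05:10   0:02 /usr/sbin/apache2 -k start
www-data  1202  6.0  1.5  30000 15000 ?        S    05:10   0:03 /usr/sbin/apache2 -k start
dovecot   1300  0.5  0.1   3000  1000 ?        S    05:11   0:00 imap-login
"""


def summarize_ps(ps_output):
    """Sum process count, %CPU and %MEM per program name."""
    totals = defaultdict(lambda: {"count": 0, "cpu": 0.0, "mem": 0.0})
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # the 11th column (COMMAND) may contain spaces
        if len(fields) < 11:
            continue
        # Keep only the program name: drop arguments and any leading path.
        name = fields[10].split()[0].rsplit("/", 1)[-1]
        entry = totals[name]
        entry["count"] += 1
        entry["cpu"] += float(fields[2])
        entry["mem"] += float(fields[3])
    return dict(totals)


def busiest(totals, limit=5):
    """Return the programs with the highest summed %CPU first."""
    return sorted(totals.items(), key=lambda kv: kv[1]["cpu"], reverse=True)[:limit]


for name, t in busiest(summarize_ps(SAMPLE_PS)):
    print(f"{name:12} procs={t['count']:3d} cpu={t['cpu']:.1f}% mem={t['mem']:.1f}%")
```

To use it on a real system, save the output of "ps auxw" to a file during a load peak and pass the file's contents to summarize_ps.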

-Eric

Wed, 11/17/2010 - 05:18 (Reply to #3)
lex

Hi and thanks for all your answers!

It's with 'top' that I see that the load is high, and mostly a lot of 'apache2' processes show up, more than usual.

I did as suggested, and will attach both ps auxw and mailq during a high peak just now. (however, I did so as well in my first post in this thread with netstat, but only see the file when I edit that message, not here in the thread... Might be me, but just wondering if you people can see the files I attach here.)

Wed, 11/17/2010 - 05:22 (Reply to #4)
lex

I now see the attachments here, but not the one in the thread starter (but I do see it when I click the 'edit' button). So I'll attach the netstat one here too.

Lots of imap and dovecot

If it's a spam thing, how do I find out where it all happens and how to stop it?

These 'peaks' normally last a few minutes and then it all goes back to normal somehow.

Mon, 11/15/2010 - 10:37
Locutus

Oh hell, yeah, I just read up on the meaning of the "load average" output. As far as I understand it, a value of "40" kinda means that the system was overloaded by a factor of about 40. :-) (On a single-CPU machine, "1" would mean that the CPU was busy working on one process all the time, while "2" means that while the CPU was loaded, another process was waiting to get CPU cycles all the time, and so on.)
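To make that arithmetic concrete, here is a small Python sketch that divides a load average by the number of CPUs. The thresholds (1.0 and 2.0 per core) and the function names are assumptions chosen for illustration, not an official rule. Note also that on Linux the load average counts processes waiting on disk I/O as well, so a high value does not always mean the CPU itself is the bottleneck.

```python
import os


def load_per_core(load_avg, cpu_count):
    """Load average divided by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def describe_load(load_avg, cpu_count):
    """Rough label for a load average; the cut-offs are illustrative assumptions."""
    ratio = load_per_core(load_avg, cpu_count)
    if ratio <= 1.0:
        return "ok"
    if ratio <= 2.0:
        return "busy"
    return "overloaded"


def current_status():
    """Label the live 1-minute load average (Unix-like systems only)."""
    one_min, _five_min, _fifteen_min = os.getloadavg()
    return describe_load(one_min, os.cpu_count() or 1)


print(describe_load(40.0, 1))  # overloaded
print(describe_load(3.0, 4))   # ok
```

So a load of 40 on a single-CPU box is far beyond what it can handle, while a load of 3 on a four-core machine would be unremarkable.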

And I agree, check the process list during "overload" times. atop is a nice alternative to top too, which has a somewhat better-arranged output, since it can be configured to only show processes that produce a notable load.

Some hints about atop: Press "i" to set a new refresh interval. "a" toggles between "show all processes" and "show only those with load". "t" triggers a refresh manually. "?" shows a help screen.

Wed, 11/17/2010 - 08:31
andreychek

Okay, so it looks like you have 800+ emails in your mail queue. I'd classify that as quite a bit :-)

The next step is to determine if they're legitimate or not. To do that, you can log into Virtualmin, and go into Webmin -> Servers -> Postfix -> Mail Queue, and click on some of the messages.

Does the message appear to be a newsletter or mailing from one of your users? Or does it look "spammy"?

-Eric

Wed, 11/17/2010 - 08:33
Locutus

There are a lot of connections to your web server. You might check your Apache logs for details on which URLs were requested.
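One way to get that overview is to count requests per URL and per client address. The Python sketch below assumes the access log uses Apache's common/combined log format. The sample lines use documentation-range IP addresses and invented paths, so they only illustrate the shape of the input.

```python
import re
from collections import Counter

# Matches the leading part of a common/combined log format line.
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical sample lines (not taken from the server in this thread).
SAMPLE_LOG = """\
203.0.113.7 - - [17/Nov/2010:05:18:01 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
203.0.113.7 - - [17/Nov/2010:05:18:02 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
198.51.100.9 - - [17/Nov/2010:05:18:02 +0000] "POST /contact.php HTTP/1.1" 200 312 "-" "Mozilla/5.0"
"""


def tally_requests(log_text):
    """Count requests per URL path and per client address."""
    paths, clients = Counter(), Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in some other format
        paths[match.group("path")] += 1
        clients[match.group("client")] += 1
    return paths, clients


paths, clients = tally_requests(SAMPLE_LOG)
print(paths.most_common(5))
print(clients.most_common(5))
```

A single script (for example a contact form) or a single client dominating the counts during a load peak would be a good place to look next.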

Wed, 11/17/2010 - 10:20
ronald

From reading the thread, my conclusion is that a user sends a newsletter and the recipients visit his/her website.
Whether that conclusion is right remains to be seen.

If I send out a newsletter to 4500+ recipients, it is not even noticeable.
What specs does the server have?

Wed, 11/17/2010 - 14:23
Locutus

Indeed, a mailserver should not cause such a noticeable CPU load when sending out mails. The ps output implies that it's Apache causing the high CPU and memory load. It seems like a lot of people are requesting web pages at the same time, which would also explain the high number of connections to port 80.

Hint: The atop application, in mode "p", will show all resources used by one application accumulated over all processes.

Interesting is the high count of Dovecot processes doing "pop3-login" and "imap-login". How many users does the server have that might concurrently be reading their mail? :)

Wed, 11/17/2010 - 17:36 (Reply to #10)
lex

mail: there are only a few sites on the server and they don't use mail as such at all. Well, newsletters are sent out daily by 3 of the 5 sites: at 9:00 in the morning (2) and one at 15:00. But that's all. But the site owners don't use the domain for receiving mail. So yes, those pop3 logins and imap logins are worrying.

The server is old(ish), and I'm getting a new one. However, I don't think it's (just) that, as it's not newsletter related. If I send out a newsletter from my own domain on the server to a bit over 10,000 people (all personalized), then the load goes up, but nowhere near the levels I'm showing in these examples.

I'll check the apache logs

I'll check the mail too

Thanks people, your help is appreciated a lot!

Wed, 11/17/2010 - 19:04
Locutus

Actually, your netstat shows only three direct connections to IMAP, and none to POP3. Are you maybe running webmail via your Apache? That might explain the large number of Apache connections in combination with the Dovecot processes.

If the problems with mail logins persist, you might turn on the Linux firewall and have it record a log of connections, in addition to what the other servers themselves log.

Thu, 11/18/2010 - 04:39
lex

Thanks for your post. I'll check if I can find info on how to see whether webmail is up and running and how to enable a Linux firewall.

Thu, 11/18/2010 - 05:09
lex

I've got these servers up and running:

Apache Webserver
BIND DNS Server
MySQL Database Server
Postfix Mail Server
ProFTPD Server
Procmail Mail Filter
Read User Mail
SSH Server
SpamAssassin Mail Filter
Virtualmin Virtual Servers (GPL)
Webalizer Logfile Analysis

Didn't see anything in postfix about some kind of webmail, so looking at this, I'd say I have no webmail up and running. But then, I might be totally wrong.

At SMTP Client Restriction, Postfix is 'allowing all clients'. Can that have anything to do with this?

About Linux Firewall: I've got iptables up and running. I'll check about logging.

Thu, 11/18/2010 - 07:16 (Reply to #14)
Locutus

"Didn't see anything in postfix about some kind of webmail"

Webmail isn't something you configure (or see) in Postfix actually. :) It's basically web code - based on PHP or Ruby or similar - that connects, as if it was a normal mail client, to the mail server via IMAP.

"At SMTP Client Restriction, Postfix is 'allowing all clients'. Can that have anything to do with this?"

Should not. That setting can be used to early-reject clients e.g. by IP range, or by DNS Blacklist or similar.

"About Linux Firewall: I've got iptables up and running. I'll check about logging."

Goodies. Check the Webmin iptables module, you can quite easily add a rule to log stuff there. Make sure to log only packets with the TCP SYN flag set, otherwise it will log ALL traffic (as in all packets), and not just those that initiate a new connection, which will quite probably flood your logfile.

Thu, 11/18/2010 - 09:42 (Reply to #15)
lex

Just a quicky:

If I didn't install the iptables module in Webmin, but just installed iptables on the server (using ssh), can I add the module to Webmin, and will it pick up what's on the server already or will it mess things up?

Thu, 11/18/2010 - 10:09
andreychek

You should be able to set up firewalling in Webmin without an additional module... just take a peek in Webmin -> Networking -> Linux Firewall.

As an aside, if you haven't already, I'd definitely take a look at some of those messages in your mail queue... if those are spam of some sort, you'd want to clear them out. You can do that from Webmin -> Servers -> Postfix -> Mail Queue.

If they're spam, that may mean that a spammer is sending those through a security vulnerability in one of your web apps. And in that case, a firewall isn't likely to help your issue, since all a firewall would see is incoming web traffic.

The key would be to determine what web app is the culprit, and fix the security issue.

When looking at messages in your mail queue, click the "View all headers" option on the right, and look for the "Received" header at the top. It should say what userid it was received from, if it was generated from your server. That should help you track down the culprit.

-Eric

Thu, 11/18/2010 - 11:12
Locutus

I agree with Eric there... Make sure no spammer can abuse your server. Such a thing can result in quite some trouble if it persists.

The firewall suggestion was rather meant to log connections to syslog (which is one of iptables' features) and find out which IP addresses connect to where, to have something to work with.
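Once such a log exists, the entries can be tallied per source address and destination port. The Python sketch below assumes the kernel's usual iptables LOG line layout (KEY=value pairs such as SRC= and DPT=) and a log prefix of "NEW-CONN: ". That prefix, the rule in the comment, and the sample lines are assumptions for illustration only.

```python
import re
from collections import Counter

# Assumed logging rule (the prefix is a hypothetical choice):
#   iptables -A INPUT -p tcp --syn -j LOG --log-prefix "NEW-CONN: "
FIELD = re.compile(r"(\w+)=(\S*)")

# Hypothetical syslog lines in the kernel's iptables LOG layout.
SAMPLE_SYSLOG = """\
Nov 18 11:20:01 server2 kernel: NEW-CONN: IN=eth0 OUT= MAC=00:11:22:33:44:55:66:77:88:99:aa:bb:08:00 SRC=203.0.113.5 DST=192.0.2.10 LEN=60 TOS=0x00 PREC=0x00 TTL=52 ID=4711 DF PROTO=TCP SPT=40000 DPT=143 WINDOW=5840 RES=0x00 SYN URGP=0
Nov 18 11:20:02 server2 kernel: NEW-CONN: IN=eth0 OUT= MAC=00:11:22:33:44:55:66:77:88:99:aa:bb:08:00 SRC=203.0.113.5 DST=192.0.2.10 LEN=60 TOS=0x00 PREC=0x00 TTL=52 ID=4712 DF PROTO=TCP SPT=40001 DPT=143 WINDOW=5840 RES=0x00 SYN URGP=0
Nov 18 11:20:03 server2 kernel: NEW-CONN: IN=eth0 OUT= MAC=00:11:22:33:44:55:66:77:88:99:aa:bb:08:00 SRC=198.51.100.9 DST=192.0.2.10 LEN=60 TOS=0x00 PREC=0x00 TTL=50 ID=991 DF PROTO=TCP SPT=51000 DPT=80 WINDOW=5840 RES=0x00 SYN URGP=0
"""


def count_connections(syslog_text, prefix="NEW-CONN: "):
    """Count logged connection attempts per (source IP, destination port)."""
    counts = Counter()
    for line in syslog_text.splitlines():
        if prefix not in line:
            continue  # not one of our firewall log entries
        fields = dict(FIELD.findall(line))
        if "SRC" in fields and "DPT" in fields:
            counts[(fields["SRC"], int(fields["DPT"]))] += 1
    return counts


for (src, port), hits in count_connections(SAMPLE_SYSLOG).most_common(10):
    print(f"{src:15} -> port {port:5d}: {hits} new connections")
```

Ports 110, 143, 993 and 995 in the result would point at POP3/IMAP clients; port 80 would point at web traffic.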

Thu, 11/18/2010 - 18:36 (Reply to #18)
lex

Hmm, let's see if I can explain what I've just seen.

So, a lot of these messages are messages of my server, saying

"This is the mail system at host server2.penghost.co.uk.

I'm sorry to have to inform you that your message could not be delivered to one or more recipients. It's attached below"

The original (attached) message is a, wait, example:

Received: from SQANHQY (unknown [221.207.145.66])
    by server2.penghost.co.uk (Postfix) with ESMTP id 6083644F2
    for ooi@deining.org; Wed, 17 Nov 2010 05:29:20 +0000 (GMT)
Received: from [221.207.145.66] (port=2479 helo=039)
    by smtp.secureserver.net with asmtp id 78851A-0009E4-01
    for ooi@deining.org; Wed, 17 Nov 2010 13:18:43 +0800
Message-ID: <1173CF2BB0AC47DA8C780D89A765B3D8@039>
From: "Millie Proctor" currentsv70@stopbeingatool.com
To: ooi@deining.org

Now, 'deining.org' is only a parked domain really, pointing people to some group site somewhere (just like ning). Anyway, I checked the site and there's nothing there to abuse really. (It's not in virtualmin yet, maybe I should import it into virtua

In fact: mail isn't used at all for that domain so I should 'just' switch it off somewhere really.

Others are like this:

"This is the mail system at host server2.penghost.co.uk.

I'm sorry to have to inform you that your message could not be delivered to one or more recipients. It's attached below.

For further assistance, please send mail to postmaster."

attached e-mail:

Received: by server2.penghost.co.uk (Postfix)
    id E1295429F; Wed, 17 Nov 2010 10:55:40 +0000 (GMT)
Delivered-To: gci@server2.penghost.co.uk
Received: from pc200912031808 (125-230-124-118.dynamic.hinet.net [125.230.124.118])
    by server2.penghost.co.uk (Postfix) with SMTP id 3A6F74279
    for lex@gran-canaria-info.com; Wed, 17 Nov 2010 10:55:39 +0000 (GMT)
Received: (qmail 9564 by uid 564); Wed, 17 Nov 2010 18:44:39 -0800
From: "Free ViagraAndCialis" nagoya3P956@wixgame.com
To: lex@gran-canaria-info.com

The 'to' one, that would be me.

So I guess, if I could switch off the server2.penghost.co.uk sending those "your message could not be delivered" messages, that would help already a bit, no?

Fri, 11/19/2010 - 04:05
Locutus

So those mails in your queue are basically "undeliverable" replies from the mail-daemon due to nonexistent local email addresses? The "From" lines in the bounces look like it was an attempt to send spam to them. :)

It's a bit odd. Delivery attempts to unknown local addresses should be denied with a 550 error code during delivery, and not trigger a bounce-mail. You might want to check your /var/log/mail.log at the time when such a mail comes in, there might be hints there why that happens.
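To get a first overview of the log, the delivery lines can be grouped by their status= value (sent, deferred, bounced) and recipient domain. Below is a minimal Python sketch, assuming the usual Postfix delivery line layout with to=, relay= and status= fields. The first sample line is shortened from the log excerpt posted in this thread; the other two, with their queue IDs and addresses, are invented for illustration.

```python
import re
from collections import Counter

# Matches Postfix delivery lines; angle brackets around the address are optional.
DELIVERY = re.compile(
    r"postfix/(?P<service>\w+)\[\d+\]: (?P<queue_id>[0-9A-F]+): "
    r"to=<?(?P<rcpt>[^>,\s]+)>?,.*?relay=(?P<relay>[^,]+),.*?status=(?P<status>\w+)"
)

# Line 1 is shortened from this thread; lines 2 and 3 are hypothetical.
SAMPLE_MAIL_LOG = """\
Nov 22 07:15:19 server2 postfix/smtp[18428]: 01F35400B: to=<mapplethorpet7@1888bestbuy.com>, relay=mailstore1.secureserver.net[216.69.186.201]:25, delay=80394, delays=80392/0.08/2.1/0, dsn=4.0.0, status=deferred (host refused to talk to me)
Nov 22 07:16:02 server2 postfix/local[18500]: A1B2C3D4E: to=<nobody@example.org>, relay=local, delay=0.1, delays=0.05/0/0/0.05, dsn=5.1.1, status=bounced (unknown user: "nobody")
Nov 22 07:17:10 server2 postfix/smtp[18510]: F00BA4123: to=<user@example.com>, relay=mx.example.com[192.0.2.25]:25, delay=1.2, delays=0.1/0/0.5/0.6, dsn=2.0.0, status=sent (250 OK)
"""


def tally_deliveries(log_text):
    """Count delivery attempts per status and per (status, recipient domain)."""
    by_status, by_domain = Counter(), Counter()
    for line in log_text.splitlines():
        match = DELIVERY.search(line)
        if not match:
            continue  # connect/disconnect, cleanup and other non-delivery lines
        status = match.group("status")
        domain = match.group("rcpt").rsplit("@", 1)[-1].lower()
        by_status[status] += 1
        by_domain[(status, domain)] += 1
    return by_status, by_domain


by_status, by_domain = tally_deliveries(SAMPLE_MAIL_LOG)
print(by_status.most_common())
print(by_domain.most_common(10))
```

Lines with status=bounced are the ones that produce the non-delivery notices; following their queue ID back through the log shows where the original message came in.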

Fri, 11/19/2010 - 06:22 (Reply to #20)
lex

So it's not a setting somewhere that I've set wrong? (One that can easily be set back to how it's supposed to be ;)

Fri, 11/19/2010 - 06:49
Locutus

I don't know, that's why I suggested checking the logfile. :) Without further information about why those non-deliverable mails are sent, it's hard to say.

Mon, 11/22/2010 - 15:58 (Reply to #22)
lex

Well, I've been looking at the log file, but to be honest, I don't know what I'm supposed to be looking for, and it all looks suspicious to me...

If somebody cares to have a look, I've put the last 1000 lines in an (attached) text file.

What I will do now is look at the mailq again and see if I can match it with something in the log.

I'll be back!

Wed, 12/01/2010 - 08:53 (Reply to #23)
lex

Hi Locutus,

I've attached a bit of mail log in this thread, would you mind having a look?

Thanks!

Wed, 12/01/2010 - 17:56 (Reply to #24)
Locutus

Well, that log indeed looks quite "severe". It seems your server is trying to send out mail to random addresses all over the world. The question is whether those are falsely generated non-delivery reports, or whether your server is being abused by a spammer. It's hard to say anything in particular there.

As a first step I'd clean up by deleting the whole outgoing queue, and then watch the logs closely as mail traffic continues. You might post a new log when such a problem occurs again, from the "very beginning", as in from the mail delivery that initially triggers it.

Thu, 12/02/2010 - 04:10 (Reply to #25)
lex

Ok, thanks Locutus, I'll try this.

Mon, 11/22/2010 - 16:04
lex

Well:

mail.log:

Nov 22 07:15:19 server2 postfix/smtp[18428]: 01F35400B: to=mapplethorpet7@1888bestbuy.com, relay=mailstore1.secureserver.net[216.69.186.201]:25, delay=80394, delays=80392/0.08/2.1/0, dsn=4.0.0, status=deferred (host mailstore1.secureserver.net[216.69.186.201] refused to talk to me: 554-m1pismtp01-021.prod.mesa1.secureserver.net 554 Your access to this mail system has been rejected due to spam or virus content. If you believe that this failure is in error, please submit an unblock request at http://unblock.secureserver.net)

(I got this message in the log every hour or so)

mailq:

01F35400B 2010/11/21 08:55 MAILER-DAEMON mapplethorpet7@1888bestbuy.com 5.20 kB host mailstore1.secureserver.net[216.69.186.201] refused to talk to me: 554-m1pismtp01-008.prod.mesa1.secureserver.net 554 Your access to this mail system has been rejected due to spam or virus content. If you believe that this failure is in error, please submit an unblock request at http://unblock.secureserver.net

Tue, 11/23/2010 - 09:05
andreychek

Okay, I'm not sure what caused that, but I think I fixed your two posts.

I already contacted Joe to see if we can prevent that from happening in the future :-)

-Eric

Tue, 11/23/2010 - 13:22
lex

Thanks for that Eric, now let's see if someone sees anything useful in the mail logs...

Wed, 12/01/2010 - 22:49
08a4210

Wow, this forum support is great. Some guys especially are helping a lot.
