Submitted by responsiveny on Tue, 02/24/2015 - 23:20
Hi,
We have a new CentOS 7 x64 Virtualmin server running on an AWS m3.medium instance. Memory has not been a problem. It has 4GB and we never see it go into swap, with usage usually hovering around 1.9GB.
Today I noticed all sites were down and the mysqld service stopped due to oom-killer.
I would be interested to know why this would happen. Is this something you've seen before? This server used to run on CentOS 5 for years on 4GB and this never happened once. I am regretting going to CentOS 7 a bit now. Should we consider building a new CentOS 6.5 server instead?
As I type this webmin was just oom-killed...
Status:
Active
Comments
Submitted by andreychek on Tue, 02/24/2015 - 23:24 Comment #1
Howdy -- we haven't heard of stability problems with CentOS 7.
If you're seeing processes being killed by oom-killer, it sounds like there may be some sort of resource issue that's occurring.
What is the output of this command:
free -m
Also, I'm curious if Apache is throwing any errors that would indicate heavy traffic... what is the output of this command:
tail -30 /var/log/httpd/error_log
Submitted by responsiveny on Tue, 02/24/2015 - 23:30 Comment #2
Thanks for the help, as usual. Since I'm going to be sending logs, can you make this private?
Submitted by responsiveny on Tue, 02/24/2015 - 23:32 Comment #3
By the way, looking through /var/log/messages I see that quite a few processes have been oom-killed, including mysqld. But that was never noticed because it seems like it somehow restarted itself? Is there some mechanism for mysqld to restart if it's oom-killed?
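For anyone wanting to tally these events, here is a minimal sketch that counts oom-killer victims per process name. It assumes the kernel logs lines containing "Killed process PID (name)" to /var/log/messages, which is the usual format but may differ between kernel versions. The function name oom_kills_by_process is hypothetical.

```shell
# Count OOM kills per process name in a syslog-style file.
# Usage: oom_kills_by_process [logfile]   (defaults to /var/log/messages)
# Assumes kernel lines of the form "... Killed process 1234 (mysqld) ..."
oom_kills_by_process() {
    grep 'Killed process' "${1:-/var/log/messages}" |
        sed -n 's/.*Killed process [0-9]* (\([^)]*\)).*/\1/p' |
        sort | uniq -c | sort -rn
}
```

Running it with no argument reads /var/log/messages (root is normally required); the output is one "count name" line per process, most frequently killed first.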
Submitted by responsiveny on Tue, 02/24/2015 - 23:42 Comment #4
Ah, I just investigated and found that mysqld_safe will restart mysqld if it goes down. Except that one time it didn't for some reason... So that explains why I never noticed it...
Submitted by andreychek on Tue, 02/24/2015 - 23:51 Comment #5
You may want to find a way to monitor the memory usage over time... it really does look like you're seeing a number of resource-related problems. Apache shows some issues with FCGID not being able to spawn due to low memory.
I'm wondering if you're seeing occasional memory spikes of some kind. If that's true, you might want to monitor both the process list and the memory usage.
Also, I just wanted to verify -- which kernel are you using? You can determine that with the command "uname -a".
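One way to watch memory over time is to log a timestamped sample at regular intervals. The sketch below is only an illustration: mem_sample is a hypothetical helper, and it approximates "used" as MemTotal - MemFree - Buffers - Cached from /proc/meminfo, similar to the "-/+ buffers/cache" row of older versions of free.

```shell
# Print one timestamped line of memory figures (in MB) from a meminfo-style file.
# Usage: mem_sample [meminfo_file]   (defaults to /proc/meminfo)
mem_sample() {
    awk -v ts="$(date '+%Y-%m-%dT%H:%M:%S')" '
        /^MemTotal:/  { total = $2 }
        /^MemFree:/   { mem_free = $2 }
        /^Buffers:/   { buffers = $2 }
        /^Cached:/    { cached = $2 }
        /^SwapTotal:/ { swap_total = $2 }
        /^SwapFree:/  { swap_free = $2 }
        END {
            # Approximation: treat buffers and page cache as reclaimable, not "used".
            used = total - mem_free - buffers - cached
            printf "%s used_mb=%d free_mb=%d swap_used_mb=%d\n",
                ts, used / 1024, mem_free / 1024, (swap_total - swap_free) / 1024
        }' "${1:-/proc/meminfo}"
}
```

Calling mem_sample once a minute from a loop or cron job and appending the output to a log file of your choosing would show whether usage climbs steadily or spikes shortly before each oom-kill.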
Submitted by responsiveny on Tue, 02/24/2015 - 23:53 Comment #6
I'll be keeping my eye on it
Submitted by responsiveny on Wed, 02/25/2015 - 00:15 Comment #7
Ah, I see SMTP (postfix) is using a very high amount of memory compared to what it usually uses. It's using 250MB and consuming a lot of the CPU. On the old server it averaged only a few MB, like 2-3 MB.
Any idea what could be causing so many resources to be devoted to SMTP? This server does not allow inbound email on port 25, only outbound.
Submitted by responsiveny on Wed, 02/25/2015 - 00:16 Comment #8
...and 60 instances of SMTP running... hmmm
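To see at a glance which commands hold the most memory and how many instances of each are running, the per-process figures from ps can be summed by command name. This is a rough sketch: rss_by_command is a hypothetical helper, RSS counts shared pages once per process and so overstates the total for forked daemons, and command names containing spaces are truncated to their first word.

```shell
# Sum resident memory (RSS) and count instances per command name.
# Reads "RSS_KB COMMAND" lines on stdin and prints "TOTAL_MB COUNT COMMAND",
# largest total first.
rss_by_command() {
    awk '{ rss[$2] += $1; count[$2]++ }
         END { for (c in rss) printf "%d %d %s\n", rss[c] / 1024, count[c], c }' |
        sort -rn
}

# Typical use on a live system (procps ps):
#   ps -e -o rss= -o comm= | rss_by_command | head
```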
Submitted by andreychek on Wed, 02/25/2015 - 00:30 Comment #9
You may want to review the mail queue to see if there are any outgoing messages. You can see it in Webmin - Servers - Postfix - Mail Queue.
Also, you could look at the mail logs for clues as to what postfix is doing.
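The queue can also be inspected from the command line with postqueue -p. As a sketch, the hypothetical helper below counts queued messages per sender, assuming the standard listing layout in which each entry starts with a queue ID and size and ends with the sender address.

```shell
# Count queued messages per sender from "postqueue -p" output on stdin.
# Entry lines start with a queue ID (optionally followed by * or !) and a size;
# the sender is the last field on that line.
queue_by_sender() {
    awk '/^[0-9A-Za-z]+[*!]?[ \t]+[0-9]+[ \t]/ { print $NF }' |
        sort | uniq -c | sort -rn
}

# Typical use on a live system:
#   postqueue -p | queue_by_sender | head
```

A single sender (for example a web application's user) dominating the list would suggest a script or compromised form generating the outbound mail.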