I haven't made any modifications to my server in months. Today I tried to go to Virtualmin but my browser said it couldn't establish a connection on that port. I ssh'd in and everything seems normal when I run top (I'm no expert, but I see no unusual memory usage). When I run "/etc/init.d/webmin start", the Virtualmin page pulls up for a moment in my browser. As soon as I try to log in, the connection is reset in my browser. There are no errors on the command line, except that when I try to restart Webmin it lets me know it's not running before starting it. What should I try to get it up and staying up like usual? I thought about rebooting the server, but I don't want to have any issues on reboot that would cause other services like email and websites to go down (if my Webmin install is in need of help).
Might be a memory or resource issue, killing the Webmin process. You might want to check your syslog for related messages; /etc/webmin/miniserv.log might also contain useful information.
Check free for available memory. If you're on an OpenVZ VPS, check /proc/user_beancounters.
It would appear email spam is up quite a bit today too, perhaps coincidentally?
@Locutus There is no miniserv.log to be found in that location, and I am not on OpenVZ VPS
I'm sorry, stupid me. :) The log is at /var/webmin/miniserv.log of course.
The increased spam mails can possibly cause more resources to be used, e.g. if you're using SpamAssassin in standalone mode (which will spawn an SA process for each mail).
Please check the other things I mentioned (syslog, free). You can also use the tool atop, which records historical performance data like which process uses how much memory/CPU etc., to find potential resource leaks.
@Locutus I think we may be off on the wrong track suspecting memory usage, as running "top" shows nothing unusual, and I checked my host's admin panel (at Media Temple) and looked at my VPS memory and CPU usage live, today, over the week, and over the month, and there are no spikes at all. The server and all services appear to be running normally other than the fact Virtualmin -- out of nowhere -- no longer pulls up (not for more than a moment, anyway!) One user of three (this is a small server!) reported a surge of spam emails, but the other accounts have been fine (all use SSL and very strong passwords).
@Locutus I have found the miniserv.log but there are no unusual or very recent entries. I can't recall which mode I have SpamAssassin running in... it's the mode that takes the least virtual server memory. To check "free" do I just type "free", like running "top"? And can you point me in the direction of my syslog? (I'm accustomed to using Virtualmin and Webmin for most tasks, and although I can at least ssh and su in as root, I'm a bit out of my depth after that.) The logwatch (which I have set to detailed reporting) this morning showed nothing unusual other than 59 emails delivered to "root", which usually receives no emails unless something is wrong. Unfortunately, I only know how to check those through Webmin (via the Read User Mail server), and of course I still can't pull up Webmin :( I was tempted earlier to reboot the server via Media Temple's VPS control panel, but I don't want to have any Webmin-related bootup issues that take existing services (email/websites) offline.
Ah, something new in my troubleshooting! Spam Assassin is not running at all, I can see in my email headers that it has stopped along with Webmin/Virtualmin. So that explains the sudden influx of spam to one email account. I know "/etc/init.d/webmin start" works to try to start webmin (though it is not working in my case, or at least, webmin isn't staying up more than a split second or two). I'm not sure what command to use to try to restart Spam Assassin though (even after some research on the web)? I believe it is operating as a separate process, maybe spamc or spamd? Of course the main issue is still the lack of Webmin/Virtualmin all of a sudden, but Spam Assassin is a pretty important process too, and it's odd they're both down together, while email and sites continue to run as normal...
There might be resource issues even though you don't see them in free, i.e. if processes have already been killed. But that's just a guess of course. It's certainly not normal that Webmin and SpamAssassin simply stop running.
The syslog is usually located in /var/log/syslog or /var/log/messages, depending on your distribution. Check that first and look for crash or OOM messages.
Also check those 59 emails sent to root. They should be located in /root/Maildir/new or /root/Maildir/cur.
I'll check the root emails in just a sec, and the logs, but quickly, the Logwatch this morning looked a lot stranger, with ClamAV in trouble:
--------------------- clam-update Begin ------------------------
The ClamAV update process was started 1 time(s)
Last ClamAV update process started at Thu Dec 26 03:31:06 2013
Last Status:
main.cvd is up to date (version: 55, sigs: 2424225, f-level: 60, builder: neo)
Downloading daily-18284.cdiff [100%]
Downloading daily-18285.cdiff [100%]
Downloading daily-18286.cdiff [100%]
Downloading daily-18287.cdiff [100%]
WARNING: [LibClamAV] mpool_malloc(): Can't allocate memory (262144 bytes).
WARNING: [LibClamAV] cli_mpool_strdup(): Can't allocate memory (24 bytes).
WARNING: [LibClamAV] cli_loadhash: Problem parsing database at line 52176
WARNING: [LibClamAV] Can't load daily.mdb: Malformed database
WARNING: [LibClamAV] cli_tgzload: Can't load daily.mdb
WARNING: [LibClamAV] Can't load /var/lib/clamav/clamav-ba720c437667db49a41f36fdea54b7d8.tmp/clamav-00990949e1715fc6913f79e25927592d.cld: Malformed database
ERROR: Failed to load new database: Malformed database
ERROR: During database load : ERROR: Failed to load new database: Malformed database
WARNING: Database load exited with status 55
ERROR: Failed to load new database
The following ERRORS and/or WARNINGS were detected when
running the ClamAV update process. If these ERRORS and/or
WARNINGS do not show up in the "Last Status" section above,
then their underlying cause has probably been corrected.
ERRORS:
During database load : ERROR: Failed to load new database: Malformed database: 1 Time(s)
Failed to load new database: 1 Time(s)
Failed to load new database: Malformed database: 1 Time(s)
WARNINGS:
[LibClamAV] Can't load /var/lib/clamav/clamav-ba720c437667db49a41f36fdea54b7d8.tmp/clamav-00990949e1715fc6913f79e25927592d.cld: Malformed database: 1 Time(s)
[LibClamAV] cli_mpool_strdup(): Can't allocate memory (24 bytes).: 1 Time(s)
[LibClamAV] mpool_malloc(): Can't allocate memory (262144 bytes).: 1 Time(s)
[LibClamAV] cli_tgzload: Can't load daily.mdb: 1 Time(s)
[LibClamAV] cli_loadhash: Problem parsing database at line 52176: 1 Time(s)
Database load exited with status 55: 1 Time(s)
[LibClamAV] Can't load daily.mdb: Malformed database: 1 Time(s)
---------------------- clam-update End -------------------------
--------------------- Clamav Begin ------------------------
Daemon check list:
Database status OK: 144 Time(s)
---------------------- Clamav End -------------------------
By the way, I'm running the latest CentOS 6.
So that bit I just pasted above about ClamAV I also found in my system log, but today's entries show ClamAV is fine again, and loaded its database ok. I see nothing else in /var/log/messages other than these ClamAV entries over the last week, all of which looked normal except the one pasted above (the second-to-last one in the main system log for Dec. 26). Oddly, I'm only seeing ClamAV entries in this system messages log, at least over the last week, but perhaps that's normal.
I will look at those root emails next, as I appeared to get 124 of them overnight in addition to the other 59!
So the first system message:
postfix::is_postfix_running failed : Failed to query Postfix config command to get the current value of parameter process_id_directory: at ../web-lib-funcs.pl line 1376.
Actually it's looking like all 59 + 124 messages are along those lines, though I'm just checking a few randomly right now.
Should I try just rebooting the server via Media Temple's control panel? It's been running well for a while now (it'd actually been 90 days of uptime or so when I last looked at it the other week and ran some backups for the end of the month... I've been running Virtualmin/Webmin very happily for over a year now, with the server updating itself, and it gets very little usage, just a couple of up-to-date Wordpress sites, some static sites, and a bit of email). I only hesitate to reboot as I can at least SSH in right now, and I'd hate for something to go terribly wrong and then wish I had spent more time troubleshooting while I still had a way in!
I just typed "free" as well (and all those root emails do seem to be about Postfix):
             total       used       free     shared    buffers     cached
Mem:       3774872     930212    2844660          0          0     318108
-/+ buffers/cache:     612104    3162768
Swap:            0          0          0
Please enclose all screen listings in [code] [/code] tags, otherwise monospace font and linebreaks are lost, making it unreadable.
The memory errors you receive from ClamAV are odd, considering your "free" shows enough memory. It might be a hardware issue of your server. Is it a physical or virtual machine?
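For reference, the "-/+ buffers/cache" row in that listing can be re-derived from the "Mem:" row. This is only a small arithmetic sketch using the figures posted above (in KiB); the variable names are made up for illustration. Note that inside a container these numbers may not reflect limits the host enforces from outside.

```python
# Figures (KiB) copied from the `free` listing posted in this thread.
total, used, free_kb, buffers, cached = 3774872, 930212, 2844660, 0, 318108

# Memory actually held by applications: "used" minus reclaimable buffers/cache.
used_by_apps = used - buffers - cached

# Memory applications could still obtain: "free" plus reclaimable buffers/cache.
available_to_apps = free_kb + buffers + cached

print(f"used by applications:      {used_by_apps} KiB")
print(f"available to applications: {available_to_apps} KiB")
print(f"share of total in use:     {used_by_apps / total:.1%}")
```

Both results match the second row of the listing, which is why the output looks healthy from inside the machine.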
You can try rebooting it. Using atop you can record historical memory usage data, to see if at the time of problems occurring there's a memory issue.
About the Postfix error, Eric or someone else from the Virtualmin team would have to say something. You can try running postconf (if that's the "Postfix config command" they're talking about) and see if it works.
The atop command doesn't appear to be installed on my system.
It is a virtual machine on Media Temple's VPS service. I shudder to say that it's inside a Plesk/Parallels virtual container of some sort (I'm a refugee from Plesk's control panel!)
Running postconf appears to work fine, I get a whole bunch of output in my terminal.
I thought those memory errors odd too, though they cleared up over the day as this morning ClamAV had no such trouble. Still no Spam Assassin or Webmin/Virtualmin running though! I'm a little concerned still about rebooting in case I have more trouble. Should I wait to hear from Eric on this forum before rebooting?
Sorry I just saw your note on enclosing tags with code, I knew I was doing something wrong there, I'll do that with any future lines of code to make them more readable!
Eric might be able to say more, yeah, since I'm not familiar with CentOS or Plesk. He also has more experience with (resource) issues on several virtual machine hosters.
I will await Eric's feedback here then... just in case there's something we're missing to check before rebooting. Perhaps rebooting will cure everything magically, but in my experience (mostly with Plesk long ago!) rebooting while other things are going wrong is not always wise, as one can lose one's access to the server, and it seems with Linux that most ailments can be cured over SSH and without a reboot.
The server throughout this period has been performing quite normally, I should add... no slowdown that typically accompanies memory issues. Just a lack of Virtualmin/Webmin and SpamAssassin these last few days, which is rather worrying of course, but you'd never know it from accessing the mailserver and websites.
Do you have a /proc/user_beancounters file? If so, could you post its contents?
-Eric
Here are the contents of the /proc/user_beancounters file:
There's about a million failures for private virtual memory page allocations, so it is indeed a memory-related issue.
I guess you need to save memory in Virtualmin, or ask your hoster to increase the "privvmpages" limit for you. Eric might have some ideas too, since I'm not familiar with this kind of virtualization.
(Didn't know that other services besides OpenVZ use the beancounter file, otherwise I'd have asked for it immediately without the "if you're running under OpenVZ" restriction.)
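For anyone wanting to spot such failures quickly, here is a rough sketch that lists the resources whose failure counter is non-zero. It assumes the usual column layout of the file (resource, held, maxheld, barrier, limit, failcnt); the SAMPLE text and its numbers are invented for illustration and are not the output posted in this thread. The function names are hypothetical.

```python
# Illustrative sample only; NOT the output from the server discussed here.
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        2767222    2994046   14372700   14790164          0
            privvmpages      250000     262144     262144     292912       1042
            numproc              45         60        240        240          0
"""


def parse_beancounters(text):
    """Return one dict per resource row, skipping the version and header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Version:" or "resource" in fields:
            continue
        if fields[0].endswith(":"):      # first row carries the container id
            fields = fields[1:]
        if len(fields) != 6:
            continue
        try:
            held, maxheld, barrier, limit, failcnt = (int(v) for v in fields[1:])
        except ValueError:
            continue                     # not a numeric resource row
        rows.append({"resource": fields[0], "held": held, "maxheld": maxheld,
                     "barrier": barrier, "limit": limit, "failcnt": failcnt})
    return rows


def failed_resources(rows):
    """Resources for which at least one request was refused."""
    return [(r["resource"], r["failcnt"]) for r in rows if r["failcnt"] > 0]


if __name__ == "__main__":
    # On a real container, feed in the text of /proc/user_beancounters
    # (typically readable by root only) instead of SAMPLE.
    for name, failcnt in failed_resources(parse_beancounters(SAMPLE)):
        print(f"{name}: {failcnt} failures")
```

The failure counters are cumulative, so comparing two readings taken some time apart shows whether failures are still happening.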
If the memory is now free, do you have any ideas why Virtualmin/Webmin wouldn't be staying up after I run /etc/init.d/webmin start? Or is trying to pull it up perhaps responsible for the memory spike recorded in this beancounters output? It seems like rebooting might help if it's a memory-related issue? Presumably SpamAssassin is down because its process is resource-intensive too? I've never had similar trouble over the past year+ (everything typically ticks along fine around 1GB of memory usage out of my guaranteed 2GB... in fact I already have the server tuned for minimal memory usage, as I used to have just 1GB of guaranteed memory before I got a free upgrade).
Unfortunately I'm not really familiar with OpenVZ and related virtualization systems that use the beancounters file, so I can't really say why the memory allocation requests fail. Eric might be able to say more about this.
The general consensus and suggestion is (considering the myriad of problems we've seen in this forum that are related to OpenVZ-like systems) to not use such a virtualization hoster with Virtualmin.
It might be as simple as asking your hoster to increase the beancounter limits that fail for you. It might also be that there's no real solution and you need to find another hoster. But as I said, I can only guess here, Eric might know more.
Thank you @Locutus for taking your time during the holidays with this thread... and you were right all along about it being a memory/resource issue :) The latest message to root was
fatal: couldn't execute /usr/bin/gpg: Cannot allocate memory
Rebooting the server brought Virtualmin and SpamAssassin back up and everything is running as normal again.
Howdy,
Yeah, as Locutus mentioned, you are seeing resource failures with your VPS.
It appears that your provider is either using OpenVZ or Virtuozzo, and those VPS types can have issues where even if you aren't technically out of RAM, if you're using what they call "burst memory", memory can be taken away from one of the processes on your server using that RAM to give to another user on that host, if they need it.
What you'd want to do is ask your provider for more guaranteed RAM, as you seem to frequently be running out of RAM.
Each time a failure shows up in that user_beancounters file -- that failure represents a process that may be killed off due to a resource problem.
-Eric
I've noticed some oddities actually in Virtualmin with my reported memory usage. It was after some upgrade or other (some Virtualmin upgrade in the past maybe six months or so?) The issue is that Virtualmin started reporting real memory usage at half its actual usage. Right now it says:
Real memory 3.42 GB total, 590.42 MB used
but in fact I'm using twice that much memory. The memory available is correct, however... I'm actually supposed to get 2GB of memory guaranteed, up to 4GB burstable, and it never appears (in the historical data within Media Temple's control panel) as if I've gone over 50% usage of my guaranteed allocation, so it's odd that the beancounters file indicates these RAM usage problems... but perhaps it's that Plesk Virtuozzo software, I'm not any fan of their software.
Here's the current state of the beancounters file, no failures so far... I'll definitely keep an eye on this file from time to time now I know it exists!
Depending on usage (number of websites, use of PHP, FTP uploads, incoming emails, spam and virus scanning, other general activity), the server's memory usage could over time easily peak over 2 GB, even if the average is lower. "Burstable" memory is very unreliable, as you've seen, and can lead to processes being killed randomly if they hold that memory for too long.
Having burstable memory is even counter-productive in this case. The OS sees that it has 4 GB of memory, and wants to make use of it. It does not know that half of that memory is dangerously unreliable.
Also note that you're already using about one third of the allowed privvmpages, so with some time of usage, the limit could be reached again.
I suppose the only way to somewhat reliably prevent that (aside from not using Virtuozzo/OpenVZ) would be to regularly reboot the server. Or have a script monitor the beancounters and reboot the server if some limit is being reached. Both of which I'd not recommend for serious web hosting.
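As a rough idea of what such a monitoring script might check, here is a minimal sketch that computes how much of a resource's barrier is currently held and flags it above a threshold. It only reports and does not reboot anything. Measuring against the barrier column rather than the limit, the 0.8 threshold, the function names, and the sample numbers are all assumptions for illustration; the column order (resource, held, maxheld, barrier, limit, failcnt) is the usual layout of the file.

```python
# Illustrative sample only; the numbers are invented, not taken from this thread.
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        2767222    2994046   14372700   14790164          0
            privvmpages       87381     120000     262144     292912          0
"""


def held_ratio(text, resource="privvmpages"):
    """Return held / barrier for the named resource, or None if unavailable."""
    for line in text.splitlines():
        fields = line.split()
        if resource in fields:
            i = fields.index(resource)
            # The three columns after the name: held, maxheld, barrier.
            held, _maxheld, barrier = (int(v) for v in fields[i + 1:i + 4])
            return held / barrier if barrier else None
    return None


def should_warn(ratio, threshold=0.8):
    """True when usage is known and at or above the chosen threshold."""
    return ratio is not None and ratio >= threshold


if __name__ == "__main__":
    ratio = held_ratio(SAMPLE)
    if ratio is None:
        print("privvmpages not found")
    else:
        state = "WARNING" if should_warn(ratio) else "ok"
        print(f"privvmpages at {ratio:.0%} of barrier: {state}")
```

Run from cron against the real file, something like this could send a mail before the limit is hit, which is gentler than rebooting automatically.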