Upgrade to Virtualmin 3.70 brought down my server

Hi,

FYI: about 20 seconds after tapping the "Upgrade all" link for Virtualmin 3.70 on my iPhone, my server stopped responding on every port. When I tried to log in at the console (Xen's xm console), I only saw:

gaea:~# xm console hosting
* Stopping web server apache2
* Stopping web server apache2
* Starting web server apache2

Ubuntu 8.04.2 hosting.fuerstnet.de tty1

hosting.fuerstnet.de login: fuerst

gaea:~# xm console hosting
Out of Memory: Kill process 2139 (mailmanctl) score 11122 and children.
Out of memory: Killed process 2162 (python).

It seems something went wrong during or after the upgrade. I was upgrading from Virtualmin 3.69.

After I issued xm shutdown, the server went down about 10 minutes later. It came back up fine, with these packages installed:

ii  usermin-virtual-server-mobile          2.1  
ii  usermin-virtual-server-theme           6.8  
un  virtual-mysql-client                   <none>
un  virtual-mysql-server                   <none>
ii  virtualmin-base                        1.0-20Virtualmin.
iF  webmin-virtual-server                  3.70 
ii  webmin-virtual-server-mobile           2.1  
ii  webmin-virtual-server-theme            7.1  
ii  webmin-virtualmin-awstats              4.1  
ii  webmin-virtualmin-dav                  3.0  
ii  webmin-virtualmin-htpasswd             2.0  
ii  webmin-virtualmin-mailman              5.5  
ii  webmin-virtualmin-notes                1.2  
ii  webmin-virtualmin-password-recovery    1.3  
ii  webmin-virtualmin-registrar            1.9  
ii  webmin-virtualmin-signup               1.2  
ii  webmin-virtualmin-styles-oswd          1.0  
ii  webmin-virtualmin-support              1.6  
ii  webmin-virtualmin-svn                  4.3  

I tried to run aptitude update which told me:

E: dpkg was interrupted, you must manually run 'dpkg --configure -a' to correct the problem.
E: Couldn't rebuild package cache

Running dpkg --configure -a filled memory and swap very quickly until I was able to Ctrl-C the dpkg process. See the attached screenshot for the htop output.

Can you help me get the server back to a normal state?

TIA Bernhard

Status: Closed (fixed)

Comments

Submitted by Joe on Sat, 07/04/2009 - 15:48 Pro Licensee

I'm not sure what's happening there. But I do recall the postinstall process of the virtual-server module doing something wonky during the last upgrade cycle... I think maybe we have some sort of bug in the Debian packages, but I don't know what. The postinstall, at least, seems to return false, which makes dpkg unhappy.

As for the memory usage, I have no idea. Could you look at top and sort by memory usage and add that here? (Start top and press "Shift-M" to switch to sort on memory)
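For anyone who needs to capture the same information without an interactive top session, here is a minimal Python sketch that reads resident memory from /proc and lists the largest processes. It is only an illustration: the names rss_kib, process_name and top_by_memory are hypothetical, and it assumes a Linux /proc layout.

```python
import os


def rss_kib(status_text):
    """Return the VmRSS value (in KiB) from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0  # kernel threads have no VmRSS line


def process_name(status_text):
    """Return the process name from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            return line.split(":", 1)[1].strip()
    return "?"


def top_by_memory(count=10, proc_root="/proc"):
    """Return up to `count` (rss_kib, pid, name) tuples, largest resident size first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # the process exited while we were scanning
        rows.append((rss_kib(text), int(entry), process_name(text)))
    return sorted(rows, reverse=True)[:count]
```

Printing the result of top_by_memory(5) while the upgrade is running should give roughly what top shows after pressing Shift-M.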

I'm pretty sure the upgrade of the packages we just rolled out would not impact Apache, unless the OOM (out of memory) killer kicked in because of memory usage during the upgrade. I don't think any of our postinstall scripts mess with Apache at all.

I downgraded to Virtualmin 3.69, which went well. Trying to upgrade again drove memory usage up again. The only process responsible is /usr/bin/perl /usr/share/webmin/run-postinstalls.pl virtual-server

See the attached screenshot for the memory usage. Since that machine is a production server, I am sticking with Virtualmin 3.69 for now.

BTW: I do not see any problem related to Apache either.

Hi both,

Just to add that the exact same thing happened to me and during 'upgrade all' the entire server went down big style - swap filled in seconds. Rebooting via console just did the same thing again, and it was a bitch to shut down and restart as well.

Server was unrecoverably b0rked, and I have had to load a backup and boot from that (luckily I had a full image from just a few hours before the upgrade killed the server).

I'm also on XEN (at Linode) and also on Debian (Lenny).

Bad bug this one. Should have been caught before release.

The Virtualmin screen is again saying there are updates available obviously, but I'm not touching that upgrade again for obvious reasons.

Wow, having postinstall.pl using 2 GB of RAM is surprising! Any chance I could log in to this system and see what is going wrong?

Rogi - which packages did you upgrade? Was it just the Virtualmin updates, or were others included too?

I have a few test systems that were upgraded to 3.70 with no problems.

Hi there Jamie,

I only upgraded the Virtualmin packages that were shown on the VM login screen - I just clicked 'update all' as usual, and then things went very haywire very quickly.

Nothing else was updated at the same time, the server was, and still is other than your upgrades, fully up to date.

I only have 540 megs memory btw, so if it was OOMing the above 2 gig system, well mine wouldn't have stood a chance.

I'm just running (what was) my backup for now with VM 3.69 until you figure out the problem. You are welcome to login and see what's what as the upgrades run (and go wrong), but I can't do that until mebbe 12 hours or so from now as there's some stuff that can't be interrupted going on at the moment.

Hi Rogi,

If I can log in and re-try the upgrade (while watching to kill it if it runs out of RAM), that would be very useful in debugging this issue. You could send me login details at jcameron@virtualmin.com. Also, let me know when would be a good time to re-try the upgrade.
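One way to retry a step like this without risking the whole machine is to cap the memory of the command itself, so that a runaway process fails on its own instead of filling RAM and swap. The Python sketch below does this with the standard resource module. It is not something used in this thread: run_with_memory_cap is a hypothetical helper, the 256 MiB figure in the comment is an arbitrary assumption, and the limit applies to each process separately rather than to the whole process tree.

```python
import resource
import subprocess


def run_with_memory_cap(argv, max_bytes):
    """Run argv with its address space limited to max_bytes and return the exit code."""

    def apply_limit():
        # Runs in the child just before exec. The limit is inherited by any
        # processes the command starts, but is enforced per process.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(argv, preexec_fn=apply_limit).returncode


# Hypothetical usage (as root), capping each process at 256 MiB:
#   run_with_memory_cap(["dpkg", "--configure", "-a"], 256 * 1024 * 1024)
```

With a cap in place, a process stuck in an allocation loop should see its allocations fail and exit with an error, rather than triggering the kernel's OOM killer against unrelated services.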

Same here....

/usr/share/webmin/run-postinstalls.pl hangs

Oh, and I have 2 GB of RAM and it ate every bit of it. The OS is Debian etch.

Hmm, only one of my servers had this issue. Just one failed and hung, which is really weird.

OK, I found the cause of this: it was an infinite loop inside the Virtualmin code that is only triggered when a certain configuration option is set. The quick fix is to edit /etc/webmin/virtual-server/config and remove the line starting with:

scriptwarn_url=

You can then upgrade safely. Let us know if that helps.
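For anyone who prefers to script that edit, here is a minimal Python sketch of the same workaround. It assumes the config file is plain key=value lines; drop_config_key and remove_key_from_file are hypothetical helper names, and the .bak backup copy is an added precaution rather than part of the suggested fix.

```python
import shutil


def drop_config_key(text, key):
    """Return the config text without any line that starts with 'key='."""
    prefix = key + "="
    kept = [line for line in text.splitlines(keepends=True)
            if not line.startswith(prefix)]
    return "".join(kept)


def remove_key_from_file(path, key):
    """Back up the file to path + '.bak', then rewrite it without the key."""
    shutil.copy2(path, path + ".bak")
    with open(path) as handle:
        text = handle.read()
    with open(path, "w") as handle:
        handle.write(drop_config_key(text, key))


# Hypothetical usage (as root):
#   remove_key_from_file("/etc/webmin/virtual-server/config", "scriptwarn_url")
```

Only lines that begin with the exact key followed by "=" are removed, so other settings whose names merely start with similar text are left alone.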

A new minor release will be out shortly that will include this fix.

yup that was it - hope you can fix this bug

I have a fix for it already, and have attached a patch for the fix to this bug report. We should have the 3.70-2 minor release out soon.

How did this happen anyway?

Submitted by Joe on Sun, 07/05/2009 - 03:21 Pro Licensee

How did this happen anyway?

A security fix (which will be discussed in detail in the release notes; in short, it is a local exploit that requires tricking the administrator into performing seemingly harmless actions, but still potentially serious enough to warrant a rapid release) led to a synchronous (and faster than usual) release with the GPL version. We usually release the Open Source version first, and let it stew for a day or two. Open Source users are (rightly) more forgiving of bugs like this.

Our automated and human testing simply did not reveal this problem, as none of our test systems have this option enabled. It's a very specific case, and with the number of options Virtualmin has, testing them all would take weeks (which is longer even than our release cycle, so we'd be developing more than twice as slowly). That said, it has been added to the automated tests, as all bugs of this nature are...so regressions in the future that happen to be triggered with this option will be found during automated testing. In short, we can't fix bugs we don't know about, and this one was invisible to our test suite.

Since security issues are historically very rare in Webmin, Usermin, and Virtualmin, a rushed release schedule like this one is unlikely to be repeated often.

All we can do is apologize and roll the fix as quickly as possible. The fixed version is being uploaded to the repository right now.

Hi Joe,

Which option being enabled is causing the upgrade to fail?

Submitted by Joe on Sun, 07/05/2009 - 04:02 Pro Licensee

scriptwarn_url

But it doesn't matter. The new version is rolling as we speak. It will be available within minutes.

Ok. I have notification showing for version '3.70-2'.

Just to confirm that that is indeed the New! Improved! Version! ? :)

(I don't remember what the crashing version number was now).

Thanks.

R.

Submitted by Joe on Sun, 07/05/2009 - 04:46 Pro Licensee

Yep, this is the updated version, which shouldn't exhibit this problem, even if you do have that option enabled.

Thanks for the fix; the upgrade to 3.70-2 worked well this time.

@JamieCameron: Being on the other side of the Atlantic, I was already asleep when your request to log in to my system came in - sorry!

I just ran the 3.70-2 update and it appears, so far at least, to have gone just as Virtualmin updates usually go - i.e. no problems at all.

Thanks for the fix.

R.

@fuerst - Fortunately another user (coincidentally the one who originally requested that feature) contacted me directly and gave me remote access to his system, which I was able to use to track it down.

Automatically closed -- issue fixed for 2 weeks with no activity.