Loss of data after webmin / virtualmin upgrade. Creepy.

A few minutes ago I witnessed my playground server deleting a user's home directory for no other reason than saving a virtual server. I left it as it is, and I can provide a root password if somebody is interested in investigating.

The setup is:
- Debian 8 upgraded to 9 on a simple VPS.
- Webmin upgraded from 1.831 to 1.850.
- Virtualmin upgraded from 5.06 to 5.99.

I was trying to solve an issue where Let's Encrypt throws an error after I upgraded Webmin / Virtualmin (second try, same results), but that's another ticket.

Status: 
Active

Comments

Submitted by Joe on Wed, 07/12/2017 - 02:04 Pro Licensee

Can you provide a bit more detail about what actually happened?

I'm curious about what relation (if any) the user had to the virtual server. And, when you say "saving a virtual server", were you creating a new one, restoring a backup of one, or just updating some setting within one? I can't think of a code path that could lead to a home being deleted without it explicitly asking if you want to delete something...so, I wanna narrow down the search as much as possible in terms of where it could be happening.

Some time ago I tried to upgrade Webmin to 1.850, but that broke Let's Encrypt. Then, after digging on this forum, I saw that Virtualmin 5.99 was needed too. Still no luck. The error message just changed to: raise ValueError("Gave up waiting for valiation") ] .

Then I tried looking for culprits: removed all possible redirects and rewrites from nginx and left only the acme-challenge one (it was working just fine). Changed the permissions on well-known to 777, as I was seeing no files being generated. Then tried the DNS validation (just changed options in the module config and resaved the servers). It still failed (same message), so I abandoned it for a few days, reverting to 1.830/5.06 (successfully renewing certificates again with the older version).

Today I tried to update again to 1.850 / 5.99 in the hope that I would at least be able to debug something. The Webmin update ran fine and the Virtualmin update ran fine. I went to the SSL page of the said domain and tried renewing, which gave some errors with "Create TLSA DNS records for SSL certificates - off", so I changed it to on. Then I went to the same server and hit save. Then nginx started complaining that no ssl.cert files exist. That's how I found out the whole /home/user directory was gone.

Then I tried to look at the logs to see what actually happened, but the log viewer is broken too: some JS fails and I cannot actually find the damn code in the source to fix it.

As an observation, the acme-challenge key was written and deleted very fast. I doubt the file validation actually had time to resolve DNS and read that file that quickly. A second observation: without updating Virtualmin, the acme-challenges were just piling up and never got deleted after the failed validation.

(somebody needs to fix his mixed progr languages skills)

There is still nothing logged as a command. "Files changed" and "commands run" are both empty. All other resources are still in place (db, dns, the user itself). Just his home dir is missing.

Managed to reproduce this for another user just by resaving a server it owned. Then I reverted to 1.830, recreated the missing home dirs, fixed some nginx config (missing ssl files), then resaved the user. Nginx started complaining that the log files pointed to /home/user . Fixed them manually, disabled log rotation, resaved the server, and nginx complained again about the missing logs pointing to ... you guessed it, the same location: /home/user.

Will dig more. Actually, I think I'm gonna start from scratch and see what happens.

Thanks for all the info! Yeah we'd definitely like to hear what happens during your testing.

If you can walk us through how to reproduce it, that'd be fantastic.

This bug can be reproduced using the latest webmin / virtualmin / virtualmin-nginx versions. Tested only on a fresh install of Debian 9 so far.

Install all of the above, including virtualmin-nginx-ssl. Create a new virtual server. Enable nginx and logrotate. By default nginx tries to put the logs at /home/user/error_log | /home/user/access_log or make some symlinks. Now when we look at the nginx config for the virtual server, we see both logs pointing to /home/user/ . Edit the log paths from the UI at /virtual-server/edit_phpmode.cgi or just in the nginx file. Resave the server. Voila. Homedir gone.
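
For anyone who wants to check a server before resaving it, here is a rough sketch of how to spot the risky state described above, where an access_log or error_log directive points at the home directory itself rather than at a file inside it. The function name and the sample config are made up for illustration. This is not part of Virtualmin, and it only does a simple line-based match rather than a real nginx config parse.

```python
import re
from typing import List, Tuple

# Matches nginx access_log / error_log directives and captures the path argument.
LOG_DIRECTIVE = re.compile(r"^\s*(access_log|error_log)\s+([^\s;]+)", re.MULTILINE)


def risky_log_paths(nginx_conf: str, home_dir: str) -> List[Tuple[str, str]]:
    """Return (directive, path) pairs whose path is the home directory itself."""
    home = home_dir.rstrip("/")
    risky = []
    for directive, path in LOG_DIRECTIVE.findall(nginx_conf):
        # A log "file" equal to the home directory is the state seen in this ticket.
        if path.rstrip("/") == home:
            risky.append((directive, path))
    return risky


# Hypothetical sample config, only for demonstration.
sample = """
server {
    server_name example.com;
    access_log /home/user/;
    error_log /home/user/logs/error_log;
}
"""

print(risky_log_paths(sample, "/home/user"))
```

If the list is non-empty, it is probably safer to fix the paths by hand and take a backup of the home directory before hitting save.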

Ok, I see what can cause this - Virtualmin by default tries to take the log file path from the global Apache configuration, but doesn't gracefully handle the case where this isn't set up (which is unlikely but possible when Nginx is in use). I will fix this in the next release.
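
To illustrate the kind of guard this implies, here is a minimal sketch in Python (Virtualmin itself is Perl, and this is not the actual fix). It assumes one plausible failure mode: an unset log path template collapses to the home directory, which is then treated as the "log file". The function name, its parameters and the "logs" fallback directory are all hypothetical.

```python
import os
from typing import Optional


def resolve_log_path(home_dir: str, configured: Optional[str], default_name: str) -> str:
    """Pick a log file path, refusing an empty value or the home directory itself."""
    candidate = (configured or "").strip()
    if not candidate:
        # Nothing configured: fall back to a file under a hypothetical "logs" directory.
        return os.path.join(home_dir, "logs", default_name)
    # Relative values are resolved against the home directory; absolute ones pass through.
    full = os.path.normpath(os.path.join(home_dir, candidate))
    if full == os.path.normpath(home_dir):
        # Never let the home directory stand in for a log file.
        raise ValueError("log path resolves to the home directory: %s" % full)
    return full


print(resolve_log_path("/home/user", None, "access_log"))
print(resolve_log_path("/home/user", "/var/log/nginx/user_error.log", "error_log"))
```

The point of the sketch is only that an empty setting should produce either a safe default or an error, and never a path that later code might rename or delete as if it were a log file.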

I tried to retest your changes pulling from github but:

Undefined subroutine &virtual_server::modify_all_resellers called at /usr/share/webmin-1.851/virtual-server/check.cgi line 29.

also

Undefined subroutine &virtual_server::content_style_chooser called at /usr/share/webmin-1.851/virtual-server/domain_form.cgi line 655.

Did you apply just the one commit, or did you update the entire virtual-server-lib-funcs.pl file?

Also, are you running Virtualmin GPL or Pro?

Cloned the whole repo and used master of virtualmin-gpl . All GPL.

I tried to pull the latest changes hoping to get Let's Encrypt working too. But I'll test with the patch alone then.

Patch seems to work fine.

Another issue (I don't know if it is related) is that webmin-nginx-ssl breaks webmin restart.

When creating a new virtual server, the server details are not saved. It creates the users, db, or whatever feature is enabled, but the actual server is not saved in Virtualmin. Modifying an existing server works only some of the time. (will dig more)

Ok, that sounds like a separate problem. What do you mean by "breaks webmin restart" though?

After saving a virtual server with Virtualmin, the output gets stuck at "restarting webmin" (my mistake):

Updating Webmin user ..
.. done
Updating Webmin user ..
.. done
Saving server details ..
.. done
Re-loading Webmin ..

Then the webmin server stops and is not accessible anymore, and the service needs to be restarted manually. (service webmin start)

A "service webmin status" states that it's a clean shutdown.

Does anything get logged to /var/webmin/miniserv.error when this happens?

This:

[26/Jul/2017:07:09:06 +0200] miniserv.pl started
[26/Jul/2017:07:09:06 +0200] IPv6 support enabled
[26/Jul/2017:07:09:06 +0200] Using MD5 module Digest::MD5
[26/Jul/2017:07:09:06 +0200] PAM authentication enabled
Use of uninitialized value $_ in concatenation (.) or string at /usr/share/webmin-1.851/virtualmin-nginx-ssl/virtual_feature.pl line 141.
Use of uninitialized value in concatenation (.) or string at /usr/share/webmin-1.851/virtualmin-nginx-ssl/virtual_feature.pl line 215.
Argument "5.99.gpl" isn't numeric in numeric ge (>=) at /usr/share/webmin-1.851/virtualmin-nginx-ssl/virtual_feature.pl line 252.
restarting miniserv
[26/Jul/2017:07:09:42 +0200] Restarting
[26/Jul/2017:07:09:45 +0200] miniserv.pl started
[26/Jul/2017:07:09:45 +0200] IPv6 support enabled
[26/Jul/2017:07:09:45 +0200] Using MD5 module Digest::MD5
[26/Jul/2017:07:09:45 +0200] PAM authentication enabled
Argument "5.99.gpl" isn't numeric in numeric ge (>=) at /usr/share/webmin-1.851/virtualmin-nginx-ssl/virtual_feature.pl line 450.
restarting miniserv
[26/Jul/2017:07:10:04 +0200] Restarting
[26/Jul/2017:07:10:07 +0200] miniserv.pl started
[26/Jul/2017:07:10:07 +0200] IPv6 support enabled
[26/Jul/2017:07:10:07 +0200] Using MD5 module Digest::MD5
[26/Jul/2017:07:10:07 +0200] PAM authentication enabled
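
A side note on the "isn't numeric" lines in the log above: they come from comparing a version string that carries a ".gpl" suffix as if it were a number. Perl normally uses the leading numeric part and emits this warning, so it is likely just noise rather than the cause of the restart problem. A small sketch of stripping the suffix before comparing is below. It is written in Python for illustration only, the function name is made up, and it is not the plugin's code.

```python
import re


def numeric_version(raw: str) -> float:
    """Extract the leading 'major.minor' part of a version string as a number."""
    match = re.match(r"\d+(?:\.\d+)?", raw)
    if match is None:
        raise ValueError("no numeric version prefix in %r" % raw)
    return float(match.group(0))


# The suffix is dropped, so the numeric comparison no longer sees ".gpl".
print(numeric_version("5.99.gpl") >= 5.0)
```

Treating the version as a single decimal number mirrors the numeric >= comparison in the warning. It would not be the right approach for multi-part versions such as 1.10.2.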

And now found this:

service webmin status

● webmin.service - Webmin
Loaded: loaded (/lib/systemd/system/webmin.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Wed 2017-07-26 07:10:07 CEST; 7min ago
Process: 5582 ExecStop=/etc/webmin/stop (code=exited, status=0/SUCCESS)
Process: 5041 ExecStart=/etc/webmin/start (code=exited, status=0/SUCCESS)
Main PID: 5439 (code=exited, status=0/SUCCESS)

Jul 26 07:09:42 mail.blabla.com su[5403]: + ??? root:bbb
Jul 26 07:09:42 mail.blabla.com su[5403]: pam_unix(su:session): session opened for user bbb by (uid=0)
Jul 26 07:09:42 mail.blabla.com su[5409]: Successful su for bbb by root
Jul 26 07:09:42 mail.blabla.com su[5409]: + ??? root:bbb
Jul 26 07:09:42 mail.blabla.com su[5409]: pam_unix(su:session): session opened for user bbb by (uid=0)
Jul 26 07:09:42 mail.blabla.com perl[5062]: pam_unix(webmin:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=
Jul 26 07:09:45 mail.blabla.com webmin[5062]: Webmin starting
Jul 26 07:10:04 mail.blabla.com perl[5439]: pam_unix(webmin:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=
Jul 26 07:10:07 mail.blabla.com webmin[5439]: Webmin starting
Jul 26 07:10:07 mail.blabla.com stop[5582]: Stopping Webmin server in /usr/share/webmin-1.851

Looks like it is being killed by some other process? If you run /etc/webmin/restart as root from the command line, does it restart cleanly?

In the error log I found that miniserv was complaining about missing libsocket6-perl and libauthen-pam-perl (minimal OS, I guess), so I installed them. Now the issue occurs only at virtual server deletion, as far as I can see. So it might actually be just some missing library. I also encountered this on DigitalOcean's default Debian 7 / 8 images at some point.

Now it restarts with this:

service webmin status

● webmin.service - Webmin
Loaded: loaded (/lib/systemd/system/webmin.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2017-07-26 07:36:12 CEST; 2s ago
Process: 7427 ExecStop=/etc/webmin/stop (code=exited, status=0/SUCCESS)
Process: 7443 ExecStart=/etc/webmin/start (code=exited, status=0/SUCCESS)
Main PID: 7444 (miniserv.pl)
Tasks: 1 (limit: 4915)
CGroup: /system.slice/webmin.service
└─7444 /usr/bin/perl /usr/share/webmin-1.851/miniserv.pl /etc/webmin/miniserv.conf

Jul 26 07:36:10 mail.blabla.com systemd[1]: Starting Webmin...
Jul 26 07:36:10 mail.blabla.com start[7443]: Starting Webmin server in /usr/share/webmin-1.851
Jul 26 07:36:10 mail.blabla.com perl[7443]: pam_unix(webmin:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=
Jul 26 07:36:12 mail.blabla.com webmin[7443]: Webmin starting
Jul 26 07:36:12 mail.blabla.com systemd[1]: webmin.service: PID file /var/webmin/miniserv.pid not readable (yet?) after start: No s
Jul 26 07:36:12 mail.blabla.com systemd[1]: Started Webmin.

It seems the virtualmin-nginx-ssl and webmin restart issues are related anyway.
" Configure Webmin to use same SSL cert for IP? = yes " seems to create this restart issue. If you have 2 SSL-enabled nginx virtual servers, webmin fails to restart on its own (even when manually restarting from the UI, or after an update, etc.)

It also yells at me that " ServerName ${DOM} " is invalid as a ServerName value when " Configure Webmin to use same SSL cert for IP? " is enabled.

I guess mixing Apache values with nginx is not a good idea after all.
I know Virtualmin is Apache-centric, but Apache is still slow and a memory hog, and we are kind of pushing towards nginx, hence I think it deserves to be better integrated / supported.

Ok, maybe its the SSL cert per IP issue that's the problem. Do your domains have their own private IPs, or are you using a shared IP?

Shared IP on a vps usually.

I wonder if this is an ordering issue when creating a domain. Can you post the full output from the creation process in Webmin?

I just sent you the credentials for the VPS by email. You can nuke it if you please.

Thanks .. taking a look now.

Ok, taking a look now..

Are you still seeing this problem? I created and deleted a test domain a few times, but wasn't able to trigger the issue.

The problem is still there.

You can reproduce it by restarting webmin a few times from the UI at /webmin/index.cgi or by creating or deleting a virtual server with Virtualmin. Just did it myself.

Also, the user and pass I sent in the mail should work for ssh too.

Ok, it looks like there are fixes to Virtualmin's Nginx support for these issues that haven't been released yet. We'll put out a new 1.6 version of the Nginx SSL plugin to address them.