The Problem
Nginx configurations are almost always edited by hand to add htaccess-style directives to sites. Moreover, it is extremely common to have a single server take up more than one server{} block, for redirect reasons (i.e. one block listening to www.domain.com on port 80/443 and redirecting to https://domain.com, another block listening to domain.com on port 80 and redirecting to https://domain.com, and finally one block listening to domain.com on port 443 - hosting the actual server), or when doing complex setups (i.e. port 80 of 3 different domains all redirect you to https://domain1.com, but each domain has its own unique site on port 443).
If Nginx is set to use separate per-site config files (in sites-available), these heavily edited configs are improperly parsed by Virtualmin (backup/edit/delete sites) - only a handful of lines are backed up.
If Nginx is set to write all servers to the main Nginx.conf, things are even more hopelessly broken.
The New Algorithm
- If Nginx is setup to use a folder of per-site config files, then back up/restore the complete site file, untouched. Never even attempt to parse it.
- If Nginx/Virtualmin is in its default mode of putting everything in a single main config file, use my new algorithm to identify and extract every server{} block related to the server, thus including all edits that have been added to the domain, untouched.
The latter is the only harder part, but that single-file nginx.conf scenario can be solved with this algorithm:
step 1: find each server{ block start-position (as int-offsets into the main nginx.conf-file) using the following regex:
/\bserver\s*\{/
step 2: extract all individual server {} blocks using a simple nesting-count algorithm:
foreach server_start_pos
pos = server_start_pos
openbr = 0
while pos < filesize
chr = input[pos]
if chr == "{"
openbr++;
elseif chr == "}" {
openbr--;
if openbr == 0 {
// we have found the closing } for the server{} block
server_end_pos = pos
// store the entire server{} text block in an array
copy_server_block_to_array(server_start_pos, server_end_pos)
}
}
step 3: loop through all of the server{} blocks you have extracted from the main nginx.conf file,
and look for every server{} block that belongs to the domain name we are backing up/editing/deleting.
$dom = regex-escaped domain name string, i.e. "mydomain\.com", and NO "www\." prefix(!)
foreach server_blocks as server_block_text
test if server_block_text contains /\bserver_name\b[^;]*?\s(?:www\.)?$dom[\s;]/
// if true: this is a server{} block with data for the domain we desire.
// so extract this block, and then keep scanning for more matching blocks...
This crazy section: \bserver_name\b[^;]*?\s(?:www\.)?$dom[\s;] just enforces boundaries and validity and ensures that
domain\.com doesn't give false positives for similar domains like "server_name mydomain.com;".
It also guarantees that a search for example.com matches a block for www.example.com.
broken down:
\bserver_name\b = enforced boundaries to ensure that we are looking at the correct property
[^;]*? = allows there to be other domain-names preceding the one we are looking for (non-greedy)
\s = ensures that there is whitespace before the domain we are searching for; if we had used \b for word boundary instead,
then sub1.example.com would have matched during a search for example\.com
(?:www\.)?$dom = match both www.example.com and example.com
[\s;] = ensure that there is either whitespace or the closing ; after our desired domain name,
again to ensure that "example.com.tv" won't give a false-positive during a search for example\.com
step 4: now that we've found the start/end offsets of every server{} block related to the domain,
they should be handled as follows:
if backing up:
concatenate all found server{}-blocks that belong to the domain, i.e.
"server{...}\nserver{...}\nserver{...}" and save that to a single backup file.
if editing:
upgrade the edit-interface to support understanding multiple blocks -
by making a new gui editor that literally shows all X related server{} blocks
and their various editable property fields - or alternatively giving up and showing
"this is a complex domain with more than one server{} block - we won't even
attempt to understand what you are doing, go edit it manually instead,
to be sure that our web gui doesn't corrupt things for you"
the latter is a completely okay solution.
if deleting:
delete all of the found server{} blocks for that domain.
That is a great algorithm, but all is still not well...
Read post #2 for why.
Comments
Submitted by aitte on Sat, 06/08/2013 - 01:36 Comment #1
This new algorithm is the only way to accurately find heavily-edited server{} blocks in the main nginx.conf file, by identifying the full extent of a domain's block no matter how heavily edited it is, and to also find all server{} blocks that have something to do with the domain.
But - even with this new algorithm, there is an issue that can never be solved for as long as we hold onto the idea that putting server{} configs in Nginx.conf is a "good idea" in the first place: Multiple, different domains can be sharing the same server{} block for various reasons.
If that's the case - then backing up/editing/deleting server{} blocks related to domainA, would lead to duplication/corruption/over-extensive deletion (respectively) from Nginx.conf, cloning/nuking/editing other domains in the process (domainB, domainC, ...).
Here's an example in the backup-case:
Imagine this potential scenario that could still arise even with my algorithm, given a more complex relationship between servers than just "one block = just one domain":
Now the user backs up "test.com" and "test2.com". My algorithm (which, DESPITE THIS PROBLEM, is as good as it gets; better parsing cannot be done since we have no more knowledge than what the flawed, intermixed Nginx.conf format provides us) would end up with these backup files:
Finally, the user restores both backups (doesn't matter if they do it into nginx.conf or sites-available at this stage, because the irrecoverable cloning damage has already been done):
This is completely unavoidable for as long as people are allowed to put servers into nginx.conf. But if that ugly config-mode was removed from Virtualmin, and sites-available was enforced, backups become as clear and simple as "cp sites-available/test1.com ..." - without ever intermixing data, getting partial backups, corrupt backups, losing comments, or any other things that can happen when trying to parse nginx.conf. It won't matter how complex sites are; their configs will always be perfectly backed up.
You'll have to think about this of course; but what is more important? Preserving an ugly "insert into nginx.conf" mode of operation which nobody is recommended to use, or actually getting clean separation and always-perfect, non-corrupt backups?
Either way, if you maintain nginx.conf-mode support, you should at least use the proposed new algorithm. It will perfectly find every server{} block that relates to any given domain name, and the issue I just described will only arise if someone has set up a very complex relationship between domains, with multiple overlapping domains in certain server blocks (I do). If someone does have such a complex setup, they are unlikely to still be using the messy, newbie "nginx.conf"-mode (I don't), and this bug-waiting-to-happen of course won't hit them. Still, the bug will be there waiting for some user to come along with just such an edge-case setup... ;-)
In my opinion, unavoidable configuration corruption is at the top of things that must take precedence, even if it means dropping nginx.conf-mode.
As good as the algorithm is, there is no way to redesign it to intelligently figure out which server-configfile is more in need of a shared, multi-domain server{} block. Who's to say which domain deserves the block more? And if one domain-backup gets it and the other doesn't, then that'd break restores if only the domain without it was restored, etc. But - the light at the end of the tunnel - drop nginx.conf-mode and everything just works itself out naturally - the user makes the decision which config file deserves the conflicting server{} block! No more ambiguity or parsing errors ever again.
The best solution of all? It's to not even "play the game" at all: Remove support for nginx.conf-based server{} blocks. No algorithms needed.
Submitted by aitte on Sat, 06/08/2013 - 00:59 Comment #2
Deleted all subsequent posts, and rewrote everything into post #1 and #2 to combine everything into an easier to read format.
Submitted by aitte on Sat, 06/08/2013 - 00:43 Comment #3
Deleted post.
Submitted by aitte on Sat, 06/08/2013 - 00:53 Comment #4
Deleted post.
Submitted by aitte on Fri, 06/07/2013 - 09:47 Comment #5
Submitted by aitte on Fri, 06/07/2013 - 09:48 Comment #6
Submitted by JamieCameron on Fri, 06/07/2013 - 15:07 Comment #7
I think a simple first step to fixing this would be to just backup the whole .conf file for the domain from the sites-available directory. That way multiple or oddly-formatted server blocks will be preserved.
Submitted by aitte on Sat, 06/08/2013 - 00:58 Comment #8
Deleted post.
Submitted by aitte on Sat, 06/08/2013 - 01:02 Comment #9
This ticket is now restructured and rewritten, because it was difficult to read. You'll have to re-read it top to bottom, but this time it'll actually flow naturally and make sense to you. ;-)
Submitted by aitte on Sun, 06/09/2013 - 06:35 Comment #10
Actually, forget deprecating Nginx.conf mode. I have a better idea.
Keep Nginx.conf-mode as a valid mode of operation, implement my new algorithm, and finally implement a third check which is a sanity-check that says: "If nginx.conf-mode is used, none of the found server{} blocks are allowed to have anything except completely related ${DOM} or www.${DOM} server_name entries - NO mixing of multiple different domains allowed. In case they've mixed things, then give the user a warning and force them to migrate to per-site configs."
I like this option a lot, as it won't harm existing users: Most users that use Nginx.conf mode will never have that advanced configs (that they'd mix domains), so my algorithm will cover them perfectly. For the very rare ones that do have complex relationships, they're going to be smart enough to know what your Nginx module means when it says "Okay dude, slow down... I am getting confused here. You are going to have to migrate all your sites out of Nginx.conf because you are doing some complex shizzlewizzle here and I don't know how to tell which server{} blocks belong to which sites."
And of course, all of this algorithm-talk only relates to the Nginx.conf-mode. In per-site mode, just back up/edit/delete the exact config file that's named /etc/nginx/sites-available/${DOM} - do NOT scan/parse/edit/delete/do anything to ANY other configs in that folder. Just go DIRECTLY to the one with the exact sitename. No parsing allowed - cuz that way be' dragons and false positives.
Submitted by aitte on Mon, 06/10/2013 - 00:03 Comment #11
I've been standing by for 3 days and checking back to see if you have any questions that arise. Unfortunately I have to leave on a business trip now and will be gone for two weeks, so all I can do is briefly summarize everything instead, and pre-emptively cover everything you might ask about:
Nginx.conf mode issues:
The solutions:
That's ALL the info you need.
I have to leave now and wish you the best of luck. Thanks to the already-provided algorithm, you don't have to do much work to fix these bugs.
Submitted by JamieCameron on Mon, 06/10/2013 - 21:49 Comment #12
I think what we'll do as a first step is implement backing up of the whole .conf for for a domain, if per-domain files are in use. That solves the issue for 99% of users, without getting into the complexity of figuring out which
server
blocks belong to which domains.Submitted by aitte on Thu, 06/13/2013 - 03:41 Comment #13
Hi Jamie, I'm back from the business meeting/trip.
Yes, I agree that the most important step is just to simply back up the whole .conf as-is for every domain. Almost anyone that edits their Nginx files manually or has multiple server blocks will have switched to per-site files for management reasons, so that one small change would indeed cover 99% of users.
The rest can be covered with the provided algorithm as a starting point whenever you have time to analyze it and try it out. (I've already tested it and the regexps; it works very well.)
Submitted by aitte on Tue, 06/25/2013 - 12:43 Comment #14
Bumping the ticket so it doesn't fall off the board, since this is quite a serious flaw in the Nginx module. Almost any website you can imagine today needs multiple modifications to the server{} block to convert .htaccess rules to Nginx rules to make the site scripts work, and currently the process of doing so corrupts the configuration during site backup/restore. So I'd say that this is serious.
Not in any hurry with this at all myself (can always restore the uncorrupt configs via the latest full filesystem backup) but just want to make sure the ticket doesn't fall away, that's all.
I doubt that you've forgotten about this issue, but still, just a harmless bump just in case.
Submitted by JamieCameron on Wed, 06/26/2013 - 00:19 Comment #15
A code change to backup the whole .conf file for the domain has been implemented, and should be rolled out already in an update to the Nginx plugin for Virtualmin.