We've bought a 4 node rack server and I'm wondering how to set it up with regard to Virtualmin.
To take advantage of all the nodes, there'll be load balancing - with something like HAProxy, for example - to distribute the incoming traffic amongst the nodes.
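For what it's worth, a minimal HAProxy setup for this is only a few lines. This is just a sketch; the node names and addresses are made up, and a real config would also need a `defaults` section with `mode http` and timeouts:

```
# /etc/haproxy/haproxy.cfg (fragment) - addresses are hypothetical
frontend www
    bind *:80
    default_backend web_nodes

backend web_nodes
    balance roundrobin
    # "check" enables health checks, so a dead node drops out of rotation
    server node1 192.168.1.11:80 check
    server node2 192.168.1.12:80 check
    server node3 192.168.1.13:80 check
    server node4 192.168.1.14:80 check
```

Note that `roundrobin` only spreads requests evenly; it does nothing about keeping the nodes' data in sync, which is the real problem here.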
The thing is, each of the nodes needs to be identical. All the websites / emails / etc. the same, so that it doesn't matter which node the load balancer sends the request to, it can all be satisfied by any and all of them equally well.
This means configuring the same settings on each node and then having the exact same virtual servers and "/home" directories on all of them. Of course, this isn't read-only: websites get edited and emails are sent and received, changing the data in those "/home" directories. So if the data is duplicated on each node, it also needs to be kept in sync between them somehow.
The obvious answer, then, is shared network storage - a NAS in the rack, let's say. All the virtual server "/home" directories could live on a shared network share, auto-mounted as "/home" on each node, so they all see the same thing and the data inherently stays in sync.
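Mechanically that part is simple - it's just an NFS mount on each node. A sketch of the fstab entry, assuming an NFS server and export path I've invented here:

```
# /etc/fstab on each node - server name and export path are hypothetical
nas.example.lan:/export/home  /home  nfs4  rw,hard,noatime,_netdev  0  0
```

The `hard` option makes the nodes wait out a NAS outage rather than return I/O errors to Apache, and `_netdev` delays the mount until the network is up.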
That's the theory. The question is, can Virtualmin be made to work this way? On each node, "/home" could be auto-mounted at boot from the NAS, so they're all seeing exactly the same directory. I'd also need the Apache configuration - "sites-available" under "/etc" - shared between them. Indeed, maybe the whole "/etc" directory should be auto-mounted the same way, which would ensure the configurations stay identical too. That leads me towards the notion that perhaps the entire file system ought to live on the NAS and we PXE-boot the nodes (with all changes still going back to the NAS).
The thing is, the traffic comes in and I load-balance it between the nodes - splitting it apart - but then having them all access the same NAS over the network bottlenecks it again. Would I be throwing away the advantage gained from load balancing across four servers if it all gets serialised again at the storage? I suppose it's a "CPU-bound vs. I/O-bound" question. Processing PHP and such is CPU work, but much of what a web server does is just spooling data out - literally serving it up, hence the name "server" - which is I/O-bound and thus potentially bottlenecks the gains from having multiple nodes.
Mind you, each node has its own local hard drives. Is there any way to have those act as caches for the shared NAS storage? Pull the data down from the NAS on first access, keep it on the local drive, and serve from that. Only on a "cache miss" would the node actually go all the way to the NAS, preferring the local copy first. If anything changes on that local copy - a write, not just a read - it can be pushed back to the NAS (perhaps even lazily, when the node has a bit of idle time on its hands).
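(Answering my own question somewhat: something close to this does exist for NFS in the form of the Linux FS-Cache facility and the cachefilesd daemon - though, as I understand it, it only caches reads on the local disk; writes still go through to the server rather than being pushed back lazily. A sketch, assuming a Debian-style system and the hypothetical NAS name from before:

```
# Install and enable the local cache daemon (Debian/Ubuntu package name)
apt-get install cachefilesd
# Set RUN=yes in /etc/default/cachefilesd, then start it:
systemctl enable --now cachefilesd

# Mount the NFS export with the "fsc" option to use the local disk cache
mount -t nfs4 -o fsc,hard nas.example.lan:/export/home /home
```

Whether that read cache actually helps depends on how much of the workload is repeated reads of the same files.)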
I have absolutely no idea if that's even possible, but it logically would make the most sense. If it is possible, then would it all still work perfectly happily with Virtualmin?
So many questions, but I've never architected a system like this before. I'm having to learn rapidly as I go along. :D
Scaling is generally unique to every application...but, there are patterns.
A common pattern is one or more large centralized database server(s), with tons of memory, plus many frontend nodes that communicate with the backend database server. Each frontend node will likely have memcached running, to reduce load on the backend server(s) and provide faster lookups for common queries that don't change often. The database is usually the bottleneck in most web applications, so it makes sense to focus your scaling attention on the database. Virtualmin can work with a remote database; Cloudmin Services makes that somewhat more automated, but it's supported even with base Virtualmin GPL/Pro, because the Webmin MySQL and PostgreSQL server modules both support remote servers.
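The memcached pattern here is usually called "cache-aside": check the cache first, and only go to the database on a miss. A minimal sketch in Python, with a plain dict standing in for the memcached client and a stub standing in for the real database query (both are stand-ins, not real APIs):

```python
import time

cache = {}       # stand-in for a memcached client
CACHE_TTL = 60   # seconds; memcached-style expiry

def query_database(key):
    # Stand-in for the real (expensive) query to the backend DB server
    return f"row-for-{key}"

def get(key):
    entry = cache.get(key)
    if entry is not None:
        value, expires = entry
        if time.time() < expires:
            return value          # cache hit: no round trip to the DB
    value = query_database(key)   # cache miss: query the DB once...
    cache[key] = (value, time.time() + CACHE_TTL)   # ...then remember it
    return value
```

In a real deployment the dict would be a memcached client (e.g. php-memcached or pymemcache), and writes to the database would also need to invalidate or update the corresponding cache keys.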
Another aspect of hosting systems that is not usually a bottleneck, but is very resource intensive, is mail processing. The common pattern for solving that is either a frontend spam/AV scanning mail forwarder, or off-loading mail delivery to a central server. The former requires no support from Virtualmin (though if you want to be able to bounce immediately on invalid users, you'd need to set up Virtualmin to work with a directory server, e.g. LDAP, so the frontend mail server can know who exists...but that's probably not necessary except in very high load mail environments). Cloudmin Services allows off-loading the spam/AV scanning to another server.
I probably would not configure four identical hosts. Two "web" hosts and two "database" hosts would probably make sense. Four identical boxes probably do not.
But, really it depends on what you're trying to achieve, and what your workload looks like.
Ask yourself these questions:
Anyway...you probably ought to stop chasing a "four identical servers" model. That's probably wasteful of resources, and won't provide more performance than two dedicated database servers and two frontend web servers. But it really depends on what you're doing. There is no generic "this is how you scale and provide redundancy" model; you scale by designing around the needs of your application(s).
Thank you for your reply. I can see that what you're saying makes sense.
I'm still at a loss, though, as to how to organise the storage. Say we have two database servers and two web servers: to balance traffic between them equally, both database servers need all the databases on them, and both web servers need all the virtual servers on them too. Granted, we might not need four identical nodes, but wherever we have more than one node of a given type, those nodes need to be working with the same - synchronised - data set.
Would my idea of mounting NAS partitions as "/home" and "/etc" actually work? Surely this can't be how it's conventionally done?