I have a Cloudmin Pro 10 VM license installed on my master system, host1. Host1 is a CentOS 6.3 KVM host and runs ~7 KVM VMs, all on Ubuntu 12.04 - all of this runs perfectly.
Recently we've added a second host system, host2, again a CentOS 6.3 KVM host, with a view to running another couple of VMs and providing some failover for host1 should it be required.
Both physical machines reside in the same cabinet in our DC, and are on the same subnet - let's say host1: 18.104.22.168 and host2: 22.214.171.124. Both have their gateway set to the DC gateway of 126.96.36.199 with no hardware firewall in between.
On each machine, I have 4 NICs bonded together into a single interface, which is then bridged to give the VMs network access. All of the VMs are online, and all of them can ssh into the hosts without any delay.
Both systems can access the internet fine, and I can ssh into both systems from home without any issues. However, there is a real delay when attempting to ssh from host1 to host2 (or vice versa), which means that any action on host2 controlled by host1 either takes forever or fails with a timeout.
In the interest of keeping this post short, I've put my ifcfg files into a pastie: http://pastie.org/8081648
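In case the pastie link goes away, here's a rough sketch of what a bond-plus-bridge setup on CentOS 6 generally looks like. The interface names (eth0/bond0/br0) and the bonding mode are illustrative assumptions, not taken from my actual files; the IP details are host1's from above:

```
# /etc/sysconfig/network-scripts/ifcfg-eth0  (one file per slave NIC)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-bond0  (the bond is enslaved to the bridge)
DEVICE=bond0
BRIDGE=br0
ONBOOT=yes
BOOTPROTO=none
# mode=4 (802.3ad) is just an example; the actual mode may differ
BONDING_OPTS="mode=4 miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-br0  (the bridge carries the host's IP)
DEVICE=br0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=static
IPADDR=18.104.22.168
NETMASK=255.255.255.0
GATEWAY=126.96.36.199
```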
I've tried both adding a firewall rule on each machine allowing the other, and disabling the firewall entirely, so that can't be the issue.
I've tried troubleshooting this myself but can't seem to get to the bottom of it. Any help or advice would be appreciated.
Thanks in advance.
--------------------- SOLUTION ADDED 16 JULY ---------------------
For others who may have similar issues....
Believe it or not, this was all down to a misconfigured DNS address. The first DNS IP in /etc/resolv.conf was wrong by a single digit, which caused lookups to hang until they timed out and resolution finally fell through to the second address.
So even though I was SSHing from one server to the other by IP address only, sshd on the receiving end does a reverse lookup of the source address on connection, hence the huge delays.
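For anyone hitting the same symptom, a quick sketch of the two checks that would have caught this (assuming stock OpenSSH on CentOS 6; whether you actually want to disable reverse lookups is a judgment call):

```
# 1. Confirm which resolvers the host is actually using -
#    a single wrong digit here caused my delays
cat /etc/resolv.conf

# 2. Optional workaround: tell sshd on the *destination* host not to
#    reverse-resolve connecting clients (OpenSSH's UseDNS option)
echo "UseDNS no" >> /etc/ssh/sshd_config
service sshd restart
```

Note that UseDNS no only masks the symptom; fixing the bad resolver entry is the real cure, since other services doing lookups will stall too.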
Hmm, what if you SSH from host1 to another server on the Internet... do you experience the same slowness? Or is it only when going to or from host2?
Also, when you SSH, are you doing it from the actual host, or from one of the guests running on that host?
If it's from one of the guests, or to a guest -- I'd be curious whether you notice any difference going directly between the host systems, without involving any of the guests.
Thanks for considering this for me.
I can SSH from host1 to other servers no problem, both physical and virtual. I can SSH from the physical host1 to a guest on itself, and to a guest on host2, without any delay.
From host2 I can SSH into a guest on itself just fine, and to a guest on host1 without delay too.
One last check: I can also ssh from a guest on host1 to host2 just fine.
The only issue seems to be between the physical host1 and host2 machines themselves, and only in that direction.
Hi again Eric
Do you have any ideas for me on this? I'm struggling to make any sense of the situation.