We have several Ubuntu 10.04.4 LTS VPS systems each running Virtualmin Pro. One of these systems is hosted in Australia and is the only one failing "apt-get update" with the error "Could not connect to software.virtualmin.com:80 (108.60.199.107), connection timed out". This error started approximately two weeks ago (27 Oct 2012).
Simple diagnostics indicate that the problem is intermittent: repeated telnet connections to software.virtualmin.com on port 80 sometimes connect and sometimes time out.
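For anyone wanting to reproduce the repeated-connection test without telnet, here is a minimal sketch in Python. The function names, the attempt count and the timeout values are my own assumptions for illustration, not something from the original tests.

```python
import socket
import time


def probe_once(host, port, timeout=5.0):
    """Attempt a single TCP connection and report (ok, detail)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"connected in {time.monotonic() - start:.3f}s"
    except socket.timeout:
        # No SYN/ACK came back within the timeout.
        return False, "timed out"
    except OSError as exc:
        # Refused, unreachable, DNS failure and similar errors.
        return False, f"failed: {exc}"


def probe_many(host, port, attempts=20, timeout=5.0, pause=1.0):
    """Repeat the probe and tally how many attempts connected."""
    results = []
    for _ in range(attempts):
        results.append(probe_once(host, port, timeout))
        time.sleep(pause)
    connected = sum(1 for ok, _ in results if ok)
    return {
        "attempts": attempts,
        "connected": connected,
        "failed": attempts - connected,
        "details": [detail for _, detail in results],
    }
```

Calling `probe_many("software.virtualmin.com", 80)` from an affected host and from an unaffected one should make the difference in failure rate visible at a glance.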
Our non-Linux VPS systems with the same VPS provider show the same intermittent connection timeouts when using a browser to connect to http://software.virtualmin.com.
The VPS Support personnel have verified the same intermittent connection timeout from their systems.
I cannot replicate this problem from other Australian ISPs, so it appears to be specific to this VPS provider's network.
The intermittent nature of the problem leads me to believe this has the potential to be a fault with a reverse-proxy/load-balancer in front of software.virtualmin.com (if there is one).
It could also be a fault with a traffic balancing router, if one path is configured to block old bogon networks (unfortunately software.virtualmin.com is in a historical one: 108/8).
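To illustrate the bogon theory, the sketch below checks an address against a stale filter list. The list contents are an assumption: it simply models a router whose bogon filter was never updated and still contains 108.0.0.0/8. `STALE_BOGONS` and `matching_bogon` are hypothetical names.

```python
import ipaddress

# Assumption: an out-of-date bogon list that still includes 108.0.0.0/8.
STALE_BOGONS = [ipaddress.ip_network("108.0.0.0/8")]


def matching_bogon(address, bogons=STALE_BOGONS):
    """Return the bogon prefix that covers `address`, or None if there is none."""
    addr = ipaddress.ip_address(address)
    for net in bogons:
        if addr in net:
            return net
    return None
```

With this list, `matching_bogon("108.60.199.107")` returns the 108.0.0.0/8 network, so a path that applies such a filter would drop traffic to or from software.virtualmin.com while other paths would pass it.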
Your thoughts on the matter are appreciated.
Comments
Submitted by JamieCameron on Sun, 11/04/2012 - 21:50 Comment #1
We don't have any load balancer in front of software.virtualmin.com, so I would suspect that the cause is an intermittent network issue.
When this is happening, you should try running
traceroute software.virtualmin.com
and let us know what it outputs. That should show where the connection is dropping out.
Submitted by gastewart on Mon, 11/05/2012 - 20:35 Comment #2
Unfortunately, this is always happening. It is currently impossible to update virtualmin on this system.
Traceroute gets to www.cloud.virtualmin.com, which is the same as our other systems:
traceroute to software.virtualmin.com (108.60.199.107), 30 hops max, 60 byte packets
1 110.232.113.1 (110.232.113.1) 0.247 ms 0.204 ms 0.169 ms
2 gw-tpg.mammoth.net.au (203.220.0.225) 0.510 ms 0.729 ms 0.710 ms
3 13.112.220.203.unassigned.comindico.com.au (203.220.112.13) 0.573 ms 0.550 ms 0.518 ms
4 13.112.220.203.unassigned.comindico.com.au (203.220.112.13) 0.493 ms 0.462 ms 0.442 ms
5 syd-sot-ken-crt1-pos-8-0.tpgi.com.au (202.7.162.245) 0.421 ms 0.542 ms 0.528 ms
6 ge5-0-5d0.cir1.seattle7-wa.us.xo.net (216.156.100.37) 144.116 ms 144.287 ms 144.262 ms
7 206.111.7.138.ptr.us.xo.net (206.111.7.138) 190.744 ms 190.456 ms 190.507 ms
8 ae5.csr1.DAL2.gblx.net (67.16.166.41) 199.063 ms 198.406 ms 199.066 ms
9 * te1-1-10G.asr1.DAL2.gblx.net (67.17.79.110) 199.758 ms 199.603 ms
10 * * highwinds-network-group.tengigabitethernet2-2.asr1.dal2.gblx.net (208.48.236.130) 197.527 ms
11 xe-5-3-0.core3.dllstx01.corexchange.com (208.78.216.161) 197.629 ms * 198.554 ms
12 www.cloud.virtualmin.com (108.60.199.107) 200.603 ms !X 200.485 ms !X 200.465 ms !X
Oddly, at the time I did this traceroute, ping to software.virtualmin.com was failing to respond from this same host (but was OK on other systems).
Any chance you have some automated firewall or service-level system for rate limiting connections?
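A note on reading the traceroute above: the "*" entries are probes that got no answer, and the "!X" after the final hop's times is the Linux traceroute annotation for an ICMP "communication administratively prohibited" response. That usually points at a packet filter rejecting the UDP probes rather than at a routing fault, although it does not by itself explain the TCP timeouts. The following is a minimal sketch for summarising each hop line, assuming the output format shown above; `summarize_hop` is a hypothetical helper name.

```python
# Annotations that Linux traceroute prints after a probe's round-trip time.
FLAG_MEANINGS = {
    "!H": "host unreachable",
    "!N": "network unreachable",
    "!P": "protocol unreachable",
    "!X": "communication administratively prohibited",
}


def summarize_hop(line):
    """Return hop number, lost-probe count and ICMP annotations for one line."""
    tokens = line.split()
    if not tokens or not tokens[0].isdigit():
        return None  # header line or blank line
    rest = tokens[1:]
    return {
        "hop": int(tokens[0]),
        "lost": rest.count("*"),  # each "*" is one unanswered probe
        "flags": sorted({t for t in rest if t in FLAG_MEANINGS}),
    }
```

Feeding it the lines for hops 9 to 12 shows the occasional lost probes in the Dallas segment and the !X flag on the last hop.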
Submitted by JamieCameron on Mon, 11/05/2012 - 21:54 Comment #3
What is your system's IP address? I'd like to try a reverse traceroute, to see if I can identify where the problem is.
We aren't running any automated firewall or rate limiter on software.virtualmin.com.
Submitted by gastewart on Mon, 11/05/2012 - 21:58 Comment #4
ICMP echo request (ping) packets from our host (110.232.113.4) appear to be filtered at software.virtualmin.com (www.cloud.virtualmin.com).
Going by the traceroute in #2 above: I can ping 208.78.216.161, but I can't ping 108.60.199.107.
I had no firewall running on the host for the period of these tests.
I can ping 108.60.199.107 from other hosts just fine.
You should be able to ping it, and connect to TCP port 80 from anywhere.
Submitted by JamieCameron on Mon, 11/05/2012 - 22:44 Comment #5
Interesting - I found that from software.virtualmin.com, I cannot ping your system at 110.232.113.4. I can ping it from machines on other networks, but not from machines on that same network.
Could you start a ping from your machine to software.virtualmin.com and leave it running? That way I can run tcpdump and see if the packets are arriving. It is possible that the ping fails because the response is not getting back.
Submitted by gastewart on Mon, 11/05/2012 - 22:47 Comment #6
Interesting result.
The system (110.232.113.4) is currently pinging software.virtualmin.com (108.60.199.107)
tcpdump shows only outgoing ICMP echo request packets.
I will leave the ping running until you get back to me.
Submitted by JamieCameron on Mon, 11/05/2012 - 23:17 Comment #7
Thanks - I don't see any ICMP packets coming from your system though, which leads me to suspect that the issue is somewhere in the middle.
I have also started a ping on 108.60.199.115 to your system. If you run tcpdump, do you see ICMP echo requests from or replies to that address?
Submitted by gastewart on Mon, 11/05/2012 - 23:26 Comment #8
I can see the current ping (request and reply) from 108.60.199.115 just fine.
I was running tcpdump the whole time you were testing, and I was able to see your test echo requests from 108.60.199.107, and I could see an echo reply go back (but you didn't see these).
So it would appear that ICMP datagrams from 110.232.113.4 to 108.60.199.107 are blocked, but not the other way around (asymmetric routing, perhaps).
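The reasoning behind that conclusion can be written down as a small decision function. This is only a sketch of the logic used in this thread; the function and parameter names are hypothetical, and each argument is what tcpdump showed at one end of a single ping test.

```python
def locate_loss(request_seen_at_target, reply_sent_by_target, reply_seen_at_source):
    """Classify one ping test from packet captures taken at both ends."""
    if not request_seen_at_target:
        return "request lost on the forward path (source -> target)"
    if not reply_sent_by_target:
        return "target received the request but did not reply"
    if not reply_seen_at_source:
        return "reply lost on the return path (target -> source)"
    return "no loss observed"
```

For the ping from 110.232.113.4 to 108.60.199.107, the request never arrived, so the loss is on the forward path. For the ping from 108.60.199.107 to 110.232.113.4, the request arrived and a reply was sent but never seen, so the loss is on the return path. Both results describe the same direction: 110.232.113.4 towards 108.60.199.107.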
Submitted by JamieCameron on Mon, 11/05/2012 - 23:44 Comment #9
Agreed, it looks like a failure in just one direction.
Are the hosts that work OK on the same network as the one that fails?
Submitted by gastewart on Mon, 11/05/2012 - 23:54 Comment #10
We have two other VPS hosts with this provider: one on the same subnet (110.232.113.200/24) and one on a different subnet (103.1.185.155/24).
Both of these hosts can ping software.virtualmin.com
Both of these hosts display the intermittent connection timeout on port 80 (which is the original problem described in this ticket).
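As a quick check of which of those hosts shares a subnet with the failing one, here is a minimal sketch using Python's standard ipaddress module. It assumes the failing host 110.232.113.4 also uses a /24 mask, which is not stated explicitly above; `same_subnet` is a hypothetical helper name.

```python
import ipaddress


def same_subnet(address, interface):
    """True if `address` lies inside the network of `interface` (e.g. '10.0.0.5/24')."""
    network = ipaddress.ip_interface(interface).network
    return ipaddress.ip_address(address) in network
```

`same_subnet("110.232.113.4", "110.232.113.200/24")` is true and `same_subnet("110.232.113.4", "103.1.185.155/24")` is false, so a host on the same subnet as the failing one can still ping software.virtualmin.com. That suggests the ICMP loss is tied to the single address rather than to the whole subnet.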
Submitted by JamieCameron on Tue, 11/06/2012 - 00:01 Comment #11
You might want to try talking to your VPS provider to see if they can figure out where the packets from 110.232.113.4 are being dropped, and I will try the same on our end.
A traceroute from our side looks like:
[root@jamie ~]# traceroute 110.232.113.4
traceroute to 110.232.113.4 (110.232.113.4), 30 hops max, 60 byte packets
1 108.60.199.113 (108.60.199.113) 1.151 ms 1.074 ms 0.826 ms
2 xe-5-3-0.core4.dllstx01.corexchange.com (208.78.216.162) 0.794 ms 0.767 ms 0.740 ms
3 10gigabitethernet3-1.core1.dal1.he.net (206.223.118.37) 5.458 ms 5.758 ms 5.736 ms
4 10gigabitethernet2-3.core3.fmt2.he.net (72.52.92.153) 46.976 ms 46.957 ms 46.926 ms
5 10gigabitethernet7-4.core1.sjc2.he.net (184.105.222.14) 53.763 ms 53.747 ms 53.720 ms
6 10gigabitethernet1-4.core1.sjc1.he.net (72.52.92.117) 50.303 ms 47.150 ms 47.097 ms
7 tpg-internet-pty-ltd.10gigabitethernet3-1.core1.sjc1.he.net (72.52.66.22) 193.889 ms 193.875 ms tpg-internet-pty-ltd.10gigabitethernet1-3.core1.sjc1.he.net (72.52.93.38) 199.301 ms
8 syd-sot-ken-crt1-ge-7-0-0.tpgi.com.au (203.29.135.42) 199.886 ms 200.076 ms syd-sot-ken-crt1-ge-4-1-0.tpgi.com.au (203.29.135.209) 198.320 ms
9 202-7-162-246.tpgi.com.au (202.7.162.246) 197.735 ms 197.699 ms 197.649 ms
10 155.112.220.203.unassigned.comindico.com.au (203.220.112.155) 198.585 ms 198.879 ms 200.133 ms
11 155.112.220.203.unassigned.comindico.com.au (203.220.112.155) 200.112 ms 198.794 ms 198.649 ms
12 * * *
13 * * *
14 * * *
Interestingly, a traceroute from another host gets one step further, but does not reach the destination:
# traceroute 110.232.113.4
traceroute to 110.232.113.4 (110.232.113.4), 30 hops max, 40 byte packets
1 67.188.12.1 (67.188.12.1) 21.213 ms 21.244 ms 22.214 ms
2 te-0-1-0-7-ur06.santaclara.ca.sfba.comcast.net (68.85.216.21) 11.633 ms 11.627 ms 11.613 ms
3 te-1-1-0-5-ar01.oakland.ca.sfba.comcast.net (68.86.143.98) 14.947 ms te-1-1-0-2-ar01.oakland.ca.sfba.comcast.net (68.85.155.70) 15.019 ms te-1-1-0-1-ar01.oakland.ca.sfba.comcast.net (69.139.198.94) 15.008 ms
4 he-2-15-0-0-cr01.sacramento.ca.ibone.comcast.net (68.86.91.225) 20.102 ms 20.149 ms 20.129 ms
5 pos-0-8-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.85.78) 24.559 ms 24.453 ms 24.513 ms
6 pos-0-5-0-0-pe01.11greatoaks.ca.ibone.comcast.net (68.86.87.162) 25.718 ms 18.545 ms 18.589 ms
7 as4323-pe01.11greatoaks.ca.ibone.comcast.net (75.149.229.2) 22.785 ms 19.591 ms 38.091 ms
8 pdx1-ar4-xe-5-0-0-0.us.twtelecom.net (66.192.254.86) 32.849 ms 32.569 ms pdx1-ar4-xe-0-1-0-0.us.twtelecom.net (66.192.244.74) 32.596 ms
9 66.162.129.150 (66.162.129.150) 174.956 ms 178.054 ms 178.079 ms
10 syd-sot-ken-crt1-ge-3-1-0.tpgi.com.au (203.29.135.33) 201.742 ms 204.538 ms 204.451 ms
11 202-7-162-246.tpgi.com.au (202.7.162.246) 205.197 ms 205.237 ms 205.185 ms
12 29.112.220.203.unassigned.comindico.com.au (203.220.112.29) 205.172 ms 205.201 ms 205.693 ms
13 29.112.220.203.unassigned.comindico.com.au (203.220.112.29) 205.173 ms 205.719 ms 205.694 ms
14 203.220.0.231.mammoth.net.au (203.220.0.231) 205.110 ms * 202.487 ms
15 * * *
16 * * *
Ping works OK though.
Submitted by gastewart on Tue, 11/06/2012 - 00:13 Comment #12
You should at least get to the 203.220.0.231.mammoth.net.au (203.220.0.231) hop.
The firewall (running again) on 110.232.113.4 will be blocking the traceroute UDP packets, which makes the final hop to the destination fail.
I am in communication with the VPS provider right now, but haven't made any further progress at this stage.
Thanks for your assistance in diagnosing this.
Submitted by gastewart on Wed, 11/07/2012 - 22:15 Comment #13
It appears this was a router issue with our VPS provider. They have now rectified the problem, and we are able to connect to software.virtualmin.com again.
I have closed this issue.
Thanks again for your assistance.
Submitted by JamieCameron on Wed, 11/07/2012 - 23:20 Comment #14
Great!