apt-get update software.virtualmin.com connection timed out

We have several Ubuntu 10.04.4 LTS VPS systems, each running Virtualmin Pro. One of these systems is hosted in Australia and is the only one where "apt-get update" fails, with the error "Could not connect to software.virtualmin.com:80 (108.60.199.107), connection timed out". The error started approximately two weeks ago (27 Oct 2012).

Simple diagnostics indicate that the problem is intermittent: repeated telnet connections to software.virtualmin.com on port 80 sometimes connect and sometimes time out.
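A repeated-connection test like this can be scripted so the success/timeout ratio is easy to compare between hosts. This is a minimal Python sketch, assuming a plain TCP connect is an adequate stand-in for a telnet session; the `probe` name, attempt count, timeout and pause are illustrative choices, not part of the original diagnostics:

```python
import socket
import time
from typing import Tuple


def probe(host: str, port: int = 80, attempts: int = 10,
          timeout: float = 5.0, pause: float = 1.0) -> Tuple[int, int]:
    """Try `attempts` TCP connections to host:port, one after another.

    Returns (connected, failed). A mix of both is the intermittent pattern
    described above; all failures means the path is consistently broken.
    """
    connected = failed = 0
    for i in range(attempts):
        try:
            # Resolves the name and completes the TCP handshake, as telnet would;
            # the socket is closed again as soon as the with-block exits.
            with socket.create_connection((host, port), timeout=timeout):
                connected += 1
        except OSError:
            # Timeouts, refused connections and DNS failures all land here
            # (socket.timeout and socket.gaierror are OSError subclasses).
            failed += 1
        if i < attempts - 1:
            time.sleep(pause)  # space the attempts out a little
    return connected, failed
```

Running `probe("software.virtualmin.com")` on the affected VPS and on a host at a different ISP gives a quick side-by-side count of successful and failed connections.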

Our non-Linux VPS systems with the same VPS provider show the same intermittent connection timeouts when connecting to http://software.virtualmin.com with a browser.

The VPS provider's support staff have verified the same intermittent connection timeouts from their systems.

I cannot replicate this problem from other Australian ISPs, so it is definitely specific to this VPS provider's network.

The intermittent nature of the problem suggests it could be a fault in a reverse proxy or load balancer in front of software.virtualmin.com (if there is one).

It could also be a fault in a load-balancing router, if one of its paths is configured to block old bogon networks (unfortunately, software.virtualmin.com is in a former bogon range: 108/8).
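The bogon theory is easy to check for any given address. Below is a minimal sketch using Python's standard `ipaddress` module; `STALE_BOGONS` is a hypothetical stand-in for an out-of-date filter list and contains only the 108/8 prefix mentioned above, not any real router's configuration:

```python
import ipaddress

# Hypothetical stale filter list: a prefix that was unallocated (a bogon)
# before it was assigned, and which an out-of-date filter might still drop.
STALE_BOGONS = [ipaddress.ip_network("108.0.0.0/8")]


def in_stale_bogon(addr: str) -> bool:
    """Return True if addr falls inside any prefix in STALE_BOGONS."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in STALE_BOGONS)


print(in_stale_bogon("108.60.199.107"))  # True: software.virtualmin.com
print(in_stale_bogon("208.78.216.161"))  # False: a router on the same path
```

A filter built from such a list would drop traffic to the Virtualmin server while leaving the routers just before it reachable.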

Your thoughts on the matter are appreciated.

Status: Closed (fixed)

Comments

We don't have any load balancer in front of software.virtualmin.com, so I suspect the cause is an intermittent network issue.

When this is happening, try running traceroute software.virtualmin.com and let us know what it outputs. That should show where the connection is dropping out.

Unfortunately, this is now happening all the time. It is currently impossible to update Virtualmin on this system.

Traceroute reaches www.cloud.virtualmin.com, the same as from our other systems:
traceroute to software.virtualmin.com (108.60.199.107), 30 hops max, 60 byte packets
1  110.232.113.1 (110.232.113.1)  0.247 ms  0.204 ms  0.169 ms
2  gw-tpg.mammoth.net.au (203.220.0.225)  0.510 ms  0.729 ms  0.710 ms
3  13.112.220.203.unassigned.comindico.com.au (203.220.112.13)  0.573 ms  0.550 ms  0.518 ms
4  13.112.220.203.unassigned.comindico.com.au (203.220.112.13)  0.493 ms  0.462 ms  0.442 ms
5  syd-sot-ken-crt1-pos-8-0.tpgi.com.au (202.7.162.245)  0.421 ms  0.542 ms  0.528 ms
6  ge5-0-5d0.cir1.seattle7-wa.us.xo.net (216.156.100.37)  144.116 ms  144.287 ms  144.262 ms
7  206.111.7.138.ptr.us.xo.net (206.111.7.138)  190.744 ms  190.456 ms  190.507 ms
8  ae5.csr1.DAL2.gblx.net (67.16.166.41)  199.063 ms  198.406 ms  199.066 ms
9  * te1-1-10G.asr1.DAL2.gblx.net (67.17.79.110)  199.758 ms  199.603 ms
10  * * highwinds-network-group.tengigabitethernet2-2.asr1.dal2.gblx.net (208.48.236.130)  197.527 ms
11  xe-5-3-0.core3.dllstx01.corexchange.com (208.78.216.161)  197.629 ms *  198.554 ms
12  www.cloud.virtualmin.com (108.60.199.107)  200.603 ms !X  200.485 ms !X  200.465 ms !X
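The `!X` printed after each time on the last hop is traceroute's annotation for an ICMP "communication administratively prohibited" reply, rather than the port-unreachable reply a destination normally returns to the final UDP probe. Below is a rough Python sketch for pulling the hop number, round-trip times, lost probes and such annotations out of one hop line, assuming the Linux traceroute output format shown here; `parse_hop` and the `FLAGS` subset are illustrative, not part of any tool:

```python
import re

# A subset of the annotations Linux traceroute can print after a probe's time.
FLAGS = {"!H": "host unreachable", "!N": "network unreachable",
         "!P": "protocol unreachable", "!X": "administratively prohibited"}

HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")


def parse_hop(line: str) -> dict:
    """Split one hop line of traceroute output into its parts (sketch only)."""
    m = HOP_RE.match(line)
    if not m:
        raise ValueError(f"not a hop line: {line!r}")
    hop, rest = int(m.group(1)), m.group(2)
    # Round-trip times are printed as "<number> ms".
    rtts = [float(x) for x in re.findall(r"([\d.]+) ms", rest)]
    # Keep only the annotations we know how to describe.
    flags = [f for f in re.findall(r"!\w", rest) if f in FLAGS]
    # Each "*" is a probe that got no reply before traceroute gave up on it.
    return {"hop": hop, "rtts_ms": rtts, "flags": flags, "lost": rest.count("*")}
```

Applied to hop 12 above, this yields three round-trip times, each tagged `!X`, and no lost probes.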

Oddly, at the time of this traceroute, ping to software.virtualmin.com got no response from this host (it works from other systems).

Is there any chance you have an automated firewall or service-level system that rate-limits connections?

What is your system's IP address? I'd like to try a reverse traceroute, to see if I can identify where the problem is.

We aren't running any automated firewall or rate limiter on software.virtualmin.com.

ICMP echo request (ping) packets from our host (110.232.113.4) appear to be filtered at software.virtualmin.com (www.cloud.virtualmin.com).

From the traceroute in #2 above: I can ping 208.78.216.161, but I can't ping 108.60.199.107.

I had no firewall running on the host during these tests.

I can ping 108.60.199.107 from other hosts just fine.

You should be able to ping it, and connect to TCP port 80 from anywhere.

Interesting: from software.virtualmin.com, I cannot ping your system at 110.232.113.4. I can ping it from machines on other networks, but not from machines on that same network.

Could you start a ping from your machine to software.virtualmin.com and leave it running? That way I can run tcpdump and see if the packets are arriving. It is possible that the ping fails because the response is not getting back.

Interesting result.

The system (110.232.113.4) is currently pinging software.virtualmin.com (108.60.199.107)

tcpdump shows only outgoing ICMP echo request packets.

I will leave the ping running until you get back to me.

Thanks - I don't see any ICMP packets coming from your system though, which leads me to suspect that the issue is somewhere in the middle.

I have also started a ping on 108.60.199.115 to your system. If you run tcpdump, do you see ICMP echo requests from or replies to that address?

I can see the current ping (request and reply) from 108.60.199.115 just fine.

I was running tcpdump the whole time you were testing: I could see your test echo requests from 108.60.199.107 and the echo replies going back (which you didn't see).

So it appears that ICMP datagrams from 110.232.113.4 to 108.60.199.107 are blocked, but not the other way around (asymmetric routing, perhaps).

Agreed, it looks like a failure in just one direction.

Are the hosts that work OK on the same network as the one that fails?

We have two other VPS hosts with this provider: one on the same subnet (110.232.113.200/24) and one on a different subnet (103.1.185.155/24).

Both of these hosts can ping software.virtualmin.com

Both of these hosts display the intermittent connection timeout on port 80 (which is the original problem described in this ticket).

You might want to try talking to your VPS provider to see if they can figure out where the packets from 110.232.113.4 are being dropped, and I will try the same on our end.

A traceroute from our side looks like this:

[root@jamie ~]# traceroute 110.232.113.4
traceroute to 110.232.113.4 (110.232.113.4), 30 hops max, 60 byte packets
1  108.60.199.113 (108.60.199.113)  1.151 ms  1.074 ms  0.826 ms
2  xe-5-3-0.core4.dllstx01.corexchange.com (208.78.216.162)  0.794 ms  0.767 ms  0.740 ms
3  10gigabitethernet3-1.core1.dal1.he.net (206.223.118.37)  5.458 ms  5.758 ms  5.736 ms
4  10gigabitethernet2-3.core3.fmt2.he.net (72.52.92.153)  46.976 ms  46.957 ms  46.926 ms
5  10gigabitethernet7-4.core1.sjc2.he.net (184.105.222.14)  53.763 ms  53.747 ms  53.720 ms
6  10gigabitethernet1-4.core1.sjc1.he.net (72.52.92.117)  50.303 ms  47.150 ms  47.097 ms
7  tpg-internet-pty-ltd.10gigabitethernet3-1.core1.sjc1.he.net (72.52.66.22)  193.889 ms  193.875 ms tpg-internet-pty-ltd.10gigabitethernet1-3.core1.sjc1.he.net (72.52.93.38)  199.301 ms
8  syd-sot-ken-crt1-ge-7-0-0.tpgi.com.au (203.29.135.42)  199.886 ms  200.076 ms syd-sot-ken-crt1-ge-4-1-0.tpgi.com.au (203.29.135.209)  198.320 ms
9  202-7-162-246.tpgi.com.au (202.7.162.246)  197.735 ms  197.699 ms  197.649 ms
10  155.112.220.203.unassigned.comindico.com.au (203.220.112.155)  198.585 ms  198.879 ms  200.133 ms
11  155.112.220.203.unassigned.comindico.com.au (203.220.112.155)  200.112 ms  198.794 ms  198.649 ms
12  * * *
13  * * *
14  * * *

Interestingly, a traceroute from another host gets one hop further, but does not reach the destination:

# traceroute 110.232.113.4
traceroute to 110.232.113.4 (110.232.113.4), 30 hops max, 40 byte packets
1  67.188.12.1 (67.188.12.1)  21.213 ms  21.244 ms  22.214 ms
2  te-0-1-0-7-ur06.santaclara.ca.sfba.comcast.net (68.85.216.21)  11.633 ms  11.627 ms  11.613 ms
3  te-1-1-0-5-ar01.oakland.ca.sfba.comcast.net (68.86.143.98)  14.947 ms te-1-1-0-2-ar01.oakland.ca.sfba.comcast.net (68.85.155.70)  15.019 ms te-1-1-0-1-ar01.oakland.ca.sfba.comcast.net (69.139.198.94)  15.008 ms
4  he-2-15-0-0-cr01.sacramento.ca.ibone.comcast.net (68.86.91.225)  20.102 ms  20.149 ms  20.129 ms
5  pos-0-8-0-0-cr01.sanjose.ca.ibone.comcast.net (68.86.85.78)  24.559 ms  24.453 ms  24.513 ms
6  pos-0-5-0-0-pe01.11greatoaks.ca.ibone.comcast.net (68.86.87.162)  25.718 ms  18.545 ms  18.589 ms
7  as4323-pe01.11greatoaks.ca.ibone.comcast.net (75.149.229.2)  22.785 ms  19.591 ms  38.091 ms
8  pdx1-ar4-xe-5-0-0-0.us.twtelecom.net (66.192.254.86)  32.849 ms  32.569 ms pdx1-ar4-xe-0-1-0-0.us.twtelecom.net (66.192.244.74)  32.596 ms
9  66.162.129.150 (66.162.129.150)  174.956 ms  178.054 ms  178.079 ms
10  syd-sot-ken-crt1-ge-3-1-0.tpgi.com.au (203.29.135.33)  201.742 ms  204.538 ms  204.451 ms
11  202-7-162-246.tpgi.com.au (202.7.162.246)  205.197 ms  205.237 ms  205.185 ms
12  29.112.220.203.unassigned.comindico.com.au (203.220.112.29)  205.172 ms  205.201 ms  205.693 ms
13  29.112.220.203.unassigned.comindico.com.au (203.220.112.29)  205.173 ms  205.719 ms  205.694 ms
14  203.220.0.231.mammoth.net.au (203.220.0.231)  205.110 ms *  202.487 ms
15  * * *
16  * * *

Ping works OK though.
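Comparing how far different traceroutes get comes down to finding the last hop that answered. Below is a small Python sketch of that, assuming Linux-style traceroute output where each responding hop prints an IPv4 address in parentheses; `last_responding_hop` is a hypothetical helper, and on hops that list several routers it takes the first address printed:

```python
import re
from typing import Optional, Tuple

# A hop line starts with the hop number; a responding hop contains "(a.b.c.d)".
HOP_LINE = re.compile(r"^\s*(\d+)\s")
ADDR = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)")


def last_responding_hop(output: str) -> Optional[Tuple[int, str]]:
    """Return (hop number, address) of the last hop that answered any probe,
    or None if no hop answered. Header lines and "* * *" hops are skipped."""
    last = None
    for line in output.splitlines():
        hop = HOP_LINE.match(line)
        addr = ADDR.search(line)
        if hop and addr:
            last = (int(hop.group(1)), addr.group(1))
    return last
```

Fed the two reverse traceroutes above, this would report hop 11 (203.220.112.155) for the first and hop 14 (203.220.0.231) for the second.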

You should at least get to the 203.220.0.231.mammoth.net.au (203.220.0.231) hop.

The firewall on 110.232.113.4 (running again now) will be blocking the traceroute UDP probes, which is why the final hop to the destination fails.

I am in communication with the VPS provider right now, but haven't made any further progress at this stage.

Thanks for your assistance in diagnosing this.

It appears this was a router issue with our VPS provider. They have now rectified the problem, and we are able to connect to software.virtualmin.com again.

I have closed this issue.

Thanks again for your assistance.