XEN performance problems

Hello! I am running Cloudmin Pro since a few weeks now. It is generally doing fine, however I have two problems:

  1. Starting up a XEM VM takes a looong time... basically in console i see this
Configuring network interfaces...[   52.704305] NET: Registered protocol family 10
[   52.704305] lo: Disabled Privacy Extensions

then it stays there literally for minutes

  1. I often get high iowait values on my main xen VM. It went so far this morning that I had a load of over 90.00, with the cpus doing almost nothing, but the vm became so unresponsive that i had to destroy it and start it up again. I am checking the host drives right now, but they seem ok so far. I wonder where this problem could originate from, maybe a kernel problem?

The host is Debian 5.0 x64 bit with Cloudmin Pro, the guest with the iowait problem is a debian 5.0 x64 with Virtualmin Pro.

Any help would be greatly appreciated

Status: 
Closed (fixed)

Comments

i did some tests with dd and found out that performance on my domU are 2-3x slower than on the dom0. the domUs are stored on LVM volumes.

For problem 1, are your Xen VMs using DHCP to get their network addresses? Also, is CPU load high during this slow part of the boot process?

For problem 2, what do IO loads look like on the host system? Is there perhaps some contention with other processes?

  1. In the xen vms, the network interface in /etc/network/interfaces is configured in a static way. I have three VMs, one debian x86, one debian x64 and one ubuntu x86. The ubuntu, strangely, is starting up really fast. Although in the console i see
mountall: Plymouth command failed
mountall: Disconnected from Plymouth

but basically it starts up really quickly. All three VMs are configured in a static manner. However, i noticed something strange in the xen config file for the vm that is makind problems: vif = ['ip=123.123.123.123,mac=00:00:00:00:00:00,rate=92160K rate=90Mb/s'] two times rate=? is that correct like that?

  1. The host system seems to be unaffected by these high IO loads. Even the other VMs are unaffected. What do you mean with contention? Btw i removed the "locate" command, which had a daily cron script that scanned the whole drive, on the affecting VM. Hopefully things wont get so bad tomorrow, but still i often notice that from time to time the load goes up drammatically for some minutes, then returns back to normal.

btw i don't know if this is related in any way, but on all VMs i get this error in their consoles: INIT: Id "xvc0" respawning too fast: disabled for 5 minutes oh and the xm text console seems not to work for the ubuntu vm (the one that starts up fast). i just see that Plymouth error i pasted above, then nothing else.

Having rate= in there twice seems wrong! If you remove one or both of them and then restart the VM, does it help?

Also, is it possible to SSH into the VM or access it via the console when booting to see what is slow?

unfortunately it does not help. btw, the first message showing up after it resumes booting normally is

if-up.d/mountnfs[eth0]: waiting for interface lo before doing NFS mounts (warning).
done.

accessing the vm via ssh is not possible at this point, only via xm console. cpu usage both on host and vm is very low.

Does the VM actually have any NFS mounted filesystems?

ok, i am basically constantly monitoring the VM to see when and why the load rises so much from time to time. basically it runs mostly fine, but sometimes there just is a lockup (top does not update anymore in putty) then it gets back and suddenly load is up to 70.00 or something. so i checked some logs to see what the hell is going on at those times, and look what i found in kern.log:

Nov 12 10:01:35 saturn kernel: [33481.449043] BUG: soft lockup - CPU#3 stuck for 61s! [swapper:0]
Nov 12 10:01:35 saturn kernel: [33481.449043] Modules linked in: ip6table_filter ip6_tables iptable_raw xt_comment xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_recent ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ipv6 quota_v2 joydev evdev ext3 jbd mbcache thermal_sys
Nov 12 10:01:35 saturn kernel: [33481.449043] CPU 3:
Nov 12 10:01:35 saturn kernel: [33481.449043] Modules linked in: ip6table_filter ip6_tables iptable_raw xt_comment xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_recent ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ipv6 quota_v2 joydev evdev ext3 jbd mbcache thermal_sys
Nov 12 10:01:35 saturn kernel: [33481.449043] Pid: 0, comm: swapper Not tainted 2.6.26-2-xen-amd64 #1
Nov 12 10:01:35 saturn kernel: [33481.449043] RIP: e030:[<ffffffff802083aa>]  [<ffffffff802083aa>]
Nov 12 10:01:35 saturn kernel: [33481.449043] RSP: e02b:ffff88009fc6df08  EFLAGS: 00000246
Nov 12 10:01:35 saturn kernel: [33481.449043] RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff802083aa
Nov 12 10:01:35 saturn kernel: [33481.449043] RDX: ffff88009fc6df20 RSI: 0000000000000000 RDI: 0000000000000001
Nov 12 10:01:35 saturn kernel: [33481.449043] RBP: 0000000000000000 R08: 0000000000000000 R09: 000a08128f195db4
Nov 12 10:01:35 saturn kernel: [33481.449043] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.449043] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.449043] FS:  00007f452255f6e0(0000) GS:ffffffff8053a180(0000) knlGS:0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.449043] CS:  e033 DS: 002b ES: 002b
Nov 12 10:01:35 saturn kernel: [33481.449043] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.449043] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 12 10:01:35 saturn kernel: [33481.449043]
Nov 12 10:01:35 saturn kernel: [33481.449043] Call Trace:
Nov 12 10:01:35 saturn kernel: [33481.449043]  [<ffffffff802612bf>] rcu_pending+0x26/0x50
Nov 12 10:01:35 saturn kernel: [33481.449043]  [<ffffffff8020e79d>] xen_safe_halt+0x90/0xa6
Nov 12 10:01:35 saturn kernel: [33481.449043]  [<ffffffff8020a0ce>] xen_idle+0x2e/0x66
Nov 12 10:01:35 saturn kernel: [33481.449043]  [<ffffffff80209d49>] cpu_idle+0x97/0xb9
Nov 12 10:01:35 saturn kernel: [33481.449043]
Nov 12 10:01:35 saturn kernel: [33481.451131] BUG: soft lockup - CPU#2 stuck for 61s! [swapper:0]
Nov 12 10:01:35 saturn kernel: [33481.451131] Modules linked in: ip6table_filter ip6_tables iptable_raw xt_comment xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_recent ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ipv6 quota_v2 joydev evdev ext3 jbd mbcache thermal_sys
Nov 12 10:01:35 saturn kernel: [33481.451131] CPU 2:
Nov 12 10:01:35 saturn kernel: [33481.451131] Modules linked in: ip6table_filter ip6_tables iptable_raw xt_comment xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_recent ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ipv6 quota_v2 joydev evdev ext3 jbd mbcache thermal_sys
Nov 12 10:01:35 saturn kernel: [33481.451131] Pid: 0, comm: swapper Not tainted 2.6.26-2-xen-amd64 #1
Nov 12 10:01:35 saturn kernel: [33481.451131] RIP: e030:[<ffffffff802083aa>]  [<ffffffff802083aa>]
Nov 12 10:01:35 saturn kernel: [33481.451131] RSP: e02b:ffff88009fc69f08  EFLAGS: 00000246
Nov 12 10:01:35 saturn kernel: [33481.451131] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffffffff802083aa
Nov 12 10:01:35 saturn kernel: [33481.451131] RDX: ffff88009fc69f20 RSI: 0000000000000000 RDI: 0000000000000001
Nov 12 10:01:35 saturn kernel: [33481.451131] RBP: 0000000000000000 R08: ffff88009fd2cb78 R09: ffff880001942ee0
Nov 12 10:01:35 saturn kernel: [33481.451131] R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.451131] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.451131] FS:  00007f452255f6e0(0000) GS:ffffffff8053a100(0000) knlGS:0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.451131] CS:  e033 DS: 002b ES: 002b
Nov 12 10:01:35 saturn kernel: [33481.451131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.451131] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 12 10:01:35 saturn kernel: [33481.451131]
Nov 12 10:01:35 saturn kernel: [33481.451131] Call Trace:
Nov 12 10:01:35 saturn kernel: [33481.451131]  [<ffffffff8020e79d>] xen_safe_halt+0x90/0xa6
Nov 12 10:01:35 saturn kernel: [33481.451131]  [<ffffffff8020a0ce>] xen_idle+0x2e/0x66
Nov 12 10:01:35 saturn kernel: [33481.451131]  [<ffffffff80209d49>] cpu_idle+0x97/0xb9
Nov 12 10:01:35 saturn kernel: [33481.451131]
Nov 12 10:01:35 saturn kernel: [33481.451131] BUG: soft lockup - CPU#5 stuck for 61s! [swapper:0]
Nov 12 10:01:35 saturn kernel: [33481.451131] Modules linked in: ip6table_filter ip6_tables iptable_raw xt_comment xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_recent ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ipv6 quota_v2 joydev evdev ext3 jbd mbcache thermal_sys
Nov 12 10:01:35 saturn kernel: [33481.451131] CPU 5:
Nov 12 10:01:35 saturn kernel: [33481.451131] Modules linked in: ip6table_filter ip6_tables iptable_raw xt_comment xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_recent ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ipv6 quota_v2 joydev evdev ext3 jbd mbcache thermal_sys
Nov 12 10:01:35 saturn kernel: [33481.451131] Pid: 0, comm: swapper Not tainted 2.6.26-2-xen-amd64 #1
Nov 12 10:01:35 saturn kernel: [33481.451131] RIP: e030:[<ffffffff802083aa>]  [<ffffffff802083aa>]
Nov 12 10:01:35 saturn kernel: [33481.451131] RSP: e02b:ffff88009fc73f08  EFLAGS: 00000246
Nov 12 10:01:35 saturn kernel: [33481.451131] RAX: 0000000000000000 RBX: 0000000000000005 RCX: ffffffff802083aa
Nov 12 10:01:35 saturn kernel: [33481.451131] RDX: ffff88009fc73f20 RSI: 0000000000000000 RDI: 0000000000000001
Nov 12 10:01:35 saturn kernel: [33481.451131] RBP: 0000000000000000 R08: 0000000000000000 R09: 000a08128f195db4
Nov 12 10:01:35 saturn kernel: [33481.451131] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.451131] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.451131] FS:  00007f8f59bb36e0(0000) GS:ffffffff8053a280(0000) knlGS:0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.451131] CS:  e033 DS: 002b ES: 002b
Nov 12 10:01:35 saturn kernel: [33481.451131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.451131] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 12 10:01:35 saturn kernel: [33481.451131]
Nov 12 10:01:35 saturn kernel: [33481.451131] Call Trace:
Nov 12 10:01:35 saturn kernel: [33481.451131]  [<ffffffff802612bf>] rcu_pending+0x26/0x50
Nov 12 10:01:35 saturn kernel: [33481.451131]  [<ffffffff8020e79d>] xen_safe_halt+0x90/0xa6
Nov 12 10:01:35 saturn kernel: [33481.451131]  [<ffffffff8020a0ce>] xen_idle+0x2e/0x66
Nov 12 10:01:35 saturn kernel: [33481.451131]  [<ffffffff80209d49>] cpu_idle+0x97/0xb9
Nov 12 10:01:35 saturn kernel: [33481.451131]
Nov 12 10:01:35 saturn kernel: [33481.451131] BUG: soft lockup - CPU#4 stuck for 61s! [swapper:0]
Nov 12 10:01:35 saturn kernel: [33481.451131] Modules linked in: ip6table_filter ip6_tables iptable_raw xt_comment xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_recent ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ipv6 quota_v2 joydev evdev ext3 jbd mbcache thermal_sys
Nov 12 10:01:35 saturn kernel: [33481.451131] CPU 4:
Nov 12 10:01:35 saturn kernel: [33481.451131] Modules linked in: ip6table_filter ip6_tables iptable_raw xt_comment xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_recent ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ipv6 quota_v2 joydev evdev ext3 jbd mbcache thermal_sys
Nov 12 10:01:35 saturn kernel: [33481.451131] Pid: 0, comm: swapper Not tainted 2.6.26-2-xen-amd64 #1
Nov 12 10:01:35 saturn kernel: [33481.451131] RIP: e030:[<ffffffff802083aa>]  [<ffffffff802083aa>]
Nov 12 10:01:35 saturn kernel: [33481.451131] RSP: e02b:ffff88009fc71f08  EFLAGS: 00000246
Nov 12 10:01:35 saturn kernel: [33481.451131] RAX: 0000000000000000 RBX: 0000000000000004 RCX: ffffffff802083aa
Nov 12 10:01:35 saturn kernel: [33481.451131] RDX: ffff88009fc71f20 RSI: 0000000000000000 RDI: 0000000000000001
Nov 12 10:01:35 saturn kernel: [33481.451131] RBP: 0000000000000000 R08: 0000000000000000 R09: 000a08128d6e1eb4
Nov 12 10:01:35 saturn kernel: [33481.451131] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.451131] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.451131] FS:  00007f8f59bb36e0(0000) GS:ffffffff8053a200(0000) knlGS:0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.451131] CS:  e033 DS: 002b ES: 002b
Nov 12 10:01:35 saturn kernel: [33481.451131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.451131] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 12 10:01:35 saturn kernel: [33481.451131]
Nov 12 10:01:35 saturn kernel: [33481.451131] Call Trace:
Nov 12 10:01:35 saturn kernel: [33481.451131]  [<ffffffff8020e79d>] xen_safe_halt+0x90/0xa6
Nov 12 10:01:35 saturn kernel: [33481.451131]  [<ffffffff8020a0ce>] xen_idle+0x2e/0x66
Nov 12 10:01:35 saturn kernel: [33481.451131]  [<ffffffff80209d49>] cpu_idle+0x97/0xb9
Nov 12 10:01:35 saturn kernel: [33481.451131]
Nov 12 10:01:35 saturn kernel: [33481.477578] BUG: soft lockup - CPU#0 stuck for 61s! [collectinfo.pl:18619]
Nov 12 10:01:35 saturn kernel: [33481.477578] Modules linked in: ip6table_filter ip6_tables iptable_raw xt_comment xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_recent ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ipv6 quota_v2 joydev evdev ext3 jbd mbcache thermal_sys
Nov 12 10:01:35 saturn kernel: [33481.477578] CPU 0:
Nov 12 10:01:35 saturn kernel: [33481.477578] Modules linked in: ip6table_filter ip6_tables iptable_raw xt_comment xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_recent ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ipv6 quota_v2 joydev evdev ext3 jbd mbcache thermal_sys
Nov 12 10:01:35 saturn kernel: [33481.477578] Pid: 18619, comm: collectinfo.pl Not tainted 2.6.26-2-xen-amd64 #1
Nov 12 10:01:35 saturn kernel: [33481.477578] RIP: e033:[<00007fab26646315>]  [<00007fab26646315>]
Nov 12 10:01:35 saturn kernel: [33481.477578] RSP: e02b:00007fffdda5b8e0  EFLAGS: 00000202
Nov 12 10:01:35 saturn kernel: [33481.477578] RAX: 00007fffdda5b990 RBX: 00000000085b7278 RCX: 0000000000b88010
Nov 12 10:01:35 saturn kernel: [33481.477578] RDX: 00007fffdda5b998 RSI: 00000000085b7278 RDI: 0000000000b88010
Nov 12 10:01:35 saturn kernel: [33481.477578] RBP: 000000000000000c R08: 00000000000000ff R09: 0000000010000000
Nov 12 10:01:35 saturn kernel: [33481.477578] R10: 0000000000000000 R11: 0000646e61726f6d R12: 0000000000ddaf70
Nov 12 10:01:35 saturn kernel: [33481.477578] R13: 0000000000ddb380 R14: 0000000004b63210 R15: 0000000000d73420
Nov 12 10:01:35 saturn kernel: [33481.477578] FS:  00007fab26b136e0(0000) GS:ffffffff8053a000(0000) knlGS:0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.477578] CS:  e033 DS: 0000 ES: 0000
Nov 12 10:01:35 saturn kernel: [33481.477578] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33481.477578] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 12 10:01:35 saturn kernel: [33481.477578]
Nov 12 10:01:35 saturn kernel: [33481.477578] Call Trace:
Nov 12 10:01:35 saturn kernel: [33481.477578]
Nov 12 10:01:35 saturn kernel: [33482.650354] BUG: soft lockup - CPU#1 stuck for 61s! [swapper:0]
Nov 12 10:01:35 saturn kernel: [33482.650354] Modules linked in: ip6table_filter ip6_tables iptable_raw xt_comment xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_recent ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ipv6 quota_v2 joydev evdev ext3 jbd mbcache thermal_sys
Nov 12 10:01:35 saturn kernel: [33482.650354] CPU 1:
Nov 12 10:01:35 saturn kernel: [33482.650354] Modules linked in: ip6table_filter ip6_tables iptable_raw xt_comment xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_REJECT ipt_REDIRECT ipt_recent ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ipv6 quota_v2 joydev evdev ext3 jbd mbcache thermal_sys
Nov 12 10:01:35 saturn kernel: [33482.650354] Pid: 0, comm: swapper Not tainted 2.6.26-2-xen-amd64 #1
Nov 12 10:01:35 saturn kernel: [33482.650354] RIP: e030:[<ffffffff802083aa>]  [<ffffffff802083aa>]
Nov 12 10:01:35 saturn kernel: [33482.650354] RSP: e02b:ffff88009fc67f08  EFLAGS: 00000246
Nov 12 10:01:35 saturn kernel: [33482.650354] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff802083aa
Nov 12 10:01:35 saturn kernel: [33482.650354] RDX: ffff88009fc67f20 RSI: 0000000000000000 RDI: 0000000000000001
Nov 12 10:01:35 saturn kernel: [33482.650354] RBP: 0000000000000000 R08: 0000000000000000 R09: 000a08128f195db4
Nov 12 10:01:35 saturn kernel: [33482.650354] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33482.650354] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33482.650354] FS:  00007f452255f6e0(0000) GS:ffffffff8053a080(0000) knlGS:0000000000000000
Nov 12 10:01:35 saturn kernel: [33482.650354] CS:  e033 DS: 002b ES: 002b
Nov 12 10:01:35 saturn kernel: [33482.650354] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 12 10:01:35 saturn kernel: [33482.650354] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 12 10:01:35 saturn kernel: [33482.650354]
Nov 12 10:01:35 saturn kernel: [33482.650354] Call Trace:
Nov 12 10:01:35 saturn kernel: [33482.650354]  [<ffffffff8020e79d>] xen_safe_halt+0x90/0xa6
Nov 12 10:01:35 saturn kernel: [33482.650354]  [<ffffffff8020a0ce>] xen_idle+0x2e/0x66
Nov 12 10:01:35 saturn kernel: [33482.650354]  [<ffffffff80209d49>] cpu_idle+0x97/0xb9
Nov 12 10:01:35 saturn kernel: [33482.650354]

It appears that only my Virtualmin Pro XEN VM is affected by this. There are no such errors on the other two VMs (which have much less work to do). Also, everything on the host appears to be working fine. The guest locking up is a Debian x64 with 6 cores assigned to it. The host system has a Xeon processor with 4 corse + HT.

odd, my previous post seems empty, but when i click edit on it i can see what i wrote. can you see the message? anyhoo, in short: in the domU affected by lockups (sporadic very high load) i had messages in kern.log about the CPUs locking up for 61s and the clocksource going backwards. i now followed workaround 1 from http://wiki.debian.org/Xen#A.27clocksource.2BAC8-0.3ATimewentbackwards.27 and hope that the problem will go away...

did not help

top - 12:11:08 up 11:20,  2 users,  load average: 86.39, 63.43, 31.50
Tasks: 594 total,   1 running, 593 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.4%us,  0.3%sy,  0.0%ni, 66.7%id, 32.6%wa,  0.0%hi,  0.0%si,  0.1%st
Mem:   2621656k total,  2597228k used,    24428k free,     1752k buffers
Swap:  8388600k total,  2459132k used,  5929468k free,   470856k cached

vm is so slow it is barely useable

top - 12:14:10 up 11:23,  2 users,  load average: 123.23, 87.39, 46.30
Tasks: 686 total,   1 running, 675 sleeping,   0 stopped,  10 zombie
Cpu(s):  0.4%us,  0.3%sy,  0.0%ni,  0.0%id, 95.8%wa,  0.0%hi,  0.0%si,  3.4%st
Mem:   2621656k total,  2599068k used,    22588k free,     2064k buffers
Swap:  8388600k total,  2929420k used,  5459180k free,   412108k cached

i'm getting desperate, i can't afford to kill the vm several times a day :(

If you like, I could login to the VM myself and see what is using up so much IO time.

If you have the iotop command installed, it should be able to show you which process is responsible.

Another thing to check for is lack of RAM. I see you have swap enabled, which means that if RAM runs low the system will use the swap disk instead, which can be slow. According to your top output above, almost all of the VM's 2.5G of RAM is in use, and 3G of swap is being used too.. This could certainly account for the slowness.

Hi there.

The same system was running on a physical machine with less ram and much less cpu power and was doing fine, even with stuff on the swap partition.

I monitored iotop the whole day, all I can say is that there is nothing unusual there, in the scenario where the loads gets higher and higher and the iowait rises as well, the iotop shows just normal hd transfers, about 1Mb/sec maximum... In the other scenario, wher the system completely locks up for some seconds/minute, the load is very high after that, but iowait close to 0, and the load decreases back to normal.

I did some research (google / #xen channel on freenode) and found out that apparently the debian x64 kernel has some problems on domU when running with several vcpus. So what I will do next is reboot the domU into the lenny-backports xen kernel (which is 2.6.32 instead of 2.6.26), and hopefully fixes the problem. Also, I could add another 512 MB ram to make it 3 GB total, and probably get a server with more ram in the near future. That's why the domU is 64 bit in the first place, i wanted to have the option to increase ram ;)

I will let you know if my attempts fail and I still need help.

Try also running top and hitting M to see what is using up all the RAM on the system. Now that I think about it some more, this really looks like a RAM starvation issue. The high IO load is just a symptom of that.

hi! spamd is using 8%, mysqld 2%, then there are lots of php5-cgi processes using about 1.4% each. shouldn't the swap space fill up before anything more serious happens? that's the experience i had so far... anyway, i will tune the fcgid.conf configuration, cause i probably allowd too much processes to get better performance. I had the problem that a respawning php process took like 2-3 seconds to start, causing complaints by customers of websites with less traffic, where the processes would idle and close...

How many php processes are running? If there are too many, they could consume a lot of RAM.

BTW, this gets worse on 64-bit systems as RAM use for many processes doubles. There is no good reason to use a 64-bit install for web hosting in my opinion ..

ps aux lists 213 php processes right now... hehe... i basically reverted the fcgid.conf to the original state and will see how things go after the server reboot.

well, i thought i might want to go with 64 bit cause i might expand the ram to 4gb or beyond within a year, and i don't want to reinstall everything... bad choice probably, huh :(

213 is quite a lot - how many domains are on this system?

virtualmin says 165, but some of them are subdomains or aliases... so let's say maybe 100-120?

And do they all get a lot of traffic, or are most quiet? If you have that many domains and all are busy, I'd expect quite a lot of php-cgi processes, which in turn use up lots of RAM.

If you are seeing processes for domains that are idle, that seems like a problem that needs fixing.

Sorry for not getting back in the past days. I had to calm down and monitor things. There was indeed a RAM problem. In my experience I only had ram problems in cases where the swap space would fill up completely. Since I even had more physical ram on the domU than on my old physical server, I didn't think this would become a problem so soon. But then, my hosting business is growing... Anyway, I have raised the domU ram from 2.5 to 3gb and tuned the fcgid.conf file and ram usage / load have lowered. Ram is 2.5 out of 3 occupied and i have about 600 mb swap, load almost constantly at 0.x so that's fine for now.

However, there are still a couple of problems left, with certainly have nothing to do with ram. I previously ran the domU with 6 cores and eperienced kernel soft lockups. When I enlarged ram to 3 gb, I decided to assign 8 cores to the domU. Upon startup it immediately started locking up and acting weird, with kern.log filling with soft lockup errors. So I lowered to 4 assigned cores, and it seems to run fine since. Of course, that's too bad, since I am getting half the power right now... I have read there are some problems with xen on debian when running a 64 bit domU with several cores. I tried installing the lenny-backports xen kernel in the domU, but unfortunately it won't start up with that so i reverted to the stock kernel for now.

Also, I have not managed to fix the problem where the domU would pause for 3 minutes while setting up networking. My other domU, which is debian x86, does not have this problem and starts up really fast.

I'm not sure what could be causing those kernel lockups, sorry ... you might have to ask Xen or kernel experts about that.

Regarding the boot slowness, is there perhaps a way to have more details shown about what the boot process is doing exactly? I think on debian by default it shows very little information, as compared to redhat.

hi i have no idea, but here is a more extended excerpt.

Setting kernel variables (/etc/sysctl.conf)...done.
Mounting local filesystems...done.
Activating swapfile swap...done.
Checking quotas...
done.
Turning on quotas...done.
Setting up networking....
Configuring network interfaces...[   65.575438] NET: Registered protocol family 10
[   65.575693] lo: Disabled Privacy Extensions

[ nothing happens for about 3 minutes... ]

if-up.d/mountnfs[eth0]: waiting for interface lo before doing NFS mounts (warning).
done.
Starting "Shorewall firewall": [  255.391936] ip_tables: (C) 2000-2006 Netfilter Core Team
[  255.611149] Netfilter messages via NETLINK v0.30.
[  255.676785] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[  256.500212] ctnetlink v0.93: registering with nfnetlink.
[  257.887584] ClusterIP Version 0.8 loaded successfully
[  264.178015] ip6_tables: (C) 2000-2006 Netfilter Core Team
done.
INIT: Entering runlevel: 2
Starting enhanced syslogd: rsyslogd.
Starting domain name service...: bind9.
Starting OpenBSD Secure Shell server: sshd.
Starting MySQL database server: mysqld

The domU configuration file looks like this:

memory = 3072
maxmem = 4096
name = 'saturn'
vif = ['ip=xxx.xxx.xxx.xxx,mac=xx:xx:xx:xx:xx:xx,rate=95Mb/s']
address = 'xxx.xxx.xxx.xxx'
netmask = 'xxx.xxx.xxx.xxx'
disk = ['phy:/dev/lvm/saturn_img,sda1,w','phy:/dev/lvm/saturn_swap,sda2,w']
bootloader = "/usr/bin/pygrub"
vnc = 1
vnclisten = "0.0.0.0"
vncunused = 1
vncpasswd = "xxx"
vfb = [ 'type=vnc,vncunused=1,vncpasswd=xxx,vnclisten=0.0.0.0' ]
vcpus = 4
cpu_cap = 90
cpu_weight = 256

Btw the domU just suffered again from memory starvation and I had to destroy it :( Still, I suspect something is wrong with my fcgid configuration, as I noticed now that some domains have different fcgi-bin/php5.fcgi files than others. Could you maybe tell me what the default of the apache module configuration should look like, ad what the php5.fcgi files should look like? Here are mine:

<IfModule mod_fcgid.c>
  AddHandler    fcgid-script .fcgi
  IPCConnectTimeout 10
  IPCCommTimeout 20
# I added the following lines after the domU starved out again
  IdleScanInterval 60
  MaxProcessCount 96
  MaxRequestsPerProcess 500
  DefaultMaxClassProcessCount 4
  DefaultMinClassProcessCount 1
</IfModule>

Then, some php5.fcgi files look like this:

#!/bin/bash
ulimit -u 50
ulimit -v 512000
PHPRC=$PWD/../etc/php5
export PHPRC
umask 022
PHP_FCGI_CHILDREN=1
export PHP_FCGI_CHILDREN
SCRIPT_FILENAME=$PATH_TRANSLATED
export SCRIPT_FILENAME
/usr/bin/php5-cgi

some others look like this

#!/bin/bash
ulimit -u 50
ulimit -v 512000
PHPRC=$PWD/../etc/php5
export PHPRC
umask 022
export PHP_FCGI_CHILDREN
SCRIPT_FILENAME=$PATH_TRANSLATED
export SCRIPT_FILENAME
/usr/bin/php5-cgi

Basically one has PHP_FCGI_CHILDREN=1 the other one has not... now I googled but always read different opinions as to wether PHP_FCGI_CHILDREN should be there... Btw, where is the default php5.fcgi located that virtualmin puts into new domains?

The default contents of that file for new domains is set based on server templates - in recent versions, PHP_FCGI_CHILDREN is not set any more as it triggers creation of un-necessary processes.

To clear this for all domains, you should run :

virtualmin modify-web --all-domains --no-php-children

Thank you very much, that made things run much smoother. Also, I upgraded the kernel in the domU to the backports version, since then it only hangs a couple of seconds instead of a couple of minutes when configuring network. I also moved the other domUs on an other server, so I have more ram available. Still, running the domU with more than 4 cores makes it hang completely. Maybe a bug in the dom0 kernel, I don't know. I have a bad feeling about XEN and will probably move to KVM in the near future. Anyways, thank you very much for your support. I can sleep better now :)

Ok, great .. I will mark this as fixed then.

Automatically closed -- issue fixed for 2 weeks with no activity.