Hi All,
I am totally puzzled at the moment as to what Virtualmin is doing. After recently updating everything to the latest versions, I am getting the following CPU load averages and constant alerts from CSF.
CPU load averages 9.45 (1 min) 9.32 (5 mins) 9.77 (15 mins)
Running top via ssh I get the following
Processes: 175 total, 2 running, 4 stuck, 169 sleeping, 944 threads 16:54:15
Load Avg: 1.16, 1.13, 1.13 CPU usage: 3.74% user, 2.72% sys, 93.53% idle
SharedLibs: 14M resident, 14M data, 0B linkedit.
MemRegions: 55177 total, 917M resident, 48M private, 345M shared.
PhysMem: 2845M used (1000M wired), 4237M unused.
VM: 447G vsize, 1073M framework vsize, 11607078(0) swapins, 14171139(0) swapouts
Networks: packets: 14989373/17G in, 10427533/1423M out.
Disks: 2651509/109G read, 2162583/222G written.
PID COMMAND %CPU TIME #TH #WQ #PORT #MREGS MEM RPRVT PURG
19094 mdworker 0.0 00:00.03 3 0 52 67 2196K 1340K 0B
19093 mdworker 0.0 00:00.03 3 0 52 69 3084K 2228K 0B
19092 syncdefaults 0.0 00:00.28 6 2 88 82 5132K 3952K 0B
19091 mdworker 0.0 00:00.06 3 0 52 69 5164K 4256K 0B
19089 top 9.3 00:14.13 1/1 0 26 41 2204K 1972K 0B
19086 bash 0.0 00:00.00 1 0 19 31 616K 448K 0B
19085 login 0.0 00:00.01 2 0 30 52 1168K 840K 0B
19078 TextEdit 0.0 00:00.27 5 2 170 184 13M 6556K 20K
19070 CVMCompiler 0.0 00:00.73 2 1 32 80 24M 24M 12K
19067 Terminal 24.0 00:03.02 13 7 179 212 20M+ 15M+ 80K
19057 com.apple.We 0.0 00:02.84 14 2 183 331 28M 25M 36K
19055 netbiosd 0.0 00:00.07 2 1 42 53 1888K 1484K 0B
19049 com.apple.iC 0.0 00:00.24 4 0 82 82 3892K 3112K 0B
19040 rpcsvchost 0.0 00:00.02 16 1 44 82 1428K 1092K 0B
Not sure where Virtualmin is pulling those averages from, and I'm not sure what is causing it. At first I thought my server had been hacked and was sending out spam, but there is nothing in the mail queue.
Anyone got any ideas? Restarting my server gets it back down to the usual average of 0.3 for a day or two, then it starts to build back up.
I got an alert for an 11.4 5 min load average around an hour ago. The websites aren't getting any more hits than usual, so it can't be that...
Howdy,
Hmm, the output above appears to be from an Apple computer, not a Linux server that would be running Virtualmin. Is that process information from the correct system?
-Eric
Oops, you are correct, that's what I get for posting in haste. That said, I cannot connect to the server by SSH: it asks me for a login, I enter my password, and then it just stays blank :S
At this moment in time, it's now running at 11.4
CPU load averages: 11.30 (1 mins) , 11.25 (5 mins) , 11.22 (15 mins) CPU type: Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz , 4 cores
21916 jamessimpson 3.0 % /usr/bin/php-cgi
22225 jamessimpson 3.0 % /usr/bin/php-cgi
21915 jamessimpson 2.0 % /usr/bin/php-cgi
23138 root 1.2 % /usr/libexec/webmin/proc/index_cpu.cgi
1772 mysql 0.5 % /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-e ...
19 root 0.4 % [events/0]
14555 drivingroads 0.4 % /usr/bin/php-cgi
14797 drivingroads 0.4 % /usr/bin/php-cgi
6827 bojotoolstore 0.3 % /usr/bin/php-cgi
7484 bojotoolstore 0.3 % /usr/bin/php-cgi
15398 drivingroads 0.3 % /usr/bin/php-cgi
18444 bojotoolstore 0.2 % /usr/bin/php-cgi
22486 apache 0.2 % /usr/sbin/httpd
78 root 0.1 % [kipmi0]
23139 root 0.1 % /usr/bin/perl /usr/libexec/webmin/miniserv.pl /etc/webmin/miniserv.conf
1 root 0.0 % /sbin/init
Howdy,
Well, there are a number of PHP related processes there... it's possible that means one or more of your sites is seeing an influx of traffic.
However, what is the output of these commands:
free -m
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -15
Also, can you run the command "ps auxw", and attach that output as a text file?
-Eric
That's the thing, I cannot get onto SSH at the moment. It lets me log in but then won't let me type anything.
It has happened before, but I had to restart the server to get access again, which would mean I would be running normal processes again for a day or two.
Finally managed to connect
Top:
top - 21:26:57 up 4 days, 21:58, 12 users, load average: 21.79, 20.18, 17.46
Tasks: 256 total, 1 running, 248 sleeping, 0 stopped, 7 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16321220k total, 15633400k used, 687820k free, 390020k buffers
Swap: 2097144k total, 7880k used, 2089264k free, 11586296k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19 root 20 0 0 0 0 D 0.7 0.0 34:16.18 events/0
61 root 39 19 0 0 0 S 0.3 0.0 0:20.72 khugepaged
5119 root 20 0 153m 15m 1668 S 0.3 0.1 0:34.30 lfd
1 root 20 0 19356 1476 1232 S 0.0 0.0 0:00.62 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.05 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:02.98 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:00.69 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
6 root RT 0 0 0 0 S 0.0 0.0 0:00.59 watchdog/0
7 root RT 0 0 0 0 S 0.0 0.0 0:00.64 migration/1
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
9 root 20 0 0 0 0 S 0.0 0.0 0:00.58 ksoftirqd/1
10 root RT 0 0 0 0 S 0.0 0.0 0:00.38 watchdog/1
11 root RT 0 0 0 0 S 0.0 0.0 0:00.39 migration/2
12 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/2
13 root 20 0 0 0 0 S 0.0 0.0 0:01.15 ksoftirqd/2
14 root RT 0 0 0 0 S 0.0 0.0 0:00.35 watchdog/2
And now SSH is frozen again, and I cannot get past successful authentication
In your latest "top" output, there seem to be no processes using any considerable CPU power, yet your system load is excessively high. This could indicate that the system is waiting a great deal for other resources (RAM, HDD, network) to become available. Might indicate an overload there or hardware issues.
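To illustrate the point: on Linux the load average counts not only runnable processes but also those stuck in uninterruptible sleep (state "D", usually waiting on disk or other I/O), which is how load can climb while the CPU sits idle. Here is a rough sketch that tallies process states from text in the shape printed by "ps -e -o stat= -o comm=". The function name and the sample input are made up for illustration; only the events/0 entry in state D is taken from the top output above.

```python
from collections import Counter

def count_states(ps_output: str) -> Counter:
    """Count processes by the first letter of their state (R, S, D, Z, ...)."""
    states = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if fields:
            # Only the first character is the state; the rest are modifiers.
            states[fields[0][0]] += 1
    return states

# Illustrative input (hypothetical), shaped like `ps -e -o stat= -o comm=`.
sample = """\
D    events/0
SN   khugepaged
Ss   init
Z    php-cgi
R+   ps
"""

counts = count_states(sample)
print("uninterruptible (D):", counts["D"])
print("zombie (Z):", counts["Z"])
```

If the "D" count stays high while CPU idle is near 100%, that points at I/O rather than CPU as the thing processes are queuing for.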
Also I noticed 12 users logged on, and 7 zombie processes. Those might be hanging sessions of your failed attempts to log on via SSH, but you might want to check those out, using the commands "w" and "last".
I also recommend the tool "atop" over "top", since it displays more information like disk, memory, swap and network usage, and records historical data, for later review. atop shows zombie processes with a "Z" in the state column.
You might have to hard-reboot the server if you can't reliably get in via SSH anymore. A system load of 20 will most likely prevent you from doing any serious work on the server.
When you can get in again, you might want to review the system and kernel logs, and install atop.
Right I have had to restart the server, as last night it got up to 40.1 CPU average. After restarting this morning I am able to get back into SSH
Output from atop
atop
ATOP - JSServer01 2014/08/30 11:05:26 --------- 10s elapsed
PRC | sys 0.14s | user 1.49s | #proc 182 | #zombie 0 | #exit 5 |
CPU | sys 2% | user 15% | irq 0% | idle 378% | wait 5% |
cpu | sys 1% | user 11% | irq 0% | idle 83% | cpu000 w 5% |
cpu | sys 0% | user 4% | irq 0% | idle 96% | cpu002 w 0% |
cpu | sys 0% | user 0% | irq 0% | idle 99% | cpu001 w 0% |
cpu | sys 0% | user 0% | irq 0% | idle 100% | cpu003 w 0% |
CPL | avg1 0.17 | avg5 0.39 | avg15 0.36 | csw 5269 | intr 2754 |
MEM | tot 15.6G | free 12.7G | cache 811.7M | buff 86.2M | slab 353.2M |
SWP | tot 2.0G | free 2.0G | | vmcom 2.7G | vmlim 9.8G |
LVM | Group00-root | busy 5% | read 10 | write 192 | avio 2.62 ms |
DSK | sda | busy 5% | read 10 | write 71 | avio 6.53 ms |
NET | transport | tcpi 38 | tcpo 37 | udpi 0 | udpo 0 |
NET | network | ipi 47 | ipo 37 | ipfrw 0 | deliv 38 |
NET | em1 0% | pcki 66 | pcko 37 | si 4 Kbps | so 24 Kbps |
NET | lo ---- | pcki 10 | pcko 10 | si 0 Kbps | so 0 Kbps |
PID SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPU CMD 1/5
2168 0.02s 0.82s 0K 0K 0K 8K -- - S 8% php-cgi
2383 0.01s 0.30s 0K 0K 0K 0K -- - S 3% php-cgi
1866 0.03s 0.27s 0K 0K 36K 100K -- - S 3% mysqld
2224 0.01s 0.04s 75780K 20K 48K 88K -- - S 1% httpd
4131 0.01s 0.04s 0K 0K - - NE 0 E 1%
78 0.03s 0.00s 0K 0K 0K 0K -- - S 0% kipmi0
It is showing normal usage now, so not sure what the hell is going on after a day or two.
When installing atop I did get a warning:
There are unfinished transactions remaining. You might consider running yum-complete-transaction first to finish them.
So I ran that too, and it looks as if I cannot install what is required
yum-complete-transaction
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.melbourne.co.uk
* epel: mirror.bytemark.co.uk
* extras: mirror.bytemark.co.uk
* updates: mirrors.ukfast.co.uk
Checking for new repos for mirrors
There are 1 outstanding transactions to complete. Finishing the most recent one
The remaining transaction had 10 elements left to run
--> Running transaction check
---> Package automake.noarch 0:1.11.1-4.el6 will be installed
---> Package cloog-ppl.x86_64 0:0.15.7-1.2.el6 will be installed
---> Package cpp.x86_64 0:4.4.7-4.el6 will be installed
---> Package gcc.x86_64 0:4.4.7-4.el6 will be installed
---> Package gcc-c++.x86_64 0:4.4.7-4.el6 will be installed
---> Package libgomp.x86_64 0:4.4.7-4.el6 will be installed
---> Package libstdc++-devel.x86_64 0:4.4.7-4.el6 will be installed
---> Package mpfr.x86_64 0:2.4.1-6.el6 will be installed
---> Package php-devel.x86_64 0:5.3.3-27.el6_5 will be installed
--> Processing Dependency: php(x86-64) = 5.3.3-27.el6_5 for package: php-devel-5.3.3-27.el6_5.x86_64
---> Package ppl.x86_64 0:0.10.2-11.el6 will be installed
--> Finished Dependency Resolution
Error: Package: php-devel-5.3.3-27.el6_5.x86_64 (updates)
Requires: php(x86-64) = 5.3.3-27.el6_5
Installed: php-5.3.3-27.el6_5.1.x86_64 (@updates)
php(x86-64) = 5.3.3-27.el6_5.1
Available: php-5.3.3-26.el6.x86_64 (base)
php(x86-64) = 5.3.3-26.el6
Available: php-5.3.3-27.el6_5.x86_64 (updates)
php(x86-64) = 5.3.3-27.el6_5
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
Running free -m now (kinda pointless as it is back to normal now)
free -m
total used free shared buffers cached
Mem: 15938 2921 13017 0 88 816
-/+ buffers/cache: 2016 13922
Swap: 2047 0 2047M
And the netstat
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -15
19
4 127.0.0.1
2 81.156.223.142
1 servers)
1 Address
1 90.206.201.8
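A side note on the netstat output above: the "servers)" and "Address" entries are just the fifth words of netstat's two header lines, and the blank entry most likely comes from addresses that contain more than one colon, which "cut -d: -f1" chops down to nothing. Below is a rough sketch of the same per-peer count that skips the headers and splits on the last colon only. The function name and the sample addresses are made up for illustration.

```python
from collections import Counter

def connections_per_peer(netstat_output: str) -> Counter:
    """Count connections per foreign address from `netstat -ntu` text."""
    peers = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with the protocol name; the two header lines do not.
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        # Split on the LAST colon so IPv6-style addresses keep their host part.
        host, _, _port = fields[4].rpartition(":")
        peers[host] += 1
    return peers

# Illustrative input (hypothetical addresses from documentation ranges).
sample = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 192.0.2.10:22               203.0.113.7:51000           ESTABLISHED
tcp        0      0 192.0.2.10:80               203.0.113.7:51001           TIME_WAIT
tcp        0      0 ::ffff:192.0.2.10:80        ::ffff:198.51.100.4:40000   ESTABLISHED
"""

for host, n in connections_per_peer(sample).most_common(15):
    print(n, host)
```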
Hmm, I think I may have found the issue.
I seem to have thousands of these in the messages log:
Aug 30 05:05:14 JSServer01 named[29765]: client 127.0.0.1#45585: query (cache) '131.205.13.211.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:15 JSServer01 named[29765]: client 127.0.0.1#43407: query (cache) '29.193.26.103.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:15 JSServer01 named[29765]: client 127.0.0.1#41691: query (cache) '241.150.174.195.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:15 JSServer01 named[29765]: client 127.0.0.1#37403: query (cache) '166.109.97.211.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:15 JSServer01 named[29765]: client 127.0.0.1#58532: query (cache) '241.150.174.195.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#44044: query (cache) '102.120.149.107.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#37691: query (cache) '91.34.135.174.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#57784: query (cache) '219.106.153.184.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#40505: query (cache) '204.5.106.41.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#35974: query (cache) '91.34.135.174.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#35621: query (cache) '53.79.234.212.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#44718: query (cache) '102.120.149.107.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:16 JSServer01 named[29765]: client 127.0.0.1#52370: query (cache) '53.79.234.212.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:17 JSServer01 named[29765]: client 127.0.0.1#42438: query (cache) '177.10.244.162.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:17 JSServer01 named[29765]: client 127.0.0.1#41674: query (cache) '202.209.241.61.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:18 JSServer01 named[29765]: client 127.0.0.1#56260: query (cache) '124.10.244.162.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:19 JSServer01 named[29765]: client 127.0.0.1#48054: query (cache) '166.109.97.211.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:22 JSServer01 named[29765]: client 127.0.0.1#49980: query (cache) '188.17.82.36.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:23 JSServer01 named[29765]: client 127.0.0.1#49930: query (cache) '204.5.106.41.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:23 JSServer01 named[29765]: client 127.0.0.1#57424: query (cache) '188.17.82.36.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:23 JSServer01 named[29765]: client 127.0.0.1#57964: query (cache) '120.107.255.193.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:23 JSServer01 named[29765]: client 127.0.0.1#35676: query (cache) '124.10.244.162.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:23 JSServer01 named[29765]: client 127.0.0.1#35009: query (cache) '101.95.101.199.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:24 JSServer01 named[29765]: client 127.0.0.1#47569: query (cache) '120.107.255.193.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:24 JSServer01 named[29765]: client 127.0.0.1#39782: query (cache) '227.58.73.203.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:24 JSServer01 named[29765]: client 127.0.0.1#50507: query (cache) '101.95.101.199.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:24 JSServer01 named[29765]: client 127.0.0.1#41356: query (cache) '156.12.244.162.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:25 JSServer01 named[29765]: client 127.0.0.1#43907: query (cache) '227.58.73.203.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:25 JSServer01 named[29765]: client 127.0.0.1#50367: query (cache) '179.107.160.163.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:25 JSServer01 named[29765]: client 127.0.0.1#58792: query (cache) '179.107.160.163.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:25 JSServer01 named[29765]: client 127.0.0.1#45449: query (cache) '182.233.15.199.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:26 JSServer01 named[29765]: client 127.0.0.1#35984: query (cache) '19.96.95.23.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:26 JSServer01 named[29765]: client 127.0.0.1#42738: query (cache) '19.96.95.23.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:26 JSServer01 named[29765]: client 127.0.0.1#57701: query (cache) '187.92.95.23.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:26 JSServer01 named[29765]: client 127.0.0.1#33209: query (cache) '77.113.182.192.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:26 JSServer01 named[29765]: client 127.0.0.1#51364: query (cache) '240.9.244.162.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:26 JSServer01 named[29765]: client 127.0.0.1#56060: query (cache) '240.9.244.162.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:27 JSServer01 named[29765]: client 127.0.0.1#54580: query (cache) '238.210.34.89.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:27 JSServer01 named[29765]: client 127.0.0.1#34927: query (cache) '187.92.95.23.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:27 JSServer01 named[29765]: client 127.0.0.1#54763: query (cache) '170.233.15.199.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:28 JSServer01 named[29765]: client 127.0.0.1#51508: query (cache) '170.233.15.199.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:28 JSServer01 named[29765]: client 127.0.0.1#34891: query (cache) '77.113.182.192.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:29 JSServer01 named[29765]: client 127.0.0.1#37835: query (cache) '181.233.15.199.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:29 JSServer01 named[29765]: client 127.0.0.1#47091: query (cache) '156.12.244.162.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:31 JSServer01 named[29765]: client 127.0.0.1#47907: query (cache) '167.13.244.162.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:31 JSServer01 named[29765]: client 127.0.0.1#42951: query (cache) '167.13.244.162.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:31 JSServer01 named[29765]: client 127.0.0.1#37369: query (cache) '223.59.200.220.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:32 JSServer01 named[29765]: client 127.0.0.1#54876: query (cache) '187.233.15.199.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:32 JSServer01 named[29765]: client 127.0.0.1#56875: query (cache) '187.233.15.199.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:32 JSServer01 named[29765]: client 127.0.0.1#56911: query (cache) '182.233.15.199.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:32 JSServer01 named[29765]: client 127.0.0.1#37661: query (cache) '171.233.15.199.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:32 JSServer01 named[29765]: client 127.0.0.1#35656: query (cache) '220.59.200.220.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:32 JSServer01 named[29765]: client 127.0.0.1#42569: query (cache) '33.114.193.123.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:33 JSServer01 named[29765]: client 127.0.0.1#40194: query (cache) '33.114.193.123.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:33 JSServer01 named[29765]: client 127.0.0.1#43916: query (cache) '181.233.15.199.in-addr.arpa/PTR/IN' denied
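With thousands of these lines it helps to summarise them rather than read them. A minimal sketch follows, assuming the log format shown above; the function name is made up and the three sample lines are copied from the excerpt. It counts denied queries per client and per queried name. Note that in the excerpt every client is 127.0.0.1, i.e. the lookups are coming from the server itself.

```python
import re
from collections import Counter

# Matches the named "denied" lines shown above, with or without "(cache)".
DENIED = re.compile(
    r"client (?P<client>[^#\s]+)#\d+: query (?:\(cache\) )?'(?P<query>[^']+)' denied"
)

def summarize_denied(log_text: str):
    """Return (denials per client, denials per queried name)."""
    clients, queries = Counter(), Counter()
    for m in DENIED.finditer(log_text):
        clients[m["client"]] += 1
        queries[m["query"]] += 1
    return clients, queries

sample = """\
Aug 30 05:05:14 JSServer01 named[29765]: client 127.0.0.1#45585: query (cache) '131.205.13.211.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:15 JSServer01 named[29765]: client 127.0.0.1#41691: query (cache) '241.150.174.195.in-addr.arpa/PTR/IN' denied
Aug 30 05:05:15 JSServer01 named[29765]: client 127.0.0.1#58532: query (cache) '241.150.174.195.in-addr.arpa/PTR/IN' denied
"""

clients, queries = summarize_denied(sample)
print(clients.most_common(5))
print(queries.most_common(5))
```

To run it against the real log, read the file contents and pass them in place of the sample string.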
Okay, Eric might be able to say more about the error you get when trying to finish package updates; I'm not familiar enough with CentOS (I'm assuming you're using that, or another distro that uses "yum").
Did this issue start just after you installed updates? Or did it happen before that?
Note that the 40 is not the CPU usage, but system load. CPU usage is usually expressed as the percentage of time the CPU spends handling processes. In your case, that'd be a maximum of 400% in total, or 100% for each core.
System load, on the other hand, basically tells you how many processes on average were running or ready to execute over a time window (usually 1 minute, 5 minutes, 15 minutes). In addition to CPU, on Linux this also counts processes waiting for other required resources, e.g. when a process has to wait for HDD availability. With your 4-core CPU, a load of up to 4 is acceptable and "normal" if the system is very heavily used.
So a load of 40 means that 40 processes are ready to do something but can't, because resources are lacking. It's to be expected that the system is nearly unresponsive then. In your case, that's probably not CPU power (since your top output showed that the CPU was mostly idle), but something else.
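As a rough way to put a load figure in perspective, divide it by the number of cores: around 1.0 per core means the machine is fully busy, and much more than that means work is queuing. This is only a sketch with a made-up helper name; the 40.1 and the 4 cores are the figures quoted in this thread.

```python
import os

def load_per_core(load: float, cores: int) -> float:
    """Load average divided by core count; roughly 1.0 means fully busy."""
    return load / cores

# Figures quoted in this thread: a load of 40.1 on a 4-core CPU.
print("reported:", load_per_core(40.1, 4))

# The same calculation on the machine this runs on (Unix-like systems only).
if hasattr(os, "getloadavg"):
    one_min, five_min, fifteen_min = os.getloadavg()
    print("current 5 min:", load_per_core(five_min, os.cpu_count() or 1))
```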
A good candidate is the HDD, in case there's hardware trouble with it. What kind of HDD setup do you have in the server? Single disk? Software/hardware RAID? You might want to use the command "smartctl" to review the HDDs' status values.
Since this only happens after a while, you might want to observe it for a bit and note if the system load goes up. You can review historical atop data by running "atop -r /var/log/atop.log". When the load goes up, note if the disk is overloaded ("DSK % busy" is a good indicator), also check which processes use what amount of memory, disk, network etc. You can sort the output of atop accordingly and switch to different screens. Press "?" for a help screen.
Also don't forget to check "last" to see what those 12 logins were during your last problem phase! It shows you all logins with username and IP address. Pay attention to any entries with unexpected users/IP addresses there!
I checked the last logins and I can confirm they are all mine.
It also looks like my server may have been under a DDoS attack?
I am seeing a lot of these in the messages log
Aug 29 19:51:52 JSServer01 named[29765]: client 127.0.0.1#11277: query (cache) 'gmx.net/NS/IN' denied
Aug 29 19:51:52 JSServer01 named[29765]: client 127.0.0.1#11277: query (cache) 'cingular.com/NS/IN' denied
Aug 29 19:51:52 JSServer01 named[29765]: client 127.0.0.1#11277: query (cache) 'sourceforge.net/NS/IN' denied
Aug 29 19:50:18 JSServer01 named[29765]: client 127.0.0.1#52864: query (cache) 'intel.com/NS/IN' denied
Aug 29 19:50:18 JSServer01 named[29765]: client 127.0.0.1#52864: query (cache) 'msn.com/NS/IN' denied
Aug 29 19:50:18 JSServer01 named[29765]: client 127.0.0.1#52864: query (cache) 'comcast.net/NS/IN' denied
And then what looks like a DoS attack?
Aug 30 01:11:41 JSServer01 kernel: Firewall: *TCP_OUT Blocked* IN= OUT=em1 SRC=149.255.100.109 DST=69.46.36.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=25880 DF PROTO=TCP SPT=50786 DPT=9050 WINDOW=14600 RES=0x00 SYN URGP=0 UID=508 GID=503
Aug 30 01:11:41 JSServer01 named[29765]: client 127.0.0.1#44437: query (cache) '187.88.217.189.in-addr.arpa/PTR/IN' denied
Aug 30 01:11:41 JSServer01 named[29765]: client 127.0.0.1#46883: query (cache) '187.88.217.189.in-addr.arpa/PTR/IN' denied
Aug 30 01:11:41 JSServer01 named[29765]: client 127.0.0.1#53390: query (cache) '225.222.197.69.in-addr.arpa/PTR/IN' denied
Aug 30 01:11:42 JSServer01 named[29765]: client 127.0.0.1#38526: query (cache) '252.55.186.210.in-addr.arpa/PTR/IN' denied
Aug 30 01:11:42 JSServer01 kernel: Firewall: *TCP_OUT Blocked* IN= OUT=em1 SRC=149.255.100.109 DST=69.46.36.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=25881 DF PROTO=TCP SPT=50786 DPT=9050 WINDOW=14600 RES=0x00 SYN URGP=0 UID=508 GID=503
Aug 30 01:11:42 JSServer01 named[29765]: client 127.0.0.1#56360: query (cache) '94.158.55.50.in-addr.arpa/PTR/IN' denied
Aug 30 01:11:42 JSServer01 named[29765]: client 127.0.0.1#33568: query (cache) '34.137.46.77.in-addr.arpa/PTR/IN' denied
Aug 30 01:11:43 JSServer01 named[29765]: client 127.0.0.1#55732: query (cache) '190.243.45.70.in-addr.arpa/PTR/IN' denied
Aug 30 01:11:43 JSServer01 named[29765]: client 127.0.0.1#57461: query (cache) '120.141.93.216.in-addr.arpa/PTR/IN' denied
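It can help to pull the fields out of the firewall lines before deciding what they mean. The sketch below (helper name made up, log line copied from above) splits the KEY=VALUE pairs. For this line, IN= is empty and OUT=em1, so it records an outbound connection that was blocked: something running on the server as UID 508 tried to reach port 9050 on 69.46.36.10. Running "getent passwd 508" on the server should show which account that is.

```python
def parse_kv(line: str) -> dict:
    """Collect the KEY=VALUE tokens of a firewall log line into a dict."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # tokens without "=" (date, DF, SYN, ...) are skipped
            fields[key] = value
    return fields

line = (
    "Aug 30 01:11:41 JSServer01 kernel: Firewall: *TCP_OUT Blocked* IN= OUT=em1 "
    "SRC=149.255.100.109 DST=69.46.36.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=25880 DF "
    "PROTO=TCP SPT=50786 DPT=9050 WINDOW=14600 RES=0x00 SYN URGP=0 UID=508 GID=503"
)

f = parse_kv(line)
direction = "outbound" if f["OUT"] and not f["IN"] else "inbound"
print(direction, "to", f["DST"], "port", f["DPT"], "by UID", f["UID"])
```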
Locutus, I run updates all the time to keep the server updated, but around a week ago there were quite a few updates which I ran, and then I enabled graylisting as I was starting to see a lot of spam emails coming through.
After that, I started to get CSF alerts of high load averages, and then it seemed to get worse.
I am running a Dell PowerEdge R210, which comes with a Dell RAID card and two 1TB hard drives set up in RAID 1.
In Virtualmin, it only shows the RAID (SCSI device A Drive size 953.31 GB - Make and model Dell VIRTUAL DISK).
I have another machine which is running quite happily without the same issues, but that is running a software RAID across two disks and I am able to query the RAID / disks. With this machine, I've never been able to query the RAID, as I don't think there are any proper Dell drivers for the RAID card under Linux.
The raid card is a Dell SAS 6/iR Adapter
Hi Guys,
It started building up again, ran atop -r and this is the output
ATOP - JSServer01 2014/08/30 15:02:04 --------- 4h25m53s elapsed
PRC | sys 94.89s | user 19m30s | #proc 184 | #zombie 0 | #exit 0 |
CPU | sys 1% | user 19% | irq 0% | idle 371% | wait 9% |
cpu | sys 1% | user 9% | irq 0% | idle 82% | cpu000 w 8% |
cpu | sys 0% | user 5% | irq 0% | idle 94% | cpu002 w 1% |
cpu | sys 0% | user 3% | irq 0% | idle 97% | cpu001 w 0% |
cpu | sys 0% | user 2% | irq 0% | idle 98% | cpu003 w 0% |
CPL | avg1 0.27 | avg5 0.29 | avg15 0.27 | csw 5643189 | intr 6191011 |
MEM | tot 15.6G | free 11.8G | cache 1.4G | buff 232.5M | slab 406.8M |
SWP | tot 2.0G | free 2.0G | | vmcom 2.8G | vmlim 9.8G |
LVM | Group00-root | busy 10% | read 158419 | write 785040 | avio 1.76 ms |
LVM | Group00-swap | busy 0% | read 322 | write 0 | avio 2.57 ms |
DSK | sda | busy 10% | read 112136 | write 262769 | avio 4.43 ms |
NET | transport | tcpi 534967 | tcpo 484902 | udpi 13309 | udpo 13651 |
NET | network | ipi 555500 | ipo 516192 | ipfrw 0 | deliv 548501 |
NET | em1 0% | pcki 492572 | pcko 649938 | si 36 Kbps | so 409 Kbps |
NET | lo ---- | pcki 101110 | pcko 101110 | si 13 Kbps | so 13 Kbps |
Window has been resized...
PID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR CPU CMD 1/17
11352 1 6.33s 4m13s 310.1M 102.7M 324K 10720K N- - S 0 2% php-cgi
11353 1 5.98s 3m59s 310.1M 102.7M 124K 12436K N- - S 2 2% php-cgi
14890 1 7.86s 3m47s 286.6M 81180K 0K 11336K N- - S 1 1% php-cgi
1866 16 20.63s 1m45s 863.0M 63104K 81144K 1.0G N- - S 3 1% mysqld
6279 1 4.30s 79.64s 311.8M 104.1M 2692K 70844K N- - D 2 1% php-cgi
6992 1 2.12s 64.37s 278.9M 77108K 164K 4K N- - S 0 0% php-cgi
10698 1 2.92s 57.18s 301.1M 95656K 572K 52416K N- - S 1 0% php-cgi
6242 1 1.36s 39.79s 285.7M 78572K 80K 4K N- - S 0 0% php-cgi
6993 1 1.10s 33.55s 272.9M 66768K 220K 4K N- - S 2 0% php-cgi
78 1 21.30s 0.00s 0K 0K 0K 0K N- - S 3 0% kipmi0
6600 1 0.51s 17.45s 264.5M 63392K 176K 164K N- - S 0 0% php-cgi
I think I have figured it out. It's something to do with BIND. I think I've been going through DDoS attacks for some strange reason.
I have just added this into named.conf
acl "trusted"{
My server ip address
My server ip address 2
My secondary DNS server IP address
localhost;
localnets;
};
options {
listen-on port 53 {
any;
};
listen-on-v6 port 53 {
any;
};
directory "/var/named";
dump-file "/var/named/data/cache_dump.db";
statistics-file "/var/named/data/named_stats.txt";
memstatistics-file "/var/named/data/named_mem_stats.txt";
allow-query { trusted; };
allow-transfer { trusted; };
allow-recursion { trusted;} ;
allow-query-cache { trusted; };
recursion no;
dnssec-enable yes;
dnssec-validation yes;
dnssec-lookaside auto;
/* Path to ISC DLV key */
bindkeys-file "/etc/named.iscdlv.key";
managed-keys-directory "/var/named/dynamic";
also-notify {
};
};
I now see a lot of these types of warnings in my log file:
Aug 30 20:21:20 JSServer01 named[12935]: client 80.241.198.26#53: query 'dansimpson.net/SPF/IN' denied
Aug 30 20:21:20 JSServer01 named[12935]: client 80.241.198.26#53: query 'dansimpson.net/SPF/IN' denied
Aug 30 20:21:20 JSServer01 named[12935]: client 80.241.198.26#53: query 'ns2.j5huh.net/A/IN' denied
Aug 30 20:21:20 JSServer01 named[12935]: client 80.241.198.26#53: query 'ns1.j5huh.net/A/IN' denied
Aug 30 20:21:20 JSServer01 named[12935]: client 80.241.192.25#21267: query 'ns1.j5huh.com/A/IN' denied
Aug 30 20:21:20 JSServer01 named[12935]: client 80.241.192.25#20384: query 'ns1.j5huh.com/A/IN' denied
Which I am assuming are the remains of a DNS attack?
Well, adding those DNS settings broke my websites, as I couldn't access them. I have, however, upped the firewall to block multiple queries, which seems to have worked.
Does this give any clues? LVM and DSK are flashing red?
ATOP - JSServer01 2014/09/01 13:08:44 --------- 2m54s elapsed
PRC | sys 5.84s | user 2.64s | #proc 138 | #zombie 0 | #exit 0 |
CPU | sys 8% | user 7% | irq 0% | idle 307% | wait 78% |
cpu | sys 4% | user 2% | irq 0% | idle 25% | cpu000 w 69% |
cpu | sys 2% | user 4% | irq 0% | idle 88% | cpu001 w 5% |
cpu | sys 1% | user 1% | irq 0% | idle 96% | cpu002 w 2% |
cpu | sys 0% | user 0% | irq 0% | idle 97% | cpu003 w 2% |
CPL | avg1 1.38 | avg5 0.58 | avg15 0.21 | csw 248036 | intr 226145 |
MEM | tot 15.6G | free 14.2G | cache 501.0M | buff 14.9M | slab 334.3M |
SWP | tot 2.0G | free 2.0G | | vmcom 868.7M | vmlim 9.8G |
LVM | Group00-root | busy 78% | read 109666 | write 2872 | avio 1.21 ms |
LVM | Group00-swap | busy 0% | read 322 | write 0 | avio 1.09 ms |
DSK | sda | busy 79% | read 65008 | write 1376 | avio 2.07 ms |
NET | transport | tcpi 24 | tcpo 24 | udpi 75 | udpo 102 |
NET | network | ipi 120 | ipo 135 | ipfrw 0 | deliv 102 |
NET | em1 0% | pcki 182 | pcko 85 | si 0 Kbps | so 0 Kbps |
NET | lo ---- | pcki 33 | pcko 33 | si 0 Kbps | so 0 Kbps |
*** system and process activity since boot ***
PID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR CPU CMD 1/16
158 1 4.22s 0.98s 36096K 1368K 276K 16K N- - S 1 3% plymouthd
2158 1 0.04s 1.33s 239.1M 52280K 2804K 4K N- - S 0 1% spamd
1 1 0.56s 0.02s 19356K 1524K 409.7M 6968K N- - S 0 0% init
34 1 0.54s 0.00s 0K 0K 0K 0K N- - S 0 0% kblockd/0
78 1 0.32s 0.00s 0K 0K 0K 0K N- - S 3 0% kipmi0
437 1 0.01s 0.15s 10648K 756K 9268K 0K N- - S 2 0% udevd
2182 1 0.04s 0.01s 154.2M 13520K 11332K 7712K N- - S 3 0% postgrey
1843 2 0.01s 0.04s 37812K 4184K 1556K 4K N- - S 0 0% hald
2260 1 0.00s 0.04s 81296K 3408K 520K 8K N- - S 3 0% master
And this was from yesterday, when it started to build up again
ATOP - JSServer01 2014/08/31 00:00:01 --------- 6h17m12s elapsed
PRC | sys 3m58s | user 25m22s | #proc 201 | #zombie 0 | #exit 1 |
CPU | sys 2% | user 15% | irq 0% | idle 374% | wait 9% |
cpu | sys 0% | user 8% | irq 0% | idle 84% | cpu000 w 8% |
cpu | sys 0% | user 4% | irq 0% | idle 95% | cpu002 w 1% |
cpu | sys 0% | user 2% | irq 0% | idle 97% | cpu001 w 0% |
cpu | sys 0% | user 2% | irq 0% | idle 98% | cpu003 w 0% |
CPL | avg1 0.13 | avg5 0.16 | avg15 0.14 | csw 9523149 | intr 8306300 |
MEM | tot 15.6G | free 12.0G | cache 1.2G | buff 255.2M | slab 192.2M |
SWP | tot 2.0G | free 2.0G | | vmcom 3.1G | vmlim 9.8G |
LVM | Group00-root | busy 10% | read 158124 | write 917942 | avio 2.18 ms |
LVM | Group00-swap | busy 0% | read 322 | write 0 | avio 0.88 ms |
DSK | sda | busy 10% | read 119043 | write 345707 | avio 5.04 ms |
NET | transport | tcpi 539048 | tcpo 506361 | udpi 43734 | udpo 44075 |
NET | network | ipi 598771 | ipo 564033 | ipfrw 0 | deliv 583078 |
NET | em1 0% | pcki 514411 | pcko 678076 | si 19 Kbps | so 301 Kbps |
NET | lo ---- | pcki 131997 | pcko 131997 | si 13 Kbps | so 13 Kbps |
*** system and process activity since boot ***
PID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR CPU CMD 1/23
12952 1 9.41s 6m04s 303.8M 97396K 296K 18116K N- - S 0 2% php-cgi
12239 1 8.58s 5m44s 292.0M 86864K 684K 16916K N- - S 0 2% php-cgi
13618 1 7.58s 5m12s 310.1M 102.7M 32K 13992K N- - S 0 1% php-cgi
1772 15 26.10s 2m08s 798.8M 66204K 214.2M 1.2G N- - S 0 1% mysqld
78 1 2m13s 0.00s 0K 0K 0K 0K N- - S 3 1% kipmi0
6474 1 3.62s 95.06s 286.2M 78744K 53280K 12952K N- - S 0 0% php-cgi
3119 1 3.16s 84.38s 287.4M 80580K 105.0M 8660K N- - S 0 0% php-cgi
2571 33 4.86s 42.56s 2.6G 181.8M 155.9M 13296K N- - S 1 0% dsm_om_connsvc
20531 1 2.01s 27.72s 275.4M 69604K 476K 47256K N- - S 0 0% php-cgi
You posted the system activity since boot. You should also watch the ongoing activity. You can change the update interval with the i key. With t you can trigger a manual update.
It seems like the HDD is under constant high load. You can sort the process list by disk usage with shift-d and switch to disk details with d, to find out which process(es) are using the disk so much.
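If you want to keep an eye on the busy figures over time without reading each screen by hand, something like the sketch below can pull them out of copied atop text. It assumes the "LABEL | name | busy NN% | ..." layout seen in the pasted output; the function name is made up, and the sample lines are from the 2014/09/01 snapshot above.

```python
def busy_percentages(atop_text: str) -> dict:
    """Map each DSK/LVM device name to its 'busy' percentage."""
    busy = {}
    for line in atop_text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if cells[0] not in ("DSK", "LVM") or len(cells) < 3:
            continue
        name = cells[1]
        for cell in cells[2:]:
            if cell.startswith("busy"):
                # A cell looks like "busy 78%"; keep the number only.
                busy[name] = int(cell.split()[1].rstrip("%"))
    return busy

sample = """\
LVM | Group00-root | busy 78% | read 109666 | write 2872 | avio 1.21 ms |
LVM | Group00-swap | busy 0% | read 322 | write 0 | avio 1.09 ms |
DSK | sda | busy 79% | read 65008 | write 1376 | avio 2.07 ms |
"""

for device, pct in busy_percentages(sample).items():
    print(device, pct)
```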
Right, I have to restart the server like every other day to get it back to normal processes, enough for me to even log in to SSH.
These logs are from the 5th - shows high LVM and DSK
ATOP - JSServer01 2014/09/05 13:29:42 --------- 3m22s elapsed
PRC | sys 6.72s | user 2.89s | #proc 141 | #trun 1 | #tslpi 161 | #tslpu 3 | #zombie 0 | clones 2157 | | #exit 0 |
CPU | sys 7% | user 6% | irq 0% | idle 304% | wait 82% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 4% | user 2% | irq 0% | idle 19% | cpu000 w 75% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 0% | user 3% | irq 0% | idle 93% | cpu001 w 4% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 3% | user 1% | irq 0% | idle 95% | cpu002 w 2% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 1% | user 0% | irq 0% | idle 97% | cpu003 w 2% | | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
CPL | avg1 1.17 | avg5 0.53 | | avg15 0.20 | | csw 256516 | intr 253915 | | | numcpu 4 |
MEM | tot 15.6G | free 14.2G | cache 489.2M | dirty 1.2M | buff 13.6M | slab 343.7M | | | | |
SWP | tot 2.0G | free 2.0G | | | | | | | vmcom 864.7M | vmlim 9.8G |
LVM | Group00-root | busy 82% | read 112338 | write 2805 | KiB/r 7 | KiB/w 4 | MBr/s 4.32 | MBw/s 0.05 | avq 4.86 | avio 1.45 ms |
LVM | Group00-swap | busy 0% | read 322 | write 0 | KiB/r 4 | KiB/w 0 | MBr/s 0.01 | MBw/s 0.00 | avq 3.27 | avio 0.93 ms |
DSK | sda | busy 83% | read 67273 | write 1386 | KiB/r 13 | KiB/w 8 | MBr/s 4.46 | MBw/s 0.05 | avq 2.46 | avio 2.44 ms |
NET | transport | tcpi 28 | tcpo 27 | udpi 93 | udpo 145 | tcpao 2 | tcppo 1 | tcprs 1 | tcpie 0 | udpip 0 |
NET | network | ipi 151 | ipo 183 | ipfrw 0 | deliv 138 | | | | icmpi 17 | icmpo 9 |
NET | em1 0% | pcki 141 | pcko 122 | si 0 Kbps | so 0 Kbps | coll 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
NET | lo ---- | pcki 33 | pcko 33 | si 0 Kbps | so 0 Kbps | coll 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
*** system and process activity since boot ***
PID TID RUID EUID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR DSK CMD 1/8
1 - root root 1 0.55s 0.03s 19232K 1516K 428.7M 8236K N- - S 0 54% init
1038 - root root 1 0.01s 0.01s 108.0M 1804K 339.9M 1008K N- - S 0 42% rc
1923 - mysql mysql 11 0.01s 0.02s 477.5M 23128K 9304K 92K N- - S 1 1% mysqld
434 - root root 1 0.01s 0.17s 10760K 876K 9148K 0K N- - S 0 1% udevd
1973 - root root 1 0.03s 1.35s 239.1M 52280K 2804K 4K N- - S 0 0% spamd
1661 - haldaemo haldaemo 2 0.02s 0.03s 37824K 4200K 1560K 4K N- - S 0 0% hald
1323 - named named 7 0.02s 0.02s 382.6M 17392K 1468K 16K N- - S 0 0% named
346 - root root 1 0.00s 0.00s 0K 0K 0K 1128K N- - S 1 0% jbd2/dm-0-8
2072 - postfix postfix 1 0.00s 0.00s 81584K 3940K 1064K 0K N- - S 3 0% trivial-rewrit
1284 - root root 4 0.00s 0.00s 243.3M 1612K 416K 172K N- - S 0 0% rsyslogd
2073 - postfix postfix 1 0.00s 0.00s 81580K 3612K 572K 0K N- - S 0 0% smtp
2062 - root root 1 0.00s 0.03s 81296K 3408K 520K 8K N- - S 1 0% master
2106 - root root 1 0.01s 0.01s 269.3M 28532K 516K 4K N- - D 1 0% httpd
1662 - root root 1 0.00s 0.00s 20328K 1156K 520K 0K N- - S 0 0% hald-runner
2117 - root root 1 0.01s 0.00s 17532K 5252K 500K 4K N- - R 2 0% atop
1764 - root root 1 0.00s 0.00s 107.7M 1460K 368K 0K N- - S 2 0% mysqld_safe
2071 - postfix postfix 1 0.00s 0.00s 81520K 3504K 336K 0K N- - S 3 0% qmgr
157 - root root 1 5.11s 1.21s 36096K 1372K 276K 12K N- - S 1 0% plymouthd
It looks like init and rc are causing issues?
And this is the day before
The load has now shot up to over 1, and the dsk is flashing on atop
You again posted the "System activity since boot". You might want to observe the ongoing activity (press t to trigger a manual update of the screen) while the HDD is under high load, and check there which processes use the disk the most and how much.
Well, I think I've managed to trace down what is causing the disk issues: two processes, init and rc.
Now what would be causing this?
ATOP - JSServer01 2014/09/25 12:51:08 --------- 4m36s elapsed
PRC | sys 8.92s | user 3.42s | #proc 138 | #trun 1 | #tslpi 158 | #tslpu 3 | #zombie 0 | clones 2275 | #exit 0 |
CPU | sys 6% | user 5% | irq 0% | idle 303% | wait 85% | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 3% | user 2% | irq 0% | idle 16% | cpu000 w 79% | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 3% | user 1% | irq 0% | idle 95% | cpu002 w 1% | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 0% | user 2% | irq 0% | idle 93% | cpu001 w 4% | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
cpu | sys 0% | user 0% | irq 0% | idle 98% | cpu003 w 1% | steal 0% | guest 0% | curf 3.06GHz | curscal ?% |
CPL | avg1 1.21 | avg5 0.67 | avg15 0.27 | | | csw 299648 | intr 335944 | | numcpu 4 |
MEM | tot 15.6G | free 14.1G | cache 519.1M | dirty 8.7M | buff 15.6M | slab 373.4M | | | |
SWP | tot 2.0G | free 2.0G | | | | | | vmcom 899.3M | vmlim 9.8G |
LVM | Group00-root | busy 86% | read 121974 | write 3222 | KiB/r 7 | KiB/w 3 | MBr/s 3.37 | MBw/s 0.05 | avio 1.89 ms |
LVM | Group00-swap | busy 0% | read 322 | write 0 | KiB/r 4 | KiB/w 0 | MBr/s 0.00 | MBw/s 0.00 | avio 0.93 ms |
DSK | sda | busy 86% | read 79345 | write 1608 | KiB/r 12 | KiB/w 8 | MBr/s 3.47 | MBw/s 0.05 | avio 2.94 ms |
NET | transport | tcpi 16 | tcpo 16 | udpi 178 | udpo 210 | tcpao 0 | tcppo 0 | tcprs 0 | udpip 0 |
NET | network | ipi 205 | ipo 236 | ipfrw 0 | deliv 197 | | | icmpi 3 | icmpo 9 |
NET | em1 0% | pcki 225 | pcko 180 | si 2 Kbps | so 0 Kbps | erri 0 | erro 0 | drpi 0 | drpo 0 |
NET | lo ---- | pcki 29 | pcko 29 | si 0 Kbps | so 0 Kbps | erri 0 | erro 0 | drpi 0 | drpo 0 |
*** system and process activity since boot ***
PID TID RDDSK WRDSK WCANCL DSK CMD 1/28
1 - 462.4M 7388K 8K 54% init
1110 - 345.5M 992K 472K 40% rc
2016 - 26432K 432K 0K 3% mysqld
2108 - 3456K 8436K 4K 1% postgrey
439 - 9272K 0K 0K 1% udevd