process lookup-domain-d 100% cpu

Hello, today I have a big problem with my mail server running Virtualmin GPL 3.93. There are 200 messages in the sendmail queue, and I see many lookup-domain-d processes using a lot of CPU. How can I resolve this issue? It is affecting all my customers. Thanks.

Status: Closed (fixed)

Comments

I disabled this entry in Procmail:
VIRTUALMIN=|/etc/webmin/virtual-server/lookup-domain.pl $LOGNAME
Now the server is working well. What does this process do? Is it needed for mail to work?
Please let me know as soon as possible if I need to re-enable it immediately.
Thanks.

Howdy -- it sounds like you may have a lot of incoming email.

If you go into Webmin -> Servers -> Postfix -> Mail Queue, does the email you see there look normal? Or does it appear to be spam of some sort?

If you see spam in your queue, you can review the email headers by clicking the "View All Headers" link after you open the message.

Using the headers, you can tell where the email came from. If it was sent directly from the server, it's likely due to a spammer sending spam through a vulnerable web application. If it was sent remotely, it's likely due to one of your users getting some sort of virus infection on their PC.

Alternatively, if you don't see spam in your queue, it's also possible it's an auto-reply loop. If you see a lot of email to one particular user, and it's not spam, check to see if that user has an auto-reply set up, and if so, try disabling it.

I disable this entry in Procmail: VIRTUALMIN=|/etc/webmin/virtual-server/lookup-domain.p.

Well, it sounds like that's just the symptom, not the problem itself.

The problem itself is that you have a lot of email in the mail queue, and the lookup-domain program is how Virtualmin determines where to deliver the email.

However, you do want to make sure that the lookup daemon is running, for maximum efficiency. If it's not, a new process will be launched for each incoming email.

You can start that by running this command:

/etc/init.d/lookup-domain start
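
Before starting it, you could check whether the daemon is already alive. The following is only a rough sketch: it assumes the daemon records its PID in /var/webmin/lookup-domain-daemon.pid (the path may differ on your install) and that you run it as root. The helper name pid_file_alive is purely illustrative and not part of Virtualmin.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the PID stored in the given file
# belongs to a process that is currently alive.
pid_file_alive() {
    pid_file="$1"
    # A missing or unreadable PID file means we cannot confirm the daemon.
    [ -r "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    # Reject empty or non-numeric contents.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 sends no signal; it only tests that the process exists
    # (and that we are allowed to signal it, hence the root assumption).
    kill -0 "$pid" 2>/dev/null
}

# Assumed PID file location; adjust if your system differs.
if pid_file_alive /var/webmin/lookup-domain-daemon.pid; then
    echo "lookup-domain daemon is running"
else
    echo "lookup-domain daemon is not running"
fi
```

If it reports "not running", starting the daemon with the init script above is the next step.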

I'm using sendmail. I have no spam in the queue and I don't have a lot of mail from one user. The server now has normal traffic and is very fast, but if I try to re-enable that Procmail line it crashes again. What does this line do?

If I try to run this command:

/etc/init.d/lookup-domain start

I receive this error:

Failed to bind to localhost port 11000 at /usr/libexec/webmin/virtual-server/lookup-domain-daemon.pl line 49

What does it mean?

That may mean it's already running... try this command instead:

/etc/init.d/lookup-domain restart
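
If you want to confirm that something is already bound to port 11000 before restarting, you could look at the listening TCP sockets. This is just a sketch, assuming netstat is available and prints the usual "Local Address" in column 4 and "State" in column 6. The function name port_in_listeners is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -tln` style output on stdin and
# succeed if any listening socket uses the given local port.
port_in_listeners() {
    port="$1"
    awk -v p="$port" '
        $6 == "LISTEN" {
            # The port is whatever follows the last colon of the local address.
            n = split($4, parts, ":")
            if (parts[n] == p) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example usage (run as root on the mail server):
#   netstat -tln | port_in_listeners 11000 && echo "port 11000 is already in use"
```

If the port is in use, the bind error most likely just means a daemon instance is already there.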

Done. It doesn't report OK or Failed, but I think it restarted fine.

And now?

Well, email won't be scanned and delivered properly without that line in your procmailrc...

I'd recommend adding that back in there, and seeing if your system is running better now that some time has passed, and the lookup-domain daemon has been restarted.

If I enable it, performance drops immediately. The mail queue grows right away and I see many lookup-domain.p and lookup-domain-d processes using a lot of CPU. I can't leave my server in this state.

It sounds like you may be receiving a high volume of incoming email. You mentioned your mail queue had 200 messages in it previously; how many do you see in there now?

You can determine that with this command:

mailq | tail -1
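
If you'd rather have just the number (to watch it over time, for instance), something like the following could work. It's a sketch that assumes the sendmail-style summary line, where the first number on the last line is the message count. Other mailers format this line differently, so treat the parsing as an assumption. The name queue_total is only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: read mailq output on stdin and print the first
# number found on its last line (the sendmail summary line).
# Prints nothing if that line contains no digits.
queue_total() {
    tail -1 | sed -n 's/[^0-9]*\([0-9][0-9]*\).*/\1/p'
}

# Example usage on the mail server:
#   total=$(mailq | queue_total)
#   echo "messages in queue: ${total:-0}"
```

Running that every minute or so while the Procmail line is enabled would show how quickly the queue builds up.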

At the moment it shows "Total request 23". Previously I had 200 messages because the server was very slow.

In case it helps, the problem began this morning, after I restarted the server yesterday evening so that these updates could take effect:

1343810351 (mer 01 ago 2012 10:39:11 CEST): wbm-virtual-server-3.93.gpl-1.noarch
1343810379 (mer 01 ago 2012 10:39:39 CEST): wbt-virtual-server-theme-8.5-1.noarch
1343810426 (mer 01 ago 2012 10:40:26 CEST): usermin-1.510-1.noarch
1343810450 (mer 01 ago 2012 10:40:50 CEST): glibc-2.5-81.el5_8.4.i686
1343810474 (mer 01 ago 2012 10:41:14 CEST): nspr-4.9.1-4.el5_8.i386
1343810477 (mer 01 ago 2012 10:41:17 CEST): glibc-headers-2.5-81.el5_8.4.i386
1343810480 (mer 01 ago 2012 10:41:20 CEST): nss-3.13.5-4.el5_8.i386
1343810486 (mer 01 ago 2012 10:41:26 CEST): bind-libs-9.3.6-20.P1.el5_8.2.i386
1343810488 (mer 01 ago 2012 10:41:28 CEST): bind-9.3.6-20.P1.el5_8.2.i386
1343810495 (mer 01 ago 2012 10:41:35 CEST): libpurple-2.6.6-11.el5.4.i386
1343810500 (mer 01 ago 2012 10:41:40 CEST): bind-utils-9.3.6-20.P1.el5_8.2.i386
1343810501 (mer 01 ago 2012 10:41:41 CEST): nss-tools-3.13.5-4.el5_8.i386
1343810509 (mer 01 ago 2012 10:41:49 CEST): nscd-2.5-81.el5_8.4.i386
1343810517 (mer 01 ago 2012 10:41:57 CEST): perl-DBD-Pg-1.49-4.el5_8.i386
1343810521 (mer 01 ago 2012 10:42:01 CEST): xulrunner-10.0.6-2.el5_8.i386
1343810528 (mer 01 ago 2012 10:42:08 CEST): glibc-devel-2.5-81.el5_8.4.i386
1343810531 (mer 01 ago 2012 10:42:11 CEST): caching-nameserver-9.3.6-20.P1.el5_8.2.i386
1343810538 (mer 01 ago 2012 10:42:18 CEST): bind-chroot-9.3.6-20.P1.el5_8.2.i386
1343810539 (mer 01 ago 2012 10:42:19 CEST): firefox-10.0.6-1.el5.centos.i386

Is it possible to enable spam filtering without the lookup-domain script?

Would it be possible for you to run this command:

strace -o /tmp/strace.txt -p `cat /var/webmin/lookup-domain-daemon.pid`

Run it while lookup-domain-daemon is using 100% of the CPU, let it run for 5 seconds, hit Ctrl-C, and then attach the /tmp/strace.txt file to this bug. That will let us see what it is doing with all that CPU.

I receive this error and the program terminates immediately:

bash: syntax error near unexpected token `newline'

The output file is attached. Thanks.

I think the problem is SpamAssassin. If I run it from Procmail, my system goes to 100% CPU usage and email starts piling up in the queue. I have SpamAssassin version 3.3.1. What do you think? Would the command you asked for earlier show the problem?

Sorry, our bug tracker mangled that command. The correct command is:

strace -o /tmp/strace.txt -p `cat /var/webmin/lookup-domain-daemon.pid`

Here you go. I ran it twice. Please let me know as soon as possible. Thanks for your support.

Here is another log.
I tried downgrading glibc and all its dependencies to the previous release, but that does not seem to resolve the issue.
After that I rebooted and re-ran "vmware-config-tool.pl" (this is a VM), but it is still not working.
So I ran the command you suggested once more.
Please let me know. Thanks.

Hello Jamie,
the problem seems to be solved now.
I uninstalled the Acronis software from the VM and the host machine, restarted the server, and performance returned to normal.
Anyway, please take a look at my log files and let me know if you see anything useful for the future.
Thanks a lot for your support. Sorry for taking up your time.
Bye.

Thanks for those logs .. I am not seeing any signs that lookup-domain-daemon.pl is doing anything particularly CPU intensive though. Glad to hear you got this solved!

What does this "Acronis" software do exactly? If it is performing a backup that is very disk IO intensive, it certainly could be responsible for slowing down other processes.

The Acronis software was installed but not working. It had some configuration problem.
I also stopped the Acronis services before removing them, but that didn't help.
I had to remove Acronis from the host machine as well (Windows Server 2008 with VMware Server 2.0.2) and reboot; after that everything worked.
I don't know exactly what the problem was, but it is very important that my mail server is up and running well.
Thanks a lot.

Automatically closed -- issue fixed for 2 weeks with no activity.