Having a big problem with greylisting...

Over the past few months, I have received complaints from a small percentage of users about unreceived email. All indications point to a greylisting problem. In particular, mail from Exchange 2003 servers and Hotmail/MSN seems to be the primary culprit, in that these senders do not resend the original message after greylisting sends a busy. Of late I have noticed that Gmail is doing the same thing.

Recently, I deleted the greylisting databases and restarted the greylisting system, but received even more complaints. One heavily affected user was receiving none of the email from maxell.com.

I pursued the possibility of an Exchange-related problem with David Schweikert (author of Postgrey), but he said that I was running the latest version and that any Exchange problems he was aware of were resolved many years ago.

As of yesterday, I have turned greylisting off and do not wish to leave it this way for long. My data suggests that greylisting reduces my SPAM by approx 600,000 messages per year.

I need definitive help with this and have been a paying customer for many years. Please formulate a resolution plan that does not necessitate many loops of forum communication.

JP

Status: 
Active

Comments

We appreciate your business over the years!

Greylisting does indeed make a remarkable difference in reducing spam.

However, I've never heard of the problem you're having before, and if even the Postgrey author doesn't know what's going on, that may indicate a problem that we're not able to correct.

In order for Greylisting to work, the servers in question have to resend their messages.
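The retry requirement can be sketched as a toy model. This is not Postgrey's actual code: the flat-file store, the `greylist_check` function, and the sample addresses are made up for illustration, and the 300-second delay is an assumed value. A sending server that never retries stays at "defer" forever, which is the symptom described in this thread.

```shell
#!/bin/sh
# Toy greylisting model (illustration only, not Postgrey's implementation).
DELAY=300                    # assumed: seconds a new triplet must wait
DB=$(mktemp)                 # hypothetical store, one "triplet first_seen" per line
trap 'rm -f "$DB"' EXIT

# greylist_check TRIPLET NOW -> prints "defer" or "accept"
# TRIPLET is client-ip/sender/recipient written without spaces.
greylist_check() {
    triplet=$1
    now=$2
    first_seen=$(awk -v t="$triplet" '$1 == t { print $2; exit }' "$DB")
    if [ -z "$first_seen" ]; then
        # First sighting: remember the time and ask the sender to retry.
        printf '%s %s\n' "$triplet" "$now" >> "$DB"
        echo defer
    elif [ $((now - first_seen)) -lt "$DELAY" ]; then
        # Retried too soon: still deferred.
        echo defer
    else
        # Retried after the delay: the message is let through.
        echo accept
    fi
}

t=192.0.2.10/sender@example.com/user@example.net
greylist_check "$t" 1000     # first attempt
greylist_check "$t" 1060     # retry after 60 seconds
greylist_check "$t" 1400     # retry after 400 seconds
```

The three calls print defer, defer, accept. Only the third attempt is delivered, so delivery depends entirely on the sender coming back.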

I do have a few questions for you though so that we can do some research --

  1. Now that you've disabled greylisting, have all of the email problems you were having gone away?

  2. What is the output of this command on your server: rpm -qa | grep postgrey

  3. If you could temporarily restart the greylisting -- what does this command output (which will show what parameters Postgrey is running with): ps auxw | grep postgrey

rpm -qa |grep postgrey:

postgrey-1.31-2.el5

regarding startup parameters (/etc/sysconfig/postgrey does not exist)

# processname: postgrey
#

# Source function library.
. /etc/rc.d/init.d/functions

# Source networking configuration.
. /etc/sysconfig/network

# Check that networking is up.
[ ${NETWORKING} = "no" ] && exit 0

prog=postgrey
postgrey=/usr/sbin/$prog
DBPATH=/var/spool/postfix/postgrey
SOCKET=$DBPATH/socket
OPTIONS="--unix=$SOCKET"

# Source an auxiliary options file if we have one, and pick up OPTIONS,
if [ -r /etc/sysconfig/$prog ]; then
    . /etc/sysconfig/$prog
fi

[ -x $postgrey -a -d $DBPATH ] || exit 0

RETVAL=0

start() {
    echo -n $"Starting $prog: "
    daemon $postgrey -d $OPTIONS
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$prog
}

Please advise,

JP

Thanks for your response!

I'm not sure that you answered questions #1 and #3 though --

  1. Now that you've disabled greylisting, have all of the email problems you were having gone away?

  3. What is the output of this command: ps auxw | grep postgrey

Without /etc/sysconfig/postgrey, it'd probably be running with the defaults, as can be determined from the init script you posted, but knowing the specifics of how it's running on your system will help us research the problems you're running into. Thanks!

Okay, here's what we did -- after some research, I do see some potential issues that could occur in the particular Postgrey version that's installed there now. While we hadn't heard of other folks having similar issues to what you described, we'd like to get that fixed before anyone does!

Joe just finished building a new Postgrey RPM. Try running this on your server:

yum install postgrey

That should pull down the latest Postgrey version from the Virtualmin repository. One of the things this new version will do is whitelist all the Google mail servers by default.

Let us know if that resolves the greylisting problems you've been seeing.

Have installed the postgrey rpm you pushed. Though your comments address gmail, what of msn/hotmail? We were getting most of our complaints from recipients who did not receive mail from known senders at:

1) MSN/Hotmail
2) What appear to be Microsoft Small Business Servers running Exchange
3) GMail

Further, the postgrey mailing list suggests these runtime options:

What you should consider is using auto-whitelist-clients=1 and a very
long expire delay for the hosts whitelist. We use this one:

POSTGREY_OPTS="--inet=127.0.0.1:10023 --max-age=180 --lookup-by-host
--delay=2345 --auto-whitelist-clients=1 --retry-window=11h
--greylist-action=450"

[I presume the port 10023 setting is unique to them and would not be used by a vmin/postgrey install]

How do you feel about this??

JP

"Though your comments address gmail, what of msn/hotmail?"

That's a fine question!

The new version handles Google connections differently (specifically, it adds them all to a whitelist by default).

I don't believe it does that for MSN/Hotmail -- but what I'm hoping is that bugfixes in the new Postgrey version would also fix that issue for you.

Unfortunately, we haven't been able to reproduce the problem you're having, and no one else has posted similar trouble, so it's not something we can test.

However, if you continue to have problems with Hotmail and MSN even after the upgrade, you could always add those (and any other mail servers that are causing you problems) to the Postgrey whitelist, which is in /etc/postfix/postgrey_whitelist_clients.local.

"I presume the port 10023 setting is unique to them and would not be used by a vmin/postgrey install"

Port 10023 is the default port of a new Postgrey installation, and you're welcome to continue using it; it won't interfere with Virtualmin operation in any way.

"What you should consider is using auto-whitelist-clients=1 and a very long expire delay for the hosts whitelist. We use this one:"

We'll review the run-time options you mentioned, thanks for your input!
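For anyone wanting to try those options, here is a rough sketch of how the init script posted earlier in this thread would pick them up: it sources /etc/sysconfig/postgrey if that file exists and uses whatever OPTIONS the file sets. The sketch writes to a temporary file as a stand-in so it touches nothing on a real system, and it copies only a subset of the mailing-list values. Whether those values suit a given server is untested here.

```shell
#!/bin/sh
# Sketch only: mimics how the init script in this thread builds OPTIONS.
prog=postgrey
DBPATH=/var/spool/postfix/postgrey
SOCKET=$DBPATH/socket
OPTIONS="--unix=$SOCKET"     # default taken from the init script

# Hypothetical stand-in for /etc/sysconfig/postgrey (a temp file).
sysconfig=$(mktemp)
cat > "$sysconfig" <<'EOF'
# Subset of the values from the mailing-list suggestion quoted above.
OPTIONS="--unix=$SOCKET --delay=2345 --max-age=180 --auto-whitelist-clients=1 --retry-window=11h"
EOF

# Same pattern as the init script: source the file if it is readable.
if [ -r "$sysconfig" ]; then
    . "$sysconfig"
fi

echo "$prog would start with: -d $OPTIONS"
rm -f "$sysconfig"
```

Keeping --unix=$SOCKET in the override matters in this sketch because the sourced file replaces OPTIONS entirely rather than appending to it.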

I'm not sure this is relevant, but...

How do you know the issue is with Postgrey? I am using Postgrey with VM pro, and I have had no issues with Gmail as it happens (that I know of).

However...

I have had issues with Microsoft (Exchange & hotmail). But these are not to do with Postgrey, but rather because they run an awful IP blacklisting system. For example, I recently changed IP on a server and suddenly no email was getting delivered to Hotmail. I had to apply to get the IP de-listed from Microsoft, and since then all has been well (the IP should never have been blacklisted in the first place. As I say, the system they are using is crap. IP blacklisting is a very blunt instrument anyway, but in any case we monitor our IPs via dnsstuff.com and they are clean).

"Thank you for contacting Microsoft Online Services Technical Support. This email is in reference to ticket number 1167227138, which was opened in regards to your delisting request for 216.157.xxx.xxx

The IP address you submitted has been reviewed and removed from our block lists. Please note that there may be a 1-2 hour delay before this change propagates through our entire system."

We too monitor our IPs, and the issues I have mentioned go away when we turn off greylisting. More importantly, our detailed studies of postgreyreport output show that the addresses our clients complain about not receiving email from were in fact deferred by postgrey, and the originating sender never retried.

Lastly, turning off greylisting pushes the load on SA up considerably. So it is desirable to find a proper solution.

JP

I just wanted to clarify your comment there -- are you saying that the new Postgrey version isn't resolving the problem for you? Or are you still testing that?

Just to say...

I have two CentOS 6.2 servers with VM Pro. One without the new patch for Postgrey, the other with the patch.

I sent email in from a Gmail account and Hotmail (live.co.uk) account that the servers hadn't seen before. All emails were delivered OK. On the server with the patch, the email from Gmail was not delayed (as you would expect).

I've only just recently got going with Postgrey and the massive reduction in load on SA is great. Jerry's problem has worried me. But I think in fact I'm fine.

I have installed the patch that was provided earlier in the support stream, but continue to have valid mail losses. The only common denominator I can discern is that the senders are using Microsoft SBS with Exchange. They send messages and postgrey sends busy, but the sending server never resends.

I need to get this resolved. The problem has, over the last 6 months, become a large dot on the radar for certain clients whereas previously it was essentially unheard of.

Please advise,

JP

We're sorry to hear that the more recent Postgrey version we pushed out didn't help!

Greylisting only works because sites resend their messages; since you seem to be working with some sites that don't, that becomes a rather huge issue.

That really only leaves you with two options:

  1. You can always whitelist sites that are giving you problems; you can whitelist servers that don't resend messages here:

/etc/postfix/postgrey_whitelist_clients.local

  2. If it's unreasonable to whitelist all the servers that are causing you problems, your only other reasonable option would be to disable greylisting altogether. I know it provides a huge reduction in spam for you, but if sites just aren't resending their messages, there really isn't much you can do there.

You could always consider adding a commercial anti-spam service to your system there. For example, some folks really like Google's Postini, but there are others similar to that.

While we don't have a formal recommendation of which service to use, many of those commercial services are able to reduce spam, without the drawbacks that come with greylisting.

This is precisely what I am talking about. Virtualmin staff needs to look at these links. Perhaps there is a way to log the version of Exchange (for those Exchange servers that are dumb enough to broadcast versions) in the initial SMTP transaction so that we could evaluate postgreyreport and really get to the bottom of this.

JP

When connecting to a remote system, mail servers don't typically pass along any identification information outside of their "HELO" name in the initial greeting. And that's just their hostname.

Some mail servers may provide additional information when you connect to them, but that won't help with the issue you're seeing above without significant changes to the way Postgrey works. Postgrey only receives connections, it doesn't make any.

While we appreciate your confidence in our abilities, that's unfortunately not something we're going to be able to assist you with.

What you're seeing sounds like a problem with some versions of Exchange, which at best may be resolvable with a feature request to the Postgrey author -- but even that seems unlikely since it's both difficult to implement and relies on all Exchange servers identifying themselves with a version.

Sorry, but in your case, it really appears as if Postgrey may not be the right fit for you, unless you're able to whitelist mail servers that are giving you problems.

In your most recent response, there is a flaw in your assumptions.

The purpose of attempting to track either the SMTP base platform or, where possible, the version number is only to demonstrate that a problem exists. As most SMTP servers are logged, and frequently the FQDN indicates a Windows-based platform a la "exchange.blahblah.com" or other such indicators, any aggregate statistics would help demonstrate a larger problem worthy of additional investigation and mitigation on behalf of either the VMin Team or the postgrey author.

Without further regression the problem will just continue to lurk about and annoy. For me it appears to be almost every Exchange server that presents new mail.

JP

Submitted by Joe on Wed, 04/25/2012 - 16:12 Pro Licensee

This is a bug in unpatched Exchange servers. There's nothing we're going to be able to do about it, though Microsoft has fixed it:

http://support.microsoft.com/?kbid=950757

The option is to whitelist the affected servers. You can use a regular expression in the whitelist, which may be useful for catching the "exchange.*" names you've mentioned.

So, add this to your whitelist and see if it helps:

/exchange.*/

I think that'd be the way to go in this case. It's pretty rare to run into unpatched Exchange servers this long after the fix was released, and it hasn't been an issue that has caused most folks trouble, but I guess your particular environment is exposed to more than usual.
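Before relying on that entry, it may be worth checking which client names it would actually catch. The sketch below approximates the /exchange.*/ pattern with grep -E. Postgrey uses Perl regular expressions, but for a pattern this simple the two should agree. The `would_whitelist` function and the hostnames are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Approximates the whitelist entry /exchange.*/ with grep -E.
# Assumption: the pattern is matched against the client hostname.
pattern='exchange.*'

# would_whitelist HOSTNAME -> prints "whitelisted" or "greylisted"
would_whitelist() {
    if printf '%s\n' "$1" | grep -Eq "$pattern"; then
        echo whitelisted
    else
        echo greylisted
    fi
}

# Hypothetical client names.
for host in exchange.blahblah.com mail.example.org myexchange01.example.net; do
    printf '%s: %s\n' "$host" "$(would_whitelist "$host")"
done
```

Because the pattern is unanchored, it matches "exchange" anywhere in the name, so myexchange01.example.net is whitelisted as well. A tighter pattern such as ^exchange\. would limit it to names that begin with that label, at the cost of missing servers named differently.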