We have an Exchange 2013 (now SP1) organization running on Windows Server 2012 virtual machines. It has two Mailbox servers and two CAS servers. The CAS servers are joined to an NLB cluster.
Both incoming and outgoing SMTP messages are routed via two Sendmail/CentOS-based mail relays. Those relays are configured to forward incoming SMTP messages to the NLB cluster FQDN. Usually this works fine. However, sometimes incoming messages get stuck in the relay queues for hours. There is no visible pattern: the stuck messages can be big (up to the upper limit of 35 megabytes) or small (several hundred bytes), and they can be spam or valid messages. Eventually those messages get sent automatically, and I can always manually force Sendmail to send them immediately.
The errors Sendmail reports usually refer to timeouts. However, since the SP1 installation I have several times caught the status "Greylisting in action". Antispam agents on the Mailbox servers are enabled, but where on Earth could greylisting in Exchange come from? AFAIK it has never been part of Exchange's native antispam engine.
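To see whether the deferrals cluster around one reason or one target address, I could tally the deferred deliveries in the relays' mail logs. Below is a minimal Python sketch, assuming the log lines carry the usual Sendmail `relay=...` and `stat=Deferred: ...` fields. The function name `tally_deferrals` and the sample lines (host names, addresses, queue IDs) are invented for illustration and are not taken from my logs.

```python
import re
from collections import Counter

# Pulls the relay= and stat= fields out of a Sendmail delivery log line.
# Assumption: stat= is the last field on the line.
_FIELDS = re.compile(r"relay=(?P<relay>[^,]+),.*?stat=(?P<stat>.+)$")


def tally_deferrals(lines):
    """Count (relay, status) pairs for deliveries logged as Deferred."""
    counts = Counter()
    for line in lines:
        match = _FIELDS.search(line.rstrip("\n"))
        if match and match.group("stat").startswith("Deferred"):
            key = (match.group("relay").strip(), match.group("stat").strip())
            counts[key] += 1
    return counts


# Invented sample lines, shaped like Sendmail delivery entries.
SAMPLE = [
    "Mar 10 10:00:01 relay1 sendmail[2101]: s2AA01: to=<a@example.com>, "
    "delay=00:05:00, mailer=esmtp, relay=nlb.example.com. [192.0.2.10], "
    "dsn=4.0.0, stat=Deferred: Connection timed out with nlb.example.com.",
    "Mar 10 10:07:12 relay1 sendmail[2140]: s2AA02: to=<b@example.com>, "
    "delay=00:05:00, mailer=esmtp, relay=nlb.example.com. [192.0.2.10], "
    "dsn=4.0.0, stat=Deferred: Connection timed out with nlb.example.com.",
    "Mar 10 10:09:30 relay1 sendmail[2155]: s2AA03: to=<c@example.com>, "
    "delay=00:00:02, mailer=esmtp, relay=nlb.example.com. [192.0.2.10], "
    "dsn=4.0.0, stat=Deferred: Greylisting in action",
    "Mar 10 10:10:00 relay1 sendmail[2160]: s2AA04: to=<d@example.com>, "
    "delay=00:00:01, mailer=esmtp, relay=nlb.example.com. [192.0.2.10], "
    "dsn=2.0.0, stat=Sent (OK)",
]

for (relay, stat), n in tally_deferrals(SAMPLE).most_common():
    print(n, relay, stat)
```

On a real relay the input would be the lines of the mail log rather than `SAMPLE`. If the "Greylisting" deferrals turn out to carry a different `relay=` address than the timeouts, that would suggest the answer is coming from something other than the Exchange servers.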
The mail relay servers are specified as internal mail servers in the Exchange organization configuration. The maximum number of simultaneous connections on all of the Exchange servers is set to 5000. There are no error messages related to the Transport service in the Application logs on any of the servers (to be more precise, some such errors appeared after the SP1 installation, but they do not seem to affect the situation). The Exchange servers have plenty of free memory and the CPU load is minimal, so this does not look like a performance-related issue.
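One more check I can run from the relays is to time the SMTP greeting of each CAS node directly and of the NLB address, to see whether a single node or the cluster itself is slow to answer. This is a rough Python sketch using only the standard `socket` module. The function names `probe_smtp_banner` and `report` are my own, and any host names passed to them are placeholders.

```python
import socket
import time


def probe_smtp_banner(host, port=25, timeout=30.0):
    """Connect to host:port and return (seconds_elapsed, first_banner_line).

    Raises OSError (which includes timeouts) if the connection cannot be
    made or no greeting line arrives within `timeout` seconds.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as reader:
            banner = reader.readline()  # first line of the 220 greeting
    elapsed = time.monotonic() - start
    return elapsed, banner.decode("ascii", errors="replace").strip()


def report(hosts, port=25, timeout=30.0):
    """Print the greeting time for each host, or the error if it fails."""
    for host in hosts:
        try:
            elapsed, banner = probe_smtp_banner(host, port, timeout)
            print(f"{host}: {elapsed:.2f}s {banner}")
        except OSError as exc:
            print(f"{host}: failed ({exc})")
```

Calling something like `report(["cas1.example.com", "cas2.example.com", "nlb.example.com"])` every few minutes from both relays should show whether the timeouts follow one node, the cluster address, or neither. The greeting text would also reveal whether some other product is answering on port 25.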
In general, it doesn't bother me much, but nobody can guarantee that one day it won't become a major problem. What else can I investigate to find the source of the issue?