
Monitoring Postfix for Delivery Errors and TLS Failures

Running a production mail server means more than getting it installed and configured correctly. It means knowing when something goes wrong, ideally before your clients do. This article covers a practical monitoring approach for Postfix that alerts you to delivery failures, TLS handshake errors, and other problems that would otherwise sit silently in the logs.

This is part of the Debian 13 mail server series and pairs naturally with the Postfix TLS hardening guide.

Why monitoring matters

During the TLS hardening work covered in the hardening article, a NO_RENEGOTIATION option was temporarily added to tls_ssl_options. The immediate effect was silent: Postfix accepted outbound connections, attempted the TLS handshake, and failed. Emails to Microsoft 365 hosted domains started piling up in the deferred queue. Without active monitoring, this went unnoticed for several hours.

The log entries looked like this:

postfix/smtp: Cannot start TLS: handshake failure
postfix/smtp: to=<info@example-client.be>, relay=example-client-be.mail.protection.outlook.com:25, status=deferred (Cannot start TLS: handshake failure)

A monitoring script running every 30 minutes would have caught this within half an hour and sent an alert. Instead, multiple client emails were delayed by several hours before the issue was identified and resolved.
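The status=deferred entry in the excerpt above packs the recipient, the remote relay and the reason into a single line. When many messages are affected, pulling those fields out makes it easier to see that every failure points at the same provider. A minimal sketch, assuming GNU sed (as shipped with Debian) and the to=, relay= and status= fields shown above; parse_delivery_line is a hypothetical name, not part of Postfix.

```shell
#!/bin/bash
# Hypothetical helper: extract recipient, relay, status and reason from
# Postfix smtp delivery lines on stdin, one tab-separated record per line.
# Assumes GNU sed, which accepts \t in the replacement text.
# Requiring whitespace before "to=<" avoids matching an orig_to=<...> field.
# Lines without to=, relay= and status= fields produce no output.
parse_delivery_line() {
    sed -nE 's/.*[[:space:]]to=<([^>]*)>.*relay=([^,]*),.*status=([a-z]+) \((.*)\)$/\1\t\2\t\3\t\4/p'
}

# Example with the deferred entry from the excerpt above.
printf '%s\n' 'postfix/smtp: to=<info@example-client.be>, relay=example-client-be.mail.protection.outlook.com:25, status=deferred (Cannot start TLS: handshake failure)' \
    | parse_delivery_line
```

Piping a longer time window through it and counting the relay column, for example with cut -f2 | sort | uniq -c, shows at a glance whether failures cluster on one provider, such as the mail.protection.outlook.com relays in this incident.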

What to monitor

The three most important Postfix log patterns for operational monitoring are:

status=deferred indicates mail that Postfix could not deliver and has placed in the retry queue. This includes TLS failures, greylisting, temporary remote server errors, and connection timeouts. Some deferrals are normal and self-resolving, such as greylisting. Others indicate a real problem that requires intervention.

Cannot start TLS indicates a TLS handshake failure on an outbound connection. This is almost always a configuration problem on your end or a compatibility issue with the remote server. Left unmonitored, affected emails sit in the deferred queue and are retried at increasing intervals, potentially delaying delivery for hours.

status=bounced indicates mail that Postfix could not deliver and has returned to the sender, either because the receiving side rejected it permanently or because it remained undeliverable for longer than the maximum queue lifetime. Some bounces are expected and benign, such as spam test patterns rejected by Rspamd or system mail to non-existent local addresses. Others indicate real delivery failures that need attention.
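These patterns overlap. A TLS failure logs a Cannot start TLS line and also a status=deferred line whose reason repeats that text, as in the excerpt earlier, so the same failure matches both the TLS and the deferral pattern. If you want per-category counts, one option is to give each line exactly one label, checking the TLS pattern first. A minimal bash sketch; classify_postfix_line is a hypothetical name, the category labels are arbitrary, and the sample lines are invented for illustration.

```shell
#!/bin/bash
# Hypothetical helper: prefix each matching Postfix log line on stdin with a
# single category label (tls, deferred or bounced); other lines are dropped.
classify_postfix_line() {
    local line
    while IFS= read -r line; do
        case "$line" in
            # Checked first, so a deferral caused by a TLS failure is labeled
            # tls rather than deferred.
            *"Cannot start TLS"*) printf 'tls\t%s\n' "$line" ;;
            *"status=deferred"*)  printf 'deferred\t%s\n' "$line" ;;
            *"status=bounced"*)   printf 'bounced\t%s\n' "$line" ;;
        esac
    done
}

# Count lines per category in some invented sample input.
classify_postfix_line <<'EOF' | cut -f1 | sort | uniq -c
postfix/smtp: Cannot start TLS: handshake failure
postfix/smtp: to=<info@example-client.be>, status=deferred (Cannot start TLS: handshake failure)
postfix/smtp: to=<user@example.net>, status=deferred (450 4.2.0 Recipient address rejected: Greylisted)
postfix/qmgr: removed
EOF
```

With real logs, the input would come from the same journalctl command the alert script uses.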

Filtering out known noise

Not all deferrals and bounces require intervention. A naive alert on every occurrence would generate too much noise to be useful. The following patterns are benign and should be excluded from alerts:

Greylisting is a common spam prevention technique where the receiving server temporarily rejects mail with a 450 code on the first attempt. Postfix retries automatically and delivery normally succeeds on a later attempt. The deferral reason typically contains Recipient address rejected: Greylisted, although the exact wording depends on the receiving server.

Gtube pattern bounces come from GTUBE (Generic Test for Unsolicited Bulk Email) test messages, which spam filter testing tools send to check that filtering works. Rspamd correctly rejects these, so the bounce is expected.

System mail to non-existent addresses, such as root@yourdomain.eu, bounces because no mailbox exists. Adding a proper entry to /etc/aliases (and running newaliases) resolves this, but until then it generates daily noise.
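Keeping the exclusions in one list makes it easier to add patterns as you learn your server's normal noise. A minimal bash sketch, assuming the same three benign patterns that the alert script in the next section excludes (the root@yourdomain.eu entry is a placeholder to adapt); BENIGN_PATTERNS, benign_regex and filter_benign are hypothetical names.

```shell
#!/bin/bash
# Benign patterns to exclude from alerts, as extended regular expressions.
# The entries mirror this article's example; replace the placeholder address.
# Keep at least one entry: an empty pattern would match, and drop, every line.
BENIGN_PATTERNS=(
    'Greylisted'
    'Gtube'
    'root@yourdomain\.eu'
)

# Join the array into a single alternation such as "Greylisted|Gtube|...".
benign_regex() {
    local IFS='|'
    printf '%s' "${BENIGN_PATTERNS[*]}"
}

# Drop log lines on stdin that match any benign pattern.
filter_benign() {
    grep -vE "$(benign_regex)"
}
```

In the alert script, filter_benign could take the place of the grep -v stage, so new exclusions only ever need to be added to the array.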

The monitoring script

Create the script at /usr/local/sbin/postfix-alert.sh:

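#!/bin/bash
# Alert when the last hour of Postfix logs contains more than THRESHOLD
# delivery problems, excluding known benign patterns.
THRESHOLD=3
RECIPIENT="alerts@yourdomain.com"

# On Debian, postfix.service is a wrapper and the daemons usually log under
# postfix@-.service, so match every postfix* unit. Query the journal once.
MATCHES=$(journalctl -u 'postfix*' --since "1 hour ago" \
    | grep -E "status=deferred|Cannot start TLS|status=bounced" \
    | grep -vE "Greylisted|Gtube|root@yourdomain.eu")

# Count non-empty lines so that an empty result counts as 0, not 1.
ERRORS=$(printf '%s\n' "$MATCHES" | grep -c .)

if [ "$ERRORS" -gt "$THRESHOLD" ]; then
    printf '%s\n' "$MATCHES" \
        | mail -s "Postfix delivery errors on $(hostname)" "$RECIPIENT"
fi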

Replace root@yourdomain.eu with any benign patterns specific to your setup, and alerts@yourdomain.com with your alert email address.

Make the script executable:

chmod +x /usr/local/sbin/postfix-alert.sh

Verify it runs without errors:

/usr/local/sbin/postfix-alert.sh
echo $?

Exit code 0 confirms that the script ran without errors; it does not confirm that journalctl returned any Postfix log lines. If the mail command is not available, install it:

apt install bsd-mailx -y
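Because a zero exit code says nothing about whether the log query matched anything, it is worth checking separately that journalctl is returning Postfix entries at all. On Debian, postfix.service is typically a wrapper unit and the daemons log under postfix@-.service, so journalctl -u postfix on its own may come back empty. A minimal sketch; check_log_source is a hypothetical name, and it assumes the journal lines carry syslog identifiers such as postfix/smtp.

```shell
#!/bin/bash
# Hypothetical helper: count lines from Postfix daemons (syslog identifiers
# such as postfix/smtp or postfix/qmgr) on stdin and warn if there are none.
check_log_source() {
    local count
    # grep -c prints 0 and exits 1 when nothing matches; || true keeps the 0.
    count=$(grep -c 'postfix/' || true)
    if [ "$count" -eq 0 ]; then
        echo "warning: no postfix/ log lines found; check the journalctl unit name or time window" >&2
        return 1
    fi
    echo "found $count postfix log lines"
}
```

Run it as journalctl -u 'postfix*' --since "1 hour ago" | check_log_source. On a very quiet server an empty hour is possible, so widen the --since window before concluding that the unit name is wrong.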

Setting up the cron job

Add the script to root’s crontab to run every 30 minutes:

crontab -e

Add this line:

*/30 * * * * /usr/local/sbin/postfix-alert.sh

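A 30-minute interval strikes a reasonable balance between catching problems quickly and avoiding alert fatigue. For a busier mail server with stricter SLAs you could reduce this to 15 minutes. Because each run looks back over the last hour, consecutive runs overlap: at a 30-minute interval the same error can appear in two successive alerts, and at 15 minutes in up to four.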

Tuning the threshold

The script currently alerts when more than 3 matching errors occur within the last hour. This threshold should be tuned based on your mail volume. On a low-traffic server handling a few dozen domains, more than three errors in an hour is unusual and worth investigating. On a higher-volume server you may want to raise this to 10 or even 20 to avoid false positives from temporary remote server issues.
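A single threshold treats every match alike. Since a TLS handshake failure is almost always a configuration or compatibility problem rather than transient noise, one possible variation is to alert on any TLS failure while keeping a volume threshold for other deferrals and bounces. A sketch under that assumption; should_alert, TLS_THRESHOLD and OTHER_THRESHOLD are hypothetical names, the values are examples, and the input is expected to have already passed through the benign-pattern filter (the grep -v stage of the script).

```shell
#!/bin/bash
# Hypothetical per-category thresholds (example values, tune for your volume).
TLS_THRESHOLD=0     # alert on any TLS handshake failure
OTHER_THRESHOLD=3   # alert on more than 3 other deferrals or bounces

# Read already-filtered Postfix log lines on stdin.
# Returns 0 (alert) if either category exceeds its threshold, 1 otherwise.
should_alert() {
    local lines tls other
    lines=$(cat)
    tls=$(printf '%s\n' "$lines" | grep -c 'Cannot start TLS' || true)
    other=$(printf '%s\n' "$lines" | grep -v 'Cannot start TLS' \
        | grep -cE 'status=(deferred|bounced)' || true)
    [ "$tls" -gt "$TLS_THRESHOLD" ] || [ "$other" -gt "$OTHER_THRESHOLD" ]
}
```

In the alert script, the filtered log lines would be piped into should_alert and the mail command run only when it returns success.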

The threshold also interacts with the filtering. If you have not yet added all benign patterns to the exclusion list, lower the threshold temporarily so that every match is reported; this shows you your normal error baseline and which patterns still need excluding.
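To establish that baseline, it helps to see how many matching lines a normal day produces per hour. A minimal sketch, assuming journalctl's default short output format, where each line starts with the month, day and time; hourly_error_counts is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: count delivery-problem lines per hour.
# Expects journalctl short-format lines ("Jan 01 10:05:00 host ident: msg")
# in chronological order on stdin, so uniq -c can count runs of the same
# hour without sorting (a plain sort would order months alphabetically).
hourly_error_counts() {
    grep -E 'status=deferred|Cannot start TLS|status=bounced' \
        | awk '{ printf "%s %s %s:00\n", $1, $2, substr($3, 1, 2) }' \
        | uniq -c
}
```

For example, journalctl -u 'postfix*' --since "7 days ago" | hourly_error_counts. Inserting a grep -v stage ahead of it, with and without a candidate exclusion, shows how much noise that pattern removes.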

Checking the queue manually

When you receive an alert, check the current queue state:

postqueue -p

An empty queue means the affected messages have since left the queue, either delivered on a later retry or bounced back to the sender. A non-empty queue means mail is still waiting. To force an immediate retry of all queued mail:

postqueue -f

To investigate a specific queued message by its queue ID:

postcat -q QUEUEID
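When the queue holds more than a handful of messages, grouping them by deferral reason is quicker than reading the full listing. A minimal sketch that parses the text output of postqueue -p, assuming its usual layout: each message's deferral reason in parentheses on its own indented line, and a closing line such as -- 4 Kbytes in 2 Requests. Long reasons that postqueue wraps across several lines are not matched by this simple version; on Postfix 3.1 and later, postqueue -j produces JSON that is more robust to parse. summarize_queue is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: summarize "postqueue -p" output read on stdin.
# Prints the number of queued messages, then each deferral reason with a
# count, most frequent first. Reasons wrapped over several lines are skipped.
summarize_queue() {
    local output
    output=$(cat)
    if printf '%s\n' "$output" | grep -q '^Mail queue is empty'; then
        echo "queue empty"
        return 0
    fi
    # Closing line, e.g. "-- 4 Kbytes in 2 Requests." (singular for one message).
    printf '%s\n' "$output" | sed -nE 's/^-- .* in ([0-9]+) Requests?\.$/\1 queued/p'
    # Deferral reasons: indented lines wrapped in parentheses.
    printf '%s\n' "$output" \
        | sed -nE 's/^[[:space:]]+\((.*)\)$/\1/p' \
        | sort | uniq -c | sort -rn
}
```

Run it as postqueue -p | summarize_queue. A single reason dominating the output, such as a TLS handshake failure, usually points straight at the configuration change to revisit.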

Alternative: pflogsumm for daily summaries

The monitoring script above is designed for real-time alerting on operational problems. For a broader daily overview of mail server activity including delivery statistics, top senders and recipients, and error summaries, pflogsumm is the standard tool:

apt install pflogsumm -y

Generate a report for the previous day:

journalctl -u 'postfix*' --since yesterday --until today | pflogsumm -d yesterday

Add a daily cron job to email the report each morning:

0 7 * * * journalctl -u 'postfix*' --since yesterday --until today | pflogsumm -d yesterday | mail -s "Postfix daily report $(hostname)" alerts@yourdomain.com

The two approaches complement each other well. The alert script catches problems within 30 minutes. The pflogsumm report gives you a broader picture of mail server health over time.

Also check for Exim4

One issue discovered during production monitoring is worth documenting explicitly. Debian installs Exim4 as its default MTA, and if Exim4 is still present alongside Postfix, it may silently intercept local system mail and handle delivery independently of Postfix, making log-based monitoring unreliable for local mail. Verify that Exim4 is completely removed:

systemctl status exim4
which exim4
dpkg -l | grep exim

systemctl status exim4 should report Unit exim4.service could not be found, and the other two commands should print nothing. If Exim4 is present, refer to step 4.1 of the mail server guide for removal instructions.
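dpkg -l | grep exim can also list packages that were removed but not purged; these show the status rc and leave configuration files behind, which apt purge removes. A minimal sketch that reads dpkg -l output, reports exim packages by state and returns a non-zero status if any are still installed; exim_package_report is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: read "dpkg -l" output on stdin and report exim packages.
# The first column holds the desired action and current status:
# "ii" = installed, "rc" = removed but configuration files remain.
# Exits 1 if any exim package is still installed, 0 otherwise.
exim_package_report() {
    awk '
        $2 ~ /^exim/ {
            if ($1 == "ii") { print "installed: " $2; found = 1 }
            else if ($1 == "rc") print "config files only: " $2
        }
        END { exit found }
    '
}
```

Run it as dpkg -l | exim_package_report. Packages reported as config files only are harmless to mail flow but can be cleaned up with apt purge.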

Summary

A 30-minute monitoring cycle with filtered alerting gives you rapid detection of real delivery problems without noise from expected bounces and temporary deferrals. Combined with pflogsumm for daily summaries and the TLS hardening covered in the companion article, this completes a production-grade monitoring setup for a self-hosted Postfix mail server.

The script is intentionally simple: a few lines of shell rather than a complex monitoring agent. Apart from a mail command such as the one provided by bsd-mailx, it uses only tools already present on a standard Debian mail server, and it is easy to audit and modify as your mail environment evolves.
