Advanced Mail Filtering: ClamAV, Neural Networks, Machine Learning
Part 5 of the Building a Modern Mail Server on Debian 13 series
Introduction
Part 4 established a production-ready mail server with Hall of Fame security status on internet.nl and excellent spam filtering through Rspamd. You now have:
- ✅ Professional spam filtering with scoring and headers
- ✅ Complete email authentication (DKIM, SPF, DMARC)
- ✅ Greylisting for spam bot protection
- ✅ DANE/TLSA certificate pinning
- ✅ Automatic spam-to-Junk delivery
This is already a production-ready mail server suitable for most organizations. Part 5 is optional and adds advanced filtering layers that improve detection rates and reduce false positives through machine learning and collaborative intelligence.
What Part 5 Adds
Virus and Malware Protection:
- ClamAV – Scans attachments for viruses, malware, and phishing
Phishing Protection:
- OpenPhish & PhishTank – Real-time phishing URL detection
Machine Learning and Adaptation:
- Neural Networks – Pattern recognition for sophisticated spam
- Fuzzy Hashing – Detects near-duplicate spam campaigns
- Bayes Classifier – Statistical learning from your mail patterns
- Self-Learning – Automatic training from user actions
Production Considerations
Should you implement Part 5?
YES, implement if you:
- Handle sensitive data requiring virus scanning
- Experience sophisticated spam that bypasses basic filters
- Have users who make Junk/INBOX classification decisions
- Want adaptive filtering that learns from your mail patterns
- Need the highest possible spam detection rates
SKIP Part 5 if you:
- Run a small personal mail server (< 10 users)
- Have limited system resources (< 4GB RAM)
- Prefer simpler systems with fewer moving parts
- Are satisfied with Part 4’s detection rates
Resource requirements for Part 5:
- Additional 512MB-1GB RAM (primarily for ClamAV)
- 2-4GB disk space for virus signatures
- Modest CPU overhead for scanning and learning
Remember: Part 4 alone provides production-ready mail security. Part 5 enhances detection but isn’t required.
What You’ll Build
After completing this part, your mail server will have:
Multi-Layer Virus Protection
- ClamAV virus scanning – All attachments checked for malware
- Real-time signature updates – Fresh virus definitions daily
- Optional unofficial signatures – 18M+ additional virus definitions
- Safe failure mode – Mail delivered even if the scanner is down
Phishing Protection
- OpenPhish feed – Real-time phishing URL database
- PhishTank feed – Community-sourced phishing URLs
- Automatic updates – Feeds refresh hourly
- Credential theft prevention – Block fake login pages
Intelligent Learning Systems
- Neural network – Recognizes sophisticated spam patterns
- Fuzzy hashing – Detects spam with minor variations
- Bayes classifier – Learns from your specific mail patterns
- Auto-learning – Trains automatically from high-confidence decisions
User-Driven Training
- Self-learning pipeline – Learns from folder moves
- IMAPSieve integration – Automatic training scripts
- Ham and spam training – Both directions taught
- Continuous improvement – Gets smarter over time
Performance and Monitoring
- Resource-efficient – Optimized for production servers
- Detailed metrics – Per-module effectiveness tracking
- Learning progress – Monitor improvement over time
- Fallback strategies – Resilient to scanner failures
Prerequisites Check
From Part 4: Core Rspamd Setup
Verify you’ve completed Part 4 with these essentials:
# Check Rspamd is running
systemctl status rspamd
# Verify Valkey connection (as _rspamd user)
sudo -u _rspamd valkey-cli -s /run/valkey/valkey.sock ping
# Should respond: PONG
# Testing as _rspamd ensures Rspamd has proper permissions to access Valkey
# Check Rspamd metrics
rspamc stat
# Should show: Messages scanned, Actions taken
# Verify DKIM signing is configured (from Part 4)
# Note: This check only works if you've already sent mail from your server
journalctl -u rspamd --since "24 hours ago" | grep DKIM_SIGNED | tail -1
# Should show: DKIM_SIGNED(0.0){your-domain.com}
# If empty: Either no mail sent yet, or DKIM not configured
# If empty, send a test email via authenticated submission (port 587)
# This ensures mail goes through Rspamd for DKIM signing
# Replace the recipient address, the info@${DOMAIN} account, and the password with your own values
source /root/mail-server-vars.sh
echo "Test email to verify DKIM signing" | \
swaks --to your-personal-email@gmail.com \
--from info@${DOMAIN} \
--server localhost:587 \
--auth-user info@${DOMAIN} \
--auth-password 'your-password' \
--tls
# Wait for processing, then check logs
sleep 5
journalctl -u rspamd --since "1 minute ago" | grep DKIM_SIGNED
# Should now show: DKIM_SIGNED(0.0){your-domain.com}
# Alternative: Verify DKIM keys exist without sending mail
ls -la /var/lib/rspamd/dkim/
# Should show: default.key (owned by _rspamd)
# Or check the DKIM signing configuration directly (read-only, generates nothing)
rspamadm configdump dkim_signing
# Should show the selector and key path configured in Part 4
Expected from Part 4:
- Rspamd installed and processing mail
- Valkey (Redis) running and connected
- DKIM keys created in /var/lib/rspamd/dkim/
- DKIM signing outgoing mail
- ARC preserving authentication through mailing lists/forwarders
- SPF, DMARC records configured
- Spam headers being added
- Greylisting active
- Sieve spam filter moving mail to Junk
- Hall of Fame status on internet.nl
System Resources Check
Part 5 adds resource requirements, especially for ClamAV:
# Check available RAM
free -h
# Recommended minimum: 4GB total (2GB+ free)
# Check available disk space
df -h /
# Recommended minimum: 10GB free (for ClamAV signatures + learning data)
# Check CPU cores
nproc
# Recommended minimum: 2 cores (4+ for smooth performance)
Resource guidelines:
- < 4GB RAM: Skip ClamAV or use on-demand scanning only
- 4-8GB RAM: Full setup works but monitor closely
- 8GB+ RAM: Comfortable for all Part 5 features
- SSD storage: Strongly recommended for learning databases
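The guidelines above can be turned into a quick check. The following is a minimal sketch, not part of the original setup: the function names and the tier labels it prints are made up for illustration, and it assumes a Linux system where /proc/meminfo is available.

```shell
#!/bin/sh
# Map total RAM (in MiB) to the tiers from the resource guidelines.
# The labels printed here are illustrative, not Rspamd or ClamAV terms.
ram_tier() {
    mib=$1
    if [ "$mib" -lt 4096 ]; then
        echo "skip-clamav"          # < 4GB: skip ClamAV or scan on demand
    elif [ "$mib" -lt 8192 ]; then
        echo "full-setup-monitor"   # 4-8GB: full setup, monitor closely
    else
        echo "comfortable"          # 8GB+: all Part 5 features
    fi
}

# Read MemTotal (reported in kB) from /proc/meminfo and convert to MiB.
total_ram_mib() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}
```

Run it with: ram_tier "$(total_ram_mib)". Keep in mind that a VPS sold as "4GB" usually reports slightly less than 4096 MiB, so treat a result near a boundary as a prompt to look at free -h rather than as a verdict.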
Verify Mail Flow
Ensure basic mail flow is working correctly:
# Source configuration
source /root/mail-server-vars.sh
# Send test email
echo "Part 5 Prerequisites Test - $(date)" | mail -s "Pre-P5 Test" info@${DOMAIN}
# Watch complete mail pipeline
journalctl -u postfix -u dovecot -u rspamd -f
# Check email was delivered
doveadm search -u info@${DOMAIN} subject "Pre-P5 Test"
# Should show message ID
Expected behavior:
- Postfix receives mail
- Rspamd processes and scores mail
- Dovecot delivers to INBOX or Junk
- Sieve filter executes correctly
If all checks pass, you’re ready for Part 5!
Architecture Overview
Here’s how the advanced filtering pipeline integrates with your existing setup from Part 4:
┌──────────────────────────────────────────────────────┐
│ Incoming/Outgoing Mail │
└──────────────────────────┬───────────────────────────┘
│
┌──────────────────────────▼───────────────────────────┐
│ Postfix │
│ SMTP Server (Port 25) │
└──────────────────────────┬───────────────────────────┘
│
│ Milter Protocol
│
┌──────────────────────────▼───────────────────────────┐
│ Rspamd │
│ Advanced Multi-Layer Analysis │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Traditional │ │ Virus/Malware│ │ Learning │ │
│ │ Filters │ │ Protection │ │ Systems │ │
│ ├──────────────┤ ├──────────────┤ ├──────────────┤ │
│ │ • SPF/DKIM │ │ • ClamAV │ │ • Neural Net │ │
│ │ • Greylisting│ │ • Phishing │ │ • Bayes │ │
│ │ • Headers │ │ • RBL Checks │ │ • Fuzzy Hash │ │
│ │ • Reputation │ │ • Signatures │ │ • Auto-learn │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ Each module votes on message classification │
│ Final score = weighted combination of all signals │
│ │
└──────────────────────────┬───────────────────────────┘
│
│ Headers: X-Spam, X-Spam-Score
│ Actions: No action, Greylist, Add header, Reject
│
┌──────────────────────────▼───────────────────────────┐
│ Dovecot LMTP │
│ Mail Delivery │
└──────────────────────────┬───────────────────────────┘
│
┌──────────────────────────▼───────────────────────────┐
│ Sieve Filter │
│ if X-Spam: Yes → Junk │
│ else → INBOX │
└──────────────────────────┬───────────────────────────┘
│
┌──────────────────────────▼───────────────────────────┐
│ Final Delivery │
│ INBOX or Junk Folder │
└──────────────────────────┬───────────────────────────┘
│
│ User Actions
│ (Move Junk↔INBOX)
│
┌──────────────────────────▼───────────────────────────┐
│ IMAPSieve Scripts │
│ Automatic Learning Pipeline │
│ │
│ Move to Junk → Train as Spam │
│ Move to INBOX → Train as Ham │
│ ┌────────────┐ │
│ │ Learning │ │
│ │ Feedback │ │
│ └─────┬──────┘ │
└──────────────────────────┼───────────────────────────┘
│
│ Trained Models
│
┌──────────────────────────▼───────────────────────────┐
│ Valkey (Redis) │
│ Persistent Learning Data │
│ │
│ • Bayes Token Statistics │
│ • Neural Network Weights │
│ • Fuzzy Hash Database │
│ • Statistical Counters │
└──────────────────────────────────────────────────────┘
Data Flow Explained
Mail Arrival → Analysis:
- Postfix receives mail → Forwards to Rspamd via milter
- Rspamd performs parallel analysis:
- Traditional checks: SPF, DKIM, DMARC, headers, reputation, RBL
- Virus scanning: ClamAV (viruses, malware, phishing)
- Learning systems: Neural network, Bayes, fuzzy hashing
- Each module “votes” with a score (positive = spam, negative = ham)
- Final score = weighted combination of all signals
- Action taken based on thresholds (reject, greylist, add header, no action)
Mail Delivery → Learning:
- Sieve filter reads X-Spam header → Routes to INBOX or Junk
- User reviews mail and moves messages if needed
- IMAPSieve detects folder changes:
- Move to Junk → Train Rspamd as spam
- Move to INBOX → Train Rspamd as ham (not spam)
- Training updates stored in Valkey:
- Bayes: Token frequency statistics
- Neural network: Weight adjustments
- Fuzzy: Hash signatures
Continuous Improvement:
- More user corrections → Better learning
- More spam seen → Better pattern recognition
- System adapts to YOUR specific mail patterns
- Detection accuracy improves over time
Module Interaction
Voting system example:
Message: "Buy cheap pills now!"
├─ SPF: -0.2 (pass)
├─ DKIM: -0.2 (valid signature)
├─ Bayes: +5.0 (learned spam pattern)
├─ Neural: +4.5 (spam-like structure)
├─ Fuzzy: +3.0 (similar to previous spam)
├─ RBL: +2.8 (sender IP on blacklist)
└─ ClamAV: 0.0 (no virus)
Final Score: +14.9 / 15.0 → ADD HEADER (0.1 short of the 15.0 reject threshold)
Each module contributes evidence. The combined score determines the action.
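To make the arithmetic concrete, here is a small sketch that sums the example scores and compares the total against action thresholds. The function names are hypothetical, and the threshold defaults (greylist 4, add header 6, reject 15) are assumptions based on Rspamd's stock values; pass your own if Part 4 changed them. A message is rejected only when its score reaches the reject threshold.

```shell
#!/bin/sh
# Sum "SYMBOL score" pairs from stdin and print the total with one decimal.
sum_scores() {
    awk '{ total += $2 } END { printf "%.1f\n", total }'
}

# Pick an action for a score.
# $1 = score, $2..$4 = greylist / add header / reject thresholds.
# The defaults are assumptions; use the values from your own actions.conf.
pick_action() {
    score=$1
    greylist=${2:-4}
    add_header=${3:-6}
    reject=${4:-15}
    awk -v s="$score" -v g="$greylist" -v a="$add_header" -v r="$reject" 'BEGIN {
        if (s >= r)      print "reject"
        else if (s >= a) print "add header"
        else if (s >= g) print "greylist"
        else             print "no action"
    }'
}

# The symbols and scores from the example above.
total=$(printf '%s\n' \
    "SPF -0.2" "DKIM -0.2" "BAYES 5.0" "NEURAL 4.5" \
    "FUZZY 3.0" "RBL 2.8" "CLAMAV 0.0" | sum_scores)
echo "$total"          # prints 14.9
pick_action "$total"   # prints "add header" with the assumed thresholds
```

One more spam signal worth 0.1 or more would push this message over the reject threshold, which is why adding independent modules shifts borderline mail into a firmer category.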
ClamAV Integration
ClamAV is an open-source antivirus engine that scans email attachments for viruses, malware, trojans, and phishing attempts.
Why ClamAV for Mail
What ClamAV catches:
- Viruses and malware – Executable files with malicious code
- Office macro viruses – Infected Word/Excel documents
- Phishing emails – Credential-stealing attempts
- Suspicious attachments – Password-protected archives, scripts
- Zero-day threats – Heuristic detection for unknown malware
Production considerations:
- RAM-intensive: Requires 512MB-1GB RAM for the signature database
- Disk space: Virus signatures consume 2-4GB
- CPU overhead: Scanning adds 50-200ms per message with attachments
- Scan timeouts: Configure reasonable limits to prevent blocking mail
Scan strategy:
- All attachments scanned before delivery
- Non-attachment mail passes immediately (minimal overhead)
- Safe failure mode: Mail delivered if the scanner is unavailable
- Daily signature updates for the latest threat detection
Installation
# Install ClamAV and update daemon
apt install -y clamav clamav-daemon
# ClamAV installs without running - fresh signature database needed
# Check service is disabled (expected)
systemctl status clamav-daemon
# Should show: disabled or inactive
# Start signature update (will take several minutes)
systemctl stop clamav-freshclam
freshclam
# Initial download takes 5-10 minutes
# Database is 200-400MB compressed, 2-4GB uncompressed
Monitor signature update:
# Watch update progress
journalctl -u clamav-freshclam -f
# Expected output:
# Reading CVD header (main.cvd): OK
# main database available for download (version: 27)
# Downloading main.cvd [100%]
# Database updated (10,389,214 signatures)
Start services after initial update:
# Enable and start freshclam (automatic updates)
systemctl enable clamav-freshclam
systemctl start clamav-freshclam
# Enable and start scanner daemon
systemctl enable clamav-daemon
systemctl start clamav-daemon
# Verify both services running
systemctl status clamav-daemon clamav-freshclam
Configure ClamAV for Mail Scanning
ClamAV’s default configuration needs tuning for mail server use.
Configure Scanner Performance
Edit /etc/clamav/clamd.conf:
# Backup original
cp /etc/clamav/clamd.conf /etc/clamav/clamd.conf.orig
# Edit configuration
vi /etc/clamav/clamd.conf
Find and modify these settings:
# Maximum file size to scan (25MB sufficient for mail)
MaxFileSize 25M
# Maximum scan size (some files expand - 100MB safe)
MaxScanSize 100M
# Maximum recursion level (for compressed archives)
MaxRecursion 10
# Maximum files in archive
MaxFiles 1000
# Phishing detection (important for email!)
PhishingSignatures yes
PhishingScanURLs yes
# Heuristic precedence: a heuristic match ends the scan immediately
HeuristicScanPrecedence yes
# Alert on broken executables and encrypted content (suspicious in email)
AlertBrokenExecutables yes
AlertEncrypted yes
AlertEncryptedArchive yes
AlertEncryptedDoc yes
# Performance settings
# MaxThreads: Default is 12 (good for most systems)
# Reduce only if you have limited CPU cores (e.g., 2 cores = MaxThreads 2)
MaxThreads 12
Configuration explained:
- MaxFileSize 25M: Don’t scan enormous files (mail usually < 25MB)
- MaxScanSize 100M: Extraction limit (archives expand)
- MaxRecursion 10: How deep into nested archives to scan
- Phishing*: Essential for catching credential-stealing emails
- Heuristic*: Detects suspicious patterns in unknown files
- Alert*: Flag encrypted/broken files as suspicious
- MaxThreads 12: Parallel scanning threads (default, good for 4+ core systems)
  - Reduce to 2-4 only if you have 2 CPU cores or less
  - Keep default (12) for modern VPS/dedicated servers
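After editing clamd.conf by hand it is easy to leave an option commented out or misspelled. The sketch below reads a setting back from the file and compares it with the value you intended. The helper names clamd_get and clamd_expect are hypothetical; they only parse the file and do not talk to clamd.

```shell
#!/bin/sh
# Print the value of a clamd.conf option, ignoring commented-out lines.
# $1 = config file, $2 = option name
clamd_get() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Compare one option against the value you intended to set.
# Prints "ok: ..." or "MISMATCH: ..." and returns non-zero on mismatch.
# $1 = config file, $2 = option name, $3 = expected value
clamd_expect() {
    actual=$(clamd_get "$1" "$2")
    if [ "$actual" = "$3" ]; then
        echo "ok: $2 $3"
    else
        echo "MISMATCH: $2 is '${actual:-unset}', expected '$3'"
        return 1
    fi
}
```

For example, clamd_expect /etc/clamav/clamd.conf MaxFileSize 25M confirms the size limit from this section. The check only reads the file, so a running daemon still needs a restart before it picks up any change.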
Configure Automatic Updates
Edit /etc/clamav/freshclam.conf:
vi /etc/clamav/freshclam.conf
Find and verify these settings:
# Number of update checks per day (24 = one check per hour)
Checks 24
# Database mirror (use default)
DatabaseMirror database.clamav.net
Restart services to apply configuration:
systemctl restart clamav-daemon clamav-freshclam
# Verify ClamAV daemon is running and socket exists
systemctl status clamav-daemon
ls -la /run/clamav/clamd.sock /var/run/clamav/clamd.ctl
# Should show both sockets exist
# Verify no errors in logs
journalctl -u clamav-daemon -n 20
# Verify freshclam can notify clamd (no warnings)
journalctl -u clamav-freshclam -n 20 | grep -i warning
# Should be empty (no warnings about "Can't connect to clamd")
Integrate ClamAV with Rspamd
Configure Rspamd to use ClamAV for virus scanning.
Enable ClamAV Module
Create /etc/rspamd/local.d/antivirus.conf:
cat > /etc/rspamd/local.d/antivirus.conf << 'EOF'
# ClamAV antivirus configuration
clamav {
# Enable ClamAV scanner
enabled = true;
# Virus scanning
type = "clamav";
# ClamAV socket (Debian default)
servers = "/var/run/clamav/clamd.ctl";
# Symbol for virus detection
symbol = "CLAM_VIRUS";
# Actions based on detection
action = "reject";
message = 'Message rejected: Virus detected - ${VIRUS}';
# Scan execution
scan_mime_parts = true; # Scan all MIME parts
scan_text_mime = false; # Skip text-only parts (performance)
scan_image_mime = false; # Skip images without executables
# Timeout settings
timeout = 15.0; # 15 seconds max per scan
retransmits = 3; # Retry up to 3 times
# If ClamAV is down, deliver mail anyway (safe failure)
# Don't block legitimate mail because antivirus is offline
fail_action = "accept";
# Logging
log_clean = false; # Don't log clean messages (reduces noise)
# Patterns to detect in ClamAV response
patterns {
# Virus found pattern
virus = '^VIR';
# Phishing found pattern
phish = '^Heuristics\.Phishing';
}
# Whitelist specific file types (trusted content)
# Uncomment if needed:
# whitelist = "/etc/rspamd/antivirus_whitelist.map";
}
EOF
Configuration explained:
- servers: Unix socket to ClamAV daemon
- symbol = "CLAM_VIRUS": Rspamd symbol for detection
- action = "reject": What to do when virus found
- scan_mime_parts = true: Scan all attachments
- scan_text/image = false: Skip non-executable parts (performance)
- timeout = 15.0: Reasonable timeout (prevents mail delays)
- fail_action = "accept": If ClamAV down, deliver mail anyway
- log_clean = false: Only log virus detections (less journal noise)
Configure ClamAV Symbol Weight
Create /etc/rspamd/local.d/external_services_group.conf:
cat > /etc/rspamd/local.d/external_services_group.conf << 'EOF'
# External services symbol configuration
# ClamAV virus detection and phishing URL detection
symbols = {
"CLAM_VIRUS" {
weight = 0.0;
description = "Virus found by ClamAV";
# Virus detection ALWAYS rejects, regardless of score
# Weight 0.0 because we reject immediately on detection
# Not part of spam score - it's a hard block
}
"PHISHED_OPENPHISH" {
weight = 10.0;
description = "URL found in OpenPhish phishing database";
}
"PHISHED_PHISHTANK" {
weight = 10.0;
description = "URL found in PhishTank phishing database";
}
}
EOF
Symbol weight explanations:
- CLAM_VIRUS (0.0): Virus detection is binary – reject action handles blocking, not scoring
- PHISHED_* (10.0): Confirmed phishing URLs warrant immediate high score for rejection
Test Configuration and Restart
# Test Rspamd configuration
rspamadm configtest
# Should show: "syntax OK"
# Restart Rspamd to load ClamAV integration
systemctl restart rspamd
# Verify ClamAV module loaded
journalctl -u rspamd --since "5 minutes ago" | grep -i clam
# Should show ClamAV socket connection
Test ClamAV Scanning
Test virus detection using SWAKS from another server. Testing from localhost doesn’t reliably trigger ClamAV scanning.
From another server (not your mail server), run:
# Create EICAR test file
cat > /tmp/eicar.txt << 'EOF'
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
EOF
# Send test email with EICAR attachment
swaks --to info@yourdomain.com \
--from test@example.com \
--server mail.yourdomain.com \
--helo example.com \
--header "Subject: EICAR Virus Test" \
--body "Testing ClamAV virus detection" \
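--attach @/tmp/eicar.txt
# The @ prefix makes current swaks releases read the file; without it the path string itself is sent as the attachment
Expected result – Email rejected: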
<** 550 5.7.1 Message rejected: Virus detected - Eicar-Signature
*** Error: Message rejected: Virus detected
On your mail server, verify:
# Check for virus detection in logs
journalctl -u rspamd --since "5 minutes ago" | grep CLAM_VIRUS
# Expected output:
# CLAM_VIRUS(0.00){Eicar-Signature;}
# forced: reject "Virus detected: Eicar-Signature"
# Verify rejection
journalctl -u postfix --since "5 minutes ago" | grep -i "reject.*virus"
# Mail queue should be empty (virus rejected before queueing)
mailq
Test clean email delivery:
# Send normal email without attachment
swaks --to info@yourdomain.com \
--from test@example.com \
--server mail.yourdomain.com \
--helo example.com \
--body "Clean test email"
# Should deliver successfully
Troubleshooting:
# Verify ClamAV is running
systemctl status clamav-daemon
# Test ClamAV directly
echo 'X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*' | clamdscan -
# Should show: Eicar-Signature FOUND
# Update virus signatures if needed
freshclam
systemctl restart clamav-daemon
Monitor ClamAV Activity
# ClamAV daemon status
systemctl status clamav-daemon
# Recent virus detections
journalctl -u rspamd --since "7 days ago" | grep CLAM_VIRUS
# Shows lines like:
# CLAM_VIRUS(0.00){Eicar-Signature;}
# forced: reject "Virus detected: %s"; score=nan (set by clamav)
# Count virus detections in last 7 days
journalctl -u rspamd --since "7 days ago" | grep CLAM_VIRUS | wc -l
# ClamAV signature database version
clamdscan --version
# Check last database update status
journalctl -u clamav-freshclam | grep -E "Database updated|database is up-to-date" | tail -n 5
# Shows either "Database updated" (when signatures were downloaded)
# or "database is up-to-date" (when already current)
# List the signature database files and their sizes
ls -lh "$(awk '/^DatabaseDirectory/ {print $2}' /etc/clamav/clamd.conf)"
# Shows main.cvd (main signatures), daily.cvd (daily updates)
# Verify ClamAV security and performance settings
cat /etc/clamav/clamd.conf | grep -E "MaxFileSize|MaxScanSize|MaxRecursion|MaxFiles|PhishingSignatures|PhishingScanURLs|HeuristicScanPrecedence|AlertBrokenExecutables|AlertEncrypted|AlertEncryptedArchive|AlertEncryptedDoc|MaxThreads"
# Untuned reference values for comparison; after the tuning above, expect the values you set (e.g., MaxFileSize 25M, AlertEncrypted yes):
# MaxFileSize 100M - Maximum file size to scan
# MaxScanSize 400M - Maximum scan size (decompressed)
# MaxRecursion 17 - Archive nesting depth
# MaxFiles 10000 - Maximum files in archive
# PhishingSignatures yes - Enable phishing signatures
# PhishingScanURLs yes - Scan URLs for phishing
# HeuristicScanPrecedence no - Signature scan before heuristic
# AlertBrokenExecutables yes - Alert on corrupted executables
# AlertEncrypted no - Don't alert on encrypted files (would cause false positives)
# AlertEncryptedArchive no - Don't alert on encrypted archives
# AlertEncryptedDoc no - Don't alert on encrypted documents
# MaxThreads 10 - Number of scanning threads
Optional: ClamAV Unofficial Signatures
⚠️ WARNING: This can add up to 2GB of extra RAM usage. Use only on systems with 8GB RAM or more.
The unofficial signatures project provides significantly more virus definitions than the official ClamAV databases, detecting many more threats.
Resource requirements:
- An additional 2GB RAM
- Additional 1-2GB disk space
- Slightly longer scan times
When to use unofficial signatures:
- ✅ High-security environments
- ✅ Systems with ≥8GB RAM
- ✅ Need maximum threat detection
- ❌ Low-memory systems (< 8GB RAM)
- ❌ Basic personal mail servers
Installation:
# Clone the unofficial signatures repository
cd /opt
git clone https://github.com/extremeshok/clamav-unofficial-sigs
cd clamav-unofficial-sigs
# Install the script
cp clamav-unofficial-sigs.sh /usr/local/sbin/
chmod +x /usr/local/sbin/clamav-unofficial-sigs.sh
# Create configuration directory
mkdir -p /etc/clamav-unofficial-sigs
# Copy configuration files
cp config/os/os.debian.conf /etc/clamav-unofficial-sigs/os.conf
cp config/master.conf /etc/clamav-unofficial-sigs/
cp config/user.conf /etc/clamav-unofficial-sigs/
# Enable the configuration
sed -i "s/#user_configuration_complete=\"yes\"/user_configuration_complete=\"yes\"/g" /etc/clamav-unofficial-sigs/user.conf
# Install logrotate and man page
/usr/local/sbin/clamav-unofficial-sigs.sh --install-logrotate
/usr/local/sbin/clamav-unofficial-sigs.sh --install-man
# Install systemd service and timer
cp systemd/clamav-unofficial-sigs.service /etc/systemd/system/
cp systemd/clamav-unofficial-sigs.timer /etc/systemd/system/
systemctl daemon-reload
# Enable automatic updates
systemctl enable clamav-unofficial-sigs.timer
systemctl start clamav-unofficial-sigs.timer
# Run initial download (this will take several minutes)
/usr/local/sbin/clamav-unofficial-sigs.sh
# Restart ClamAV to load new signatures
systemctl restart clamav-daemon
Verify installation:
# Check loaded signatures
clamscan --debug 2>&1 /dev/null | grep "loaded"
# Should show significantly more signatures after installation:
# Before: ~9 million signatures
# After: ~18+ million signatures
# Check memory usage
ps aux | grep clamd
# Expect ~2GB more RAM usage than before
Monitor signature updates:
# Check when signatures were last updated
systemctl status clamav-unofficial-sigs.timer
# View update logs
journalctl -u clamav-unofficial-sigs.service | tail -50
# Verify signatures are current
ls -lh /var/lib/clamav/
Maintenance:
The systemd timer automatically updates signatures daily. No manual intervention needed.
If you need to disable unofficial signatures:
# Stop and disable the timer
systemctl stop clamav-unofficial-sigs.timer
systemctl disable clamav-unofficial-sigs.timer
# Remove unofficial signature files (keep only official ones)
cd /var/lib/clamav/
rm -f *.ndb *.hdb *.fp *.ftm *.ign *.ign2 *.mdb *.ldb *.sfp *.yar*
# Restart ClamAV
systemctl restart clamav-daemon
# Memory usage should return to normal (~500MB-1GB)
ClamAV Maintenance
Daily automatic tasks (already configured):
- Freshclam automatically updates signatures daily
- No manual intervention needed
- Updates applied without service restart
Weekly maintenance tasks:
# Check disk space (signatures grow over time)
df -h /var/lib/clamav/
# Expected: 2-4GB used for signature databases
# Expected with unofficial sigs: 4-6GB used
# Verify signatures are current
journalctl -u clamav-freshclam | grep -E "Database updated|database is up-to-date" | tail -n 5
# Check for update errors
journalctl -u clamav-freshclam --since "7 days ago" | grep -i error
# Verify ClamAV daemon is healthy
systemctl status clamav-daemon
Monthly monitoring:
# Review virus detection statistics
journalctl -u rspamd --since "30 days ago" | grep CLAM_VIRUS | wc -l
# Count of virus detections this month
# Check resource usage
ps aux | grep clamd
# Monitor RAM usage (should be 500MB-1GB)
# With unofficial signatures: 2.5-3GB
Phishing Protection
Phishing attacks try to steal credentials by impersonating legitimate services. Rspamd can check URLs against known phishing databases to block these attempts.
What is Phishing?
Phishing characteristics:
- Fake login pages mimicking banks, email providers and social media
- Urgent messages claiming account problems or security issues
- Links to malicious sites harvesting credentials
- Often uses legitimate-looking domains (paypa1.com instead of paypal.com)
Why phishing protection matters:
- Protects users from credential theft
- Prevents account compromise
- Blocks access to malware distribution sites
- Complements other security layers
Enable OpenPhish and PhishTank Feeds
OpenPhish and PhishTank maintain public feeds of known phishing URLs that are updated continuously. These are free, reliable services that provide excellent protection.
Configure phishing detection:
cat > /etc/rspamd/local.d/phishing.conf << 'EOF'
# Phishing URL detection using public feeds
# Note: Settings below are merged into the existing phishing { } block from modules.d
# Enable OpenPhish support
openphish_enabled = true;
# OpenPhish feed URL (moved to GitHub)
openphish_map = "https://raw.githubusercontent.com/openphish/public_feed/refs/heads/main/feed.txt";
# Set to true only if using premium feed
openphish_premium = false;
# Enable PhishTank feed
phishtank_enabled = true;
EOF
Configuration explained:
- Files in local.d/ are merged into the default config from modules.d/
- Don’t wrap settings in phishing { } – Rspamd does this automatically
- openphish_enabled = true: Activates OpenPhish checking
- openphish_map: OpenPhish free feed (now hosted on GitHub)
- openphish_premium = false: Uses free feed (set to true only with paid account)
- phishtank_enabled = true: Activates PhishTank checking
Note: Phishing symbol weights were already configured in the external_services_group.conf file created earlier in the ClamAV section.
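If you want to spot-check a suspicious link by hand, the free OpenPhish feed is a plain text list with one URL per line, so an exact-match lookup is enough. This is a rough sketch with a hypothetical helper name; Rspamd does its own matching internally, and this function is only a manual aid.

```shell
#!/bin/sh
# Return 0 if the URL appears as a complete line in the feed file.
# $1 = feed file (one URL per line), $2 = URL to look up
url_in_feed() {
    # -F: fixed string, -x: whole line must match, -q: no output
    grep -Fxq -- "$2" "$1"
}

# Example (commented out because it needs network access):
#   curl -fsS -o /tmp/openphish.txt \
#     "https://raw.githubusercontent.com/openphish/public_feed/refs/heads/main/feed.txt"
#   url_in_feed /tmp/openphish.txt "http://suspicious.example/login" && echo "listed"
```

An exact match is deliberately strict: a URL that differs only by a trailing slash or query string will not be reported as listed, so a miss here does not prove a link is safe.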
Test and restart:
# Test configuration
rspamadm configtest
# Should show: "syntax OK"
# Restart Rspamd to load phishing feeds
systemctl restart rspamd
# Verify phishing module loaded
journalctl -u rspamd --since "1 minute ago" | grep -i phish
# Check that feeds are being downloaded
sleep 60 # Wait for initial feed download
journalctl -u rspamd --since "5 minutes ago" | grep -i "openphish\|phishtank" | tail -10
Expected output (exact wording varies by Rspamd version):
rspamd: loaded openphish map from https://raw.githubusercontent.com/openphish/public_feed/refs/heads/main/feed.txt
rspamd: loaded phishtank feed
Monitor Phishing Protection
Important: Phishing symbols only appear after detecting actual phishing attempts. If you haven’t received phishing emails yet, statistics will be empty – this is normal!
Verify phishing protection is active:
# Confirm phishing module is loaded
journalctl -u rspamd --since "1 hour ago" | grep "init lua module phishing"
# Should show: init lua module phishing from /usr/share/rspamd/plugins/phishing.lua
# Check OpenPhish feed loaded successfully
journalctl -u rspamd --since "1 hour ago" | grep "parsed.*elements from openphish"
# Should show: parsed 300 elements from openphish feed (or similar number)
# Verify feed cache files exist
ls -lh /var/lib/rspamd/*.map | head -5
# Should show multiple .map files (phishing databases are cached here)
# Check when feeds will refresh
journalctl -u rspamd | grep "next check at" | grep -i "openphish" | tail -1
# Shows next automatic update time
Monitor phishing detections (only shows data after phishing emails are blocked):
# Check for phishing detections
journalctl -u rspamd | grep "PHISHED_" | tail -20
# Shows blocked phishing attempts like:
# PHISHED_OPENPHISH(10.0){http://malicious-site.com}
# Count phishing blocks in last 7 days
journalctl -u rspamd --since "7 days ago" | grep -E "PHISHED_OPENPHISH|PHISHED_PHISHTANK" | wc -l
# View recent phishing URLs blocked
journalctl -u rspamd --since "7 days ago" | grep "PHISHED_" | grep -oP 'https?://[^}]+' | sort -u
Note: rspamc stat and /var/lib/rspamd/ won’t show “phish” until actual phishing is detected. The map cache files use hashed names, not “phish” in the filename.
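To see which feed is doing the work, the detections can be broken down per symbol. The following sketch assumes log lines contain the symbol names PHISHED_OPENPHISH and PHISHED_PHISHTANK as configured earlier; the function name is made up for this example.

```shell
#!/bin/sh
# Count detections per phishing symbol from Rspamd log lines on stdin.
# Prints "<count> <symbol>" per symbol, most frequent first.
# A symbol that occurs twice on one line is counted twice.
count_phish_symbols() {
    grep -oE 'PHISHED_(OPENPHISH|PHISHTANK)' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}
```

Typical use: journalctl -u rspamd --since "7 days ago" | count_phish_symbols. Empty output simply means no phishing symbols were logged in that period.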
Phishing Feed Updates
Automatic updates:
- OpenPhish: Feed refreshes automatically every hour
- PhishTank: Feed refreshes automatically every hour
- No manual intervention needed
Verify feeds are current:
# Check OpenPhish feed status
journalctl -u rspamd | grep -E "openphish.*read map data" | tail -1
# Should show: read map data [number] bytes
# View feed refresh schedule
journalctl -u rspamd | grep "next check at" | grep openphish | tail -1
# Shows when next update will occur
What This Provides
✅ Real-time phishing URL detection – URLs checked against current threat databases
✅ Free public feeds – No premium account needed
✅ Automatic feed updates – Stays current without manual work
✅ Very low resource overhead – Simple URL lookups, minimal CPU/RAM
✅ High-confidence detection – Only confirmed phishing sites in feeds
✅ Complements other filtering – Works alongside ClamAV, Bayes, Neural networks
Detection rate improvement:
- Phishing protection catches credential-stealing attempts that traditional spam filters miss
- Particularly effective against targeted “spear phishing” attacks
- Works even when phishing emails have perfect SPF/DKIM/DMARC
Collaborative Spam Detection: Razor, Pyzor, and DCC
Overview
Three collaborative spam detection networks exist that work similarly:
- Razor – Collaborative network sharing fuzzy checksums of spam
- Pyzor – Similar to Razor, using hash-based spam digests
- DCC (Distributed Checksum Clearinghouse) – Bulk email detection via checksums
How They Work
All three operate on the same principle:
- Compute a signature/checksum of incoming messages
- Query a global network: “Has this been reported as spam?”
- If many reports exist, increase spam score
- Optionally report spam back to the network
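The four steps can be sketched locally with a plain file standing in for the global network. This is only an illustration of the principle: SHA-256 is used as a stand-in digest (the real services use their own fuzzy or bulk-oriented checksums), and every function name, the report threshold, and the score value are hypothetical.

```shell
#!/bin/sh
# Step 1: compute a digest of the message body read from stdin.
msg_digest() {
    sha256sum | awk '{ print $1 }'
}

# Step 2: ask the "network" (a local file, one digest per line) how many
# reports exist. $1 = report file, $2 = digest
report_count() {
    grep -cFx -- "$2" "$1" || true   # grep -c prints 0 but exits 1 on no match
}

# Step 3: add to the spam score only when enough reports exist.
# $1 = report count, $2 = minimum reports, $3 = score to add
score_for_reports() {
    if [ "$1" -ge "$2" ]; then echo "$3"; else echo "0"; fi
}

# Step 4: report a digest back. $1 = report file, $2 = digest
report_spam() {
    echo "$2" >> "$1"
}
```

With an exact hash like this, changing a single character in the message produces a different digest, which is precisely why the real networks rely on fuzzier signatures.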
Why We’re Not Implementing Them
After extensive testing, we’ve decided not to include these tools in this guide for the following reasons:
1. Technical Implementation Issues
Razor (Perl) and Pyzor (Python) are legacy applications that have significant compatibility issues with modern systemd security sandboxing:
- Permission problems with Perl/Python module access for restricted users
- Complex systemd socket activation workarounds required
- Unreliable operation in restricted security contexts
- Time-consuming troubleshooting for marginal benefit
2. Marginal Value
With the comprehensive stack we’ve already implemented, these tools add minimal additional protection:
Your current anti-spam stack:
- ✅ ClamAV – Virus and malware detection
- ✅ Bayes classifier – Personalized statistical learning
- ✅ Neural networks – Advanced pattern recognition
- ✅ Fuzzy hashing – Spam variant detection
- ✅ RBL checks – Real-time IP/domain blacklists (Part 4)
- ✅ DKIM/SPF/DMARC – Email authentication (Part 4)
Adding Razor/Pyzor/DCC provides perhaps 1-2% additional detection rate at best, which doesn’t justify the added complexity.
3. Maintenance Burden
These tools require:
- Additional services to monitor and maintain
- Regular connectivity checks to external networks
- Troubleshooting when external services have issues
- Updates when protocols or servers change
4. Network Dependencies
All three depend on external networks being available and responsive. Network issues or service outages can:
- Slow down mail processing
- Create timeout errors in logs
- Require manual intervention
DCC Specific Issues
DCC has additional complications:
- Requires accepting a commercial license (even for free tier)
- More restrictive than other tools
- No significant advantage over Razor/Pyzor to justify the extra licensing complexity
Our Recommendation
Skip Razor, Pyzor, and DCC entirely. Your mail server will have:
✅ Excellent spam detection – The core stack catches 99%+ of spam
✅ Reliable operation – No legacy tool compatibility issues
✅ Easier maintenance – Fewer moving parts to monitor
✅ Better performance – No external network queries for every message
For Advanced Users
If you still want to implement these tools despite the challenges:
Razor/Pyzor:
- The official Rspamd documentation covers systemd socket integration
- Expect to spend significant time troubleshooting Perl and Python module permissions
- May require disabling systemd security features
DCC:
- Visit https://www.rhyolite.com/dcc/
- Review and accept the license terms
- Follow Rspamd’s DCC module documentation
Warning: We don’t provide setup instructions for these tools because they don’t meet our reliability and value standards for production mail servers.
Neural Network Learning
Rspamd includes a powerful neural network that learns spam patterns through training.
How Rspamd Neural Networks Work
Architecture:
- Input layer: Rspamd symbols (SPF, DKIM, Bayes, etc.)
- Hidden layer: Pattern recognition and feature extraction
- Output layer: Ham vs Spam classification
Training process:
- Neural network observes Rspamd’s symbol outputs for each message
- Observes final classification (spam or ham) based on existing rules
- Adjusts internal weights to better predict classification
- Over time, learns patterns and symbol correlations
- Provides prediction even for messages that partially match known patterns
What it learns:
- Symbol correlations: Which combinations indicate spam
- Pattern recognition: Message structures typical of spam/ham
- Local patterns: Specific to YOUR mail patterns
- Nuanced scoring: Not binary, provides confidence scores
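The layered structure described above (symbols in, hidden layer, single confidence value out) can be sketched as a toy forward pass. This is not Rspamd's implementation: the function name is hypothetical and every weight is a made-up illustration value rather than something a trained network produced. Only the symbol names (R_SPF_FAIL, R_SPF_ALLOW, R_DKIM_ALLOW, BAYES_SPAM, BAYES_HAM) are real Rspamd symbols.

```shell
#!/bin/sh
# Toy forward pass: fired symbols -> hidden layer (ReLU) -> output (sigmoid).
# Weights are invented for illustration; a real network learns them.
neural_predict() {
    # Arguments: names of the symbols that fired for this message.
    printf '%s\n' "$@" | awk '
        BEGIN {
            # Hidden unit 1 reacts to spam-like symbols, unit 2 to ham-like ones.
            w1["R_SPF_FAIL"] = 1.5;  w1["BAYES_SPAM"] = 2.0
            w2["R_SPF_ALLOW"] = 1.0; w2["R_DKIM_ALLOW"] = 1.0; w2["BAYES_HAM"] = 2.0
        }
        { h1 += w1[$1]; h2 += w2[$1] }   # unknown symbols contribute 0
        END {
            if (h1 < 0) h1 = 0           # ReLU activation
            if (h2 < 0) h2 = 0
            z = 1.2 * h1 - 1.2 * h2      # output neuron
            p = 1 / (1 + exp(-z))        # sigmoid: spam probability in (0, 1)
            printf "%.2f\n", p
        }'
}

neural_predict R_SPF_FAIL BAYES_SPAM                # close to 1 (spam)
neural_predict R_SPF_ALLOW R_DKIM_ALLOW BAYES_HAM   # close to 0 (ham)
```

The output is a confidence value rather than a yes/no answer, which is what the "nuanced scoring" point refers to.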
Example: Neural network learns that:
SPF_PASS + DKIM_ALLOW + Bayes_Ham + No_Fuzzy_Match = Definitely ham (-3.0)
SPF_FAIL + No_DKIM + Bayes_Spam + Fuzzy_Match = Definitely spam (+5.0)
Configure Neural Network
Enable and configure the neural module:
cat > /etc/rspamd/local.d/neural.conf << 'EOF'
# Neural network configuration
# Use Valkey for persistent storage
servers = "/run/valkey/valkey.sock";
# Neural network structure
train {
max_trains = 1000; # Limit training iterations per session
max_usages = 20; # Limit influences per classification
max_iterations = 25; # Maximum epochs during training
learning_rate = 0.01; # How quickly network adjusts
# Training triggers
ham_score = -1.0; # Score below -1.0 trains as ham
spam_score = 6.0; # Score above 6.0 trains as spam
# Minimum symbols required for training
min_learns = 3;
}
# Network layers
layers = [
{
# Input layer size calculated from enabled symbols
size = auto;
},
{
# Hidden layer
size = 64;
activation = "relu";
},
{
# Output layer (binary classification)
size = 1;
activation = "sigmoid";
}
];
# Symbol for neural network prediction
symbol = "NEURAL_HAM";
symbol_spam = "NEURAL_SPAM";
# Enable per-user neural networks (learns per-domain patterns)
per_user = false; # Set to true if serving multiple distinct domains
# Pre-filter - only use neural network for uncertain messages
# Messages with clear spam/ham signals skip neural (performance)
pre_filter = {
min_score = 3.0; # Only check if current score between 3-7
max_score = 7.0;
}
EOF
Configuration explained:
- max_trains: Limit training per session (prevents overtraining)
- ham_score / spam_score: Confidence thresholds for automatic training
- learning_rate: How aggressively to adjust weights (0.01 = cautious)
- layers: Network structure (input → 64-neuron hidden → output)
- pre_filter: Only invoke neural for uncertain messages (performance)
- per_user = false: Single neural network for entire server (simplest)
Configure Neural Symbols
Edit /etc/rspamd/local.d/neural_group.conf:
cat > /etc/rspamd/local.d/neural_group.conf << 'EOF'
# Neural network symbol group
# Neural network symbols
symbols = {
"NEURAL_HAM" {
weight = -3.0;
description = "Neural network ham prediction";
}
"NEURAL_SPAM" {
weight = 5.0;
description = "Neural network spam prediction";
}
}
EOF
Enable Neural Training
Neural networks need training data. Automatic training from high-confidence classifications is controlled by the train block already written to /etc/rspamd/local.d/neural.conf above. Do not write these settings to neural_group.conf: that would overwrite the symbol weights defined in the previous step. To train only on clearer cases, tighten the values in the existing train block:
train {
# Spam threshold (messages above this train as spam)
spam_score = 12.0;
# Ham threshold (messages below this train as ham)
ham_score = -5.0;
# Maximum number of training samples to store
max_trains = 10000;
# Step size for weight updates (not a time interval)
learning_rate = 0.01;
}
Test and Restart
# Test configuration
rspamadm configtest
# Should show: "syntax OK"
# Restart Rspamd to enable neural network
systemctl restart rspamd
# Verify neural module loaded
journalctl -u rspamd --since "5 minutes ago" | grep -i neural
# Check Valkey for neural network data
valkey-cli -s /run/valkey/valkey.sock --scan --pattern "rn:*"
# Should show neural network keys after some training
Monitor Neural Network Learning
Check training progress:
# Neural network statistics
rspamc stat | grep -i neural
# Check for neural predictions in logs
journalctl -u rspamd | grep -E "NEURAL_HAM|NEURAL_SPAM" | tail -n 20
# Valkey neural network data
valkey-cli -s /run/valkey/valkey.sock --scan --pattern "rn:*" | wc -l
# Shows count of neural network data keys
What to expect:
- First 100-200 messages: No neural predictions (insufficient training)
- After 500+ messages: Neural starts making predictions
- After 1000+ messages: Neural predictions become reliable
- Continuous improvement as more mail is processed
Neural network symbols in action:
Example spam:
├─ SPF: +2.0 (fail)
├─ Bayes: +4.0 (spam-like)
├─ Fuzzy: +2.5 (near-duplicate of known spam)
├─ NEURAL_SPAM: +5.0 (network learned this pattern is spam)
└─ Final Score: +13.5 / 15.0 → ADD_HEADER (delivered to Junk)
Fuzzy Hashing
Fuzzy hashing detects near-duplicate spam messages, catching spam campaigns with minor text variations.
How Fuzzy Hashing Works
Traditional spam filters fail on variations:
Message 1: "Buy cheap watches now!"
Message 2: "Buy cheap wat ches now!" ← Spaces added
Message 3: "Buy cheap w4tches now!" ← Characters changed
Traditional filter: 3 different messages
Fuzzy hash: 3 nearly identical messages → Spam pattern!
Fuzzy hashing:
- Computes fuzzy hash of message content
- Stores hash with classification (spam or ham)
- New messages compared to stored hashes
- Near matches trigger spam score
Use cases:
- Mass spam campaigns: Same message sent with minor variations
- Personalized spam: Template with name/company variations
- Evasion techniques: Spammers deliberately vary messages slightly
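To make the idea of a "near match" concrete, the sketch below compares two messages by the overlap of their character trigrams (Jaccard similarity). This is a deliberately simple stand-in and not the algorithm Rspamd uses internally; the function names and the choice of trigrams are assumptions made for this example.

```shell
#!/bin/sh
# Toy near-duplicate detector: normalise, split into character trigrams,
# then measure how many trigrams two messages share.

# Emit the unique trigrams of a message, one per line.
trigrams() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' |
        awk '{ for (i = 1; i + 2 <= length($0); i++) print substr($0, i, 3) }' |
        sort -u
}

# Print the Jaccard similarity (0.00 to 1.00) of two messages.
similarity() {
    set_a=$(mktemp)
    set_b=$(mktemp)
    trigrams "$1" > "$set_a"
    trigrams "$2" > "$set_b"
    shared=$(comm -12 "$set_a" "$set_b" | wc -l)
    total=$(sort -u "$set_a" "$set_b" | wc -l)
    rm -f "$set_a" "$set_b"
    awk -v s="$shared" -v t="$total" 'BEGIN { printf "%.2f\n", (t > 0 ? s / t : 0) }'
}

similarity "Buy cheap watches now!" "Buy cheap wat ches now!"   # spaces added
similarity "Buy cheap watches now!" "Buy cheap w4tches now!"    # character changed
similarity "Buy cheap watches now!" "Meeting moved to Friday"   # unrelated
```

The first two pairs score high and the unrelated pair scores near zero. A filter built this way would treat anything above a chosen threshold as the same campaign.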
Configure Fuzzy Storage
Enable Valkey-based fuzzy hash storage:
cat > /etc/rspamd/local.d/fuzzy_check.conf << 'EOF'
# Fuzzy hash configuration
# Disable the default rspamd.com fuzzy rule (we're using local storage)
rule "rspamd.com" {
enabled = false;
}
# Define our local fuzzy rule
rule "local" {
# Algorithm - mumhash is fast and effective
algorithm = "mumhash";
# Backend storage
backend = "redis";
servers = "/run/valkey/valkey.sock";
# Symbol for matches
symbol = "LOCAL_FUZZY";
# Flags
read_only = false; # Allow learning new hashes
skip_unknown = true; # Skip if no hash found
# Scoring
min_score = 1.0; # Weak match
max_score = 3.0; # Strong match
# Storage settings
expire = 2592000; # 30 days (2592000 seconds)
min_length = 100; # Don't hash very short messages
}
# Automatic fuzzy learning from high-confidence messages
fuzzy_learn {
# Learn spam hashes from clear spam
spam {
min_score = 12.0;
}
# Learn ham hashes from clear ham
ham {
max_score = -3.0;
}
}
EOF
Configuration explained:
- rule "rspamd.com" { enabled = false; }: Disables default public fuzzy storage
- algorithm = "mumhash": Fast modern hash algorithm
- backend = "redis": Store hashes in Valkey (Redis-compatible)
- min_score / max_score: How much to add to spam score on match
- expire = 2592000: Keep fuzzy hashes for 30 days
- min_length = 100: Don’t bother hashing very short messages
Configure Fuzzy Symbols
Edit /etc/rspamd/local.d/fuzzy_group.conf:
cat > /etc/rspamd/local.d/fuzzy_group.conf << 'EOF'
# Fuzzy hash symbol configuration
symbols = {
"LOCAL_FUZZY" {
weight = 3.0;
description = "Fuzzy hash match (near-duplicate spam)";
}
"LOCAL_FUZZY_DENIED" {
weight = 3.0;
description = "Fuzzy hash match (known spam)";
}
"LOCAL_FUZZY_PROB" {
weight = 1.5;
description = "Fuzzy hash probable match";
}
}
EOF
Test and Restart
# Test configuration
rspamadm configtest
# Restart Rspamd
systemctl restart rspamd
# Verify fuzzy module loaded
journalctl -u rspamd --since "5 minutes ago" | grep -i fuzzy
Train Fuzzy Hashes
Automatic training from high-confidence messages is already configured: the fuzzy_learn block in the fuzzy_check.conf written above learns spam hashes at a score of 12.0 or higher and ham hashes at -3.0 or lower. Do not append the block a second time, since that would leave two fuzzy_learn sections in the same file.
Manual training (optional):
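# Train specific message as spam (this updates the Bayes classifier)
rspamc learn_spam < /path/to/spam-message.eml
# Train specific message as ham
rspamc learn_ham < /path/to/ham-message.eml
# To add a message's hash to fuzzy storage directly, use rspamc fuzzy_add
# (with -f set to a flag defined for the rule and -w to a weight)
Monitor Fuzzy Hash Effectiveness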
# Check fuzzy matches
journalctl -u rspamd | grep LOCAL_FUZZY | tail -n 20
# Check Valkey fuzzy database size
valkey-cli -s /run/valkey/valkey.sock --scan --pattern "fuzzy:*" | wc -l
# Shows count of stored fuzzy hashes
# View fuzzy statistics
rspamc stat | grep -i fuzzy
What to expect:
- Fuzzy hashes accumulate over weeks/months
- Effectiveness increases with more spam seen
- Particularly useful for recurring spam campaigns
Bayes Classifier
The Bayes classifier uses statistical analysis to learn spam patterns specific to YOUR mail.
How Bayes Classification Works
Statistical learning:
- Token extraction: Breaks messages into words/tokens
- Probability calculation: Computes P(spam|token) for each token
- Combines probabilities: Overall spam probability for message
- Local learning: Learns patterns specific to your mail
Example tokens and learned probabilities:
Token: "viagra" → P(spam) = 0.95 (95% of messages with "viagra" were spam)
Token: "meeting" → P(spam) = 0.05 (5% of messages with "meeting" were spam)
Token: "invoice" → P(spam) = 0.30 (ambiguous - depends on context)
Message: "Urgent meeting about viagra invoice"
Bayes: Combines probabilities → Overall spam score
Why Bayes is powerful:
- Learns YOUR specific mail patterns
- Adapts to your correspondents
- Recognizes legitimate newsletters vs spam
- Gets smarter over time with training
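The "combines probabilities" step can be illustrated with the textbook naive Bayes formula, P = Πp / (Πp + Π(1 − p)), applied to the three example tokens above. This is a minimal sketch under the assumption of equal priors and independent tokens; Rspamd's own classifier combines token statistics differently, and the function name is hypothetical.

```shell
#!/bin/sh
# Naive Bayes combination of per-token spam probabilities.
combine_spam_probability() {
    # Arguments: per-token spam probabilities, each strictly between 0 and 1.
    printf '%s\n' "$@" | awk '
        BEGIN { spam = 1; ham = 1 }
        NF    { spam *= $1; ham *= (1 - $1) }   # multiply evidence for each class
        END   { printf "%.2f\n", spam / (spam + ham) }'
}

# "viagra" (0.95), "meeting" (0.05), "invoice" (0.30)
combine_spam_probability 0.95 0.05 0.30   # prints 0.30
```

The strong spam token and the strong ham token cancel each other out, so the ambiguous "invoice" token decides the result. This is why a single trigger word rarely condemns a message on its own.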
Configure Bayes Classifier
Enable Bayes with Valkey backend:
cat > /etc/rspamd/local.d/classifier-bayes.conf << 'EOF'
# Bayes classifier configuration
# Backend storage (Valkey/Redis)
backend = "redis";
servers = "/run/valkey/valkey.sock";
# Token settings
tokenizer {
name = "osb"; # Orthogonal Sparse Bigrams (modern algorithm)
}
# Learning settings
learn_condition = [[
return function(task, is_spam, is_unlearn)
-- Only learn from high-confidence messages
local score = task:get_metric_score('default')[1]
-- Learn spam if score > 12
if is_spam and score > 12 then
return true
end
-- Learn ham if score < -3
if not is_spam and score < -3 then
return true
end
return false
end
]];
# Autolearn from high-confidence decisions
autolearn = true;
# Minimum number of learned messages per class before Bayes classifies
min_learns = 200;
# Token frequency minimum
min_token_hits = 2;
# Per-user learning (set to true for multi-tenant)
per_user = false;
# Cache settings
cache {
backend = "redis";
servers = "/run/valkey/valkey.sock";
# Cache expiration
expire = 86400; # 1 day
}
EOF
Configuration explained:
- backend = "redis": Store Bayes data in Valkey
- tokenizer { name = "osb"; }: Modern bigram tokenizer (better than simple word tokens)
- learn_condition: Lua function to determine when to learn
  - Learn spam if score > 12 (high confidence)
  - Learn ham if score < -3 (high confidence)
  - Skip uncertain messages (avoid poisoning classifier)
- autolearn = true: Learn automatically from high-confidence messages
- min_learns = 200: Need 200+ samples before making predictions
- per_user = false: Single classifier for entire server
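The decision made by the learn_condition function can be mirrored in a few lines of shell, which is handy for checking which scores would trigger learning. This is only a restatement of the thresholds configured above (12 and -3); the function name is hypothetical and nothing here talks to Rspamd.

```shell
#!/bin/sh
# Mirror of the autolearn thresholds: which action would a given score cause?
autolearn_decision() {
    awk -v s="$1" 'BEGIN {
        if (s > 12)      print "learn_spam"   # clear spam
        else if (s < -3) print "learn_ham"    # clear ham
        else             print "skip"         # uncertain: do not learn
    }'
}

autolearn_decision 13.5   # prints learn_spam
autolearn_decision -4     # prints learn_ham
autolearn_decision 5      # prints skip
```

Scores exactly on a threshold (12 or -3) are skipped, because both comparisons in the configuration are strict.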
Configure Bayes Symbols
Edit /etc/rspamd/local.d/bayes_group.conf:
cat > /etc/rspamd/local.d/bayes_group.conf << 'EOF'
# Bayes classifier symbol configuration
symbols = {
"BAYES_HAM" {
weight = -3.0;
description = "Bayes classifier: Ham (not spam)";
}
"BAYES_SPAM" {
weight = 5.0;
description = "Bayes classifier: Spam";
}
}
EOF
Test and Restart
# Test configuration
rspamadm configtest
# Restart Rspamd
systemctl restart rspamd
# Verify Bayes module loaded
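journalctl -u rspamd --since "5 minutes ago" | grep -i bayes
Initial Bayes Training
Bayes needs initial training before making predictions. You can train it from existing mail folders. rspamc treats everything it reads on stdin as a single message, so feed it one message at a time:
# Source configuration
source /root/mail-server-vars.sh
# Train from existing INBOX (ham), one message per rspamc call
doveadm search -u info@${DOMAIN} mailbox INBOX ALL | while read -r guid uid; do
  doveadm fetch -u info@${DOMAIN} text mailbox-guid "$guid" uid "$uid" | tail -n +2 | rspamc learn_ham
done
# Train from existing Junk (spam), one message per rspamc call
doveadm search -u info@${DOMAIN} mailbox Junk ALL | while read -r guid uid; do
  doveadm fetch -u info@${DOMAIN} text mailbox-guid "$guid" uid "$uid" | tail -n +2 | rspamc learn_spam
done
Expected output (once per message):
success = true
Check Bayes statistics: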
# View Bayes learning stats
rspamc stat | grep -i bayes
# Check Valkey Bayes database
valkey-cli -s /run/valkey/valkey.sock --scan --pattern "bayes:*" | head -n 20
# Shows Bayes token keys
Monitor Bayes Effectiveness
# Recent Bayes classifications
journalctl -u rspamd | grep -E "BAYES_HAM|BAYES_SPAM" | tail -n 20
# Bayes statistics
rspamc stat | grep -i bayes
# Shows:
# Bayes learns: ham + spam count
# Tokens learned: total token database size
What to expect:
- First 200 messages: No Bayes predictions (insufficient data)
- After 200+ ham and 200+ spam: Bayes starts predicting
- After 1000+ messages: Bayes becomes highly accurate
- Continuous improvement with more training
Self-Learning Setup
The ultimate goal: Your mail server learns from USER ACTIONS automatically.
How Self-Learning Works
User teaches the system:
User receives email → Rspamd classifies it → Delivers to INBOX or Junk
↓
User reviews classification
↓
User moves message if needed:
- Move from INBOX to Junk → "This is spam, learn it!"
- Move from Junk to INBOX → "This is ham, learn it!"
↓
IMAPSieve detects folder change → Triggers learning script
↓
Script calls Rspamd: rspamc learn_spam or learn_ham
↓
Rspamd updates: Bayes, Neural, Fuzzy databases
↓
Future similar messages classified better ✅
Benefits:
- Zero manual training needed
- Users implicitly train the system by moving mail
- System learns YOUR specific mail patterns
- Continuous improvement over time
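The core of the flow above is a mapping from a folder move to a learning command. The sketch below expresses that mapping as a plain shell function so the logic can be checked before any Dovecot configuration exists. The function name is hypothetical, and the real decision is made by the IMAPSieve rules configured in the following steps, not by this code.

```shell
#!/bin/sh
# Which rspamc command should a move from folder $1 to folder $2 trigger?
learn_action() {
    src=$1
    dst=$2
    if [ "$dst" = "Junk" ] && [ "$src" != "Junk" ]; then
        echo "learn_spam"    # user says: this is spam
    elif [ "$src" = "Junk" ] && [ "$dst" != "Junk" ]; then
        echo "learn_ham"     # user says: this is not spam
    else
        echo "none"          # moves that do not involve Junk teach nothing
    fi
}

learn_action INBOX Junk      # prints learn_spam
learn_action Junk INBOX      # prints learn_ham
learn_action INBOX Archive   # prints none
```

Note that in this sketch, as in the rules configured in this guide, any move out of Junk counts as ham, including a move from Junk to Trash.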
Step 1: Enable Sieve Protocols
Enable IMAPSieve in IMAP protocol:
cat > /etc/dovecot/conf.d/20-imap.conf << 'EOF'
###
### IMAP Protocol Settings
###
protocols {
imap = yes
}
protocol imap {
mail_plugins {
quota = yes
imap_quota = yes
imap_sieve = yes # Enable IMAPSieve for automatic learning
}
mail_max_userip_connections = 50
imap_idle_notify_interval = 29 mins
}
EOF
Enable Sieve in LMTP protocol:
cat > /etc/dovecot/conf.d/20-lmtp.conf << 'EOF'
###
### LMTP Protocol Settings
###
protocols {
lmtp = yes
}
protocol lmtp {
mail_plugins {
quota = yes
sieve = yes # Enable Sieve for mail delivery
}
}
EOF
Step 2: Configure IMAPSieve (Dovecot 2.4 Syntax)
IMPORTANT: Dovecot 2.4 uses a block-based configuration structure. The IMAPSieve rules MUST be placed inside the protocol imap section.
cat > /etc/dovecot/conf.d/90-sieve.conf << 'EOF'
##
## Dovecot 2.4 Sieve Configuration with IMAPSieve
##
# Personal sieve script location
sieve_script personal {
path = ~/sieve
active_path = ~/.dovecot.sieve
}
# Global spam filter - runs BEFORE user scripts
sieve_script before {
type = before
path = /var/lib/dovecot/sieve/spam-global.sieve
}
# Maximum script size
sieve_max_script_size = 1M
# Maximum number of actions per script
sieve_max_actions = 32
##
## Sieve / IMAPSieve configuration for Dovecot 2.4
##
# Sieve plugins (block-based syntax)
sieve_plugins {
sieve_imapsieve = yes
sieve_extprograms = yes
}
# Allow external program execution
sieve_pipe_bin_dir = /usr/local/bin
# Enable required Sieve extensions (block-based syntax)
sieve_global_extensions {
vnd.dovecot.pipe = yes
vnd.dovecot.environment = yes
imapsieve = yes
}
# IMAPSieve rules MUST be under protocol imap in Dovecot 2.4
protocol imap {
# When message moved TO Junk → learn spam
mailbox Junk {
sieve_script spam {
type = before
cause = copy
path = /var/lib/dovecot/sieve/global/report-spam.sieve
}
}
# When message moved FROM Junk → learn ham
imapsieve_from Junk {
sieve_script ham {
type = before
cause = copy
path = /var/lib/dovecot/sieve/global/report-ham.sieve
}
}
}
EOF
Configuration explained:
- sieve_plugins { }: Block-based syntax for enabling plugins (Dovecot 2.4 style)
- sieve_global_extensions { }: Block-based syntax for extensions (Dovecot 2.4 style)
- protocol imap { }: IMAPSieve rules MUST be inside this block in Dovecot 2.4
- mailbox Junk { }: Triggers when mail is copied/moved TO Junk folder
- imapsieve_from Junk { }: Triggers when mail is copied/moved FROM Junk folder
Dovecot 2.4 Syntax Notes:
- Uses block-based syntax: sieve_plugins { }, sieve_global_extensions { }
- IMAPSieve rules must be inside protocol imap { } block
- Old sieve_plugins = sieve_imapsieve sieve_extprograms style doesn’t work
- No manual sievec compilation needed – Dovecot auto-compiles on first use
Step 3: Create Sieve Learning Scripts
# Create Sieve directory
mkdir -p /var/lib/dovecot/sieve/global
# Create spam learning Sieve script
cat > /var/lib/dovecot/sieve/global/report-spam.sieve << 'EOF'
require ["vnd.dovecot.pipe", "copy", "imapsieve", "environment", "variables"];
if environment :matches "imap.user" "*" {
pipe :copy "rspamd-learn-spam.sh" [ "${1}" ];
}
EOF
# Create ham learning Sieve script
cat > /var/lib/dovecot/sieve/global/report-ham.sieve << 'EOF'
require ["vnd.dovecot.pipe", "copy", "imapsieve", "environment", "variables"];
if environment :matches "imap.user" "*" {
pipe :copy "rspamd-learn-ham.sh" [ "${1}" ];
}
EOF
How the Sieve scripts work:
- require ["vnd.dovecot.pipe", ...]: Load required Sieve extensions
- environment :matches "imap.user" "*": Get the IMAP username
- pipe :copy "script" [ "${1}" ]: Pipe the email to the external script
Step 4: Create Rspamd Learning Shell Scripts
# Create spam learning script
cat > /usr/local/bin/rspamd-learn-spam.sh << 'EOF'
#!/bin/sh
exec /usr/bin/rspamc -h localhost:11334 learn_spam
EOF
# Create ham learning script
cat > /usr/local/bin/rspamd-learn-ham.sh << 'EOF'
#!/bin/sh
exec /usr/bin/rspamc -h localhost:11334 learn_ham
EOF
# Make scripts executable
chmod 755 /usr/local/bin/rspamd-learn-*.sh
chown root:root /usr/local/bin/rspamd-learn-*.sh
Script explained:
- Reads email from stdin (piped from Sieve)
- Sends to Rspamd’s learning API on localhost:11334
- exec replaces the shell process (efficient)
Step 5: Set Permissions
# Set directory ownership and permissions FIRST (important!)
chown root:vmail /var/lib/dovecot/sieve/global
chmod 770 /var/lib/dovecot/sieve/global
# Set ownership for Sieve scripts
chown root:vmail /var/lib/dovecot/sieve/global/*.sieve
chmod 640 /var/lib/dovecot/sieve/global/*.sieve
Why these permissions:
- Dovecot runs as vmail user and needs to read the Sieve scripts
- Directory needs 770 (not 750) for vmail to write compiled .svbin files
- Scripts should be writable only by root (security)
- Critical: Directory permissions must be set before file permissions
Common permission error: If you see “Permission denied” when moving mail, the directory likely needs 770 not 750 because Dovecot needs to create .svbin compiled files.
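A quick way to catch the 750-versus-770 mistake before it shows up as "Permission denied" is to check the directory mode in a script. The helper below is a minimal sketch: the function name is hypothetical, and it relies on GNU stat (the -c option), as shipped with Debian.

```shell
#!/bin/sh
# Report whether a directory has the 770 mode needed for .svbin compilation.
check_sieve_dir() {
    dir=$1
    mode=$(stat -c '%a' "$dir") || return 2   # GNU stat: octal permission bits
    if [ "$mode" = "770" ]; then
        echo "ok: $dir is mode 770"
    else
        echo "fix: $dir is mode $mode, expected 770"
        return 1
    fi
}

# Demo on a throwaway directory rather than the live Sieve directory.
demo_dir=$(mktemp -d)
chmod 750 "$demo_dir"
check_sieve_dir "$demo_dir" || true   # reports the wrong mode
chmod 770 "$demo_dir"
check_sieve_dir "$demo_dir"           # reports ok
rm -rf "$demo_dir"
```

On the mail server itself the call would be check_sieve_dir /var/lib/dovecot/sieve/global.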
Step 6: Test and Apply Configuration
# Test Dovecot configuration
doveconf -n | grep -E "(sieve_plugins|sieve_global_extensions|mailbox Junk|imapsieve_from)"
# Should show the blocks configured above, along the lines of:
# sieve_plugins {
# sieve_global_extensions {
# mailbox Junk {
# imapsieve_from Junk {
# Restart Dovecot
systemctl restart dovecot
# Check Dovecot status
systemctl status dovecot
Step 7: Verify Files
# Check Sieve scripts exist
ls -la /var/lib/dovecot/sieve/global/
# Should show: report-spam.sieve, report-ham.sieve
# Check learning scripts exist
ls -la /usr/local/bin/rspamd-learn-*.sh
# Should show: rspamd-learn-spam.sh, rspamd-learn-ham.sh
# Verify permissions
stat /var/lib/dovecot/sieve/global/report-spam.sieve
# Should show: Access: (0640/-rw-r-----) Uid: ( 0/ root) Gid: ( 5000/ vmail)
Step 8: Test Self-Learning
Self-learning trains Rspamd based on your corrections. Test that the pipeline works by moving mail between folders.
Step 1: Send a test email
# From another server, send a clean test email
swaks --to info@yourdomain.com \
--from test@example.com \
--server mail.yourdomain.com \
--helo example.com \
--body "Test email for self-learning - $(date)"
The email should arrive in your INBOX.
Step 2: Test spam learning (move TO Junk)
- Open your email client (Thunderbird, Roundcube, webmail, etc.)
- Find the test email in INBOX
- Move it to the Junk folder (drag and drop, or right-click → Move)
Step 3: Watch the server logs
On your mail server:
# Watch for self-learning activity
journalctl -u dovecot -u rspamd -f
Expected output (SUCCESS):
dovecot: imap: sieve: executed pipe action: rspamd-learn-spam.sh
rspamd: rspamd_controller_learn_fin_task: <127.0.0.1> learned message as spam: <message-id>
Step 4: Verify spam was learned
# Check recent spam learning
journalctl -u rspamd --since "5 minutes ago" | grep "learned message as spam"
# Should show:
# rspamd_controller_learn_fin_task: learned message as spam
Step 5: Test ham learning (move FROM Junk back to INBOX)
- In your email client, go to the Junk folder
- Move the test email back to INBOX
- Watch the logs again
Expected output (SUCCESS):
dovecot: imap: sieve: executed pipe action: rspamd-learn-ham.sh
rspamd: rspamd_controller_learn_fin_task: <127.0.0.1> learned message as ham: <message-id>
What this proves:
✅ IMAPSieve detects folder moves
✅ Sieve scripts execute learning commands
✅ Rspamd receives and processes learning requests
✅ Self-learning pipeline is working end-to-end
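To confirm both directions from the logs in one pass, the learn events can be tallied with a small filter. The sketch below assumes the log lines contain the phrases "learned message as spam" and "learned message as ham", as in the expected output above; the function name is hypothetical.

```shell
#!/bin/sh
# Count spam and ham learn events in log text read from stdin.
count_learn_events() {
    awk '
        /learned message as spam/ { spam++ }
        /learned message as ham/  { ham++ }
        END { printf "spam=%d ham=%d\n", spam, ham }'
}

# Demo with sample lines in the format shown in this guide.
count_learn_events <<'LOG'
rspamd: rspamd_controller_learn_fin_task: <127.0.0.1> learned message as spam: <id-1>
rspamd: rspamd_controller_learn_fin_task: <127.0.0.1> learned message as ham: <id-2>
rspamd: rspamd_controller_learn_fin_task: <127.0.0.1> learned message as spam: <id-3>
LOG
# prints: spam=2 ham=1
```

On the server, the same function can read live logs: journalctl -u rspamd --since "24 hours ago" | count_learn_events.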
Common Issues:
Nothing happens when moving mail:
# 1. Check Sieve scripts compiled
ls -la /var/lib/dovecot/sieve/global/*.svbin
# Should show: report-spam.svbin and report-ham.svbin
# 2. Check directory permissions
ls -la /var/lib/dovecot/sieve/global/
# Should show: drwxrwx--- root vmail
# 3. Verify IMAPSieve configuration loaded
doveconf -n | grep -A 3 "mailbox Junk"
# 4. Check for Sieve errors
journalctl -u dovecot --since "10 minutes ago" | grep -i sieve
# 5. Restart Dovecot
systemctl restart dovecot
Scripts execute but learning fails:
# Check Rspamd is accepting learn commands
rspamc stat | grep -i bayes
# Test learning manually
echo "test" | rspamc learn_spam
# Should complete without errors
Understanding Bayes Training Requirements
IMPORTANT: You’ll see this message when you start learning:
bayes_classify: not classified as ham. The ham class needs more training samples. Currently: 0; minimum 200 required
This is completely normal! Bayes requires:
- ✅ Minimum 200 spam messages before it can classify spam
- ✅ Minimum 200 ham messages before it can classify ham
- ✅ Both thresholds must be met for Bayes to activate
Training Progress:
Messages Learned        Bayes Status
1-199 spam              Training (not yet active)
200+ spam               Waiting for ham training
1-199 ham               Training (not yet active)
200+ ham                Waiting for spam training
200+ spam + 200+ ham    ✅ ACTIVE – Bayes now classifies!
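The table above boils down to a simple rule: both classes must reach the minimum before Bayes activates. The helper below restates that rule so you can plug in the counts you see in your own statistics. The function name and message wording are hypothetical; the default minimum of 200 comes from min_learns in the configuration above.

```shell
#!/bin/sh
# Report Bayes readiness from learned ham and spam counts.
bayes_status() {
    ham=$1
    spam=$2
    min=${3:-200}   # min_learns from classifier-bayes.conf
    if [ "$ham" -ge "$min" ] && [ "$spam" -ge "$min" ]; then
        echo "active"
    elif [ "$spam" -ge "$min" ]; then
        echo "waiting for ham ($((min - ham)) more needed)"
    elif [ "$ham" -ge "$min" ]; then
        echo "waiting for spam ($((min - spam)) more needed)"
    else
        echo "training (not yet active)"
    fi
}

bayes_status 50 150    # prints: training (not yet active)
bayes_status 50 200    # prints: waiting for ham (150 more needed)
bayes_status 200 200   # prints: active
```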
What works without user training:
- ✅ Neural networks (train automatically from scanned mail; predictions begin once enough samples have accumulated)
- ✅ Fuzzy hashing (learns from high-confidence spam/ham)
- ✅ DKIM/SPF/DMARC (external validation)
- ✅ RBL checks (real-time blacklists)
- ✅ Phishing detection (URL databases)
What needs training:
- → Bayes classifier (200+ spam AND 200+ ham)
Monitor Learning Progress
Check training status:
# View current Bayes statistics
rspamc stat | grep -A 20 "Bayes"
# Expected output shows training progress:
# Statfile: BAYES_HAM type: redis; length: 50; total hits: 50; ...
# Statfile: BAYES_SPAM type: redis; length: 150; total hits: 150; ...
# (This shows 50 ham and 150 spam learned - need 200 of each)
Count learned messages:
# Count spam learning events
journalctl -u rspamd | grep "learned message as spam" | wc -l
# Count ham learning events
journalctl -u rspamd | grep "learned message as ham" | wc -l
# View recent learning activity
journalctl -u rspamd --since "24 hours ago" | grep "learned message as"
Check Valkey storage:
# Bayes data (tokens)
sudo -u _rspamd valkey-cli -s /run/valkey/valkey.sock --scan --pattern "bayes:*" | wc -l
# Neural network data
sudo -u _rspamd valkey-cli -s /run/valkey/valkey.sock --scan --pattern "rn:*" | wc -l
# Fuzzy hash data
sudo -u _rspamd valkey-cli -s /run/valkey/valkey.sock --scan --pattern "fuzzy:*" | wc -l
# View Valkey memory usage
sudo -u _rspamd valkey-cli -s /run/valkey/valkey.sock INFO memory | grep used_memory_human
Troubleshooting Self-Learning
Sieve scripts not executing:
# Check Dovecot can find the scripts
doveconf -n | grep sieve_script
# Check sievec auto-compilation
ls -la /var/lib/dovecot/sieve/global/*.svbin
# Should show compiled .svbin files (created automatically)
# Check for Sieve errors
journalctl -u dovecot | grep -i sieve | tail -n 20
Learning scripts not being called:
# Verify scripts are executable
ls -la /usr/local/bin/rspamd-learn-*.sh
# Should show: -rwxr-xr-x
# Test learning script manually
echo "Subject: Test" | /usr/local/bin/rspamd-learn-spam.sh
# Should complete without errors
# Check Rspamd is accepting learn commands
rspamc stat
# Should show: Statfile: BAYES_SPAM type: redis ...
Permissions errors:
# Check vmail user can read Sieve scripts
sudo -u vmail cat /var/lib/dovecot/sieve/global/report-spam.sieve
# Should output the script content
# Check directory permissions (must be 770 for .svbin compilation)
ls -la /var/lib/dovecot/sieve/ | grep global
# Should show: drwxrwx--- ... root vmail ... global
# Fix permissions if needed
chown root:vmail /var/lib/dovecot/sieve/global
chmod 770 /var/lib/dovecot/sieve/global
chown root:vmail /var/lib/dovecot/sieve/global/*.sieve
chmod 640 /var/lib/dovecot/sieve/global/*.sieve
What Gets Updated
When self-learning runs, Rspamd updates:
- Bayes classifier – Token statistics for spam/ham
- Neural networks – Weight adjustments for pattern recognition
- Fuzzy hashes – If configured with automatic learning
Important Notes
✅ No manual sievec compilation needed – Dovecot 2.4 auto-compiles .sieve → .svbin on first use
✅ Both COPY and MOVE work – The cause = copy setting triggers on both operations
✅ Works with all IMAP clients – Thunderbird, Outlook, mobile apps, webmail
✅ Training is immediate – Rspamd updates happen as soon as mail is moved
❌ Don’t use old Dovecot 2.3 syntax – The imapsieve_mailbox1_name style doesn’t work in 2.4
❌ Don’t put settings in plugin { } blocks – Dovecot 2.4 uses top-level settings
❌ Don’t manually compile Sieve scripts – Let Dovecot handle compilation
Summary and Next Steps
Continue with:
Part 6: Rspamd Web Interface & Roundcube – Web-based monitoring, management, and webmail
Complete Mail Server Journey
You’ve now built a professional, production-ready mail server:
Part 1: Prerequisites and preparation
Part 2: Core mail server (Postfix, Dovecot, PostfixAdmin)
Part 3: Intrusion prevention (CrowdSec)
Part 4: Professional spam filtering (Rspamd, DKIM, Hall of Fame)
Part 5: Advanced filtering (ClamAV, Neural Networks, Bayes, Learning)
Part 6: Web interfaces (coming soon)
Your mail server now rivals commercial solutions in capability and security!
