
CVE-2026-31431: Copy Fail, the Linux Exploit That Hid in Plain Sight for Nine Years

On April 29, 2026, a vulnerability was publicly disclosed that affects virtually every Linux kernel released since 2017. CVE-2026-31431, nicknamed “Copy Fail,” is a local privilege escalation flaw in the kernel’s cryptographic subsystem. It allows any unprivileged user to gain full root access using a script of fewer than 800 bytes.

No race conditions. No complex timing windows. No crash-prone retry loops. Just a clean, straight-line logic flaw that has been sitting in the kernel for nearly a decade.

If you run Linux in production, whether on bare metal, in virtual machines, or inside containers, this one deserves your attention.

What Copy Fail actually does

The flaw lives in the algif_aead module, which is part of the kernel’s AF_ALG interface. AF_ALG allows unprivileged userspace programs to access hardware-accelerated cryptographic operations directly through the kernel. It has been part of the default kernel configuration across all major distributions for years.

In 2017, a performance optimization was introduced (commit 72548b093ee3) that made AEAD encryption operations work “in-place,” meaning the source and destination buffers point to the same memory. That optimization, combined with the way splice() passes page cache references directly into crypto operations, created an unintended write path.

Here is the core of the problem: when a readable file is spliced into an AF_ALG socket, the kernel hands over references to the file’s page cache pages instead of making copies. The AEAD operation then writes its output back into those same pages. The result is that an unprivileged user can write four controlled bytes into the page cache of any readable file on the system.

Four bytes doesn’t sound like much, but the page cache is the in-memory representation of files that the kernel trusts implicitly. By targeting setuid binaries like /usr/bin/su, an attacker can modify the cached version of the binary in memory without changing anything on disk. The next time that binary is executed, it runs the attacker’s modified version, and the attacker gets a root shell.

This is what makes Copy Fail particularly elegant and dangerous. Traditional file integrity monitoring tools are blind to it because the on-disk file remains untouched.

Why this is worse than Dirty COW and Dirty Pipe

Linux has had high-profile privilege escalation vulnerabilities before. Dirty COW (CVE-2016-5195) required winning a race condition in the virtual memory subsystem and often needed multiple attempts. It sometimes crashed the system entirely. Dirty Pipe (CVE-2022-0847) was version-specific and required precise pipe buffer manipulation.

Copy Fail has none of these limitations. It triggers reliably on the first attempt, works across all major distributions without modification, and does not crash the system. The proof-of-concept exploit was published within 24 hours of disclosure and has been confirmed to work on Debian, Ubuntu, Red Hat, SUSE, Arch, and Fedora. Attacks in the wild were reported almost immediately.

The CVSS score is 7.8 (High), but the practical severity is arguably higher because of the ease of exploitation and the breadth of affected systems.

The container escape problem

Copy Fail is not just a local privilege escalation. It is a container escape primitive.

The Linux page cache is shared across all processes on a host, including processes running inside different containers. If an attacker compromises an unprivileged container, they can use Copy Fail to corrupt setuid binaries that are visible to other containers or to the host itself. In environments where containers share a common base image layer, a compromised container can modify cached binaries from that shared layer, effectively breaking out of its isolation boundary.

This has direct implications for Kubernetes clusters, CI/CD runners, and any multi-tenant environment where untrusted workloads run in containers on shared nodes. The assumption that container isolation provides a meaningful security boundary takes another hit.

For those of us running multi-container architectures on platforms like Incus or LXD, the shared page cache is an architectural reality. Containers on the same host share the same kernel and the same page cache, and therefore the same attack surface for this vulnerability.

How to check if you are vulnerable

First, determine whether the algif_aead module is built into your kernel or loaded as a module:

grep CONFIG_CRYPTO_USER_API_AEAD /boot/config-$(uname -r)

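If the output is CONFIG_CRYPTO_USER_API_AEAD=m, the module is loadable and can be blacklisted as an immediate mitigation. If it shows CONFIG_CRYPTO_USER_API_AEAD=y, the code is compiled directly into the kernel: it cannot be unloaded or blacklisted, and only a kernel update removes the flaw. If the line reads # CONFIG_CRYPTO_USER_API_AEAD is not set, algif_aead is not built at all and the vulnerable code is not present.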
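As a sketch, the same check can be wrapped in a small function that maps the config value to the three cases above and also covers systems without a /boot/config file. The helper name aead_config_status is illustrative, and the /proc/config.gz fallback only exists on kernels built with CONFIG_IKCONFIG_PROC.

```shell
#!/bin/sh
# aead_config_status (illustrative helper name): report how
# CONFIG_CRYPTO_USER_API_AEAD is set for a kernel.
# Usage: aead_config_status [config-file]
# With no argument it reads /boot/config-$(uname -r) and falls back to
# /proc/config.gz, which exists only if the kernel has CONFIG_IKCONFIG_PROC.
aead_config_status() {
    cfg=${1:-/boot/config-$(uname -r)}
    if [ -r "$cfg" ]; then
        line=$(grep '^CONFIG_CRYPTO_USER_API_AEAD=' "$cfg" || true)
    elif [ -z "${1:-}" ] && [ -r /proc/config.gz ]; then
        line=$(zcat /proc/config.gz | grep '^CONFIG_CRYPTO_USER_API_AEAD=' || true)
    else
        echo "unknown"   # no readable configuration source
        return 2
    fi
    case "$line" in
        *=m) echo "loadable" ;;  # module: the modprobe blacklist applies
        *=y) echo "built-in" ;;  # compiled in: the blacklist is a silent no-op
        *)   echo "absent" ;;    # option not set: algif_aead is not built
    esac
}
```

Run it with no argument on the host itself; passing a config file path lets you inspect another installed kernel’s configuration, for example before rebooting into it.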

Important: on RHEL-family distributions (Red Hat, AlmaLinux, Rocky, CloudLinux), the module is typically built into the kernel. There, the widely circulated modprobe workaround appears to succeed but does nothing. The commands finish without visible errors (the rmmod failure is discarded by 2>/dev/null), and lsmod shows no algif_aead because built-in code never appears in its output, yet the system remains completely unprotected. Do not rely on this workaround until you have confirmed the module type.

Immediate mitigation

If the module is loadable (=m), blacklist it:

echo "install algif_aead /bin/false" > /etc/modprobe.d/disable-algif-aead.conf
rmmod algif_aead 2>/dev/null

Verify the module is no longer loaded:

lsmod | grep algif_aead
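The blacklist commands above succeed silently even where they cannot work, so a guarded version may be safer to hand to automation. This is a sketch, and block_algif_aead is an illustrative name. It assumes the kmod convention that /lib/modules/$(uname -r)/modules.builtin lists every module compiled into the kernel; if that file is missing, the guard cannot detect a built-in module and behaves like the original commands. MODPROBE_D and MODULES_BUILTIN are hypothetical override variables for dry runs, not settings that kmod reads. Run it as root.

```shell
#!/bin/sh
# block_algif_aead (illustrative name): apply the algif_aead blacklist only
# when it can take effect. MODPROBE_D and MODULES_BUILTIN are hypothetical
# override variables for dry runs; kmod itself does not read them.
MODPROBE_D=${MODPROBE_D:-/etc/modprobe.d}
MODULES_BUILTIN=${MODULES_BUILTIN:-/lib/modules/$(uname -r)/modules.builtin}

block_algif_aead() {
    # modules.builtin lists modules compiled into the kernel image.
    if grep -q '/algif_aead\.ko' "$MODULES_BUILTIN" 2>/dev/null; then
        echo "algif_aead is built in: a modprobe blacklist has no effect" >&2
        return 1
    fi
    # Any future load attempt will run /bin/false instead of inserting the module.
    echo "install algif_aead /bin/false" > "$MODPROBE_D/disable-algif-aead.conf"
    # Unload it now if present; an error here is checked below.
    rmmod algif_aead 2>/dev/null || true
    # /proc/modules lists loaded modules, so it shows whether rmmod worked.
    if grep -q '^algif_aead ' /proc/modules 2>/dev/null; then
        echo "algif_aead is still loaded (an open AF_ALG socket may hold it)" >&2
        return 1
    fi
    echo "algif_aead blacklisted and not loaded"
}
```

A non-zero return means the host is not mitigated by this route and needs a patched kernel or another control.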

This mitigation has no impact on the vast majority of production workloads. It does not affect dm-crypt/LUKS, kTLS, IPsec, OpenSSL, GnuTLS, NSS, or SSH. The only software affected would be applications explicitly configured to use the AF_ALG interface for AEAD ciphers, which is rare outside of specialized hardware crypto offload scenarios. You can check for such applications with:

lsof 2>/dev/null | grep AF_ALG

For containerized environments, CERT-EU recommends blocking AF_ALG socket creation via seccomp policies on all container workloads, regardless of patch status.
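Neither an empty lsmod nor an error-free modprobe setup proves the interface is closed. A more direct test is to request an AEAD transform over AF_ALG, as an unprivileged process would, without performing any crypto or touching files. The sketch below is illustrative (af_alg_aead_probe is a made-up name); it assumes python3 is installed, since its socket module exposes AF_ALG on Linux, and uses gcm(aes) only as a commonly available AEAD algorithm, so a “blocked” result assumes gcm(aes) would otherwise be present. On an unmitigated host, the bind will normally trigger an autoload of algif_aead. The same probe can be run inside a container to confirm that a seccomp policy denies AF_ALG sockets.

```shell
#!/bin/sh
# af_alg_aead_probe (illustrative name): can this process reach algif_aead?
# Prints reachable, blocked, af_alg-unavailable, or unknown (no python3).
af_alg_aead_probe() {
    command -v python3 >/dev/null 2>&1 || { echo "unknown"; return 2; }
    python3 - <<'EOF'
import socket
import sys

try:
    # Fails if the kernel lacks AF_ALG or a seccomp policy denies it.
    s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0)
except (AttributeError, OSError):
    print("af_alg-unavailable")
    sys.exit(0)
try:
    # Binding an "aead" transform needs algif_aead; the kernel tries to
    # autoload it, which the modprobe install line turns into a failure.
    s.bind(("aead", "gcm(aes)"))
    print("reachable")
except OSError:
    print("blocked")
finally:
    s.close()
EOF
}
```

On a kernel that already contains the fix, “reachable” is expected and harmless; the result only matters on kernels still waiting for a patch.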

Patching

The upstream fix was committed on April 1, 2026 (commit a664bf3d603d). The patch reverts the 2017 in-place optimization, forcing AEAD operations to operate out-of-place again, with separate source and destination buffers.

As of May 5, 2026, Debian had released patched kernels for both Bookworm (12) and Trixie (13). Arch Linux, Fedora, and current Ubuntu releases were already shipping kernels new enough to include the fix. Fixes for Red Hat Enterprise Linux and older Ubuntu LTS releases may still be pending.

On Debian and Ubuntu, check for available kernel updates:

apt update && apt list --upgradable 2>/dev/null | grep linux-image

If a new kernel is available, install it and reboot. On Debian, you can verify the fix is included by checking the changelog:

apt changelog linux-image-$(dpkg -l | grep 'ii.*linux-image-[0-9]' | awk '{print $2}' | sort -V | tail -1) 2>/dev/null | grep CVE-2026-31431

A note on reboot detection: unlike Ubuntu, Debian does not create /run/reboot-required by default after kernel updates. If you rely on that file to detect pending reboots, install needrestart and use needrestart -k to check whether the running kernel matches the latest installed version.
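Where needrestart is not installed, a rough alternative is to compare the running release with the newest kernel image in /boot. This sketch uses illustrative helper names, assumes the /boot/vmlinuz-<release> naming used by Debian, Ubuntu, and RHEL-family systems, and relies on sort -V for version ordering. It only compares file names, so treat it as a hint rather than a replacement for needrestart -k.

```shell
#!/bin/sh
# newest_installed_kernel (illustrative): newest release with a vmlinuz image.
newest_installed_kernel() {
    for f in "${1:-/boot}"/vmlinuz-*; do
        [ -e "$f" ] && printf '%s\n' "${f##*/vmlinuz-}"
    done | sort -V | tail -n 1
}

# kernel_reboot_pending (illustrative): exit 0 if a newer kernel is installed
# than the one running. Usage: kernel_reboot_pending [boot-dir] [running-release]
kernel_reboot_pending() {
    newest=$(newest_installed_kernel "${1:-/boot}")
    running=${2:-$(uname -r)}
    if [ -n "$newest" ] && [ "$newest" != "$running" ]; then
        echo "reboot needed: running $running, newest installed is $newest"
        return 0
    fi
    echo "no newer kernel installed (running $running)"
    return 1
}
```

Because it exits 0 only when a reboot is pending, it can gate a monitoring alert, for example `kernel_reboot_pending && notify-oncall`, where notify-oncall stands in for whatever alerting command you use.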

Walking through the fix on a production server

Here is what the process looked like on one of our Debian 12 (Bookworm) servers running on ARM64 hardware.

First, check the module type:

root@web04:~# grep CONFIG_CRYPTO_USER_API_AEAD /boot/config-$(uname -r)
CONFIG_CRYPTO_USER_API_AEAD=m

The module is loadable, so the blacklist approach works. Apply it immediately:

root@web04:~# echo "install algif_aead /bin/false" > /etc/modprobe.d/disable-algif-aead.conf
root@web04:~# rmmod algif_aead 2>/dev/null
root@web04:~# lsmod | grep algif_aead
root@web04:~#

No output from lsmod means the module is not loaded, and since it is a loadable module on this kernel, the install line prevents it from being loaded again. The server is mitigated while we check for a patched kernel.

Next, check whether a fixed kernel is already installed but not yet running:

root@web04:~# uname -r
6.1.0-42-arm64
root@web04:~# dpkg -l | grep 'ii.*linux-image-[0-9]' | awk '{print $2}'
linux-image-6.1.0-42-arm64
linux-image-6.1.0-44-arm64
linux-image-6.1.0-45-arm64

Three kernels installed, running the oldest. Check if the newest one contains the fix:

root@web04:~# apt changelog linux-image-6.1.0-45-arm64 2>/dev/null | grep CVE-2026-31431
    - crypto: algif_aead - Revert to operating out-of-place (CVE-2026-31431)

The fix is present in 6.1.0-45. A reboot into the new kernel completes the remediation:

root@web04:~# reboot
...
root@web04:~# uname -r
6.1.0-45-arm64

The entire process, from checking vulnerability status to full remediation, took a few minutes of work plus the reboot window. There is nothing exotic about it. But it requires knowing that the vulnerability exists, understanding whether the mitigation actually works on your specific distribution, and having the access and process to act quickly.

Lessons from Copy Fail

Every few years, a vulnerability comes along that forces us to re-examine assumptions about Linux security. Copy Fail is one of those.

The flaw was not the result of a single careless commit. It emerged from the interaction of several individually reasonable design decisions made over many years: unprivileged access to kernel crypto, zero-copy data movement via splice, and an in-place performance optimization for AEAD operations. Each made sense in isolation. Together, they created a privilege escalation path that went unnoticed for nearly a decade.

For organizations running self-hosted infrastructure, particularly in Europe where data sovereignty requirements often mean you cannot simply hand off responsibility to a hyperscaler’s security team, this is a reminder of what operational security actually demands. It is not enough to deploy infrastructure. You have to maintain it, monitor it, and respond quickly when something like this surfaces.

The window between public disclosure and active exploitation was measured in hours, not days. If your patching process runs on a monthly cycle, your systems could have been exposed to a public, working exploit for weeks.
