lsr: ls with io_uring | slacker news

Explicit Vulnerabilities (Documented CVEs and Exploits)

These are actual discovered vulnerabilities, typically assigned CVEs and often exploited in sandbox escapes or privilege escalations:

1. CVE-2021-3491 (introduced in kernel 5.7; fixed in 5.12.4, 5.11.21, and 5.10.37)

    Type: Heap overflow (privilege escalation)

    Mechanism: IORING_OP_PROVIDE_BUFFERS did not enforce the MAX_RW_COUNT limit on buffer lengths, so an oversized length could end up as a negative value in mem_rw when reading /proc/<PID>/mem.

    Impact: Kernel heap overflow, leading to arbitrary code execution in the kernel.
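To make the CVE-2021-3491 length problem concrete: the published fix truncates PROVIDE_BUFFERS lengths to MAX_RW_COUNT, which the kernel defines as INT_MAX & PAGE_MASK. Below is a minimal sketch of just that arithmetic, assuming 4 KiB pages. The helper names as_c_int and clamp_len are mine for illustration, not kernel functions.

```python
import ctypes

PAGE_SIZE = 4096                     # assumption: 4 KiB pages
INT_MAX = 2**31 - 1
PAGE_MASK = ~(PAGE_SIZE - 1)
MAX_RW_COUNT = INT_MAX & PAGE_MASK   # 0x7ffff000 with 4 KiB pages


def as_c_int(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit C int."""
    return ctypes.c_int32(value).value


def clamp_len(requested: int) -> int:
    """Hypothetical helper: truncate a user-supplied length to MAX_RW_COUNT."""
    return min(requested, MAX_RW_COUNT)


if __name__ == "__main__":
    oversized = 0x80000000  # one past INT_MAX, as a user-chosen buffer length
    print("unclamped, seen as int:", as_c_int(oversized))
    print("clamped, seen as int:  ", as_c_int(clamp_len(oversized)))
```

The unclamped value turns negative once it lands in a signed int, which is the kind of value the CVE description says reached mem_rw. The clamped value stays positive.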

2. CVE-2022-29582

    Type: UAF (Use-After-Free)

    Mechanism: A race condition in the io_uring timeout code (kernels before 5.17.3) allowed a request to be freed while it was still in use.

    Impact: Local privilege escalation.

3. CVE-2023-2598

    Type: Out-of-bounds access

    Mechanism: A flaw in the fixed buffer registration code (io_sqe_buffer_register) allowed access to physical memory beyond the end of the registered buffer.

    Impact: Full local privilege escalation.

4. CVE-2022-2602, CVE-2022-1116, etc.

    Type: UAF (CVE-2022-2602) and integer overflow leading to memory corruption (CVE-2022-1116)

    Impact: Escalation from containers or sandboxed processes.

5. Exploit Tooling:

    Tools like io_uring_shock and custom kernel exploits often target io_uring in container escape scenarios (esp. with Docker or LXC).

Implicit Vulnerabilities (Architectural and Latent Risks)

These are not necessarily exploitable today, but reflect deeper systemic design risks or assumptions.

 1. Shared Memory Abuse

    io_uring uses shared rings (memory-mapped via mmap) between kernel and user space.

    Risk: If ring buffer memory management has reference count bugs, attackers could force races, data corruption, or misuse stale pointers.
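For readers who have not looked at the ring layout: each ring is a power-of-two array plus free-running head and tail counters that both sides read and update. Here is a toy single-process model of that indexing, assuming 32-bit counters and a mask of entries - 1. ToyRing is a made-up name and this is not the kernel's code; it only shows the bookkeeping that both sides have to agree on.

```python
U32 = 0xFFFFFFFF  # counters are treated as free-running unsigned 32-bit values


class ToyRing:
    """Toy model of a power-of-two ring with head (consumer) and tail (producer)."""

    def __init__(self, entries: int) -> None:
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.slots = [None] * entries
        self.head = 0  # advanced by the consumer
        self.tail = 0  # advanced by the producer

    def push(self, item) -> bool:
        # Full when the producer is exactly one ring length ahead of the consumer.
        if (self.tail - self.head) & U32 == len(self.slots):
            return False
        self.slots[self.tail & self.mask] = item
        self.tail = (self.tail + 1) & U32
        return True

    def pop(self):
        # Empty when both counters are equal.
        if self.head == self.tail:
            return None
        item = self.slots[self.head & self.mask]
        self.head = (self.head + 1) & U32
        return item


if __name__ == "__main__":
    ring = ToyRing(4)
    for op in ("read", "write", "send", "recv", "one-too-many"):
        print(op, "accepted" if ring.push(op) else "rejected (ring full)")
    print("consumed:", ring.pop())
```

In the real thing the two sides are user space and the kernel, and neither can trust the other's counter. Every index and length taken from the shared mapping has to be re-validated on the kernel side, which is where the risk above comes from.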

 2. User-Controlled Kernel Pointers

    Some features let user space hand the kernel buffer addresses and lengths through SQEs (e.g. via IORING_OP_PROVIDE_BUFFERS) or post completions into another ring (IORING_OP_MSG_RING).

    Risk: Incomplete validation could allow crafting fake kernel structures or triggering speculative attacks.

 3. Speculative Execution & Side Channels

    Since io_uring relies on pre-submitted work queues and long-lived kernel threads, it opens timing side channels.

    Risk: Predictable scheduling or timing leaks, esp. combined with hardware speculation (Spectre-class).

 4. Bypassing seccomp or AppArmor Filters

    io_uring operations can effectively batch or obscure syscall behavior.

    Example: A program restricted from calling sendmsg() directly might still use io_uring to perform similar actions.

    Risk: Policy enforcement tools become less effective, requiring explicit io_uring filtering.
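As a rough illustration of what "explicit io_uring filtering" can look like: in an allowlist-style seccomp profile (the JSON shape Docker accepts, with defaultAction set to SCMP_ACT_ERRNO), dropping the three io_uring syscalls from every allow rule closes the side door. This is a sketch under that assumption. strip_io_uring and the example profile are hypothetical, and a real profile has many more entries.

```python
import copy
import json

IO_URING_SYSCALLS = frozenset(
    {"io_uring_setup", "io_uring_enter", "io_uring_register"}
)


def strip_io_uring(profile: dict) -> dict:
    """Return a copy of an allowlist profile with io_uring syscalls removed."""
    hardened = copy.deepcopy(profile)
    kept_rules = []
    for rule in hardened.get("syscalls", []):
        rule["names"] = [
            name for name in rule.get("names", []) if name not in IO_URING_SYSCALLS
        ]
        if rule["names"]:  # drop rules that became empty
            kept_rules.append(rule)
    hardened["syscalls"] = kept_rules
    return hardened


# Hypothetical profile: sendmsg is not allowed, but io_uring is,
# so IORING_OP_SENDMSG would still be reachable.
example_profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write", "io_uring_enter"], "action": "SCMP_ACT_ALLOW"},
        {"names": ["io_uring_setup", "io_uring_register"], "action": "SCMP_ACT_ALLOW"},
    ],
}

if __name__ == "__main__":
    print(json.dumps(strip_io_uring(example_profile), indent=2))
```

With io_uring_setup denied, a sandboxed process cannot create a ring in the first place. A ring file descriptor inherited from outside the sandbox is why io_uring_enter and io_uring_register are removed as well.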

 5. Poor Auditability

    The batched and asynchronous nature makes logging or syscall audit trails incomplete or confusing.

    Risk: Harder for defenders or monitoring tools to track intent or detect misuse in real time.

 6. Ring Reuse + Threaded Offload

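    With IORING_SETUP_SQPOLL, submissions are consumed by a dedicated kernel thread, and blocking requests can be punted to io-wq worker threads, so I/O can run detached from the submitting thread's context. (IORING_SETUP_IOPOLL only switches completions to busy-polling; it does not create a thread.)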

    Risk: Desynchronized security context can lead to privileged operations escaping sandbox context (e.g., post-chroot but pre-fork).

 7. File Descriptor Reuse and Lifecycle Mismatch

    Some operations in io_uring rely on registered (fixed) files rather than ordinary file descriptors. Races with FD reuse or closing can leave the kernel's view and the program's view of a file out of sync.

    Risk: UAF, type confusion, or logic bugs triggered by kernel state confusion.
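The FD reuse part is easy to see from plain user space, no io_uring needed: POSIX hands out the lowest free descriptor number, so a number that was just closed is immediately recycled for an unrelated file. A small sketch follows; fd_number_is_reused is a name I made up, and it assumes nothing else in the process opens files concurrently.

```python
import os


def fd_number_is_reused() -> bool:
    """Close a descriptor, open an unrelated file, and compare the numbers."""
    read_end, write_end = os.pipe()
    os.close(read_end)                          # read_end is now a stale number
    new_fd = os.open(os.devnull, os.O_RDONLY)   # unrelated file
    try:
        # The lowest free number is the one that was just closed.
        return new_fd == read_end
    finally:
        os.close(new_fd)
        os.close(write_end)


if __name__ == "__main__":
    print("stale fd number now names a different file:", fd_number_is_reused())
```

Anything that cached the old number now silently refers to a different file. The in-kernel versions of this mistake are what turn into the UAF and type confusion bugs listed above.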

 Emerging Threat Vectors
 eBPF + io_uring

    Some exploits chain io_uring with eBPF to do arbitrary memory reads or writes, e.g. using io_uring to perform controlled allocations, then eBPF to read or write memory.

 io_uring + userfaultfd

    Combining userfaultfd with io_uring allows very fine-grained control over page faults during I/O: great for fuzzing, and also for exploit primitives.