
Linux gives you two primary mechanisms for handling asynchronous I/O: epoll, the veteran, and io_uring, the newcomer. If you are building anything that handles many concurrent connections, understanding the difference between these two systems matters for your performance ceiling.
Why Asynchronous I/O Exists
Every time your application reads from or writes to a network socket, a file, or any other descriptor, it makes a system call, and the CPU has to switch from user mode to kernel mode and back. Each transition takes time. When you have thousands of connections, those transitions add up fast and eat into your throughput.
Asynchronous I/O solves this by batching operations and reducing the number of times your code has to cross into kernel territory. The two Linux implementations take fundamentally different approaches to this problem.
Epoll: The Readiness Model
Epoll has been the standard since it landed in the Linux kernel in 2002 (kernel 2.5.44). It replaced the older select() and poll() calls, which scale poorly because the kernel has to scan the entire set of file descriptors on every call.
The way epoll works is straightforward: you register file descriptors with an epoll instance, and the kernel tells you when any of them are ready for reading or writing. But here is the catch: epoll only tells you that I/O is possible, not that it is done. Your application still has to issue separate read() or write() syscalls to actually move the data.
In practice, this means two syscalls per I/O event in the simplest case: one epoll_wait() to learn that a descriptor is ready, and one read() or write() to move the data. Each of these syscalls is a round trip into the kernel. At scale, with tens of thousands of active connections, that per-operation overhead becomes the bottleneck.
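The two-step pattern can be sketched with Python's standard-library wrapper, select.epoll (Linux only), whose poll() method wraps epoll_wait(). This is a minimal illustration rather than a server: the socket pair stands in for a client connection, and the names read_when_ready and epoll_demo are hypothetical.

```python
import select
import socket


def read_when_ready(sock, epoll, timeout=1.0):
    """Wait for readiness, then issue the separate read that epoll requires."""
    # Step 1: epoll_wait() only reports that a read would not block.
    for fd, mask in epoll.poll(timeout):
        if fd == sock.fileno() and mask & select.EPOLLIN:
            # Step 2: a second syscall actually moves the data.
            return sock.recv(4096)
    return None  # nothing became readable before the timeout


def epoll_demo():
    # Assumption: a local socket pair stands in for a real client connection.
    a, b = socket.socketpair()
    epoll = select.epoll()
    try:
        b.setblocking(False)
        epoll.register(b.fileno(), select.EPOLLIN)
        a.sendall(b"ping")
        return read_when_ready(b, epoll)
    finally:
        epoll.close()
        a.close()
        b.close()
```

Calling epoll_demo() returns the four bytes sent through the pair. The point to notice is that the data arrives from recv(), not from poll(): the readiness notification and the transfer are separate trips into the kernel.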
Io_uring: The Completion Model
Io_uring appeared in the Linux kernel in 2019 (version 5.1), about 17 years after epoll. It takes a fundamentally different approach: instead of telling you when I/O is possible, it tells you when I/O is complete.
The mechanism uses two ring buffers shared between user space and the kernel: a submission queue and a completion queue. Your application writes I/O requests to the submission queue, and the kernel writes completed results to the completion queue. Queueing a request or reading a result is a memory access, not a syscall, so there is no per-operation kernel crossing during steady state.
By default, you still need to call io_uring_enter() to tell the kernel to process a batch of submissions and collect completions. But one call handles an entire batch, not a single operation. For truly zero-syscall operation, the IORING_SETUP_SQPOLL flag spins up a dedicated kernel thread that polls the submission queue continuously, at the cost of burning one CPU core.
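The submit, enter, reap cycle can be modeled in a few lines. To be clear, this is a toy simulation in plain Python, not the io_uring or liburing API: the class ToyRing and its methods are hypothetical, the "kernel" is an ordinary function call, and the only thing it tracks faithfully is how many kernel crossings a batch costs.

```python
from collections import deque


class ToyRing:
    """Toy model of a submission/completion queue pair (not the real API)."""

    def __init__(self):
        self.sq = deque()     # submission queue: requests from the application
        self.cq = deque()     # completion queue: results from the "kernel"
        self.enter_calls = 0  # simulated kernel crossings

    def submit(self, user_data, operation):
        # Queueing a request is a plain memory write; no crossing happens here.
        self.sq.append((user_data, operation))

    def enter(self):
        # Stands in for io_uring_enter(): one crossing processes the whole batch.
        self.enter_calls += 1
        while self.sq:
            user_data, operation = self.sq.popleft()
            self.cq.append((user_data, operation()))

    def reap(self):
        # Each completion carries a finished result, not a readiness hint.
        done = list(self.cq)
        self.cq.clear()
        return done


def ring_demo():
    ring = ToyRing()
    for i in range(3):
        # The default argument pins the current value of i for each request.
        ring.submit(i, lambda i=i: f"payload-{i}")
    ring.enter()
    return ring.enter_calls, ring.reap()
```

In this model, ring_demo() queues three requests, crosses into the simulated kernel once, and gets back three finished results tagged with the user_data values 0, 1, and 2.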
Epoll vs io_uring: The Practical Differences
Architecture: epoll is a readiness model (tells you when you can do I/O), io_uring is a completion model (tells you when I/O is done). This single difference drives most of the performance gap.
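Syscall overhead: With epoll, you pay one epoll_wait() per wakeup plus one read() or write() per ready descriptor, which approaches two syscalls per I/O event when each wakeup reports a single event. With io_uring, you pay one syscall per batch. With SQPOLL, you pay effectively zero during steady-state operation.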
Batching: Epoll can return multiple events in one epoll_wait() call, but you still have to process each one individually with separate syscalls. Io_uring lets you submit and complete entire batches in one shot.
Kernel support: Epoll works on any Linux kernel from 2.5.44 onward. Io_uring requires kernel 5.1 (released in 2019) or newer, and several of its networking operations only arrived in later 5.x releases. Most production systems now run kernels new enough for io_uring, but embedded or older enterprise distributions may not.
Complexity: Epoll’s API is simpler. You create an instance, add file descriptors, and wait for events. Io_uring requires setting up shared memory rings, managing submission and completion queue entries, and dealing with more configuration options. The liburing helper library reduces this complexity but does not eliminate it.
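The syscall-overhead and batching points above come down to simple counting, which the sketch below makes explicit. It is a back-of-the-envelope model under stated assumptions, not a benchmark: the function names and parameters are hypothetical, every event is assumed to need exactly one read() or write(), and SQPOLL is modeled as zero syscalls to match the steady-state description.

```python
import math


def epoll_syscalls(events, events_per_wait=1):
    """One epoll_wait() per wakeup plus one read()/write() per ready event."""
    waits = math.ceil(events / events_per_wait)
    return waits + events


def io_uring_syscalls(events, batch_size, sqpoll=False):
    """One io_uring_enter() per batch; modeled as zero with SQPOLL."""
    if sqpoll:
        return 0
    return math.ceil(events / batch_size)
```

For 10,000 events, epoll_syscalls(10000) gives 20,000 when every wakeup reports one event, and epoll_syscalls(10000, events_per_wait=32) gives 10,313, because batching the notifications does not remove the per-event read or write. Under the same assumptions, io_uring_syscalls(10000, batch_size=32) gives 313.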
When to Use Each
Use epoll when: your target environment has an old kernel, your application has modest connection counts (under a few thousand), or you want the simplest possible asynchronous I/O code. Most existing Linux network servers use epoll, and the ecosystem of tools and documentation around it is mature.
Use io_uring when: you are building a high-performance server handling tens of thousands of concurrent connections, you need file I/O (where io_uring also shines by allowing async reads and writes without threads), or you are starting a new project on modern Linux. The performance gains over epoll can be 2-3x for I/O-heavy workloads, depending on the specific use case.
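If the kernel version is the deciding factor, the check can be automated at startup. The sketch below is one possible approach, with hypothetical names (pick_backend, kernel_version): it compares the running kernel's release string against 5.1, the first release with io_uring. A new enough version number is only a first filter, since io_uring can still be unavailable on a given system, for example when it is blocked by a sandbox or disabled by policy.

```python
import platform
import re

MIN_IO_URING_KERNEL = (5, 1)  # first kernel release that shipped io_uring


def kernel_version(release):
    """Extract (major, minor) from a release string like '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def pick_backend(release=None):
    """Return 'io_uring' if the kernel is new enough, otherwise 'epoll'."""
    if release is None:
        release = platform.release()  # release string of the running kernel
    if kernel_version(release) >= MIN_IO_URING_KERNEL:
        return "io_uring"
    return "epoll"
```

A production check would more likely try to create a ring and fall back to epoll on failure, which tests the capability directly instead of inferring it from the version.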
Real-World Performance Numbers
The author of the original comparison tested both approaches by building a reverse proxy server called TinyGate. The epoll-based version showed a dramatic improvement over a naive threaded approach, but still lost to nginx and haproxy in benchmarks. Switching to io_uring required a rewrite from scratch, but the resulting version approached the performance of those established tools.
In benchmarks from the io_uring project itself, the SQPOLL mode showed up to 3x throughput improvements over epoll for network I/O at high connection counts, with lower tail latency due to fewer context switches.
The Tradeoff
Io_uring is more powerful but harder to use correctly. The ring buffer management, memory ordering requirements, and error handling across asynchronous boundaries make it significantly more complex than epoll. The liburing library helps, but you still need to understand the underlying mechanics to avoid subtle bugs.
For most developers building Linux servers in 2026, epoll remains the default choice for network I/O. But if you are pushing for maximum performance, especially in proxy servers, load balancers, or high-frequency trading systems, io_uring is the tool that gets you there.
