Async Internals
async fn is sugar. That sentence is the entire chapter, expanded.
You probably already know async fn is sugar. You may have seen the trait Future and noticed that async blocks evaluate to something implementing it. You have used .await and watched the function suspend and resume. You have written #[tokio::main] or #[async_std::main] or whatever your runtime provides and gotten on with your life.
This chapter is the part where we don’t get on with our life. We desugar everything, look at what the compiler is generating, and understand why every async lifetime error in the next chapter is the consequence of a mechanical transformation that the compiler does on your behalf and then refuses to apologize for.
What Future actually is
```rust
pub trait Future {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

pub enum Poll<T> {
    Ready(T),
    Pending,
}
```
Three things to notice.
poll takes Pin<&mut Self>, not &mut self. This is the entire reason Pin exists in the standard library, and we will spend chapter 6 on it. For now: Pin<&mut Self> is &mut self plus a promise that the future will not be moved in memory. Some futures need that promise to be sound. Most don’t, but the trait has to assume the strongest constraint.
poll takes a Context<'_>, which contains a Waker. The waker is the future’s link back to the executor. When the future returns Poll::Pending, it is responsible for arranging — somehow — for cx.waker().wake() to be called when the future is ready to make progress. If it doesn’t, the executor will never poll it again, and the future will hang forever. The future is responsible for waking itself. The executor does not poll on a timer.
poll returns Poll<T>, not a result. A future is not “done” or “errored.” A future is “ready with a value” or “still working.” If you want errors, your T is a Result. The future machinery does not know about errors.
That’s the trait: a few lines, each hiding a subtle trap.
The state machine
Now the desugaring. Take this:
```rust
async fn fetch_and_parse(url: &str) -> Result<Data, Error> {
    let body = http_get(url).await?;
    let parsed = parse(&body)?;
    Ok(parsed)
}
```
What the compiler generates is, conceptually:
```rust
fn fetch_and_parse<'a>(url: &'a str) -> impl Future<Output = Result<Data, Error>> + 'a {
    enum State<'a> {
        Start { url: &'a str },
        WaitingForGet { fut: HttpGetFuture<'a> },
        Done,
    }

    struct FetchAndParseFuture<'a> {
        state: State<'a>,
    }

    impl<'a> Future for FetchAndParseFuture<'a> {
        type Output = Result<Data, Error>;

        fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
            loop {
                match &mut self.state {
                    State::Start { url } => {
                        let fut = http_get(url);
                        self.state = State::WaitingForGet { fut };
                    }
                    State::WaitingForGet { fut } => {
                        // This Pin projection is the unsafe part; pin-project handles it for you.
                        let pinned_fut = unsafe { Pin::new_unchecked(fut) };
                        match pinned_fut.poll(cx) {
                            Poll::Pending => return Poll::Pending,
                            Poll::Ready(Err(e)) => {
                                self.state = State::Done;
                                return Poll::Ready(Err(e));
                            }
                            Poll::Ready(Ok(body)) => {
                                let parsed = parse(&body);
                                self.state = State::Done;
                                return Poll::Ready(parsed);
                            }
                        }
                    }
                    State::Done => panic!("polled after completion"),
                }
            }
        }
    }

    FetchAndParseFuture { state: State::Start { url } }
}
```
This is roughly what async fn produces. The actual generated code is more efficient, more carefully structured, and uses internals that aren’t public, but the shape is this: an enum with a variant per await point, plus a fixed initial state and a terminal “done” state, plus a poll method that drives state transitions.
A few things become legible.
The await is a match on poll. When you write expr.await, the compiler generates a state transition: enter the “waiting” state with expr as the inner future, then loop polling it; when it returns Pending, return Pending; when it returns Ready(v), take v and continue.
The state machine contains the inner futures. When you let fut = http_get(url).await, the HttpGetFuture you got back from http_get is stored as a field of the outer state machine for as long as you’re awaiting it. This is the first hint at why Pin matters: that inner future has its own internal pointers, possibly into its own state, and those pointers would dangle if the outer state machine moved in memory.
The state machine borrows from its arguments. Notice the 'a lifetime parameter. The future returned by fetch_and_parse(url) borrows from url for its entire life. If url goes out of scope before the future completes, the future is invalid. The lifetime 'a is part of the future’s type and gets propagated through every level of nesting. This is the main source of async lifetime pain in chapter 5.
Local variables become enum fields. Any local that is alive across an await point gets stored in the state machine. This is why holding references across await points is the central source of async lifetime errors: the reference, and the data it borrows, both have to be live for as long as the future is, and the type system has to prove that.
What poll actually does
The executor’s job is to call poll on a future repeatedly until it returns Ready. That is the entire executor contract. The runtime is just code that calls poll. There is no magic in tokio or async-std or smol; they are libraries with executors, schedulers, and reactors that arrange for poll to be called at the right times.
When poll returns Pending, the future has registered its Waker somewhere — with the OS (epoll, kqueue, IOCP), with a timer wheel, with a channel — and that registration will eventually trigger a call to waker.wake(), which tells the executor “this future is ready, please call poll again.” The executor, when it gets that signal, schedules the future to be polled. It does not poll immediately; it puts the future into a runqueue. Eventually, some thread the executor controls picks the future up and polls it again.
This is cooperative scheduling. Nothing preempts a future. If your future runs a tight CPU-bound loop without any .await, it will run that loop to completion on the executor thread, blocking every other future scheduled on that thread. This is the “blocking the executor” problem and the reason tokio::task::spawn_blocking exists — to move CPU-bound work off the async threadpool and onto a worker pool that the executor doesn’t depend on.
The flip side: an await is a yield point. When your code reaches await, it gives the executor an opportunity to run something else. Between await points, your code runs synchronously and uninterruptibly on the executor thread. This has two consequences:
- Code between await points is de facto atomic from the executor’s perspective. If you take a MutexGuard and don’t await while holding it, no other future on this thread can take the same lock. (Other threads can; this isn’t a real critical section.)
- Code between await points is not atomic from a multi-threaded perspective. If your future is Send and the executor is multi-threaded, the future can be moved between threads at every await. This means await points are also where data races can sneak in if your shared state isn’t properly synchronized.
The single most important mental model in async Rust is: await is where things happen. Before await, you have set up state. After await, you have new state. At await, the world might change underneath you. Schedule, suspend, race, drop — all of these are at await. Code between awaits is boring and synchronous.
Why a runtime is required
Rust’s standard library defines Future, Poll, Context, and Waker. It does not provide an executor. There is no std::async::run_until_complete; nothing in std will ever call poll for you. To run a future, you must bring an executor, either from a runtime crate or one you write yourself.
This is a deliberate design choice. The reasons:
- Executor strategy is a deep technical decision. Should it be single-threaded or multi-threaded? Work-stealing or fixed assignment? Should I/O be epoll or io_uring or completion-based? Should there be priority queues? These are valid questions with conflicting answers, and standardizing one would foreclose the others.
- Tying async to a single executor would freeze its evolution. The async ecosystem moved fast in 2018-2021 and is still moving. Being able to swap tokio for async-std for smol for whatever comes next has been important to that motion, even if in practice 90% of production code uses tokio.
- The standard library has very strong stability guarantees. Putting an executor in std would lock its design forever. The Rust project chose not to.
The cost is that “hello world” in async Rust is tokio = "1" or async-std = "1" in your Cargo.toml, plus a #[tokio::main] macro on your main. It is, by some lights, embarrassing that this is the case. It is also the reason you can run async Rust on bare metal, in a kernel, in a microcontroller, in a browser via WebAssembly — wherever an executor can be written, async Rust can run.
Why await desugars to a state machine
This is worth one more pass, because the consequences are everywhere.
The naive way to implement await is with threads: give each task its own stack, and have await park the task until it can make progress. A runtime can then multiplex many such tasks onto fewer OS threads by cooperatively switching stacks. This is the “stackful coroutine” or “green thread” model. Go uses it. Erlang uses it. It works.
Rust does not use it. Rust uses stackless coroutines, which is what the state machine desugaring produces. The differences:
- A stackful coroutine has its own stack, allocated separately, with a fixed maximum size. Switching between coroutines means switching stacks, which involves an actual context switch.
- A stackless coroutine has no stack of its own. Its “stack” is a fixed-size struct containing exactly the local variables that are live across yield points. Switching is a function return.
The stackless model has two big wins: it is much cheaper (no separate stack allocation, no context switch) and it is composable (a future is just a value of some type implementing Future; you can pass it around, store it in a struct, await it later, drop it without running it). The cost is that the entire coroutine has to fit in a fixed-size struct that the compiler computes at compile time, which means the compiler has to know, ahead of time, what every local variable’s type is. Including the types of nested futures, which are themselves state machines, which are themselves structs containing their locals’ types.
This is why your async function’s future type is enormous, deeply nested, and has a name that is both unspellable and visible in error messages. It is also why error messages mention sizes — sometimes futures get too big to fit on the stack, and you have to Box::pin them to move them to the heap.
What this chapter set up
You now have:
- A model of Future as “a thing the executor calls poll on.”
- A model of await as “yield to the executor, with the inner future’s state stored in the outer state machine.”
- A model of async fn as “a compiler-generated state machine that borrows from its arguments and stores its locals across yield points.”
- A model of the executor as “code that calls poll and arranges for Wakers to schedule subsequent polls.”
That model is enough to understand the next chapter, which is about what happens when the things you store across yield points have lifetimes, and about why Send-ness propagates through the entire state machine in ways that surprise you.
Sources
- The Asynchronous Programming in Rust book — especially chapter 2, “Under the Hood.”
- Aaron Turon’s Zero-cost futures in Rust (2016), which laid out the design that became today’s async.
- Withoutboats’s posts on Pin and the future of async, which are unfortunately scattered but collectively the best explanation of why the design landed where it did.
- The original futures RFC (RFC 2592) and the async/await RFC (RFC 2394).