Bcache does practically everything asynchronously, so in the process of
writing it I implemented a generic mechanism that was the culmination of
some old ideas I had. I don't know who else would need them, but they
are rather nifty so perhaps I ought to try and document them.

They're called closures, because they are almost but not entirely unlike
real closures:

struct closure;
closure_init(struct closure *cl, struct closure *parent);
closure_get(struct closure *cl);
closure_put(struct closure *cl, struct workqueue_struct *wq);

As you might've guessed they embed a refcount. This refcount corresponds
to the number of things the closure is waiting on; when it hits 0,
Something Happens. Closure_init() sets it to 1.

Why 1? Because a closure corresponds to an execution context; that is,
they can be running (or not running) and when they are running, the
closure itself is waiting on whoever's running it to finish.

Some examples should make this clearer. First, let's do some io
synchronously:

bio_submit_c(bio1, cl);
bio_submit_c(bio2, cl);
closure_sync(cl);

Closure_sync() sleeps until the refcount hits 1 - when it returns, we're
not waiting on anything (except for ourselves).

The asynchronous case is where it gets interesting:

bio_submit_c(bio1, cl);
bio_submit_c(bio2, cl);
return_f(cl, next_fn);

return_f() sets the next function to run, calls closure_put() and
returns (Yes, it's a macro that returns the calling function).

The end result is that next_fn() will be called when everything the
closure was waiting on has finished, with the closure itself as the
single argument.

That's why you have a refcount when you're running a closure - otherwise
next_fn() could run while your're still running.

And what's that good for? Well, sometimes you can't just sleep; if
you're waiting on IO but you were called from generic_make_request(),
that IO is never going to start until you return. Or if you're running
out of a workqueue, you're blocking other work items until you return.

The rest of the interface is for waiting on stuff.  If you want to wait
on something - say a bio to complete - you could stick a pointer to your
closure somewhere so the bio_endio() function can call closure_put() on
it. But if you have some event that many closures could be waiting on,
you need a list:

typedef closure_list_t;
bool closure_wait(closure_list_t *list, struct closure *cl);
void closure_run_wait(closure_list_t *list, struct workqueue_struct *wq);
closure_wait_on(list, wq, c, condition)

A closure can be on one wait list at a time (as they're implemented as a
singly linked list, with the next pointer part of the closure). A wait
list should correspond to an event - say, in bcache I've got one for
when the journal fills up so whoever can wait until there's room again.

closure_wait() will call closure_get(), then add the closure to the
waitlist - or return false if the closure was already on a waitlist.

closure_run_wait() will run everything on a given waitlist, and run it
out of the workqueue you pass it.

They're both lockless, the caller doesn't have to care about
synchronization, and they can both be run from hardirq context.

closure_wait_on() is a macro that waits on arbitrary events (much like
wait_event(). If your closure was in blocking mode, it additionally
sleeps until the event is true - otherwise it just adds you to the wait
list, and returns whether or not the condition you were waiting on was
true.

There's also closure_wait_on_async() which always waits asynchoronously
- maybe you have to check the condition in a spinlock.

Lastly, at the start the prototype for closure_init() took a second
argument...

Closures can wait on other closures - they implement a spaghetti stack.
I will leave the implications of that to the imagination.

*****************

Closures are meant to be lightweight and fast - they're 48 bytes on 64
bit machines, including the embedded workqueue.

They're also meant to be embedded into other structures, as is commonly
done in the kernel. You can (and should) think of that structure as a
stack frame; variables in your struct corresponding to local variables
and the struct closure containing what corresponds to the return
address.

I mentioned that closures can wait on other closures; more precisely, a
closure can be a child of another closure - in which case, it holds a
refcount on the parent.

That refcount is taken when you pass a non null parent to closure_init()
and it's dropped when a closure goes away. I don't recommend using the
parent field to do anything more interesting - reseating it or
manipulating the refcount directly.

A closure goes away when cl->remaining hits 0 and there's no next
function to run. (Conceptually - it doesn't free itself. If you want it
to free itself, make a destructor the last function it runs.)

Previously I mentioned return_f(), but I didn't explain what it really
means. Remember that return_f() takes a closure and the next function to
run - and the closure itself represents a stack frame complete with
return address.

Then return_f() is really a tailcall that also passes the return
address: i.e. return_f() is a function call in continuation passing
style.

Other random bits:
I mentioned closures can be in blocking mode. What's this good for?

I'll give an example from bcache. When we traverse the btree, we call
get_bucket() to get a btree node - looking it up in the cache, locking
it, and potentially reading it in from disk.

If we're doing a cache lookup for a read, we're processing a bio out of
generic_make_request() - which means if we had to read a btree node in
from disk we can't just sleep in get_bucket(). So get_bucket() adds us
to the btree node's wait list, starts the IO and returns -EAGAIN;
request_read() then calls return_f(cl, __request_read) and when it runs
again the btree node will be in memory.

However, if we're inserting into the b-tree we don't want to just bail
out, unlocking the b-tree nodes we were working on if we need to block
on something; insertions run out of a dedicated workqueue so they can
sleep. So insertions run in blocking mode, and when get_bucket() (and
more importantly, btree_alloc()) call closure_wait_on() it magically
does the right thing.

BcacheWiki: ClosureDocs (last edited 2011-08-01 03:37:47 by Kent)