Was writing all this stuff in an email, figured I should save it:
=Space efficiency=
As far as memory usage... The btree isn't pinned in memory except for the ~10 nodes in the lru. The pinned memory is all per bucket metadata; you want your bucket size to be equal to the erase block size so it'll be probably 128k or 512k; 128k with 128 gb of cache gives us 1M buckets.
I've got it divided up into a couple structs (partly to reduce padding, cache behavior should be better to, not that that'll matter when disk io is a factor). On 64 bit the total overhead per bucket should be 33 bytes, if I haven't missed anything. Obviously not as good as yours, but still small enough it shouldn't be an issue. The single biggest thing is the heap at a pointer and a long per bucket, if anyone really cares 8 bytes could be saved if they aren't going to have more than 2 billion buckets
The btree size is also important since you want it to be in memory. Keys are 16 bytes and can reference anywhere from one sector to an entire bucket. I don't know of anything that does much IO in sub page sized units, so in practice everything will be page sized granularity, and since you want to cache random IO and ignore sequential IO average key size is ideally going to be on the lower end. It does merge keys when possible, and it keeps multiple buckets open for writing simultaneously so it can keep contiguous or related data together, so average key size is going to depend strongly on workload.
So a full cache will give you 4 mb of keys per 1 gb of cache worst case, if they're all 4k. 128 gb of cache gives half a gig of btree.
I haven't checked yet how space efficient the btree is in practice. Since old keys are removed by garbage collection, we don't have any kind of bound on how much of a specific btree bucket is used by non stale keys. However, garbage collection is performed every so often to find data that's been rewritten (there's other constraints on when we have to gc, but this is the one that we hit in practice), so on average not more than a quarter of the keys in the btree should be stale. (Actually, that's not quite true - right now gc doesn't rewrite leaf nodes except to keep generation counters from rolling over, but checking the percentage of stale keys is trivial, I just hadn't thought of it till now
Then garbage collection can leave buckets less than half full - I haven't implemented btree merging because I don't think it'll really be needed in practice, mostly empty btree buckets will go away eventually for various reasons. But if it does turn out to be an issue it'll be relatively easy to add to garbage collection.
So at a guess I'd expect the btree to be around half space efficient on average. For disk space with 16 bit keys this should be fine. For memory usage, this could actually be improved a lot since btree buckets are used sequentially; once we've read it from disk or created a new one we know what pages don't contain anything yet, and those could be freed until they are needed. That'd make memory usage of the btree bounded and excellent.
The real reason for the btree though was locality of disk writes. Making an index that's fast enough on lookups is easy - cutting down random writes to the bare minimum was my goal. If it helps performance elsewhere, that's just gravy
Further off, checksums are going to be stored in the btree so that'll add a fixed, possibly configurable overhead.