Stash
Stash1 is a library for efficiently storing maps of keys to values when one
doesn't care what the keys are but wants blazing2 fast O(1)
insertions,
deletions, and lookups.
Use cases include file descriptor tables, session tables, or MIO context tables.
- Source: Stebalien/stash-rs
- Crate: stash
- API: rustdoc
If you know what this type of data structure is actually called, please tell me. I'm sure I'm not the first to use it.
Blazing means an order of magnitude faster than hash maps and btree maps.
Example
extern crate stash;
use stash::Stash;
fn main() {
let mut stash = Stash::new();
let key1 = stash.put("foo");
let key2 = stash.put("bar");
let key3 = stash.put("baz");
assert_eq!(stash[key1], "foo");
assert_eq!(stash[key2], "bar");
assert_eq!(stash[key3], "baz");
assert_eq!(stash.len(), 3);
assert_eq!(stash.take(key2), Some("bar"));
assert_eq!(stash.len(), 2);
let key4 = stash.put("bin");
assert_eq!(stash.len(), 3);
assert_eq!(stash[key4], "bin");
let mut values: Vec<_> = stash.into_iter().map(|(_, v)| v).collect();
values.sort();
assert_eq!(values, vec!["baz", "bin", "foo"]);
}
Variants
There are two variants: Stash
and UniqueStash
. The difference is that
Stash
stash uses usize
's for keys but will reuse a key after the associated
item is removed from the stash while UniqueStash
uses a special Tag
(~twice
as large) for keys but doesn't ever re-use keys. That is, the following may
fail (well, it will fail in practice but we don't guarantee that) for a
Stash
but will never fail for a UniqueStash
:
let idx1 = stash.put("test");
stash.take(); // Remove "test"
let idx2 = stash.put("test2");
assert!(idx1 != idx2); // Index reuse.
The trade-off here is really key-size versus key-reuse. Pay for what you use.
Design
Below is a very high-level design explanation. If you really want to understand how stash works under-the-hood, you'll have to read the source.
Stash
A stash is backed by a single vector where each "slot" is either occupied or vacant. Occupied slots contain values and vacant slots contain the index of the next vacant slot. This way, the vacant slots form a free-list. The stash also includes an index pointing to the first vacant slot (the first vacant slot contains an index pointing to the second vacant slot and so on).
To insert an item, we replace the first slot in the "free-list" with the new
item and return the index in the vector at which we inserted the item as the
"key". To remove (or take) an item, we simply simply remove it from the vector
and put a new entry in the free-list in its place. To lookup a key, we just
index into the vector (keys are indices). This means that insertions (push
),
deletions (take
), and lookups (get
) are truly O(1)
not O(bits(key))
as
they are with HashMaps or O(log(n))
as they are with ordered data structures.
UniqueStash
In the basic Stash
, keys are reused because they're simply indices into the
backing vector. In UniqueStash
, we fix this by adding a version field to each
slot.
Iteration
Unfortunately, iteration is O(capacity)
, not O(n)
, because we step over the
vacant slots when iterating. In the future, I intend to make this faster for
sparse stashes by storing whether or not a slot is occupied in a packed
bit-vector. With the help of the CPU cache and some fancy bit-twiddling, this
should be much faster.
Benchmarks
BTreeMap
test bench ... bench: 161 ns/iter (+/- 3)
test insert_delete ... bench: 67 ns/iter (+/- 4)
test iter ... bench: 230 ns/iter (+/- 2)
test iter_sparse ... bench: 2 ns/iter (+/- 0)
test lookup ... bench: 9 ns/iter (+/- 0)
HashMap
test bench ... bench: 315 ns/iter (+/- 8)
test insert_delete ... bench: 75 ns/iter (+/- 5)
test iter ... bench: 183 ns/iter (+/- 4)
test iter_sparse ... bench: 94 ns/iter (+/- 4)
test lookup ... bench: 23 ns/iter (+/- 0)
Stash
test bench ... bench: 25 ns/iter (+/- 0)
test insert_delete ... bench: 7 ns/iter (+/- 0)
test iter ... bench: 123 ns/iter (+/- 3)
test iter_sparse ... bench: 76 ns/iter (+/- 1)
test lookup ... bench: 1 ns/iter (+/- 0)
UniqueStash
test bench ... bench: 27 ns/iter (+/- 0)
test insert_delete ... bench: 8 ns/iter (+/- 0)
test iter ... bench: 143 ns/iter (+/- 2)
test iter_sparse ... bench: 90 ns/iter (+/- 1)
test lookup ... bench: 1 ns/iter (+/- 0)
Stash
Stash is a library for efficiently storing maps of keys to values when one
doesn't care what the keys are but wants blazing fast O(1)
insertions,
deletions, and lookups.
Use cases include file descriptor tables, session tables, or MIO context tables.
See the project page for more.