Stash

Stash¹ is a library for efficiently storing maps of keys to values when one doesn’t care what the keys are but wants blazing² fast O(1) insertions, deletions, and lookups.

Use cases include file descriptor tables, session tables, or MIO context tables.

Source: Stebalien/stash-rs
Crate: stash
API: rustdoc

If you know what this type of data structure is actually called, please tell me. I’m sure I’m not the first to use it.

Blazing means an order of magnitude faster than hash maps and btree maps.

Example

extern crate stash;
use stash::Stash;

fn main() {
    let mut stash = Stash::new();
    let key1 = stash.put("foo");
    let key2 = stash.put("bar");
    let key3 = stash.put("baz");

    assert_eq!(stash[key1], "foo");
    assert_eq!(stash[key2], "bar");
    assert_eq!(stash[key3], "baz");
    assert_eq!(stash.len(), 3);

    assert_eq!(stash.take(key2), Some("bar"));
    assert_eq!(stash.len(), 2);

    let key4 = stash.put("bin");
    assert_eq!(stash.len(), 3);
    assert_eq!(stash[key4], "bin");
    let mut values: Vec<_> = stash.into_iter().map(|(_, v)| v).collect();
    values.sort();
    assert_eq!(values, vec!["baz", "bin", "foo"]);
}

Variants

There are two variants: Stash and UniqueStash. The difference is that Stash stash uses usize’s for keys but will reuse a key after the associated item is removed from the stash while UniqueStash uses a special Tag (~twice as large) for keys but doesn’t ever re-use keys. That is, the following may fail (well, it will fail in practice but we don’t guarantee that) for a Stash but will never fail for a UniqueStash:

let idx1 = stash.put("test");
stash.take(); // Remove "test"
let idx2 = stash.put("test2");
assert!(idx1 != idx2); // Index reuse.

The trade-off here is really key-size versus key-reuse. Pay for what you use.

Design

Below is a very high-level design explanation. If you really want to understand how stash works under-the-hood, you’ll have to read the source.

Stash

A stash is backed by a single vector where each “slot” is either occupied or vacant. Occupied slots contain values and vacant slots contain the index of the next vacant slot. This way, the vacant slots form a free-list. The stash also includes an index pointing to the first vacant slot (the first vacant slot contains an index pointing to the second vacant slot and so on).

To insert an item, we replace the first slot in the “free-list” with the new item and return the index in the vector at which we inserted the item as the “key”. To remove (or take) an item, we simply simply remove it from the vector and put a new entry in the free-list in its place. To lookup a key, we just index into the vector (keys are indices). This means that insertions (push), deletions (take), and lookups (get) are truly O(1) not O(bits(key)) as they are with HashMaps or O(log(n)) as they are with ordered data structures.

UniqueStash

In the basic Stash, keys are reused because they’re simply indices into the backing vector. In UniqueStash, we fix this by adding a version field to each slot.

Iteration

Unfortunately, iteration is O(capacity), not O(n), because we step over the vacant slots when iterating. In the future, I intend to make this faster for sparse stashes by storing whether or not a slot is occupied in a packed bit-vector. With the help of the CPU cache and some fancy bit-twiddling, this should be much faster.

Benchmarks

Source

BTreeMap

test bench         ... bench:         161 ns/iter (+/- 3)
test insert_delete ... bench:          67 ns/iter (+/- 4)
test iter          ... bench:         230 ns/iter (+/- 2)
test iter_sparse   ... bench:           2 ns/iter (+/- 0)
test lookup        ... bench:           9 ns/iter (+/- 0)

HashMap

test bench         ... bench:         315 ns/iter (+/- 8)
test insert_delete ... bench:          75 ns/iter (+/- 5)
test iter          ... bench:         183 ns/iter (+/- 4)
test iter_sparse   ... bench:          94 ns/iter (+/- 4)
test lookup        ... bench:          23 ns/iter (+/- 0)

Stash

test bench         ... bench:          25 ns/iter (+/- 0)
test insert_delete ... bench:           7 ns/iter (+/- 0)
test iter          ... bench:         123 ns/iter (+/- 3)
test iter_sparse   ... bench:          76 ns/iter (+/- 1)
test lookup        ... bench:           1 ns/iter (+/- 0)

UniqueStash

test bench         ... bench:          27 ns/iter (+/- 0)
test insert_delete ... bench:           8 ns/iter (+/- 0)
test iter          ... bench:         143 ns/iter (+/- 2)
test iter_sparse   ... bench:          90 ns/iter (+/- 1)
test lookup        ... bench:           1 ns/iter (+/- 0)

2016-09-13

Stash is a library for efficiently storing maps of keys to values when one doesn’t care what the keys are but wants blazing fast O(1) insertions, deletions, and lookups.

Use cases include file descriptor tables, session tables, or MIO context tables.

See the project page for more.