Add RawTable::vacuum to clean up DELETED entries #255
Conversation
This cleans the table to ensure the maximum usable capacity in its current allocation, rehashing items that need to move around previously-deleted entries.
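To illustrate the idea, here is a deliberately tiny toy model (not the PR's implementation, and not hashbrown's data layout): an open-addressing table whose slots are EMPTY, FULL, or DELETED (a tombstone), with a `vacuum` that rebuilds the slots inside the same fixed-size storage. For brevity the toy copies the live keys into a temporary buffer before reinserting, whereas the proposed `RawTable::vacuum` rehashes items in place within the current allocation.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Slot {
    Empty,
    Deleted,
    Full(u64),
}

struct ToyTable {
    slots: Vec<Slot>, // fixed size, never reallocated after construction
}

impl ToyTable {
    fn new(buckets: usize) -> Self {
        ToyTable { slots: vec![Slot::Empty; buckets] }
    }

    // Linear probing stops at Empty; Deleted slots must be probed past,
    // which is why tombstones eat into the usable capacity.
    fn insert(&mut self, key: u64) {
        let n = self.slots.len();
        let start = (key as usize) % n;
        for i in 0..n {
            let idx = (start + i) % n;
            match self.slots[idx] {
                Slot::Full(k) if k == key => return,
                Slot::Full(_) => continue,
                Slot::Empty | Slot::Deleted => {
                    self.slots[idx] = Slot::Full(key);
                    return;
                }
            }
        }
    }

    fn remove(&mut self, key: u64) {
        let n = self.slots.len();
        let start = (key as usize) % n;
        for i in 0..n {
            let idx = (start + i) % n;
            match self.slots[idx] {
                Slot::Full(k) if k == key => {
                    // Leave a tombstone so later lookups keep probing.
                    self.slots[idx] = Slot::Deleted;
                    return;
                }
                Slot::Empty => return,
                _ => continue,
            }
        }
    }

    // "Vacuum": re-place every live item within the same storage so that
    // no Deleted markers remain.
    fn vacuum(&mut self) {
        let live: Vec<u64> = self
            .slots
            .iter()
            .filter_map(|s| if let Slot::Full(k) = s { Some(*k) } else { None })
            .collect();
        for s in self.slots.iter_mut() {
            *s = Slot::Empty;
        }
        for k in live {
            self.insert(k);
        }
    }

    fn tombstones(&self) -> usize {
        self.slots.iter().filter(|s| **s == Slot::Deleted).count()
    }
}

fn main() {
    let mut t = ToyTable::new(8);
    for k in 0..4u64 {
        t.insert(k);
    }
    for k in 0..4u64 {
        t.remove(k);
    }
    assert_eq!(t.tombstones(), 4); // deletions left tombstones behind
    t.vacuum();
    assert_eq!(t.tombstones(), 0); // same storage, tombstones reclaimed
}
```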
This was inspired by indexmap-rs/indexmap#183, and the name by the database VACUUM operation.
I have doubts about how useful this will be in practice. In almost all cases …
I'm trying to provide an option for the concern in indexmap-rs/indexmap#183 (comment):
In that scenario, I suppose they might be working more than half full, where …
You could just reserve 2x the needed capacity, which ensures that … Or just don't do anything: in the worst case you get a single reallocation, after which the …
To provide a bit more context, the memory constrained environment I'm working in is an audio library. Generally people working with audio don't want to allocate memory on the audio thread, because if the operating system takes a long time to allocate memory, it could result in audio stuttering, which is very unpleasant. So once I create a hash map, I never want to allocate memory for it again. It may just be that I need a more specialized data structure.
As I've said before, hashbrown will automatically clean up deleted entries when it runs out of capacity, so it won't call out to the allocator to grow the table unless you actually need it.
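One way to check this claim from the outside (this snippet is not from the thread) is to wrap the global allocator in a counter and churn far more items through a set than its capacity while never holding more than a few at once. The sketch below assumes hashbrown's default features (default hasher) and the current heuristic of rehashing in place whenever the live item count is at most half the full capacity.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

use hashbrown::HashSet;

// Counts every allocation passing through the global allocator, so we can
// tell whether the table ever asks for new storage.
struct Counting;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static A: Counting = Counting;

fn main() {
    let mut set: HashSet<u64> = HashSet::with_capacity(28);
    let after_setup = ALLOCS.load(Ordering::Relaxed);

    // Push far more distinct items through the table than its capacity,
    // but never hold more than 8 at once.
    for round in 0..10_000u64 {
        for i in 0..8 {
            set.insert(round * 8 + i);
        }
        for i in 0..8 {
            set.remove(&(round * 8 + i));
        }
    }

    // Deleted entries are cleaned up by occasional in-place rehashes;
    // the allocator is never asked to grow the table.
    assert_eq!(after_setup, ALLOCS.load(Ordering::Relaxed));
}
```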
☔ The latest upstream changes (presumably #458) made this pull request unmergeable. Please resolve the merge conflicts.
What do you mean by that exactly? When I allocate twice the capacity I actually need and ensure that I never hold more than the initial capacity / 2 items, it's still possible for a re-allocation to occur or for the usable capacity to become exhausted:

```rust
use core::hash::{BuildHasher, Hasher};
use hashbrown::HashSet;

fn main() {
    const CAP: usize = 28;
    // Well below `CAP / 2`.
    const MAX_LEN: usize = 8;

    let mut set = HashSet::with_capacity_and_hasher(CAP, BuildId);
    assert_eq!(CAP, set.capacity());

    for i in 0..MAX_LEN {
        set.insert(i);
    }
    assert_eq!(set.len(), MAX_LEN);
    assert_eq!(CAP, set.capacity());
    for i in 0..MAX_LEN {
        set.remove(&i);
    }
    assert!(set.is_empty());
    // The removals left tombstones behind: the reported capacity has
    // already dropped by MAX_LEN even though the set is empty.
    assert_eq!(CAP - MAX_LEN, set.capacity());

    for i in MAX_LEN..(MAX_LEN << 1) {
        set.insert(i);
    }
    assert_eq!(set.len(), MAX_LEN);
    assert_eq!(CAP - MAX_LEN, set.capacity());
    for i in MAX_LEN..(MAX_LEN << 1) {
        set.remove(&i);
    }
    assert!(set.is_empty());
    assert_eq!(CAP - (MAX_LEN << 1), set.capacity());

    for i in (MAX_LEN << 1)..((MAX_LEN << 1) + MAX_LEN) {
        set.insert(i);
    }
    assert_eq!(set.len(), MAX_LEN);
    assert_eq!(CAP - (MAX_LEN << 1), set.capacity());
    for i in (MAX_LEN << 1)..((MAX_LEN << 1) + MAX_LEN) {
        set.remove(&i);
    }
    assert!(set.is_empty());
    assert_eq!(CAP - ((MAX_LEN << 1) + MAX_LEN), set.capacity());

    // By now only 4 of the 28 reserved slots are still reported as usable.
    for i in ((MAX_LEN << 1) + MAX_LEN)..CAP {
        set.insert(i);
    }
    assert!(set.len() < MAX_LEN);
    assert_eq!(set.len(), set.capacity());
    for i in ((MAX_LEN << 1) + MAX_LEN)..CAP {
        set.remove(&i);
    }
    assert!(set.is_empty());
    // The set is empty, yet its reported capacity is exhausted.
    assert!(set.capacity() == 0);
}

// Identity hasher: each value hashes to itself, so the probe sequence is
// fully determined by the inserted keys.
#[derive(Clone, Copy, Debug, Default)]
struct Id(u64);

impl Hasher for Id {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write_u64(&mut self, i: u64) {
        self.0 = i;
    }
    fn write_usize(&mut self, i: usize) {
        self.write_u64(i as u64);
    }
    fn write(&mut self, bytes: &[u8]) {
        if let Some(val) = bytes.get(..8) {
            let mut v = [0; 8];
            v.copy_from_slice(val);
            self.write_u64(u64::from_le_bytes(v));
        }
    }
}

#[derive(Clone, Copy, Debug, Default)]
struct BuildId;

impl BuildHasher for BuildId {
    type Hasher = Id;
    fn build_hasher(&self) -> Self::Hasher {
        Id(0)
    }
}
```

As we see, I pre-allocate more than 3 times the capacity I need and never insert more than what I need. Unfortunately, it's possible to exhaust the capacity with certain sequences of inserts and removals. I suppose I'm misinterpreting your suggestion of simply reserving twice the capacity I need.
The capacity reported by …