Add RawTable::vacuum to clean up DELETED entries #255
Conversation
This cleans the table to ensure the maximum usable capacity in its current allocation, rehashing items that need to move around previously-deleted entries.
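To illustrate the idea, here is a deliberately tiny toy model (not the PR's implementation, and not hashbrown's data layout): an open-addressing table whose slots are EMPTY, FULL, or DELETED (a tombstone), with a `vacuum` that rebuilds the slots inside the same fixed-size storage. For brevity the toy copies the live keys into a temporary buffer before reinserting, whereas the proposed `RawTable::vacuum` rehashes items in place within the current allocation.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Slot {
    Empty,
    Deleted,
    Full(u64),
}

struct ToyTable {
    slots: Vec<Slot>, // fixed size, never reallocated after construction
}

impl ToyTable {
    fn new(buckets: usize) -> Self {
        ToyTable { slots: vec![Slot::Empty; buckets] }
    }

    // Linear probing stops at Empty; Deleted slots must be probed past,
    // which is why tombstones eat into the usable capacity.
    fn insert(&mut self, key: u64) {
        let n = self.slots.len();
        let start = (key as usize) % n;
        for i in 0..n {
            let idx = (start + i) % n;
            match self.slots[idx] {
                Slot::Full(k) if k == key => return,
                Slot::Full(_) => continue,
                Slot::Empty | Slot::Deleted => {
                    self.slots[idx] = Slot::Full(key);
                    return;
                }
            }
        }
    }

    fn remove(&mut self, key: u64) {
        let n = self.slots.len();
        let start = (key as usize) % n;
        for i in 0..n {
            let idx = (start + i) % n;
            match self.slots[idx] {
                Slot::Full(k) if k == key => {
                    // Leave a tombstone so later lookups keep probing.
                    self.slots[idx] = Slot::Deleted;
                    return;
                }
                Slot::Empty => return,
                _ => continue,
            }
        }
    }

    // "Vacuum": re-place every live item within the same storage so that
    // no Deleted markers remain.
    fn vacuum(&mut self) {
        let live: Vec<u64> = self
            .slots
            .iter()
            .filter_map(|s| if let Slot::Full(k) = s { Some(*k) } else { None })
            .collect();
        for s in self.slots.iter_mut() {
            *s = Slot::Empty;
        }
        for k in live {
            self.insert(k);
        }
    }

    fn tombstones(&self) -> usize {
        self.slots.iter().filter(|s| **s == Slot::Deleted).count()
    }
}

fn main() {
    let mut t = ToyTable::new(8);
    for k in 0..4u64 {
        t.insert(k);
    }
    for k in 0..4u64 {
        t.remove(k);
    }
    assert_eq!(t.tombstones(), 4); // deletions left tombstones behind
    t.vacuum();
    assert_eq!(t.tombstones(), 0); // same storage, tombstones reclaimed
}
```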
This was inspired by indexmap-rs/indexmap#183, and the name by the database VACUUM operation.
I have doubts about how useful this will be in practice. In almost all cases …
I'm trying to provide an option for the concern in indexmap-rs/indexmap#183 (comment):
In that scenario, I suppose they might be working more than half full, where …
You could just reserve 2x the needed capacity, which ensures that … Or just don't do anything: in the worst case you get a single reallocation, after which the …
To provide a bit more context, the memory constrained environment I'm working in is an audio library. Generally people working with audio don't want to allocate memory on the audio thread, because if the operating system takes a long time to allocate memory, it could result in audio stuttering, which is very unpleasant. So once I create a hash map, I never want to allocate memory for it again. It may just be that I need a more specialized data structure.
As I've said before, hashbrown will automatically clean up deleted entries when it runs out of capacity, so it won't call out to the allocator to grow the table unless you actually need it.
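One way to check this claim from the outside (this snippet is not from the thread) is to wrap the global allocator in a counter and churn far more items through a set than its capacity while never holding more than a few at once. The sketch below assumes hashbrown's default features (default hasher) and the current heuristic of rehashing in place whenever the live item count is at most half the full capacity.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

use hashbrown::HashSet;

// Counts every allocation passing through the global allocator, so we can
// tell whether the table ever asks for new storage.
struct Counting;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static A: Counting = Counting;

fn main() {
    let mut set: HashSet<u64> = HashSet::with_capacity(28);
    let after_setup = ALLOCS.load(Ordering::Relaxed);

    // Push far more distinct items through the table than its capacity,
    // but never hold more than 8 at once.
    for round in 0..10_000u64 {
        for i in 0..8 {
            set.insert(round * 8 + i);
        }
        for i in 0..8 {
            set.remove(&(round * 8 + i));
        }
    }

    // Deleted entries are cleaned up by occasional in-place rehashes;
    // the allocator is never asked to grow the table.
    assert_eq!(after_setup, ALLOCS.load(Ordering::Relaxed));
}
```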
☔ The latest upstream changes (presumably #458) made this pull request unmergeable. Please resolve the merge conflicts.
What do you mean by that exactly? When I allocate twice the capacity I actually need and ensure that I never hold more than the initial capacity / 2 items, it's still possible for a re-allocation to occur or for the usable capacity to become exhausted:

```rust
use core::hash::{BuildHasher, Hasher};
use hashbrown::HashSet;

fn main() {
    const CAP: usize = 28;
    // Well below `CAP / 2`.
    const MAX_LEN: usize = 8;

    let mut set = HashSet::with_capacity_and_hasher(CAP, BuildId);
    assert_eq!(CAP, set.capacity());

    for i in 0..MAX_LEN {
        set.insert(i);
    }
    assert_eq!(set.len(), MAX_LEN);
    assert_eq!(CAP, set.capacity());
    for i in 0..MAX_LEN {
        set.remove(&i);
    }
    assert!(set.is_empty());
    // The removals left tombstones behind: the reported capacity has
    // already dropped by MAX_LEN even though the set is empty.
    assert_eq!(CAP - MAX_LEN, set.capacity());

    for i in MAX_LEN..(MAX_LEN << 1) {
        set.insert(i);
    }
    assert_eq!(set.len(), MAX_LEN);
    assert_eq!(CAP - MAX_LEN, set.capacity());
    for i in MAX_LEN..(MAX_LEN << 1) {
        set.remove(&i);
    }
    assert!(set.is_empty());
    assert_eq!(CAP - (MAX_LEN << 1), set.capacity());

    for i in (MAX_LEN << 1)..((MAX_LEN << 1) + MAX_LEN) {
        set.insert(i);
    }
    assert_eq!(set.len(), MAX_LEN);
    assert_eq!(CAP - (MAX_LEN << 1), set.capacity());
    for i in (MAX_LEN << 1)..((MAX_LEN << 1) + MAX_LEN) {
        set.remove(&i);
    }
    assert!(set.is_empty());
    assert_eq!(CAP - ((MAX_LEN << 1) + MAX_LEN), set.capacity());

    // By now only 4 of the 28 reserved slots are still reported as usable.
    for i in ((MAX_LEN << 1) + MAX_LEN)..CAP {
        set.insert(i);
    }
    assert!(set.len() < MAX_LEN);
    assert_eq!(set.len(), set.capacity());
    for i in ((MAX_LEN << 1) + MAX_LEN)..CAP {
        set.remove(&i);
    }
    assert!(set.is_empty());
    // The set is empty, yet its reported capacity is exhausted.
    assert!(set.capacity() == 0);
}

// Identity hasher: each value hashes to itself, so the probe sequence is
// fully determined by the inserted keys.
#[derive(Clone, Copy, Debug, Default)]
struct Id(u64);

impl Hasher for Id {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write_u64(&mut self, i: u64) {
        self.0 = i;
    }
    fn write_usize(&mut self, i: usize) {
        self.write_u64(i as u64);
    }
    fn write(&mut self, bytes: &[u8]) {
        if let Some(val) = bytes.get(..8) {
            let mut v = [0; 8];
            v.copy_from_slice(val);
            self.write_u64(u64::from_le_bytes(v));
        }
    }
}

#[derive(Clone, Copy, Debug, Default)]
struct BuildId;

impl BuildHasher for BuildId {
    type Hasher = Id;
    fn build_hasher(&self) -> Self::Hasher {
        Id(0)
    }
}
```

As we see, I pre-allocate more than 3 times the capacity I need and never insert more than what I need. Unfortunately, it's possible to exhaust the capacity with certain sequences of inserts and removals. I suppose I'm misinterpreting your suggestion of simply reserving twice the capacity I need.
The capacity reported by …