optimize the fixed32 and fixed64 encoding #772

tardyp · 2025-08-21T19:30:02Z

The current implementation of write_raw_little_endian32 simply calls self.write_raw_bytes.

Because this goes through a function boundary, the compiler is not smart enough to simplify the data copy, and ends up calling memcpy.

This inlined implementation is about twice as fast as the previous one according to the attached benchmark. Before the change:

test bench_write_raw_fixed32   ... bench:       5,860.52 ns/iter (+/- 62.51)
test bench_write_raw_fixed64   ... bench:       4,303.08 ns/iter (+/- 29.47)
test bench_write_raw_varint_32 ... bench:       2,555.20 ns/iter (+/- 126.98)

With this change, fixed32/64 is, as expected, a bit faster than varint:

test bench_write_raw_fixed32   ... bench:       2,143.48 ns/iter (+/- 21.70)
test bench_write_raw_fixed64   ... bench:       2,209.28 ns/iter (+/- 32.42)
test bench_write_raw_varint_32 ... bench:       2,480.89 ns/iter (+/- 88.95)
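
For readers who want to see the shape of the change, here is a minimal sketch of the two approaches described above. It is not the code from this PR: `SketchWriter`, its 64-byte buffer, and the method names `write_fixed32_via_bytes`, `write_fixed32_inlined`, and `written` are hypothetical, and the `#[inline(never)]` attribute is only an assumption used to model a call the optimizer does not see through. The sketch has no flush logic, so it panics if the buffer overflows.

```rust
/// Hypothetical stand-in for an output stream with a fixed-size buffer.
/// This is NOT the real stream type; names and layout are assumptions.
pub struct SketchWriter {
    buf: [u8; 64],
    pos: usize,
}

impl SketchWriter {
    pub fn new() -> Self {
        SketchWriter { buf: [0u8; 64], pos: 0 }
    }

    /// Generic path: copies a slice whose length is only known at run time.
    /// `inline(never)` models a function boundary the optimizer cannot cross.
    #[inline(never)]
    pub fn write_raw_bytes(&mut self, bytes: &[u8]) {
        let end = self.pos + bytes.len();
        self.buf[self.pos..end].copy_from_slice(bytes);
        self.pos = end;
    }

    /// Old shape: delegate the 4-byte write to the generic slice writer.
    pub fn write_fixed32_via_bytes(&mut self, value: u32) {
        self.write_raw_bytes(&value.to_le_bytes());
    }

    /// New shape: copy exactly four little-endian bytes in place.
    /// The length is a compile-time constant, so the compiler is free to
    /// emit a plain store instead of a general-purpose copy.
    #[inline]
    pub fn write_fixed32_inlined(&mut self, value: u32) {
        let end = self.pos + 4;
        self.buf[self.pos..end].copy_from_slice(&value.to_le_bytes());
        self.pos = end;
    }

    /// Bytes written so far.
    pub fn written(&self) -> &[u8] {
        &self.buf[..self.pos]
    }
}
```

Both methods produce the same bytes; only the code the compiler can generate differs. Whether that difference shows up as a speedup depends on the real buffer handling, which this sketch does not reproduce.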

tardyp · 2025-08-22T12:51:59Z

I think the CI failure is a flaky test. Let me know if you disagree.

optimize the fixed32 and fixed64 encoding

134d42b

tardyp force-pushed the optmize_fixed32 branch from 7496321 to 134d42b Compare August 21, 2025 20:04
