Make exponentiation significantly faster #221

axman6 · 2025-08-10T14:18:19Z

From #220, make the (^) function significantly faster - it does this two ways, by unrolling common low exponents (1-5), which allows GHC to inline the full set of multiplications, and in cases above 5, it uses a tail recursive loop, which is much faster than the non-tail recursive definition that's been in hiding in this package for over a decade (it probably is faster than Prelude's definition, but it's not fast).

Benchmarks, adding the 10th and 20th central moment to demonstrate the effect of the loop (I'm sure someone must care about that statistic...). I've only included benchmarks that actually call (^)

                   , master   , unrolled ,            , master      , unrolled
Name               , Mean (ps), Mean (ps), % of master, 2*Stdev (ps), 2*Stdev (ps)
All.sample.skewness, 112217871,  48159448,       42.91,     10580706, 3933644
All.sample.kurtosis, 125990136,  51025634,       40.49,      7770772, 3650532
All.sample.C.M. 2  ,  71661035,  45683325,       63.74,      6811708, 4477812
All.sample.C.M. 3  ,  78805029,  43542919,       55.25,      7553062, 3881564
All.sample.C.M. 4  ,  85555419,  45128808,       52.74,      8290264, 4030818
All.sample.C.M. 5  ,  94791943,  44017114,       46.43,      7973198, 4154740
All.sample.C.M. 10 , 136170214,  76725390,       56.34,     13295868, 6833108
All.sample.C.M. 20 , 218253906, 106618505,       48.85,     15665576, 7769680

Shimuuar · 2025-08-12T22:13:41Z

I've checked core and it looks like unrolling doesn't work as intended. ^ is never called with statically known arguments. It's inlined into centralMoment{,s} and they're not inlined so there's no chance for specialization.

Variants with and without unrolling have similar performance and I've got interested what is exact difference? I'll benchmark it

axman6 · 2025-08-13T00:35:54Z

That's interesting, have you looked at the core that gets produced with known exponents when calling centralMoment? I was definitely seeing differences in the benchmarks with the CM 2-5 benchmarks with the expanded definitions for the small powers, but the differences weren't huge.

Since the exponent arguments to centralMoments are likely to always be known, it feels like its definition should also be marked INLINE, it's not a large function.

While I'm thinking about it, do you know why the definition chooses to use centralMoment twice when the exponents are small? The definition calling G.foldl' should perform at least as well as the two separate definitions. Maybe I should do some benchmarks...

Shimuuar · 2025-08-13T10:07:59Z

Unrolling does have an effect. Here benchmarking results using PAPI benchmarks (TOT_CYC — number of CPU cycles and BR_INS — branching instructions):

01 is you variant, 02 removes unrolling, and 03 moves unrolling into centralMoment definition:

centralMoment a xs
    | a < 0  = error "Statistics.Sample.centralMoment: negative input"
    | a == 0 = 1
    | a == 1 = 0
    | a == 2 = expectation go2 xs
    | a == 3 = expectation go3 xs
    | a == 4 = expectation go4 xs
    | otherwise = expectation go xs

So unrolling does have some effect. It not very surprising. After all it changes body of inlined function. For small powers unrolled version traverses chain of conditional jumps and computes result. It could be friendlier to branch predictor. And moving conditionals out of the loop improves performance slightly. And curiously benchmark for CM2 is not affected by all this changes at all.

While I'm thinking about it, do you know why the definition chooses to use centralMoment twice when the exponents are small?

For n=0 and n=1 central moments could be computed in constant time. (1 and 0 correspondingly)

Shimuuar · 2025-08-15T17:38:33Z

Lets drop unrolling and then this PR could be merged

As it currently stands unrolling doesn't achieve specialization for central momenta of statically known powers. But simply turning ^ will give us a new baseline.

Make exponentiation significantly faster

089d200

BangPatterns instead of seq

539c470

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make exponentiation significantly faster #221

Make exponentiation significantly faster #221

Uh oh!

axman6 commented Aug 10, 2025 •

edited

Loading

Uh oh!

Shimuuar commented Aug 12, 2025

Uh oh!

axman6 commented Aug 13, 2025 •

edited

Loading

Uh oh!

Shimuuar commented Aug 13, 2025

Uh oh!

Shimuuar commented Aug 15, 2025

Uh oh!

Uh oh!

Make exponentiation significantly faster #221

Are you sure you want to change the base?

Make exponentiation significantly faster #221

Uh oh!

Conversation

axman6 commented Aug 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Shimuuar commented Aug 12, 2025

Uh oh!

axman6 commented Aug 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Shimuuar commented Aug 13, 2025

Uh oh!

Shimuuar commented Aug 15, 2025

Uh oh!

Uh oh!

axman6 commented Aug 10, 2025 •

edited

Loading

axman6 commented Aug 13, 2025 •

edited

Loading