Skip to content

Conversation

axman6
Copy link
Contributor

@axman6 axman6 commented Aug 10, 2025

From #220, make the (^) function significantly faster - it does this two ways, by unrolling common low exponents (1-5), which allows GHC to inline the full set of multiplications, and in cases above 5, it uses a tail recursive loop, which is much faster than the non-tail recursive definition that's been in hiding in this package for over a decade (it probably is faster than Prelude's definition, but it's not fast).

Benchmarks, adding the 10th and 20th central moment to demonstrate the effect of the loop (I'm sure someone must care about that statistic...). I've only included benchmarks that actually call (^)

                   , master   , unrolled ,            , master      , unrolled
Name               , Mean (ps), Mean (ps), % of master, 2*Stdev (ps), 2*Stdev (ps)
All.sample.skewness, 112217871,  48159448,       42.91,     10580706, 3933644
All.sample.kurtosis, 125990136,  51025634,       40.49,      7770772, 3650532
All.sample.C.M. 2  ,  71661035,  45683325,       63.74,      6811708, 4477812
All.sample.C.M. 3  ,  78805029,  43542919,       55.25,      7553062, 3881564
All.sample.C.M. 4  ,  85555419,  45128808,       52.74,      8290264, 4030818
All.sample.C.M. 5  ,  94791943,  44017114,       46.43,      7973198, 4154740
All.sample.C.M. 10 , 136170214,  76725390,       56.34,     13295868, 6833108
All.sample.C.M. 20 , 218253906, 106618505,       48.85,     15665576, 7769680

@Shimuuar
Copy link
Collaborator

I've checked core and it looks like unrolling doesn't work as intended. ^ is never called with statically known arguments. It's inlined into centralMoment{,s} and they're not inlined so there's no chance for specialization.

Variants with and without unrolling have similar performance and I've got interested what is exact difference? I'll benchmark it

@axman6
Copy link
Contributor Author

axman6 commented Aug 13, 2025

That's interesting, have you looked at the core that gets produced with known exponents when calling centralMoment? I was definitely seeing differences in the benchmarks with the CM 2-5 benchmarks with the expanded definitions for the small powers, but the differences weren't huge.

Since the exponent arguments to centralMoments are likely to always be known, it feels like its definition should also be marked INLINE, it's not a large function.

While I'm thinking about it, do you know why the definition chooses to use centralMoment twice when the exponents are small? The definition calling G.foldl' should perform at least as well as the two separate definitions. Maybe I should do some benchmarks...

@Shimuuar
Copy link
Collaborator

Unrolling does have an effect. Here benchmarking results using PAPI benchmarks (TOT_CYC — number of CPU cycles and BR_INS — branching instructions):
image

01 is you variant, 02 removes unrolling, and 03 moves unrolling into centralMoment definition:

centralMoment a xs
    | a < 0  = error "Statistics.Sample.centralMoment: negative input"
    | a == 0 = 1
    | a == 1 = 0
    | a == 2 = expectation go2 xs
    | a == 3 = expectation go3 xs
    | a == 4 = expectation go4 xs
    | otherwise = expectation go xs

So unrolling does have some effect. It not very surprising. After all it changes body of inlined function. For small powers unrolled version traverses chain of conditional jumps and computes result. It could be friendlier to branch predictor. And moving conditionals out of the loop improves performance slightly. And curiously benchmark for CM2 is not affected by all this changes at all.

While I'm thinking about it, do you know why the definition chooses to use centralMoment twice when the exponents are small?

For n=0 and n=1 central moments could be computed in constant time. (1 and 0 correspondingly)

@Shimuuar
Copy link
Collaborator

Lets drop unrolling and then this PR could be merged

As it currently stands unrolling doesn't achieve specialization for central momenta of statically known powers. But simply turning ^ will give us a new baseline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants