Perf: Fusion search for composed optimization #3258
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3258 +/- ##
==========================================
+ Coverage 82.22% 82.32% +0.09%
==========================================
Files 973 978 +5
Lines 124018 124864 +846
==========================================
+ Hits 101971 102790 +819
- Misses 22047 22074 +27
All CI failures are unrelated to the changes in this PR 🙃
Windows and macOS failed because the URL for the text-generation example tokenizer was unreachable.
Linux failed on `tests::cube::kernel::bernoulli::tests::wald_wolfowitz_runs_test`
(it seems to have been failing intermittently since the kernel was ported to the cubecl repo).
Gonna re-run the jobs for sanity.
I added the ability to re-order operations in the execution stream to find more optimizations.
For the custom GELU fusion benchmarks, if I put an unrelated tensor operation in the middle of the computation, I get a 3x slowdown with the old method, since it broke fusion. With these changes, I don't experience any slowdown since the unrelated operation executes out of order.
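
To illustrate the idea, here is a toy sketch of why re-ordering helps. It is not the code in this PR: the `Op` struct, the `fusable` flag, the tensor ids, and both functions are hypothetical, and it assumes each tensor id is written exactly once, so read-after-write is the only hazard to check. The in-order pass closes the fused kernel at the first op it cannot fuse. The re-ordering pass defers that op and keeps fusing, as long as the later ops do not read a deferred op's output.

```rust
use std::collections::HashSet;

/// Toy tensor operation. Assumption: every tensor id is written exactly once.
#[derive(Debug, Clone)]
struct Op {
    name: &'static str,
    inputs: Vec<u32>,
    output: u32,
    fusable: bool,
}

fn op(name: &'static str, inputs: &[u32], output: u32, fusable: bool) -> Op {
    Op { name, inputs: inputs.to_vec(), output, fusable }
}

/// Baseline: only adjacent fusable ops end up in the same kernel.
fn fuse_in_order(ops: &[Op]) -> Vec<Vec<&'static str>> {
    let mut kernels: Vec<Vec<&'static str>> = Vec::new();
    let mut current: Vec<&'static str> = Vec::new();
    for o in ops {
        if o.fusable {
            current.push(o.name);
        } else {
            // A non-fusable op closes the current kernel and runs alone.
            if !current.is_empty() {
                kernels.push(std::mem::take(&mut current));
            }
            kernels.push(vec![o.name]);
        }
    }
    if !current.is_empty() {
        kernels.push(current);
    }
    kernels
}

/// Greedy re-ordering: ops that cannot join the kernel are deferred, and later
/// ops may still join as long as they do not read a deferred op's output.
fn fuse_with_reordering(ops: &[Op]) -> Vec<Vec<&'static str>> {
    let mut kernels = Vec::new();
    let mut pending: Vec<&Op> = ops.iter().collect();
    while !pending.is_empty() {
        let seed = pending.remove(0);
        if !seed.fusable {
            kernels.push(vec![seed.name]);
            continue;
        }
        let mut block = vec![seed.name];
        let mut deferred: Vec<&Op> = Vec::new();
        let mut deferred_outputs: HashSet<u32> = HashSet::new();
        for o in std::mem::take(&mut pending) {
            let reads_deferred = o.inputs.iter().any(|t| deferred_outputs.contains(t));
            if o.fusable && !reads_deferred {
                block.push(o.name);
            } else {
                deferred_outputs.insert(o.output);
                deferred.push(o);
            }
        }
        kernels.push(block);
        // Deferred ops keep their relative order and are handled next.
        pending = deferred;
    }
    kernels
}

/// GELU as x * 0.5 * (1 + erf(x / sqrt(2))), with an unrelated matmul in the middle.
fn gelu_stream_with_unrelated_op() -> Vec<Op> {
    vec![
        op("div", &[0], 1, true),
        op("erf", &[1], 2, true),
        op("matmul", &[10, 11], 12, false), // unrelated, modelled as non-fusable
        op("add", &[2], 3, true),
        op("mul", &[0, 3], 4, true),
        op("mul_scalar", &[4], 5, true),
    ]
}
```

On this stream, `fuse_in_order` yields three kernels (the GELU is split around `matmul`), while `fuse_with_reordering` yields one fused GELU kernel followed by `matmul`. The real search is more involved than this greedy pass.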