-
Notifications
You must be signed in to change notification settings - Fork 645
[Perf] Convolution migration to NHWC #3090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3090 +/- ##
==========================================
- Coverage 81.08% 81.06% -0.03%
==========================================
Files 817 817
Lines 117326 117408 +82
==========================================
+ Hits 95131 95173 +42
- Misses 22195 22235 +40 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran the tests locally for cuda and wgpu/vulkan. Got a couple of failures on vulkan with f16 for convolutions, but it looks like they are also failing on main. So something broke since the 0.17 release.
With the failing tests I found a print statement for into data but that will be fixed in #3114.
Also linux-std runner keeps running out of disk space with all the updates we're doing recently. I think I'm just gonna disable the caching on this runner, more of an annoyance than anything else!
Sorry for the rant 😅 just happened to stumble upon multiple issues while testing the changes..
TL;DR: this PR looks good to me!
Pull Request Template
Checklist
run-checks all
script has been executed.Related Issues/PRs
Requires tracel-ai/cubecl#646 to get merged first
Changes
Reworks im2col and direct convolution to use NHWC layout, making the 2D forward pass completely NHWC. This saves unnecessary transpositions and is a prerequisite for eventually moving weights to NHWC overall. It also improves performance.
Testing
All tests pass with both the specific algorithms and autotune enabled.