Add tensor quantization #1963
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff             @@
##             main    #1963      +/-   ##
==========================================
- Coverage   85.29%   85.15%   -0.15%
==========================================
  Files         798      804       +6
  Lines       95512    96050     +538
==========================================
+ Hits        81471    81788     +317
- Misses      14041    14262     +221
Some comments, but LGTM great job!
      TensorPrimitive::Float(B::inner(tensor.tensor()))
  }

  fn from_inner<const D: usize>(
      inner: <Self::InnerKind as TensorKind<<B as AutodiffBackend>::InnerBackend>>::Primitive<D>,
  ) -> <Self as TensorKind<B>>::Primitive<D> {
-     B::from_inner(inner)
+     TensorPrimitive::Float(B::from_inner(inner.tensor()))
We will need to implement a q_inner and q_from_inner at some point.
Oh, this should be fairly straightforward. I'll add it.
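For readers following the review thread, here is a minimal sketch of the wrapping pattern the diff relies on: a float primitive enum with a float and a quantized variant, where `tensor()` yields the float representation (dequantizing if needed) before a float-only op runs. `FloatTensor`, `QuantizedTensor`, and `add_scalar` are hypothetical stand-ins built on plain vectors, not Burn's backend types, and the single `scale` field assumes symmetric quantization for brevity.

```rust
// Hypothetical stand-ins for backend primitives (plain vectors, no shapes).
#[derive(Debug, Clone, PartialEq)]
struct FloatTensor(Vec<f32>);

#[derive(Debug, Clone, PartialEq)]
struct QuantizedTensor {
    values: Vec<i8>,
    scale: f32, // assumption: symmetric scheme, so no offset is stored
}

impl QuantizedTensor {
    /// Map the int8 values back to floats: x = scale * q.
    fn dequantize(&self) -> FloatTensor {
        FloatTensor(self.values.iter().map(|&q| q as f32 * self.scale).collect())
    }
}

/// Float tensors are either plain floats or quantized floats.
#[derive(Debug, Clone, PartialEq)]
enum TensorPrimitive {
    Float(FloatTensor),
    QFloat(QuantizedTensor),
}

impl TensorPrimitive {
    /// Return the float representation, dequantizing a quantized tensor first.
    fn tensor(self) -> FloatTensor {
        match self {
            TensorPrimitive::Float(t) => t,
            TensorPrimitive::QFloat(q) => q.dequantize(),
        }
    }
}

/// A float-only op: unwrap with `tensor()`, compute, wrap the result in `Float`.
fn add_scalar(input: TensorPrimitive, rhs: f32) -> TensorPrimitive {
    let FloatTensor(values) = input.tensor();
    TensorPrimitive::Float(FloatTensor(values.into_iter().map(|x| x + rhs).collect()))
}
```

With this shape, a caller can pass either variant to `add_scalar` and always gets a `Float` result back, which mirrors how `from_inner` in the diff wraps its output in `TensorPrimitive::Float`.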
Not sure what's up with the wgpu tests; they seem to be failing intermittently, even on main.
Working on it
Checklist
- The run-checks all script has been executed.
Related Issues/PRs
Progress towards #464
Changes
Support for static per-tensor quantization.
- DType::QFloat
- QuantizedTensorPrimitive
- QTensorOps for quantize / dequantize ops
- TensorPrimitive enum for float tensors to include float and qfloat (quantized type). Float ops call .tensor() (dequantized) before calling the backend implementation on the FloatTensorPrimitive
- QuantizationStrategy enum with per-tensor affine and symmetric int8 quantization
Note: QAT support should be added in a future PR
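To make the two per-tensor int8 schemes concrete, here is a minimal sketch of affine and symmetric quantization over a slice of floats. `Int8Strategy` and its methods are hypothetical names for illustration and are not the PR's `QuantizationStrategy` API; the formulas are the standard ones (affine: q = round(x / scale) + offset, clamped to [-128, 127]; symmetric: q = round(x / scale), clamped to [-127, 127]).

```rust
/// Hypothetical, simplified per-tensor int8 strategy (not Burn's actual type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Int8Strategy {
    /// q = clamp(round(x / scale) + offset, -128, 127)
    Affine { scale: f32, offset: i32 },
    /// q = clamp(round(x / scale), -127, 127)
    Symmetric { scale: f32 },
}

impl Int8Strategy {
    /// Derive affine parameters from the observed value range.
    fn affine(min: f32, max: f32) -> Self {
        // Extend the range to include zero so that 0.0 maps to an exact integer.
        let (min, max) = (min.min(0.0), max.max(0.0));
        let range = max - min;
        // Guard against a degenerate all-zero tensor.
        let scale = if range == 0.0 { 1.0 } else { range / 255.0 };
        let offset = (-128.0 - min / scale).round() as i32;
        Int8Strategy::Affine { scale, offset }
    }

    /// Derive the symmetric scale from the largest absolute value.
    fn symmetric(min: f32, max: f32) -> Self {
        let abs_max = min.abs().max(max.abs());
        let scale = if abs_max == 0.0 { 1.0 } else { abs_max / 127.0 };
        Int8Strategy::Symmetric { scale }
    }

    /// Convert floats to int8 using the stored parameters.
    fn quantize(&self, values: &[f32]) -> Vec<i8> {
        match *self {
            Int8Strategy::Affine { scale, offset } => values
                .iter()
                .map(|&x| ((x / scale).round() as i32 + offset).clamp(-128, 127) as i8)
                .collect(),
            Int8Strategy::Symmetric { scale } => values
                .iter()
                .map(|&x| ((x / scale).round() as i32).clamp(-127, 127) as i8)
                .collect(),
        }
    }

    /// Convert int8 values back to (approximate) floats.
    fn dequantize(&self, values: &[i8]) -> Vec<f32> {
        match *self {
            Int8Strategy::Affine { scale, offset } => values
                .iter()
                .map(|&q| scale * (q as i32 - offset) as f32)
                .collect(),
            Int8Strategy::Symmetric { scale } => {
                values.iter().map(|&q| scale * q as f32).collect()
            }
        }
    }
}
```

For the values `[-1.8, -1.0, 0.0, 0.5]`, `Int8Strategy::affine(-1.8, 0.5)` quantizes to `[-128, -39, 72, 127]` and `Int8Strategy::symmetric(-1.8, 0.5)` to `[-127, -71, 0, 35]`. The affine scheme uses the full int8 range for a skewed distribution, while the symmetric scheme keeps zero at 0 and needs no offset.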
Testing
Added unit tests for affine and symmetric per-tensor quantization