nicolasvasilache

No description provided.

harsh-nod and others added 14 commits April 23, 2025 00:51
This PR adds a GEMM bias example and fixes issues in index propagation
that this example exposes.

Signed-off-by: Harsh Menon <harsh@nod-labs.com>
This currently exposes a corner case where `resolve_threads_shapes` chokes on the result of a `tkw.reduction`
containing a `tkw.mma` when that result is followed by a binary op.
The indexing contains quantities such as
```
'$WG0*BLOCK_B + BLOCK_B*floor($T0/64) + Piecewise((Mod($T0, 16), ~$MMA_ACC), (4*floor((Mod($T0, 64))/16), $MMA_ACC)) : Piecewise((1, ~$MMA_ACC), (4, $MMA_ACC)) : Piecewise((1, ~$MMA_ACC), (16, $MMA_ACC))'
```
and `MMA_ACC` is never resolved, which results in a comparison error between static constants and SymPy expressions.

Signed-off-by: Nicolas Vasilache <nicolasvasilache@users.noreply.github.com>
To generate valid IR, revert the following commits:
```
git revert db0d44b66fdca00fa9726ddd51ba96dc1732e20b
git revert 33120fb
git revert 2d60779
```

Signed-off-by: Nicolas Vasilache <nicolasvasilache@users.noreply.github.com>
…ion test with new APIs for compilation and execution

Signed-off-by: Nicolas Vasilache <nicolasvasilache@users.noreply.github.com>
The following procedure makes some progress.
It treats the system as a black box, leaving decisions to the compiler and its inference, and applies fixups based on those decisions as bugs appear.

The steps are:
1. Avoid `vector_shapes` and `elements_per_thread` entirely; instead, let the mma-driven inference reveal what it needs through iterative trial and error.
2. Inject a round-trip write/read so that the mma result sinks into a write, which avoids bugs in `partition_strided_operators`.
3. Again by trial and error, as the system reports missing vector sizes (which cannot be inferred for the elementwise part), reinject vector sizes using the values the system derived in step 1.
4. Once this stabilizes, set `elements_per_thread` to 4 for the trailing elementwise part.
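Steps 1 and 3 amount to a fixed-point loop over vector sizes. The toy model below is purely illustrative: `fake_compile`, `MissingVectorShape` and `fixup_vector_shapes` are hypothetical names, not Wave/TKW APIs, and the "inferred" sizes are passed in by hand rather than derived from an mma.

```python
# Hypothetical toy model of steps 1 and 3: start with no vector shapes and,
# each time "compilation" reports a missing dim, re-inject the size that the
# mma-driven inference derived for it. None of these names are Wave/TKW APIs.


class MissingVectorShape(Exception):
    """Raised by the hypothetical compiler when a dim has no vector size."""

    def __init__(self, dim: str):
        super().__init__(f"missing vector shape for {dim}")
        self.dim = dim


def fake_compile(vector_shapes: dict[str, int], required: set[str]) -> str:
    """Stand-in for kernel compilation: fails on the first unsized dim."""
    for dim in sorted(required):
        if dim not in vector_shapes:
            raise MissingVectorShape(dim)
    return "ok"


def fixup_vector_shapes(inferred: dict[str, int], required: set[str],
                        max_iters: int = 10) -> dict[str, int]:
    """Trial-and-error loop: add inferred sizes until compilation succeeds."""
    vector_shapes: dict[str, int] = {}  # step 1: start with no vector shapes
    for _ in range(max_iters):
        try:
            fake_compile(vector_shapes, required)
            return vector_shapes
        except MissingVectorShape as err:
            # step 3: reuse the value derived by the (mma-driven) inference
            vector_shapes[err.dim] = inferred[err.dim]
    raise RuntimeError("vector shapes did not stabilize")
```

For example, with `inferred = {"M": 16, "N": 16, "K": 16}` and required dims `{"M", "N"}`, the loop fails twice and then returns `{"M": 16, "N": 16}`; only the dims that the compiler actually asks for get pinned.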

This produces IR that executes without crashing but yields incorrect results.

Note: if BLOCK_D2 < 128, execution fails with:
```
error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
```

Signed-off-by: Nicolas Vasilache <nicolasvasilache@users.noreply.github.com>
@nicolasvasilache nicolasvasilache force-pushed the users/nico/moe-step-0-permute-bug branch 2 times, most recently from b4326c7 to 4da3949 Compare April 23, 2025 08:19
Signed-off-by: Nicolas Vasilache <nicolasvasilache@users.noreply.github.com>
@nicolasvasilache nicolasvasilache force-pushed the users/nico/moe-step-0-permute-bug branch from 4da3949 to df7f4b9 Compare April 27, 2025 11:55