[RISCV] AddEdge between mask producer and user of V0 #146855
@llvm/pr-subscribers-backend-risc-v

Author: Liao Chunyu (ChunyuLiao)

Changes

If two mask virtual registers have overlapping live ranges, a move (vmv* v0, vx) may be generated to save the second mask. By moving the first mask's producer to after the first mask's use, the spill can be eliminated and the move removed.

Try to remove vmv1r: https://gcc.godbolt.org/z/zbsWvfWYW
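For orientation, here is a rough sketch of the mechanism involved: RISCVVectorMaskDAGMutation walks the scheduling units and, via an artificial dependence edge, forces a mask producer to be scheduled after the nearest user of V0. This is only an illustrative sketch; isMaskProducer and usesV0 are placeholder helpers standing in for the real checks (isSoleUseCopyToV0 and the V0 operand scan), and the actual conditions this PR adds are in the diff below.

#include "llvm/CodeGen/ScheduleDAGInstrs.h"
#include "llvm/CodeGen/ScheduleDAGMutation.h"

using namespace llvm;

// Placeholder predicates; the real pass uses isSoleUseCopyToV0 and
// MachineInstr::findRegisterUseOperand(RISCV::V0, TRI).
static bool isMaskProducer(const SUnit &SU);
static bool usesV0(const SUnit &SU);

namespace {
struct DelayMaskProducer : ScheduleDAGMutation {
  void apply(ScheduleDAGInstrs *DAG) override {
    SUnit *NearestUseV0SU = nullptr;
    for (SUnit &SU : DAG->SUnits) {
      if (usesV0(SU))
        NearestUseV0SU = &SU;
      // Make the nearest V0 user an artificial predecessor of the mask
      // producer, so the producer is scheduled after that use and V0 does
      // not have to be saved/restored around it.
      if (NearestUseV0SU && NearestUseV0SU != &SU && isMaskProducer(SU))
        DAG->addEdge(&SU, SDep(NearestUseV0SU, SDep::Artificial));
    }
  }
};
} // namespace

Full diff: https://github.com/llvm/llvm-project/pull/146855.diff

2 Files Affected: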
diff --git a/llvm/lib/Target/RISCV/RISCVVectorMaskDAGMutation.cpp b/llvm/lib/Target/RISCV/RISCVVectorMaskDAGMutation.cpp
index be54a8c95a978..96430fb2cd1e6 100644
--- a/llvm/lib/Target/RISCV/RISCVVectorMaskDAGMutation.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVectorMaskDAGMutation.cpp
@@ -68,11 +68,33 @@ class RISCVVectorMaskDAGMutation : public ScheduleDAGMutation {
void apply(ScheduleDAGInstrs *DAG) override {
SUnit *NearestUseV0SU = nullptr;
+ SmallVector<SUnit *, 8> DefMask;
for (SUnit &SU : DAG->SUnits) {
const MachineInstr *MI = SU.getInstr();
- if (MI->findRegisterUseOperand(RISCV::V0, TRI))
+ if (RISCVII::getLMul(MI->getDesc().TSFlags) != RISCVVType::LMUL_8 &&
+ isSoleUseCopyToV0(SU))
+ DefMask.push_back(&SU);
+
+ if (MI->findRegisterUseOperand(RISCV::V0, TRI)) {
NearestUseV0SU = &SU;
+ if (DefMask.size() > 1 && !MI->isCopy()) {
+ // Copy may not be a real use, so skip it here.
+ SUnit *FirstDefV0SU = DefMask[0];
+ SUnit *SecondDefV0SU = DefMask[1];
+ Register FirstVReg = FirstDefV0SU->getInstr()->getOperand(0).getReg();
+ Register SecondVReg =
+ SecondDefV0SU->getInstr()->getOperand(0).getReg();
+ LiveIntervals *LIS = static_cast<ScheduleDAGMILive *>(DAG)->getLIS();
+ LiveInterval &FirstLI = LIS->getInterval(FirstVReg);
+ LiveInterval &SecondLI = LIS->getInterval(SecondVReg);
+ if (FirstLI.overlaps(SecondLI))
+ DAG->addEdge(FirstDefV0SU, SDep(&SU, SDep::Artificial));
+ }
+ if (DefMask.size() > 0)
+ DefMask.erase(DefMask.begin());
+ }
+
if (NearestUseV0SU && NearestUseV0SU != &SU && isSoleUseCopyToV0(SU) &&
// For LMUL=8 cases, there will be more possibilities to spill.
// FIXME: We should use RegPressureTracker to do fine-grained
diff --git a/llvm/test/CodeGen/RISCV/rvv/vscale-vw-web-simplification.ll b/llvm/test/CodeGen/RISCV/rvv/vscale-vw-web-simplification.ll
index 206838917d004..28d6e631d524d 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vscale-vw-web-simplification.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vscale-vw-web-simplification.ll
@@ -153,20 +153,19 @@ define <vscale x 2 x i32> @vwop_vscale_sext_i1i32_multiple_users(ptr %x, ptr %y,
; NO_FOLDING: # %bb.0:
; NO_FOLDING-NEXT: vsetvli a3, zero, e32, m1, ta, mu
; NO_FOLDING-NEXT: vlm.v v8, (a0)
-; NO_FOLDING-NEXT: vlm.v v9, (a1)
-; NO_FOLDING-NEXT: vlm.v v10, (a2)
-; NO_FOLDING-NEXT: vmv.v.i v11, 0
+; NO_FOLDING-NEXT: vmv.v.i v10, 0
; NO_FOLDING-NEXT: vmv.v.v v0, v8
-; NO_FOLDING-NEXT: vmerge.vim v12, v11, -1, v0
+; NO_FOLDING-NEXT: vmerge.vim v11, v10, -1, v0
+; NO_FOLDING-NEXT: vlm.v v0, (a1)
+; NO_FOLDING-NEXT: vlm.v v9, (a2)
+; NO_FOLDING-NEXT: vmerge.vim v12, v10, -1, v0
; NO_FOLDING-NEXT: vmv.v.v v0, v9
-; NO_FOLDING-NEXT: vmerge.vim v9, v11, -1, v0
-; NO_FOLDING-NEXT: vmv.v.v v0, v10
-; NO_FOLDING-NEXT: vmerge.vim v10, v11, -1, v0
-; NO_FOLDING-NEXT: vmul.vv v9, v12, v9
-; NO_FOLDING-NEXT: vsub.vv v11, v12, v10
+; NO_FOLDING-NEXT: vmerge.vim v9, v10, -1, v0
+; NO_FOLDING-NEXT: vmul.vv v10, v11, v12
+; NO_FOLDING-NEXT: vsub.vv v11, v11, v9
; NO_FOLDING-NEXT: vmv.v.v v0, v8
-; NO_FOLDING-NEXT: vadd.vi v10, v10, -1, v0.t
-; NO_FOLDING-NEXT: vor.vv v8, v9, v10
+; NO_FOLDING-NEXT: vadd.vi v9, v9, -1, v0.t
+; NO_FOLDING-NEXT: vor.vv v8, v10, v9
; NO_FOLDING-NEXT: vor.vv v8, v8, v11
; NO_FOLDING-NEXT: ret
;
@@ -174,20 +173,19 @@ define <vscale x 2 x i32> @vwop_vscale_sext_i1i32_multiple_users(ptr %x, ptr %y,
; FOLDING: # %bb.0:
; FOLDING-NEXT: vsetvli a3, zero, e32, m1, ta, mu
; FOLDING-NEXT: vlm.v v8, (a0)
-; FOLDING-NEXT: vlm.v v9, (a1)
-; FOLDING-NEXT: vlm.v v10, (a2)
-; FOLDING-NEXT: vmv.v.i v11, 0
+; FOLDING-NEXT: vmv.v.i v10, 0
; FOLDING-NEXT: vmv.v.v v0, v8
-; FOLDING-NEXT: vmerge.vim v12, v11, -1, v0
+; FOLDING-NEXT: vmerge.vim v11, v10, -1, v0
+; FOLDING-NEXT: vlm.v v0, (a1)
+; FOLDING-NEXT: vlm.v v9, (a2)
+; FOLDING-NEXT: vmerge.vim v12, v10, -1, v0
; FOLDING-NEXT: vmv.v.v v0, v9
-; FOLDING-NEXT: vmerge.vim v9, v11, -1, v0
-; FOLDING-NEXT: vmv.v.v v0, v10
-; FOLDING-NEXT: vmerge.vim v10, v11, -1, v0
-; FOLDING-NEXT: vmul.vv v9, v12, v9
-; FOLDING-NEXT: vsub.vv v11, v12, v10
+; FOLDING-NEXT: vmerge.vim v9, v10, -1, v0
+; FOLDING-NEXT: vmul.vv v10, v11, v12
+; FOLDING-NEXT: vsub.vv v11, v11, v9
; FOLDING-NEXT: vmv.v.v v0, v8
-; FOLDING-NEXT: vadd.vi v10, v10, -1, v0.t
-; FOLDING-NEXT: vor.vv v8, v9, v10
+; FOLDING-NEXT: vadd.vi v9, v9, -1, v0.t
+; FOLDING-NEXT: vor.vv v8, v10, v9
; FOLDING-NEXT: vor.vv v8, v8, v11
; FOLDING-NEXT: ret
%a = load <vscale x 2 x i1>, ptr %x
@@ -209,20 +207,19 @@ define <vscale x 2 x i8> @vwop_vscale_sext_i1i8_multiple_users(ptr %x, ptr %y, p
; NO_FOLDING: # %bb.0:
; NO_FOLDING-NEXT: vsetvli a3, zero, e8, mf4, ta, mu
; NO_FOLDING-NEXT: vlm.v v8, (a0)
-; NO_FOLDING-NEXT: vlm.v v9, (a1)
-; NO_FOLDING-NEXT: vlm.v v10, (a2)
-; NO_FOLDING-NEXT: vmv.v.i v11, 0
+; NO_FOLDING-NEXT: vmv.v.i v10, 0
; NO_FOLDING-NEXT: vmv1r.v v0, v8
-; NO_FOLDING-NEXT: vmerge.vim v12, v11, -1, v0
+; NO_FOLDING-NEXT: vmerge.vim v11, v10, -1, v0
+; NO_FOLDING-NEXT: vlm.v v0, (a1)
+; NO_FOLDING-NEXT: vlm.v v9, (a2)
+; NO_FOLDING-NEXT: vmerge.vim v12, v10, -1, v0
; NO_FOLDING-NEXT: vmv1r.v v0, v9
-; NO_FOLDING-NEXT: vmerge.vim v9, v11, -1, v0
-; NO_FOLDING-NEXT: vmv1r.v v0, v10
-; NO_FOLDING-NEXT: vmerge.vim v10, v11, -1, v0
-; NO_FOLDING-NEXT: vmul.vv v9, v12, v9
-; NO_FOLDING-NEXT: vsub.vv v11, v12, v10
+; NO_FOLDING-NEXT: vmerge.vim v9, v10, -1, v0
+; NO_FOLDING-NEXT: vmul.vv v10, v11, v12
+; NO_FOLDING-NEXT: vsub.vv v11, v11, v9
; NO_FOLDING-NEXT: vmv1r.v v0, v8
-; NO_FOLDING-NEXT: vadd.vi v10, v10, -1, v0.t
-; NO_FOLDING-NEXT: vor.vv v8, v9, v10
+; NO_FOLDING-NEXT: vadd.vi v9, v9, -1, v0.t
+; NO_FOLDING-NEXT: vor.vv v8, v10, v9
; NO_FOLDING-NEXT: vor.vv v8, v8, v11
; NO_FOLDING-NEXT: ret
;
@@ -230,20 +227,19 @@ define <vscale x 2 x i8> @vwop_vscale_sext_i1i8_multiple_users(ptr %x, ptr %y, p
; FOLDING: # %bb.0:
; FOLDING-NEXT: vsetvli a3, zero, e8, mf4, ta, mu
; FOLDING-NEXT: vlm.v v8, (a0)
-; FOLDING-NEXT: vlm.v v9, (a1)
-; FOLDING-NEXT: vlm.v v10, (a2)
-; FOLDING-NEXT: vmv.v.i v11, 0
+; FOLDING-NEXT: vmv.v.i v10, 0
; FOLDING-NEXT: vmv1r.v v0, v8
-; FOLDING-NEXT: vmerge.vim v12, v11, -1, v0
+; FOLDING-NEXT: vmerge.vim v11, v10, -1, v0
+; FOLDING-NEXT: vlm.v v0, (a1)
+; FOLDING-NEXT: vlm.v v9, (a2)
+; FOLDING-NEXT: vmerge.vim v12, v10, -1, v0
; FOLDING-NEXT: vmv1r.v v0, v9
-; FOLDING-NEXT: vmerge.vim v9, v11, -1, v0
-; FOLDING-NEXT: vmv1r.v v0, v10
-; FOLDING-NEXT: vmerge.vim v10, v11, -1, v0
-; FOLDING-NEXT: vmul.vv v9, v12, v9
-; FOLDING-NEXT: vsub.vv v11, v12, v10
+; FOLDING-NEXT: vmerge.vim v9, v10, -1, v0
+; FOLDING-NEXT: vmul.vv v10, v11, v12
+; FOLDING-NEXT: vsub.vv v11, v11, v9
; FOLDING-NEXT: vmv1r.v v0, v8
-; FOLDING-NEXT: vadd.vi v10, v10, -1, v0.t
-; FOLDING-NEXT: vor.vv v8, v9, v10
+; FOLDING-NEXT: vadd.vi v9, v9, -1, v0.t
+; FOLDING-NEXT: vor.vv v8, v10, v9
; FOLDING-NEXT: vor.vv v8, v8, v11
; FOLDING-NEXT: ret
%a = load <vscale x 2 x i1>, ptr %x
@@ -445,16 +441,15 @@ define <vscale x 2 x i32> @vwop_vscale_zext_i1i32_multiple_users(ptr %x, ptr %y,
; NO_FOLDING-NEXT: vsetvli a3, zero, e32, m1, ta, mu
; NO_FOLDING-NEXT: vlm.v v0, (a0)
; NO_FOLDING-NEXT: vlm.v v8, (a2)
-; NO_FOLDING-NEXT: vlm.v v9, (a1)
-; NO_FOLDING-NEXT: vmv.v.i v10, 0
-; NO_FOLDING-NEXT: vmerge.vim v11, v10, 1, v0
+; NO_FOLDING-NEXT: vmv.v.i v9, 0
+; NO_FOLDING-NEXT: vmerge.vim v10, v9, 1, v0
; NO_FOLDING-NEXT: vmv.v.v v0, v8
-; NO_FOLDING-NEXT: vmerge.vim v8, v10, 1, v0
-; NO_FOLDING-NEXT: vadd.vv v10, v11, v8
-; NO_FOLDING-NEXT: vsub.vv v8, v11, v8
-; NO_FOLDING-NEXT: vmv.v.v v0, v9
-; NO_FOLDING-NEXT: vor.vv v10, v10, v11, v0.t
-; NO_FOLDING-NEXT: vor.vv v8, v10, v8
+; NO_FOLDING-NEXT: vmerge.vim v8, v9, 1, v0
+; NO_FOLDING-NEXT: vlm.v v0, (a1)
+; NO_FOLDING-NEXT: vadd.vv v9, v10, v8
+; NO_FOLDING-NEXT: vsub.vv v8, v10, v8
+; NO_FOLDING-NEXT: vor.vv v9, v9, v10, v0.t
+; NO_FOLDING-NEXT: vor.vv v8, v9, v8
; NO_FOLDING-NEXT: ret
;
; FOLDING-LABEL: vwop_vscale_zext_i1i32_multiple_users:
@@ -462,16 +457,15 @@ define <vscale x 2 x i32> @vwop_vscale_zext_i1i32_multiple_users(ptr %x, ptr %y,
; FOLDING-NEXT: vsetvli a3, zero, e32, m1, ta, mu
; FOLDING-NEXT: vlm.v v0, (a0)
; FOLDING-NEXT: vlm.v v8, (a2)
-; FOLDING-NEXT: vlm.v v9, (a1)
-; FOLDING-NEXT: vmv.v.i v10, 0
-; FOLDING-NEXT: vmerge.vim v11, v10, 1, v0
+; FOLDING-NEXT: vmv.v.i v9, 0
+; FOLDING-NEXT: vmerge.vim v10, v9, 1, v0
; FOLDING-NEXT: vmv.v.v v0, v8
-; FOLDING-NEXT: vmerge.vim v8, v10, 1, v0
-; FOLDING-NEXT: vadd.vv v10, v11, v8
-; FOLDING-NEXT: vsub.vv v8, v11, v8
-; FOLDING-NEXT: vmv.v.v v0, v9
-; FOLDING-NEXT: vor.vv v10, v10, v11, v0.t
-; FOLDING-NEXT: vor.vv v8, v10, v8
+; FOLDING-NEXT: vmerge.vim v8, v9, 1, v0
+; FOLDING-NEXT: vlm.v v0, (a1)
+; FOLDING-NEXT: vadd.vv v9, v10, v8
+; FOLDING-NEXT: vsub.vv v8, v10, v8
+; FOLDING-NEXT: vor.vv v9, v9, v10, v0.t
+; FOLDING-NEXT: vor.vv v8, v9, v8
; FOLDING-NEXT: ret
%a = load <vscale x 2 x i1>, ptr %x
%b = load <vscale x 2 x i1>, ptr %y
@@ -493,16 +487,15 @@ define <vscale x 2 x i8> @vwop_vscale_zext_i1i8_multiple_users(ptr %x, ptr %y, p
; NO_FOLDING-NEXT: vsetvli a3, zero, e8, mf4, ta, mu
; NO_FOLDING-NEXT: vlm.v v0, (a0)
; NO_FOLDING-NEXT: vlm.v v8, (a2)
-; NO_FOLDING-NEXT: vlm.v v9, (a1)
-; NO_FOLDING-NEXT: vmv.v.i v10, 0
-; NO_FOLDING-NEXT: vmerge.vim v11, v10, 1, v0
+; NO_FOLDING-NEXT: vmv.v.i v9, 0
+; NO_FOLDING-NEXT: vmerge.vim v10, v9, 1, v0
; NO_FOLDING-NEXT: vmv1r.v v0, v8
-; NO_FOLDING-NEXT: vmerge.vim v8, v10, 1, v0
-; NO_FOLDING-NEXT: vadd.vv v10, v11, v8
-; NO_FOLDING-NEXT: vsub.vv v8, v11, v8
-; NO_FOLDING-NEXT: vmv1r.v v0, v9
-; NO_FOLDING-NEXT: vor.vv v10, v10, v11, v0.t
-; NO_FOLDING-NEXT: vor.vv v8, v10, v8
+; NO_FOLDING-NEXT: vmerge.vim v8, v9, 1, v0
+; NO_FOLDING-NEXT: vlm.v v0, (a1)
+; NO_FOLDING-NEXT: vadd.vv v9, v10, v8
+; NO_FOLDING-NEXT: vsub.vv v8, v10, v8
+; NO_FOLDING-NEXT: vor.vv v9, v9, v10, v0.t
+; NO_FOLDING-NEXT: vor.vv v8, v9, v8
; NO_FOLDING-NEXT: ret
;
; FOLDING-LABEL: vwop_vscale_zext_i1i8_multiple_users:
@@ -510,16 +503,15 @@ define <vscale x 2 x i8> @vwop_vscale_zext_i1i8_multiple_users(ptr %x, ptr %y, p
; FOLDING-NEXT: vsetvli a3, zero, e8, mf4, ta, mu
; FOLDING-NEXT: vlm.v v0, (a0)
; FOLDING-NEXT: vlm.v v8, (a2)
-; FOLDING-NEXT: vlm.v v9, (a1)
-; FOLDING-NEXT: vmv.v.i v10, 0
-; FOLDING-NEXT: vmerge.vim v11, v10, 1, v0
+; FOLDING-NEXT: vmv.v.i v9, 0
+; FOLDING-NEXT: vmerge.vim v10, v9, 1, v0
; FOLDING-NEXT: vmv1r.v v0, v8
-; FOLDING-NEXT: vmerge.vim v8, v10, 1, v0
-; FOLDING-NEXT: vadd.vv v10, v11, v8
-; FOLDING-NEXT: vsub.vv v8, v11, v8
-; FOLDING-NEXT: vmv1r.v v0, v9
-; FOLDING-NEXT: vor.vv v10, v10, v11, v0.t
-; FOLDING-NEXT: vor.vv v8, v10, v8
+; FOLDING-NEXT: vmerge.vim v8, v9, 1, v0
+; FOLDING-NEXT: vlm.v v0, (a1)
+; FOLDING-NEXT: vadd.vv v9, v10, v8
+; FOLDING-NEXT: vsub.vv v8, v10, v8
+; FOLDING-NEXT: vor.vv v9, v9, v10, v0.t
+; FOLDING-NEXT: vor.vv v8, v9, v8
; FOLDING-NEXT: ret
%a = load <vscale x 2 x i1>, ptr %x
%b = load <vscale x 2 x i1>, ptr %y
Force-pushed from d8bcf5a to 9e746bf.
; CHECK-NEXT: vsetvli a2, zero, e32, m8, ta, mu
; CHECK-NEXT: vadd.vi v24, v24, 1, v0.t
; CHECK-NEXT: vmv1r.v v0, v5
; CHECK-NEXT: vsetvli a2, zero, e8, mf2, ta, ma |
This removes a vmv but adds two more vtype toggles. But it may not matter.
LGTM. Please wait a few days in case @preames has comments.
If there are multiple mask producers followed by multiple masked consumers, a move (vmv* v0, vx) may be generated to save a mask. By moving the mask's producer to after the mask's use, the spill can be eliminated and the move removed.
Force-pushed from 9e746bf to 3770f9a.
If there are no more comments, I will merge this patch tomorrow.
This commit is breaking several RVV configurations. https://lab.llvm.org/staging/#/builders/210/builds/1044
This reverts commit aee21c3. As noted in #146855 (comment), this causes compile errors for several RVV configurations:

fatal error: error in backend: SmallVector unable to grow. Requested capacity (4294967296) is larger than maximum value for size type (4294967295)
I've landed a revert for the time being in order to unbreak things, so the issue can be investigated without undue time pressure.
Please avoid force pushes unless necessary.
The failure was for 'MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/mesh.cpp'.
…" (#148566) The DefMask vector must not contain instructions that themselves use V0. For `MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/mesh.cpp`, saving `%173:vrm2nov0 = PseudoVMERGE_VVM_M2 undef %173:vrm2nov0(tied-def 0), %116:vrm2, %173:vrm2nov0, killed $v0, -1, 5` into DefMask caused the crash.
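As an illustration of the fix described above, the candidate-collection step would skip instructions that themselves read V0 before queuing them in DefMask. This is an assumed sketch of that guard; the actual re-landed change may differ.

// Sketch only: assumed shape of the re-land's guard when collecting
// candidate mask producers.
if (RISCVII::getLMul(MI->getDesc().TSFlags) != RISCVVType::LMUL_8 &&
    !MI->findRegisterUseOperand(RISCV::V0, TRI) && // do not queue V0 users
    isSoleUseCopyToV0(SU))
  DefMask.push_back(&SU);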
If two mask virtual registers have overlapping live ranges, a move (vmv* v0, vx) may be generated to save the second mask. By moving the first mask's producer to after the mask's use, the spill can be eliminated and the move removed.
Try to remove vmv1r: https://gcc.godbolt.org/z/zbsWvfWYW
Before this patch, the loop body:
After this patch: