[RISCV] Convert LWU to LW if possible in RISCVOptWInstrs #144703


Merged
merged 11 commits, Jul 21, 2025

Conversation

asb
Contributor

@asb asb commented Jun 18, 2025

After the refactoring in #149710 the logic change is trivial.

Motivation for preferring sign-extended 32-bit loads (LW) vs zero-extended (LWU):

  • LW is compressible while LWU is not.
  • Helps to minimise the diff vs RV32 (e.g. LWU vs LW).
  • Helps to minimise distracting diffs vs GCC. I see this come up frequently when comparing against GCC-generated code, and in these cases it's a red herring.

Similar normalisation could be done for LHU and LH, but this is less well motivated as there is a compressed LHU (and if performing the change in RISCVOptWInstrs it wouldn't be done for RV32). There is a compressed LBU but not LB, meaning doing a similar normalisation for byte-sized loads would actually be a regression in terms of code size. Load narrowing when allowed by hasAllNBitUsers isn't explored in this patch.

This changes ~20500 instructions in an RVA22 build of the llvm-test-suite including SPEC 2017. As part of the review, the option of doing the change at ISel time was explored but was found to be less effective.
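
As a concrete illustration, here is a source-level sketch of the fcvt_d_wu_load test updated in this patch (the C++ below is a paraphrase of that test, not code from the tree): on RV64 the i32 load is selected as LWU, but its only user, fcvt.d.wu, reads just the lower 32 bits of its source register, so the compressible LW gives an identical result.

// Sketch, assuming plain rv64gc codegen; mirrors the fcvt_d_wu_load test in the diff below.
double fcvt_d_wu_load(const unsigned *p) {
  // The 32-bit load's only user is fcvt.d.wu, which ignores the upper
  // 32 bits of its integer source register.
  return static_cast<double>(*p); // before: lwu a0, 0(a0); fcvt.d.wu fa0, a0
}                                 // after:  lw  a0, 0(a0); fcvt.d.wu fa0, a0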

This is currently implemented as part of RISCVOptWInstrs in order to
reuse hasAllNBitUsers. However, a new home or further refactoring will
be needed (see the end of this note).

Why prefer sign-extended loads?
* Sign-extending loads are more compressible. There is no compressed
  LWU, and LBU and LHU are only available in Zcb.
* Helps to minimise the diff vs RV32 (e.g. LWU vs LW)
* Helps to minimise distracting diffs vs GCC. I see this come up
  frequently when comparing against GCC-generated code, and in these
  cases it's a red herring.

Issues or open questions with this patch as it stands:
* Doing something at the MI level makes sense as a last resort. I wonder
  if for some of the cases we could be producing a sign-extended load
  earlier on.
* RISCVOptWInstrs is a slightly awkward home. It's currently not run for
  RV32, which means the LBU/LHU changes can actually add new diffs vs
  RV32. Potentially just this one transformation could be done for RV32.
* Do we want to perform additional load narrowing? With the existing
  code, an LD will be narrowed to LW if only the lower bits are needed.
  The example that made me look at extending load normalisation in the
  first place was an LWU that immediately has a small mask applied, which
  could be narrowed to LB (a source-level sketch of this case follows the
  list). It's not clear this has any real benefit.
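
The following hypothetical snippet (the function name and exact codegen are illustrative only, not taken from the patch) shows the kind of narrowing case in question: the 32-bit load feeds only an 8-bit mask, so on little-endian RISC-V a byte load of the same address would produce the same value.

#include <cstdint>

// Hypothetical narrowing candidate: if the access is selected as a 32-bit
// load, it becomes lwu followed by a mask, yet only the low byte is used,
// so a single byte load would suffice on little-endian RISC-V.
uint8_t low_byte(const uint32_t *p) {
  return static_cast<uint8_t>(*p & 0xffu);
}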

This patch changes 95k instructions across a compile of llvm-test-suite
(including SPEC 2017), and all tests complete successfully afterwards.
@llvmbot
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Alex Bradbury (asb)

Changes

This is currently implemented as part of RISCVOptWInstrs in order to reuse hasAllNBitUsers. However, a new home or further refactoring will be needed (see the end of this note).

Why prefer sign-extended loads?

  • Sign-extending loads are more compressible. There is no compressed LWU, and LBU and LHU are only available in Zcb.
  • Helps to minimise the diff vs RV32 (e.g. LWU vs LW)
  • Helps to minimise distracting diffs vs GCC. I see this come up frequently when comparing against GCC-generated code, and in these cases it's a red herring.

Issues or open questions with this patch as it stands:

  • Doing something at the MI level makes sense as a last resort. I wonder if for some of the cases we could be producing a sign-extended load earlier on.
  • RISCVOptWInstrs is a slightly awkward home. It's currently not run for RV32, which means the LBU/LHU changes can actually add new diffs vs RV32. Potentially just this one transformation could be done for RV32.
  • Do we want to perform additional load narrowing? With the existing code, an LD will be narrowed to LW if only the lower bits are needed. The example that made me look at extending load normalisation in the first place was an LWU that immediately has a small mask applied, which could be narrowed to LB. It's not clear this has any real benefit.

This patch changes 95k instructions across a compile of llvm-test-suite (including SPEC 2017), and all tests complete successfully afterwards.


Patch is 468.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144703.diff

58 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp (+45)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll (+5-11)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll (+5-11)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll (+96-96)
  • (modified) llvm/test/CodeGen/RISCV/atomic-signext.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/atomicrmw-cond-sub-clamp.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/bf16-promote.ll (+30-15)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-convert.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/bfloat.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/double-convert-strict.ll (+6-12)
  • (modified) llvm/test/CodeGen/RISCV/double-convert.ll (+6-12)
  • (modified) llvm/test/CodeGen/RISCV/float-convert-strict.ll (+10-22)
  • (modified) llvm/test/CodeGen/RISCV/float-convert.ll (+10-22)
  • (modified) llvm/test/CodeGen/RISCV/fold-mem-offset.ll (+132-65)
  • (modified) llvm/test/CodeGen/RISCV/half-arith.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/half-convert-strict.ll (+15-27)
  • (modified) llvm/test/CodeGen/RISCV/half-convert.ll (+21-39)
  • (modified) llvm/test/CodeGen/RISCV/hoist-global-addr-base.ll (+15-7)
  • (modified) llvm/test/CodeGen/RISCV/local-stack-slot-allocation.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/mem64.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/memcmp-optsize.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/memcmp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/memcpy-inline.ll (+94-94)
  • (modified) llvm/test/CodeGen/RISCV/memcpy.ll (+32-32)
  • (modified) llvm/test/CodeGen/RISCV/memmove.ll (+35-35)
  • (modified) llvm/test/CodeGen/RISCV/nontemporal.ll (+160-160)
  • (modified) llvm/test/CodeGen/RISCV/prefer-w-inst.mir (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbkb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/expandload.ll (+512-512)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll (+33-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll (+97-97)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-vrgather.ll (+32-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-mask-load-store.ll (-23)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+87-87)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int.ll (+17-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-vpload.ll (+2-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-unaligned.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwaddu.ll (+31-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulsu.ll (+31-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulu.ll (-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwsubu.ll (+35-17)
  • (modified) llvm/test/CodeGen/RISCV/rvv/memcpy-inline.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/rvv/stores-of-loads-merging.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/strided-vpload.ll (+15-6)
  • (modified) llvm/test/CodeGen/RISCV/srem-seteq-illegal-types.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/stack-clash-prologue.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/unaligned-load-store.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/urem-seteq-illegal-types.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/urem-vector-lkk.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/wide-scalar-shift-by-byte-multiple-legalization.ll (+132-132)
  • (modified) llvm/test/CodeGen/RISCV/wide-scalar-shift-legalization.ll (+72-72)
  • (modified) llvm/test/CodeGen/RISCV/zdinx-boundary-check.ll (+3-3)
diff --git a/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp b/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
index ed61236415ccf..a141bae55b70f 100644
--- a/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
+++ b/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
@@ -71,6 +71,8 @@ class RISCVOptWInstrs : public MachineFunctionPass {
                       const RISCVSubtarget &ST, MachineRegisterInfo &MRI);
   bool appendWSuffixes(MachineFunction &MF, const RISCVInstrInfo &TII,
                        const RISCVSubtarget &ST, MachineRegisterInfo &MRI);
+  bool convertZExtLoads(MachineFunction &MF, const RISCVInstrInfo &TII,
+                       const RISCVSubtarget &ST, MachineRegisterInfo &MRI);
 
   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.setPreservesCFG();
@@ -788,6 +790,47 @@ bool RISCVOptWInstrs::appendWSuffixes(MachineFunction &MF,
   return MadeChange;
 }
 
+bool RISCVOptWInstrs::convertZExtLoads(MachineFunction &MF,
+                                      const RISCVInstrInfo &TII,
+                                      const RISCVSubtarget &ST,
+                                      MachineRegisterInfo &MRI) {
+  bool MadeChange = false;
+  for (MachineBasicBlock &MBB : MF) {
+    for (MachineInstr &MI : MBB) {
+      unsigned WOpc;
+      int UsersWidth;
+      switch (MI.getOpcode()) {
+      default:
+        continue;
+      case RISCV::LBU:
+        WOpc = RISCV::LB;
+        UsersWidth = 8;
+        break;
+      case RISCV::LHU:
+        WOpc = RISCV::LH;
+        UsersWidth = 16;
+        break;
+      case RISCV::LWU:
+        WOpc = RISCV::LW;
+        UsersWidth = 32;
+        break;
+      }
+
+      if (hasAllNBitUsers(MI, ST, MRI, UsersWidth)) {
+        LLVM_DEBUG(dbgs() << "Replacing " << MI);
+        MI.setDesc(TII.get(WOpc));
+        MI.clearFlag(MachineInstr::MIFlag::NoSWrap);
+        MI.clearFlag(MachineInstr::MIFlag::NoUWrap);
+        MI.clearFlag(MachineInstr::MIFlag::IsExact);
+        LLVM_DEBUG(dbgs() << "     with " << MI);
+        MadeChange = true;
+      }
+    }
+  }
+
+  return MadeChange;
+}
+
 bool RISCVOptWInstrs::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;
@@ -808,5 +851,7 @@ bool RISCVOptWInstrs::runOnMachineFunction(MachineFunction &MF) {
   if (ST.preferWInst())
     MadeChange |= appendWSuffixes(MF, TII, ST, MRI);
 
+  MadeChange |= convertZExtLoads(MF, TII, ST, MRI);
+
   return MadeChange;
 }
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll b/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
index a49e94f4bc910..620c5ecc6c1e7 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
@@ -246,17 +246,11 @@ define double @fcvt_d_wu(i32 %a) nounwind {
 }
 
 define double @fcvt_d_wu_load(ptr %p) nounwind {
-; RV32IFD-LABEL: fcvt_d_wu_load:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    lw a0, 0(a0)
-; RV32IFD-NEXT:    fcvt.d.wu fa0, a0
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: fcvt_d_wu_load:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    lwu a0, 0(a0)
-; RV64IFD-NEXT:    fcvt.d.wu fa0, a0
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: fcvt_d_wu_load:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    lw a0, 0(a0)
+; CHECKIFD-NEXT:    fcvt.d.wu fa0, a0
+; CHECKIFD-NEXT:    ret
 ;
 ; RV32I-LABEL: fcvt_d_wu_load:
 ; RV32I:       # %bb.0:
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll b/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
index fa093623dd6f8..bbea7929a304e 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
@@ -232,17 +232,11 @@ define float @fcvt_s_wu(i32 %a) nounwind {
 }
 
 define float @fcvt_s_wu_load(ptr %p) nounwind {
-; RV32IF-LABEL: fcvt_s_wu_load:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    lw a0, 0(a0)
-; RV32IF-NEXT:    fcvt.s.wu fa0, a0
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: fcvt_s_wu_load:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    lwu a0, 0(a0)
-; RV64IF-NEXT:    fcvt.s.wu fa0, a0
-; RV64IF-NEXT:    ret
+; CHECKIF-LABEL: fcvt_s_wu_load:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    lw a0, 0(a0)
+; CHECKIF-NEXT:    fcvt.s.wu fa0, a0
+; CHECKIF-NEXT:    ret
 ;
 ; RV32I-LABEL: fcvt_s_wu_load:
 ; RV32I:       # %bb.0:
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll
index 9690302552090..65838f51fc920 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll
@@ -748,7 +748,7 @@ define signext i32 @ctpop_i32_load(ptr %p) nounwind {
 ;
 ; RV64ZBB-LABEL: ctpop_i32_load:
 ; RV64ZBB:       # %bb.0:
-; RV64ZBB-NEXT:    lwu a0, 0(a0)
+; RV64ZBB-NEXT:    lw a0, 0(a0)
 ; RV64ZBB-NEXT:    cpopw a0, a0
 ; RV64ZBB-NEXT:    ret
   %a = load i32, ptr %p
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll
index cd59c9e01806d..ba058ca0b500a 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll
@@ -114,7 +114,7 @@ define i64 @pack_i64_2(i32 signext %a, i32 signext %b) nounwind {
 define i64 @pack_i64_3(ptr %0, ptr %1) {
 ; RV64I-LABEL: pack_i64_3:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    lwu a0, 0(a0)
+; RV64I-NEXT:    lw a0, 0(a0)
 ; RV64I-NEXT:    lwu a1, 0(a1)
 ; RV64I-NEXT:    slli a0, a0, 32
 ; RV64I-NEXT:    or a0, a0, a1
@@ -122,8 +122,8 @@ define i64 @pack_i64_3(ptr %0, ptr %1) {
 ;
 ; RV64ZBKB-LABEL: pack_i64_3:
 ; RV64ZBKB:       # %bb.0:
-; RV64ZBKB-NEXT:    lwu a0, 0(a0)
-; RV64ZBKB-NEXT:    lwu a1, 0(a1)
+; RV64ZBKB-NEXT:    lw a0, 0(a0)
+; RV64ZBKB-NEXT:    lw a1, 0(a1)
 ; RV64ZBKB-NEXT:    pack a0, a1, a0
 ; RV64ZBKB-NEXT:    ret
   %3 = load i32, ptr %0, align 4
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll b/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
index 69519c00f88ea..27c6d0240f987 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
@@ -8,13 +8,13 @@ define void @lshr_4bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a3, 1(a0)
 ; RV64I-NEXT:    lbu a4, 0(a0)
 ; RV64I-NEXT:    lbu a5, 2(a0)
-; RV64I-NEXT:    lbu a0, 3(a0)
+; RV64I-NEXT:    lb a0, 3(a0)
 ; RV64I-NEXT:    slli a3, a3, 8
 ; RV64I-NEXT:    or a3, a3, a4
 ; RV64I-NEXT:    lbu a4, 0(a1)
 ; RV64I-NEXT:    lbu a6, 1(a1)
 ; RV64I-NEXT:    lbu a7, 2(a1)
-; RV64I-NEXT:    lbu a1, 3(a1)
+; RV64I-NEXT:    lb a1, 3(a1)
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    or a0, a0, a5
 ; RV64I-NEXT:    slli a6, a6, 8
@@ -85,13 +85,13 @@ define void @shl_4bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a3, 1(a0)
 ; RV64I-NEXT:    lbu a4, 0(a0)
 ; RV64I-NEXT:    lbu a5, 2(a0)
-; RV64I-NEXT:    lbu a0, 3(a0)
+; RV64I-NEXT:    lb a0, 3(a0)
 ; RV64I-NEXT:    slli a3, a3, 8
 ; RV64I-NEXT:    or a3, a3, a4
 ; RV64I-NEXT:    lbu a4, 0(a1)
 ; RV64I-NEXT:    lbu a6, 1(a1)
 ; RV64I-NEXT:    lbu a7, 2(a1)
-; RV64I-NEXT:    lbu a1, 3(a1)
+; RV64I-NEXT:    lb a1, 3(a1)
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    or a0, a0, a5
 ; RV64I-NEXT:    slli a6, a6, 8
@@ -162,13 +162,13 @@ define void @ashr_4bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a3, 1(a0)
 ; RV64I-NEXT:    lbu a4, 0(a0)
 ; RV64I-NEXT:    lbu a5, 2(a0)
-; RV64I-NEXT:    lbu a0, 3(a0)
+; RV64I-NEXT:    lb a0, 3(a0)
 ; RV64I-NEXT:    slli a3, a3, 8
 ; RV64I-NEXT:    or a3, a3, a4
 ; RV64I-NEXT:    lbu a4, 0(a1)
 ; RV64I-NEXT:    lbu a6, 1(a1)
 ; RV64I-NEXT:    lbu a7, 2(a1)
-; RV64I-NEXT:    lbu a1, 3(a1)
+; RV64I-NEXT:    lb a1, 3(a1)
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    or a0, a0, a5
 ; RV64I-NEXT:    slli a6, a6, 8
@@ -244,25 +244,25 @@ define void @lshr_8bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu a0, 7(a0)
+; RV64I-NEXT:    lb a0, 7(a0)
 ; RV64I-NEXT:    slli a4, a4, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a3, a4, a3
 ; RV64I-NEXT:    or a4, a6, a5
-; RV64I-NEXT:    lbu a5, 0(a1)
-; RV64I-NEXT:    lbu a6, 1(a1)
-; RV64I-NEXT:    lbu t2, 2(a1)
-; RV64I-NEXT:    lbu t3, 3(a1)
+; RV64I-NEXT:    lb a5, 0(a1)
+; RV64I-NEXT:    lb a6, 1(a1)
+; RV64I-NEXT:    lb t2, 2(a1)
+; RV64I-NEXT:    lb t3, 3(a1)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a7, t0, a7
 ; RV64I-NEXT:    or a0, a0, t1
 ; RV64I-NEXT:    or a5, a6, a5
-; RV64I-NEXT:    lbu a6, 4(a1)
-; RV64I-NEXT:    lbu t0, 5(a1)
-; RV64I-NEXT:    lbu t1, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a6, 4(a1)
+; RV64I-NEXT:    lb t0, 5(a1)
+; RV64I-NEXT:    lb t1, 6(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t3, t3, 8
 ; RV64I-NEXT:    or t2, t3, t2
 ; RV64I-NEXT:    slli t0, t0, 8
@@ -395,25 +395,25 @@ define void @shl_8bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu a0, 7(a0)
+; RV64I-NEXT:    lb a0, 7(a0)
 ; RV64I-NEXT:    slli a4, a4, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a3, a4, a3
 ; RV64I-NEXT:    or a4, a6, a5
-; RV64I-NEXT:    lbu a5, 0(a1)
-; RV64I-NEXT:    lbu a6, 1(a1)
-; RV64I-NEXT:    lbu t2, 2(a1)
-; RV64I-NEXT:    lbu t3, 3(a1)
+; RV64I-NEXT:    lb a5, 0(a1)
+; RV64I-NEXT:    lb a6, 1(a1)
+; RV64I-NEXT:    lb t2, 2(a1)
+; RV64I-NEXT:    lb t3, 3(a1)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a7, t0, a7
 ; RV64I-NEXT:    or a0, a0, t1
 ; RV64I-NEXT:    or a5, a6, a5
-; RV64I-NEXT:    lbu a6, 4(a1)
-; RV64I-NEXT:    lbu t0, 5(a1)
-; RV64I-NEXT:    lbu t1, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a6, 4(a1)
+; RV64I-NEXT:    lb t0, 5(a1)
+; RV64I-NEXT:    lb t1, 6(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t3, t3, 8
 ; RV64I-NEXT:    or t2, t3, t2
 ; RV64I-NEXT:    slli t0, t0, 8
@@ -541,25 +541,25 @@ define void @ashr_8bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu a0, 7(a0)
+; RV64I-NEXT:    lb a0, 7(a0)
 ; RV64I-NEXT:    slli a4, a4, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a3, a4, a3
 ; RV64I-NEXT:    or a4, a6, a5
-; RV64I-NEXT:    lbu a5, 0(a1)
-; RV64I-NEXT:    lbu a6, 1(a1)
-; RV64I-NEXT:    lbu t2, 2(a1)
-; RV64I-NEXT:    lbu t3, 3(a1)
+; RV64I-NEXT:    lb a5, 0(a1)
+; RV64I-NEXT:    lb a6, 1(a1)
+; RV64I-NEXT:    lb t2, 2(a1)
+; RV64I-NEXT:    lb t3, 3(a1)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a7, t0, a7
 ; RV64I-NEXT:    or a0, a0, t1
 ; RV64I-NEXT:    or a5, a6, a5
-; RV64I-NEXT:    lbu a6, 4(a1)
-; RV64I-NEXT:    lbu t0, 5(a1)
-; RV64I-NEXT:    lbu t1, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a6, 4(a1)
+; RV64I-NEXT:    lb t0, 5(a1)
+; RV64I-NEXT:    lb t1, 6(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t3, t3, 8
 ; RV64I-NEXT:    or t2, t3, t2
 ; RV64I-NEXT:    slli t0, t0, 8
@@ -695,7 +695,7 @@ define void @lshr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -707,7 +707,7 @@ define void @lshr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -729,7 +729,7 @@ define void @lshr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1028,7 +1028,7 @@ define void @lshr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -1040,7 +1040,7 @@ define void @lshr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1062,7 +1062,7 @@ define void @lshr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1361,7 +1361,7 @@ define void @shl_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -1373,7 +1373,7 @@ define void @shl_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1395,7 +1395,7 @@ define void @shl_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1690,7 +1690,7 @@ define void @shl_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) nounw
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -1702,7 +1702,7 @@ define void @shl_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) nounw
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1724,7 +1724,7 @@ define void @shl_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) nounw
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2020,7 +2020,7 @@ define void @ashr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -2032,7 +2032,7 @@ define void @ashr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2054,7 +2054,7 @@ define void @ashr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2353,7 +2353,7 @@ define void @ashr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -2365,7 +2365,7 @@ define void @ashr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2387,7 +2387,7 @@ define void @ashr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2697,7 +2697,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -2705,7 +2705,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu s0, 12(a0)
 ; RV64I-NEXT:    lbu s1, 13(a0)
 ; RV64I-NEXT:    lbu s2, 14(a0)
-; RV64I-NEXT:    lbu s3, 15(a0)
+; RV64I-NEXT:    lb s3, 15(a0)
 ; RV64I-NEXT:    lbu s4, 16(a0)
 ; RV64I-NEXT:    lbu s5, 17(a0)
 ; RV64I-NEXT:    lbu s6, 18(a0)
@@ -2719,7 +2719,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu s8, 20(a0)
 ; RV64I-NEXT:    lbu s9, 21(a0)
 ; RV64I-NEXT:    lbu s10, 22(a0)
-; RV64I-NEXT:    lbu s11, 23(a0)
+; RV64I-NEXT:    lb s11, 23(a0)
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
 ; RV64I-NEXT:    slli t6, t6, 8
@@ -2741,7 +2741,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu s2, 28(a0)
 ; RV64I-NEXT:    lbu s3, 29(a0)
 ; RV64I-NEXT:    lbu s4, 30(a0)
-; RV64I-NEXT:    lbu a0, 31(a0)
+; RV64I-NEXT:    lb a0, 31(a0)
 ; RV64I-NEXT:    slli s9, s9, 8
 ; RV64I-NEXT:    slli s11, s11, 8
 ; RV64I-NEXT:    slli t6, t6, 8
@@ -2763,7 +2763,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a0, 4(a1)
 ; RV64I-NEXT:    lbu s1, 5(a1)
 ; RV64I-NEXT:    lbu s4, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli s8, s8, 8
 ; RV64I-NEXT:    or s7, s8, s7
 ; RV64I-NEXT:    slli s1, s1, 8
@@ -3621,7 +3621,7 @@ define void @lshr_32bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -3629,7 +3629,7 @@ define void @lshr_32bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu s0, 12(a0)
 ; RV64I-NEXT:    lbu s1, 13(a0)
 ; RV64I-NEXT:    lbu s2, 14(a0)
-; RV64I-NEXT:    lbu s3, 15(a0)
+; RV64I-NEXT:    lb s3, 15(a0)
 ; RV64I-NEXT:    lbu s4, 16(a0)
 ; RV64I-NEXT:    ...
[truncated]

@llvmbot
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-llvm-globalisel

Author: Alex Bradbury (asb)



github-actions bot commented Jun 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@topperc
Collaborator

topperc commented Jun 18, 2025

If I remember right, there is no c.lb in Zcb. Does this make compression worse?

@asb
Contributor Author

asb commented Jun 18, 2025

If I remember right, there is no c.lb in Zcb. Does this make compression worse?

You're right, I'll update to exclude LB.

EDIT: Now pushed, and patch description updated.

@topperc
Collaborator

topperc commented Jun 18, 2025

Have you looked at doing this during instruction selection using hasAllNBitUsers in RISCVISelDAGToDAG.cpp? That isn't restricted to RV64, though it won't work across basic blocks.

@topperc
Collaborator

topperc commented Jun 18, 2025

I think changing LWU->LW is useful for compression and minimizing RV32/RV64 delta.

I'm skeptical that LWU->LW and LHU->LH will help us match gcc better. Here are trivial examples where LLVM used LH/LW and gcc used LHU/LWU. https://godbolt.org/z/cbje4aT7P I guess it could be better on average, but it certainly doesn't guarantee a match to gcc.

@asb
Contributor Author

asb commented Jun 19, 2025

I think changing LWU->LW is useful for compression and minimizing RV32/RV64 delta.

Agreed.

I'm skeptical that LWU->LW and LHU->LH will help us match gcc better. Here are trivial examples where LLVM used LH/LW and gcc used LHU/LWU. https://godbolt.org/z/cbje4aT7P I guess it could be better on average, but it certainly doesn't guarantee a match to gcc.

That may be true. I spotted this while looking at a workload where we emitted some LWUs that GCC didn't, and wondered whether it was an indicator of us overall making different/worse choices in terms of sign/zero extension (and of course found it was just a case where either load was equivalent); I'm sure I've seen the same before. But I totally believe there are other cases where we make different choices. I'll quantify how often it kicks in, but I probably wouldn't mind dropping LHU->LH. Perhaps the argument for this kind of change is more just "canonicalisation".
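
(As a concrete illustration of the kind of case where either load is equivalent, here is a minimal hypothetical sketch, not taken from the workload above: whenever every user of the loaded value only reads its low 32 bits, a sign-extending or zero-extending load is equally correct, so two compilers can legitimately disagree.)

#include <cstdint>

// The 32-bit add and the 32-bit return only consume the low 32 bits of the
// loaded value, so on RV64 either a sign-extending lw or a zero-extending lwu
// is a correct way to lower the load here.
uint32_t add_one(const uint32_t *p) {
  return *p + 1;
}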

Have you looked at doing this during instruction selection using hasAllNBitUsers in RISCVISelDAGToDAG.cpp?

I'll try that and report back.

@asb asb changed the title from "[RISCV] Switch to sign-extended loads if possible in RISCVOptWInstrs" to "[RISCV] Convert LWU to LW if possible in RISCVOptWInstrs" Jun 25, 2025
@asb
Contributor Author

asb commented Jun 25, 2025

Have you looked at doing this during instruction selection using hasAllNBitUsers in RISCVISelDAGToDAG.cpp?

I'm limiting scope to just LWU => LW for now (there perhaps isn't much of an argument for LHU => LH beyond picking a "canonical form"). I implemented this at ISel with the following patch:

--- a/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
+++ b/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
@@ -208,7 +208,8 @@ bool RISCVInstructionSelector::hasAllNBitUsers(const MachineInstr &MI,
           MI.getOpcode() == TargetOpcode::G_AND ||
           MI.getOpcode() == TargetOpcode::G_OR ||
           MI.getOpcode() == TargetOpcode::G_XOR ||
-          MI.getOpcode() == TargetOpcode::G_SEXT_INREG || Depth != 0) &&
+          MI.getOpcode() == TargetOpcode::G_SEXT_INREG ||
+          MI.getOpcode() == TargetOpcode::G_ZEXTLOAD || Depth != 0) &&
          "Unexpected opcode");

   if (Depth >= RISCVInstructionSelector::MaxRecursionDepth)
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -3532,7 +3532,8 @@ bool RISCVDAGToDAGISel::hasAllNBitUsers(SDNode *Node, unsigned Bits,
           Node->getOpcode() == ISD::SRL || Node->getOpcode() == ISD::AND ||
           Node->getOpcode() == ISD::OR || Node->getOpcode() == ISD::XOR ||
           Node->getOpcode() == ISD::SIGN_EXTEND_INREG ||
-          isa<ConstantSDNode>(Node) || Depth != 0) &&
+          Node->getOpcode() == ISD::LOAD || isa<ConstantSDNode>(Node) ||
+          Depth != 0) &&
          "Unexpected opcode");

   if (Depth >= SelectionDAG::MaxRecursionDepth)
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -2083,6 +2083,13 @@ class binop_allwusers<SDPatternOperator operator>
   let GISelPredicateCode = [{ return hasAllWUsers(MI); }];
 }

+class unaryop_allwusers<SDPatternOperator operator>
+    : PatFrag<(ops node:$arg), (i64(operator node:$arg)), [{
+  return hasAllWUsers(N);
+}]> {
+  let GISelPredicateCode = [{ return hasAllWUsers(MI); }];
+}
+
 def sexti32_allwusers : PatFrag<(ops node:$src),
                                 (sext_inreg node:$src, i32), [{
   return hasAllWUsers(N);
@@ -2157,6 +2164,7 @@ def : Pat<(or_is_add 33signbits_node:$rs1, simm12:$imm),

 def : LdPat<sextloadi32, LW, i64>;
 def : LdPat<extloadi32, LW, i64>;
+def : LdPat<unaryop_allwusers<zextloadi32>, LW, i64>;
 def : LdPat<zextloadi32, LWU, i64>;
 def : LdPat<load, LD, i64>;

I've found that this is much less effective than the RISCVOptWInstrs change:

  • Doing it at ISel results in ~4600 instructions changed across the test suite
  • Doing it in RISCVOptWInstrs (updated to do LWU=>LW only) changes ~20500 instructions
  • There is no additional benefit in doing both (as you'd expected - but I ran this just to check).

I'm about to push changes to this patch to limit it to just the LWU change. Looking at the diffs between doing it at ISel and this way, there are some cases that at first glance I would have thought would be handled - I'll pick through a couple just to check there's nothing surprising going on.

Member

@lenary lenary left a comment

LGTM

@lenary
Member

lenary commented Jul 3, 2025

To maybe add some more info to my LGTM: I accept this is probably not the perfect place for this, but it's a place that seems to work.

@@ -808,5 +831,7 @@ bool RISCVOptWInstrs::runOnMachineFunction(MachineFunction &MF) {
  if (ST.preferWInst())
    MadeChange |= appendWSuffixes(MF, TII, ST, MRI);

  MadeChange |= convertZExtLoads(MF, TII, ST, MRI);
Collaborator

Can we combine stripWSuffixes/appendWSuffixes/convertZExtLoads into a single function that walks over the function once and does the right thing?

Contributor Author

I'll try that out.
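
(Purely as an illustration of what "a single function that walks over the function once" could look like, and not the code that eventually landed: the sketch below assumes it lives in RISCVOptWInstrs.cpp, where the LLVM types and the hasAllWUsers helper are already in scope, and it elides the full opcode table and the real profitability checks.)

static bool canonicalizeOnceSketch(MachineFunction &MF,
                                   const RISCVInstrInfo &TII,
                                   const RISCVSubtarget &ST,
                                   MachineRegisterInfo &MRI) {
  bool MadeChange = false;
  for (MachineBasicBlock &MBB : MF) {
    for (MachineInstr &MI : MBB) {
      // Map each interesting opcode to its W-form / non-W-form counterpart.
      std::optional<unsigned> WOpc, NonWOpc;
      switch (MI.getOpcode()) {
      default:
        continue;
      case RISCV::ADDW:
        NonWOpc = RISCV::ADD;
        break;
      case RISCV::ADD:
        WOpc = RISCV::ADDW;
        break;
      case RISCV::LWU:
        WOpc = RISCV::LW; // the conversion this PR adds
        break;
        // ...remaining opcode pairs elided...
      }
      // Pick a direction once per instruction: prefer the W form if the
      // subtarget asks for it or this is the LWU special case. Replacing in
      // either direction is only done when all users read just the low 32
      // bits, where both forms produce identical results.
      bool PreferW = ST.preferWInst() || MI.getOpcode() == RISCV::LWU;
      if (PreferW && WOpc && hasAllWUsers(MI, ST, MRI)) {
        MI.setDesc(TII.get(*WOpc));
        MadeChange = true;
      } else if (!PreferW && NonWOpc && hasAllWUsers(MI, ST, MRI)) {
        MI.setDesc(TII.get(*NonWOpc));
        MadeChange = true;
      }
    }
  }
  return MadeChange;
}

For comparison, the diff posted later in the thread shows how the landed canonicalizeWSuffixes folds the LWU case into the existing ShouldPreferW check.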

@asb
Contributor Author

asb commented Jul 9, 2025

I'm about to push changes to this patch to limit it to just the LWU change. Looking at the diffs between doing it at ISel and this way, there are some cases that at first glance I would have thought would be handled - I'll pick through a couple just to check there's nothing surprising going on.

I looked into this some more, and an issue with my SDAG patch posted above is that if a load's result is used by another instruction but its chain is also used by a store, that use of the chain causes hasAllNBitUsers to return false. So adding this fixes that:

@@ -3548,6 +3549,9 @@ bool RISCVDAGToDAGISel::hasAllNBitUsers(SDNode *Node, unsigned Bits,
     // Users of this node should have already been instruction selected
     if (!User->isMachineOpcode())
       return false;
+    // Skip uses of a chain result.
+    if (Node->getValueType(Use.getResNo()) == MVT::Other)
+      continue;

     // TODO: Add more opcodes?
     switch (User->getMachineOpcode()) {

This bumps us up to ~8400 changed instructions across llvm-test-suite, so it's still limited compared to the RISCVOptWInstrs approach.

I think investigating doing this at ISel time as well may be worthwhile in the fullness of time, but for now doing it at RISCVOptWInstrs seems the simplest path forwards that solves the immediate problem.

asb added a commit to asb/llvm-project that referenced this pull request Jul 20, 2025
…es into canonicalizeWSuffixes

This refactor was suggested in
<llvm#144703>.

I have checked for unexpected changes by comparing builds of
llvm-test-suite with/without this refactor, including with preferWInst
force enabled.
@asb
Contributor Author

asb commented Jul 20, 2025

I've implemented the refactoring suggested by @topperc in #149710. I'll merge in those changes and rework this PR once that lands. As noted in that PR, the logic change needed for this patch becomes much smaller after the refactor:

--- a/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
+++ b/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
@@ -736,7 +736,8 @@ bool RISCVOptWInstrs::canonicalizeWSuffixes(MachineFunction &MF,
     for (MachineInstr &MI : MBB) {
       std::optional<unsigned> WOpc;
       std::optional<unsigned> NonWOpc;
-      switch (MI.getOpcode()) {
+      unsigned OrigOpc = MI.getOpcode();
+      switch (OrigOpc) {
       default:
         continue;
       case RISCV::ADDW:
@@ -786,7 +787,8 @@ bool RISCVOptWInstrs::canonicalizeWSuffixes(MachineFunction &MF,
         MadeChange = true;
         continue;
       }
-      if (ShouldPreferW && WOpc.has_value() && hasAllWUsers(MI, ST, MRI)) {
+      if ((ShouldPreferW || OrigOpc == RISCV::LWU) && WOpc.has_value() &&
+          hasAllWUsers(MI, ST, MRI)) {
         LLVM_DEBUG(dbgs() << "Replacing " << MI);
         MI.setDesc(TII.get(WOpc.value()));
         MI.clearFlag(MachineInstr::MIFlag::NoSWrap);

(The above retains the same behaviour as the current implementation of this PR - always converting LWU to LW if possible. It could be made to always convert LD to LW as happens with ST.preferWInst, but it's not clear that would be beneficial, and I'm wary of introducing additional load narrowing. Specifically, I don't feel confident enough that it would always be a semantically correct change - for instance, if the LD would have caused an exception but the LW wouldn't have.)
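
(To make that exception concern concrete, here is a minimal sketch assuming a POSIX system; it is not taken from the PR and is not RISC-V specific, but it demonstrates an address where an 8-byte access faults while a 4-byte access at the same address does not.)

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

int main() {
  long PageSize = sysconf(_SC_PAGESIZE);
  // Map two pages, then unmap the second so the mapping ends at a hard boundary.
  char *Base = static_cast<char *>(mmap(nullptr, 2 * PageSize,
                                        PROT_READ | PROT_WRITE,
                                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
  if (Base == MAP_FAILED)
    return 1;
  munmap(Base + PageSize, PageSize);

  char *P = Base + PageSize - 4; // the last 4 mapped bytes
  uint32_t Word;
  std::memcpy(&Word, P, 4); // fine: an LW-sized access stays inside the mapping
  std::printf("4-byte read ok: %u\n", Word);

  // uint64_t DWord;
  // std::memcpy(&DWord, P, 8); // faults: an LD-sized access at the same address
  //                            // reads 4 unmapped bytes, so changing the access
  //                            // width changes whether an exception occurs
  return 0;
}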

asb added a commit that referenced this pull request Jul 21, 2025
…xes into canonicalizeWSuffixes (#149710)

asb added 2 commits July 21, 2025 10:51
…d-normalisation

Note: RISCVOptWInstrs changes are temporarily reverted as they will be reimplemented following the recent refactor that just landed.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 21, 2025
…ppendWSuffixes into canonicalizeWSuffixes (#149710)

@asb
Contributor Author

asb commented Jul 21, 2025

I'm going to go ahead and land this even though the implementation has changed a bit since the last LGTM.

@asb asb merged commit fc69f25 into llvm:main Jul 21, 2025
8 of 9 checks passed
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025
…xes into canonicalizeWSuffixes (llvm#149710)

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jul 28, 2025
SubashBoopathi pushed a commit to SubashBoopathi/llvm-project that referenced this pull request Aug 19, 2025
SubashBoopathi pushed a commit to SubashBoopathi/llvm-project that referenced this pull request Aug 19, 2025