Skip to content

[LoopInterchange] Ignore the cost-model, force interchange if legal #148858

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 18, 2025
Merged
16 changes: 15 additions & 1 deletion llvm/lib/Transforms/Scalar/LoopInterchange.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,7 @@ enum class RuleTy {
PerLoopCacheAnalysis,
PerInstrOrderCost,
ForVectorization,
Ignore
};

} // end anonymous namespace
Expand Down Expand Up @@ -106,7 +107,10 @@ static cl::list<RuleTy> Profitabilities(
clEnumValN(RuleTy::PerInstrOrderCost, "instorder",
"Prioritize the IVs order of each instruction"),
clEnumValN(RuleTy::ForVectorization, "vectorize",
"Prioritize vectorization")));
"Prioritize vectorization"),
clEnumValN(RuleTy::Ignore, "ignore",
"Ignore profitability, force interchange (does not "
"work with other options)")));

#ifndef NDEBUG
static bool noDuplicateRules(ArrayRef<RuleTy> Rules) {
Expand Down Expand Up @@ -1286,6 +1290,11 @@ std::optional<bool> LoopInterchangeProfitability::isProfitableForVectorization(
bool LoopInterchangeProfitability::isProfitable(
const Loop *InnerLoop, const Loop *OuterLoop, unsigned InnerLoopId,
unsigned OuterLoopId, CharMatrix &DepMatrix, CacheCostManager &CCM) {

// Return true if interchange is forced.
if (Profitabilities.size() == 1 && Profitabilities[0] == RuleTy::Ignore)
return true;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary if we handle RuleTy::Ignore properly in the later switch statement.

// isProfitable() is structured to avoid endless loop interchange. If the
// highest priority rule (isProfitablePerLoopCacheAnalysis by default) could
// decide the profitability then, profitability check will stop and return the
Expand All @@ -1311,6 +1320,11 @@ bool LoopInterchangeProfitability::isProfitable(
shouldInterchange =
isProfitableForVectorization(InnerLoopId, OuterLoopId, DepMatrix);
break;
case RuleTy::Ignore:
// TODO? We ignore the force option when it appears in a list, i.e. it
// should occur as the only option to be effective, as mentioned in the
// help.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO? We ignore the force option when it appears in a list, i.e. it
// should occur as the only option to be effective, as mentioned in the
// help.
shouldInterchange = true;

break;
}

// If this rule could determine the profitability, don't call subsequent
Expand Down
43 changes: 43 additions & 0 deletions llvm/test/Transforms/LoopInterchange/force-interchange.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
; RUN: opt < %s -passes=loop-interchange -pass-remarks-output=%t -disable-output -loop-interchange-profitabilities=ignore -S
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orce interchange can occur odd behavior in some cases. Can you add a test to show it? Here is an example:

int A[4][4][4];
for (int i = 0; i < 4; i++)
  for (int j = 1; j < 4; j++)
    for (int k = 0; k < 4; k++)
      A[3-i][j-1][k] = A[i][j][k];

IIUIC, the following scenario happens due to the bubble-sort based heuristic:

1. Try to interchange the 2nd loop (j-loop) and 3rd loop (k-loop). It is legal, so they are interchanged.

2. Try to interchange the 1st loop (i-loop) and 2nd loop (k-loop, because the j-loop and k-loop were interchanged). It is illegal, so they are NOT interchanged.

3. Try to interchange the 2nd loop (k-loop) and 3rd loop (j-loop). It is legal indeed, so they are interchanged again.

Consequently, the first interchange is reverted, and the final loop order remains the same as the original. I believe this is acceptable, considering that the force interchange option is intended for testing purposes. However, it is worth clarifying this behavior with an example.

Could you please add this? It's fine to do it in another PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks, I will commit this and look at that suggestion early next week.
This allows me to run more testing over the weekend.
I have already found a couple of interchange problems, but can raise issues when this option is available.

; RUN: FileCheck --input-file=%t %s

; There should be no reason to interchange this, unless it is forced.
;
; for (int i = 0; i<1024; i++)
; for (int j = 0; j<1024; j++)
; A[i][j] = 42;
;
; CHECK: --- !Passed
; CHECK-NEXT: Pass: loop-interchange
; CHECK-NEXT: Name: Interchanged
; CHECK-NEXT: Function: f
; CHECK-NEXT: Args:
; CHECK-NEXT: - String: Loop interchanged with enclosing loop.
; CHECK-NEXT: ...

@A = dso_local local_unnamed_addr global [1024 x [1024 x i32]] zeroinitializer, align 4

define dso_local void @f() local_unnamed_addr #0 {
entry:
br label %for.cond1.preheader

for.cond1.preheader:
%indvars.iv17 = phi i64 [ 0, %entry ], [ %indvars.iv.next18, %for.cond.cleanup3 ]
br label %for.body4

for.cond.cleanup:
ret void

for.cond.cleanup3:
%indvars.iv.next18 = add nuw nsw i64 %indvars.iv17, 1
%exitcond20.not = icmp eq i64 %indvars.iv.next18, 1024
br i1 %exitcond20.not, label %for.cond.cleanup, label %for.cond1.preheader

for.body4:
%indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body4 ]
%arrayidx6 = getelementptr inbounds nuw [1024 x [1024 x i32]], ptr @A, i64 0, i64 %indvars.iv17, i64 %indvars.iv
store i32 42, ptr %arrayidx6, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond.not = icmp eq i64 %indvars.iv.next, 1024
br i1 %exitcond.not, label %for.cond.cleanup3, label %for.body4
}
Loading