Skip to content
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
8fa4896
implement llvm-readobj --coff-pseudoreloc stub
kikairoya Jul 31, 2025
48f4280
find pseudo-reloc symbols
kikairoya Jul 30, 2025
fff11ce
dump raw reloc data
kikairoya Jul 31, 2025
be77e1c
show symbol name
kikairoya Jul 31, 2025
145c3e4
use support::ulittle32_t
kikairoya Aug 4, 2025
9ff81f6
use named functor instead of unnamed lambda
kikairoya Aug 4, 2025
4cd5e29
note the option "Cygwin/MinGW specific" and reorder alphabetically
kikairoya Aug 6, 2025
6980cb1
fix handling of errors
kikairoya Aug 6, 2025
8a5a7f8
range check for reloc record region
kikairoya Aug 6, 2025
8e02d4a
cache import directory size
kikairoya Aug 6, 2025
0dcc26d
reorganize test
kikairoya Aug 6, 2025
b182e0e
early exit when not a PE
kikairoya Aug 6, 2025
f48e21a
dump target symbol
kikairoya Aug 5, 2025
41c0e01
reduce usage of auto
kikairoya Aug 9, 2025
93b46cd
use sorted vector
kikairoya Aug 9, 2025
09b643e
reorganize warnings
kikairoya Aug 9, 2025
f57fd35
reorganize test 1/2
kikairoya Aug 9, 2025
45bcfeb
reorganize test 2/2
kikairoya Aug 9, 2025
56d51f6
add missing tests
kikairoya Aug 10, 2025
46cb5bc
warn about missing imported symbol
kikairoya Aug 11, 2025
6e6e760
format
kikairoya Aug 11, 2025
e9ab396
format, style, wording (nfc)
kikairoya Aug 11, 2025
6866552
cleanup
kikairoya Aug 20, 2025
2475486
include the section number into the warning about pointed section dif…
kikairoya Aug 14, 2025
673bebd
include the pointed address into the warning about address reversal
kikairoya Aug 14, 2025
09572ae
include the referenced address into the warning about not pointing an…
kikairoya Aug 20, 2025
e80bdb6
include the address into the warning about address overrun
kikairoya Aug 20, 2025
5ccd671
use reportUniqueWarning
kikairoya Aug 14, 2025
706cffa
format
kikairoya Aug 21, 2025
1d75f4c
swap the operands in the comparison and allow END == end of section
kikairoya Aug 21, 2025
29fd44f
include the section size into the warning about address overrun
kikairoya Aug 22, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file not shown.
97 changes: 97 additions & 0 deletions llvm/test/tools/llvm-readobj/COFF/pseudoreloc.test
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
RUN: llvm-readobj --coff-pseudoreloc %p/Inputs/pseudoreloc.exe | FileCheck %s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, it'd be nicer to synthesize the test binary from yaml with yaml2obj rather than checking in a binary. (We do have some binaries checked in from before, but we'd like to keep that to a minimum, e.g. for binaries that can't be synthesized with yaml2obj yet.)

I see that you have the full procedure included for regenerating the binary, that's nice and appreciated! If converting it to yaml, it's also somewhat customary to strip down the size of it by removing unnecessary data from it. Perhaps it's not necessary in this case if the payload of each section is only a couple dozens of bytes anyway though. But if it is, the instructions would unfortunately end with obj2yaml pseudoreloc.exe > pseudoreloc.exe.yaml # and manually strip down the .yaml file. But if there's not that much unnecessary in there, perhaps we don't need to strip it manually at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Using yaml2obj, I was able to edit the content using sed and easily add tests for errors.

RUN: llvm-readobj --coff-pseudoreloc %p/Inputs/nop.exe.coff-x86-64 | FileCheck %s --check-prefix=NOSYM
RUN: llvm-readobj --coff-pseudoreloc %p/Inputs/trivial.obj.coff-i386 | FileCheck %s --check-prefix=NORELOC

CHECK: Format: COFF-i386
CHECK-NEXT: Arch: i386
CHECK-NEXT: AddressSize: 32bit
CHECK-NEXT: PseudoReloc [
CHECK-NEXT: Entry {
CHECK-NEXT: Symbol: 0x{{[0-9A-Z]+}}
CHECK-NEXT: SymbolName: sym1
CHECK-NEXT: Target: 0x{{[0-9A-Z]+}}
CHECK-NEXT: BitWidth: {{[0-9]+}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While these regex patterns are nice for accepting anything that we'd expect to output, it also makes the test kinda weak - we could start outputting different wrong addresses (and bitwidth sizes), without the test catching it - that's not ideal. So I think it would be good to check the actual numbers as well, to make sure the test catches any potential breakage in it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileCheck has the concept of numeric capture and substitutions and even simple maths like additions and subtractions. It sounds like this would be a good use-case for this functionality. See https://llvm.org/docs/CommandGuide/FileCheck.html#filecheck-numeric-substitution-blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've rewritten to use numeric capture.

CHECK-NEXT: }
CHECK-NEXT: Entry {
CHECK-NEXT: Symbol: 0x{{[0-9A-Z]+}}
CHECK-NEXT: SymbolName: sym2
CHECK-NEXT: Target: 0x{{[0-9A-Z]+}}
CHECK-NEXT: BitWidth: {{[0-9]+}}
CHECK-NEXT: }
CHECK-NEXT: Entry {
CHECK-NEXT: Symbol: 0x{{[0-9A-Z]+}}
CHECK-NEXT: SymbolName: sym1
CHECK-NEXT: Target: 0x{{[0-9A-Z]+}}
CHECK-NEXT: BitWidth: {{[0-9]+}}
CHECK-NEXT: }
CHECK-NEXT: ]

NOSYM-NOT: PseudoReloc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of needing to repeat the -NOT condition before/after the positive check that we do look for, FileCheck also has got an option --implicit-check-not which you can consider using.

NOSYM: The symbol table has been stripped
NOSYM-NOT: PseudoReloc

NORELOC-NOT: PseudoReloc
NORELOC: The symbols for runtime pseudo-relocation are not found
NORELOC-NOT: PseudoReloc


pseudoreloc.exe is generated by following script:

#--- generate.sh
llvm-mc -triple i386-mingw32 -filetype obj pseudoreloc.dll.s -o pseudoreloc.dll.o
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps include a split-file command here as well, for easier execution if one wants to try it?

ld.lld -m i386pe --dll pseudoreloc.dll.o -o pseudoreloc.dll -entry=
llvm-mc -triple i386-mingw32 -filetype obj pseudoreloc.s -o pseudoreloc.o
ld.lld -m i386pe pseudoreloc.o pseudoreloc.dll -o pseudoreloc.exe -entry=start

#--- pseudoreloc.dll.s
.data
.globl _sym1
_sym1:
.long 0x11223344
.globl _sym2
_sym2:
.long 0x55667788
.section .drectve
.ascii " -export:sym1,data "
.ascii " -export:sym2,data "
.addrsig

#--- pseudoreloc.s
.text
.globl _start
_start:
mov _local1b, %eax
movsb (%eax), %ecx
mov _local2, %eax
movsb (%eax), %edx
mov _local1a, %eax
movsb (%eax), %eax
add %edx, %eax
add %ecx, %eax
ret

.globl __pei386_runtime_relocator
__pei386_runtime_relocator:
mov ___RUNTIME_PSEUDO_RELOC_LIST__, %eax
mov ___RUNTIME_PSEUDO_RELOC_LIST_END__, %ecx
sub %ecx, %eax
ret

.data
.globl _local1a
.p2align 2
_local1a:
.long _sym1+1

.globl _local2
.p2align 2
_local2:
.long _sym2+1

.globl _local1b
.p2align 2
_local1b:
.long _sym1+3

.addrsig

117 changes: 117 additions & 0 deletions llvm/tools/llvm-readobj/COFFDumper.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,7 @@ class COFFDumper : public ObjDumper {
void printCOFFExports() override;
void printCOFFDirectives() override;
void printCOFFBaseReloc() override;
void printCOFFPseudoReloc() override;
void printCOFFDebugDirectory() override;
void printCOFFTLSDirectory() override;
void printCOFFResources() override;
Expand Down Expand Up @@ -2000,6 +2001,122 @@ void COFFDumper::printCOFFBaseReloc() {
}
}

void COFFDumper::printCOFFPseudoReloc() {
const StringRef RelocBeginName = Obj->getArch() == Triple::x86
? "___RUNTIME_PSEUDO_RELOC_LIST__"
: "__RUNTIME_PSEUDO_RELOC_LIST__";
const StringRef RelocEndName = Obj->getArch() == Triple::x86
? "___RUNTIME_PSEUDO_RELOC_LIST_END__"
: "__RUNTIME_PSEUDO_RELOC_LIST_END__";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we have the differing behaviour between i386 and other architectures, perhaps it would be good with two test files, to cover both cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the test for x86_64 and switched testee source to LLVM IR to support multiple arch.


COFFSymbolRef RelocBegin, RelocEnd;
auto Count = Obj->getNumberOfSymbols();
if (Count == 0) {
W.startLine() << "The symbol table has been stripped\n";
return;
}
for (auto i = 0u;
i < Count && (!RelocBegin.getRawPtr() || !RelocEnd.getRawPtr()); ++i) {
auto Sym = Obj->getSymbol(i);
if (Sym.takeError())
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC you can't just ignore the errors like this (the error classes has got a destructor that aborts if you haven't actually done anything with the error). See other similar functions here for ways of doing it; e.g. if (!Sym) { consumeError(Sym.takeError()); return; } if we just want to ignore the error, or if (!Sym) reportError(Sym.takeError(), Obj->getFileName()); if we want to print it.

These error classes are quite tricky to use in that sense, so ideally one would need to have tested triggering all of these error cases - and unfortunately it can probably be quite hard to actually force these to fail as well... Perhaps by hex editing a binary to make symbol string offsets out of bounds?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should just be a call to reportWarning with the error, or something along those lines IMO. Same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. Warning about this potentially causes too noisy.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you've realised this, but reportWarning only emits the first instance of each identical warning, so if the same warning were to be reported lots of times, only the first one will appear in the actual warning output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Expected<COFFSymbolRef> SymOrErr = Obj->getSymbol(i + 100) to enforce to emit warnings, I got 6 completely same warning messages while dumping pseudoreloc.test.tmp.exe-x86_64 as following:

$ bin/llvm-readobj test/tools/llvm-readobj/COFF/Output/pseudoreloc.test.tmp.exe-x86_64 --coff-pseudoreloc

File: test/tools/llvm-readobj/COFF/Output/pseudoreloc.test.tmp.exe-x86_64
Format: COFF-x86-64
Arch: x86_64
AddressSize: 64bit
PseudoReloc [
bin/llvm-readobj: warning: 'test/tools/llvm-readobj/COFF/Output/pseudoreloc.test.tmp.exe-x86_64': Invalid data was encountered while parsing the file
bin/llvm-readobj: warning: 'test/tools/llvm-readobj/COFF/Output/pseudoreloc.test.tmp.exe-x86_64': Invalid data was encountered while parsing the file
bin/llvm-readobj: warning: 'test/tools/llvm-readobj/COFF/Output/pseudoreloc.test.tmp.exe-x86_64': Invalid data was encountered while parsing the file
bin/llvm-readobj: warning: 'test/tools/llvm-readobj/COFF/Output/pseudoreloc.test.tmp.exe-x86_64': Invalid data was encountered while parsing the file
bin/llvm-readobj: warning: 'test/tools/llvm-readobj/COFF/Output/pseudoreloc.test.tmp.exe-x86_64': Invalid data was encountered while parsing the file
bin/llvm-readobj: warning: 'test/tools/llvm-readobj/COFF/Output/pseudoreloc.test.tmp.exe-x86_64': Invalid data was encountered while parsing the file
bin/llvm-readobj: warning: 'test/tools/llvm-readobj/COFF/Output/pseudoreloc.test.tmp.exe-x86_64': the marker symbols for runtime pseudo-relocation were not found
]

What does "each identical warning" mean?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies, I got a bit mixed up.

ObjDumper (from which COFFDumper derives) has a reportUniqueWarning method, which you should use instead of reportWarning. There was discussion quite some time ago that we'd replace reportWarning entirely with reportUniqueWarning usage within llvm-readobj and I thought that had happened, but apparently not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I'll use reportUniqueWarning. Thank you!

auto Name = Obj->getSymbolName(*Sym);
if (Name.takeError())
continue;
if (*Name == RelocBeginName) {
if (Sym->getSectionNumber() > 0)
RelocBegin = *Sym;
} else if (*Name == RelocEndName) {
if (Sym->getSectionNumber() > 0)
RelocEnd = *Sym;
}
}
if (!RelocBegin.getRawPtr() || !RelocEnd.getRawPtr()) {
W.startLine()
<< "The symbols for runtime pseudo-relocation are not found\n";
return;
}

ArrayRef<uint8_t> Data;
auto Section = Obj->getSection(RelocBegin.getSectionNumber());
if (auto E = Section.takeError()) {
reportError(std::move(E), Obj->getFileName());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer reportWarning over reportError in dumping code, because it allows execution to continue e.g. to run other dumping operations that have been requested. Same throughout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed all reportError to reportWarning.

return;
}
if (auto E = Obj->getSectionContents(*Section, Data)) {
reportError(std::move(E), Obj->getFileName());
return;
}
ArrayRef<uint8_t> RawRelocs =
Data.take_front(RelocEnd.getValue()).drop_front(RelocBegin.getValue());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before subtracting RelocEnd from RelocBegin, we'd need to validate that they indeed point into the same section.

struct alignas(4) PseudoRelocationHeader {
PseudoRelocationHeader(uint32_t Signature)
: Zero1(0), Zero2(0), Signature(Signature) {}
support::ulittle32_t Zero1;
support::ulittle32_t Zero2;
support::ulittle32_t Signature;
};
const PseudoRelocationHeader HeaderV2(1);
if (RawRelocs.size() < sizeof(HeaderV2) ||
(memcmp(RawRelocs.data(), &HeaderV2, sizeof(HeaderV2)) != 0)) {
reportWarning(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If RawRelocs.size() == 0, we probably should print a different message than this? (Not sure if the linker actually ever does produce that, and what the runtime would do with it.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime treat such case as a old header-less relocation block (V1). Should we support the V1 relocation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't need to bother with the V1 format. But if RawRelocs.size() == 0, then there's no data to interpret as V1 either, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, right. I should treat the case of RawRelocs.size() == 0 as just an empty relocation block.

createStringError("Invalid runtime pseudo-relocation records"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM style guide says warnings and errors should start with lower-case letters.

Obj->getFileName());
return;
}
struct alignas(4) PseudoRelocationRecord {
support::ulittle32_t Symbol;
support::ulittle32_t Target;
support::ulittle32_t BitSize;
};
ArrayRef<PseudoRelocationRecord> RelocRecords(
reinterpret_cast<const PseudoRelocationRecord *>(
RawRelocs.data() + sizeof(PseudoRelocationHeader)),
(RawRelocs.size() - sizeof(PseudoRelocationHeader)) /
sizeof(PseudoRelocationRecord));

struct CachingImportedSymbolLookup {
const StringRef *find(const COFFObjectFile *Obj, uint32_t EntryRVA) {
if (auto Ite = ImportedSymbols.find(EntryRVA);
Ite != ImportedSymbols.end())
return &Ite->second;

for (auto D : Obj->import_directories()) {
uint32_t RVA;
if (auto E = D.getImportAddressTableRVA(RVA))
reportError(std::move(E), Obj->getFileName());
if (EntryRVA < RVA)
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a similar check we could do, if (EntryRVA > RVA + ImportAddressTableSize) to avoid iterating over the table, if we easily could see that the address is past the end of this table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the exact end position of the list cannot be calculated directly, I calculated an approximate value using a table list sorted by RVA.

for (auto S : D.imported_symbols()) {
if (RVA == EntryRVA) {
StringRef &NameDst = ImportedSymbols[RVA];
if (auto E = S.getSymbolName(NameDst))
reportError(std::move(E), Obj->getFileName());
return &NameDst;
}
RVA += Obj->is64() ? 8 : 4;
}
}

return nullptr;
}

private:
DenseMap<uint32_t, StringRef> ImportedSymbols;
};
CachingImportedSymbolLookup ImportedSymbols;

ListScope D(W, "PseudoReloc");
for (const auto &Reloc : RelocRecords) {
DictScope Entry(W, "Entry");
W.printHex("Symbol", Reloc.Symbol);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nice to have feature here, which probably is out of scope for the initial version at least, would be to figure out which data block it belongs to (e.g. which function, or which variable contains the pseudo relocation). But as COFF symbols only have offset but not size, we can't probably do that easily.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easier than expected. This is because searching __RUNTIME_PSEUDO_RELOC_LIST__ requires iterating through the symbol table, which allows us to construct a RVA-to-symbol map.

if (const auto *Sym = ImportedSymbols.find(Obj, Reloc.Symbol))
W.printString("SymbolName", *Sym);
W.printHex("Target", Reloc.Target);
W.printNumber("BitWidth", Reloc.BitSize);
}
}

void COFFDumper::printCOFFResources() {
ListScope ResourcesD(W, "Resources");
for (const SectionRef &S : Obj->sections()) {
Expand Down
1 change: 1 addition & 0 deletions llvm/tools/llvm-readobj/ObjDumper.h
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,7 @@ class ObjDumper {
virtual void printCOFFExports() { }
virtual void printCOFFDirectives() { }
virtual void printCOFFBaseReloc() { }
virtual void printCOFFPseudoReloc() {}
virtual void printCOFFDebugDirectory() { }
virtual void printCOFFTLSDirectory() {}
virtual void printCOFFResources() {}
Expand Down
3 changes: 3 additions & 0 deletions llvm/tools/llvm-readobj/Opts.td
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,9 @@ def codeview_ghash : FF<"codeview-ghash", "Enable global hashing for CodeView ty
def codeview_merged_types : FF<"codeview-merged-types", "Display the merged CodeView type stream">, Group<grp_coff>;
def codeview_subsection_bytes : FF<"codeview-subsection-bytes", "Dump raw contents of codeview debug sections and records">, Group<grp_coff>;
def coff_basereloc : FF<"coff-basereloc", "Display .reloc section">, Group<grp_coff>;
def coff_pseudoreloc
: FF<"coff-pseudoreloc", "Display runtime pseudo-relocations">,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should include the word "mingw" in the option description as well, as this isn't relevant for general PE-COFF?

Group<grp_coff>;
def coff_debug_directory : FF<"coff-debug-directory", "Display debug directory">, Group<grp_coff>;
def coff_directives : FF<"coff-directives", "Display .drectve section">, Group<grp_coff>;
def coff_exports : FF<"coff-exports", "Display export table">, Group<grp_coff>;
Expand Down
4 changes: 4 additions & 0 deletions llvm/tools/llvm-readobj/llvm-readobj.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -154,6 +154,7 @@ static bool CodeViewEnableGHash;
static bool CodeViewMergedTypes;
bool CodeViewSubsectionBytes;
static bool COFFBaseRelocs;
static bool COFFPseudoRelocs;
static bool COFFDebugDirectory;
static bool COFFDirectives;
static bool COFFExports;
Expand Down Expand Up @@ -305,6 +306,7 @@ static void parseOptions(const opt::InputArgList &Args) {
opts::CodeViewMergedTypes = Args.hasArg(OPT_codeview_merged_types);
opts::CodeViewSubsectionBytes = Args.hasArg(OPT_codeview_subsection_bytes);
opts::COFFBaseRelocs = Args.hasArg(OPT_coff_basereloc);
opts::COFFPseudoRelocs = Args.hasArg(OPT_coff_pseudoreloc);
opts::COFFDebugDirectory = Args.hasArg(OPT_coff_debug_directory);
opts::COFFDirectives = Args.hasArg(OPT_coff_directives);
opts::COFFExports = Args.hasArg(OPT_coff_exports);
Expand Down Expand Up @@ -492,6 +494,8 @@ static void dumpObject(ObjectFile &Obj, ScopedPrinter &Writer,
Dumper->printCOFFDirectives();
if (opts::COFFBaseRelocs)
Dumper->printCOFFBaseReloc();
if (opts::COFFPseudoRelocs)
Dumper->printCOFFPseudoReloc();
if (opts::COFFDebugDirectory)
Dumper->printCOFFDebugDirectory();
if (opts::COFFTLSDirectory)
Expand Down