[llvm-readobj][COFF] Implement --coff-pseudoreloc in llvm-readobj to dump runtime pseudo-relocation records #151816
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
RUN: llvm-readobj --coff-pseudoreloc %p/Inputs/pseudoreloc.exe | FileCheck %s | ||
RUN: llvm-readobj --coff-pseudoreloc %p/Inputs/nop.exe.coff-x86-64 | FileCheck %s --check-prefix=NOSYM | ||
RUN: llvm-readobj --coff-pseudoreloc %p/Inputs/trivial.obj.coff-i386 | FileCheck %s --check-prefix=NORELOC | ||
|
||
CHECK: Format: COFF-i386 | ||
CHECK-NEXT: Arch: i386 | ||
CHECK-NEXT: AddressSize: 32bit | ||
CHECK-NEXT: PseudoReloc [ | ||
CHECK-NEXT: Entry { | ||
CHECK-NEXT: Symbol: 0x{{[0-9A-Z]+}} | ||
CHECK-NEXT: SymbolName: sym1 | ||
CHECK-NEXT: Target: 0x{{[0-9A-Z]+}} | ||
CHECK-NEXT: BitWidth: {{[0-9]+}} | ||
While these regex patterns are nice for accepting anything that we'd expect to output, they also make the test kinda weak - we could start outputting different wrong addresses (and bitwidth sizes) without the test catching it - that's not ideal. So I think it would be good to check the actual numbers as well, to make sure the test catches any potential breakage.
FileCheck has the concept of numeric capture and substitutions and even simple maths like additions and subtractions. It sounds like this would be a good use-case for this functionality. See https://llvm.org/docs/CommandGuide/FileCheck.html#filecheck-numeric-substitution-blocks.
Thanks! I've rewritten the checks to use numeric capture. |
||
CHECK-NEXT: } | ||
CHECK-NEXT: Entry { | ||
CHECK-NEXT: Symbol: 0x{{[0-9A-Z]+}} | ||
CHECK-NEXT: SymbolName: sym2 | ||
CHECK-NEXT: Target: 0x{{[0-9A-Z]+}} | ||
CHECK-NEXT: BitWidth: {{[0-9]+}} | ||
CHECK-NEXT: } | ||
CHECK-NEXT: Entry { | ||
CHECK-NEXT: Symbol: 0x{{[0-9A-Z]+}} | ||
CHECK-NEXT: SymbolName: sym1 | ||
CHECK-NEXT: Target: 0x{{[0-9A-Z]+}} | ||
CHECK-NEXT: BitWidth: {{[0-9]+}} | ||
CHECK-NEXT: } | ||
CHECK-NEXT: ] | ||
|
||
NOSYM-NOT: PseudoReloc | ||
Instead of needing to repeat the |
||
NOSYM: The symbol table has been stripped | ||
NOSYM-NOT: PseudoReloc | ||
|
||
NORELOC-NOT: PseudoReloc | ||
NORELOC: The symbols for runtime pseudo-relocation are not found | ||
NORELOC-NOT: PseudoReloc | ||
|
||
|
||
pseudoreloc.exe is generated by following script: | ||
|
||
#--- generate.sh | ||
llvm-mc -triple i386-mingw32 -filetype obj pseudoreloc.dll.s -o pseudoreloc.dll.o | ||
Perhaps include a |
||
ld.lld -m i386pe --dll pseudoreloc.dll.o -o pseudoreloc.dll -entry= | ||
llvm-mc -triple i386-mingw32 -filetype obj pseudoreloc.s -o pseudoreloc.o | ||
ld.lld -m i386pe pseudoreloc.o pseudoreloc.dll -o pseudoreloc.exe -entry=start | ||
|
||
#--- pseudoreloc.dll.s | ||
.data | ||
.globl _sym1 | ||
_sym1: | ||
.long 0x11223344 | ||
.globl _sym2 | ||
_sym2: | ||
.long 0x55667788 | ||
.section .drectve | ||
.ascii " -export:sym1,data " | ||
.ascii " -export:sym2,data " | ||
.addrsig | ||
|
||
#--- pseudoreloc.s | ||
.text | ||
.globl _start | ||
_start: | ||
mov _local1b, %eax | ||
movsb (%eax), %ecx | ||
mov _local2, %eax | ||
movsb (%eax), %edx | ||
mov _local1a, %eax | ||
movsb (%eax), %eax | ||
add %edx, %eax | ||
add %ecx, %eax | ||
ret | ||
|
||
.globl __pei386_runtime_relocator | ||
__pei386_runtime_relocator: | ||
mov ___RUNTIME_PSEUDO_RELOC_LIST__, %eax | ||
mov ___RUNTIME_PSEUDO_RELOC_LIST_END__, %ecx | ||
sub %ecx, %eax | ||
ret | ||
|
||
.data | ||
.globl _local1a | ||
.p2align 2 | ||
_local1a: | ||
.long _sym1+1 | ||
|
||
.globl _local2 | ||
.p2align 2 | ||
_local2: | ||
.long _sym2+1 | ||
|
||
.globl _local1b | ||
.p2align 2 | ||
_local1b: | ||
.long _sym1+3 | ||
|
||
.addrsig | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -95,6 +95,7 @@ class COFFDumper : public ObjDumper { | |
void printCOFFExports() override; | ||
void printCOFFDirectives() override; | ||
void printCOFFBaseReloc() override; | ||
void printCOFFPseudoReloc() override; | ||
void printCOFFDebugDirectory() override; | ||
void printCOFFTLSDirectory() override; | ||
void printCOFFResources() override; | ||
|
@@ -2000,6 +2001,122 @@ void COFFDumper::printCOFFBaseReloc() { | |
} | ||
} | ||
|
||
void COFFDumper::printCOFFPseudoReloc() { | ||
const StringRef RelocBeginName = Obj->getArch() == Triple::x86 | ||
? "___RUNTIME_PSEUDO_RELOC_LIST__" | ||
: "__RUNTIME_PSEUDO_RELOC_LIST__"; | ||
const StringRef RelocEndName = Obj->getArch() == Triple::x86 | ||
? "___RUNTIME_PSEUDO_RELOC_LIST_END__" | ||
: "__RUNTIME_PSEUDO_RELOC_LIST_END__"; | ||
As we have differing behaviour between i386 and other architectures, perhaps it would be good to have two test files, to cover both cases?
I've added a test for x86_64 and switched the test input source to LLVM IR to support multiple architectures. |
||
|
||
COFFSymbolRef RelocBegin, RelocEnd; | ||
auto Count = Obj->getNumberOfSymbols(); | ||
if (Count == 0) { | ||
W.startLine() << "The symbol table has been stripped\n"; | ||
return; | ||
} | ||
for (auto i = 0u; | ||
i < Count && (!RelocBegin.getRawPtr() || !RelocEnd.getRawPtr()); ++i) { | ||
auto Sym = Obj->getSymbol(i); | ||
if (Sym.takeError()) | ||
continue; | ||
IIRC you can't just ignore the errors like this (the error classes have a destructor that aborts if you haven't actually done anything with the error). See other similar functions here for ways of doing it; e.g. These error classes are quite tricky to use in that sense, so ideally one would need to have tested triggering all of these error cases - and unfortunately it can probably be quite hard to actually force these to fail as well... Perhaps by hex editing a binary to make symbol string offsets out of bounds?
This should just be a call to
Same as above. Warning about this would potentially be too noisy.
Not sure if you've realised this, but
With
What does "each identical warning" mean?
My apologies, I got a bit mixed up.
I see. I'll use |
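To illustrate the first comment in this thread, here is a toy model of a must-handle error type. It is not `llvm::Error`: `ToyError`, `UnhandledErrors`, `skipOnError` and `skipOnErrorHandled` are hypothetical names, and the destructor bumps a counter instead of aborting so the effect can be observed. It only sketches why "test the error, then `continue`" leaves a failure unhandled.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Counts errors that were destroyed without being handled. A real
// must-handle error type would abort here instead of counting.
static int UnhandledErrors = 0;

// Toy stand-in for a must-handle error (hypothetical, not an LLVM class).
class ToyError {
  std::string Msg; // empty means success
  bool Handled = false;

public:
  explicit ToyError(std::string M = "") : Msg(std::move(M)) {}
  ToyError(ToyError &&O) noexcept
      : Msg(std::move(O.Msg)), Handled(O.Handled) {
    O.Msg.clear();
    O.Handled = true;
  }
  ToyError(const ToyError &) = delete;
  ~ToyError() {
    if (!Handled && !Msg.empty())
      ++UnhandledErrors;
  }

  // Testing the value only settles the success case; a failure still
  // has to be consumed or reported by the caller.
  explicit operator bool() {
    if (Msg.empty())
      Handled = true;
    return !Msg.empty();
  }

  // Explicitly take responsibility for the failure.
  std::string consume() {
    Handled = true;
    return std::move(Msg);
  }
};

// Pattern from the diff: test and move on. The failure is dropped.
inline bool skipOnError(ToyError E) {
  if (E)
    return true;
  return false;
}

// Same control flow, but the failure is consumed before skipping.
inline bool skipOnErrorHandled(ToyError E) {
  if (E) {
    (void)E.consume();
    return true;
  }
  return false;
}
```

In this model, only `skipOnErrorHandled` leaves the counter untouched when given a failure; whether to consume silently or emit a warning is the design question the thread goes on to discuss.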
||
auto Name = Obj->getSymbolName(*Sym); | ||
if (Name.takeError()) | ||
continue; | ||
if (*Name == RelocBeginName) { | ||
if (Sym->getSectionNumber() > 0) | ||
RelocBegin = *Sym; | ||
} else if (*Name == RelocEndName) { | ||
if (Sym->getSectionNumber() > 0) | ||
RelocEnd = *Sym; | ||
} | ||
} | ||
if (!RelocBegin.getRawPtr() || !RelocEnd.getRawPtr()) { | ||
|
||
W.startLine() | ||
<< "The symbols for runtime pseudo-relocation are not found\n"; | ||
return; | ||
} | ||
|
||
ArrayRef<uint8_t> Data; | ||
auto Section = Obj->getSection(RelocBegin.getSectionNumber()); | ||
if (auto E = Section.takeError()) { | ||
reportError(std::move(E), Obj->getFileName()); | ||
Prefer
I've changed all |
||
return; | ||
} | ||
if (auto E = Obj->getSectionContents(*Section, Data)) { | ||
reportError(std::move(E), Obj->getFileName()); | ||
return; | ||
} | ||
ArrayRef<uint8_t> RawRelocs = | ||
Data.take_front(RelocEnd.getValue()).drop_front(RelocBegin.getValue()); | ||
Before subtracting |
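The comment above is cut off, so the following is only a sketch of one plausible reading: validate the begin and end offsets against each other and against the section size before slicing. `ByteRange` and `sliceChecked` are hypothetical names, and a plain `std::vector` stands in for the section contents.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A non-owning view of bytes (hypothetical helper type).
struct ByteRange {
  const uint8_t *Data;
  size_t Size;
};

// Stores the bytes in [Begin, End) of Section into Out and returns true,
// or returns false without touching Out if the offsets are inconsistent
// (reversed, or past the end of the section).
inline bool sliceChecked(const std::vector<uint8_t> &Section, uint32_t Begin,
                         uint32_t End, ByteRange &Out) {
  if (Begin > End || End > Section.size())
    return false;
  Out.Data = Section.data() + Begin;
  Out.Size = static_cast<size_t>(End - Begin);
  return true;
}
```

A caller would report a warning and return when `sliceChecked` fails, rather than relying on the slicing helpers to behave sensibly for reversed or out-of-range offsets.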
||
struct alignas(4) PseudoRelocationHeader { | ||
PseudoRelocationHeader(uint32_t Signature) | ||
: Zero1(0), Zero2(0), Signature(Signature) {} | ||
support::ulittle32_t Zero1; | ||
support::ulittle32_t Zero2; | ||
support::ulittle32_t Signature; | ||
}; | ||
const PseudoRelocationHeader HeaderV2(1); | ||
if (RawRelocs.size() < sizeof(HeaderV2) || | ||
(memcmp(RawRelocs.data(), &HeaderV2, sizeof(HeaderV2)) != 0)) { | ||
reportWarning( | ||
If
The runtime treats such a case as an old header-less relocation block (V1). Should we support V1 relocations?
No, we don't need to bother with the V1 format. But if
Ah, yes, right. I should treat the case of |
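The `PseudoRelocationHeader` in the diff describes the V2 header as three little-endian 32-bit words (0, 0, 1). The sketch below classifies a raw list accordingly. Treating an empty list as "no relocations" rather than as an error is an assumption about what the truncated comments above are asking for; `ListKind`, `readLE32` and `classifyList` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ListKind { Empty, V2, Unsupported };

// Reads a little-endian 32-bit value from four bytes.
inline uint32_t readLE32(const uint8_t *P) {
  return static_cast<uint32_t>(P[0]) | (static_cast<uint32_t>(P[1]) << 8) |
         (static_cast<uint32_t>(P[2]) << 16) |
         (static_cast<uint32_t>(P[3]) << 24);
}

// Classifies the bytes between the list begin and end symbols.
inline ListKind classifyList(const std::vector<uint8_t> &Raw) {
  // Assumption: an empty list means there is simply nothing to print.
  if (Raw.empty())
    return ListKind::Empty;
  // Too short to hold the 12-byte V2 header.
  if (Raw.size() < 12)
    return ListKind::Unsupported;
  const uint8_t *P = Raw.data();
  if (readLE32(P) == 0 && readLE32(P + 4) == 0 && readLE32(P + 8) == 1)
    return ListKind::V2;
  // Anything else, including the header-less V1 layout, is not handled.
  return ListKind::Unsupported;
}
```

Keeping the three outcomes distinct lets the dumper print an empty list quietly and warn only for `Unsupported`.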
||
createStringError("Invalid runtime pseudo-relocation records"), | ||
The LLVM style guide says warnings and errors should start with lower-case letters. |
||
Obj->getFileName()); | ||
return; | ||
} | ||
struct alignas(4) PseudoRelocationRecord { | ||
support::ulittle32_t Symbol; | ||
support::ulittle32_t Target; | ||
support::ulittle32_t BitSize; | ||
}; | ||
ArrayRef<PseudoRelocationRecord> RelocRecords( | ||
reinterpret_cast<const PseudoRelocationRecord *>( | ||
RawRelocs.data() + sizeof(PseudoRelocationHeader)), | ||
(RawRelocs.size() - sizeof(PseudoRelocationHeader)) / | ||
sizeof(PseudoRelocationRecord)); | ||
|
||
struct CachingImportedSymbolLookup { | ||
const StringRef *find(const COFFObjectFile *Obj, uint32_t EntryRVA) { | ||
if (auto Ite = ImportedSymbols.find(EntryRVA); | ||
Ite != ImportedSymbols.end()) | ||
return &Ite->second; | ||
|
||
for (auto D : Obj->import_directories()) { | ||
uint32_t RVA; | ||
if (auto E = D.getImportAddressTableRVA(RVA)) | ||
reportError(std::move(E), Obj->getFileName()); | ||
if (EntryRVA < RVA) | ||
continue; | ||
Is there a similar check we could do,
Since the exact end position of the list cannot be calculated directly, I calculated an approximate value using a table list sorted by RVA. |
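The reply above describes approximating each table's end with a list sorted by RVA. A minimal sketch of that idea, not the PR's actual code: given the start RVAs of the import address tables in ascending order, the table that may contain an entry is the one with the greatest start not exceeding the entry's RVA, and its end is approximated by the next table's start. `findTable` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the index of the table whose start RVA is the greatest one that is
// <= EntryRVA, or -1 if EntryRVA lies before every table. SortedStarts must
// be in ascending order. The last table has no upper bound in this sketch.
inline int findTable(const std::vector<uint32_t> &SortedStarts,
                     uint32_t EntryRVA) {
  auto It =
      std::upper_bound(SortedStarts.begin(), SortedStarts.end(), EntryRVA);
  if (It == SortedStarts.begin())
    return -1;
  return static_cast<int>(It - SortedStarts.begin()) - 1;
}
```

This replaces a scan over every import directory with a binary search, at the cost of building the sorted list once up front.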
||
for (auto S : D.imported_symbols()) { | ||
if (RVA == EntryRVA) { | ||
StringRef &NameDst = ImportedSymbols[RVA]; | ||
if (auto E = S.getSymbolName(NameDst)) | ||
reportError(std::move(E), Obj->getFileName()); | ||
return &NameDst; | ||
} | ||
RVA += Obj->is64() ? 8 : 4; | ||
} | ||
} | ||
|
||
return nullptr; | ||
} | ||
|
||
private: | ||
DenseMap<uint32_t, StringRef> ImportedSymbols; | ||
}; | ||
CachingImportedSymbolLookup ImportedSymbols; | ||
|
||
ListScope D(W, "PseudoReloc"); | ||
for (const auto &Reloc : RelocRecords) { | ||
DictScope Entry(W, "Entry"); | ||
W.printHex("Symbol", Reloc.Symbol); | ||
Another nice-to-have feature here, which is probably out of scope for the initial version at least, would be to figure out which data block it belongs to (e.g. which function, or which variable contains the pseudo relocation). But as COFF symbols only have an offset but not a size, we probably can't do that easily.
It's easier than expected. This is because searching |
||
if (const auto *Sym = ImportedSymbols.find(Obj, Reloc.Symbol)) | ||
W.printString("SymbolName", *Sym); | ||
W.printHex("Target", Reloc.Target); | ||
W.printNumber("BitWidth", Reloc.BitSize); | ||
} | ||
} | ||
|
||
void COFFDumper::printCOFFResources() { | ||
ListScope ResourcesD(W, "Resources"); | ||
for (const SectionRef &S : Obj->sections()) { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -82,6 +82,9 @@ def codeview_ghash : FF<"codeview-ghash", "Enable global hashing for CodeView ty | |
def codeview_merged_types : FF<"codeview-merged-types", "Display the merged CodeView type stream">, Group<grp_coff>; | ||
def codeview_subsection_bytes : FF<"codeview-subsection-bytes", "Dump raw contents of codeview debug sections and records">, Group<grp_coff>; | ||
def coff_basereloc : FF<"coff-basereloc", "Display .reloc section">, Group<grp_coff>; | ||
def coff_pseudoreloc | ||
: FF<"coff-pseudoreloc", "Display runtime pseudo-relocations">, | ||
Perhaps this should include the word "mingw" in the option description as well, as this isn't relevant for general PE-COFF? |
||
Group<grp_coff>; | ||
def coff_debug_directory : FF<"coff-debug-directory", "Display debug directory">, Group<grp_coff>; | ||
def coff_directives : FF<"coff-directives", "Display .drectve section">, Group<grp_coff>; | ||
def coff_exports : FF<"coff-exports", "Display export table">, Group<grp_coff>; | ||
|
If possible, it'd be nicer to synthesize the test binary from yaml with yaml2obj rather than checking in a binary. (We do have some binaries checked in from before, but we'd like to keep that to a minimum, e.g. for binaries that can't be synthesized with yaml2obj yet.)

I see that you have the full procedure included for regenerating the binary, that's nice and appreciated! If converting it to yaml, it's also somewhat customary to strip down the size of it by removing unnecessary data from it. Perhaps it's not necessary in this case if the payload of each section is only a couple dozen bytes anyway though. But if it is, the instructions would unfortunately end with:
obj2yaml pseudoreloc.exe > pseudoreloc.exe.yaml # and manually strip down the .yaml file
But if there's not that much unnecessary in there, perhaps we don't need to strip it manually at all.

Thank you! Using yaml2obj, I was able to edit the content using sed and easily add tests for errors.