diff --git a/compiler-rt/include/profile/InstrProfData.inc b/compiler-rt/include/profile/InstrProfData.inc index f5de23ff4b94d..25df899b3f361 100644 --- a/compiler-rt/include/profile/InstrProfData.inc +++ b/compiler-rt/include/profile/InstrProfData.inc @@ -123,6 +123,8 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \ /* INSTR_PROF_RAW_HEADER start */ /* Definition of member fields of the raw profile header data structure. */ +/* Please update llvm/docs/InstrProfileFormat.rst as appropriate when updating + raw profile format. */ #ifndef INSTR_PROF_RAW_HEADER #define INSTR_PROF_RAW_HEADER(Type, Name, Initializer) #else diff --git a/llvm/docs/InstrProfileFormat.rst b/llvm/docs/InstrProfileFormat.rst new file mode 100644 index 0000000000000..2069b87a245a1 --- /dev/null +++ b/llvm/docs/InstrProfileFormat.rst @@ -0,0 +1,480 @@ +=================================== +Instrumentation Profile Format +=================================== + +.. contents:: + :local: + + +Overview +========= + +Clang supports two types of profiling via instrumentation [1]_: frontend-based +and IR-based, and both could support a variety of use cases [2]_ . +This document describes two binary serialization formats (raw and indexed) to +store instrumented profiles with a specific emphasis on IRPGO use case, in the +sense that when specific header fields and payload sections have different ways +of interpretation across use cases, the documentation is based on IRPGO. + +.. note:: + Frontend-generated profiles are used together with coverage mapping for + `source-based code coverage`_. The `coverage mapping format`_ is different from + profile format. + +.. _`source-based code coverage`: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html +.. _`coverage mapping format`: https://llvm.org/docs/CoverageMappingFormat.html + +Raw Profile Format +=================== + +The raw profile is generated by running the instrumented binary. 
The raw profile +data from an executable or a shared library [3]_ consists of a header and +multiple sections, with each section as a memory dump. The raw profile data needs +to be reasonably compact and fast to generate. + +There are no backward or forward version compatibility guarantees for the raw profile +format. That is, compilers and tools `require`_ a specific raw profile version +to parse the profiles. + +.. _`require`: https://github.com/llvm/llvm-project/blob/bffdde8b8e5d9a76a47949cd0f574f3ce656e181/llvm/lib/ProfileData/InstrProfReader.cpp#L551-L558 + +To feed profiles back into compilers for an optimized build (e.g., via +``-fprofile-use`` for IR instrumentation), a raw profile must be converted into +indexed format. + +General Storage Layout +----------------------- + +The storage layout of raw profile data format is illustrated below. Basically, +when the raw profile is read into a memory buffer, the actual byte offset of a +section is inferred from the section's order in the layout and size information +of all the sections ahead of it. + +:: + + +----+-----------------------+ + | | Magic | + | +-----------------------+ + | | Version | + | +-----------------------+ + H | Size Info for | + E | Section 1 | + A +-----------------------+ + D | Size Info for | + E | Section 2 | + R +-----------------------+ + | | ... | + | +-----------------------+ + | | Size Info for | + | | Section N | + +----+-----------------------+ + P | Section 1 | + A +-----------------------+ + Y | Section 2 | + L +-----------------------+ + O | ... | + A +-----------------------+ + D | Section N | + +----+-----------------------+ + + +.. note:: + Sections might be padded to meet specific alignment requirements. For + simplicity, header fields and data sections solely for padding purpose are + omitted in the data layout graph above and the rest of this document. + +Header +------- + +``Magic`` + Magic number encodes profile format (raw, indexed or text). 
For the raw format, + the magic number also encodes the endianness (big or little) and C pointer + size (4 or 8 bytes) of the platform on which the profile is generated. + + A factory method reads the magic number to construct reader properly and returns + error upon unrecognized format. Specifically, the factory method and raw profile + reader implementation make sure that a raw profile file could be read back on + a platform with the opposite endianness and/or the other C pointer size. + +``Version`` + The lower 32 bits specify the actual version and the most significant 32 bits + specify the variant types of the profile. IR-based instrumentation PGO and + context-sensitive IR-based instrumentation PGO are two variant types. + +``BinaryIdsSize`` + The byte size of `binary id`_ section. + +``NumData`` + The number of profile metadata. The byte size of `profile metadata`_ section + could be computed with this field. + +``NumCounter`` + The number of entries in the profile counter section. The byte size of `counter`_ + section could be computed with this field. + +``NumBitmapBytes`` + The number of bytes in the profile `bitmap`_ section. + +``NamesSize`` + The number of bytes in the name section. + +.. _`CountersDelta`: + +``CountersDelta`` + This field records the in-memory address difference between the `profile metadata`_ + and counter section in the instrumented binary, i.e., ``start(__llvm_prf_cnts) - start(__llvm_prf_data)``. + + It's used jointly with the `CounterPtr`_ field to compute the counter offset + relative to ``start(__llvm_prf_cnts)``. Check out calculation-of-counter-offset_ + for a visualized explanation. + + .. note:: + The ``__llvm_prf_data`` object file section might not be loaded into memory + when instrumented binary runs or might not get generated in the instrumented + binary in the first place. In those cases, ``CountersDelta`` is not used and + other mechanisms are used to match counters with instrumented code. 
See + `lightweight instrumentation`_ and `binary profile correlation`_ for examples. + +``BitmapDelta`` + This field records the in-memory address difference between the `profile metadata`_ + and bitmap section in the instrumented binary, i.e., ``start(__llvm_prf_bits) - start(__llvm_prf_data)``. + + It's used jointly with the `BitmapPtr`_ to find the bitmap of a profile data + record, in a similar way to how counters are referenced as explained by + calculation-of-counter-offset_ . + + Similar to `CountersDelta`_ field, this field may not be used in non-PGO variants + of profiles. + +``NamesDelta`` + Records the in-memory address of name section. Not used except for raw profile + reader error checking. + +``ValueKindLast`` + Records the number of value kinds. Macro `VALUE_PROF_KIND`_ defines the value + kinds with a description of the kind. + +.. _`VALUE_PROF_KIND`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/compiler-rt/include/profile/InstrProfData.inc#L184-L186 + +Payload Sections +------------------ + +Binary Ids +^^^^^^^^^^^ +Stores the binary ids of the instrumented binaries to associate binaries with +profiles for source code coverage. See `binary id`_ RFC for the design. + +.. _`profile metadata`: + +Profile Metadata +^^^^^^^^^^^^^^^^^^ + +This section stores the metadata to map counters and value profiles back to +instrumented code regions (e.g., LLVM IR for IRPGO). + +The in-memory representation of the metadata is `__llvm_profile_data`_. +Some fields are used to reference data from other sections in the profile. +The fields are documented as follows: + +.. _`__llvm_profile_data`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/compiler-rt/include/profile/InstrProfData.inc#L65-L95 + +``NameRef`` + The MD5 of the function's PGO name. PGO name has the format + ``[]`` where ```` and + ```` are provided for local-linkage functions to tell possibly + identical functions. + +.. 
_FuncHash: + +``FuncHash`` + A checksum of the function's IR, taking control flow graph and instrumented + value sites into account. See `computeCFGHash`_ for details. + +.. _`computeCFGHash`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L616-L685 + +.. _`CounterPtr`: + +``CounterPtr`` + The in-memory address difference between profile data and the start of corresponding + counters. Counter position is stored this way (as a link-time constant) to reduce + instrumented binary size compared with snapshotting the address of symbols directly. + See `commit a1532ed`_ for further information. + +.. _`commit a1532ed`: https://github.com/llvm/llvm-project/commit/a1532ed27582038e2d9588108ba0fe8237f01844 + + .. note:: + ``CounterPtr`` might represent a different value for non-IRPGO use cases. For + example, for `binary profile correlation`_, it represents the absolute address of the counter. + When in doubt, check the source code. + +.. _`BitmapPtr`: + +``BitmapPtr`` + The in-memory address difference between profile data and the start address of + corresponding bitmap. + + .. note:: + Similar to `CounterPtr`_, this field may represent a different value for non-IRPGO use cases. + +``FunctionPointer`` + Records the function address when instrumented binary runs. This is used to + map the profiled callee address of indirect calls to the ``NameRef`` during + conversion from raw to indexed profiles. + +``Values`` + Represents value profiles in a two-dimensional array. The number of elements + in the first dimension is the number of instrumented value sites across all + kinds. Each element in the first dimension is the head of a linked list, and + each element in the second dimension is a linked list element, carrying + ```` as payload. This is used by compiler runtime when + writing out value profiles. + + .. 
note:: + Value profiling is supported by frontend and IR PGO instrumentation, + but it's not supported in all cases (e.g., `lightweight instrumentation`_). + +``NumCounters`` + The number of counters for the instrumented function. + +``NumValueSites`` + This is an array of counters, and each counter represents the number of + instrumented sites for a kind of value in the function. + +``NumBitmapBytes`` + The number of bitmap bytes for the function. + +.. _`counter`: + +Profile Counters +^^^^^^^^^^^^^^^^^ + +For PGO [4]_, the counters within an instrumented function of a specific `FuncHash`_ +are stored contiguously and in an order that is consistent with instrumentation points selection. + +.. _calculation-of-counter-offset: + +As mentioned above, the recorded counter offset is relative to the profile metadata. +So how are function counters located in the raw profile data? + +Basically, the profile reader iterates profile metadata (from the `profile metadata`_ +section) and makes use of the recorded relative distances, as illustrated below. + +:: + + + --> start(__llvm_prf_data) --> +---------------------+ ------------+ + | | Data 1 | | + | +---------------------+ =====|| | + | | Data 2 | || | + | +---------------------+ || | + | | ... | || | + Counter| +---------------------+ || | + Delta | | Data N | || | + | +---------------------+ || | CounterPtr1 + | || | + | CounterPtr2 || | + | || | + | || | + + --> start(__llvm_prf_cnts) --> +---------------------+ || | + | ... | || | + +---------------------+ -----||----+ + | Counter for | || + | Data 1 | || + +---------------------+ || + | ... | || + +---------------------+ =====|| + | Counter for | + | Data 2 | + +---------------------+ + | ... | + +---------------------+ + | Counter for | + | Data N | + +---------------------+ + + +In the graph, + +* The profile header records ``CounterDelta`` with the value as ``start(__llvm_prf_cnts) - start(__llvm_prf_data)``. 
+ We will call it ``CounterDeltaInitVal`` below for convenience. +* For each profile data record ``ProfileDataN``, ``CounterPtr`` is recorded as + ``start(CounterN) - start(ProfileDataN)``, where ``ProfileDataN`` is the N-th + entry in ``__llvm_prf_data``, and ``CounterN`` represents the corresponding + profile counters. + +Each time the reader advances to the next data record, it `updates`_ ``CounterDelta`` +by subtracting the size of one ``ProfileData``. + +.. _`updates`: https://github.com/llvm/llvm-project/blob/17ff25a58ee4f29816d932fdb75f0d305718069f/llvm/include/llvm/ProfileData/InstrProfReader.h#L439-L444 + +For the counter corresponding to the first data record, the byte offset +relative to the start of the counter section is calculated as ``CounterPtr1 - CounterDeltaInitVal``. +When the profile reader advances to the second data record, ``CounterDelta`` +has been updated to ``CounterDeltaInitVal - sizeof(ProfileData)``. +Thus the byte offset relative to the start of the counter section is calculated +as ``CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))``. + +.. _`bitmap`: + +Bitmap +^^^^^^^ +This section is used for source-based `Modified Condition/Decision Coverage`_ code coverage. Check out `Bitmap RFC`_ +for the design. + +.. _`Modified Condition/Decision Coverage`: https://en.wikipedia.org/wiki/Modified_condition/decision_coverage +.. _`Bitmap RFC`: https://discourse.llvm.org/t/rfc-source-based-mc-dc-code-coverage/59244 + +Names +^^^^^^ + +This section contains the concatenated, possibly compressed string of functions' PGO +names. If compressed, the zlib library is used. + +Function names serve as keys in the PGO data hash table when raw profiles are +converted into indexed profiles. They are also crucial for ``llvm-profdata`` to +show the profiles in a human-readable way. + +Value Profile Data +^^^^^^^^^^^^^^^^^^^^ + +This section contains the profile data for value profiling. 
+ +The value profiles corresponding to a profile metadata record are serialized contiguously +as one record, and value profile records are stored in the same order as the +respective profile data, such that a raw profile reader `advances`_ the pointer to +profile data and the pointer to value profile records simultaneously [5]_ to find +value profiles for a per function, per `FuncHash`_ profile data. + +.. _`advances`: https://github.com/llvm/llvm-project/blob/7e15fa9161eda7497a5d6abf0d951a1d12d86550/llvm/include/llvm/ProfileData/InstrProfReader.h#L456-L457 + +Indexed Profile Format +=========================== + +Indexed profiles are generated by ``llvm-profdata``. In the indexed profiles, +function data are organized as an on-disk hash table such that compilers can +look up profile data for functions in an IR module. + +Compilers and tools must retain backward compatibility with indexed profiles. +That is, a tool or a compiler built from a newer version of the code must understand +profiles generated by older tools or compilers. 
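As a rough illustration of the ``Version`` header field that both the raw and indexed formats carry, the sketch below splits a 64-bit value into a version number and variant bits. It follows the 32/32 split described in this document. The names ``profileVersion`` and ``profileVariantBits`` are hypothetical and are not LLVM APIs, and the meaning of individual variant bits is not modeled.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of LLVM.
// Assumption: the 64-bit Version field is split 32/32 as this document
// describes (lower half = version number, upper half = variant bits).
constexpr std::uint64_t kVersionMask = 0xFFFFFFFFull;

// Lower 32 bits: the format version number.
constexpr std::uint32_t profileVersion(std::uint64_t VersionField) {
  return static_cast<std::uint32_t>(VersionField & kVersionMask);
}

// Upper 32 bits: bits describing the profile variant.
constexpr std::uint32_t profileVariantBits(std::uint64_t VersionField) {
  return static_cast<std::uint32_t>(VersionField >> 32);
}
```

Under these assumptions, a field value of ``0x0000000100000009`` would decode to version 9 with variant bits ``0x1``.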
+ +General Storage Layout +----------------------- + +:: + + +-----------------------+---+ + | Magic | | + +-----------------------+ | + | Version | | + +-----------------------+ | + | HashType | H + +-----------------------+ E + +-------| HashOffset | A + | +-----------------------+ D + +-----------| MemProfOffset | E + | | +-----------------------+ R + | | +--| BinaryIdOffset | | + | | | +-----------------------+ | + +---------------| TemporalProf- | | + | | | | | TracesOffset | | + | | | | +-----------------------+---+ + | | | | | Profile Summary | | + | | | | +-----------------------+ P + | | +------>| Function data | A + | | | +-----------------------+ Y + | +---------->| MemProf profile data | L + | | +-----------------------+ O + | +->| Binary Ids | A + | +-----------------------+ D + +-------------->| Temporal profiles | | + +-----------------------+---+ + +Header +-------- + +``Magic`` + The purpose of the magic number is to be able to tell if the profile is an + indexed profile. + +``Version`` + Similar to raw profile version, the lower 32 bits specify the version of the + indexed profile and the most significant 32 bits are reserved to specify the + variant types of the profile. + +``HashType`` + The hashing scheme for on-disk hash table keys. Only MD5 hashing is used as of + writing. + +``HashOffset`` + An on-disk hash table stores the per-function profile records. This field records + the offset of this hash table's metadata (i.e., the number of buckets and + entries), which follows right after the payload of the entire hash table. + +``MemProfOffset`` + Records the byte offset of MemProf profiling data. + +``BinaryIdOffset`` + Records the byte offset of binary id sections. + +``TemporalProfTracesOffset`` + Records the byte offset of temporal profiles. + +Payload Sections +------------------ + +(CS) Profile Summary +^^^^^^^^^^^^^^^^^^^^^ +This section is right after profile header. It stores the serialized profile +summary. 
For context-sensitive IR-based instrumentation PGO, this section stores +an additional profile summary corresponding to the context-sensitive profiles. + +Function data +^^^^^^^^^^^^^^^^^^ +This section stores functions and their profiling data as an on-disk hash table. +Profile data for functions with the same name are grouped together and share one +hash table entry (the functions may come from different shared libraries for +instance). The profile data for them are organized as a sequence of key-value +pairs where the key is `FuncHash`_, and the value is profiled information (represented +by `InstrProfRecord`_) for the function. + +.. _`InstrProfRecord`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/llvm/include/llvm/ProfileData/InstrProf.h#L693 + +MemProf Profile data +^^^^^^^^^^^^^^^^^^^^^^ +This section stores functions' memory profiling data. See +`MemProf binary serialization format RFC`_ for the design. + +.. _`MemProf binary serialization format RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html + +Binary Ids +^^^^^^^^^^^^^^^^^^^^^^ +The section is used to carry over `binary id`_ information from raw profiles. + +Temporal Profile Traces +^^^^^^^^^^^^^^^^^^^^^^^^ +The section is used to carry over temporal profile information from raw profiles. +See `temporal profiling`_ for the design. + +Profile Data Usage +======================================= + +``llvm-profdata`` is the command line tool to display and process instrumentation- +based profile data. For supported usages, check out `llvm-profdata documentation `_. + +.. [1] For usage, see https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation +.. [2] For example, IR-based instrumentation supports `lightweight instrumentation`_ + and `temporal profiling`_. Frontend instrumentation could support `single-byte counters`_. +.. 
[3] A raw profile file could contain the concatenation of multiple raw + profiles, for example, from an executable and its shared libraries. The raw + profile reader can parse all raw profiles from the file correctly. +.. [4] The counter section is used by a few variant types (like temporal + profiling) and might have different semantics there. +.. [5] The step size of the data pointer is ``sizeof(ProfileData)``, and the step + size of the value profile pointer is calculated based on the number of collected + values. + +.. _`lightweight instrumentation`: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 +.. _`temporal profiling`: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068 +.. _`single-byte counters`: https://discourse.llvm.org/t/rfc-single-byte-counters-for-source-based-code-coverage/75685 +.. _`binary profile correlation`: https://discourse.llvm.org/t/rfc-add-binary-profile-correlation-to-not-load-profile-metadata-sections-into-memory-at-runtime/74565 +.. _`binary id`: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst index 006df613bc5e7..2f450ef46025a 100644 --- a/llvm/docs/UserGuides.rst +++ b/llvm/docs/UserGuides.rst @@ -43,6 +43,7 @@ intermediate LLVM representation. HowToCrossCompileBuiltinsOnArm HowToCrossCompileLLVM HowToUpdateDebugInfo + InstrProfileFormat InstrRefDebugInfo LinkTimeOptimization LoopTerminology @@ -177,6 +178,9 @@ Optimizations referencing, to determine variable locations for debug info in the final stages of compilation. +:doc:`InstrProfileFormat` + This document explains two binary formats of instrumentation-based profiles. 
+ Code Generation --------------- diff --git a/llvm/include/llvm/ProfileData/InstrProf.h b/llvm/include/llvm/ProfileData/InstrProf.h index 36be2e7d869e7..87e7bbbd727ee 100644 --- a/llvm/include/llvm/ProfileData/InstrProf.h +++ b/llvm/include/llvm/ProfileData/InstrProf.h @@ -1035,7 +1035,8 @@ const HashT HashType = HashT::MD5; inline uint64_t ComputeHash(StringRef K) { return ComputeHash(HashType, K); } // This structure defines the file header of the LLVM profile -// data file in indexed-format. +// data file in indexed-format. Please update llvm/docs/InstrProfileFormat.rst +// as appropriate when updating the indexed profile format. struct Header { uint64_t Magic; uint64_t Version; diff --git a/llvm/include/llvm/ProfileData/InstrProfData.inc b/llvm/include/llvm/ProfileData/InstrProfData.inc index f5de23ff4b94d..25df899b3f361 100644 --- a/llvm/include/llvm/ProfileData/InstrProfData.inc +++ b/llvm/include/llvm/ProfileData/InstrProfData.inc @@ -123,6 +123,8 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \ /* INSTR_PROF_RAW_HEADER start */ /* Definition of member fields of the raw profile header data structure. */ +/* Please update llvm/docs/InstrProfileFormat.rst as appropriate when updating + raw profile format. */ #ifndef INSTR_PROF_RAW_HEADER #define INSTR_PROF_RAW_HEADER(Type, Name, Initializer) #else