DWARF and linker relocations

To explain why relocation processing matters when reading DWARF information, a brief introduction to the DWARF2 format is necessary.

Note: while “canonical” dwarfdump is a self-contained utility (linked statically with libdwarf), its variant shipped with Sun Studio depends on the dynamic library libdwarf.so, which actually does most of the job. So when I write dwarfdump I usually mean libdwarf[.a|.so].

Brief intro into DWARF

DWARF is a widely used debugging data format. It is suitable for virtually any source language and, in Sun Studio, it is used to represent debugging information for C, C++ and Fortran. Three versions of the standard exist: 1.1, 2.0 and the most recent, 3.0. Sun Studio compilers generate version 2.0, commonly referred to as DWARF2.

DWARF2 information is stored in a table of records (contained in the .debug_info ELF section) called Debug Information Entries (DIEs). Each DIE has

  • a type (for example, DW_TAG_compile_unit),
  • several attribute/value pairs that describe the entry (for example, DW_AT_language=DW_LANG_C_plus_plus).

To save space, attribute names are not stored with each DIE. Instead, each DIE type has a corresponding entry in another table (stored in the .debug_abbrev ELF section) that lists, for a given DIE type, all the attributes used in a compilation unit (CU) along with their types. For example:

DW_TAG_compile_unit DW_children_yes
DW_AT_name                 DW_FORM_string
DW_AT_language             DW_FORM_data1
...
DW_AT_low_pc               DW_FORM_addr
DW_AT_high_pc              DW_FORM_addr
...

This entry describes a CU, which contains a name (a string), a source language attribute (1 byte of data), the addresses of its beginning and end (4 or 8 bytes each, depending on the memory model), and other fields.

So the type of a DIE is just an index into that table of abbreviations:

.debug_info 
  CU abbrev_offset=0
  DW_TAG_compile_unit (code=1) ----+
  ...                              |
  ...                              |
                                   | (points to debug_abbrev entry #1)  
.debug_abbrev                      |
  DW_TAG_compile_unit (code=1)  <--+
    DW_AT_producer           DW_FORM_strp
    DW_AT_language           DW_FORM_data1
    ...
  DW_TAG_variable (code=2)
    DW_AT_name               DW_FORM_strp
    DW_AT_decl_file          DW_FORM_data1
    ...    
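The lookup in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not libdwarf's implementation: the function names `read_uleb128` and `parse_abbrev_table` and the `SAMPLE` bytes are made up for this example. It assumes the standard DWARF encoding of an abbreviation table: each entry is a ULEB128 code, a ULEB128 tag, a one-byte children flag, and a list of ULEB128 (attribute, form) pairs terminated by (0, 0); a code of 0 ends the table.

```python
def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def parse_abbrev_table(data, offset=0):
    """Parse one CU's abbreviation table starting at byte `offset`.

    Returns {code: (tag, has_children, [(attr, form), ...])}.
    """
    table = {}
    pos = offset
    while True:
        code, pos = read_uleb128(data, pos)
        if code == 0:                      # a zero code terminates the table
            return table
        tag, pos = read_uleb128(data, pos)
        has_children = data[pos] != 0      # one-byte children flag
        pos += 1
        attrs = []
        while True:
            attr, pos = read_uleb128(data, pos)
            form, pos = read_uleb128(data, pos)
            if attr == 0 and form == 0:    # (0, 0) ends the attribute list
                break
            attrs.append((attr, form))
        table[code] = (tag, has_children, attrs)


# Hand-built sample mirroring the diagram (hypothetical bytes):
# 0x11 = DW_TAG_compile_unit, 0x34 = DW_TAG_variable,
# 0x25 = DW_AT_producer, 0x13 = DW_AT_language,
# 0x03 = DW_AT_name, 0x3a = DW_AT_decl_file,
# 0x0e = DW_FORM_strp, 0x0b = DW_FORM_data1.
SAMPLE = bytes([
    1, 0x11, 1, 0x25, 0x0E, 0x13, 0x0B, 0, 0,   # code 1: compile unit
    2, 0x34, 0, 0x03, 0x0E, 0x3A, 0x0B, 0, 0,   # code 2: variable
    0,                                          # end of table
])

if __name__ == "__main__":
    for code, (tag, children, attrs) in parse_abbrev_table(SAMPLE).items():
        print(code, hex(tag), children, attrs)
```

The `offset` parameter is the important part: a reader must start parsing at the right byte of .debug_abbrev, otherwise the codes found in .debug_info are matched against the wrong entries.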

Relocations that affect DWARF data

When several object files are linked together with

$ ld -r file1.o file2.o -o combined.o

to produce a relocatable file combined.o, the linker glues all the .debug_abbrev sections together into one section, which invalidates the abbreviation table indices for all but the first object file.

For example, suppose the second file, file2.o, contained a description of a DW_TAG_variable in its own .debug_abbrev table at index 4, and the first file, file1.o, had a DW_TAG_typedef at that index. For combined.o, dwarfdump would then look in the .debug_abbrev table at index 4 expecting the description of a variable, while the entry actually describes a typedef. There’s no way to validate such a reference. Results range from wrong data being printed to a crash.

In order to solve this problem, the debug info header for every compilation unit has the “abbrev offset” field, which points to the beginning of the abbrev table part of that compilation unit. This field is always 0 for .o files produced from one source file; since there’s only one compilation unit, the abbreviations table starts from byte 0 of the .debug_abbrev section. This abbrev_offset field is updated by the corresponding relocation record when object files are linked together.
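The effect of a stale abbrev_offset can be shown with a toy model. Everything here is hypothetical: the combined .debug_abbrev section is represented as a dictionary keyed by the byte offset at which each file's table starts, and the offset 59 and the code 4 are illustrative values only.

```python
# Toy model of a combined .debug_abbrev section:
# {byte offset where a table starts: {abbreviation code: tag name}}
COMBINED_ABBREV = {
    0:  {1: "DW_TAG_compile_unit", 4: "DW_TAG_typedef"},    # from file1.o
    59: {1: "DW_TAG_compile_unit", 4: "DW_TAG_variable"},   # from file2.o
}


def resolve_tag(abbrev_offset, code):
    """Look up an abbreviation code in the table that starts at abbrev_offset."""
    return COMBINED_ABBREV[abbrev_offset][code]


if __name__ == "__main__":
    # A DIE from file2.o that uses code 4:
    print(resolve_tag(0, 4))    # stale offset: resolves in file1.o's table
    print(resolve_tag(59, 4))   # relocated offset: resolves in file2.o's table
```

With the unrelocated value 0, the reader silently picks the typedef entry from file1.o's table; only the relocated offset leads to the variable entry that file2.o intended.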

When the linker is asked to generate an executable or a shared library, it applies relocations of this kind, and the resulting load object has the correct abbrev_offset for each CU. When the -r linker option is in effect, the linker is supposed to generate a file that has all its relocations intact, so ld copies (updated versions of) the relocations from the input files into the output file.

Let’s take a look at this relocation record. On Solaris, for a sparcv9 (64-bit) object file, it looks like this:

$ readelf -r file2.o
Relocation section '.rela.debug_info' at offset 0x4a8 contains 3 entries:
  Offset          Info           Type          Sym. Value    Sym. Name + Addend
00000000000e  000300000036 R_SPARC_UA64     0000000000000000 .debug_abbrev + 0
...

There are more relocation records, but only one refers to the .debug_abbrev section, which is a good hint: after all, only one field in .debug_info depends on knowing the “address” of the .debug_abbrev section. A more thorough examination (or rather, a calculation) involving the offset of 14 (0xe) leads to the same conclusion: this relocation record updates abbrev_offset.
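The calculation can be sketched as follows. Two things here are assumptions rather than statements from the article. First, the CU header is assumed to use the 64-bit DWARF layout: a 4-byte escape value, an 8-byte unit length, and a 2-byte version, followed by an 8-byte abbrev_offset. That layout is consistent with both the offset 14 and the 8-byte R_SPARC_UA64 relocation, but the readelf output does not confirm it. Second, the helper names are made up; the symbol and type are extracted with the generic ELF64 split of r_info (upper 32 bits are the symbol index, lower 32 bits the type).

```python
import struct

R_SPARC_UA64 = 54  # relocation type number from the SPARC ELF ABI


def elf64_r_sym(r_info):
    """Symbol table index: upper 32 bits of r_info."""
    return r_info >> 32


def elf64_r_type(r_info):
    """Relocation type: lower 32 bits of r_info (generic ELF64 split)."""
    return r_info & 0xFFFFFFFF


# Assumed 64-bit DWARF CU header prefix, big-endian, no padding:
#   I = 4-byte escape (0xffffffff), Q = 8-byte unit length, H = 2-byte version.
# abbrev_offset would be the field that follows.
ABBREV_OFFSET_POS = struct.calcsize(">IQH")

if __name__ == "__main__":
    r_offset, r_info = 0x0E, 0x000300000036   # values from the readelf line
    print("symbol index:", elf64_r_sym(r_info))
    print("type is R_SPARC_UA64:", elf64_r_type(r_info) == R_SPARC_UA64)
    print("offset matches abbrev_offset:", r_offset == ABBREV_OFFSET_POS)
```

Under these assumptions, the relocation at offset 0xe patches exactly the 8 bytes that hold abbrev_offset in the first CU header of the section.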

After file1.o and file2.o are linked together with the -r linker option, combined.o would have two relocation records relative to the .debug_abbrev section:

$ readelf -r combined.o | grep debug_abbrev
00000000000e  000c00000036 R_SPARC_UA64     0000000000000000 .debug_abbrev + 0
000000000136  000c00000036 R_SPARC_UA64     0000000000000000 .debug_abbrev + 3b

The first one is obviously for the first file, since its offset is so small; the second is intended to update the abbrev offset for the second file. Note that it is a RELA-type relocation, that is, a relocation with an addend, which in this case is 0x3b, or 59. This means that the abbreviations table of the second file starts at an offset of 59 bytes in the .debug_abbrev section. It also means that the location this record is supposed to update probably contains zero (with a REL-type relocation, it would contain the addend, 59 in this case).

Here’s how it looks from the DWARF point of view (combined.o):

.debug_info section
  CU file1.o, abbrev_offset=0
    DW_TAG_compile_unit (code=1) ---> (points to debug_abbrev entry #1)
    ...                                
    ...
  CU file2.o, abbrev_offset=59
    DW_TAG_compile_unit (code=1) ---> (points to debug_abbrev entry #59+1=60)
    ...                                
    ...

.debug_abbrev section      

1:  DW_TAG_compile_unit (code=1) <-- part of table for file1.o starts from here
    DW_AT_producer             DW_FORM_strp
    DW_AT_language             DW_FORM_data1
    ...
2:  DW_TAG_variable (code=2)
    DW_AT_name                 DW_FORM_strp
    DW_AT_decl_file            DW_FORM_data1
    ...
    ...
60: DW_TAG_compile_unit (code=1)  <-- part of table for file2.o starts from here
    DW_AT_producer             DW_FORM_strp
    DW_AT_language             DW_FORM_data1
    ...

On x86, the relocation record is of type REL, which means that the addend is stored in the location to be modified, in other words, in the abbrev_offset field itself. Therefore, on x86, the linker writes the correct offset into the debug info header, making the relocation entry for .debug_abbrev redundant, at least for dwarfdump. This is why dwarfdump “just works” on x86 and sparcv8.

On SPARCv9 (as well as on x64), the relocation record is of type RELA, meaning that the addend is stored in the relocation entry itself. So when producing a relocatable object file (ld -r), the linker does not touch the abbrev_offset field in the section; instead, it changes the relocation record for the second compilation unit (file2.o) and puts the correct offset into that record. To obtain the right value of abbrev_offset, one has to perform the relocation first.
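The difference between the two relocation styles can be sketched in Python. This is a simplified model, not what libdwarf or ld actually does: the function `apply_abs_reloc` and the zero-filled buffers are hypothetical, only absolute relocations of the form "symbol value + addend" are handled, and the symbol value is taken to be 0, as in the "Sym. Value" column of the readelf output above.

```python
def apply_abs_reloc(section, r_offset, sym_value, addend=None,
                    size=8, byteorder="big"):
    """Apply an absolute relocation (value = S + A) to `section` in place.

    RELA style: the addend comes from the relocation record (pass `addend`).
    REL style:  the addend is read from the field being patched (addend=None).
    Returns the value written.
    """
    if addend is None:
        addend = int.from_bytes(section[r_offset:r_offset + size], byteorder)
    value = (sym_value + addend) % (1 << (8 * size))   # wrap to field width
    section[r_offset:r_offset + size] = value.to_bytes(size, byteorder)
    return value


def read_field(section, offset, size=8, byteorder="big"):
    return int.from_bytes(section[offset:offset + size], byteorder)


if __name__ == "__main__":
    # RELA (sparcv9-like): the fields in .debug_info hold zero until relocated.
    debug_info = bytearray(0x136 + 8)                    # zero-filled stand-in
    rela_records = [(0x00E, 0, 0x00), (0x136, 0, 0x3B)]  # (offset, S, addend)
    print("before:", read_field(debug_info, 0x136))
    for r_offset, sym_value, addend in rela_records:
        apply_abs_reloc(debug_info, r_offset, sym_value, addend)
    print("after: ", read_field(debug_info, 0x136))

    # REL (x86-like): the field already holds the addend, so reading it
    # without relocating gives the same answer.
    rel_info = bytearray(4)
    rel_info[0:4] = (59).to_bytes(4, "little")
    print("REL field:", read_field(rel_info, 0, size=4, byteorder="little"))
    apply_abs_reloc(rel_info, 0, 0, size=4, byteorder="little")
    print("REL after:", read_field(rel_info, 0, size=4, byteorder="little"))
```

In the RELA case the second CU's field reads as 0 until the relocation is applied and as 59 afterwards; in the REL case the field reads 59 either way, which matches the observation that a reader can get away without processing relocations there.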

Maxim Kartashev
Pragmatic, software engineer. Working for Altium Tasking on compilers and tools for embedded systems.