DWARF and linker relocations
In order to explain why relocation processing is important when reading DWARF info, a brief introduction to the DWARF2 format is necessary.
Note: while the “canonical” dwarfdump is a self-contained utility (linked statically with libdwarf), the variant shipped with Sun Studio depends on the dynamic library libdwarf.so, which actually does most of the work. So when I write dwarfdump, I usually mean libdwarf[.a|.so].
Brief intro to DWARF
DWARF is a widely used debugging data format. It is suitable for virtually any source language and, in Sun Studio, it is used to represent debugging information for C, C++ and Fortran. There are three versions of the standard: 1.1, 2.0 and the most recent, 3.0. Sun Studio compilers generate version 2.0, commonly referred to as DWARF2.
DWARF2 information is stored in a table of records (contained in the .debug_info ELF section) called Debug Information Entries (DIEs). Each DIE has
- a type (for example, DW_TAG_compile_unit),
- several attribute/value pairs that describe the entry (for example, DW_AT_language=DW_LANG_C_plus_plus).
In order to save space, attribute names are not stored in the DIE itself. Instead, each DIE type has a corresponding entry in another table (stored in the .debug_abbrev ELF section) that lists, for a given DIE type, all the attributes used in a compilation unit (CU) along with their types. For example:
DW_TAG_compile_unit DW_children_yes
DW_AT_name DW_FORM_string
DW_AT_language DW_FORM_data1
...
DW_AT_low_pc DW_FORM_addr
DW_AT_high_pc DW_FORM_addr
...
This entry describes a CU, which contains a name (string), a source language attribute (1 byte of data), the addresses of its beginning and end (4 or 8 bytes of data, depending on the memory model), and other fields.
So the DIE type is just an index into that table of abbreviations:
.debug_info
CU abbrev_offset=0
DW_TAG_compile_unit (code=1) ----+
... |
... |
| (points to debug_abbrev entry #1)
.debug_abbrev |
DW_TAG_compile_unit (code=1) <--+
DW_AT_producer DW_FORM_strp
DW_AT_language DW_FORM_data1
...
DW_TAG_variable (code=2)
DW_AT_name DW_FORM_strp
DW_AT_decl_file DW_FORM_data1
...
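To make the lookup concrete, here is a minimal sketch of how a reader could parse one abbreviation table into a dictionary keyed by code. It assumes the standard encoding of .debug_abbrev (unsigned LEB128 numbers, a one-byte children flag, and a zero pair closing each attribute list); the function names and the hand-built sample table are illustrative only and are not taken from libdwarf.

```python
def read_uleb128(data, pos):
    """Decode one unsigned LEB128 number; return (value, next position)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_abbrev_table(data, offset=0):
    """Parse the abbreviation table that starts at `offset`.

    Returns {code: (tag, has_children, [(attribute, form), ...])}.
    """
    table = {}
    pos = offset
    while True:
        code, pos = read_uleb128(data, pos)
        if code == 0:                      # a zero code terminates the table
            return table
        tag, pos = read_uleb128(data, pos)
        has_children = data[pos] != 0      # one-byte DW_children flag
        pos += 1
        attrs = []
        while True:
            attr, pos = read_uleb128(data, pos)
            form, pos = read_uleb128(data, pos)
            if attr == 0 and form == 0:    # (0, 0) ends the attribute list
                break
            attrs.append((attr, form))
        table[code] = (tag, has_children, attrs)


# Numeric values of the DWARF constants used in the diagram above.
DW_TAG_compile_unit, DW_TAG_variable = 0x11, 0x34
DW_AT_name, DW_AT_language, DW_AT_producer, DW_AT_decl_file = 0x03, 0x13, 0x25, 0x3A
DW_FORM_data1, DW_FORM_strp = 0x0B, 0x0E

# Hand-built sample (hypothetical data): code 1 = compile unit, code 2 = variable.
sample_abbrev = bytes([
    1, DW_TAG_compile_unit, 1,
    DW_AT_producer, DW_FORM_strp, DW_AT_language, DW_FORM_data1, 0, 0,
    2, DW_TAG_variable, 0,
    DW_AT_name, DW_FORM_strp, DW_AT_decl_file, DW_FORM_data1, 0, 0,
    0,
])
abbrevs = parse_abbrev_table(sample_abbrev)
```

A DIE in .debug_info then only needs to store its code (1 or 2 here); the reader looks the code up in `abbrevs` to learn which attributes follow and how they are encoded.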
Relocations that affect DWARF data
When several object files are linked together with
$ ld -r file1.o file2.o -o combined.o
to produce a relocatable file combined.o, all the .debug_abbrev sections are glued together by the linker into one section, which invalidates the indices into the abbreviation tables for all but the first object file.
For example, if the second file, file2.o, contained a description of a DW_TAG_variable in its own .debug_abbrev table at index 4 and the first file, file1.o, had, for instance, a DW_TAG_typedef at that index, dwarfdump for combined.o would look in the .debug_abbrev table at index 4 thinking that it describes a variable, while this entry actually describes a typedef. There’s no way to validate such a reference. Results may vary from wrong data being printed to a crash.
In order to solve this problem, the debug info header for every compilation unit has an “abbrev offset” field, which points to the beginning of the part of the abbrev table that belongs to that compilation unit. This field is always 0 for .o files produced from one source file; since there’s only one compilation unit, the abbreviations table starts at byte 0 of the .debug_abbrev section. This abbrev_offset field is updated by the corresponding relocation record when object files are linked together.
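The effect of a stale abbrev_offset can be shown with a toy model. The sketch below makes a loud simplification: every number in the table is assumed to fit in one byte, so no LEB128 decoding is needed. The helper names `abbrev` and `find_tag` and the two tiny tables are hypothetical, chosen to mirror the typedef/variable example above.

```python
DW_TAG_compile_unit, DW_TAG_typedef, DW_TAG_variable = 0x11, 0x16, 0x34


def abbrev(code, tag):
    """One abbreviation: code, tag, no children, (DW_AT_name, DW_FORM_strp), (0, 0)."""
    return bytes([code, tag, 0, 0x03, 0x0E, 0, 0])


def find_tag(debug_abbrev, abbrev_offset, code):
    """Return the tag of abbreviation `code` in the table at `abbrev_offset`.

    Simplification: all values are single bytes (no LEB128 decoding).
    """
    pos = abbrev_offset
    while debug_abbrev[pos] != 0:          # a zero code ends the table
        entry_code, tag = debug_abbrev[pos], debug_abbrev[pos + 1]
        pos += 3                           # skip code, tag, children flag
        while (debug_abbrev[pos], debug_abbrev[pos + 1]) != (0, 0):
            pos += 2                       # skip one (attribute, form) pair
        pos += 2                           # skip the (0, 0) terminator
        if entry_code == code:
            return tag
    raise KeyError(code)


# Code 4 means "typedef" in file1.o but "variable" in file2.o.
file1_table = abbrev(1, DW_TAG_compile_unit) + abbrev(4, DW_TAG_typedef) + b"\0"
file2_table = abbrev(1, DW_TAG_compile_unit) + abbrev(4, DW_TAG_variable) + b"\0"

# ld -r glues the two tables together into one .debug_abbrev section.
combined = file1_table + file2_table

# Looking up file2.o's code 4 with an abbrev_offset that was never updated ...
stale = find_tag(combined, 0, 4)
# ... versus starting at the offset where file2.o's table really begins.
fixed = find_tag(combined, len(file1_table), 4)
```

With the offset left at 0, the lookup silently returns the typedef entry from file1.o; with the relocated offset it returns the variable entry that file2.o actually meant.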
When the linker is asked to generate an executable or a shared library, it applies relocations of this kind, and the resulting load object has the correct abbrev_offset for each CU. When the -r linker option is in effect, the linker is supposed to generate a file that has all its relocations intact, so ld copies (the updated versions of) the relocations from the input files into the output file.
Let’s take a look at this relocation record. On Solaris, for a sparcv9 (64-bit) object file, it looks like this:
$ readelf -r file2.o
Relocation section '.rela.debug_info' at offset 0x4a8 contains 3 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000e 000300000036 R_SPARC_UA64 0000000000000000 .debug_abbrev + 0
...
There are more relocation records, but only one refers to the section .debug_abbrev, which gives a good hint: after all, only one field in .debug_info depends on knowing the “address” of the .debug_abbrev section.
A more thorough examination (or rather, calculation) involving the offset of 14 (0xe) leads to the same conclusion: this relocation record updates abbrev_offset.
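One way to arrive at 14 is sketched below. It assumes the compilation unit header uses the 64-bit DWARF layout (a 4-byte 0xffffffff escape, an 8-byte unit length, a 2-byte version, an 8-byte abbrev_offset and a 1-byte address size). That layout is my assumption, suggested by the 8-byte R_SPARC_UA64 relocation; the header values packed here are made up for the demonstration.

```python
import struct

# Assumed 64-bit DWARF CU header, big-endian as on SPARC, no padding:
#   I  4 bytes  0xffffffff escape
#   Q  8 bytes  unit length
#   H  2 bytes  version
#   Q  8 bytes  abbrev_offset
#   B  1 byte   address size
CU_HEADER_64 = struct.Struct(">IQHQB")

# The fields that precede abbrev_offset: escape + unit length + version.
abbrev_offset_position = 4 + 8 + 2

# Cross-check: pack a header with a recognizable marker in abbrev_offset
# and find where that marker lands.
marker = 0x1122334455667788
header = CU_HEADER_64.pack(0xFFFFFFFF, 0x100, 2, marker, 8)
found_at = header.index(marker.to_bytes(8, "big"))
```

Under this assumption the field sits at byte 14 of the first CU header, which matches the r_offset of 0xe reported by readelf.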
After file1.o and file2.o are linked together with the -r linker option, combined.o would have two relocation records relative to the .debug_abbrev section:
$ readelf -r combined.o | grep debug_abbrev
00000000000e 000c00000036 R_SPARC_UA64 0000000000000000 .debug_abbrev + 0
000000000136 000c00000036 R_SPARC_UA64 0000000000000000 .debug_abbrev + 3b
The first one is obviously for the first file, since its offset is so small, and the second one is intended to update the abbrev offset for the second file. Note that it is a RELA-type relocation, that is, a relocation with an addend, which in this case is 0x3b, or 59. It means that the abbreviations table of the second file starts at an offset of 59 bytes in the .debug_abbrev section. It also means that the location this record is supposed to update probably contains zero (for REL-type relocations, it would contain the addend, 59 in this case).
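The columns of the readelf output can be decoded with a few lines of Python. This is a sketch that assumes the generic ELF64 split of r_info (symbol index in the upper 32 bits, relocation type in the lower 32 bits); on SPARC the lower word can carry extra data for some relocation types, but it is zero apart from the type here. The function name `split_r_info` is hypothetical.

```python
R_SPARC_UA64 = 54  # 0x36, as printed in the Info column


def split_r_info(r_info):
    """Split an ELF64 r_info value into (symbol index, relocation type)."""
    return r_info >> 32, r_info & 0xFFFFFFFF


# (r_offset, r_info, r_addend) copied from the readelf output for combined.o.
records = [
    (0x00E, 0x000C00000036, 0x00),
    (0x136, 0x000C00000036, 0x3B),
]

decoded = [
    (r_offset, *split_r_info(r_info), r_addend)
    for r_offset, r_info, r_addend in records
]
# Each tuple is (offset in .debug_info, symbol index, relocation type, addend).
```

Both records refer to the same symbol (the section symbol for .debug_abbrev) and have the same type; only the patched location and the addend differ.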
Here’s how it looks from the DWARF point of view (combined.o):
.debug_info section
CU file1.o, abbrev_offset=0
DW_TAG_compile_unit (code=1) ---> (points to debug_abbrev entry N1)
...
...
CU file2.o, abbrev_offset=59
DW_TAG_compile_unit (code=1) ---> (points to debug_abbrev entry N59+1=60)
...
...
.debug_abbrev section
1: DW_TAG_compile_unit (code=1) <-- part of table for file1.o starts from here
DW_AT_producer DW_FORM_strp
DW_AT_language DW_FORM_data1
...
2: DW_TAG_variable (code=2)
DW_AT_name DW_FORM_strp
DW_AT_decl_file DW_FORM_data1
...
...
60: DW_TAG_compile_unit (code=1) <-- part of table for file2.o starts from here
DW_AT_producer DW_FORM_strp
DW_AT_language DW_FORM_data1
...
On x86, the relocation record is of type REL, which means that the addend is supposed to be in the location to be modified; in other words, in the abbrev_offset field. Therefore, on x86, the linker writes the correct offset into the debug info header, making the relocation entry for .debug_abbrev redundant, at least for dwarfdump. This is why dwarfdump “just works” on x86 and sparcv8.
On SPARCv9 (as well as on x64), the relocation record is of type RELA, meaning that the addend is stored in the relocation entry itself. So when producing a relocatable object file (ld -r), the linker does not touch the abbrev_offset field in the section; instead, it changes the relocation record for the second compilation unit (file2.o) and puts the correct offset into that relocation record. In order to obtain the right value of abbrev_offset, one has to perform the relocation first.
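What “performing the relocation” means for a reader can be sketched as follows. The code assumes that R_SPARC_UA64 stores the 64-bit big-endian value S + A (symbol value plus addend) at r_offset, and that the section symbol for .debug_abbrev has value 0, as the Sym. Value column of the readelf output shows. The zero-filled buffer is a stand-in for the real .debug_info contents of combined.o, and the function names are hypothetical.

```python
import struct

R_SPARC_UA64 = 54


def read_field(debug_info, position):
    """Read an 8-byte big-endian field, such as abbrev_offset."""
    return struct.unpack_from(">Q", debug_info, position)[0]


def apply_rela(debug_info, r_offset, r_info, r_addend, symbol_value=0):
    """Store S + A at r_offset for one R_SPARC_UA64 record."""
    if r_info & 0xFF != R_SPARC_UA64:
        raise ValueError("unsupported relocation type")
    struct.pack_into(">Q", debug_info, r_offset, symbol_value + r_addend)


# Stand-in for .debug_info of combined.o: with RELA relocations the linker
# leaves the abbrev_offset fields untouched, so they are modeled as zeros.
debug_info = bytearray(0x136 + 8)

# The two .debug_abbrev records from the readelf output for combined.o.
relocations = [
    (0x00E, 0x000C00000036, 0x00),
    (0x136, 0x000C00000036, 0x3B),
]

before = read_field(debug_info, 0x136)   # what a naive reader sees
for r_offset, r_info, r_addend in relocations:
    apply_rela(debug_info, r_offset, r_info, r_addend)
after = read_field(debug_info, 0x136)    # what the second CU should use
```

Before the relocation is applied, the second CU appears to use the abbreviation table at offset 0; afterwards it points at offset 59, where the table for file2.o begins. A REL-style reader would skip this step because the value would already be stored in the field.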
References
- DWARF standard.
- David Anderson’s page, the source of dwarfdump and libdwarf.