Time to post another technical article from the queue, I think. I’ll be away on a family holiday for a week. Enjoy!
Anyway, ELF is a big deal for autopackage and Linux developers in general, but it’s largely mysterious. So in a similar vein to the Mach-O article, here is a quick tutorial on it. Actually Mach-O and ELF are very similar, but some of the names and commands are different.
First steps
The life of a process starts in the kernel. The kernel loader code maps the executable file into memory, then reads the ELF headers to locate the PT_INTERP program header. Mapping it in is very similar to Mach-O, so I’ll defer to the linked article on that. PT_INTERP tells the kernel the path to the “ELF interpreter” – better known as the dynamic linker:
mike@linux:/> readelf -aW /usr/bin/readelf|grep interpreter
[Requesting program interpreter: /lib/ld-linux.so.2]
On Linux this is normally the same value for every program on a given architecture, but it can be modified if there’s a reason to do so. The dynamic linker therefore runs entirely in userspace, and is bootstrapped into life by the kernel. That’s a common pattern that you can see on Windows and OS X too (and of course every ELF platform). The ELF file format and ABI are a UNIX-wide standard. Each UNIX’s version is slightly different, naturally, but the specification still allows for a lot of code re-use.
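To make the PT_INTERP lookup concrete, here is a minimal sketch that does by hand what the readelf command above does. The function name find_interpreter is my own, and it assumes a well-formed file with no bounds checking; only the field offsets come from the ELF layout.

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def find_interpreter(data):
    """Return the PT_INTERP path from raw ELF bytes, or None if absent."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little endian

    # Locate the program header table (offsets differ between ELF32 and ELF64).
    if is_64:
        e_phoff, = struct.unpack_from(endian + "Q", data, 32)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
    else:
        e_phoff, = struct.unpack_from(endian + "I", data, 28)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 42)

    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        p_type, = struct.unpack_from(endian + "I", data, base)
        if p_type != PT_INTERP:
            continue
        if is_64:
            p_offset, = struct.unpack_from(endian + "Q", data, base + 8)
            p_filesz, = struct.unpack_from(endian + "Q", data, base + 32)
        else:
            p_offset, = struct.unpack_from(endian + "I", data, base + 4)
            p_filesz, = struct.unpack_from(endian + "I", data, base + 16)
        raw = data[p_offset:p_offset + p_filesz]
        return raw.split(b"\0", 1)[0].decode()   # path is NUL-terminated
    return None
```

Feeding it the bytes of a dynamically linked executable should give the same path that readelf prints; a statically linked one has no PT_INTERP and gives None.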
Start of dynamic linking

The dynamic linker is some scary code. When it first gets control from the kernel, it can’t use global variables, nor can it do regular function calls. Instead it first dynamically links itself, and once its environment is sane, it proceeds to link the rest of the program together. This is a recursive process, which involves looking at the DT_NEEDED entries in the ELF file, and loading each shared library it needs into memory, then looking at the DT_NEEDED entries in that new shared library and so on.
You can see what an ELF file needs with the following command:
objdump -p myprogram | grep NEEDED
This is NOT the same as the ldd program, which dumps a list of every single thing loaded into a process by that ELF file, including dependencies of dependencies. Usually, if built correctly (with --as-needed), this list of needed files will be much smaller than the output from ldd.
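The difference between the two lists is easy to see with a toy model. The dependency table below is made up for illustration, and walking it breadth-first is an assumption of this sketch rather than a statement about any particular linker.

```python
from collections import deque

# Hypothetical DT_NEEDED data: each file maps to the sonames it lists directly.
NEEDED = {
    "myprogram": ["libfoo.so.1", "libc.so.6"],
    "libfoo.so.1": ["libbar.so.2", "libc.so.6"],
    "libbar.so.2": ["libc.so.6"],
    "libc.so.6": [],
}


def direct_needed(name, needed=NEEDED):
    """Like `objdump -p | grep NEEDED`: the file's own entries only."""
    return list(needed[name])


def all_loaded(name, needed=NEEDED):
    """Like ldd: everything reached by following DT_NEEDED recursively."""
    seen, order, queue = set(), [], deque(needed[name])
    while queue:
        lib = queue.popleft()
        if lib in seen:
            continue                      # each library is loaded only once
        seen.add(lib)
        order.append(lib)
        queue.extend(needed.get(lib, []))
    return order
```

Here direct_needed("myprogram") has two entries while all_loaded("myprogram") has three, because libbar.so.2 is only pulled in through libfoo.so.1.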
Shared libraries and dependencies
The DT_NEEDED entries don’t specify file names directly. Instead they specify sonames. This is an additional layer of indirection designed to allow for simplistic versioning of files. It’s not enough, as we’ll see in a minute. The dynamic linker treats a soname as a filename, however, and will scan directories in the search path looking for a filename equal to the given soname. Usually this will be a symbolic link to the real library.
The symlinks are created by the dynamic linker cache regeneration program, ldconfig. This program creates an mmappable cache file for fast lookup, and makes sure a symlink for each soname exists. If two libraries in a directory share the same soname, then ldconfig will try and figure out which one is newest by doing a numeric comparison of the filenames, and picking that one. The algorithm is simplistic and is implemented in _dl_cache_libcmp on Linux, so it’s worth being careful that your libraries will always compare correctly with this algorithm. Chances are good it will be fine if you use sensible naming.
This isn’t the same as allowing multiple versions to be in use at once. All soname versioning does is allow two to exist on disk simultaneously – the newer version will always override the older version, however.
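To get a feel for the kind of numeric filename comparison described above, here is a simplified sketch. It is not a transcription of _dl_cache_libcmp; the helper names and tie-breaking details are my own assumptions. The point is only that runs of digits are compared as numbers rather than as text.

```python
import re
from functools import cmp_to_key

DIGITS = "0123456789"


def split_runs(name):
    """Split 'libfoo.so.1.10' into text runs and integer runs."""
    return [int(tok) if tok[0] in DIGITS else tok
            for tok in re.findall(r"[0-9]+|[^0-9]+", name)]


def libcmp(a, b):
    """Negative if a is older than b, positive if newer, 0 if equal."""
    runs_a, runs_b = split_runs(a), split_runs(b)
    for x, y in zip(runs_a, runs_b):
        if x == y:
            continue
        if isinstance(x, int) and isinstance(y, int):
            return x - y                  # digit runs compare numerically
        if isinstance(x, int):
            return 1                      # assumption: a number outranks text
        if isinstance(y, int):
            return -1
        return -1 if x < y else 1         # plain text compares as strings
    return len(runs_a) - len(runs_b)      # longer name wins a shared prefix


def newest(filenames):
    """Pick the file that compares highest."""
    return max(filenames, key=cmp_to_key(libcmp))
```

With this, newest(["libfoo.so.1.9", "libfoo.so.1.10"]) picks libfoo.so.1.10, whereas a plain string comparison would pick libfoo.so.1.9. That is the sort of case where sensible naming matters.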
Lazy linking
The dynamic linker’s job is to “link” programs together by rewriting the jump tables that sit at the top of the file in memory. Essentially, when one ELF file calls a function that is supposed to exist in another, the generated code first fetches its address from the global offset table (GOT), an array of addresses, and then jumps to it. Filling out this table is the dynamic linker’s job, and is called symbol relocation. By default relocation is lazy: that is, it’s delayed until the last possible moment. That’s a useful optimisation because a program often won’t call every function it’s linked to; which ones it calls depends on how the program is used. ELF is very slow at looking up symbols, so this also speeds up startup time.

To implement lazy linking, when the GOT is first initialized each entry points to an offset in the procedure linkage table (PLT). This is an array of code blocks. When executed, the code block will jump into the dynamic linker and find the real address of the function, and then rewrite the GOT so that next time the function is called the PLT is not involved. The PLT is generated ahead of time by the “ld” compile time linker, and you can see it by running objdump again:
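The GOT and PLT dance can be mimicked in a few lines. This is a toy model only: the dictionaries stand in for tables of addresses, and every name in it (LIBRARY, got, make_plt_stub, call) is hypothetical.

```python
# Stands in for the code that lives in the shared library.
LIBRARY = {"xmalloc": lambda size: bytearray(size)}

got = {}        # global offset table: symbol -> thing to jump to
lookups = []    # records every slow symbol lookup, for demonstration


def make_plt_stub(symbol):
    """Build the PLT code block for one symbol."""
    def stub(*args):
        lookups.append(symbol)      # the expensive lookup happens here, once
        real = LIBRARY[symbol]
        got[symbol] = real          # rewrite the GOT so the stub is bypassed
        return real(*args)
    return stub


def call(symbol, *args):
    """What compiled code does: fetch the address from the GOT and jump."""
    return got[symbol](*args)


# Initial state: the GOT entry points back at the PLT stub.
got["xmalloc"] = make_plt_stub("xmalloc")
```

Calling call("xmalloc", 4) twice triggers only one lookup. After the first call the GOT entry is the real function and the stub is out of the picture.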
mike@linux:/> objdump -d -j .plt /usr/bin/objdump|head -n 20
/usr/bin/objdump: file format elf32-i386
Disassembly of section .plt:
08049da8 <xmalloc@plt-0x10>:
 8049da8: ff 35 54 b9 08 08    pushl  0x808b954
 8049dae: ff 25 58 b9 08 08    jmp    *0x808b958
 8049db4: 00 00                add    %al,(%eax)
08049db8 <xmalloc@plt>:
 8049db8: ff 25 5c b9 08 08    jmp    *0x808b95c
 8049dbe: 68 00 00 00 00       push   $0x0
 8049dc3: e9 e0 ff ff ff       jmp    8049da8 <_init+0x18>
08049dc8 <disassembler@plt>:
 8049dc8: ff 25 60 b9 08 08    jmp    *0x808b960
 8049dce: 68 08 00 00 00       push   $0x8
 8049dd3: e9 d0 ff ff ff       jmp    8049da8 <_init+0x18>
The first number on the left is the absolute address in memory where the PLT is supposed to be loaded. The series of two-digit hex numbers are the opcode bytes being fed to the CPU, and the instructions on the right are the disassembly of these code blocks. The first PLT entry is special: it jumps through a GOT slot that is modified at runtime to point to the dynamic linker itself.
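You can check by hand that the last instruction of each PLT entry in the listing really does jump back to the special first entry. The e9 opcode is a jump relative to the address of the next instruction, with a signed 32-bit little-endian displacement. A small sketch (the helper name is mine):

```python
def rel32_jump_target(address, opcodes):
    """Target of an x86 `e9 rel32` jump located at `address`."""
    raw = bytes.fromhex(opcodes)
    if len(raw) != 5 or raw[0] != 0xE9:
        raise ValueError("expected a 5-byte e9 jump")
    displacement = int.from_bytes(raw[1:], "little", signed=True)
    return address + len(raw) + displacement   # relative to the next instruction


print(hex(rel32_jump_target(0x8049DC3, "e9 e0 ff ff ff")))  # 0x8049da8
print(hex(rel32_jump_target(0x8049DD3, "e9 d0 ff ff ff")))  # 0x8049da8
```

Both land on 0x8049da8, the start of the PLT, which matches what objdump prints on the right.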
Symbol scoping
This particular design quirk of ELF has caused me and many other developers bad headaches. In ELF, you might expect that if your library has a DT_NEEDED entry for libfoo.so.1, and libfoo exports a function foo_function then calling foo_function from your code would result in control being transferred to libfoo.so.1 – but you’d be wrong.
In ELF every symbol is loaded into a global scope. That is, whenever a shared library is loaded, the symbols it exports are added to the end of a big global list, and every time the linker wishes to resolve a symbol it scans that list from head to tail looking for a match. This means that in fact the call to that function could end up anywhere, even deep inside the place you were being called from!
This problem tends to hit in the following ways:
- Somebody copies and pastes an internal function from one library to another, but changes the way it works or its prototype as the code evolves. The second library is one day used in the same program as the first, purely by chance, and the two different versions of the function interfere with each other.
- A library breaks backwards compatibility and uses a new soname. However, the program is still being wrongly linked to the old version of the library because it’s in use somewhere else by the program. For instance, consider the case of a C++ program like Inkscape which links against version 6 of the C++ standard library, and also GTKSpell, which in turn loads Enchant, which in turn loads GNU ASpell, which is also implemented in C++ and is linked against version 5 of the standard library. This doesn’t work because they conflict.
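The head-to-tail scan is simple enough to model directly. The library names and load order below are invented for illustration, and the strings stand in for code addresses.

```python
# Hypothetical global scope, in load order.
LOADED = [
    ("libold.so.5", {"helper": "libold's helper"}),
    ("libfoo.so.1", {"foo_function": "libfoo's foo_function",
                     "helper": "libfoo's helper"}),
]


def resolve(symbol, loaded=LOADED):
    """Scan the global list head to tail; the first definition wins."""
    for library, exports in loaded:
        if symbol in exports:
            return library, exports[symbol]
    raise LookupError("undefined symbol: " + symbol)
```

Even a call to helper made from inside libfoo.so.1 resolves to the copy in libold.so.5 here, simply because that library was loaded first. Which library asked for the symbol plays no part in the lookup.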
Symbol versioning and visibility
Symbol versioning is an attempt to fix this problem. It allows each symbol to be tagged with an extra piece of text that isn’t a part of the library API (that is, the way you write the code) but is part of the library ABI (that is, what the operating system sees). A symbol with a version tag is printed like this: foo_function@LIBFOO_1.2, and won’t match in a symbol scan against foo_function@LIBFOO_1.3.
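In terms of the global scan, the version tag just becomes part of what has to match. A toy sketch, assuming two made-up libraries that both export foo_function under different tags (real linkers have extra rules for unversioned references that this ignores):

```python
# Hypothetical global scope: each library exports (name, version) pairs.
VERSIONED = [
    ("libfoo.so.2", {("foo_function", "LIBFOO_1.3")}),
    ("libfoo.so.1", {("foo_function", "LIBFOO_1.2")}),
]


def resolve_versioned(name, version, loaded=VERSIONED):
    """First library exporting exactly name@version, or None."""
    for library, exports in loaded:
        if (name, version) in exports:
            return library
    return None
```

A reference to foo_function@LIBFOO_1.2 now skips over libfoo.so.2 even though it comes first in the list, which is exactly the conflict that plain name matching could not avoid.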
Symbol visibility can help control the problem with people copying and pasting code then changing it. Libraries and programs usually export their internals to the world, and these symbols are linked like any other. By setting the visibility of a symbol to “hidden” it’s taken out of this scheme, which quite apart from improving robustness also improves performance.
There are two ways to set symbol visibility – either at the same time as applying version tags, or independently of symbol versioning. To do it independently the -fvisibility=hidden switch can be given to GCC. To do it as part of the version tagging process, just use the * wildcard in the local: section of the version script. The GCC switch is more convenient for C++ users. Version scripts? Yes, that’s a file you feed to the “ld” compile time linker. A version script says what symbols should receive what version tag. You can set it from within the source code as well if you like; however, this is rather non-portable.
One symbol can have multiple versions. In other words, a single API can have multiple ABIs. This is a very exotic and GNU-specific feature; as a result, nearly nothing except glibc uses it. The ABI to use is decided at compile time: whichever is newest wins the fight and is “burned” into the binary. If that symbol version isn’t available at runtime the linker will refuse to start the program. This is not a very good versioning scheme in my opinion, because if a program is recompiled (as happens all the time in a world where compile == install) then it risks being linked against a symbol version that it’s not compatible with. It’s possible to select which symbol version you will be linked to in the source code ahead of time; however, the glibc headers don’t do that, so we have to force it using apbuild (this was the first of our build-on-new, run-on-old tricks).
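Here is a toy illustration of why build-on-new, run-on-old breaks under this scheme. The symbol and version names are the made-up ones from earlier, and the two functions are hypothetical stand-ins for what the compile time linker and the dynamic linker do.

```python
def link_time_binding(name, versions_on_build_machine):
    """Bind to the newest version offered (list assumed ordered oldest to newest)."""
    return name + "@" + versions_on_build_machine[-1]


def can_start(required_symbols, runtime_exports):
    """The program starts only if every burned-in symbol version exists."""
    return all(symbol in runtime_exports for symbol in required_symbols)


# Built on a new system that offers both versions of the symbol...
burned_in = link_time_binding("foo_function", ["LIBFOO_1.2", "LIBFOO_1.3"])

# ...then run on systems with different library versions installed.
old_system = {"foo_function@LIBFOO_1.2"}
new_system = {"foo_function@LIBFOO_1.2", "foo_function@LIBFOO_1.3"}
```

The binary ends up requiring foo_function@LIBFOO_1.3, so can_start([burned_in], old_system) is False even though the old system has a perfectly usable foo_function. Forcing the older tag at build time is what avoids this.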
November 5, 2006 at 12:42 am |
Nice and simple article for a very complex thing. Thanks
Siva.