File.........: 9 - Build procedure overview.txt
Copyright....: (C) 2011 Yann E. MORIN <yann.morin.1998@anciens.enib.fr>
License......: Creative Commons Attribution Share Alike (CC-by-sa), v2.5


How is a toolchain constructed? /
_______________________________/

This is the result of a discussion with Francesco Turco <mail@fturco.org>:
  http://sourceware.org/ml/crossgcc/2011-01/msg00060.html

Francesco has a nice tutorial for beginners, along with a sample step-by-step
procedure to build a toolchain for an ARM target from an x86_64 Debian host:
  http://fturco.org/wiki/doku.php?id=debian:cross-compiler

Thank you Francesco for initiating this!


I want a cross-compiler! What is this toolchain you're speaking about? |
-----------------------------------------------------------------------+

A cross-compiler is in fact a collection of different tools set up to work
tightly together. The tools are arranged in a way that they are chained, in
a kind of cascade, where the output from one becomes the input to the next,
to ultimately produce the actual binary code that runs on a machine. Hence,
we call this arrangement a "toolchain". When a toolchain is meant to
generate code for a machine different from the machine it runs on, it is
called a cross-toolchain.


So, what are those components in a toolchain? |
----------------------------------------------+

First and foremost among the components of a toolchain is the compiler
itself. The compiler turns source code (in C, C++, whatever) into assembly
code. The compiler of choice is the GNU Compiler Collection, well known as
'gcc'.

The assembler then translates the assembly code into object code. The
assembler is provided by the binary utilities, such as the GNU 'binutils'.

Once the different object code files have been generated, they have to be
aggregated together to form the final executable binary. This is called
linking, and is achieved with a linker. The GNU 'binutils' also come with a
linker.
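
To make the cascade concrete, here is a minimal Python sketch that only
builds the command lines (it does not run them) for one C file going through
the compiler, the assembler and the linker, each step consuming the file the
previous one produced. The 'arm-unknown-linux-gnueabi-' tool prefix and the
toolchain_commands() helper are hypothetical; the actual prefix depends on
the toolchain you build. Linking goes through gcc acting as a driver, which
invokes the binutils linker ('ld') and hands it the C library and start
files.

```python
# Hypothetical tool prefix for an ARM Linux target; yours will differ.
PREFIX = "arm-unknown-linux-gnueabi-"


def toolchain_commands(source, prefix=PREFIX):
    """Return the compile, assemble and link command lines for one C file.

    Each step consumes the file produced by the previous one.
    """
    stem = source.rsplit(".", 1)[0]
    asm_file, obj_file, exe_file = stem + ".s", stem + ".o", stem
    return [
        # Compiler: C source -> assembly code ('-S' stops before assembling).
        [prefix + "gcc", "-S", source, "-o", asm_file],
        # Assembler (from binutils): assembly code -> object code.
        [prefix + "as", asm_file, "-o", obj_file],
        # Linking: gcc acts as a driver and invokes the binutils linker 'ld'.
        [prefix + "gcc", obj_file, "-o", exe_file],
    ]


if __name__ == "__main__":
    for command in toolchain_commands("hello.c"):
        print(" ".join(command))
```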

So far, we have a complete toolchain that is capable of turning source code
into actual executable code. Depending on the Operating System, or the lack
thereof, running on the target, we also need the C library. The C library
provides a standard abstraction layer that performs basic tasks (such as
allocating memory, printing output on a terminal, managing file access...).
There are many C libraries, each targeted to different systems. For the
Linux /desktop/, there is glibc, eglibc or even uClibc; for embedded Linux,
you have a choice of eglibc or uClibc; while for systems without an
Operating System, you may use newlib, dietlibc, or even none at all. There
are a few other C libraries, but they are not as widely used, and/or are
targeted to very specific needs (e.g. klibc is a very small subset of the C
library aimed at building constrained initial ramdisks).

Under Linux, the C library needs to know the API to the kernel to decide
what features are present, and if needed, what emulation to include for
missing features. That API is provided by the kernel headers. Note: this
is Linux-specific (and possibly specific to a very few other OSes); the C
libraries on other OSes do not need the kernel headers.


And now, how are all these components chained together? |
--------------------------------------------------------+

So far, all major components have been covered, but they still need to be
built in a specific order. Here we see what the dependencies are, starting
with the compiler we want to ultimately use. We call that compiler the
'final compiler'.

  - the final compiler needs the C library, to know how to use it,
but:
  - building the C library requires a compiler
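
The two requirements above form a cycle, which is exactly what makes the
build impossible as stated. As a sketch, assuming the dependencies are
modelled as a Python dict mapping each component to the components it needs
(the node names and the find_cycle() helper are illustrative, not part of
crosstool-NG), a depth-first search can spot such a cycle, and confirm it
disappears once a stripped-down core compiler is introduced:

```python
# Each component maps to the components it needs. The two entries below
# restate the two requirements above; the names are informal labels.
NAIVE = {
    "final compiler": ["C library"],
    "C library": ["final compiler"],  # building the C library needs a compiler
}

# Hypothetical fix: a stripped-down core compiler that does not need the C library.
WITH_CORE = {
    "final compiler": ["C library"],
    "C library": ["core compiler"],
    "core compiler": [],
}


def find_cycle(deps):
    """Return one dependency cycle as a list of nodes, or None if there is none."""
    on_path, done = set(), set()
    path = []

    def visit(node):
        on_path.add(node)
        path.append(node)
        for dep in deps.get(node, []):
            if dep in on_path:  # dep is already on the current path: cycle found
                return path[path.index(dep):] + [dep]
            if dep not in done:
                cycle = visit(dep)
                if cycle:
                    return cycle
        on_path.discard(node)
        path.pop()
        done.add(node)
        return None

    for node in deps:
        if node not in done:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(NAIVE))      # -> ['final compiler', 'C library', 'final compiler']
    print(find_cycle(WITH_CORE))  # -> None
```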

A needs B which needs A. This is the classic chicken'n'egg problem... This
is solved by building a stripped-down compiler that does not need the C
library, but is capable of building it. We call it a bootstrap, initial, or
core compiler. So here is the new dependency list:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
but:
  - the core compiler needs the C library headers and start files, to know
    how to use the C library

B needs C which needs B. Chicken'n'egg, again. To solve this one, we will
need to build a C library that will only install its headers and start
files. The start files are a few files that gcc needs in order to be able
to turn on thread local storage (TLS) on an NPTL system. So now we have:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
  - the core compiler needs the C library headers and start files, to know
    how to use the C library
but:
  - building the start files requires a compiler

Geez... C needs D which needs C, yet again. So we need to build a yet
simpler compiler, one that needs neither the headers nor the start files.
This compiler is also a bootstrap, initial or core compiler. In order to
differentiate the two core compilers, let's call this one "core pass 1",
and the former one "core pass 2". The dependency list becomes:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core pass 2 compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a compiler
  - we need a core pass 1 compiler

And as we said earlier, the C library also requires the kernel headers.
The kernel headers have no requirements of their own, so that is the end of
the story for them:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core pass 2 compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a compiler and the kernel headers
  - we need a core pass 1 compiler

We need to add a few new requirements. As soon as we compile code for the
target, we need the assembler and the linker. The C library is, of course,
such code, so we need to build the binutils before the C library start
files and the complete C library itself. Also, some code in gcc (such as
libgcc) is built to run on the target as well, so the compilers need the
binutils too. Luckily, the binutils have no requirements of their own. So,
our dependency chain is as follows:

  - the final compiler needs the C library, to know how to use it, and the
    binutils
  - building the C library requires a core pass 2 compiler and the binutils
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library, and the binutils
  - building the start files requires a compiler, the kernel headers and the
    binutils
  - the core pass 1 compiler needs the binutils

This translates into the following build order:

  1 binutils
  2 core pass 1 compiler
  3 kernel headers
  4 C library headers and start files
  5 core pass 2 compiler
  6 complete C library
  7 final compiler
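
Deriving such an order from a dependency list is a topological sort. A
minimal sketch using graphlib.TopologicalSorter from the Python standard
library (Python 3.9 or later), with the dependency chain above typed in by
hand; the component labels and the build_order() helper are illustrative
only:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each component maps to what it needs, as in the dependency chain above.
DEPS = {
    "binutils": [],
    "core pass 1 compiler": ["binutils"],
    "kernel headers": [],
    "C library headers and start files": [
        "core pass 1 compiler", "kernel headers", "binutils"],
    "core pass 2 compiler": ["C library headers and start files", "binutils"],
    "complete C library": ["core pass 2 compiler", "binutils"],
    "final compiler": ["complete C library", "binutils"],
}


def build_order(deps):
    """Return the components so that each one comes after everything it needs.

    Raises graphlib.CycleError if a chicken'n'egg cycle is left unresolved.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, component in enumerate(build_order(DEPS), start=1):
        print(step, component)
```

Components that do not depend on each other (here, the kernel headers
versus binutils and the core pass 1 compiler) may come out at a different,
but equally valid, position than in the list above.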

Yes! :-) But are we done yet?

In fact, no, there are still missing dependencies. As far as the tools
themselves are concerned, we do not need anything else.

But gcc has a few pre-requisites. It relies on a few external libraries to
perform some non-trivial tasks (such as handling complex numbers in
constants...). There are a few options for obtaining those libraries.
First, one may think of relying on a Linux distribution to provide them.
Alas, they were not widely available until very, very recently. So, if the
distro is not too recent, chances are that we will have to build those
libraries (which we do below). The affected libraries are:

  - the GNU Multiple Precision Arithmetic Library, GMP
  - the C library for multiple-precision floating-point computations with
    correct rounding, MPFR
  - the C library for the arithmetic of complex numbers, MPC

The dependencies for those libraries are:

  - MPC requires GMP and MPFR
  - MPFR requires GMP
  - GMP has no pre-requisite

So, the build order becomes:

  1 GMP
  2 MPFR
  3 MPC
  4 binutils
  5 core pass 1 compiler
  6 kernel headers
  7 C library headers and start files
  8 core pass 2 compiler
  9 complete C library
 10 final compiler
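
A hand-written order like this one can be checked mechanically against the
dependencies. The sketch below assumes, as the text implies, that every gcc
pass (core pass 1, core pass 2 and final) needs GMP, MPFR and MPC; the
check_order() helper and the component labels are hypothetical names for
illustration:

```python
GCC_PREREQS = ["GMP", "MPFR", "MPC"]

# Each component maps to what it needs. Assumption for this sketch: every
# compiler pass needs GMP, MPFR and MPC, since they are gcc pre-requisites.
DEPS = {
    "GMP": [],
    "MPFR": ["GMP"],
    "MPC": ["GMP", "MPFR"],
    "binutils": [],
    "core pass 1 compiler": ["binutils"] + GCC_PREREQS,
    "kernel headers": [],
    "C library headers and start files": [
        "core pass 1 compiler", "kernel headers", "binutils"],
    "core pass 2 compiler": [
        "C library headers and start files", "binutils"] + GCC_PREREQS,
    "complete C library": ["core pass 2 compiler", "binutils"],
    "final compiler": ["complete C library", "binutils"] + GCC_PREREQS,
}

# The build order listed above.
ORDER = [
    "GMP", "MPFR", "MPC", "binutils", "core pass 1 compiler", "kernel headers",
    "C library headers and start files", "core pass 2 compiler",
    "complete C library", "final compiler",
]


def check_order(order, deps):
    """Return (component, dependency) pairs where the dependency is built too late."""
    position = {name: index for index, name in enumerate(order)}
    return [
        (component, dep)
        for component, needs in deps.items()
        for dep in needs
        if position[dep] > position[component]
    ]


if __name__ == "__main__":
    print(check_order(ORDER, DEPS))                        # -> []
    print(check_order(["MPFR", "GMP"] + ORDER[2:], DEPS))  # -> [('MPFR', 'GMP')]
```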

Yes! Or is there yet more?

This is now sufficient to build a functional toolchain. So if you've had
enough for now, you can stop here. Or if you are curious, you can continue
reading.

gcc can also make use of a few other external libraries. These additional,
optional libraries are used to enable advanced features in gcc, such as
loop optimisation (GRAPHITE) and Link Time Optimisation (LTO). If you want
to use these, you'll need three additional libraries:

To enable GRAPHITE:
  - the Parma Polyhedra Library, PPL
  - the Chunky Loop Generator, using the PPL backend, CLooG/PPL

To enable LTO:
  - the ELF object file access library, libelf

The dependencies for those libraries are:

  - PPL requires GMP
  - CLooG/PPL requires GMP and PPL
  - libelf has no pre-requisites

The list now looks like (optional libs with a *):

  1 GMP
  2 MPFR
  3 MPC
  4 PPL *
  5 CLooG/PPL *
  6 libelf *
  7 binutils
  8 core pass 1 compiler
  9 kernel headers
 10 C library headers and start files
 11 core pass 2 compiler
 12 complete C library
 13 final compiler
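
Since the starred libraries are only needed for their respective gcc
features, the actual list of steps depends on what is enabled. A minimal
sketch, assuming each step is tagged with the optional feature it serves
(the "graphite" and "lto" labels and the select_steps() helper are
hypothetical):

```python
# The full build order above; the second field names the optional gcc
# feature a step is needed for (None means the step is always required).
# The feature labels "graphite" and "lto" are hypothetical names for this sketch.
FULL_ORDER = [
    ("GMP", None),
    ("MPFR", None),
    ("MPC", None),
    ("PPL", "graphite"),
    ("CLooG/PPL", "graphite"),
    ("libelf", "lto"),
    ("binutils", None),
    ("core pass 1 compiler", None),
    ("kernel headers", None),
    ("C library headers and start files", None),
    ("core pass 2 compiler", None),
    ("complete C library", None),
    ("final compiler", None),
]


def select_steps(features):
    """Return the steps to run, in order, for the given set of optional features."""
    return [step for step, feature in FULL_ORDER
            if feature is None or feature in features]


if __name__ == "__main__":
    print(len(select_steps(set())))                # -> 10
    print(len(select_steps({"graphite", "lto"})))  # -> 13
    print(select_steps({"lto"})[:4])               # -> ['GMP', 'MPFR', 'MPC', 'libelf']
```

Filtering keeps the relative order intact, and dropping an optional step is
safe because, when its feature is disabled, nothing else in the list needs
it.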

This list is now complete! Wouhou! :-)


So the list is complete. But why does crosstool-NG have more steps? |
--------------------------------------------------------------------+

The thirteen steps above are the necessary steps, from a theoretical point
of view. In reality, though, there are small differences; there are three
different reasons for the additional steps in crosstool-NG.

First, the GNU binutils do not support some kinds of output. It is not possible
to generate 'flat' binaries with binutils, so we have to use another component
that adds this support: elf2flt. Another binary utility called sstrip has been
added. It allows for super-stripping the target binaries, although it is not
strictly required.

Second, some C libraries require another step after the compiler is built, to
install additional stuff. This is the case for mingw and newlib. Hence the
libc_finish step.

Third, crosstool-NG can also build some additional debug utilities to run on
the target. This is where we build, for example, the cross-gdb, the gdbserver
and the native gdb (the last two run on the target, the first runs on the
same machine as the toolchain). The others (strace, ltrace, DUMA and dmalloc)
are absolutely not related to the toolchain, but are nice-to-have stuff that
can greatly help when developing, so they are included as goodies (and they
are quite easy to build, so it's OK; more complex stuff is not worth the
effort to include in crosstool-NG).
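
As a rough illustration of how these extra steps extend the theoretical
order, here is a hypothetical Python sketch; it is not how crosstool-NG is
actually implemented. The only facts taken from the text are that elf2flt
adds 'flat' output, that mingw and newlib need a libc_finish step after the
compiler, and that debug tools are optional goodies. Building elf2flt right
after binutils, and the debug tools last, are assumptions made for this
sketch; crosstool_steps() is an illustrative name.

```python
# The ten mandatory steps from the build order given earlier.
BASE_ORDER = [
    "GMP", "MPFR", "MPC", "binutils", "core pass 1 compiler", "kernel headers",
    "C library headers and start files", "core pass 2 compiler",
    "complete C library", "final compiler",
]

# C libraries that need a libc_finish step, as stated above.
LIBCS_NEEDING_FINISH = {"mingw", "newlib"}


def crosstool_steps(libc, flat_binaries=False, debug_tools=()):
    """Hypothetical sketch: extend the theoretical order with the extra steps.

    Assumptions: elf2flt is built right after binutils, and the debug tools
    are built last; neither placement is taken from crosstool-NG itself.
    """
    steps = []
    for step in BASE_ORDER:
        steps.append(step)
        if step == "binutils" and flat_binaries:
            steps.append("elf2flt")  # adds support for 'flat' binary output
    if libc in LIBCS_NEEDING_FINISH:
        steps.append("libc_finish")  # runs after the compiler is built
    steps.extend(debug_tools)  # e.g. "gdbserver", "strace"
    return steps


if __name__ == "__main__":
    print(crosstool_steps("newlib", flat_binaries=True, debug_tools=("gdbserver",)))
```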
