A no-frills guide to the C/C++ compiler toolchain.
There are four distinct steps involved in transforming a C or C++ source file into an executable binary: preprocessing, compiling, assembling, and linking.
In theory each step is the responsibility of a dedicated tool: the preprocessor cpp, the compiler cc, the assembler as, and the linker ld. In practice the compiler will happily orchestrate all four steps for us and we can build a simple C or C++ program using a single command:
$ cc source.c
By default the resulting executable will be given the rather unappealing name of a.out (short for assembler output), but we can fix this by specifying a custom output name:
$ cc -o name source.c
We'll look briefly below at each step of the compilation process and summarize some of the most useful options available.
The interface we'll describe was first developed for GCC (once the GNU C Compiler, now the GNU Compiler Collection) and its supporting toolchain. This interface was later mimicked by Clang, which aimed to be a drop-in replacement for GCC, and so now applies to both. It's a little crufty and inconsistent but the desire for backwards compatibility means we're stuck with it for the foreseeable future.
(To avoid repeating the awkward "C or C++" all the time we'll assume below that we're compiling a C program, but the steps and options are essentially the same for both. For C++ source files, substitute a .cpp extension in place of .c and invoke the compiler as c++ rather than cc so that the C++ standard library gets linked in.)
The C preprocessor cpp is responsible for executing # directives and expanding macros. It takes a .c source file as input and outputs an expanded source file, still written in C.
Preprocessed files typically aren't retained, but when they are the convention is to give them a .i extension. (I have no idea why.)
We can use the compiler's -E flag to view the preprocessed source. Output is printed to standard out by default unless we also use the -o flag to specify an output filename.
$ cc -E source.c
The following preprocessor options are available (and can be passed directly to the compiler):

| Option | Description |
| --- | --- |
| -C | Retain source comments in the output. |
| -D <name>=<value> | Define the named symbol before preprocessing. If no value is specified the symbol will have a default value of 1. |
| -I <directory> | Add the specified directory to the search path for #include files. |
| -P | Omit linemarkers (the lines of the form # <number> "<file>") from the output. |
| -U <name> | Undefine the named symbol before preprocessing. |
The compiler cc translates a source file written in C into assembly language.
Assembly language is a human-readable representation of the binary machine code that actually runs on the computer's hardware; as such it's specific to the CPU architecture of the target system.
Assembly language files typically aren't retained, but we can view them using the compiler's -S flag, which halts the compilation process after they've been generated.
$ cc -S source.c
This will generate a .s assembly file for each input file provided.
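The assembler as translates source files written in assembly language into binary machine code. It outputs a single .o object file for each input file provided. An object file isn't runnable on its own: references to code defined elsewhere are left for the linker to resolve.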
The compiler defaults to automatically deleting these object files but we can retain them using the -c flag.
$ cc -c source.c
This instructs the compiler to compile and assemble the object files but stop before linking them into an executable.
Linking is the final stage of the compilation process. The linker ld combines multiple object files into a single executable file. It also links in code from the standard library and any other external libraries referenced by the files.
The C standard library is linked in automatically. To link in a static library libfoo.a located on the default library search path we use the -l flag:
$ cc source.c -lfoo
Note that the standard lib prefix and .a (archive) extension are omitted. To link to a library that isn't on the default search path we have two options:
We can specify the library's full filepath as if it were a source or object file:
$ cc source.c /path/to/lib/libfoo.a
We can add the containing directory to the search path using the -L flag:
$ cc source.c -L/path/to/lib -lfoo
Note that libraries must be specified after the source or object files that reference them.
The compiler will happily accept multiple input files in varying stages of compilation:
$ cc src.c asm.s obj.o
In this case src.c will be compiled and assembled, asm.s will be assembled, and the two resulting object files will be linked with obj.o into an executable.
Turn on compiler warnings with the following flags:
-Wall -Wextra --std=c99 --pedantic
The -Wall and -Wextra flags turn on most of the compiler's available warnings. The --std=c99 flag instructs the compiler to use the C99 standard (available options include c90 and c11). The final --pedantic flag turns on a number of additional warnings specific to the particular standard chosen.
Warnings can be turned off individually, e.g.
-Wno-unused-parameter
will tell the compiler to stop bugging us about unused parameters.
A static library is simply a collection or archive of object files. Static libraries are created using the ar (archiver) tool and by convention are given a lib prefix and .a extension.
$ ar -rv libfoo.a one.o two.o three.o
Static libraries are built into the executable at link time; they do not have to be present on the system at runtime.
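A dynamic or shared object library is a special collection of object files that can be loaded by a program at runtime. Dynamic libraries are created using the compiler's --shared flag and by convention are given a lib prefix and .so extension. On most platforms the object files going into a shared library need to be compiled as position-independent code using the -fPIC flag.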
$ cc --shared -o libfoo.so one.o two.o three.o
Dynamic libraries can be used in two ways:
An executable can be linked against a dynamic library at link time. Multiple executables can then share a single library instance, which must be available on the system at runtime.
An executable can dynamically load and unload library files at runtime using the system's dynamic linking functions. Libraries used in this way can form the basis of a plugin system for an application.