OCaml LLVM bindings tutorial, part 4

Target Triple and Data Layout

While LLVM IR is (or should be) target independent, there are a few things that are not. For example, the support for some instructions, the padding and alignment inside structures, the endianness, the size of pointers, etc. All these things are specified in two attributes of modules: the target triple, and the data layout.

In the current (3.5) version of LLVM, these two attributes are optional. However, they could become mandatory in the future, so it is best to specify them.

Note: in my personal opinion, specifying that inside the module is clearly redundant with the -march= option of llc. Most of this could have been handled by compiler flags, instead of creating situations where one can give a target triple in the module, and use a different target on the command-line. Let’s suppose that there are “historic” reasons.

The target triple is a string that describes the target machine. It is either a simple architecture name (i686), or a hyphen-separated (-) string giving the full description (x86_64-apple-macosx10.7.0). It has the same form as the argument of the -target option of clang, or the output of gcc -dumpmachine, so this one should be easy to guess.
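To make the shape of a triple concrete, here is a minimal sketch that splits one into its components using only the standard library. The helper name split_triple is hypothetical and is not part of the LLVM bindings; LLVM does its own, more careful parsing.

```ocaml
(* Split a target triple into its hyphen-separated components.
   split_triple is an illustrative helper, not an LLVM binding. *)
let split_triple (triple : string) : string list =
  String.split_on_char '-' triple

let () =
  (* Print each example triple next to its components. *)
  List.iter
    (fun t ->
       Printf.printf "%s -> [%s]\n" t
         (String.concat "; " (split_triple t)))
    [ "i686"; "x86_64-apple-macosx10.7.0" ]
```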

The data layout is a compact string, for example e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128, that describes the specifications of the data layout in memory. All fields are minus-separated (-).

In the previous example string, this can be decoded as:

e: little-endian
m:e: ELF mangling of names is enabled
p:32:32: size of a pointer is 32 bits, ABI alignment is 32 bits
f64:32:64: for 64-bit floating point, ABI alignment is 32 bits and preferred alignment is 64 bits
f80:32: for 80-bit floating point, ABI alignment is 32 bits
n8:16:32: set of native integer widths of target CPU
S128: natural alignment of stack is 128 bits

The string format is detailed in the LLVM datalayout section of the LLVM Language Reference.
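The decoding above can be sketched as a small standard-library program. This is only an illustration: describe_field and decode are hypothetical helpers, not LLVM functions, and they handle only the field kinds that appear in the example string. Anything else is reported as not decoded.

```ocaml
(* Illustrative decoder for LLVM data layout strings.
   Assumption: only the field kinds of the example string are handled;
   describe_field and decode are hypothetical helpers, not LLVM bindings. *)

(* Drop the first character of a non-empty string ("f64" -> "64"). *)
let tail (s : string) : string = String.sub s 1 (String.length s - 1)

let describe_field (field : string) : string =
  match String.split_on_char ':' field with
  | [ "e" ] -> "little-endian"
  | [ "E" ] -> "big-endian"
  | [ "m"; "e" ] -> "ELF name mangling"
  | "p" :: size :: abi :: _ ->
    Printf.sprintf "pointer: size %s bits, ABI alignment %s bits" size abi
  | tag :: abi :: rest when tag <> "" && tag.[0] = 'f' ->
    (* The preferred alignment is optional and defaults to the ABI one. *)
    let pref = match rest with p :: _ -> p | [] -> abi in
    Printf.sprintf "%s-bit float: ABI alignment %s bits, preferred %s bits"
      (tail tag) abi pref
  | tag :: rest when tag <> "" && tag.[0] = 'n' ->
    "native integer widths: " ^ String.concat ", " (tail tag :: rest)
  | [ tag ] when tag <> "" && tag.[0] = 'S' ->
    Printf.sprintf "stack alignment: %s bits" (tail tag)
  | _ -> "not decoded here"

(* Pair every hyphen-separated field with its description. *)
let decode (layout : string) : (string * string) list =
  String.split_on_char '-' layout
  |> List.map (fun field -> (field, describe_field field))

let () =
  decode "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
  |> List.iter (fun (field, meaning) ->
      Printf.printf "%-10s %s\n" field meaning)
```

In practice the string should still come from the target machine, as described next; the sketch is only meant to help read a layout string when debugging.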

Specifying the target triple and data layout can be tedious and error-prone. Instead of building the string manually, we’ll use the LLVM functions to find the target, the machine and the data layout from the target triple:

let lltarget  = Llvm_target.Target.by_triple triple in
let llmachine = Llvm_target.TargetMachine.create ~triple:triple lltarget in
let lldly     = Llvm_target.TargetMachine.data_layout llmachine in

Here, triple is the name of the target architecture, for example x86 or x86_64. Then, we can set this information in the module:

set_target_triple (Llvm_target.TargetMachine.triple llmachine) llm ;
set_data_layout (Llvm_target.DataLayout.as_string lldly) llm ;

If you want to print the values (for debugging purposes):

Printf.printf "lltarget: %s\n" (Llvm_target.Target.name lltarget);
Printf.printf "llmachine: %s\n" (Llvm_target.TargetMachine.triple llmachine);
Printf.printf "lldly: %s\n" (Llvm_target.DataLayout.as_string lldly) ;

We create a function to easily add the data layout and target triple, and will use it in every tutorial from now on.

Module verification

To verify a module, LLVM provides a very helpful function that runs many checks, prints the validation report to stderr, and aborts if the module is invalid. To call it, just add the llvm.analysis package to the Makefile, and call:

Llvm_analysis.assert_valid_module llm ;

Trust me, you really want to use this. It will save you a lot of trouble. In fact, if you produce an invalid LLVM module, all tools will probably just segfault, including the functions of the OCaml bindings. Unless you compiled LLVM in debug mode, the segfaults give no clue about what the problem is. So if you don’t want to go crazy, always verify the generated LLVM modules.

Calling functions

As usual in C, to call a function you first need to declare its prototype. In the previous tutorial, we’ve seen how to declare the prototype of a simple (fixed number of arguments) function, for example, to declare the equivalent to the C function int32_t test(void):

let i32_t = i32_type llctx in
let fty = function_type i32_t [| |] in
let f = define_function "test" fty llm in

Our current example is to create the call printf("Hello, world!\n"). However, printf belongs to another kind of function: variadic functions, which accept a variable number of arguments.

The first argument of printf is a (constant) string. There is no such type in LLVM, the equivalent being a pointer to an integer of size 8 (int8_t *).

We define the equivalent prototype of int32_t printf(int8_t*, ...):

let i8_t = i8_type llctx in
let i32_t = i32_type llctx in
let printf_ty = var_arg_function_type i32_t [| pointer_type i8_t |] in
let printf = declare_function "printf" printf_ty llm in

This gives a perfectly usable declaration of printf. While this works, we should also add some function attributes. These attributes are important, because they help LLVM with optimizations and verifications, and in some cases they are even required to avoid generating wrong code. Attributes are defined in the Attribute module of llvm.mli.

One attribute to add to the printf function is nounwind, meaning that it will not raise any exception:

add_function_attr printf Attribute.Nounwind ;

The other kind of attribute is set on parameters. Here, the nocapture attribute is added on the first parameter, to declare that printf does not keep any copy of the pointer that outlives the call.

add_param_attr (param printf 0) Attribute.Nocapture ;

Remember that attributes are declarative: they are not checked. If you declare wrong attributes, the compiler can generate wrong code, which will probably misbehave or segfault at runtime.

Now that the prototype is correct, we only need to call printf. The last thing to do is to create the constant string.

In LLVM, a constant string is a global constant, defined as a NUL-terminated array of characters. It needs to be declared as a global value:

let s = build_global_stringptr "Hello, world!\n" "" llbuilder in

Remember that this only works for constant strings.

One last thing before using it as an argument to printf: the type of the constant is not the same. The constant has type [15 x i8] (the 14 characters of the string plus the terminating NUL), which means an array of 15 integers of size 8, while the expected type is i8*.

An array is not a pointer (even if some C programmers think so), so it must be converted to get the address of the first element of the array. This is done using the getelementptr instruction (often called GEP):

let zero = const_int i32_t 0 in
let s = build_in_bounds_gep s [| zero |] "" llbuilder in

Note that this instruction is so confusing that it has its own FAQ in the documentation!

Finally, call the printf function, and return:

let _ = build_call printf [| s |] "" llbuilder in
let _ = build_ret (const_int i32_t 0) llbuilder in

Test

The previous tutorial already covered the compilation of the module, so I’ll just show the commands:

$ LD_LIBRARY_PATH=/usr/lib/ocaml/llvm-3.5/ ./build/tutorial04/src/tutorial04.byte 2>hello.bc
$ llc-3.5 hello.bc
$ clang -o hello hello.s

and the execution:

$ ./hello 
Hello, world!

Additional notes

In fact, the conversion is optional (at least in OCaml bindings for LLVM 3.5). If you call printf without the GEP, dumping the module will show you that LLVM has inserted an inline GEP:

%0 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([15 x i8]* @0, i32 0, i32 0))

Next time

This part may seem very long for a rather small result. Yet, we’ve seen some useful functions: how to set the target triple and data layout, and how to verify the module. These functions should always be used, to simplify creating valid code, and not spend hours debugging LLVM for segfaults.

The complete code is in the part4 directory of project ocaml-llvm-tutorial.

In part 5 we’ll see how to use the execution engine to create dynamically generated code, using the JIT (Just-In-Time) engine.

Feedback/comments welcome!

gcc security features (part 2)

(See part 1)

Remember: you must compile with -O2 if you want the checks to be effective.

DEB_BUILD_HARDENING_FORTIFY (gcc/g++ -D_FORTIFY_SOURCE=2)

The idea behind FORTIFY_SOURCE is relatively simple: there are cases where the compiler can know the size of a buffer (if it’s a fixed sized buffer on the stack, as in the example, or if the buffer just came from a malloc() function call). With a known buffer size, functions that operate on the buffer can make sure the buffer will not overflow.

Example:

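#include <string.h>

/* buf has a fixed size known to the compiler, so the copy can be checked */
void foo(char *string)
{
    char buf[20];
    strcpy(buf, string);  /* overflows if string is longer than 19 characters */
}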

Execution will fail:

[home ~/harden] ./bad $(perl -e 'print "A"x100')
zsh: segmentation fault  ./bad $(perl -e 'print "A"x100')

When compiling with -D_FORTIFY_SOURCE=2, gcc will add some checks to detect the overflow and terminate the program:

[home ~/harden] DEB_BUILD_HARDENING=1 make
[home ~/harden] ./bad $(perl -e 'print "A"x100')

*** buffer overflow detected ***: ./bad terminated
======= Backtrace: =========
/lib/libc.so.6(__fortify_fail+0x37)[0x2ba8d18fb787]
/lib/libc.so.6[0x2ba8d18f9e70]
./bad(main+0x26)[0x555555554856]
/lib/libc.so.6(__libc_start_main+0xf4)[0x2ba8d18411c4]
./bad[0x555555554789]
======= Memory map:  ========
2ba8d1607000-2ba8d1622000 r-xp 00000000 03:01 468316                     /lib/ld-2.7.so
2ba8d1622000-2ba8d1625000 rw-p 2ba8d1622000 00:00 0 
2ba8d1821000-2ba8d1823000 rw-p 0001a000 …

gcc security features (part 1)
Date Wed 02 April 2008 Tags Security Compiler Debian

Since recent versions (>= 4.0, maybe before), gcc (and ld) have some nice security features. Debian has created a wrapper for the toolchain, to make the use of these features easy.

To install the wrapper, run:
```
apt-get install hardening-wrapper
```
To enable the hardening features, you have to export the environment variable:
```
export DEB_BUILD_HARDENING=1
```
The features include additional checks for printf-like functions, stack protector, using address-space layout randomization (ASLR), marking ELF-sections as read-only after loading when possible, etc.

Please note that you must compile with *-O2* if you want the checks to be effective.

DEB_BUILD_HARDENING_FORMAT (gcc/g++ -Wformat -Wformat-security)

Ask gcc to make additional checks on format strings, to prevent attacks.

The following code, for example:
```
printf(buf);
```
will result in a warning:
```
[home ~/harden] DEB_BUILD_HARDENING=1 make
gcc     bad.c   -o bad
bad.c: In function ‘main’:
bad.c:10: warning: format not a string literal and no format arguments
```
Why is this code vulnerable? Because the buffer (buf) could contain format specifiers like %s, which printf will interpret, reading arguments from the stack that were never passed. This can result in the execution of arbitrary code.

Solution:
- Replace previous code by
```
printf("%s",buf);
```
- Remember this …
Sections and variables initialization
Date Tue 19 February 2008 Tags Programming Compiler

Default init

ANSI C requires all uninitialized static and global variables to be initialized with 0 (§6.7.8 of the C99 standard). This means you can rely on the following behavior:
```
#include <stdio.h>

int global;  /* no initializer: guaranteed to be 0 */
void function() {
  printf("%d\n",global);
}
```
This will print 0, and it is guaranteed by the standard.

However, this is not handled by the compiler. All you will be able to see is that the variable is put in the bss section:
```
08049560 l     O .bss   00000004              static_var.1279
08049564 g     O .bss   00000004              global_var
```
The variables are actually zeroed at program startup, by the loader or the startup code.

The C compiler usually puts variables that are supposed to be initialized with 0 in the .bss section instead of the .data section. Unlike the .data section, the .bss section does not contain actual data; it just specifies the size of all elements it contains. The C compiler just *assumes* that the linker, loader, or the startup code of the C library initializes this block of memory with 0. This is an optimization: .data elements occupy space in the image (or ROM or flash memory) and in RAM, whereas .bss elements need to occupy RAM space only if …