OCaml LLVM bindings tutorial, part 4

Target Triple and Data Layout

While LLVM IR is (or should be) target independent, there are a few things that are not. For example, the support for some instructions, the padding and alignment inside structures, the endianness, the size of pointers, etc. All these things are specified in two attributes of modules: the target triple, and the data layout.

In the current (3.5) version of LLVM, these two attributes are optional. However, they could become mandatory in the future, so it is best to specify them.

Note: in my personal opinion, specifying that inside the module is clearly redundant with the -march= option of llc. Most of this could have been handled by compiler flags, instead of creating situations where one can give a target triple in the module, and use a different target on the command-line. Let’s suppose that there are “historic” reasons.

The target triple is a string that describes the target platform. It is usually a simple string (i686), or a hyphen-separated string giving the full architecture, vendor and operating system (x86_64-apple-macosx10.7.0). It is the same kind of string as the argument of the -target option of clang (or the -mtriple option of llc), so this one should be easy to guess.
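To make the anatomy of a triple concrete, here is a small sketch in plain OCaml that labels the components of a triple written in the usual arch-vendor-os(-environment) order. The function label_triple is a hypothetical helper, not part of the LLVM bindings, and it is far more naive than LLVM's own parser, which accepts many aliases and shortened forms. It assumes OCaml 4.04 or later for String.split_on_char.

```ocaml
(* Hypothetical helper, not part of the LLVM bindings: pairs each
   hyphen-separated component of a triple with a label, in the usual
   arch-vendor-os(-environment) order. *)
let label_triple s =
  let labels = ["arch"; "vendor"; "os"; "environment"] in
  let rec zip ls cs =
    match ls, cs with
    | l :: ls', c :: cs' -> (l, c) :: zip ls' cs'
    | _, _ -> []
  in
  zip labels (String.split_on_char '-' s)

let () =
  label_triple "x86_64-apple-macosx10.7.0"
  |> List.iter (fun (label, value) -> Printf.printf "%s = %s\n" label value)
```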

The data layout is a compact string, for example e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128, that describes how data is laid out in memory. All fields are hyphen-separated (-).

In the previous example string, this can be decoded as:

e: little-endian
m:e: ELF-style name mangling
p:32:32: size of a pointer is 32 bits, ABI alignment is 32 bits
f64:32:64: for 64-bit floating point, ABI alignment is 32 bits and preferred alignment is 64 bits
f80:32: for 80-bit floating point, ABI alignment is 32 bits
n8:16:32: set of native integer widths of the target CPU (8, 16 and 32 bits)
S128: natural alignment of the stack is 128 bits

The string format is detailed in the LLVM datalayout section of the LLVM Language Reference.
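The decoding above can be sketched as a few lines of plain OCaml. This is only an illustration: describe_field and decode_layout are hypothetical helpers (not functions of the LLVM bindings), and they only handle the kinds of fields that appear in the example string. The code assumes OCaml 4.04 or later for String.split_on_char.

```ocaml
(* Illustrative sketch only: these helpers are hypothetical and are not
   part of the LLVM bindings. *)

(* Drop the one-letter prefix of a specifier such as "f64" or "S128". *)
let drop_prefix spec = String.sub spec 1 (String.length spec - 1)

(* True when [spec] starts with [c] and has something after it. *)
let starts_with c spec = String.length spec > 1 && spec.[0] = c

(* Give a rough description of one field of a data layout string. *)
let describe_field field =
  match String.split_on_char ':' field with
  | ["e"] -> "little-endian"
  | ["E"] -> "big-endian"
  | ["m"; "e"] -> "ELF name mangling"
  | "p" :: size :: abi :: _ ->
    Printf.sprintf "pointer: %s bits, ABI alignment %s bits" size abi
  | spec :: abi :: rest when starts_with 'f' spec ->
    (* The preferred alignment defaults to the ABI alignment. *)
    let pref = match rest with p :: _ -> p | [] -> abi in
    Printf.sprintf "%s-bit float: ABI alignment %s bits, preferred %s bits"
      (drop_prefix spec) abi pref
  | spec :: widths when starts_with 'n' spec ->
    "native integer widths: " ^ String.concat ", " (drop_prefix spec :: widths)
  | [spec] when starts_with 'S' spec ->
    Printf.sprintf "stack alignment: %s bits" (drop_prefix spec)
  | _ -> "(not handled by this sketch)"

(* Split the layout on '-' and describe every field. *)
let decode_layout layout =
  List.map (fun f -> (f, describe_field f)) (String.split_on_char '-' layout)

let () =
  decode_layout "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
  |> List.iter (fun (f, d) -> Printf.printf "%-10s %s\n" f d)
```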

Specifying the target triple and data layout can be tedious and error-prone. Instead of building the string manually, we’ll use the LLVM functions to find the target, the machine and the data layout from the target triple:

```
let lltarget  = Llvm_target.Target.by_triple triple in
let llmachine = Llvm_target.TargetMachine.create ~triple:triple lltarget in
let lldly     = Llvm_target.TargetMachine.data_layout llmachine in
```

Here, triple is the name of the target architecture, for example x86 or x86_64. Then, we can set this information into the module:

```
set_target_triple (Llvm_target.TargetMachine.triple llmachine) llm ;
set_data_layout (Llvm_target.DataLayout.as_string lldly) llm ;
```

If you want to print the values (for debugging purposes):

```
Printf.printf "lltarget: %s\n" (Llvm_target.Target.name lltarget);
Printf.printf "llmachine: %s\n" (Llvm_target.TargetMachine.triple llmachine);
Printf.printf "lldly: %s\n" (Llvm_target.DataLayout.as_string lldly) ;
```

We create a function to easily add the data layout and target triple, and will use it in every tutorial from now on.
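A minimal sketch of what such a function could look like, assuming the LLVM 3.5 OCaml bindings used in this tutorial. The name set_target_and_layout is hypothetical, and the call to Llvm_all_backends.initialize (from the llvm.all_backends package) is an assumption about your setup: a target can only be looked up by triple once its backend has been registered, so drop that line if your program already initializes the backends elsewhere.

```ocaml
(* Sketch only: assumes the LLVM 3.5 OCaml bindings, linked with the
   llvm, llvm.target and llvm.all_backends packages.
   [set_target_and_layout] is a hypothetical name. *)
let set_target_and_layout triple llm =
  (* Assumption: the backends must be registered before a target can be
     looked up by triple. *)
  Llvm_all_backends.initialize ();
  let lltarget  = Llvm_target.Target.by_triple triple in
  let llmachine = Llvm_target.TargetMachine.create ~triple lltarget in
  let lldly     = Llvm_target.TargetMachine.data_layout llmachine in
  (* Store both attributes in the module. *)
  Llvm.set_target_triple (Llvm_target.TargetMachine.triple llmachine) llm;
  Llvm.set_data_layout (Llvm_target.DataLayout.as_string lldly) llm
```

It would typically be called once, right after the module is created, for example set_target_and_layout "x86_64" llm.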

Module verification

To verify a module, LLVM provides a very helpful function that runs many checks, prints the validation report to stderr, and aborts if the module is invalid. To call it, just add the llvm.analysis package to the Makefile, and call:

```
Llvm_analysis.assert_valid_module llm ;
```

Trust me, you really want to use this: it will save you a lot of trouble. If you produce an invalid LLVM module, most tools will probably just segfault, including the functions of the OCaml bindings. Unless you compiled LLVM in debug mode, the segfaults give no clue about what the problem is. So if you don’t want to go crazy, always verify the generated LLVM modules.
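If aborting is not what you want (for example in a test suite), the same Llvm_analysis module also provides verify_module, which returns the report as a string option instead of printing it and aborting. A small sketch, where check_module is a hypothetical name and the bindings are assumed to be linked with the llvm.analysis package:

```ocaml
(* Sketch only: assumes the LLVM OCaml bindings (llvm, llvm.analysis).
   [check_module] is a hypothetical name. *)
let check_module llm =
  match Llvm_analysis.verify_module llm with
  | None -> true                      (* the module is valid *)
  | Some reason ->
    (* Print the verifier report and let the caller decide what to do. *)
    prerr_endline ("invalid module: " ^ reason);
    false
```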

Calling functions

As usual in C, to call a function you first need to declare its prototype. In the previous tutorial, we’ve seen how to declare the prototype of a simple (fixed number of arguments) function, for example, to declare the equivalent to the C function int32_t test(void):

```
let i32_t = i32_type llctx in
let fty = function_type i32_t [| |] in
let f = define_function "test" fty llm in
```

Our current example is to create the call printf("Hello, world!\n"). However, printf belongs to another kind of function: variadic functions, which accept a variable number of arguments.

The first argument of printf is a (constant) string. There is no string type in LLVM; the equivalent is a pointer to an 8-bit integer (int8_t *).

We define the equivalent prototype of int32_t printf(int8_t*, ...):

```
let i8_t = i8_type llctx in
let i32_t = i32_type llctx in
let printf_ty = var_arg_function_type i32_t [| pointer_type i8_t |] in
let printf = declare_function "printf" printf_ty llm in
```

This gives a perfectly usable declaration of printf. While this works, we should also add some function attributes. These attributes are important, because they help LLVM with optimizations and verifications, and in some cases they are even required to avoid generating wrong code. Attributes are defined in the Attribute module of llvm.mli.

One attribute to add to the printf function is nounwind, meaning that it will not raise any exception:

```
add_function_attr printf Attribute.Nounwind ;
```

The other kind of attribute is set on parameters. Here, the nocapture attribute is added to the first parameter, to declare that printf does not keep any copy of the pointer that outlives the call.

```
add_param_attr (param printf 0) Attribute.Nocapture ;
```

Remember that attributes are declarative: they are not checked. If you declare wrong attributes, the compiler can generate wrong code, which will probably be invalid or segfault at runtime.

Now that the prototype is correct, we only need to call printf. The last thing to do is to create the constant string.

In LLVM, a constant string is a global constant, defined as a NUL-terminated array of characters. It needs to be declared as a global value:

```
let s = build_global_stringptr "Hello, world!\n" "" llbuilder in
```

Remember that this only works for constant strings.

Last thing before using it as argument to printf: the types must match. The global constant itself has type [15 x i8], which means an array of 15 integers of size 8 (the 14 characters of the string plus the terminating NUL), while the expected type is i8*.

It’s not the same (even if some C programmers think so), so it must be converted to get the address of the first element of the array. This is done using the getelementptr instruction (often called GEP):

```
let zero = const_int i32_t 0 in
let s = build_in_bounds_gep s [| zero |] "" llbuilder in
```

Note that this instruction is confusing enough to have its own FAQ in the LLVM documentation!
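As a side note, the 15 in [15 x i8] is simply the length of the OCaml string plus one byte for the terminating NUL. A tiny check in plain OCaml, where c_array_length is a hypothetical helper and not part of the bindings:

```ocaml
(* Hypothetical helper: number of i8 elements needed to store [s] as a
   NUL-terminated C string. *)
let c_array_length s = String.length s + 1

let () =
  (* "Hello, world!\n" has 14 characters, so this prints [15 x i8]. *)
  Printf.printf "[%d x i8]\n" (c_array_length "Hello, world!\n")
```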

Finally, call the printf function, and return:

```
let _ = build_call printf [| s |] "" llbuilder in
let _ = build_ret (const_int i32_t 0) llbuilder in
```

Test

The previous tutorial already covered the compilation of the module, so I’ll just show the commands:

```
$ LD_LIBRARY_PATH=/usr/lib/ocaml/llvm-3.5/ ./build/tutorial04/src/tutorial04.byte 2>hello.bc
$ llc-3.5 hello.bc
$ clang -o hello hello.s
```

and the execution:

```
$ ./hello
Hello, world!
```

Additional notes

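In fact, the explicit conversion is optional (at least in the OCaml bindings for LLVM 3.5), because build_global_stringptr already returns a pointer to the first character of the string, built as a constant GEP on the global array. If you call printf without the extra GEP, dumping the module will show you that inline GEP: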

```
%0 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([15 x i8]* @0, i32 0, i32 0))
```

Next time

This part may seem very long for a rather small result. Yet, we’ve seen some useful functions: how to set the target triple and data layout, and how to verify the module. These functions should always be used, to simplify creating valid code, and not spend hours debugging LLVM for segfaults.

The complete code is in the part4 directory of project ocaml-llvm-tutorial.

In part 5 we’ll see how to use the execution engine to run dynamically generated code, using the JIT (Just-In-Time) engine.

Feedback/comments welcome!

Other articles

OCaml LLVM bindings tutorial, part 3
Date Wed 24 June 2015 Tags Programming OCaml LLVM Compiler

See also:
- OCaml LLVM bindings tutorial, part 1
- OCaml LLVM bindings tutorial, part 2
The previous articles explain how to build applications using the OCaml-LLVM bindings, and how to use the API to manipulate the LLVM objects. This was the “read-only” part of the tutorial, which can be used to analyze LLVM IR.

This part explains how to create LLVM IR: we write a simple application from scratch, and see how to build and run it.

Modules

As in the previous tutorial, we need to create a context and a module:
```
let llctx = global_context () in
let llm = create_module llctx "mymodule" in
```
Functions

There are two actions that can be done on functions:
- declare_function to give only a declaration of the prototype,
- define_function to give both the declaration and the implementation.
In both cases, we need to give the signature (return type, number and type of arguments) of the function.

This is pretty similar to C. We’ll use this to declare the function int main(void).

The int type is a bit problematic in LLVM (and in C, but for other reasons): integer types must have a known size in LLVM. While this does not change the architecture-independent property …
OCaml LLVM bindings tutorial, part 2
Date Thu 11 June 2015 Tags Programming OCaml LLVM Compiler

See also:
- OCaml LLVM bindings tutorial, part 1
In the previous tutorial, we’ve seen how to use ocamlbuild and make to build a simple application. In this part, we’ll start exploring the API, and see how to access values and attributes of LLVM objects.

The base of the code is the same as in part 1: it reads an existing LLVM bitcode file, for example one generated by clang.

As in the previous part, knowing the LLVM C++ API is not required (but can help).

LLVM objects

The top-level container is a module (llmodule). The module contains global variables, types and functions, which in turn contain basic blocks, and basic blocks contain instructions.

Values

In the OCaml bindings, all objects (variables, functions, instructions) are instances of the opaque type llvalue.

A value has a type, a name, a definition, a list of users, and other things like attributes (for ex. visibility or linkage options) or aliases.

Each value has a type (lltype), which is a composite object to define the type of a value and its arguments. To match the real type, it needs to be converted to a TypeKind.t:
```
let rec print_type llty =
  let ty = Llvm …
```
OCaml LLVM bindings tutorial, part 1
Date Tue 09 June 2015 Tags Programming OCaml LLVM Compiler

This is the first part of a tutorial series, on how to use the OCaml bindings for LLVM. Why use OCaml bindings ? Because you can avoid using the C++ API, spending huge amounts of time compiling Clang sources, then your plugin, then debugging the segfaults again and again. The bindings are stable, cover most of the API, and are quite simple to use, thanks to the Debian packages.

This tutorial is written based on a Debian Sid, things may differ but should stay similar on other distributions.

The objectives of this first part are:
- install the required packages
- setup a build environment for ocamlbuild
- build a simple application that reads an LLVM bitcode file and prints it
Installation

The required packages are:
- llvm-3.5-dev
- libllvm-3.5-ocaml-dev
- the LLVM and OCaml compilers (llvm-3.5, ocaml)
- optionally, clang
The current LLVM version is 3.6, however the OCaml bindings are currently disabled (See Debian bug #783919), because of changes in the required dependencies.

Project Layout

The sources are organized as follows:
```
part1/
├── build
├── Makefile
└── src
    └── tutorial01.ml
```
First application

First, create file src/tutorial01.ml:
```
let _ =
  let llctx = Llvm.global_context () in
  let llmem = Llvm.MemoryBuffer.of_file Sys.argv.(1) in
  let …
```
Materials for my talk at SSTIC 2015 - PICON : Control Flow Integrity on LLVM IR
Date Mon 08 June 2015 Tags Programming LLVM Compiler Security

Here are the materials for the talk PICON : Control Flow Integrity on LLVM IR, given during SSTIC 2015. While SSTIC is a French-speaking conference, I publish here in English because my other posts are also in English.

Here is the summary, from the website:

Control flow integrity has been a well explored field of software security for more than a decade.

However, most of the proposed approaches are stalled in a proof of concept state - when the implementation is publicly available - or have been designed with a minimal performance overhead as their primary objective, sacrificing security.

Currently, none of the proposed approaches can be used to fully protect real-world programs compiled with most common compilers (e.g. GCC, Clang/LLVM).

In this paper we describe a control flow integrity enforcement mechanism whose main objective is security. Our approach is based on compile-time code instrumentation, making the program communicate with its external execution monitor. The program is terminated by the monitor as soon as a control flow integrity violation is detected.

Our approach is implemented as an LLVM plugin and is working on LLVM’s Intermediate Representation.
- Article (EN)
- Slides (FR)
- Video (FR)
Code is currently being published (with an opensource …
new project: djedi
Date Mon 09 May 2011 Tags Visualization Python Graphs

I have started a new project (yet another), pretty different from my usual programming languages: a framework for visualizing data in a browser. This framework is an Extract-Transform-Visualize tool, where data come from a database and are rendered by the browser.

Features

While some other projects exist, I wanted to create a project with the following features:
- simplicity: it provides objects (widgets) that you just place in your page as you want. It also provides dashboards to manage widgets, and in its simplest form you just give the name of a div element where a graph will be rendered.
- modularity: every part of the project can be replaced easily by another component, either on the server-side (you only need an ajax server, not especially django) or the client-side (you can use javascript, svg, flash etc.)
- interactive: interactions are important, to make the interface pretty, and also to navigate in data, or to enhance visualization. Most recent web toolkits allow a good number of interactions and animations (and most of them, without using flash)
- working with big data sets: existing toolkits generally fail when dealing with big databases. Here, all requests are asynchronous and are designed to work on big tables …
Python scripts in GDB
Date Mon 20 December 2010 Tags Programming gdb debug Python

Since version 7.0, gdb has gained the ability to execute Python scripts. This makes it possible to write gdb extensions or commands, or to manipulate data in a very easy way. It can also be used to manipulate graphic data (by spawning commands in threads), change the program, or even write a firewall (ahem ..). I’ll assume you’re familiar with both gdb commands and basic Python scripts.

The first and very basic test is to check a simple command
```
(gdb) python print "Hello, world !"
Hello, world !
```
So far so good. Yet, printing hello world won’t help us to debug our programs :)

The reference documentation can be found here, but does not really help for really manipulating data. I’ll try to give a few examples here.

The Python script

The first thing to do is to write a script (we’ll call it gdb-wzdftpd.py) containing the Python commands.

We will define a command to print the Glib’s type GList, with nodes and content (which is stored using a void*).

To define a new command, we have to create a new class inherited from gdb.Command. This class has two mandatory methods, __init__ and invoke.

Gdb redirects stdout and stderr to …
animated charts in python and Qt
Date Sun 04 October 2009 Tags Programming Python Graphs Visualization

I’m currently trying to generate interactive (and animated) charts in Python + Qt. The library I want would be:
- portable: this is one of the reasons of the choice of PyQt
- simple: same reason
- interactive: I want to be able to select, for example, the slices of a pie chart. A signal of events like Qt’s would be perfect
- animated: this is useless, but looking at things like AnyChart or FusionCharts, the result is really nice !
- light on dependencies: relying on tons of libs makes the project hard to maintain and not portable, especially for Windows, where there is no packaging and dependency system.
- free software
A quick search gave me the following products:
- matplotlib: mostly for scientific plots, but there is a nice number of options, a well-documented API.
- pyQwt: Python bindings for Qwt. Again, it’s more scientific plot than charts
- cairoplot: the project looks dead (or in the "yeah, the project’s not finished, but we’re recoding it in $LANG to be faster" syndrome, which is more or less the same). It generates images, though item maps can be extracted. As the name says, it uses Cairo.
- pyCha: some nice charts, uses Cairo. Very simple API (not …
libnetfilter-{queue,log} bindings release
Date Sun 03 May 2009 Tags Netfilter Python Perl Programming

I just released nfqueue-bindings 0.2 and nflog-bindings 0.1. Despite the difference of versions, functions are almost the same :)

Here is a short diff since previous version:
```
Add af_family argument to bind operations (allow IPv6 binds)
Add notes on set_queue_maxlen requiring a kernel >= 2.6.20
bugfix: use queue number when creating queue
bugfix: really link Perl binding to Perl library 
Fix cmake warning
```
Get them on nfqueue-bindings and nflog-bindings.

NFQueue bindings (2)
Date Sun 08 June 2008 Tags Netfilter Python Perl Firewall

The code for nfqueue-bindings is now almost ready, I have made some progress since last week:
- you can now modify packets live, and send the new packet with the verdict
- new functions are wrapped, and the creation of the queue can be done in one function
- more examples
I have presented a special script for SSTIC, using the weather to decide if a packet should be accepted or dropped :) While the usefulness of the module still has to be proven, it is a good example of how easy it is to use the new bindings.

The slides can be found online here, and contain some code examples (with some funny things ;). They are in French, but they should be quite easy to understand.

Random ideas:
- The Netfilter workshop will be held in Paris from 30 September to 3 October 2008.
- Eric has presented nf3d, a nice tool to view netfilter logs (from ulogd2) in 3D.
Gamers will recognize a nice try to convert network logs into Guitar Hero tracks ;)
- Some people have weird habits at SSTIC !
