OCaml LLVM bindings tutorial, part 2
In the previous tutorial, we saw how to use make to build a simple application. In this part, we’ll start exploring the API and see how to access the values and attributes of LLVM objects.
The base of the code is the same as in part 1: it reads an existing LLVM bitcode file, for example one generated by clang.
As in the previous part, knowing the LLVM C++ API is not required (but it can help).
The top-level container is a module (llmodule). A module contains global variables, types and functions; functions in turn contain basic blocks, and basic blocks contain instructions.
In the OCaml bindings, all objects (variables, functions, instructions) are instances of the opaque type llvalue.
A value has a type, a name, a definition, a list of users, and other properties such as attributes (for example, visibility or linkage options) or aliases.
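As a minimal sketch of reading such attributes, the helpers below print the linkage and visibility of a global value (a function or a global variable). The names string_of_visibility, string_of_linkage and describe_global_value are hypothetical helpers introduced for illustration; only a few linkage kinds are spelled out, and the rest fall into a catch-all case.

```ocaml
(* Build against the llvm OCaml bindings (ocamlfind package "llvm"). *)

(* Hypothetical helper: name of a visibility setting. *)
let string_of_visibility = function
  | Llvm.Visibility.Default -> "default"
  | Llvm.Visibility.Hidden -> "hidden"
  | Llvm.Visibility.Protected -> "protected"

(* Hypothetical helper: only a few linkage kinds are named here. *)
let string_of_linkage = function
  | Llvm.Linkage.External -> "external"
  | Llvm.Linkage.Internal -> "internal"
  | Llvm.Linkage.Private -> "private"
  | Llvm.Linkage.Weak -> "weak"
  | _ -> "other"

(* Only meaningful for global values: functions and global variables. *)
let describe_global_value (lv : Llvm.llvalue) : string =
  Printf.sprintf "%s: linkage=%s visibility=%s"
    (Llvm.value_name lv)
    (string_of_linkage (Llvm.linkage lv))
    (string_of_visibility (Llvm.visibility lv))
```

Calling describe_global_value on an instruction or another non-global value is not expected to work, so it should only be applied to values obtained from the module's function or global lists.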
Each value has a type (lltype), which is a composite object describing the type of the value and, for a function, of its arguments. To match on the actual kind of type, it must first be converted with classify_type:
    let rec print_type llty =
      let ty = Llvm.classify_type llty in
      match ty with
      | Llvm.TypeKind.Function -> Printf.printf " function\n"
      | Llvm.TypeKind.Pointer ->
        Printf.printf " pointer to" ;
        print_type (Llvm.element_type llty)
      | _ -> Printf.printf " other type\n"
We define a simple function to print some information about an input value:
    let print_val lv =
      Printf.printf "Value\n" ;
      Printf.printf " name %s\n" (Llvm.value_name lv) ;
      let llty = Llvm.type_of lv in
      Printf.printf " type %s\n" (Llvm.string_of_lltype llty) ;
      print_type llty ;
      ()
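The same classify_type pattern extends to other kinds of types. The following is a sketch only: describe_type is a hypothetical helper that returns a string instead of printing, and it deliberately does not look inside pointer types, since recent LLVM versions use opaque pointers that have no element type.

```ocaml
(* Build against the llvm OCaml bindings (ocamlfind package "llvm"). *)

(* Hypothetical helper: render a few common type kinds as a string. *)
let rec describe_type (llty : Llvm.lltype) : string =
  match Llvm.classify_type llty with
  | Llvm.TypeKind.Void -> "void"
  | Llvm.TypeKind.Integer ->
    (* Integer types carry their width in bits. *)
    Printf.sprintf "i%d" (Llvm.integer_bitwidth llty)
  | Llvm.TypeKind.Float -> "float"
  | Llvm.TypeKind.Double -> "double"
  | Llvm.TypeKind.Function ->
    (* A function type is composite: a return type and parameter types. *)
    let params =
      Llvm.param_types llty
      |> Array.to_list
      |> List.map describe_type
      |> String.concat ", "
    in
    Printf.sprintf "%s (%s)" (describe_type (Llvm.return_type llty)) params
  | Llvm.TypeKind.Pointer -> "pointer"
  | _ -> "other"
```

Returning a string rather than printing makes the helper easier to reuse, for example inside a Printf.printf call in print_val.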
lookup_function can be used to get the llvalue associated with a function. It returns an llvalue option, so we must use match to check whether the function exists:
    let opt_lv = Llvm.lookup_function "main" llm in
    match opt_lv with
    | Some lv -> print_val lv
    | None -> Printf.printf "'main' function not found\n"
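A found function is not necessarily defined in the module: it may only be declared (for example, an external function such as a libc call). As a sketch, the hypothetical function_status helper below combines lookup_function with is_declaration to distinguish the three cases; the type and its constructors are illustration names, not part of the bindings.

```ocaml
(* Build against the llvm OCaml bindings (ocamlfind package "llvm"). *)

(* Hypothetical result type for the lookup. *)
type function_status = Missing | Declared | Defined

let function_status (name : string) (llm : Llvm.llmodule) : function_status =
  match Llvm.lookup_function name llm with
  | None -> Missing
  | Some lv ->
    (* A declaration has no body (no basic blocks) in this module. *)
    if Llvm.is_declaration lv then Declared else Defined
```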
If you don’t know the names of the functions, or simply want to iterate over all functions, you can use iter_functions, fold_left_functions, and similar functions:
    Llvm.iter_functions print_val llm ;
    let count =
      Llvm.fold_left_functions
        (fun acc lv -> print_val lv ; acc + 1)
        0 llm
    in
    Printf.printf "Functions count: %d\n" count ;
If you run the above code, note that when iterating over functions, you always get a pointer to the function, not the function itself.
As usual in OCaml, it is better to use the tail-recursive functions, especially when running on large LLVM modules (for example, fold_right_functions is not tail-recursive). Fortunately, the documentation clearly indicates whether each iteration function is tail-recursive.
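When the result must be in module order, a common pattern is to accumulate with the tail-recursive fold_left_functions and reverse the list at the end, rather than using fold_right_functions. A minimal sketch, where function_names is a hypothetical helper:

```ocaml
(* Build against the llvm OCaml bindings (ocamlfind package "llvm"). *)

(* Collect the names of all functions, in module order.
   fold_left_functions builds the list in reverse, so we reverse it once. *)
let function_names (llm : Llvm.llmodule) : string list =
  Llvm.fold_left_functions (fun acc lv -> Llvm.value_name lv :: acc) [] llm
  |> List.rev
```

List.rev is itself tail-recursive, so the whole computation uses constant stack space regardless of the number of functions.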
Basic blocks and instructions
In LLVM, a function is made of basic blocks, which are lists of instructions.
A basic block contains zero or more ordinary instructions, and must end with a
terminator instruction, which indicates which block must be executed after
the current one. Basically, a terminator instruction is a change in control flow, such as a branch or a return.
A function definition has at least one basic block, the entry point.
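To look at the terminator of a block, the bindings provide block_terminator, which returns an llvalue option, and instr_opcode, which gives the opcode of an instruction. The sketch below names only a few terminator opcodes; string_of_terminator is a hypothetical helper, and the strings it returns are arbitrary.

```ocaml
(* Build against the llvm OCaml bindings (ocamlfind package "llvm"). *)

(* Hypothetical helper: describe how a basic block ends. *)
let string_of_terminator (llbb : Llvm.llbasicblock) : string =
  match Llvm.block_terminator llbb with
  | None -> "no terminator (malformed block)"
  | Some term ->
    (match Llvm.instr_opcode term with
     | Llvm.Opcode.Ret -> "ret"
     | Llvm.Opcode.Br -> "br"
     | Llvm.Opcode.Switch -> "switch"
     | Llvm.Opcode.Unreachable -> "unreachable"
     | _ -> "other terminator")
```

It can be combined with iter_blocks to print how each block of a function ends.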
The LLVM instructions are in static single assignment (SSA) form: a value is created by an instruction and can be assigned only once, and an instruction must only use values that are previously defined (more precisely, the definition of a value must dominate all of its uses).
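Because each value is defined once, the relation between a definition and its uses can be walked in both directions. As a sketch, count_uses counts the uses of a value with fold_left_uses, and operands lists the values an instruction uses; both names are hypothetical helpers.

```ocaml
(* Build against the llvm OCaml bindings (ocamlfind package "llvm"). *)

(* Number of places where the value is used. *)
let count_uses (lv : Llvm.llvalue) : int =
  Llvm.fold_left_uses (fun n _ -> n + 1) 0 lv

(* The values used by an instruction, in operand order. *)
let operands (instr : Llvm.llvalue) : Llvm.llvalue list =
  List.init (Llvm.num_operands instr) (fun i -> Llvm.operand instr i)
```

Note that an instruction using the same value twice (for example, adding a value to itself) counts as two uses of that value.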
It is very important that the LLVM bitcode is well-formed: all constraints
are checked by the compiler, and the module is rejected if it is not correct.
Or, since the LLVM source code relies heavily on assert, you may get a
segmentation fault instead if the compiler was built in release mode, where assertions are disabled …
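One way to catch malformed modules early is the verifier exposed by the Llvm_analysis module, whose verify_module function returns None for a valid module and Some reason otherwise. The wrapper below is only a sketch; check_module is a hypothetical name, and the program must also be linked with the analysis library of the bindings.

```ocaml
(* Build against the llvm OCaml bindings, including the analysis
   library (ocamlfind packages "llvm" and "llvm.analysis"). *)

(* Hypothetical wrapper turning the verifier output into a result. *)
let check_module (llm : Llvm.llmodule) : (unit, string) result =
  match Llvm_analysis.verify_module llm with
  | None -> Ok ()
  | Some reason -> Error reason
```

Running such a check right after loading a bitcode file gives a readable error message rather than a crash later on.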
For example, to iterate over all instructions of all basic blocks of a function:
    let print_fun lv =
      Llvm.iter_blocks
        (fun llbb ->
          Printf.printf " bb: %s\n" (Llvm.value_name (Llvm.value_of_block llbb)) ;
          Llvm.iter_instrs
            (fun lli -> Printf.printf " instr: %s\n" (Llvm.string_of_llvalue lli))
            llbb)
        lv
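Note that basic blocks are visited in the order in which they are stored in the function, starting with the entry block; this order does not necessarily correspond to a traversal of the directed graph (the control flow graph) of the function.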
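The fold variants work the same way as the iterators and are convenient for computing a result. As a sketch, the hypothetical count_instructions helper below nests fold_left_instrs inside fold_left_blocks to count the instructions of a function:

```ocaml
(* Build against the llvm OCaml bindings (ocamlfind package "llvm"). *)

(* Total number of instructions over all basic blocks of a function.
   The accumulator of the outer fold is threaded through the inner one. *)
let count_instructions (f : Llvm.llvalue) : int =
  Llvm.fold_left_blocks
    (fun acc llbb -> Llvm.fold_left_instrs (fun n _ -> n + 1) acc llbb)
    0 f
```

For a function that is only declared, there are no basic blocks and the result is 0.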
Access to global variables is done using similar functions, such as lookup_global, iter_globals and fold_left_globals.
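As a minimal sketch for global variables, the hypothetical global_names helper below uses fold_left_globals to list each global with a flag telling whether it is marked constant (is_global_constant):

```ocaml
(* Build against the llvm OCaml bindings (ocamlfind package "llvm"). *)

(* List (name, is_constant) for every global variable, in module order. *)
let global_names (llm : Llvm.llmodule) : (string * bool) list =
  Llvm.fold_left_globals
    (fun acc g -> (Llvm.value_name g, Llvm.is_global_constant g) :: acc)
    [] llm
  |> List.rev
```

A single global can be fetched by name with lookup_global, which returns an llvalue option just like lookup_function.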
In this part, we’ve covered how to access the basic elements of LLVM using the OCaml bindings. With this, it is fairly easy to develop applications that analyze LLVM bitcode, check some properties, etc.
Example code is in the part2 directory of the ocaml-llvm-tutorial project. To get it, run:
    $ git clone https://github.com/chifflier/ocaml-llvm-tutorial.git
    $ cd ocaml-llvm-tutorial
    $ cd part2
    $ make
In part 3, we’ll see how to create or modify LLVM bitcode: functions, instructions, values, etc.