OCaml LLVM bindings tutorial, part 2
In the previous tutorial, we’ve seen how to use ocamlbuild and make to build a simple application. In this part, we’ll start exploring the API, and see how to access values and attributes of LLVM objects.
The base of the code is the same as in part 1: it reads an existing LLVM bitcode file, for example one generated by clang.
As in the previous part of the tutorial, knowing the LLVM C++ API is not required (but it can help).
LLVM objects
The top-level container is a module (llmodule). The module contains global variables, types and functions. Functions in turn contain basic blocks, and basic blocks contain instructions.
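The containment hierarchy described above can be sketched as a nested fold. This is only an illustration: it assumes the `llvm` OCaml package is installed, and it builds a tiny module in memory (the names `demo`, `add` and `count_all` are made up for the example) instead of reading a bitcode file as in part 1.

```ocaml
(* Assumption: the `llvm` OCaml package is available. The module is built in
   memory only so that the sketch runs without a bitcode file. *)
let ctx = Llvm.create_context ()
let llm = Llvm.create_module ctx "demo"

(* Define `i32 add(i32, i32)`: one basic block holding two instructions. *)
let () =
  let i32 = Llvm.i32_type ctx in
  let fty = Llvm.function_type i32 [| i32; i32 |] in
  let f = Llvm.define_function "add" fty llm in
  let b = Llvm.builder_at_end ctx (Llvm.entry_block f) in
  let sum = Llvm.build_add (Llvm.param f 0) (Llvm.param f 1) "sum" b in
  ignore (Llvm.build_ret sum b)

(* Walk module -> functions -> basic blocks -> instructions, counting each. *)
let count_all m =
  Llvm.fold_left_functions
    (fun (nf, nb, ni) f ->
      Llvm.fold_left_blocks
        (fun (nf, nb, ni) bb ->
          let n = Llvm.fold_left_instrs (fun acc _ -> acc + 1) 0 bb in
          (nf, nb + 1, ni + n))
        (nf + 1, nb, ni) f)
    (0, 0, 0) m

let () =
  let nf, nb, ni = count_all llm in
  Printf.printf "%d function(s), %d block(s), %d instruction(s)\n" nf nb ni
```

The same `count_all` function should work unchanged on a module loaded from a bitcode file.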
Values
In the OCaml bindings, all objects (variables, functions, instructions) are instances of the opaque type llvalue.
A value has a type, a name, a definition, a list of users, and other properties such as attributes (for example visibility or linkage options) or aliases.
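As a rough sketch of how some of these properties can be read, the snippet below prints the name, linkage and visibility of two functions. It assumes the `llvm` OCaml package; the helper `describe` and the function names `ext_fn` and `helper` are hypothetical and exist only for this illustration.

```ocaml
(* Assumption: the `llvm` OCaml package is available. *)
let ctx = Llvm.create_context ()
let llm = Llvm.create_module ctx "demo"
let fty = Llvm.function_type (Llvm.i32_type ctx) [||]

(* A declaration (no body) and a definition with internal linkage. *)
let ext_fn = Llvm.declare_function "ext_fn" fty llm
let helper = Llvm.define_function "helper" fty llm
let () = Llvm.set_linkage Llvm.Linkage.Internal helper

(* Summarize a few properties of a global value as a string. *)
let describe lv =
  let linkage =
    match Llvm.linkage lv with
    | Llvm.Linkage.External -> "external"
    | Llvm.Linkage.Internal -> "internal"
    | Llvm.Linkage.Private -> "private"
    | _ -> "other"
  in
  let visibility =
    match Llvm.visibility lv with
    | Llvm.Visibility.Default -> "default"
    | Llvm.Visibility.Hidden -> "hidden"
    | Llvm.Visibility.Protected -> "protected"
  in
  Printf.sprintf "%s: linkage=%s visibility=%s declaration=%b"
    (Llvm.value_name lv) linkage visibility (Llvm.is_declaration lv)

let () = List.iter (fun v -> print_endline (describe v)) [ ext_fn; helper ]
```

Linkage and visibility only make sense for global values (functions and global variables), so `describe` should not be applied to instructions.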
Each value has a type (lltype), which is a composite object describing the type of the value and, for a function, of its arguments. To pattern-match on the actual kind of type, it needs to be converted to a TypeKind.t using classify_type:
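(* Print the kind of a type, following pointers to their element type.
   Note: element_type on a pointer assumes typed pointers (LLVM < 15). *)
let rec print_type llty =
  let ty = Llvm.classify_type llty in
  match ty with
  | Llvm.TypeKind.Function -> Printf.printf " function\n"
  | Llvm.TypeKind.Pointer -> Printf.printf " pointer to" ; print_type (Llvm.element_type llty)
  | _ -> Printf.printf " other type\n"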
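To illustrate how classify_type behaves on a few more kinds of types, here is a small sketch that maps a type to a short description. It assumes the `llvm` OCaml package; `kind_name` is a hypothetical helper, and the types are created directly from a context rather than taken from a bitcode file.

```ocaml
(* Assumption: the `llvm` OCaml package is available. *)
let ctx = Llvm.create_context ()
let i32 = Llvm.i32_type ctx

(* Map a type to a short description of its kind. *)
let kind_name llty =
  match Llvm.classify_type llty with
  | Llvm.TypeKind.Void -> "void"
  | Llvm.TypeKind.Integer -> Printf.sprintf "i%d" (Llvm.integer_bitwidth llty)
  | Llvm.TypeKind.Float -> "float"
  | Llvm.TypeKind.Double -> "double"
  | Llvm.TypeKind.Function -> "function"
  | Llvm.TypeKind.Pointer -> "pointer"
  | Llvm.TypeKind.Array -> "array"
  | Llvm.TypeKind.Struct -> "struct"
  | _ -> "other"

let () =
  List.iter
    (fun ty -> Printf.printf "%s is a %s\n" (Llvm.string_of_lltype ty) (kind_name ty))
    [ i32;
      Llvm.double_type ctx;
      Llvm.array_type i32 4;
      Llvm.struct_type ctx [| i32; i32 |];
      Llvm.function_type i32 [| i32 |] ]
```

The wildcard case keeps the match valid even when a given LLVM version adds or removes type kinds.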
We define a simple function to print some information about the input llvalue argument:
let print_val lv =
  Printf.printf "Value\n" ;
  Printf.printf " name %s\n" (Llvm.value_name lv) ;
  let llty = Llvm.type_of lv in
  Printf.printf " type %s\n" (Llvm.string_of_lltype llty) ;
  print_type llty ;
  ()
Functions
The lookup_function function can be used to get the llvalue associated with a function. It returns an llvalue option, so we must use match to check whether the function exists:
let opt_lv = Llvm.lookup_function "main" llm in
match opt_lv with
| Some lv -> print_val lv
| None -> Printf.printf "'main' function not found\n"
If you don’t know the names of the functions, or simply want to iterate over all functions, you can use iter_functions, fold_left_functions, and similar functions:
Llvm.iter_functions print_val llm ;
let count =
  Llvm.fold_left_functions
    (fun acc lv ->
      print_val lv ;
      acc + 1
    )
    0
    llm
in
Printf.printf "Functions count: %d\n" count ;
If you run the above code, note that the type printed for each function is a pointer to the function type, not the function type itself.
As usual in OCaml, it is better to use the tail-recursive functions (for example, fold_right_functions is not tail-recursive), especially when running on large LLVM modules. Fortunately, the documentation clearly indicates whether each iteration function is tail-recursive or not.
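One common way to stay with the tail-recursive fold while still getting results in module order is to accumulate in reverse and call List.rev at the end. The sketch below assumes the `llvm` OCaml package; `function_names` and the declared functions `a`, `b`, `c` are made up for the example.

```ocaml
(* Assumption: the `llvm` OCaml package is available. *)
let ctx = Llvm.create_context ()
let llm = Llvm.create_module ctx "demo"

(* Declare three functions so that there is something to iterate over. *)
let () =
  let fty = Llvm.function_type (Llvm.void_type ctx) [||] in
  List.iter (fun n -> ignore (Llvm.declare_function n fty llm)) [ "a"; "b"; "c" ]

(* Collect names with the tail-recursive fold; the accumulator is built in
   reverse order, so it is reversed once at the end. *)
let function_names m =
  Llvm.fold_left_functions (fun acc f -> Llvm.value_name f :: acc) [] m
  |> List.rev

let () = List.iter print_endline (function_names llm)
```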
Basic blocks and instructions
In LLVM, a function is made of basic blocks, which are lists of instructions. A basic block contains zero or more ordinary instructions and must end with a terminator instruction, which indicates which block is executed after the current one. A terminator instruction is either a control-flow change (ret, br, switch, indirectbr, invoke, resume) or unreachable.
A function has at least one basic block, the entry point.
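The terminator of a block can be inspected with block_terminator and instr_opcode. The following is a minimal sketch, assuming the `llvm` OCaml package; the helper `terminator_name` and the blocks `exit` and `empty` are hypothetical, and the `empty` block is deliberately left without a terminator to show the None case.

```ocaml
(* Assumption: the `llvm` OCaml package is available. *)
let ctx = Llvm.create_context ()
let llm = Llvm.create_module ctx "demo"
let i32 = Llvm.i32_type ctx
let f = Llvm.define_function "f" (Llvm.function_type i32 [||]) llm

let entry = Llvm.entry_block f
let exit_bb = Llvm.append_block ctx "exit" f
let empty_bb = Llvm.append_block ctx "empty" f

(* entry branches to exit, exit returns 0; empty gets no instruction. *)
let () =
  let b = Llvm.builder_at_end ctx entry in
  ignore (Llvm.build_br exit_bb b);
  Llvm.position_at_end exit_bb b;
  ignore (Llvm.build_ret (Llvm.const_int i32 0) b)

(* Name the terminator of a block, if it has one. *)
let terminator_name bb =
  match Llvm.block_terminator bb with
  | None -> "none"
  | Some t ->
    (match Llvm.instr_opcode t with
     | Llvm.Opcode.Ret -> "ret"
     | Llvm.Opcode.Br -> "br"
     | Llvm.Opcode.Switch -> "switch"
     | Llvm.Opcode.IndirectBr -> "indirectbr"
     | Llvm.Opcode.Invoke -> "invoke"
     | Llvm.Opcode.Resume -> "resume"
     | Llvm.Opcode.Unreachable -> "unreachable"
     | _ -> "other")

let () =
  Llvm.iter_blocks
    (fun bb ->
      Printf.printf "%s: %s\n"
        (Llvm.value_name (Llvm.value_of_block bb))
        (terminator_name bb))
    f
```

A block without a terminator, like `empty` here, is not well-formed; it can exist while a function is being built but would be rejected by the verifier.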
The LLVM instructions are in static single assignment (SSA) form: a value is created by an instruction and can be assigned only once, and an instruction must only use values that are previously defined (more precisely, the definition of a value must dominate all of its uses).
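Because each value is defined exactly once, LLVM can keep track of every place where it is used. As a small sketch (assuming the `llvm` OCaml package; `user_names` and the value names `a`, `b`, `sum`, `prod` are chosen for this example), the users of a value can be listed with fold_left_uses:

```ocaml
(* Assumption: the `llvm` OCaml package is available. *)
let ctx = Llvm.create_context ()
let llm = Llvm.create_module ctx "demo"
let i32 = Llvm.i32_type ctx
let f = Llvm.define_function "f" (Llvm.function_type i32 [| i32; i32 |]) llm
let a = Llvm.param f 0
let b = Llvm.param f 1
let () = Llvm.set_value_name "a" a; Llvm.set_value_name "b" b

(* sum = a + b ; prod = sum * a ; ret prod
   Each name is assigned exactly once, as SSA form requires. *)
let sum, prod =
  let bld = Llvm.builder_at_end ctx (Llvm.entry_block f) in
  let sum = Llvm.build_add a b "sum" bld in
  let prod = Llvm.build_mul sum a "prod" bld in
  ignore (Llvm.build_ret prod bld);
  (sum, prod)

(* Names of the instructions that use a value, sorted for stable output. *)
let user_names v =
  Llvm.fold_left_uses (fun acc u -> Llvm.value_name (Llvm.user u) :: acc) [] v
  |> List.sort compare

let () =
  List.iter
    (fun v ->
      Printf.printf "%s is used by: %s\n" (Llvm.value_name v)
        (String.concat ", " (user_names v)))
    [ a; b; sum ]
```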
It is very important that the LLVM bitcode is well-formed: all constraints will be checked by the compiler, and the module will be rejected if it is not correct. Or, since the LLVM source code relies heavily on assert, you may get a segmentation fault instead if the compiler was built in release mode, where assertions are disabled …
For example, to iterate on all instructions of all basic blocks of a function:
let print_fun lv =
  Llvm.iter_blocks
    (fun llbb ->
      Printf.printf " bb: %s\n" (Llvm.value_name (Llvm.value_of_block (llbb))) ;
      Llvm.iter_instrs
        (fun lli ->
          Printf.printf " instr: %s\n" (Llvm.string_of_llvalue lli)
        )
        llbb
    )
    lv
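Note that iter_blocks visits the basic blocks in the order in which they are stored in the function, starting with the entry block. This is not necessarily a traversal of the control flow graph, whose edges are given by the terminator instructions.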
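The edges of the control flow graph can be read from the successors of each block's terminator. The sketch below assumes the `llvm` OCaml package and a version of the bindings that provides Llvm.successors; `cfg_edges` and the block names `then` and `else` are hypothetical.

```ocaml
(* Assumption: the `llvm` OCaml package is available and provides
   Llvm.successors. *)
let ctx = Llvm.create_context ()
let llm = Llvm.create_module ctx "demo"
let i32 = Llvm.i32_type ctx
let f = Llvm.define_function "f" (Llvm.function_type i32 [| i32 |]) llm

(* entry: if (arg == 0) goto then else goto else; both branches return. *)
let () =
  let then_bb = Llvm.append_block ctx "then" f in
  let else_bb = Llvm.append_block ctx "else" f in
  let b = Llvm.builder_at_end ctx (Llvm.entry_block f) in
  let zero = Llvm.const_int i32 0 in
  let is_zero = Llvm.build_icmp Llvm.Icmp.Eq (Llvm.param f 0) zero "is_zero" b in
  ignore (Llvm.build_cond_br is_zero then_bb else_bb b);
  Llvm.position_at_end then_bb b;
  ignore (Llvm.build_ret (Llvm.const_int i32 1) b);
  Llvm.position_at_end else_bb b;
  ignore (Llvm.build_ret (Llvm.const_int i32 2) b)

(* List the CFG edges of a function as (source, destination) block names. *)
let cfg_edges fn =
  let name bb = Llvm.value_name (Llvm.value_of_block bb) in
  Llvm.fold_left_blocks
    (fun acc bb ->
      match Llvm.block_terminator bb with
      | None -> acc
      | Some t ->
        Array.fold_left
          (fun acc succ -> (name bb, name succ) :: acc)
          acc (Llvm.successors t))
    [] fn
  |> List.rev

let () =
  List.iter (fun (src, dst) -> Printf.printf "%s -> %s\n" src dst) (cfg_edges f)
```

Blocks ending in ret have no successors, so they contribute no edges.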
Global variables
Access to global variables is done using similar functions: iter_globals, fold_left_globals, etc.
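For instance, a fold over the globals can collect their names in the same way as for functions. This is a sketch under the assumption that the `llvm` OCaml package is installed; `global_names` and the globals `counter` and `limit` are invented for the example.

```ocaml
(* Assumption: the `llvm` OCaml package is available. *)
let ctx = Llvm.create_context ()
let llm = Llvm.create_module ctx "demo"
let i32 = Llvm.i32_type ctx

(* One defined global (with an initializer) and one external declaration. *)
let counter = Llvm.define_global "counter" (Llvm.const_int i32 0) llm
let limit = Llvm.declare_global i32 "limit" llm

(* Collect the names of all global variables, in module order. *)
let global_names m =
  Llvm.fold_left_globals (fun acc g -> Llvm.value_name g :: acc) [] m
  |> List.rev

let () =
  Llvm.iter_globals
    (fun g ->
      Printf.printf "global %s (declaration: %b)\n"
        (Llvm.value_name g) (Llvm.is_declaration g))
    llm
```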
Next time
In this part, we’ve covered how to access the basic elements of LLVM using the OCaml bindings. With these, it is rather easy to develop applications that analyze LLVM bitcode, check some properties, etc.
Example code is in the part2 directory of the ocaml-llvm-tutorial project. To get it, run:
$ git clone https://github.com/chifflier/ocaml-llvm-tutorial.git
$ cd ocaml-llvm-tutorial
$ cd part2
$ make
In part 3, we’ll see how to create or modify LLVM bitcode: functions, instructions, values, etc.