Other articles

  1. OCaml LLVM bindings tutorial, part 3

    See also:

    The previous articles explain how to build applications using the OCaml-LLVM bindings, and how to use the API to manipulate the LLVM objects. This was the “read-only” part of the tutorial, which can be used to analyze LLVM IR.

    This part explains how to create LLVM IR, and write a simple application from scratch, and see how to build and run it.

    Modules

    As in the previous tutorial, we need to create a context and a module:

    let llctx = global_context () in
    let llm = create_module llctx "mymodule" in
    

    Functions

    There are two actions that can be done on functions:

    • declare_function to give only a declaration of the prototype,
    • define_function to give both the declaration and the implementation.

    In both cases, we need to give the signature (return type, number and type of arguments) of the function.

    This is pretty similar to C. We’ll use this to declare the function int main(void).

    The int type is a bit problematic in LLVM (and in C, but for other reasons): integer types must have a known size in LLVM. While this does not change the architecture-independent property …

    read more
  2. OCaml LLVM bindings tutorial, part 2

    See also:

    In the previous tutorial, we’ve seen how to use ocamlbuild and make to build a simple application. In this part, we’ll start exploring the API, and see how to access values and attributes of LLVM objects.

    The base of the code is the same as in part 1: it reads an existing LLVM bitcode file, for example one generated by clang.

    As in previous tutorial part, knowing the LLVM C++ API is not required (but can help).

    LLVM objects

    The top-level container is a module (llmodule). The module contains global variables, types and functions, which in turn contains basic blocks, and basic blocks contain instructions.

    Values

    In the OCaml bindings, all objects (variables, functions, instructions) are instances of the opaque type llvalue.

    A value has a type, a name, a definition, a list of users, and other things like attributes (for ex. visibility or linkage options) or aliases.

    Each value has a type (lltype), which is a composite object to define the type of a value and its arguments. To match the real type, it needs to be converted to a TypeKind.t:

    let rec print_type llty =
      let ty = Llvm …
    read more
  3. OCaml LLVM bindings tutorial, part 1

    LLVM

    OCaml

    This is the first part of a tutorial series, on how to use the OCaml bindings for LLVM. Why use OCaml bindings ? Because you can avoid using the C++ API, spending huge amounts of time compiling Clang sources, then your plugin, then debugging the segfaults again and again. The bindings are stable, cover most of the API, and are quite simple to use, thanks to the Debian packages.

    This tutorial is written based on a Debian Sid, things may differ but should stay similar on other distributions.

    The objectives of this first part are:

    • install the required packages
    • setup a build environment for ocamlbuild
    • build a simple application that reads an LLVM bitcode file and prints it

    Installation

    The required packages are:

    • llvm-3.5-dev
    • libllvm-3.5-ocaml-dev
    • the LLVM and OCaml compilers (llvm-3.5, ocaml)
    • optionally, clang

    The current LLVM version is 3.6, however the OCaml bindings are currently disabled (See Debian bug #783919), because of changes in the required dependencies.

    Project Layout

    The sources are organized as follows:

    part1/
    ├── build
    ├── Makefile
    └── src
        └── tutorial01.ml
    

    First application

    First, create file src/tutorial01.ml:

    let _ =
      let llctx = Llvm.global_context () in
      let llmem = Llvm.MemoryBuffer.of_file Sys.argv.(1) in
      let …
    read more
  4. Materials for my talk at SSTIC 2015 - PICON : Control Flow Integrity on LLVM IR

    Here are the materials for the talk PICON : Control Flow Integrity on LLVM IR, given during SSTIC 2015. While SSTIC is a french-speaking conference, I publish here in English because my other posts also are in English.

    Here is the summary, from the website:

    Control flow integrity has been a well explored field of software security for more than a decade.

    However, most of the proposed approaches are stalled in a proof of concept state - when the implementation is publicly available - or have been designed with a minimal performance overhead as their primary objective, sacrificing security.

    Currently, none of the proposed approaches can be used to fully protect real-world programs compiled with most common compilers (e.g. GCC, Clang/LLVM).

    In this paper we describe a control flow integrity enforcement mechanism whose main objective is security. Our approach is based on compile-time code instrumentation, making the program communicate with its external execution monitor. The program is terminated by the monitor as soon as a control flow integrity violation is detected.

    Our approach is implemented as an LLVM plugin and is working on LLVM’s Intermediate Representation.

    Code is currently being published (with an opensource …

    read more
  5. gcc security features (part 2)

    (See part 1)

    Remember: you must compile with -02 if you want the checks to be effective

    DEB_BUILD_HARDENING_FORTIFY (gcc/g++ -D_FORTIFY_SOURCE=2)

    The idea behind FORTIFY_SOURCE is relatively simple: there are cases where the compiler can know the size of a buffer (if it’s a fixed sized buffer on the stack, as in the example, or if the buffer just came from a malloc() function call). With a known buffer size, functions that operate on the buffer can make sure the buffer will not overflow.

    Example:

    void foo(char *string)
    {
        char buf[20];
        strcpy(buf, string);
    }
    

    Execution will fail:

    [home ~/harden] ./bad $(perl -e 'print "A"x100')
    zsh: segmentation fault  ./bad $(perl -e 'print "A"x100')
    

    When compiling with -D_FORTIFY_SOURCE=2, gcc will add some checks to detect the overflow and terminate the program:

    [home ~/harden] DEB_BUILD_HARDENING=1 make
    [home ~/harden] ./bad $(perl -e 'print "A"x100')
    
    *** buffer overflow detected ***: ./bad terminated
    ======= Backtrace: =========
    /lib/libc.so.6(__fortify_fail+0x37)[0x2ba8d18fb787]
    /lib/libc.so.6[0x2ba8d18f9e70]
    ./bad(main+0x26)[0x555555554856]
    /lib/libc.so.6(__libc_start_main+0xf4)[0x2ba8d18411c4]
    ./bad[0x555555554789]
    ======= Memory map:  ========
    2ba8d1607000-2ba8d1622000 r-xp 00000000 03:01 468316                     /lib/ld-2.7.so
    2ba8d1622000-2ba8d1625000 rw-p 2ba8d1622000 00:00 0 
    2ba8d1821000-2ba8d1823000 rw-p 0001a000 …
    read more
  6. gcc security features (part 1)

    Since recent versions (>= 4.0, maybe before), gcc (and ld) has some nice security features. Debian has created a wrapper for the toolchain, to make the use of these features easy.

    To install the wrapper, run:

    apt-get install hardening-wrapper
    

    To enable the hardening features, you have to export the environment variable:

    export DEB_BUILD_HARDENING=1
    

    The features include additional checks for printf-like functions, stack protector, using address-space layout randomization (ASLR), marking ELF-sections as read-only after loading when possible, etc.

    Please note that you must compile with *-02* if you want the checks to be effective

    DEB_BUILD_HARDENING_FORMAT (gcc/g++ -Wformat -Wformat-security)

    Ask gcc to make additional checks on format strings, to prevent attacks.

    The following code, for ex:

    printf(buf);
    

    will result in a warning:

    [home ~/harden] DEB_BUILD_HARDENING=1 make
    gcc     bad.c   -o bad
    bad.c: In function ‘main’:
    bad.c:10: warning: format not a string literal and no format arguments
    

    Why is this code vulnerable ? Because the buffer (buf) could contain format characters like %s, and the printf function will interpret these characters to pop arguments from the stack, and can result in the execution of arbitrary code.

    Solution:

    • Replace previous code by
    printf("%s",buf);
    
    • Remember this …
    read more
  7. Sections and variables initialization

    Default init

    ANSI C requires all uninitialized static and global variables to be initialized with 0 (§6.7.8 of the C99 definition). This means you can rely on the following behavior:

    int global;
    void function() {
      printf("%d\n",global);
    }
    

    This will print 0, and it is guaranteed by the standard.

    However, this is not handled by the compiler. All you will be able to see is that the variable is put in the bss section:

    08049560 l     O .bss   00000004              static_var.1279
    08049564 g     O .bss   00000004              global_var
    

    It is the startup code of the linker which initializes the variables.

    The C compiler usually puts variables that are supposed to be initialized with 0 in the .bss section instead of the .data section. Opposed to the .data section, the .bss section does not contain actual data, it just specifies the size of all elements it contains. The C compiler just *assumes* that the linker, loader, or the startup code of the C library initializes this block of memory with 0. This is an optimization; .data elements occupy space in the image (or ROM or flash memory) and in RAM whereas .bss elements need to occupy RAM space only if …

    read more

Page 1 / 1