cle — Binary Loader

CLE is an extensible binary loader. Its main goal is to take an executable program and any libraries it depends on and produce an address space where that program is loaded and ready to run.

The primary interface to CLE is the Loader class.

Loading Interface

class cle.loader.Loader(main_binary, auto_load_libs=True, force_load_libs=None, skip_libs=None, main_opts=None, lib_opts=None, custom_ld_path=None, ignore_import_version_numbers=True, rebase_granularity=16777216, except_missing_libs=False, gdb_map=None, gdb_fix=False, aslr=False)

The loader loads all the objects and exports an abstraction of the memory of the process. What you see here is an address space with loaded and rebased binaries.

Variables:
  • memory (cle.memory.Clemory) – The loaded, rebased, and relocated memory of the program.
  • main_bin – The object representing the main binary (i.e., the executable).
  • shared_objects – A dictionary mapping loaded library names to the objects representing them.
  • all_objects – A list containing representations of all the different objects loaded.
  • requested_objects – A set containing the names of all the different shared libraries that were marked as a dependency by somebody.
  • tls_object – An object dealing with the region of memory allocated for thread-local storage.

When reference is made to a dictionary of options, it requires a dictionary with zero or more of the following keys:

  • backend : “elf”, “pe”, “ida”, “blob” : which loader backend to use
  • custom_arch : The archinfo.Arch object to use for the binary
  • custom_base_addr : The address to rebase the object at
  • custom_entry_point : The entry point to use for the object

More keys are defined on a per-backend basis.
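
As a minimal sketch of what such option dictionaries can look like, the snippet below builds one for a main binary and one in the per-library shape that lib_opts expects. The helper unknown_keys and the constant COMMON_OPTION_KEYS are hypothetical names used only for illustration; they are not part of CLE, and a key they report may still be valid for a particular backend.

```python
# Keys documented above as common to every backend.
COMMON_OPTION_KEYS = {"backend", "custom_arch", "custom_base_addr", "custom_entry_point"}

def unknown_keys(options):
    """Hypothetical helper: list keys that are not in the documented common set."""
    return sorted(set(options) - COMMON_OPTION_KEYS)

# Options for the main binary: force a backend and choose addresses by hand.
# (A real blob load would also need custom_arch, omitted here to stay import-free.)
main_opts = {
    "backend": "blob",
    "custom_base_addr": 0x400000,
    "custom_entry_point": 0x400100,
}

# Per-library options are keyed by library name.
lib_opts = {
    "libc.so.6": {"custom_base_addr": 0x7000000},
}

print(unknown_keys(main_opts))  # → []
```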

Parameters:main_binary – The path to the main binary you’re loading, or a file-like object with the binary in it.

The following parameters are optional.

Parameters:
  • auto_load_libs – Whether to automatically load shared libraries that loaded objects depend on.
  • force_load_libs – A list of libraries to load regardless of whether they’re required by a loaded object.
  • skip_libs – A list of libraries to never load, even if they’re required by a loaded object.
  • main_opts – A dictionary of options to be used loading the main binary.
  • lib_opts – A dictionary mapping library names to the dictionaries of options to be used when loading them.
  • custom_ld_path – A list of paths in which we can search for shared libraries.
  • ignore_import_version_numbers – Whether libraries with different version numbers in the filename will be considered equivalent, for example libc.so.6 and libc.so.0
  • rebase_granularity – The alignment to use for rebasing shared objects
  • except_missing_libs – Throw an exception when a shared library can’t be found.
  • gdb_map – The output of info proc mappings or info sharedlibrary in gdb. This will be used to determine the base address of libraries.
  • gdb_fix – If info sharedlibrary was used, the addresses gdb gives us are in fact the addresses of the .text sections. We need to fix them to get the real load addresses.
  • aslr – Load libraries in symbolic address space.
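
Two of the parameters above lend themselves to a small illustration. The sketch below shows one plausible reading of ignore_import_version_numbers (compare library names after dropping trailing version digits) and of rebase_granularity (place the next object at the next multiple of the granularity). Both function names are hypothetical, and CLE's internal logic may differ.

```python
def strip_version(soname):
    """Drop trailing digits and dots, so 'libc.so.6' becomes 'libc.so'."""
    return soname.rstrip(".0123456789")

def next_base(max_addr, granularity=16777216):
    """Smallest multiple of `granularity` strictly above `max_addr`."""
    return (max_addr // granularity + 1) * granularity

# Both names reduce to the same identifier once the version is ignored.
print(strip_version("libc.so.6") == strip_version("libc.so.0"))  # → True

# With the default granularity (0x1000000), an address space that currently
# ends at 0x601fff would place the next library at 0x1000000.
print(hex(next_base(0x601fff)))  # → 0x1000000
```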
get_initializers()

Return a list of all the initializers that should be run before execution reaches the entry point, in the order they should be run.

get_finalizers()

Return a list of all the finalizers that should be run before the program exits. The order in which they should be run is not specified.

static load_object(path, options=None, compatible_with=None, is_main_bin=False)

Load a file with some backend. Try to identify the type of the file to autodetect which backend to use.

Parameters:path (str) – The path to the file to load

The following parameters are optional.

Parameters:
  • options (dict) – A dictionary of keyword arguments to the backend. Can contain a backend key to force the use of a specific backend
  • compatible_with – Another backend object that this file must be compatible with. This method will throw a CLECompatibilityError if the file at the given path is not compatible with this parameter.
  • is_main_bin (bool) – Whether this file is the main executable of whatever process we are loading
static identify_object(path)

Returns the correct loader for the file at path. Returns None if it’s a blob or some unknown type. TODO: Implement some binwalk-like thing to carve up blobs automatically.

add_object(obj, base_addr=None)

Add object obj to the memory map, rebased at base_addr. If base_addr is None CLE will pick a safe one. Registers all its dependencies.

relocate()

Attempts to resolve all yet-unresolved relocations in all loaded objects. It is appropriate to call this repeatedly.

whats_at(addr)

Tells you what’s at addr in terms of the offset in one of the loaded binary objects.
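
The idea of describing an address as an offset into a loaded object can be sketched as follows. LoadedObject, the address ranges, and the "name+offset" string format are all assumptions made for this example; the string CLE actually returns may look different.

```python
from collections import namedtuple

# Hypothetical stand-in for a loaded binary object: a name plus the
# lowest and highest (inclusive) addresses it occupies.
LoadedObject = namedtuple("LoadedObject", "name min_addr max_addr")

def whats_at(objects, addr):
    """Describe `addr` as 'object+offset', or return None if nothing is mapped there."""
    for obj in objects:
        if obj.min_addr <= addr <= obj.max_addr:
            return "%s+%#x" % (obj.name, addr - obj.min_addr)
    return None

objects = [
    LoadedObject("fauxware", 0x400000, 0x601FFF),
    LoadedObject("libc.so.6", 0x1000000, 0x13C2FFF),
]

print(whats_at(objects, 0x400580))  # → fauxware+0x580
```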

max_addr()

The maximum address loaded as part of any loaded object (i.e., the whole address space).

min_addr()

The minimum address loaded as part of any loaded object (i.e., the whole address space).

find_symbol_name(addr)

Return the name of the function starting at addr.

find_plt_stub_name(addr)

Return the name of the PLT stub starting at addr.

find_module_name(addr)

Return the name of the loaded module containing addr.

find_symbol_got_entry(symbol)

Look for the address of a GOT entry for symbol.

Returns:The address of the symbol if found, None otherwise.

Backends

class cle.backends.Region(offset, vaddr, filesize, memsize)

A region of memory that is mapped in the object’s file.

Variables:
  • offset – The offset into the file the region starts.
  • vaddr – The virtual address.
  • filesize – The size of the region in the file.
  • memsize – The size of the region when loaded into memory.

The prefix v- on a variable or parameter name indicates that it refers to the virtual, loaded memory space, while a corresponding variable without the v- refers to the flat zero-based memory of the file.

When used next to each other, addr and offset refer to virtual memory address and file offset, respectively.

contains_addr(addr)

Does this region contain this virtual address?

contains_offset(offset)

Does this region contain this offset into the file?

addr_to_offset(addr)

Convert a virtual memory address into a file offset

offset_to_addr(offset)

Convert a file offset into a virtual memory address
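
The relationship between file offsets and virtual addresses described above can be sketched with a small stand-in class. RegionSketch is hypothetical and only mirrors the documented constructor arguments; returning None for values outside the region is an assumption of this sketch, not documented behavior.

```python
class RegionSketch:
    """Illustrative model of a region with a file view and a memory view."""

    def __init__(self, offset, vaddr, filesize, memsize):
        self.offset = offset
        self.vaddr = vaddr
        self.filesize = filesize
        self.memsize = memsize

    def contains_addr(self, addr):
        return self.vaddr <= addr < self.vaddr + self.memsize

    def contains_offset(self, offset):
        return self.offset <= offset < self.offset + self.filesize

    def addr_to_offset(self, addr):
        # Same distance from the start of the region, measured in the file.
        offset = addr - self.vaddr + self.offset
        return offset if self.contains_offset(offset) else None

    def offset_to_addr(self, offset):
        # Same distance from the start of the region, measured in memory.
        addr = offset - self.offset + self.vaddr
        return addr if self.contains_addr(addr) else None


# 0x200 bytes come from the file; memory is 0x300 bytes (the tail is zero-filled).
region = RegionSketch(offset=0x1000, vaddr=0x401000, filesize=0x200, memsize=0x300)
print(hex(region.addr_to_offset(0x401010)))  # → 0x1010
print(region.addr_to_offset(0x401250))       # → None
```

The second lookup returns None because the address lies in the part of the region that exists only in memory and has no corresponding bytes in the file.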

max_addr

The maximum virtual address of this region

min_addr

The minimum virtual address of this region

max_offset

The maximum file offset of this region

min_offset()

The minimum file offset of this region

class cle.backends.Segment(offset, vaddr, filesize, memsize)

Simple representation of an ELF file segment.

class cle.backends.Section(name, offset, vaddr, size)

Simple representation of a loaded section.

Variables:

name (str) – The name of the section

Parameters:
  • name (str) – The name of the section
  • offset (int) – The offset into the binary file this section begins
  • vaddr (int) – The address in virtual memory this section begins
  • size (int) – How large this section is
is_readable

Whether this section has read permissions

is_writable

Whether this section has write permissions

is_executable

Whether this section has execute permissions

class cle.backends.Regions(lst=None)

A container class acting as a list of regions (sections or segments). Additionally, it keeps a sorted list of those regions to allow fast lookups.

We assume none of the regions overlap with others.

raw_list

Get the internal list. Changes to it are not tracked, so _sorted_list will not be updated. You therefore probably do not want to modify the list.

Returns:The internal list container.
Return type:list
max_addr

Get the highest address of all regions.

Returns:The highest address of all regions, or None if there is no region available.

Return type:int or None

append(region)

Append a new Region instance into the list.

Parameters:region (Region) – The region to append.
Returns:None
find_region_containing(addr)

Find the region that contains a specific address. Returns None if none of the regions covers the address.

Parameters:addr (int) – The address.
Returns:The region that covers the specific address, or None if no such region is found.
Return type:Region or None
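
A sorted list makes the lookup a binary search. The sketch below shows one way this could work using the standard bisect module; RegionsSketch and Span are hypothetical names, and the sketch relies on the stated assumption that regions do not overlap.

```python
import bisect
from collections import namedtuple

# Hypothetical minimal region: a name, a start address, and a size in memory.
Span = namedtuple("Span", "name vaddr memsize")

class RegionsSketch:
    def __init__(self, lst=None):
        self._list = []    # insertion order, like the raw list
        self._sorted = []  # the same regions ordered by start address
        self._starts = []  # start addresses, parallel to _sorted
        for region in lst or []:
            self.append(region)

    def append(self, region):
        self._list.append(region)
        i = bisect.bisect_right(self._starts, region.vaddr)
        self._starts.insert(i, region.vaddr)
        self._sorted.insert(i, region)

    def find_region_containing(self, addr):
        # The last region starting at or before addr is the only candidate.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        region = self._sorted[i]
        return region if addr < region.vaddr + region.memsize else None


regions = RegionsSketch([Span(".data", 0x2000, 0x100), Span(".text", 0x1000, 0x800)])
print(regions.find_region_containing(0x1004).name)  # → .text
print(regions.find_region_containing(0x1900))       # → None
```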
class cle.backends.Backend(binary, is_main_bin=False, compatible_with=None, filename=None, **kwargs)

Main base class for CLE binary objects.

An alternate interface to this constructor exists as the static method cle.loader.Loader.load_object()

Variables:
  • binary – The path to the file this object is loaded from
  • is_main_bin – Whether this binary is loaded as the main executable
  • segments – A listing of all the loaded segments in this file
  • sections – A listing of all the demarked sections in the file
  • sections_map – A dict mapping from section name to section
  • symbols_by_addr – A mapping from address to Symbol
  • imports – A mapping from symbol name to import symbol
  • resolved_imports – A list of all the import symbols that are successfully resolved
  • relocs – A list of all the relocations in this binary
  • irelatives – A list of tuples representing all the irelative relocations that need to be performed. The first item in the tuple is the address of the resolver function, and the second item is the address of where to write the result. The destination address is not rebased.
  • jmprel – A mapping from symbol name to the address of its jump slot relocation, i.e. its GOT entry.
  • arch (archinfo.arch.Arch) – The architecture of this binary
  • os (str) – The operating system this binary is meant to run under
  • compatible_with – Another Backend object this object must be compatible with, or None
  • rebase_addr (int) – The base address of this object in virtual memory
  • deps – A list of names of shared libraries this binary depends on
  • linking – ‘dynamic’ or ‘static’
  • requested_base – The base address this object requests to be loaded at, or None
  • pic (bool) – Whether this object is position-independent
  • execstack (bool) – Whether this executable has an executable stack
  • provides (str) – The name of the shared library dependency that this object resolves
Parameters:
  • binary – The path to the binary to load
  • is_main_bin – Whether this binary should be loaded as the main executable
  • compatible_with – An optional Backend object to force compatibility with
contains_addr(addr)

Is addr in one of the binary’s segments/sections we have loaded? (i.e. is it mapped into memory?)

find_segment_containing(addr)

Returns the segment that contains addr, or None.

find_section_containing(addr)

Returns the section that contains addr or None.

get_min_addr()

This returns the lowest virtual address contained in any loaded segment of the binary.

get_max_addr()

This returns the highest virtual address contained in any loaded segment of the binary.

set_got_entry(symbol_name, newaddr)

This overrides the address of the function defined by symbol_name with the new address newaddr. This is used to call simprocedures instead of actual code.

get_initializers()

Stub function. Should be overridden by backends that can provide initializer functions that ought to be run before execution reaches the entry point. Addresses should be rebased.

get_finalizers()

Stub function. Like get_initializers, but with finalizers.

get_symbol(name)

Stub function. Implement to find the symbol with name name.

class cle.backends.elf.ELFSymbol(owner, symb)

Represents a symbol for the ELF format.

Variables:
  • elftype (str) – The type of this symbol as an ELF enum string
  • binding (str) – The binding of this symbol as an ELF enum string
  • section – The section associated with this symbol, or None
class cle.backends.elf.ELF(binary, **kwargs)

The main loader class for statically loading ELF executables. Uses the pyelftools library where useful.

get_symbol(symid, symbol_table=None)

Gets a Symbol object for the specified symbol.

Parameters:symid – Either an index into .dynsym or the name of a symbol.
class cle.backends.pe.PE(*args, **kwargs)

Representation of a PE (i.e. Windows) binary.

class cle.backends.blob.Blob(path, custom_arch=None, custom_offset=None, *args, **kwargs)

Representation of a binary blob, i.e. an executable in an unknown file format.

Parameters:
  • custom_arch – (required) an archinfo.Arch for the binary blob.
  • custom_offset – Skip this many bytes from the beginning of the file.
function_name(addr)

Blobs don’t support function names.

in_which_segment(addr)

Blobs don’t support segments.

class cle.backends.cgc.CGC(binary, *args, **kwargs)

Backend to support the CGC elf format used by the Cyber Grand Challenge competition.

See : https://github.com/CyberGrandChallenge/libcgcef/blob/master/cgc_executable_format.md

class cle.backends.backedcgc.BackedCGC(path, memory_backer=None, register_backer=None, writes_backer=None, permissions_map=None, current_allocation_base=None, *args, **kwargs)

This is a backend for CGC executables that allows the user to provide a memory backer and a register backer as the initial state of the running binary.

Parameters:
  • path – File path to CGC executable.
  • memory_backer – A dict of memory content, with beginning address of each segment as key and actual memory content as data.
  • register_backer – A dict of all register contents. EIP will be used as the entry point of this executable.
  • permissions_map – A dict of memory region to permission flags
  • current_allocation_base – An integer representing the current address of the top of the CGC heap.
class cle.backends.metaelf.MetaELF(*args, **kwargs)

A base class that implements functions used by all backends that can load an ELF.

plt

Maps names to addresses.

reverse_plt

Maps addresses to names.
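
The two mappings are inverses of each other. As a small illustration with made-up stub addresses (the dictionary contents here are assumptions, not output from CLE):

```python
# Hypothetical PLT contents: imported function name -> stub address.
plt = {"puts": 0x400430, "exit": 0x400440}

# The reverse mapping swaps keys and values: stub address -> name.
reverse_plt = {addr: name for name, addr in plt.items()}

print(reverse_plt[0x400430])  # → puts
```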

get_call_stub_addr(name)

Takes the name of an imported function and returns the address of the stub function that jumps to it.

is_ppc64_abiv1

Returns whether the arch is powerpc64 ABIv1.

Returns:True if powerpc64 ABIv1, False otherwise.
class cle.backends.elfcore.CoreNote(n_type, name, desc)

This class is used when parsing the NOTES section of a core file.

class cle.backends.elfcore.ELFCore(binary, **kwargs)

Loader class for ELF core files.

class cle.backends.idabin.IDABin(binary, *args, **kwargs)

Get information from binaries using IDA.

in_which_segment(addr)

Return the segment name at address addr (IDA).

function_name(addr)

Return the function name at address addr (IDA).

get_symbol_addr(sym)

Get the address of the symbol sym from IDA.

Returns:An address.
get_min_addr()

Get the min address of the binary (IDA).

get_max_addr()

Get the max address of the binary (IDA).

resolve_import_dirty(sym, new_val)

Resolve import for symbol sym the dirty way, i.e. find all references to it in the code and replace it with the address new_val inline (instead of updating GOT slots). Don’t use this unless you really have to; use resolve_import_with() instead.

set_got_entry(name, newaddr)

Resolve import name with address newaddr. That is, update the GOT entry for name with newaddr.

is_thumb(addr)

Is the address addr in thumb mode? (ARM).

get_strings()

Extract strings from binary (IDA).

Returns:An array of strings.

Relocations

CLE’s loader implements program relocation data on a plugin basis. If you would like to add more relocation implementations, do so by subclassing the Relocation class and overriding any relevant methods or properties. Put your subclasses in a module in the relocations package. The name of the subclass will be used to determine when to use it! Look at the existing versions for details.
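
The general pattern of selecting an implementation by subclass name can be sketched as below. Everything here is hypothetical: RelocationSketch, the R_DEMO_* class names, the value method, and the lookup function are invented for illustration and do not reproduce CLE's actual registry or its Relocation interface.

```python
class RelocationSketch:
    """Illustrative base class; subclasses define how the written value is computed."""

    def __init__(self, addr, addend=0):
        self.addr = addr
        self.addend = addend

    def value(self, symbol_addr, base):
        raise NotImplementedError


class R_DEMO_ABS(RelocationSketch):
    # Absolute relocation: resolved symbol address plus addend.
    def value(self, symbol_addr, base):
        return symbol_addr + self.addend


class R_DEMO_RELATIVE(RelocationSketch):
    # Base-relative relocation: load base plus addend, no symbol needed.
    def value(self, symbol_addr, base):
        return base + self.addend


def find_relocation_class(type_name):
    """Pick the subclass whose class name matches the relocation type name."""
    for cls in RelocationSketch.__subclasses__():
        if cls.__name__ == type_name:
            return cls
    return None


cls = find_relocation_class("R_DEMO_RELATIVE")
print(hex(cls(addr=0x2000, addend=0x10).value(symbol_addr=None, base=0x400000)))  # → 0x400010
```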

class cle.backends.relocations.Relocation(owner, symbol, addr, addend=None)

A representation of a relocation in a binary file. Smart enough to relocate itself.

Variables:
  • owner_obj – The binary this relocation was originally found in, as a cle object
  • symbol – The Symbol object this relocation refers to
  • addr – The address in owner_obj this relocation would like to write to
  • rebased_addr – The address in the global memory space this relocation would like to write to
  • resolvedby – If the symbol this relocation refers to is an import symbol and that import has been resolved, this attribute holds the symbol from a different binary that was used to resolve the import.
  • resolved – Whether the application of this relocation was successful
relocate(solist, bypass_compatibility=False)

Applies this relocation. Will make changes to the memory object of the object it came from.

This implementation is a generic version that can be overridden in subclasses.

Parameters:solist – A list of objects from which to resolve symbols.

Thread-local storage

class cle.tls.TLSObj(modules, filetype='unknown')

CLE implements thread-local storage by treating the TLS region as another object to be loaded. Because of the complex interactions between TLS and all the other objects that can be loaded into memory, each TLS object will perform some basic initialization when instantiated, and then once all other objects have been loaded, finalize() is called.

finalize()

Lay out the TLS initialization images into memory.

class cle.tls.elf_tls.ELFTLSObj(modules)

This class is used when parsing the Thread Local Storage of an ELF binary. It heavily uses the TLSArchInfo namedtuple from archinfo.

ELF TLS is implemented based on the following documents:

thread_pointer

The thread pointer. This is a technical term that refers to a specific location in the TLS segment.

user_thread_pointer

The thread pointer that is exported to the user

get_addr(module_id, offset)

Basically __tls_get_addr.

class cle.tls.pe_tls.PETLSObj(modules)

This class is used when parsing the Thread Local Storage of a PE binary. It represents both the TLS array and the TLS data area for a specific thread.

In memory the PETLSObj is laid out as follows:

+----------------------+---------------------------------------+
| TLS array            | TLS data area                         |
+----------------------+---------------------------------------+

A more detailed description of the TLS array and TLS data areas is given below.

TLS array

The TLS array is an array of addresses that points into the TLS data area. In memory it is laid out as follows:

+-----------+-----------+-----+-----------+
|  address  |  address  | ... |  address  |
+-----------+-----------+-----+-----------+
| index = 0 | index = 1 |     | index = n |
+-----------+-----------+-----+-----------+

The size of each address is architecture dependent (e.g. on X86 it is 4 bytes). The number of addresses in the TLS array is equal to the number of modules that contain TLS data. At load time (i.e. in the finalize method), each module is assigned an index into the TLS array. The address of this module’s TLS data area is then stored at this location in the array.

TLS data area

The TLS data area directly follows the TLS array and contains the actual TLS data for each module. In memory it is laid out as follows:

+----------+-----------+----------+-----------+-----+
| TLS data | zero fill | TLS data | zero fill | ... |
+----------+-----------+----------+-----------+-----+
|       module a       |       module b       | ... |
+---------------------------------------------------+

The size of each module’s TLS data area is variable and can be found in the module’s tls_data_size property. The same applies to the zero fill. At load time (i.e. in the finalize method), the initial TLS data values are copied into the TLS data area. Because a TLS index is also assigned to each module, we can access a module’s TLS data area using this index into the TLS array to get the start address of the TLS data.

get_tls_data_addr(tls_idx)

Get the start address of a module’s TLS data area via the module’s TLS index.

From the PE/COFF spec:

The code uses the TLS index and the TLS array location (multiplying the index by the word size and using it as an offset into the array) to get the address of the TLS data area for the given program and module.
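
The layout and lookup described above can be sketched with the standard struct module. This is an illustration under stated assumptions: a little-endian target, a word size of 4 or 8 bytes, and modules given as hypothetical (initial data, zero-fill length) pairs. The function names mirror the text but are not CLE's implementation.

```python
import struct

def word_format(word_size):
    # Little-endian unsigned word; assumes word_size is 4 or 8.
    return "<I" if word_size == 4 else "<Q"

def build_tls(modules, base, word_size=4):
    """Lay out the TLS array followed by the data area, as bytes starting at `base`."""
    array_size = word_size * len(modules)
    array = b""
    data = b""
    for init_data, zero_fill in modules:
        # Each array slot holds the address where this module's data begins.
        array += struct.pack(word_format(word_size), base + array_size + len(data))
        data += init_data + b"\x00" * zero_fill
    return array + data

def get_tls_data_addr(memory, tls_idx, word_size=4):
    """Multiply the index by the word size and read the address stored there."""
    return struct.unpack_from(word_format(word_size), memory, tls_idx * word_size)[0]


base = 0x10000
memory = build_tls([(b"AAAA", 4), (b"BB", 2)], base)
addr = get_tls_data_addr(memory, 1)
print(hex(addr))                                # → 0x10010
print(memory[addr - base:addr - base + 2])      # → b'BB'
```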

Misc. Utilities

exception cle.errors.CLEError

Base class for errors raised by CLE.

exception cle.errors.CLEUnknownFormatError

Error raised when CLE encounters an unknown executable file format.

exception cle.errors.CLEFileNotFoundError

Error raised when a file does not exist.

exception cle.errors.CLEInvalidBinaryError

Error raised when an executable file is invalid or corrupted.

exception cle.errors.CLEOperationError

Error raised when a problem is encountered in the process of loading an executable.

exception cle.errors.CLECompatibilityError

Error raised when loading an executable that is not currently supported by CLE.

class cle.memory.Clemory(arch, root=False)

An object representing a memory space. Uses “backers” and “updates” to separate the concepts of loaded and written memory and make lookups more efficient.

Accesses can be made with [index] notation.

add_backer(start, data)

Adds a backer to the memory.

Parameters:
  • start – The address where the backer should be loaded.
  • data – The backer itself. Can be either a string or another Clemory.
read_bytes(addr, n, orig=False)

Read up to n bytes at address addr in memory and return an array of bytes.

Reading will stop at the beginning of the first unallocated region found, or when n bytes have been read.
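
The split between loaded data ("backers") and later writes ("updates"), together with the rule that a read stops at the first unallocated address, can be sketched as follows. MemorySketch is a hypothetical stand-in that works byte by byte for clarity; it makes no claim about how Clemory stores or searches its data.

```python
class MemorySketch:
    def __init__(self):
        self._backers = []  # list of (start address, bytes) loaded from files
        self._updates = {}  # address -> byte value written after loading

    def add_backer(self, start, data):
        self._backers.append((start, bytes(data)))

    def __getitem__(self, addr):
        # Written bytes take priority over the original loaded content.
        if addr in self._updates:
            return self._updates[addr]
        for start, data in self._backers:
            if start <= addr < start + len(data):
                return data[addr - start]
        raise KeyError(addr)

    def __setitem__(self, addr, value):
        self._updates[addr] = value

    def read_bytes(self, addr, n):
        out = bytearray()
        for i in range(n):
            try:
                out.append(self[addr + i])
            except KeyError:
                break  # first unallocated address: stop reading
        return bytes(out)


mem = MemorySketch()
mem.add_backer(0x1000, b"hello")
mem[0x1001] = ord("a")
print(mem.read_bytes(0x1000, 16))  # → b'hallo'
```

The request asks for 16 bytes but only 5 are returned, because nothing is mapped after the end of the backer. The backer itself still holds the original bytes; only the update layer changed.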

write_bytes(addr, data)

Write bytes from data at address addr.

write_bytes_to_backer(addr, data)

Write bytes from data at address addr to backer instead of self._updates. This is only needed when writing a huge amount of data.

read_addr_at(where, orig=False)

Read addr stored in memory as a series of bytes starting at where.

write_addr_at(where, addr)

Writes addr into a series of bytes in memory at where.
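
Storing an address as a series of bytes amounts to packing an integer with the target's word size and byte order. The sketch below assumes a 32-bit little-endian target and uses a plain bytearray as memory; the fmt parameter is a hypothetical addition, since the documented methods take the layout from the architecture.

```python
import struct

def write_addr_at(mem, where, addr, fmt="<I"):
    """Pack `addr` according to `fmt` and store it in `mem` starting at `where`."""
    mem[where:where + struct.calcsize(fmt)] = struct.pack(fmt, addr)

def read_addr_at(mem, where, fmt="<I"):
    """Unpack an address stored in `mem` starting at `where`."""
    return struct.unpack_from(fmt, mem, where)[0]


mem = bytearray(16)
write_addr_at(mem, 4, 0x08048000)
print(bytes(mem[4:8]))            # → b'\x00\x80\x04\x08'
print(hex(read_addr_at(mem, 4)))  # → 0x8048000
```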

stride_repr

Returns a representation of memory in a list of (start, end, data) where data is a string.

seek(value)

The stream-like function that sets the “file’s” current position. Use with read().

Parameters:value – The position to seek to.
read(nbytes)

The stream-like function that reads up to a number of bytes starting from the current position and updates the current position. Use with seek().

Up to nbytes bytes will be read, halting at the beginning of the first unmapped region encountered.

cbackers

This property directly returns a list of already-flattened cbackers. It is designed for performance purposes. GirlScout uses it. Use this property at your own risk!

read_bytes_c(addr)

Read bytes at address addr in cbacked memory and return a cffi buffer pointer.

Note: We don’t support reading across segments for performance concerns.

class cle.patched_stream.PatchedStream(stream, patches)

An object that wraps a readable stream, performing passthroughs on seek and read operations, except to make it seem like the data has actually been patched by the given patches.

Parameters:
  • stream – The stream to patch
  • patches – A list of tuples of (addr, patch data)
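
A minimal sketch of the idea, assuming patches are (addr, bytes) tuples as described above: reads are passed through to the wrapped stream, and any patch that overlaps the bytes just read is overlaid on the result. PatchedStreamSketch is a hypothetical name and is not CLE's implementation.

```python
import io

class PatchedStreamSketch:
    def __init__(self, stream, patches):
        self.stream = stream
        self.patches = patches  # list of (addr, patch bytes)

    def seek(self, pos):
        return self.stream.seek(pos)

    def tell(self):
        return self.stream.tell()

    def read(self, n=-1):
        start = self.stream.tell()
        data = bytearray(self.stream.read(n))
        end = start + len(data)
        for addr, patch in self.patches:
            # Overlay only the part of the patch that overlaps [start, end).
            lo = max(addr, start)
            hi = min(addr + len(patch), end)
            if lo < hi:
                data[lo - start:hi - start] = patch[lo - addr:hi - addr]
        return bytes(data)


raw = io.BytesIO(b"0123456789")
patched = PatchedStreamSketch(raw, [(2, b"AB"), (8, b"XYZ")])
print(patched.read())   # → b'01AB4567XY'
patched.seek(3)
print(patched.read(2))  # → b'B4'
```

The underlying stream is never modified, and a patch that runs past the end of the data (the last byte of b"XYZ" here) is simply truncated.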