archinfo — Arch Information Repository

archinfo is a collection of classes that contain architecture-specific information. It is useful for cross-architecture tools (such as pyvex).

class archinfo.arch.Endness

Endness specifies the byte order for integer values.

Variables:
  • LE – little endian, least significant byte is stored at lowest address
  • BE – big endian, most significant byte is stored at lowest address
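
To make the two byte orders concrete, the following sketch uses only Python's built-in int.to_bytes rather than archinfo itself. The value 0x11223344 is an arbitrary example chosen for illustration.

```python
value = 0x11223344

# Little endian: the least significant byte (0x44) comes first,
# i.e. it is stored at the lowest address.
little = value.to_bytes(4, byteorder="little")

# Big endian: the most significant byte (0x11) comes first.
big = value.to_bytes(4, byteorder="big")

print(little.hex())  # 44332211
print(big.hex())     # 11223344
```
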

class archinfo.arch.Arch(endness)

A collection of information about a given architecture. This class should be subclassed for each different architecture, and then that subclass should be registered with the register_arch method.

A good number of assumptions are made that code is being processed under the VEX IR. For instance, the register file offsets are expected to match code generated by PyVEX.

Arches may be compared with == and !=.

Variables:
  • name (str) – The name of the arch
  • bits (int) – The number of bits in a word
  • vex_arch (str) – The VEX enum name used to identify this arch
  • qemu_name (str) – The name used by QEMU to identify this arch
  • ida_processor (str) – The processor string used by IDA to identify this arch
  • triplet (str) – The triplet used to identify a Linux system on this arch
  • max_inst_bytes (int) – The maximum number of bytes in a single instruction
  • ip_offset (int) – The offset of the instruction pointer in the register file
  • sp_offset (int) – The offset of the stack pointer in the register file
  • bp_offset (int) – The offset of the base pointer in the register file
  • lr_offset (int) – The offset of the link register (return address) in the register file
  • ret_offset (int) – The offset of the return value register in the register file
  • vex_conditional_helpers (bool) – Whether libVEX will generate code to process the conditional flags for this arch using ccalls
  • syscall_num_offset (int) – The offset in the register file where the syscall number is stored
  • call_pushes_ret (bool) – Whether this arch’s call instruction causes a stack push
  • stack_change (int) – The change to the stack pointer caused by a push instruction
  • memory_endness (str) – The endness of memory, as a VEX enum
  • register_endness (str) – The endness of registers, as a VEX enum. Usually the same as memory_endness
  • sizeof (dict) – A mapping from C type to variable size in bits
  • cs_arch – The capstone arch value for this arch
  • cs_mode – The capstone mode value for this arch
  • uc_arch – The unicorn engine arch value for this arch
  • uc_mode – The unicorn engine mode value for this arch
  • uc_const – The unicorn engine constants module for this arch
  • uc_prefix – The prefix used for variables in the unicorn engine constants module
  • function_prologs (list) – A list of regular expressions matching the bytes for common function prologues
  • function_epilogs (list) – A list of regular expressions matching the bytes for common function epilogues
  • ret_instruction (str) – The bytes for a return instruction
  • nop_instruction (str) – The bytes for a nop instruction
  • instruction_alignment (int) – The instruction alignment requirement
  • default_register_values (list) – A list describing how registers should be initialized to sane default values
  • entry_register_values (dict) – A mapping from register name to a description of the value that should be in it at program entry on Linux
  • default_symbolic_register (list) – The purpose of this field is not documented. Fill it with the names of the general-purpose registers.
  • register_names (dict) – A mapping from register file offset to register name
  • registers (dict) – A mapping from register name to a tuple of (register file offset, size in bytes)
  • lib_paths (list) – A listing of common locations where shared libraries for this architecture may be found
  • got_section_name (str) – The name of the GOT section in ELFs
  • ld_linux_name (str) – The name of the Linux dynamic loader program
  • elf_tls (TLSArchInfo) – A description of how thread-local storage works
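
The relationship between registers, register_names, and the various *_offset fields can be sketched with plain dictionaries. This is a minimal illustration, not archinfo code: the register names, offsets, and sizes below are made up for the example and do not come from any real Arch subclass.

```python
# Hypothetical register table in the shape described for `registers`:
# register name -> (register file offset, size in bytes).
# The names and numbers are illustrative only.
registers = {
    "r0": (8, 4),
    "r1": (12, 4),
    "sp": (60, 4),
    "pc": (68, 4),
}

# Invert it into the shape described for `register_names`:
# register file offset -> register name.
register_names = {offset: name for name, (offset, _size) in registers.items()}

# Fields such as sp_offset and ip_offset are offsets into the same register file.
sp_offset, _ = registers["sp"]
ip_offset, _ = registers["pc"]

print(register_names[sp_offset])  # sp
print(register_names[ip_offset])  # pc
```

If several names share one offset, a plain inversion like this keeps only one of them, so it should be read as a picture of the two mappings rather than a way to build them.
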

copy()

Produce a copy of this instance of this arch.

struct_fmt(size=None)

Produce a format string for use in Python’s struct module.

Optionally, the size parameter can specify the width of the int to store.
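
As a rough picture of what such a format string looks like, the sketch below builds one by hand with a hypothetical helper, sketch_struct_fmt. It is not archinfo's implementation. It assumes the width is given in bits and that the integer is unsigned; the text above does not state either.

```python
import struct


def sketch_struct_fmt(bits, little_endian, size=None):
    """Hypothetical helper, not archinfo's code: build a struct format string.

    bits is the architecture word size; size optionally overrides the width.
    Both are assumed to be in bits.
    """
    width = bits if size is None else size
    codes = {8: "B", 16: "H", 32: "I", 64: "Q"}  # unsigned struct type codes
    if width not in codes:
        raise ValueError(f"unsupported integer width: {width}")
    prefix = "<" if little_endian else ">"  # struct's byte-order markers
    return prefix + codes[width]


fmt = sketch_struct_fmt(bits=32, little_endian=True)
print(fmt)                                     # <I
print(struct.pack(fmt, 0x11223344).hex())      # 44332211
print(sketch_struct_fmt(64, False, size=16))   # >H
```
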

bytes

The standard word size in bytes, calculated from the bits field.

capstone

A capstone instance for this arch.

unicorn

A unicorn engine instance for this arch.

library_search_path(pedantic=False)

A list of paths in which to search for shared libraries.

archinfo.arch.register_arch(regexes, bits, endness, my_arch)

Register a new architecture. Architectures are looked up by their string name using arch_from_id(), and this function defines the mapping that the lookup uses. It takes a list of regular expressions and an Arch class as input.

Parameters:
  • regexes (list) – List of regular expressions (str or SRE_Pattern)
  • bits (int) – The canonical “bits” of this architecture, e.g. 32 or 64
  • endness (str) – The “endness” of this architecture. Use Endness.LE, Endness.BE, or “any”
  • my_arch (Arch) – The Arch class to register
Returns:

None

archinfo.arch.arch_from_id(ident, endness='any', bits='')

Take our best guess at the arch referred to by the given identifier, and return an instance of its class.

You may optionally provide the endness and bits parameters (strings) to help this function out.
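
The registration and lookup scheme described above can be sketched as a small regex-based registry. Everything here is a hypothetical stand-in: sketch_register, sketch_lookup, the patterns, and the "Demo" names are invented for illustration, plain strings are used in place of Arch classes and Endness values, and archinfo's actual matching rules may differ.

```python
import re

# Hypothetical stand-in registry; archinfo's internal structure may differ.
_registry = []


def sketch_register(regexes, bits, endness, arch_name):
    """Record which identifier patterns map to which architecture."""
    compiled = [
        re.compile(r, re.IGNORECASE) if isinstance(r, str) else r
        for r in regexes
    ]
    _registry.append((compiled, bits, endness, arch_name))


def sketch_lookup(ident, endness="any", bits=""):
    """Return the first registered name whose patterns and hints all match."""
    for patterns, arch_bits, arch_endness, arch_name in _registry:
        if not any(p.search(ident) for p in patterns):
            continue  # identifier does not match this entry
        if bits != "" and int(bits) != arch_bits:
            continue  # caller's bits hint rules this entry out
        if endness != "any" and arch_endness != "any" and endness != arch_endness:
            continue  # caller's endness hint rules this entry out
        return arch_name
    raise LookupError(f"no registered architecture matches {ident!r}")


sketch_register([r"x86[_-]?64", r"amd64"], 64, "LE", "DemoAMD64")
sketch_register([r"^mips"], 32, "any", "DemoMIPS32")

print(sketch_lookup("AMD64"))                 # DemoAMD64
print(sketch_lookup("mipsel", endness="LE"))  # DemoMIPS32
```
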

archinfo.arch.get_host_arch()

Return the arch of the machine we are currently running on.
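
For orientation, the standard library can report an identifier string for the host machine. Whether get_host_arch() relies on exactly this call is an assumption; the sketch only shows the kind of string that could be handed to arch_from_id().

```python
import platform

# platform.machine() returns a string such as "x86_64" or "aarch64",
# or an empty string if the value cannot be determined.
host_ident = platform.machine()
print(host_ident)
```
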