archinfo — Arch Information Repository

archinfo is a collection of classes that contain architecture-specific information. It is useful for cross-architecture tools (such as pyvex).

Architectures

class archinfo.arch.Endness

Endness specifies the byte order for integer values

Variables:
  • LE – little endian, least significant byte is stored at lowest address
  • BE – big endian, most significant byte is stored at lowest address
  • ME – Middle-endian. Yep.
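As a quick illustration of the LE and BE orderings described above, the following sketch uses only the Python standard library (it does not touch archinfo itself) to show how the same 32-bit integer is laid out in memory under each byte order. Middle-endian layouts are not covered here.

```python
# Standard-library illustration of byte order; no archinfo import is needed.
value = 0x11223344

# Little endian: the least significant byte (0x44) sits at the lowest address.
little = value.to_bytes(4, byteorder="little")
# Big endian: the most significant byte (0x11) sits at the lowest address.
big = value.to_bytes(4, byteorder="big")

print(little.hex())  # 44332211
print(big.hex())     # 11223344

# Reading bytes back with the wrong byte order yields a different integer.
misread = int.from_bytes(little, byteorder="big")
print(hex(misread))  # 0x44332211
```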
class archinfo.arch.Register(name, size, vex_offset=None, vex_name=None, subregisters=None, alias_names=None, general_purpose=False, floating_point=False, vector=False, argument=False, persistent=False, default_value=None, linux_entry_value=None, concretize_unique=False)

A collection of information about a register. Each different architecture has its own list of registers, which is the base for all other register-related collections.

As with the Arch object, the information is assumed to be compatible with PyVEX.

Variables:
  • name (str) – The name of the register
  • size (int) – The size of the register (in bytes)
  • vex_offset (int) – The VEX offset used to identify this register
  • vex_name (str) – The name libVEX uses to identify the register
  • subregisters (list) – The list of subregisters in the form (name, offset from vex_offset, size)
  • alias_names (tuple) – The list of possible alias names
  • general_purpose (bool) – Whether this is a general purpose register
  • floating_point (bool) – Whether this is a floating-point register
  • vector (bool) – Whether this is a vector register
  • argument (bool) – Whether this is an argument register
  • persistent (bool) – Whether this is a persistent register
  • default_value (tuple) – The default value used to initialize this register
  • linux_entry_value (int or str) – The value this register holds at program entry on Linux
  • concretize_unique (bool) – Whether this register should be concretized, if unique, at the end of each block
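The subregisters field stores offsets relative to vex_offset, so resolving a subregister to a position in the register file needs one addition. The sketch below shows that arithmetic with a hypothetical stand-in class (RegisterSketch is not part of archinfo) and illustrative offsets that should not be read as the real values for any architecture.

```python
from typing import List, NamedTuple, Tuple


class RegisterSketch(NamedTuple):
    """Hypothetical stand-in holding a few of the fields listed above."""
    name: str
    size: int        # size in bytes
    vex_offset: int  # offset of the full register in the register file
    subregisters: List[Tuple[str, int, int]]  # (name, offset from vex_offset, size)


def subregister_ranges(reg):
    """Map each subregister name to (absolute register file offset, size)."""
    return {
        name: (reg.vex_offset + rel_offset, size)
        for name, rel_offset, size in reg.subregisters
    }


# Illustrative values only; consult the architecture definition for real ones.
rax = RegisterSketch(
    name="rax",
    size=8,
    vex_offset=16,
    subregisters=[("eax", 0, 4), ("ax", 0, 2), ("al", 0, 1), ("ah", 1, 1)],
)

ranges = subregister_ranges(rax)
print(ranges["ah"])  # (17, 1)
```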
class archinfo.arch.Arch(endness, instruction_endness=None)

A collection of information about a given architecture. This class should be subclassed for each different architecture, and that subclass should then be registered with the register_arch function.

A good number of assumptions are made that code is being processed under the VEX IR. For instance, the register file offsets are expected to match code generated by PyVEX.

Arches may be compared with == and !=.

Variables:
  • name (str) – The name of the arch
  • bits (int) – The number of bits in a word
  • vex_arch (str) – The VEX enum name used to identify this arch
  • qemu_name (str) – The name used by QEMU to identify this arch
  • ida_processor (str) – The processor string used by IDA to identify this arch
  • triplet (str) – The triplet used to identify a linux system on this arch
  • max_inst_bytes (int) – The maximum number of bytes in a single instruction
  • ip_offset (int) – The offset of the instruction pointer in the register file
  • sp_offset (int) – The offset of the stack pointer in the register file
  • bp_offset (int) – The offset of the base pointer in the register file
  • lr_offset (int) – The offset of the link register (return address) in the register file
  • ret_offset (int) – The offset of the return value register in the register file
  • vex_conditional_helpers (bool) – Whether libVEX will generate code to process the conditional flags for this arch using ccalls
  • syscall_num_offset (int) – The offset in the register file where the syscall number is stored
  • call_pushes_ret (bool) – Whether this arch’s call instruction causes a stack push
  • stack_change (int) – The change to the stack pointer caused by a push instruction
  • memory_endness (str) – The endness of memory, as a VEX enum
  • register_endness (str) – The endness of registers, as a VEX enum. Should usually be same as above
  • instruction_endness (str) – The endness of instructions stored in memory. In other words, this controls whether instructions are stored endian-flipped compared to their description in the ISA manual, and should be flipped when lifted. Iend_BE means “don’t flip”. NOTE: Only used for non-libVEX lifters.
  • sizeof (dict) – A mapping from C type to variable size in bits
  • cs_arch – The Capstone arch value for this arch
  • cs_mode – The Capstone mode value for this arch
  • ks_arch – The Keystone arch value for this arch
  • ks_mode – The Keystone mode value for this arch
  • uc_arch – The Unicorn engine arch value for this arch
  • uc_mode – The Unicorn engine mode value for this arch
  • uc_const – The Unicorn engine constants module for this arch
  • uc_prefix – The prefix used for variables in the Unicorn engine constants module
  • function_prologs (list) – A list of regular expressions matching the bytes for common function prologues
  • function_epilogs (list) – A list of regular expressions matching the bytes for common function epilogues
  • ret_instruction (str) – The bytes for a return instruction
  • nop_instruction (str) – The bytes for a nop instruction
  • instruction_alignment (int) – The instruction alignment requirement
  • default_register_values (list) – A weird listing describing how registers should be initialized for purposes of sanity
  • entry_register_values (dict) – A mapping from register name to a description of the value that should be in it at program entry on linux
  • default_symbolic_register (list) – Honestly, who knows what this is supposed to do. Fill it with the names of the general purpose registers.
  • register_names (dict) – A mapping from register file offset to register name
  • registers (dict) – A mapping from register name to a tuple of (register file offset, size in bytes)
  • lib_paths (list) – A listing of common locations where shared libraries for this architecture may be found
  • got_section_name (str) – The name of the GOT section in ELFs
  • ld_linux_name (str) – The name of the linux dynamic loader program
  • byte_width (int) – The number of bits in a byte
  • elf_tls (TLSArchInfo) – A description of how thread-local storage works
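The registers and register_names mappings describe the same register file from opposite directions. The sketch below is one plausible way to derive an offset-to-name table from a name-to-(offset, size) table, using a hypothetical register table with illustrative offsets. Keeping the widest register at each offset is an assumption made for this example, not a statement about how archinfo builds its tables.

```python
# Hypothetical register table: name -> (register file offset, size in bytes).
registers = {
    "rax": (16, 8),
    "eax": (16, 4),
    "rsp": (48, 8),
}

# Build offset -> name, keeping the widest register when several names
# share an offset (an assumption made for this sketch).
register_names = {}
for name, (offset, size) in registers.items():
    current = register_names.get(offset)
    if current is None or registers[current][1] < size:
        register_names[offset] = name

print(register_names)  # {16: 'rax', 48: 'rsp'}
```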
copy()

Produce a copy of this instance of this arch.

struct_fmt(size=None)

Produce a format string for use in python’s struct module.

Optionally, the size parameter can specify the width of the int to store.
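To make the idea of a struct format string concrete, here is a minimal sketch of how such a string might be assembled from an endness and an integer width. The helper struct_fmt_sketch is hypothetical, and it assumes the width is given in bits and that only unsigned 8, 16, 32, and 64-bit integers are needed. The actual method may differ.

```python
import struct


def struct_fmt_sketch(bits, big_endian=False):
    """Hypothetical helper: build a struct format for an unsigned integer."""
    codes = {8: "B", 16: "H", 32: "I", 64: "Q"}
    if bits not in codes:
        raise ValueError("unsupported integer width: %r" % (bits,))
    # "<" selects little-endian packing, ">" selects big-endian packing.
    return (">" if big_endian else "<") + codes[bits]


fmt = struct_fmt_sketch(32)
packed = struct.pack(fmt, 0xDEADBEEF)
print(fmt)           # <I
print(packed.hex())  # efbeadde

# Unpacking with the same format recovers the original integer.
(value,) = struct.unpack(fmt, packed)
```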

bytes

The standard word size in bytes, calculated from the bits field

capstone

A Capstone instance for this arch

unicorn

A Unicorn engine instance for this arch

asm(string, addr=0, as_bytes=True, thumb=False)

Compile the assembly instruction represented by string using Keystone

Parameters:
  • string – The textual assembly instructions, separated by semicolons
  • addr – The address at which the text should be assembled, to deal with PC-relative access. Default 0
  • as_bytes – Set to False to return a list of integers instead of a python byte string
  • thumb – If working with an ARM processor, set to True to assemble in thumb mode.
Returns:

The assembled bytecode

library_search_path(pedantic=False)

A list of paths in which to search for shared libraries.

vex_support

Whether the architecture is supported by VEX or not.

Returns: True if this Arch is supported by VEX, False otherwise.
Return type: bool
unicorn_support

Whether the architecture is supported by the Unicorn engine or not.

Returns: True if this Arch is supported by the Unicorn engine, False otherwise.
Return type: bool
capstone_support

Whether the architecture is supported by the Capstone engine or not.

Returns: True if this Arch is supported by the Capstone engine, False otherwise.
Return type: bool
keystone_support

Whether the architecture is supported by the Keystone engine or not.

Returns: True if this Arch is supported by the Keystone engine, False otherwise.
Return type: bool
archinfo.arch.register_arch(regexes, bits, endness, my_arch)

Register a new architecture. Architectures are loaded by their string name using arch_from_id(), and this defines the mapping it uses to figure it out. Takes a list of regular expressions, and an Arch class as input.

Parameters:
  • regexes (list) – List of regular expressions (str or SRE_Pattern)
  • bits (int) – The canonical “bits” of this architecture, ex. 32 or 64
  • endness (str or None) – The “endness” of this architecture. Use Endness.LE, Endness.BE, Endness.ME, “any”, or None if the architecture has no intrinsic endianness.
  • my_arch (class) – The Arch class to register
Returns:

None

archinfo.arch.arch_from_id(ident, endness='any', bits='')

Take our best guess at the arch referred to by the given identifier, and return an instance of its class.

You may optionally provide the endness and bits parameters (strings) to help this function out.
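The registration and lookup behaviour described above can be pictured as a list of (regexes, bits, endness, class) entries that is scanned in order. The following is a toy sketch of that idea, not the library's implementation: every name ending in _sketch, the Toy classes, and the matching rules (case-insensitive search, first match wins) are assumptions made for illustration.

```python
import re

# Hypothetical registry: a list of (compiled regexes, bits, endness, class).
_registry = []


def register_arch_sketch(regexes, bits, endness, my_arch):
    """Record a class together with the patterns that identify it."""
    compiled = [re.compile(r, re.I) if isinstance(r, str) else r for r in regexes]
    _registry.append((compiled, bits, endness, my_arch))


def arch_from_id_sketch(ident, endness="any"):
    """Return an instance of the first registered class matching ident."""
    for regexes, _bits, arch_endness, cls in _registry:
        if not any(r.search(ident) for r in regexes):
            continue
        # Skip entries whose fixed endness conflicts with the requested one.
        if endness != "any" and arch_endness not in ("any", None, endness):
            continue
        return cls()
    raise ValueError("no architecture matches %r" % (ident,))


class ToyLE:
    name = "toy-le"


class ToyBE:
    name = "toy-be"


register_arch_sketch([r"toy.*el", r"toy32"], 32, "Iend_LE", ToyLE)
register_arch_sketch([r"toy.*eb", r"toy32"], 32, "Iend_BE", ToyBE)

print(arch_from_id_sketch("TOY32").name)                    # toy-le
print(arch_from_id_sketch("toy32", endness="Iend_BE").name)  # toy-be
```

Because entries are scanned in registration order in this sketch, an ambiguous identifier such as "toy32" resolves to whichever class was registered first unless an endness hint narrows the choice.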

archinfo.arch.get_host_arch()

Return the arch of the machine we are currently running on.

class archinfo.arch_aarch64.ArchAArch64(endness='Iend_LE')
class archinfo.arch_amd64.ArchAMD64(endness='Iend_LE')
capstone_x86_syntax

Get the current syntax Capstone uses for x86. It can be ‘intel’ or ‘at&t’

Returns: Capstone’s current x86 syntax
Return type: str
keystone_x86_syntax

Get the current syntax Keystone uses for x86. It can be ‘intel’, ‘at&t’, ‘nasm’, ‘masm’, ‘gas’ or ‘radix16’

Returns: Keystone’s current x86 syntax
Return type: str
asm(string, addr=0, as_bytes=True, thumb=False)

Compile the assembly instruction represented by string using Keystone

Parameters:
  • string – The textual assembly instructions, separated by semicolons
  • addr – The address at which the text should be assembled, to deal with PC-relative access. Default 0
  • as_bytes – Set to False to return a list of integers instead of a python byte string
  • thumb – If working with an ARM processor, set to True to assemble in thumb mode.
Returns:

The assembled bytecode

class archinfo.arch_arm.ArchARM(endness='Iend_LE')
asm(string, addr=0, as_bytes=True, thumb=False)

Compile the assembly instruction represented by string using Keystone

Parameters:
  • string – The textual assembly instructions, separated by semicolons
  • addr – The address at which the text should be assembled, to deal with PC-relative access. Default 0
  • as_bytes – Set to False to return a list of integers instead of a python byte string
  • thumb – If working with an ARM processor, set to True to assemble in thumb mode.
Returns:

The assembled bytecode

class archinfo.arch_arm.ArchARMHF(endness='Iend_LE')
class archinfo.arch_arm.ArchARMEL(endness='Iend_LE')
class archinfo.arch_mips32.ArchMIPS32(endness='Iend_BE')
class archinfo.arch_mips64.ArchMIPS64(endness='Iend_BE')
class archinfo.arch_ppc32.ArchPPC32(endness='Iend_LE')
class archinfo.arch_ppc64.ArchPPC64(endness='Iend_LE')
class archinfo.arch_x86.ArchX86(endness='Iend_LE')
capstone_x86_syntax

Get the current syntax Capstone uses for x86. It can be ‘intel’ or ‘at&t’

Returns: Capstone’s current x86 syntax
Return type: str
keystone_x86_syntax

Get the current syntax Keystone uses for x86. It can be ‘intel’, ‘at&t’, ‘nasm’, ‘masm’, ‘gas’ or ‘radix16’

Returns: Keystone’s current x86 syntax
Return type: str
asm(string, addr=0, as_bytes=True, thumb=False)

Compile the assembly instruction represented by string using Keystone

Parameters:
  • string – The textual assembly instructions, separated by semicolons
  • addr – The address at which the text should be assembled, to deal with PC-relative access. Default 0
  • as_bytes – Set to False to return a list of integers instead of a python byte string
  • thumb – If working with an ARM processor, set to True to assemble in thumb mode.
Returns:

The assembled bytecode

class archinfo.arch_soot.ArchSoot(endness='Iend_BE')
library_search_path(pedantic=False)

A list of paths in which to search for shared libraries.

Utilities

class archinfo.tls.TLSArchInfo(variant, tcbhead_size, head_offsets, dtv_offsets, pthread_offsets, tp_offset, dtv_entry_offset)

Create new instance of TLSArchInfo(variant, tcbhead_size, head_offsets, dtv_offsets, pthread_offsets, tp_offset, dtv_entry_offset)

dtv_entry_offset

Alias for field number 6

dtv_offsets

Alias for field number 3

head_offsets

Alias for field number 2

pthread_offsets

Alias for field number 4

tcbhead_size

Alias for field number 1

tp_offset

Alias for field number 5

variant

Alias for field number 0
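The "Alias for field number N" entries above are the standard documentation that Python generates for named tuple fields. The sketch below rebuilds an equivalent named tuple with the standard library to show how field names and positions correspond. TLSArchInfoSketch is a hypothetical stand-in, and the field values are placeholders rather than real TLS layout data.

```python
from collections import namedtuple

# Hypothetical stand-in with the same field order as documented above.
TLSArchInfoSketch = namedtuple(
    "TLSArchInfoSketch",
    [
        "variant",
        "tcbhead_size",
        "head_offsets",
        "dtv_offsets",
        "pthread_offsets",
        "tp_offset",
        "dtv_entry_offset",
    ],
)

# Placeholder values, chosen only to make the example runnable.
info = TLSArchInfoSketch(
    variant=2,
    tcbhead_size=64,
    head_offsets=[16],
    dtv_offsets=[8],
    pthread_offsets=[0],
    tp_offset=0,
    dtv_entry_offset=0,
)

# Each field is reachable by name or by its position in the tuple.
print(info.variant == info[0])           # True
print(info.dtv_entry_offset == info[6])  # True

# Named tuples are immutable; _replace returns a modified copy.
updated = info._replace(tp_offset=16)
```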

Errors

exception archinfo.archerror.ArchError