129 lines
5.3 KiB
Plaintext
129 lines
5.3 KiB
Plaintext
Metasm, the Ruby assembly manipulation suite
|
|
============================================
|
|
|
|
* You have some samples in samples/
|
|
* LICENCE is LGPL
|
|
|
|
Author: Yoann Guillot <yoann at ofjj.net>
|
|
|
|
|
|
Basic overview:
|
|
|
|
Metasm allows you to interact with executables formats (ExeFormat):
|
|
PE, ELF, Shellcode, etc
|
|
There are three approaches of an ExeFormat:
|
|
- compiling one up, from scratch ( -> source)
|
|
- decompiling an existing format ( -> blocks)
|
|
- manipulating the file structure( -> encoded)
|
|
|
|
|
|
Assembly:
|
|
|
|
When compiling, you start from a source text (ruby String, consisting
|
|
mostly in a sequence of instructions/data/padding directive), then you parse
|
|
it.
|
|
The string is handed to a Preprocessor (which handles #if, #ifdef, #include,
|
|
#define, comments etc, almost 100% compatible with gcc -E), which is
|
|
encapsulated in an AsmPreprocessor (which handles asm macro definitions, equ and
|
|
asm comments).
|
|
This AsmPreprocessor returns tokens to the ExeFormat, which parses them as Data,
|
|
Padding, Labels or parser directives. Parser directives always start with a dot.
|
|
They can be generic (.pad, .offset...) or ExeFormat-specific (.section,
|
|
.import...).
|
|
If the ExeFormat does not recognize a word, it hands it to its CPU instance,
|
|
which is responsible for parsing Instructions, or raise an exception.
|
|
All these tokens are stored in one or more arrays in the @source attribute of
|
|
the ExeFormat (Shellcode's @source is an Array, for PE/ELF it is a hash of
|
|
section name => Array)
|
|
Every immediate value can be an arbitrary Expression (see later).
|
|
|
|
You can then assemble the source to binary sections.
|
|
|
|
ExeFormat has a constructor to do that: ExeFormat.assemble(cpu, source)
|
|
it parses the source, assemble it, and return the ExeFormat instance.
|
|
|
|
|
|
EncodedData:
|
|
|
|
In Metasm all binary data is stored as an EncodedData.
|
|
EncodedData has 3 main attributes:
|
|
- @data which holds the raw binary data (generally a ruby String, but see
|
|
VirtualString)
|
|
- @export which is a hash associating an export name (label name) to an offset
|
|
within @data
|
|
- @reloc which is a hash whose keys are offsets within @data, and whose values
|
|
are Relocation objects.
|
|
A Relocation object has an endianness (:little/:big), a sign (:signed/:unsigned/:any),
|
|
a size (in bits) and a target.
|
|
The target is an arbitrary arithmetic/logic Expression.
|
|
|
|
EncodedData also has a @virtualsize (for e.g. .bss sections), and a @ptr (used
|
|
when decoding things)
|
|
|
|
You can fixup an EncodedData, with a Hash variable name => value (value should
|
|
be an Expression or a numeric value). When you do that, each relocation's target
|
|
is bound using the binding, and if the result is calculable (no external variable
|
|
name used in the Expression), the result is encoded using the relocation's
|
|
size/sign/endianness information. If it overflows (try to store 128 in an 8bit
|
|
signed relocation), an EncodeError exception is raised.
|
|
If the relocation's target is not numeric, the target is unchanged if you use
|
|
EncodedData#fixup, or it is replaced with the bound target with #fixup! .
|
|
|
|
|
|
Desassembly: (experimental)
|
|
|
|
When decompiling, you start from a decoded ExeFormat (you need to be able to
|
|
say what data is at which virtual address), you specify a virtual address to
|
|
start (virtual address or export name). The ExeFormat starts disassembling
|
|
instructions. When it encounters an Opcode marked as :setip, it calls the CPU
|
|
to find the jump destination, and backtracks instructions until it finds the
|
|
numeric value.
|
|
The disassembled code is stored as InstructionBlocks, whichs holds a list of
|
|
DecodedInstruction, a list of @from and @to (array of block addresses)
|
|
A DecodedInstruction has an Instruction, an Opcode and a bin_length (to allow
|
|
printing the hex dump)
|
|
(experimental for now, does not handle external calls, does not handle well
|
|
subfunctions, should only be used on small shellcodes)
|
|
|
|
Constructor: Shellcode.disassemble(cpu, binary)
|
|
|
|
|
|
ExeFormat manipulation:
|
|
|
|
You can encode/decode an ExeFormat (ie decode sections, imports, headers etc)
|
|
|
|
Constructor: ExeFormat.decode_file(str), ExeFormat.decode_file_header(str)
|
|
Methods: ExeFormat#encode_file(filename), ExeFormat#encode_string
|
|
|
|
|
|
VirtualString:
|
|
|
|
A VirtualString is an object String-like : you can read/maybe write slices of
|
|
it. It can be used as @data in an EncodedData, and thus allows virtualization
|
|
of most Metasm algorithms.
|
|
You cannot change a VirtualString length.
|
|
Taking a slice of a VirtualString can return either a String (length smaller
|
|
than 4096) or another VirtualString. You can force getting a small VirtualString
|
|
using the #dup(from, length) method.
|
|
Any unimplemented method called on it is forwarded to frozen String which is
|
|
a full copy of the VirtualString (should generally not be used).
|
|
|
|
There are currently 3 VirtualStrings implemented:
|
|
- VirtualFile, whichs loads a file by 4096-bytes chunks, on demand,
|
|
- WindowsRemoteString, which maps another process' virtual memory (uses windows
|
|
debug api)
|
|
- LinuxRemoteString, which maps another process' virtual memory (need ptrace
|
|
rights, memory reading is done using /proc/pid/mem)
|
|
|
|
The Win/Lin version are quite powerful, and allow things like live process
|
|
disassembly/patching easily (use LoadedPE/LoadedELF as ExeFormat)
|
|
|
|
|
|
Things planned:
|
|
|
|
Write a C parser (at least for headers), and adding syntax to support C structs
|
|
in assembly.
|
|
Write a good disassembler, supporting external calls through C header parsing,
|
|
recognize/handle sub functions.
|
|
Write an UI for dasm
|