ESIL
最后更新于
这有帮助吗?
ESIL stands for 'Evaluable Strings Intermediate Language'. It aims to describe a -like representation for every target CPU opcode semantics. ESIL representations can be evaluated (interpreted) in order to emulate individual instructions. Each command of an ESIL expression is separated by a comma. Its virtual machine can be described as this:
As we can see ESIL uses a stack-based interpreter similar to what is commonly used for calculators. You have two categories of inputs: values and operators. A value simply gets pushed on the stack, an operator then pops values (its arguments if you will) off the stack, performs its operation and pushes its results (if any) back on. We can think of ESIL as a post-fix notation of the operations we want to do.
So let's see an example:
Can you guess what this is? If we take this post-fix notation and transform it back to in-fix we get
We can see that this corresponds to the x86 instruction push ebp
! Isn't that cool? The aim is to be able to express most of the common operations performed by CPUs, like binary arithmetic operations, memory loads and stores, processing syscalls. This way if we can transform the instructions to ESIL we can see what a program does while it is running even for the most cryptic architectures you definitely don't have a device to debug on for.
r2's visual mode is great to inspect the ESIL evaluations.
There are 2 environment variables that are important for watching what a program does:
asm.emu
tells r2 if you want ESIL information to be displayed. If it is set to true, you will see comments appear to the right of your disassembly that tell you how the contents of registers and memory addresses are changed by the current instruction. For example, if you have an instruction that subtracts a value from a register it tells you what the value was before and what it becomes after. This is super useful so you don't have to sit there yourself and track which value goes where.
One problem with this is that it is a lot of information to take in at once and sometimes you simply don't need it. r2 has a nice compromise for this. That is what the emu.str
variable is for (asm.emustr
on <= 2.2). Instead of this super verbose output with every register value, this only adds really useful information to the output, e.g., strings that are found at addresses a program uses or whether a jump is likely to be taken or not.
The third important variable is asm.esil
. This switches your disassembly to no longer show you the actual disassembled instructions, but instead now shows you corresponding ESIL expressions that describe what the instruction does. So if you want to take a look at how instructions are expressed in ESIL simply set "asm.esil" to true.
In visual mode you can also toggle this by simply typing O
.
"ae" : Evaluate ESIL expression.
"aes" : ESIL Step.
"aeso" : ESIL Step Over.
"aesu" : ESIL Step Until.
"ar" : Show/modify ESIL registry.
Here is the complete instruction set used by the ESIL VM:
ESIL Opcode
Operands
Name
Operation
example
TRAP
src
Trap
Trap signal
$
src
Syscall
syscall
$$
src
Instruction address
Get address of current instruction stack=instruction address
==
src,dst
Compare
stack = (dst == src) ; update_eflags(dst - src)
<
src,dst
Smaller (signed comparison)
stack = (dst < src) ; update_eflags(dst - src)
[0x0000000]> "ae 1,5,<" 0x0 > "ae 5,5" 0x0"
<=
src,dst
Smaller or Equal (signed comparison)
stack = (dst <= src) ; update_eflags(dst - src)
[0x0000000]> "ae 1,5,<" 0x0 > "ae 5,5" 0x1"
>
src,dst
Bigger (signed comparison)
stack = (dst > src) ; update_eflags(dst - src)
> "ae 1,5,>" 0x1 > "ae 5,5,>" 0x0
>=
src,dst
Bigger or Equal (signed comparison)
stack = (dst >= src) ; update_eflags(dst - src)
> "ae 1,5,>=" 0x1 > "ae 5,5,>=" 0x1
<<
src,dst
Shift Left
stack = dst << src
> "ae 1,1,<<" 0x2 > "ae 2,1,<<" 0x4
>>
src,dst
Shift Right
stack = dst >> src
> "ae 1,4,>>" 0x2 > "ae 2,4,>>" 0x1
<<<
src,dst
Rotate Left
stack=dst ROL src
> "ae 31,1,<<<" 0x80000000 > "ae 32,1,<<<" 0x1
>>>
src,dst
Rotate Right
stack=dst ROR src
> "ae 1,1,>>>" 0x80000000 > "ae 32,1,>>>" 0x1
&
src,dst
AND
stack = dst & src
> "ae 1,1,&" 0x1 > "ae 1,0,&" 0x0 > "ae 0,1,&" 0x0 > "ae 0,0,&" 0x0
|
src,dst
OR
stack = dst | src
> "ae 1,1,|" 0x1 > "ae 1,0,|" 0x1 > "ae 0,1,|" 0x1 > "ae 0,0,|" 0x0
^
src,dst
XOR
stack = dst ^src
> "ae 1,1,^" 0x0 > "ae 1,0,^" 0x1 > "ae 0,1,^" 0x1 > "ae 0,0,^" 0x0
+
src,dst
ADD
stack = dst + src
> "ae 3,4,+" 0x7 > "ae 5,5,+" 0xa
-
src,dst
SUB
stack = dst - src
> "ae 3,4,-" 0x1 > "ae 5,5,-" 0x0 > "ae 4,3,-" 0xffffffffffffffff
*
src,dst
MUL
stack = dst * src
> "ae 3,4,*" 0xc > "ae 5,5,*" 0x19
/
src,dst
DIV
stack = dst / src
> "ae 2,4,/" 0x2 > "ae 5,5,/" 0x1 > "ae 5,9,/" 0x1
%
src,dst
MOD
stack = dst % src
> "ae 2,4,%" 0x0 > "ae 5,5,%" 0x0 > "ae 5,9,%" 0x4
!
src
NEG
stack = !!!src
> "ae 1,!" 0x0 > "ae 4,!" 0x0 > "ae 0,!" 0x1
++
src
INC
stack = src++
> ar r_00=0;ar r_00 0x00000000 > "ae r_00,++" 0x1 > ar r_00 0x00000000 > "ae 1,++" 0x2
--
src
DEC
stack = src--
> ar r_00=5;ar r_00 0x00000005 > "ae r_00,--" 0x4 > ar r_00 0x00000005 > "ae 5,--" 0x4
=
src,reg
EQU
reg = src
> "ae 3,r_00,=" > aer r_00 0x00000003 > "ae r_00,r_01,=" > aer r_01 0x00000003
+=
src,reg
ADD eq
reg = reg + src
> ar r_01=5;ar r_00=0;ar r_00 0x00000000 > "ae r_01,r_00,+=" > ar r_00 0x00000005 > "ae 5,r_00,+=" > ar r_00 0x0000000a
-=
src,reg
SUB eq
reg = reg - src
> "ae r_01,r_00,-=" > ar r_00 0x00000004 > "ae 3,r_00,-=" > ar r_00 0x00000001
*=
src,reg
MUL eq
reg = reg * src
> ar r_01=3;ar r_00=5;ar r_00 0x00000005 > "ae r_01,r_00,*=" > ar r_00 0x0000000f > "ae 2,r_00,*=" > ar r_00 0x0000001e
/=
src,reg
DIV eq
reg = reg / src
> ar r_01=3;ar r_00=6;ar r_00 0x00000006 > "ae r_01,r_00,/=" > ar r_00 0x00000002 > "ae 1,r_00,/=" > ar r_00 0x00000002
%=
src,reg
MOD eq
reg = reg % src
> ar r_01=3;ar r_00=7;ar r_00 0x00000007 > "ae r_01,r_00,%=" > ar r_00 0x00000001 > ar r_00=9;ar r_00 0x00000009 > "ae 5,r_00,%=" > ar r_00 0x00000004
<<=
src,reg
Shift Left eq
reg = reg << src
> ar r_00=1;ar r_01=1;ar r_01 0x00000001 > "ae r_00,r_01,<<=" > ar r_01 0x00000002 > "ae 2,r_01,<<=" > ar r_01 0x00000008
>>=
src,reg
Shift Right eq
reg = reg << src
> ar r_00=1;ar r_01=8;ar r_01 0x00000008 > "ae r_00,r_01,>>=" > ar r_01 0x00000004 > "ae 2,r_01,>>=" > ar r_01 0x00000001
&=
src,reg
AND eq
reg = reg & src
> ar r_00=2;ar r_01=6;ar r_01 0x00000006 > "ae r_00,r_01,&=" > ar r_01 0x00000002 > "ae 2,r_01,&=" > ar r_01 0x00000002 > "ae 1,r_01,&=" > ar r_01 0x00000000
|=
src,reg
OR eq
reg = reg | src
> ar r_00=2;ar r_01=1;ar r_01 0x00000001 > "ae r_00,r_01,|=" > ar r_01 0x00000003 > "ae 4,r_01,|=" > ar r_01 0x00000007
^=
src,reg
XOR eq
reg = reg ^ src
> ar r_00=2;ar r_01=0xab;ar r_01 0x000000ab > "ae r_00,r_01,^=" > ar r_01 0x000000a9 > "ae 2,r_01,^=" > ar r_01 0x000000ab
++=
reg
INC eq
reg = reg + 1
> ar r_00=4;ar r_00 0x00000004 > "ae r_00,++=" > ar r_00 0x00000005
--=
reg
DEC eq
reg = reg - 1
> ar r_00=4;ar r_00 0x00000004 > "ae r_00,--=" > ar r_00 0x00000003
!=
reg
NOT eq
reg = !reg
> ar r_00=4;ar r_00 0x00000004 > "ae r_00,!=" > ar r_00 0x00000000 > "ae r_00,!=" > ar r_00 0x00000001
---
---
---
---
----------------------------------------------
=[] =[*] =[1] =[2] =[4] =[8]
src,dst
poke
*dst=src
> "ae 0xdeadbeef,0x10000,=[4]," > pxw 4@0x10000 0x00010000 0xdeadbeef .... > "ae 0x0,0x10000,=[4]," > pxw 4@0x10000 0x00010000 0x00000000
[] [*] [1] [2] [4] [8]
src
peek
stack=*src
> w test@0x10000 > "ae 0x10000,[4]," 0x74736574 > ar r_00=0x10000 > "ae r_00,[4]," 0x74736574
|=[] |=[1] |=[2] |=[4] |=[8]
reg
nombre
code
> >
SWAP
Swap
Swap two top elements
SWAP
PICK
n
Pick
Pick nth element from the top of the stack
2,PICK
RPICK
m
Reverse Pick
Pick nth element from the base of the stack
0,RPICK
DUP
Duplicate
Duplicate top element in stack
DUP
NUM
Numeric
If top element is a reference (register name, label, etc), dereference it and push its real value
NUM
CLEAR
Clear
Clear stack
CLEAR
BREAK
Break
Stops ESIL emulation
BREAK
GOTO
n
Goto
Jumps to Nth ESIL word
GOTO 5
TODO
To Do
Stops execution (reason: ESIL expression not completed)
TODO
ESIL VM has an internal state flags that are read-only and can be used to export those values to the underlying target CPU flags. It is because the ESIL VM always calculates all flag changes, while target CPUs only update flags under certain conditions or at specific instructions.
Internal flags are prefixed with $
character.
A target opcode is translated into a comma separated list of ESIL expressions.
Memory access is defined by brackets operation:
Default operand size is determined by size of operation destination.
The ?
operator uses the value of its argument to decide whether to evaluate the expression in curly braces.
Is the value zero? -> Skip it.
Is the value non-zero? -> Evaluate it.
If you want to run several expressions under a conditional, put them in curly braces:
Whitespaces, newlines and other chars are ignored. So the first thing when processing a ESIL program is to remove spaces:
Syscalls need special treatment. They are indicated by '$' at the beginning of an expression. You can pass an optional numeric value to specify a number of syscall. An ESIL emulator must handle syscalls. See (r_esil_syscall).
As discussed on IRC, the current implementation works like this:
This approach is more readable, but it is less stack-friendly.
NOPs are represented as empty strings. As it was said previously, syscalls are marked by '$' command. For example, '0x80,$'. It delegates emulation from the ESIL machine to a callback which implements syscalls for a specific OS/kernel.
Traps are implemented with the TRAP
command. They are used to throw exceptions for invalid instructions, division by zero, memory read error, or any other needed by specific architectures.
Here is a list of some quick checks to retrieve information from an ESIL string. Relevant information will be probably found in the first expression of the list.
Common operations:
Check dstreg
Check srcreg
Get destinaion
Is jump
Is conditional
Evaluate
Is syscall
CPU flags are usually defined as single bit registers in the RReg profile. They and sometimes found under the 'flg' register type.
Properties of the VM variables:
They have no predefined bit width. This way it should be easy to extend them to 128, 256 and 512 bits later, e.g. for MMX, SSE, AVX, Neon SIMD.
There can be unbound number of variables. It is done for SSA-form compatibility.
Register names have no specific syntax. They are just strings.
Numbers can be specified in any base supported by RNum (dec, hex, oct, binary ...).
Each ESIL backend should have an associated RReg profile to describe the ESIL register specs.
What to do with them? What about bit arithmetics if use variables instead of registers?
ADD ("+")
MUL ("*")
SUB ("-")
DIV ("/")
MOD ("%")
AND "&"
OR "|"
XOR "^"
SHL "<<"
SHR ">>"
ROL "<<<"
ROR ">>>"
NEG "!"
At the moment of this writing, ESIL does not yet support FPU. But you can implement support for unsupported instructions using r2pipe. Eventually we will get proper support for multimedia and floating point.
ESIL specifies that the parsing control-flow commands must be uppercase. Bear in mind that some architectures have uppercase register names. The corresponding register profile should take care not to reuse any of the following:
rep cmpsb
Those are expressed with the 'TODO' command. They act as a 'BREAK', but displays a warning message describing that an instruction is not implemented and will not be emulated. For example:
To ease ESIL parsing we should have a way to express introspection expressions to extract the data that we want. For example, we may want to get the target address of a jump. The parser for ESIL expressions should offer an API to make it possible to extract information by analyzing the expressions easily.
We need a way to retrieve the numeric value of 'rip'. This is a very simple example, but there are more complex, like conditional ones. We need expressions to be able to get:
opcode type
destination of a jump
condition depends on
all regs modified (write)
all regs accessed (read)
It is important for emulation to be able to setup hooks in the parser, so we can extend it to implement analysis without having to change it again and again. That is, every time an operation is about to be executed, a user hook is called. It can be used for example to determine if RIP
is going to change, or if the instruction updates the stack. Later, we can split that callback into several ones to have an event-based analysis API that may be extended in JavaScript like this:
For the API, see the functions hook_flag_read()
, hook_execute()
and hook_mem_read()
. A callback should return true or 1 if you want to override the action that it takes. For example, to deny memory reads in a region, or voiding memory writes, effectively making it read-only. Return false or 0 if you want to trace ESIL expression parsing.
Other operations require bindings to external functionalities to work. In this case, r_ref
and r_io
. This must be defined when initializing the ESIL VM.
Io Get/Set
Selectors (cs,ds,gs...)