Loading a Binary

Last updated on May 29, 2023


angr loads binaries through its CLE module, so CLE's interfaces can also be used directly from angr.

The Loader

A binary can be loaded with just a few lines of code:

import angr, monkeyhex  # monkeyhex makes the REPL print integers in hex
proj = angr.Project('examples/fauxware/fauxware')
proj.loader
<Loaded fauxware, maps [0x400000:0x5008000]>

Loaded Objects

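As mentioned above, angr uses CLE to load binaries. The CLE loader (cle.Loader) loads all of the binary objects involved (the program, its libraries and a few helper objects) and maps them into a single memory space. Each type of binary object is handled by a loader backend, a subclass of cle.Backend; for example, cle.ELF loads ELF files.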
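To make the idea of per-format backends concrete, here is a toy sketch of how a loader could pick a backend by looking at a file's magic bytes. This is not CLE's code: the function pick_backend and the backend names it returns are hypothetical, and CLE's real selection logic covers many more formats. In this sketch an unrecognised file returns None instead of guessing.

```python
from typing import Optional

# Magic numbers at the start of common executable formats.
ELF_MAGIC = b"\x7fELF"
PE_MAGIC = b"MZ"  # DOS header that begins every PE file
MACHO64_MAGIC = b"\xcf\xfa\xed\xfe"  # 64-bit Mach-O, little-endian byte order


def pick_backend(header: bytes) -> Optional[str]:
    """Toy backend selection: name a backend able to load a file starting with `header`."""
    if header.startswith(ELF_MAGIC):
        return "elf"
    if header.startswith(PE_MAGIC):
        return "pe"
    if header.startswith(MACHO64_MAGIC):
        return "mach-o"
    # Unrecognised format: in this sketch the caller must choose a backend explicitly.
    return None


print(pick_backend(b"\x7fELF\x02\x01\x01"))  # elf
print(pick_backend(b"\x90\x90\x90"))         # None
```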

loader.all_objects

loader.all_objects returns a list of every object CLE has loaded. The loader also offers several other ways to reach individual objects:

# All loaded objects
>>> proj.loader.all_objects
[<ELF Object fauxware, maps [0x400000:0x60105f]>,
<ELF Object libc-2.23.so, maps [0x1000000:0x13c999f]>,
<ELF Object ld-2.23.so, maps [0x2000000:0x2227167]>,
<ELFTLSObject Object cle##tls, maps [0x3000000:0x3015010]>,
<ExternObject Object cle##externs, maps [0x4000000:0x4008000]>,
<KernelObject Object cle##kernel, maps [0x5000000:0x5008000]>]

# The main object, i.e. the binary we asked angr to load
>>> proj.loader.main_object
<ELF Object fauxware, maps [0x400000:0x60105f]>

# Dictionary mapping each shared object's name to the object
>>> proj.loader.shared_objects
{ 'fauxware': <ELF Object fauxware, maps [0x400000:0x60105f]>,
'libc.so.6': <ELF Object libc-2.23.so, maps [0x1000000:0x13c999f]>,
'ld-linux-x86-64.so.2': <ELF Object ld-2.23.so, maps [0x2000000:0x2227167]> }

# All objects loaded from ELF files
# (for PE binaries, use all_pe_objects)
>>> proj.loader.all_elf_objects
[<ELF Object fauxware, maps [0x400000:0x60105f]>,
<ELF Object libc-2.23.so, maps [0x1000000:0x13c999f]>,
<ELF Object ld-2.23.so, maps [0x2000000:0x2227167]>]

# Object that provides addresses for unresolved imports and angr internals
>>> proj.loader.extern_object
<ExternObject Object cle##externs, maps [0x4000000:0x4008000]>

# Object that provides addresses for emulated syscalls
>>> proj.loader.kernel_object
<KernelObject Object cle##kernel, maps [0x5000000:0x5008000]>

# Get the object that contains a given address
>>> proj.loader.find_object_containing(0x400000)
<ELF Object fauxware, maps [0x400000:0x60105f]>
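Both find_object_containing and the loader's overall `maps [0x400000:0x5008000]` range can be reproduced with a small pure-Python model built from the address ranges printed above. This is only an illustration: OBJECTS and loader_range are hypothetical names, CLE's own lookup is more involved, and treating max_addr as an inclusive upper bound is an assumption of this sketch.

```python
from typing import Optional, Tuple

# (name, min_addr, max_addr) copied from the all_objects listing above.
OBJECTS = [
    ("fauxware", 0x400000, 0x60105F),
    ("libc-2.23.so", 0x1000000, 0x13C999F),
    ("ld-2.23.so", 0x2000000, 0x2227167),
    ("cle##tls", 0x3000000, 0x3015010),
    ("cle##externs", 0x4000000, 0x4008000),
    ("cle##kernel", 0x5000000, 0x5008000),
]


def find_object_containing(addr: int) -> Optional[str]:
    """Return the name of the object whose [min_addr, max_addr] range holds addr."""
    for name, lo, hi in OBJECTS:
        if lo <= addr <= hi:  # upper bound treated as inclusive (assumption)
            return name
    return None  # the address falls in a gap between objects


def loader_range() -> Tuple[int, int]:
    """The loader's overall span: lowest start to highest end over all objects."""
    return min(lo for _, lo, _ in OBJECTS), max(hi for _, _, hi in OBJECTS)


print(find_object_containing(0x400000))   # fauxware
print(find_object_containing(0x1089CD0))  # libc-2.23.so
print([hex(x) for x in loader_range()])   # ['0x400000', '0x5008000']
```

Note that 0x1089cd0 is the address of strcmp shown later in this post, so the model places it in libc as expected.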

Interacting with Objects

obj = proj.loader.main_object

# Entry point
obj.entry
0x400580

# Lowest and highest addresses occupied by the object
obj.min_addr, obj.max_addr
(0x400000, 0x60105f)

# segments
obj.segments
<Regions: [<ELFSegment memsize=0xa74, filesize=0xa74, vaddr=0x400000, flags=0x5, offset=0x0>,
<ELFSegment memsize=0x238, filesize=0x228, vaddr=0x600e28, flags=0x6, offset=0xe28>]>
# sections
obj.sections
<Regions: [<Unnamed | offset 0x0, vaddr 0x0, size 0x0>,
<.interp | offset 0x238, vaddr 0x400238, size 0x1c>,
<.note.ABI-tag | offset 0x254, vaddr 0x400254, size 0x20>,
...etc

# Find the segment or section that contains an address
obj.find_segment_containing(obj.entry)
<ELFSegment memsize=0xa74, filesize=0xa74, vaddr=0x400000, flags=0x5, offset=0x0>
obj.find_section_containing(obj.entry)
<.text | offset 0x580, vaddr 0x400580, size 0x338>

# Address of a symbol's PLT stub, and the reverse mapping from PLT address to name
addr = obj.plt['strcmp']
addr
0x400550
obj.reverse_plt[addr]
'strcmp'

# Show the prelinked base of the object and the location it was actually mapped into memory by CLE
obj.linked_base
0x400000
obj.mapped_base
0x400000
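The flags and sizes in the segment listing can be decoded by hand. The sketch below is plain Python, not angr: it applies the ELF program-header permission bits (p_flags) to the two segments shown above. It also shows that the second (data) segment occupies 0x10 more bytes in memory than in the file, space that is zero-filled at load time (typically .bss), and that the end of that segment matches obj.max_addr.

```python
# ELF program-header permission bits (p_flags), as defined by the ELF specification.
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def perms(flags: int) -> str:
    """Render p_flags as an rwx-style string, e.g. 0x5 -> 'r-x'."""
    return (
        ("r" if flags & PF_R else "-")
        + ("w" if flags & PF_W else "-")
        + ("x" if flags & PF_X else "-")
    )


# Values copied from obj.segments above: (vaddr, memsize, filesize, flags)
segments = [
    (0x400000, 0xA74, 0xA74, 0x5),
    (0x600E28, 0x238, 0x228, 0x6),
]

for vaddr, memsize, filesize, flags in segments:
    zero_fill = memsize - filesize  # bytes with no file backing, zero-filled at load
    print(hex(vaddr), perms(flags), "zero-filled:", hex(zero_fill))
# 0x400000 r-x zero-filled: 0x0
# 0x600e28 rw- zero-filled: 0x10

last_vaddr, last_memsize, _, _ = segments[-1]
print(hex(last_vaddr + last_memsize - 1))  # 0x60105f, the same as obj.max_addr
```

So the first segment is the read-only, executable code segment and the second is the writable data segment.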

Symbols and Relocations

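Symbols are a fundamental concept in executable formats: a symbol maps a name to an address.

loader.find_symbol looks up a symbol. It accepts either a name or an address and returns a Symbol object: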

strcmp = proj.loader.find_symbol('strcmp')
strcmp
<Symbol "strcmp" in libc.so.6 at 0x1089cd0>

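A symbol's address is available in three forms:

.rebased_addr is its address in the global (loaded) address space.

.linked_addr is its address relative to the object's prelinked base.

.relative_addr is its address relative to the object's base, i.e. its RVA (relative virtual address).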

strcmp.name
'strcmp'

strcmp.owner
<ELF Object libc-2.23.so, maps [0x1000000:0x13c999f]>

strcmp.rebased_addr
0x1089cd0
strcmp.linked_addr
0x89cd0
strcmp.relative_addr
0x89cd0
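The three address forms differ only in which base is added to the offset: rebased_addr = mapped_base + relative_addr and linked_addr = linked_base + relative_addr. The pure-Python model below checks this against the strcmp numbers above. Because linked_addr equals relative_addr in that output, libc's linked base is 0, and CLE mapped it at 0x1000000. ToyObject, ToySymbol and this find_symbol are hypothetical stand-ins, not CLE classes; the lookup mimics find_symbol accepting either a name or an address.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ToyObject:
    name: str
    linked_base: int  # base address the object was prelinked for
    mapped_base: int  # base address the loader actually mapped it at


@dataclass
class ToySymbol:
    name: str
    owner: ToyObject
    relative_addr: int  # offset from the object's base (RVA)

    @property
    def linked_addr(self) -> int:
        return self.owner.linked_base + self.relative_addr

    @property
    def rebased_addr(self) -> int:
        return self.owner.mapped_base + self.relative_addr


# Numbers taken from the strcmp example above.
libc = ToyObject("libc-2.23.so", linked_base=0x0, mapped_base=0x1000000)
symbols = [ToySymbol("strcmp", libc, relative_addr=0x89CD0)]


def find_symbol(key: Union[str, int]) -> Optional[ToySymbol]:
    """Look a symbol up by name (str) or by rebased address (int)."""
    for sym in symbols:
        if sym.name == key or sym.rebased_addr == key:
            return sym
    return None


strcmp = find_symbol("strcmp")
print(hex(strcmp.rebased_addr), hex(strcmp.linked_addr), hex(strcmp.relative_addr))
# 0x1089cd0 0x89cd0 0x89cd0
print(find_symbol(0x1089CD0).name)  # strcmp
```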

Loading Options

angr.Project accepts options that control how the binary is loaded:

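1. auto_load_libs: whether to automatically load shared-library dependencies (enabled by default)
2. skip_libs: names of libraries that should never be loaded
3. ld_path: additional directories to search for shared libraries
4. backend: which loader backend to use, e.g. blob
5. base_addr: base address at which to load the object
6. entry_point: entry point address
7. arch: architecture
Options 4-7 are per-object options: pass them in main_opts (for the main binary) or in lib_opts (a dict keyed by library name).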
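As a sketch of how these options fit together, the two dictionaries below separate project-wide options from per-object options nested under main_opts. All values are placeholders, and firmware.bin is a hypothetical file. The angr calls are left as comments so the snippet runs without angr installed.

```python
# Project-wide options for loading an ELF program (placeholder values).
elf_options = {
    "auto_load_libs": True,            # load shared-library dependencies (the default)
    "skip_libs": ["libpthread.so.0"],  # ...except these, which are never loaded
    "ld_path": ["/opt/mylibs"],        # extra directories searched for libraries
}

# Per-object options go in main_opts (main binary) or lib_opts (keyed by library name).
# Here: a raw image with no headers, so the loader has to be told everything.
blob_options = {
    "main_opts": {
        "backend": "blob",       # treat the file as raw bytes
        "arch": "i386",          # a blob carries no architecture information
        "base_addr": 0x8000000,  # where to map the image
        "entry_point": 0x8000000,
    },
}

# With angr installed, each dict would be passed as keyword arguments, e.g.
#   angr.Project("examples/fauxware/fauxware", **elf_options)
#   angr.Project("firmware.bin", **blob_options)   # hypothetical file
print(sorted(blob_options["main_opts"]))  # ['arch', 'backend', 'base_addr', 'entry_point']
```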

Symbolic Function Summaries

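By default, Project tries to replace external calls to library functions with symbolic summaries called SimProcedures, which model the library function's effect on the program state. angr already provides SimProcedures for a large set of library functions.

They are available in the angr.SIM_PROCEDURES dictionary, which has two levels: it is keyed first by package name (libc, posix, win32, stubs, and so on) and then by function name, and each value is a SimProcedure class.

If no SimProcedure exists for an external function:

If auto_load_libs is True (the default), the real library function is executed instead.

If auto_load_libs is False, the external function is left unresolved, and Project resolves it to a generic stub SimProcedure, ReturnUnconstrained, which returns a fresh unconstrained symbolic value each time it is called.

use_sim_procedures is a parameter of angr.Project (not of cle.Loader) and defaults to True. If it is False, only symbols provided by the extern object are replaced with SimProcedures, and they are replaced by the ReturnUnconstrained stub.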
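The rules above can be summarised as a small decision function. This is a simplified, hypothetical model, not angr's resolution algorithm. SIMPROCS is a tiny stand-in for angr.SIM_PROCEDURES that holds names instead of classes, and the sketch assumes a library is missing only because auto_load_libs is False, so that missing symbols are exactly the ones the extern object provides.

```python
# Tiny stand-in for angr.SIM_PROCEDURES: package name -> function names.
# (The real dictionary maps function names to SimProcedure classes.)
SIMPROCS = {"libc": {"strcmp", "puts", "malloc"}}


def has_simprocedure(name: str) -> bool:
    return any(name in funcs for funcs in SIMPROCS.values())


def what_runs(name: str, auto_load_libs: bool = True, use_sim_procedures: bool = True) -> str:
    """Simplified model of what gets executed for a call to external function `name`."""
    # Assumption: the library is loaded exactly when auto_load_libs is True.
    lib_loaded = auto_load_libs
    if use_sim_procedures and has_simprocedure(name):
        return "SimProcedure"
    if lib_loaded:
        return "real library code"
    # Unresolved symbol: the extern object provides it, backed by the generic stub.
    return "ReturnUnconstrained"


print(what_runs("strcmp"))                               # SimProcedure
print(what_runs("my_func"))                              # real library code
print(what_runs("my_func", auto_load_libs=False))        # ReturnUnconstrained
print(what_runs("strcmp", use_sim_procedures=False))     # real library code
```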

Hook

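Hook an address with proj.hook(addr, hook), where hook is a SimProcedure instance. You can manage your project's hooks with .is_hooked, .unhook, and .hooked_by, which should hopefully not require explanation.

proj.hook(addr) can also be used as a decorator on a plain Python function. In that case you can pass an optional length argument: after the hook finishes, execution jumps length bytes forward, skipping that many bytes of the original instructions.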

stub_func = angr.SIM_PROCEDURES['stubs']['ReturnUnconstrained'] # this is a CLASS
proj.hook(0x10000, stub_func()) # hook with an instance of the class

proj.is_hooked(0x10000) # these functions should be pretty self-explanatory
True
proj.hooked_by(0x10000)
<ReturnUnconstrained>

proj.unhook(0x10000)

@proj.hook(0x20000, length=5)
def my_hook(state):
    state.regs.rax = 1  # runs in place of the 5 bytes of code at 0x20000

proj.is_hooked(0x20000)
True
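To show what length does, here is a toy executor in plain Python. It is not angr: ToyProject and ToyState are hypothetical classes that only model address hooks. When execution reaches a hooked address, the Python function runs instead of the original bytes, and the program counter then advances by length, mirroring the 0x20000 / length=5 example above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

HookFn = Callable[["ToyState"], None]


@dataclass
class ToyState:
    pc: int
    regs: Dict[str, int] = field(default_factory=dict)


class ToyProject:
    """Minimal model of address hooks: run Python code instead of the bytes at addr."""

    def __init__(self) -> None:
        self._hooks: Dict[int, Tuple[HookFn, int]] = {}

    def hook(self, addr: int, length: int):
        def register(func: HookFn) -> HookFn:
            self._hooks[addr] = (func, length)
            return func
        return register

    def is_hooked(self, addr: int) -> bool:
        return addr in self._hooks

    def unhook(self, addr: int) -> None:
        del self._hooks[addr]

    def step(self, state: ToyState) -> None:
        """Execute one hooked address: call the hook, then skip `length` bytes."""
        func, length = self._hooks[state.pc]
        func(state)
        state.pc += length


proj = ToyProject()


@proj.hook(0x20000, length=5)
def my_hook(state: ToyState) -> None:
    state.regs["rax"] = 1


state = ToyState(pc=0x20000)
proj.step(state)
print(hex(state.pc), state.regs["rax"])  # 0x20005 1
```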

The angr documentation describes symbol hooks as follows:

Furthermore, you can use proj.hook_symbol(name, hook), providing the name of a symbol as the first argument, to hook the address where the symbol lives. One very important usage of this is to extend the behavior of angr’s built-in library SimProcedures. Since these library functions are just classes, you can subclass them, overriding pieces of their behavior, and then use your subclass in a hook.

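In other words, proj.hook_symbol(name, hook) lets us replace one of angr's built-in library functions with our own implementation, for example a subclass of the built-in SimProcedure that overrides part of its behavior.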
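The two ideas in that quote, that hook_symbol is a hook at the symbol's address and that built-in procedures can be extended by subclassing, can be modelled in plain Python. Everything here is hypothetical and concrete rather than symbolic. ToyProcedure stands in for SimProcedure, ToyStrlen for a built-in library model, and SYMBOLS for a symbol table whose strlen address is made up.

```python
from typing import Dict


class ToyProcedure:
    """Stand-in for a SimProcedure: a class whose run() models a library function."""

    def run(self, *args):
        raise NotImplementedError


class ToyStrlen(ToyProcedure):
    """Hypothetical built-in model of strlen (concrete, not symbolic)."""

    def run(self, s: bytes) -> int:
        return s.index(b"\x00") if b"\x00" in s else len(s)


class CappedStrlen(ToyStrlen):
    """Subclass that reuses the built-in behaviour but caps the result at 8."""

    def run(self, s: bytes) -> int:
        return min(super().run(s), 8)


SYMBOLS: Dict[str, int] = {"strlen": 0x1089000}  # hypothetical name -> address table
hooks: Dict[int, ToyProcedure] = {}


def hook_symbol(name: str, proc: ToyProcedure) -> None:
    """Model of hook_symbol: resolve the symbol's address, then hook that address."""
    hooks[SYMBOLS[name]] = proc


hook_symbol("strlen", CappedStrlen())
print(hooks[0x1089000].run(b"a long string\x00"))  # 8
```

The subclass only overrides run() and delegates to the parent through super(), which is the pattern the quoted passage suggests for adjusting angr's built-in SimProcedures.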


Loading a Binary
http://example.com/2023/05/26/Loading a Binary/
Author: yring
Posted on: May 26, 2023
Licensed under