Seccomp

Last updated on March 24, 2024

Seccomp study notes

1. What is Seccomp

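Programs often contain vulnerabilities, intentional or not, and attackers use system calls to carry an exploit further. A typical example is calling execve to get a shell. Developers therefore came up with the idea of secure computing (seccomp). It was first added to the Linux kernel in 2005. Seccomp at that time was very strict and allowed only four basic system calls: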

exit()
sigreturn()
read()
write()

2. Using prctl

The earliest seccomp was enabled through the prctl function. Its prototype is:

#include <sys/prctl.h>

int prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5);

With prctl, the original seccomp mode is enabled like this:

prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);

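This approach is obviously too crude. It makes even basic operations such as malloc very hard, since malloc needs brk or mmap, which strict mode forbids. So a more useful mode was introduced, filter mode, which gives much finer-grained control: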

prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, filter); /* filter: pointer to a struct sock_fprog */

Here the filter is written in Berkeley Packet Filter (BPF) instructions, which are described in detail later.

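Another point to watch with prctl is privileges. Installing a filter requires the CAP_SYS_ADMIN capability; otherwise the PR_SET_NO_NEW_PRIVS bit must be set first. If neither holds, for example when a non-root user runs the program, the prctl call fails with EACCES and the seccomp protection never takes effect. Setting PR_SET_NO_NEW_PRIVS makes seccomp work for every user. It also keeps child processes constrained, including the program that replaces the current one after execve: even when execve swaps in an entirely new binary, its privileges cannot grow, because setuid bits and file capabilities are ignored. Once set, the bit cannot be cleared.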

prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);

3. BPF

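BPF is a programmable packet filtering and classification system that runs inside the kernel, so packets can be inspected without shuttling them between user space and the kernel. It is implemented as a virtual machine in the kernel with a simple but strict instruction set. A BPF program finally returns an integer telling the kernel what to do with the packet. The instructions fall into four groups: arithmetic/logic, loads, stores, and jumps. BPF cannot jump backwards, which guarantees that every filter terminates.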

The virtual machine consists of four parts:

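Accumulator register A

Index register X

Packet memory (for seccomp, the struct seccomp_data that describes the system call)

Scratch memory (16 32-bit slots, M[0] to M[15])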

The instruction mnemonics are:

Operator   Effect
Loads
ld         Load word into A
ldi        Load immediate word into A
ldh        Load half-word into A
ldb        Load byte into A
ldx        Load word into X
ldxi       Load immediate word into X
ldxb       Load byte into X (as 4*(pkt[k] & 0xf), the IP header length)
Stores
st         Store A into M[]
stx        Store X into M[]
Jumps
jmp        Jump to offset
ja         Jump to offset
jeq        Jump on A == k/X
jneq       Jump on A != k/X
jne        Jump on A != k/X
jlt        Jump on A < k/X
jle        Jump on A <= k/X
jgt        Jump on A > k/X
jge        Jump on A >= k/X
jset       Jump on A & k/X
Arithmetic (result stored in A)
add        A + k/X
sub        A - k/X
mul        A * k/X
div        A / k/X
mod        A % k/X
neg        -A
and        A & k/X
or         A | k/X
xor        A ^ k/X
lsh        A << k/X
rsh        A >> k/X
Misc
tax        Copy A into X
txa        Copy X into A
ret        Return

In C, each instruction is described by a struct:

struct sock_filter {
uint16_t code; /* the opcode */
uint8_t jt; /* if true: jump displacement */
uint8_t jf; /* if false: jump displacement */
uint32_t k; /* immediate operand */
};

How k is interpreted depends on the addressing mode:

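Mode      Meaning of k
BPF_IMM   Immediate value
BPF_ABS   Absolute offset into packet memory: pkt[k]
BPF_IND   Offset into packet memory relative to X: pkt[X + k]
BPF_MEM   Index into scratch memory: M[k]
BPF_LEN   Unused; loads the packet length
BPF_MSH   Loads 4*(pkt[k] & 0xf), the IP header length, into X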

Operand sizes:

Size    Bytes
BPF_W   4
BPF_H   2
BPF_B   1

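A basic BPF opcode combines three parts:

  1. the instruction class
  2. the specific operation within that class
  3. the addressing mode (for loads, together with the operand size)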

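For example, suppose we want to divide register A by the 32-bit value in scratch-memory slot M[3]. Classic BPF ALU instructions take their operand either from the constant k (BPF_K) or from register X (BPF_X). So this takes two instructions: first ldx M[3], then div x.

struct sock_filter div_insns[] = {
/* ldx M[3]: X <- M[3] */
{ .code = BPF_LDX + BPF_W + BPF_MEM, .k = 3 },
/* div x: A <- A / X */
{ .code = BPF_ALU + BPF_DIV + BPF_X, .k = 0 },
};

In the first instruction, BPF_LDX is the instruction class and BPF_W says the operand is 4 bytes. BPF_MEM says the operand lives in scratch memory, so k is the scratch-memory index. In the second instruction, BPF_ALU is the class, BPF_DIV is the specific operation, and BPF_X says the divisor comes from X.

A single BPF_ALU + BPF_DIV + BPF_MEM + BPF_W instruction does not work. For ALU instructions, the bits that hold the addressing mode in a load hold the operation instead. Adding BPF_MEM (0x60) to BPF_DIV (0x30) would therefore produce 0x90, which is BPF_MOD.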

#define BPF_CLASS(code) ((code) & 0x07)     // instruction class
#define BPF_LD 0x00 // load a value into A
#define BPF_LDX 0x01
#define BPF_ST 0x02
#define BPF_STX 0x03
#define BPF_ALU 0x04
#define BPF_JMP 0x05
#define BPF_RET 0x06
#define BPF_MISC 0x07

/* ld/ldx fields */
#define BPF_SIZE(code) ((code) & 0x18) // operand size for loads
#define BPF_W 0x00
#define BPF_H 0x08
#define BPF_B 0x10

#define BPF_MODE(code) ((code) & 0xe0) // addressing mode
#define BPF_IMM 0x00
#define BPF_ABS 0x20
#define BPF_IND 0x40
#define BPF_MEM 0x60
#define BPF_LEN 0x80
#define BPF_MSH 0xa0

/* alu/jmp fields */
#define BPF_OP(code) ((code) & 0xf0) // operation, for the ALU and JMP classes
#define BPF_ADD 0x00
#define BPF_SUB 0x10
#define BPF_MUL 0x20
#define BPF_DIV 0x30
#define BPF_OR 0x40
#define BPF_AND 0x50
#define BPF_LSH 0x60
#define BPF_RSH 0x70
#define BPF_NEG 0x80
#define BPF_MOD 0x90
#define BPF_XOR 0xa0

#define BPF_JA 0x00 // jump conditions, for the JMP class
#define BPF_JEQ 0x10
#define BPF_JGT 0x20
#define BPF_JGE 0x30
#define BPF_JSET 0x40
#define BPF_SRC(code) ((code) & 0x08) // operand source for ALU and JMP
#define BPF_K 0x00 // operand is the constant k
#define BPF_X 0x08 // operand is register X

linux/filter.h defines two convenience macros, one for ordinary statements and one for jumps:

#define BPF_STMT(code, k) { (unsigned short)(code), 0, 0, k }
#define BPF_JUMP(code, k, jt, jf) { (unsigned short)(code), jt, jf, k }

Chaining such statements together gives an instruction array, for example:

struct sock_filter filter[] = {
/* A <- pkt[666:666 + 4] */
BPF_STMT(
BPF_LD + BPF_ABS + BPF_W, /* opcode */
666), /* k value */

/* if A == 123: jump forward 7; else: jump forward 9 */
BPF_JUMP(
BPF_JMP + BPF_JEQ + BPF_K, /* opcode */
123, /* k: constant to compare against */
7, /* jt: jump offset if true */
9), /* jf: jump offset if false */
};

Ultimately, the instruction array must return a value that tells the kernel which action to take.

The filter is then wrapped in a struct:

struct sock_fprog filterprog = {
.len = sizeof(filter)/sizeof(filter[0]),
.filter = filter
};

4. Using BPF

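Once a seccomp filter is installed, the kernel runs it on every system call the process makes. Instead of a network packet, the filter receives a description of the call. Each "packet" passed to the filter is this struct: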

struct seccomp_data {
int nr;
__u32 arch;
__u64 instruction_pointer;
__u64 args[6];
};

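Here nr is the system call number and arch identifies the architecture (an AUDIT_ARCH_* value). instruction_pointer is the address the call was made from. args holds the six system call arguments, which are passed in these registers: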

32-bit (x86): ebx, ecx, edx, esi, edi, ebp
64-bit (x86-64): rdi, rsi, rdx, r10, r8, r9

The BPF filter then inspects this data and returns an action for the kernel to take. The possible return values are:

SECCOMP_RET_KILL    /* kill the task immediately */
SECCOMP_RET_TRAP /* disallow and force a SIGSYS */
SECCOMP_RET_ERRNO /* returns an errno */
SECCOMP_RET_TRACE /* pass to a tracer or disallow */
SECCOMP_RET_ALLOW /* allow */

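One more reminder: always check the architecture before checking the system call number, because different architectures use different system call numbers.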

A basic BPF filter skeleton looks like this:

if (pkt.arch != MY_ARCH)
deny;
if (pkt.nr == SYS_read)
allow;
if (pkt.nr == SYS_write)
allow;
...
else
deny;

Translated into BPF assembly (pseudo-syntax):

    ldw offsetof(pkt, arch)
    jeq MY_ARCH, ok
    ret SECCOMP_RET_KILL
ok:
    ldw offsetof(pkt, nr)
    jeq ALLOWED_SYSCALL_1, .L0, .L1
.L0:
    ret SECCOMP_RET_ALLOW
.L1:
    jeq ALLOWED_SYSCALL_2, .L2, .L3
.L2:
    ret SECCOMP_RET_ALLOW
.L3:
    ...
    ret SECCOMP_RET_KILL

Written with the macros:

static struct sock_filter filter[] = {
BPF_STMT(
BPF_LD+BPF_W+BPF_ABS, /* ldw from abs offset */
offsetof(struct seccomp_data, arch)
),
BPF_JUMP(
BPF_JMP+BPF_JEQ+BPF_K, /* jeq instruction */
AUDIT_ARCH_X86_64, /* the value to test */
1, /* jump distance if true */
0), /* jump distance if false */

BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),

BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, nr)),/* load the syscall number */
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, SYS_read, 0, 1), /* allow read() */
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
/* deny anything else */
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL)
};

Defining one more macro shortens this further:

#define Allow(syscall) \
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_##syscall, 0, 1), \
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW)
struct sock_filter filter[] = {
/* validate arch */
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, ArchField),
BPF_JUMP( BPF_JMP+BPF_JEQ+BPF_K, AUDIT_ARCH_X86_64, 1, 0),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),

/* load syscall */
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, nr)),

/* list of allowed syscalls */
Allow(exit_group), /* exits a process */
Allow(brk), /* for malloc(), inside libc */
Allow(mmap), /* also for malloc() */
Allow(munmap), /* for free(), inside libc */
Allow(write), /* called by printf */
Allow(fstat), /* called by printf */

/* and if we don't match above, die */
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
};

Putting it all together:

#include <stdlib.h>
#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>

#include <sys/types.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/socket.h>

#include <linux/filter.h>
#include <linux/seccomp.h>
#include <linux/audit.h>

#define ArchField offsetof(struct seccomp_data, arch)

#define Allow(syscall) \
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, SYS_##syscall, 0, 1), \
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW)

struct sock_filter filter[] = {
/* validate arch */
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, ArchField),
BPF_JUMP( BPF_JMP+BPF_JEQ+BPF_K, AUDIT_ARCH_X86_64, 1, 0),
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),

/* load syscall */
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, nr)),

/* list of allowed syscalls */
Allow(exit_group), /* exits a process */
Allow(brk), /* for malloc(), inside libc */
Allow(mmap), /* also for malloc() */
Allow(munmap), /* for free(), inside libc */
Allow(write), /* called by printf */
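Allow(fstat), /* called by printf (older glibc) */
#ifdef SYS_newfstatat
Allow(newfstatat), /* glibc 2.33+ stats stdout through newfstatat instead */
#endif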

/* and if we don't match above, die */
BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
};
struct sock_fprog filterprog = {
.len = sizeof(filter)/sizeof(filter[0]),
.filter = filter
};

int main(int argc, char **argv) {
char buf[1024];

/* set up the restricted environment */
if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
perror("prctl(PR_SET_NO_NEW_PRIVS)");
exit(1);
}
if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &filterprog) == -1) {
perror("prctl(PR_SET_SECCOMP)");
exit(1);
}

/* printf only writes to stdout, but stdio first stats the descriptor to pick a buffer size and buffering mode. */
printf("hello there!\n");

if (argc > 1 && strcmp(argv[1], "haxor") == 0) {
int fd = socket(AF_INET6, SOCK_STREAM, 0);
/* ...and start sending spam */
}
}

5. Using the libseccomp wrapper

Today a much simpler interface exists: the libseccomp library builds and loads the BPF program for you.

// link against libseccomp, e.g. gcc demo.c -lseccomp
#include <unistd.h>
#include <seccomp.h>
#include <linux/seccomp.h>

int main(void){
scmp_filter_ctx ctx;
ctx = seccomp_init(SCMP_ACT_ALLOW);
seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(execve), 0);
seccomp_load(ctx);

char * filename = "/bin/sh";
char * argv[] = {"/bin/sh",NULL};
char * envp[] = {NULL};
write(1,"i will give you a shell\n",24);
syscall(59,filename,argv,envp);//execve (59 on x86-64): the filter kills the process here
return 0;
}

ctx is the filter context (handle), where typedef void *scmp_filter_ctx;

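seccomp_init creates the filter with a default action. SCMP_ACT_ALLOW here means every syscall is allowed unless a rule says otherwise. With SCMP_ACT_KILL as the default, every syscall that is not explicitly allowed kills the process. By default, libseccomp also sets PR_SET_NO_NEW_PRIVS when it loads the filter.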

/*
* seccomp actions
*/

/**
* Kill the process
*/
#define SCMP_ACT_KILL 0x00000000U
/**
* Throw a SIGSYS signal
*/
#define SCMP_ACT_TRAP 0x00030000U
/**
* Return the specified error code
*/
#define SCMP_ACT_ERRNO(x) (0x00050000U | ((x) & 0x0000ffffU))
/**
* Notify a tracing process with the specified value
*/
#define SCMP_ACT_TRACE(x) (0x7ff00000U | ((x) & 0x0000ffffU))
/**
* Allow the syscall to be executed after the action has been logged
*/
#define SCMP_ACT_LOG 0x7ffc0000U
/**
* Allow the syscall to be executed
*/
#define SCMP_ACT_ALLOW 0x7fff0000U

seccomp_rule_add adds a rule. Its prototype is:

/**
* Add a new rule to the filter
* @param ctx the filter context
* @param action the filter action
* @param syscall the syscall number
* @param arg_cnt the number of argument filters in the argument filter chain
* @param ... scmp_arg_cmp structs (use of SCMP_ARG_CMP() recommended)
*
* This function adds a series of new argument/value checks to the seccomp
* filter for the given syscall; multiple argument/value checks can be
* specified and they will be chained together (AND'd together) in the filter.
* If the specified rule needs to be adjusted due to architecture specifics it
* will be adjusted without notification. Returns zero on success, negative
* values on failure.
*
*/
int seccomp_rule_add(scmp_filter_ctx ctx,
uint32_t action, int syscall, unsigned int arg_cnt, ...);

seccomp_load loads (applies) the filter. Its prototype is:

/**
* Loads the filter into the kernel
* @param ctx the filter context
*
* This function loads the given seccomp filter context into the kernel. If
* the filter was loaded correctly, the kernel will be enforcing the filter
* when this function returns. Returns zero on success, negative values on
* error.
*
*/
int seccomp_load(const scmp_filter_ctx ctx);

In the code above, seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(execve), 0); passes arg_cnt = 0. This means execve is blocked no matter what its arguments are.

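If arg_cnt is not 0, it gives the number of argument comparisons that follow. The rule then applies only when execve is called and all of those comparisons hold, since they are ANDed together.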

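For example, the following program filters write, but kills the process only when write's third argument (the byte count) is greater than 0x10: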

#include <unistd.h>
#include <seccomp.h>
#include <linux/seccomp.h>

int main(void){
scmp_filter_ctx ctx;
ctx = seccomp_init(SCMP_ACT_ALLOW);
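seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(write),1,SCMP_A2(SCMP_CMP_GT,0x10));//argument 2 (counting from 0), i.e. count, > 0x10
seccomp_load(ctx);
write(1,"i will give you a shell\n",24);//24 > 0x10: the process is killed here
write(1,"1234567812345678",0x10);//never reached; on its own, a 0x10-byte write would be allowed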
return 0;
}
/**
* Specify an argument comparison struct for use in declaring rules
* @param arg the argument number, starting at 0
* @param op the comparison operator, e.g. SCMP_CMP_*
* @param datum_a dependent on comparison
* @param datum_b dependent on comparison, optional
*/
#define SCMP_CMP(...) ((struct scmp_arg_cmp){__VA_ARGS__})

/**
* Specify an argument comparison struct for argument 0
*/
#define SCMP_A0(...) SCMP_CMP(0, __VA_ARGS__)

/**
* Specify an argument comparison struct for argument 1
*/
#define SCMP_A1(...) SCMP_CMP(1, __VA_ARGS__)

/**
* Specify an argument comparison struct for argument 2
*/
#define SCMP_A2(...) SCMP_CMP(2, __VA_ARGS__)

/**
* Specify an argument comparison struct for argument 3
*/
#define SCMP_A3(...) SCMP_CMP(3, __VA_ARGS__)

/**
* Specify an argument comparison struct for argument 4
*/
#define SCMP_A4(...) SCMP_CMP(4, __VA_ARGS__)

/**
* Specify an argument comparison struct for argument 5
*/
#define SCMP_A5(...) SCMP_CMP(5, __VA_ARGS__)



/**
* Comparison operators
*/
enum scmp_compare {
_SCMP_CMP_MIN = 0,
SCMP_CMP_NE = 1, /**< not equal */
SCMP_CMP_LT = 2, /**< less than */
SCMP_CMP_LE = 3, /**< less than or equal */
SCMP_CMP_EQ = 4, /**< equal */
SCMP_CMP_GE = 5, /**< greater than or equal */
SCMP_CMP_GT = 6, /**< greater than */
SCMP_CMP_MASKED_EQ = 7, /**< masked equality */
_SCMP_CMP_MAX,
};

/**
* Argument datum
*/
typedef uint64_t scmp_datum_t;

/**
* Argument / Value comparison definition
*/
struct scmp_arg_cmp {
unsigned int arg; /**< argument number, starting at 0 */
enum scmp_compare op; /**< the comparison op, e.g. SCMP_CMP_* */
scmp_datum_t datum_a;
scmp_datum_t datum_b;
};

Seccomp
http://example.com/2024/01/29/seccomp/
Author
yring
Posted on
January 29, 2024
Licensed under