Seccomp

Last updated on March 24, 2024

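Seccomp study notes

1. What is Seccomp

Programs often contain vulnerabilities, introduced deliberately or by accident, and an attacker who exploits one usually relies on system calls to go further, for example calling execve to get a shell. To limit the damage, kernel developers introduced secure computing mode, or seccomp. It was first merged into the Linux kernel in 2005, but in that original form it was extremely strict and allowed only four basic system calls: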

exit()
sigreturn()
read()
write()

2. Using prctl

The original seccomp mode is enabled through the prctl function, whose prototype is:

#include <sys/prctl.h>

int prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5);

The original (strict) seccomp mode is enabled with the following call:

prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
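As a rough illustration of how strict mode behaves, the sketch below forks a child, switches it to strict mode, and lets it make a system call outside the four allowed ones. The helper name strict_mode_child_signal is made up for this example, and the code assumes a Linux system where strict mode is available.

```c
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/seccomp.h>

/* Hypothetical helper: run a strict-mode child and report how it ended.
 * Returns the terminating signal, 0 for a normal exit, -1 on error. */
int strict_mode_child_signal(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0)
            _exit(2);            /* strict mode could not be enabled */
        syscall(SYS_getpid);     /* not one of the four allowed calls */
        /* Not reached. A strict-mode process that wants to exit cleanly
         * has to use SYS_exit, because glibc's _exit() issues exit_group. */
        syscall(SYS_exit, 0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a kernel with seccomp support the function should report SIGKILL, since strict mode kills the process on the first disallowed system call.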

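This approach is obviously too blunt: even something as basic as malloc becomes very hard to use. A more useful mechanism, filter mode, was therefore introduced to provide finer-grained control:

prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, filter);

Here filter is a program written in Berkeley Packet Filter (BPF) instructions, which are covered in detail below.

One more thing to watch when using prctl is privilege. Installing a filter requires either the CAP_SYS_ADMIN capability or the no_new_privs bit, which is set with PR_SET_NO_NEW_PRIVS. Without one of them the call fails with EACCES for a non-root user, so the program ends up running without seccomp protection. Once no_new_privs is set, any user can install a seccomp filter, and the restriction is inherited by child processes and kept across execve: even after execve replaces the whole binary, the process cannot gain privileges. The bit cannot be cleared once it has been set.

prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);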
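A minimal sketch of setting the bit and reading it back with PR_GET_NO_NEW_PRIVS follows. The helper name enable_no_new_privs is an assumption made for this example.

```c
#include <sys/prctl.h>

/* Hypothetical helper: set no_new_privs, then ask the kernel for its value.
 * Returns 1 when the bit is set, -1 if the first prctl call fails. */
int enable_no_new_privs(void) {
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

Calling it more than once is harmless, because setting a bit that is already set succeeds.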

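3. BPF

BPF is a programmable packet filtering and classification system that runs inside the kernel, so packets can be inspected without round trips between user space and the kernel. It is implemented as a virtual machine in the kernel with a small but strict instruction set, and a program ends by returning an integer that tells the kernel what to do with the packet. BPF instructions fall into four kinds: arithmetic and logic operations, loads, stores, and jumps. Jumps can only go forward, which guarantees that every filter terminates.

The virtual machine consists of four parts:

Accumulator register A

Index register X

packet memory

scratch memory

The full instruction list is as follows: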

Operator	Effect
Loads
ld	Load word into `A`
ldi	Load word into `A`
ldh	Load half-word into `A`
ldb	Load byte into `A`
ldx	Load word into `X`
ldxi	Load word into `X`
ldxb	Load byte into `X`
Stores
st	Store `A` into `M[]`
stx	Store `X` into `M[]`
Jumps
jmp	Jump to offset
ja	Jump to offset
jeq	Jump on `A == <x>`
jneq	Jump on `A != <x>`
jne	Jump on `A != <x>`
jlt	Jump on `A < <x>`
jle	Jump on `A <= <x>`
jgt	Jump on `A > <x>`
jge	Jump on `A >= <x>`
jset	Jump on `A & <x>`
Arithmetic
add	`A + <x>`
sub	`A - <x>`
mul	`A * <x>`
div	`A / <x>`
mod	`A % <x>`
neg	`-A`
and	`A & <x>`
or	`A | <x>`
xor	`A ^ <x>`
lsh	`A << <x>`
rsh	`A >> <x>`
Misc
tax	Copy `A` into `X`
txa	Copy `X` into `A`
ret	Return

In this table `<x>` stands for the second operand, which is either the constant `k` or register `X`.

In C, each instruction is described by a struct:

struct sock_filter {
    uint16_t code;  /* the opcode */
    uint8_t jt; /* if true: jump displacement */
    uint8_t jf; /* if false: jump displacement */
    uint32_t k; /* immediate operand */
};

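The meaning of k depends on the addressing mode of the instruction:

Mode	Interpretation of k
BPF_IMM	Immediate value
BPF_ABS	Offset into `packet memory`
BPF_IND	Offset into `packet memory`, added to the value of register X
BPF_MEM	Index into `scratch memory`, `M[]`
BPF_LEN	Unused; the length of the `packet` is loaded
BPF_MSH	Offset of the `IP header` byte from which the header length is loaded

Operand sizes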

Size	Bytes
W	4
H	2
B	1

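The opcode of a basic BPF instruction is made up of three parts:

The operation class
The specific operation within that class
The addressing mode

Suppose we want to describe this operation: divide register A by the 4-byte value at index 24 of scratch memory, i.e. div A, scratch memory[24]

struct sock_filter div_insn = {
    .code = BPF_ALU + BPF_DIV + BPF_MEM + BPF_W,
    .k = 24
};

BPF_ALU is the operation class, BPF_DIV is the specific operation within that class, BPF_MEM says the operand lives in scratch memory, and BPF_W says the operand is 4 bytes wide, so k is the index into scratch memory. Note that this encoding only illustrates how the fields are combined. In the classic BPF instruction set the ALU class takes its second operand from k (BPF_K) or from register X (BPF_X), and M[] has only 16 word-sized slots, so a real filter would first load a valid scratch slot into X and then divide by X.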

#define BPF_CLASS(code) ((code) & 0x07)     // instruction class
#define		BPF_LD		0x00                    // load a value into A
#define		BPF_LDX		0x01
#define		BPF_ST		0x02
#define		BPF_STX		0x03
#define		BPF_ALU		0x04
#define		BPF_JMP		0x05
#define		BPF_RET		0x06
#define		BPF_MISC        0x07
	
/* ld/ldx fields */
#define BPF_SIZE(code)  ((code) & 0x18)         // operand size for ld/ldx
#define		BPF_W		0x00
#define		BPF_H		0x08
#define		BPF_B		0x10

#define BPF_MODE(code)  ((code) & 0xe0)         // addressing mode
#define		BPF_IMM		0x00
#define		BPF_ABS		0x20
#define		BPF_IND		0x40
#define		BPF_MEM		0x60
#define		BPF_LEN		0x80
#define		BPF_MSH		0xa0

/* alu/jmp fields */
#define BPF_OP(code)    ((code) & 0xf0)         // for the ALU class: the arithmetic operator
#define		BPF_ADD		0x00                    		
#define		BPF_SUB		0x10
#define		BPF_MUL		0x20
#define		BPF_DIV		0x30
#define		BPF_OR		0x40
#define		BPF_AND		0x50
#define		BPF_LSH		0x60
#define		BPF_RSH		0x70
#define		BPF_NEG		0x80
#define		BPF_MOD		0x90
#define		BPF_XOR		0xa0

#define		BPF_JA		0x00                    // for the JMP class: the jump type
#define		BPF_JEQ		0x10
#define		BPF_JGT		0x20
#define		BPF_JGE		0x30
#define		BPF_JSET        0x40
#define BPF_SRC(code)   ((code) & 0x08)         
#define		BPF_K		0x00                    // operand is the constant k
#define		BPF_X		0x08
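To make the bit layout above concrete, here is a small sketch that splits an opcode back into its fields with the BPF_CLASS, BPF_SIZE, BPF_MODE, BPF_OP and BPF_SRC macros from <linux/filter.h>. The helper names bpf_class_name and print_opcode_fields are invented for this example.

```c
#include <stdio.h>
#include <linux/filter.h>

/* Hypothetical helper: mnemonic of the instruction class in an opcode. */
const char *bpf_class_name(unsigned short code) {
    static const char *const names[] = {
        "ld", "ldx", "st", "stx", "alu", "jmp", "ret", "misc"
    };
    return names[BPF_CLASS(code)];   /* BPF_CLASS yields 0..7 */
}

/* Prints the class and the raw sub-fields of an opcode. Which sub-fields
 * are meaningful depends on the class: size/mode for loads, op/src for
 * ALU and jump instructions. */
void print_opcode_fields(unsigned short code) {
    printf("0x%02x: class=%s size=0x%02x mode=0x%02x op=0x%02x src=0x%02x\n",
           (unsigned)code, bpf_class_name(code),
           (unsigned)BPF_SIZE(code), (unsigned)BPF_MODE(code),
           (unsigned)BPF_OP(code), (unsigned)BPF_SRC(code));
}
```

For instance, BPF_LD + BPF_W + BPF_ABS adds up to 0x20 and BPF_JMP + BPF_JEQ + BPF_K to 0x15, which are the opcode values that show up when a seccomp filter is dumped.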

Linux defines two macros to make instructions easier to write, one for ordinary statements and one for jumps:

#define BPF_STMT(code, k) { (unsigned short)(code), 0, 0, k }
#define BPF_JUMP(code, k, jt, jf) { (unsigned short)(code), jt, jf, k }
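The argument order of BPF_JUMP is easy to get wrong: the constant k comes before the two jump offsets, although the struct stores the fields as code, jt, jf, k. The following sketch, with the made-up helpers make_jeq and make_load_abs, shows where each macro argument ends up.

```c
#include <linux/filter.h>

/* Hypothetical helper: "if A == value skip if_true instructions,
 * otherwise skip if_false instructions". */
struct sock_filter make_jeq(unsigned int value,
                            unsigned char if_true,
                            unsigned char if_false) {
    struct sock_filter insn =
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, value, if_true, if_false);
    return insn;
}

/* Hypothetical helper: "A <- 32-bit word at the given absolute offset".
 * BPF_STMT always leaves jt and jf at zero. */
struct sock_filter make_load_abs(unsigned int offset) {
    struct sock_filter insn = BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offset);
    return insn;
}
```

A call such as make_jeq(123, 7, 9) produces an instruction whose k is 123, jt is 7 and jf is 9.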

Chaining such statements one after another gives an instruction array, for example:

struct sock_filter filter[] = {
    /* A <- pkt[666:666 + 4] */
    BPF_STMT(
        BPF_LD + BPF_ABS + BPF_W,   /* opcode */
        666),                       /* k value */

    /* if A == 123: jump forward 7; else: jump forward 9 */
    BPF_JUMP(
        BPF_JMP + BPF_JEQ + BPF_K,  /* opcode */
        123,                        /* constant to compare against */
        7,                          /* jump offset if true */
        9),                         /* jump offset if false */
};

The instruction array must end by returning a value, which tells the kernel what action to take.

The filter is finally wrapped in a struct:

struct sock_fprog filterprog = {
    .len = sizeof(filter)/sizeof(filter[0]),
    .filter = filter
};

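4. Using BPF

Once a seccomp filter is installed, the kernel runs it on every system call the process makes. Instead of a network packet, the filter is handed the following struct describing the call: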

struct seccomp_data {
    int nr;
    __u32 arch;
    __u64 instruction_pointer;
    __u64 args[6];
};

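Here nr is the system call number and args holds the six system call arguments. On x86 they are passed in these registers:

32-bit: ebx, ecx, edx, esi, edi, ebp
64-bit: rdi, rsi, rdx, r10, r8, r9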
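Because BPF loads at most a 32-bit word at a time, a filter addresses seccomp_data by byte offset, and a 64-bit argument has to be read as two halves. The sketch below computes those offsets. The helper names are hypothetical, and the low/high split assumes a little-endian machine such as x86-64.

```c
#include <stddef.h>
#include <linux/seccomp.h>

/* Hypothetical helper: offset of the low 32 bits of syscall argument n
 * (n in 0..5). Assumes little-endian byte order. */
unsigned int arg_lo_offset(unsigned int n) {
    return (unsigned int)(offsetof(struct seccomp_data, args)
                          + n * sizeof(__u64));
}

/* Hypothetical helper: offset of the high 32 bits of the same argument. */
unsigned int arg_hi_offset(unsigned int n) {
    return arg_lo_offset(n) + (unsigned int)sizeof(__u32);
}
```

These values are what a filter would pass as k to a BPF_LD + BPF_W + BPF_ABS instruction, in the same way that offsetof(struct seccomp_data, nr) selects the system call number.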

The BPF program then examines this data and returns a value that selects the action to take. The main return values are:

SECCOMP_RET_KILL    /* kill the task immediately */
SECCOMP_RET_TRAP    /* disallow and force a SIGSYS */
SECCOMP_RET_ERRNO   /* returns an errno */
SECCOMP_RET_TRACE   /* pass to a tracer or disallow */
SECCOMP_RET_ALLOW   /* allow */
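A return value is a 32-bit word: the upper bits select the action and the low 16 bits carry data, such as the errno for SECCOMP_RET_ERRNO. The sketch below builds and splits such a value with the SECCOMP_RET_ACTION and SECCOMP_RET_DATA masks from <linux/seccomp.h>. The helper names ret_errno, ret_action and ret_data are made up for illustration.

```c
#include <errno.h>
#include <linux/seccomp.h>

/* Hypothetical helper: return value that makes the syscall fail with err. */
unsigned int ret_errno(unsigned int err) {
    return SECCOMP_RET_ERRNO | (err & SECCOMP_RET_DATA);
}

/* Hypothetical helpers: split a return value into action and data. */
unsigned int ret_action(unsigned int ret) {
    return ret & SECCOMP_RET_ACTION;
}

unsigned int ret_data(unsigned int ret) {
    return ret & SECCOMP_RET_DATA;
}
```

A filter that ends with BPF_STMT(BPF_RET+BPF_K, ret_errno(EPERM)) would therefore reject the call with EPERM instead of killing the process.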

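One more reminder: the filter must check the architecture before it checks the system call number, because system call numbers differ between architectures.

A basic BPF filter follows this pattern: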

if (pkt.arch != MY_ARCH)
    deny;
if (pkt.nr == SYS_read)
    allow;
if (pkt.nr == SYS_write)
    allow;
...
else
    deny;

Translated into BPF assembly:

    ldw offsetof(pkt, arch)
    jeq MY_ARCH, ok
    ret SECCOMP_RET_KILL
ok:
    ldw offsetof(pkt, nr)
    jeq ALLOWED_SYSCALL, .L0, .L1
.L0:
    ret SECCOMP_RET_ALLOW
.L1:
    jeq ALLOWED_SYSCALL, .L2, .L3
.L2:
    ret SECCOMP_RET_ALLOW
.L3:
    ...
    ret SECCOMP_RET_KILL

Expressed with the macros:

static struct sock_filter filter[] = {
    BPF_STMT(
        BPF_LD+BPF_W+BPF_ABS,               /* ldw from abs offset */
        offsetof(struct seccomp_data, arch)
    ),
    BPF_JUMP(
        BPF_JMP+BPF_JEQ+BPF_K,              /* jeq instruction */
        AUDIT_ARCH_X86_64,                  /* the value to test */
        1,                                  /* jump distance if true */
        0),                                 /* jump distance if false */

    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),

    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, nr)), /* load the syscall number */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, SYS_read, 0, 1),    /* allow read() */
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
    /* deny anything else */
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL)
};
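To see how a filter like this reaches its verdict, here is a toy user-space evaluator. It is only a sketch for the three opcodes used above, not the kernel's implementation, and the names toy_bpf_run, toy_verdict and read_only_filter are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Toy evaluator: handles only ld [abs], jeq #k and ret #k. Anything it
 * does not understand is treated as "kill". */
uint32_t toy_bpf_run(const struct sock_filter *prog, size_t len,
                     const struct seccomp_data *data) {
    uint32_t A = 0;                       /* accumulator register */
    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *in = &prog[pc];
        switch (in->code) {
        case BPF_LD | BPF_W | BPF_ABS:    /* A <- 32-bit word at offset k */
            if (in->k > sizeof(*data) - sizeof(A))
                return SECCOMP_RET_KILL;
            memcpy(&A, (const char *)data + in->k, sizeof(A));
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:   /* forward-only relative jump */
            pc += (A == in->k) ? in->jt : in->jf;
            break;
        case BPF_RET | BPF_K:             /* verdict */
            return in->k;
        default:
            return SECCOMP_RET_KILL;
        }
    }
    return SECCOMP_RET_KILL;              /* ran off the end */
}

/* Same shape as the filter above: x86-64 only, read() only. */
static const struct sock_filter read_only_filter[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_read, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
};

/* Hypothetical helper: verdict for a fake syscall record. */
uint32_t toy_verdict(int nr, uint32_t arch) {
    struct seccomp_data d = { .nr = nr, .arch = arch };
    return toy_bpf_run(read_only_filter,
                       sizeof(read_only_filter) / sizeof(read_only_filter[0]),
                       &d);
}
```

Because jt and jf are added to the program counter before it advances, a jump of 0 simply falls through to the next instruction and a jump of 1 skips one instruction, which is how the jeq/ret pairs in the filter work.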

Defining a macro shortens this further:

#define Allow(syscall) \
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_##syscall, 0, 1), \
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW)

struct sock_filter filter[] = {
    /* validate arch */
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, ArchField),
    BPF_JUMP( BPF_JMP+BPF_JEQ+BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),

    /* load syscall */
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, nr)),

    /* list of allowed syscalls */
    Allow(exit_group),  /* exits a process */
    Allow(brk),     		/* for malloc(), inside libc */
    Allow(mmap),        /* also for malloc() */
    Allow(munmap),      /* for free(), inside libc */
    Allow(write),       /* called by printf */
    Allow(fstat),       /* called by printf */

    /* and if we don't match above, die */
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
};

Putting it all together

#include <stdlib.h>
#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>

#include <sys/types.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/socket.h>

#include <linux/filter.h>
#include <linux/seccomp.h>
#include <linux/audit.h>

#define ArchField offsetof(struct seccomp_data, arch)

#define Allow(syscall) \
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, SYS_##syscall, 0, 1), \
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW)

struct sock_filter filter[] = {
    /* validate arch */
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, ArchField),
    BPF_JUMP( BPF_JMP+BPF_JEQ+BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),

    /* load syscall */
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, nr)),

    /* list of allowed syscalls */
    Allow(exit_group),  /* exits a process */
    Allow(brk),     /* for malloc(), inside libc */
    Allow(mmap),        /* also for malloc() */
    Allow(munmap),      /* for free(), inside libc */
    Allow(write),       /* called by printf */
    Allow(fstat),       /* called by printf */

    /* and if we don't match above, die */
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
};
struct sock_fprog filterprog = {
    .len = sizeof(filter)/sizeof(filter[0]),
    .filter = filter
};

int main(int argc, char **argv) {
    char buf[1024];

    /* set up the restricted environment */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
        perror("Could not set no_new_privs");
        exit(1);
    }
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &filterprog) == -1) {
        perror("Could not start seccomp");
        exit(1);
    }

    /* printf only writes to stdout, but for some reason it stats it. */
    printf("hello there!\n");

    if (argc > 1 && strcmp(argv[1], "haxor") == 0) {
        int fd = socket(AF_INET6, SOCK_STREAM, 0);
        /* ...and start sending spam */
    }
}
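One way to observe the effect of such an allow-list without losing the calling process is to install it in a forked child and inspect how the child dies. The sketch below does that with a smaller allow-list (write and exit_group only). The helper name filtered_child_signal and the NATIVE_ARCH macro are assumptions for this example, and only x86-64 and AArch64 are handled.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define NATIVE_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Hypothetical helper: returns the signal that ended the filtered child,
 * 0 for a normal exit, -1 on error. */
int filtered_child_signal(void) {
    struct sock_filter filter[] = {
        /* validate arch */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* load syscall number, then allow write and exit_group */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit_group, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        /* everything else */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(2);                        /* filter could not be installed */
        socket(AF_INET6, SOCK_STREAM, 0);    /* not on the allow-list */
        _exit(0);                            /* not reached */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With SECCOMP_RET_KILL the child is terminated as if by SIGSYS, so that is the value the parent should see.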

5. Using the libseccomp wrapper

In practice, the libseccomp library provides a much more concise interface:

#include <unistd.h>
#include <seccomp.h>
#include <linux/seccomp.h>

int main(void){
	scmp_filter_ctx ctx;
	ctx = seccomp_init(SCMP_ACT_ALLOW);
	seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(execve), 0);
	seccomp_load(ctx);

	char * filename = "/bin/sh";
	char * argv[] = {"/bin/sh",NULL};
	char * envp[] = {NULL};
	write(1,"i will give you a shell\n",24);
	syscall(59,filename,argv,envp);//execve
	return 0;
}

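ctx is the filter context (handle), declared as typedef void *scmp_filter_ctx;

seccomp_init sets the default action of the filter. SCMP_ACT_ALLOW is used here, meaning every syscall is allowed by default; initializing with SCMP_ACT_KILL instead would deny every syscall by default.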

/*
 * seccomp actions
 */

/**
 * Kill the process
 */
#define SCMP_ACT_KILL		0x00000000U
/**
 * Throw a SIGSYS signal
 */
#define SCMP_ACT_TRAP		0x00030000U
/**
 * Return the specified error code
 */
#define SCMP_ACT_ERRNO(x)	(0x00050000U | ((x) & 0x0000ffffU))
/**
 * Notify a tracing process with the specified value
 */
#define SCMP_ACT_TRACE(x)	(0x7ff00000U | ((x) & 0x0000ffffU))
/**
 * Allow the syscall to be executed after the action has been logged
 */
#define SCMP_ACT_LOG		0x7ffc0000U
/**
 * Allow the syscall to be executed
 */
#define SCMP_ACT_ALLOW		0x7fff0000U

seccomp_rule_add adds a rule. Its prototype is:

/**
 * Add a new rule to the filter
 * @param ctx the filter context
 * @param action the filter action
 * @param syscall the syscall number
 * @param arg_cnt the number of argument filters in the argument filter chain
 * @param ... scmp_arg_cmp structs (use of SCMP_ARG_CMP() recommended)
 *
 * This function adds a series of new argument/value checks to the seccomp
 * filter for the given syscall; multiple argument/value checks can be
 * specified and they will be chained together (AND'd together) in the filter.
 * If the specified rule needs to be adjusted due to architecture specifics it
 * will be adjusted without notification.  Returns zero on success, negative
 * values on failure.
 *
 */
int seccomp_rule_add(scmp_filter_ctx ctx,
		     uint32_t action, int syscall, unsigned int arg_cnt, ...);

seccomp_load loads the filter into the kernel. Its prototype is:

/**
 * Loads the filter into the kernel
 * @param ctx the filter context
 *
 * This function loads the given seccomp filter context into the kernel.  If
 * the filter was loaded correctly, the kernel will be enforcing the filter
 * when this function returns.  Returns zero on success, negative values on
 * error.
 *
 */
int seccomp_load(const scmp_filter_ctx ctx);

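In the code above, seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(execve), 0); passes arg_cnt=0, which means the execve syscall is blocked regardless of its arguments.

If arg_cnt is non-zero, it gives the number of argument conditions that follow, and the syscall is blocked only when it is execve and its arguments satisfy all of those conditions.

The following example leaves write allowed in general and blocks it only when its third argument (the byte count) is greater than 0x10: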

#include <unistd.h>
#include <seccomp.h>
#include <linux/seccomp.h>

int main(void){
	scmp_filter_ctx ctx;
	ctx = seccomp_init(SCMP_ACT_ALLOW);
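	seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(write),1,SCMP_A2(SCMP_CMP_GT,0x10));// argument 2 (counting from 0) is greater than 0x10
	seccomp_load(ctx);
	write(1,"i will give you a shell\n",24);// blocked: 24 > 0x10, the process is killed here
	write(1,"1234567812345678",0x10);// would not be blocked: 0x10 is not greater than 0x10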
	return 0;
}
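For comparison with section 4, the sketch below hand-writes a raw BPF filter with roughly the same behaviour as the write rule above: allow everything, but kill on write when the byte count exceeds 0x10. It is not the program libseccomp actually generates. The helper name write_child_signal and the NATIVE_ARCH macro are assumptions, and the split of the 64-bit argument into two 32-bit loads assumes a little-endian machine (x86-64 or AArch64).

```c
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define NATIVE_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

#define ARG2_LO (offsetof(struct seccomp_data, args) + 2 * sizeof(__u64))
#define ARG2_HI (ARG2_LO + sizeof(__u32))

/* Hypothetical helper: a filtered child writes len bytes to /dev/null.
 * Returns the signal that killed it, 0 for a clean exit, -1 on error. */
int write_child_signal(size_t len) {
    static const char buf[64];
    if (len > sizeof(buf))
        return -1;
    struct sock_filter filter[] = {
        /* 0-2: validate arch */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* 3-4: anything other than write goes to ALLOW (index 9) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 0, 4),
        /* 5-6: high half of the count must be 0, else KILL (index 10) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, ARG2_HI),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 0, 3),
        /* 7-8: low half > 0x10 goes to KILL, otherwise ALLOW */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, ARG2_LO),
        BPF_JUMP(BPF_JMP | BPF_JGT | BPF_K, 0x10, 1, 0),
        /* 9 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        /* 10 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open("/dev/null", O_WRONLY);
        if (fd < 0 ||
            prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(2);
        if (write(fd, buf, len) < 0)
            _exit(3);
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A 0x10-byte write passes the filter and the child exits cleanly, while a 24-byte write triggers SECCOMP_RET_KILL. Letting libseccomp generate this logic avoids having to count jump offsets by hand.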

/**
 * Specify an argument comparison struct for use in declaring rules
 * @param arg the argument number, starting at 0
 * @param op the comparison operator, e.g. SCMP_CMP_*
 * @param datum_a dependent on comparison
 * @param datum_b dependent on comparison, optional
 */
#define SCMP_CMP(...)		((struct scmp_arg_cmp){__VA_ARGS__})

/**
 * Specify an argument comparison struct for argument 0
 */
#define SCMP_A0(...)		SCMP_CMP(0, __VA_ARGS__)

/**
 * Specify an argument comparison struct for argument 1
 */
#define SCMP_A1(...)		SCMP_CMP(1, __VA_ARGS__)

/**
 * Specify an argument comparison struct for argument 2
 */
#define SCMP_A2(...)		SCMP_CMP(2, __VA_ARGS__)

/**
 * Specify an argument comparison struct for argument 3
 */
#define SCMP_A3(...)		SCMP_CMP(3, __VA_ARGS__)

/**
 * Specify an argument comparison struct for argument 4
 */
#define SCMP_A4(...)		SCMP_CMP(4, __VA_ARGS__)

/**
 * Specify an argument comparison struct for argument 5
 */
#define SCMP_A5(...)		SCMP_CMP(5, __VA_ARGS__)



/**
 * Comparison operators
 */
enum scmp_compare {
	_SCMP_CMP_MIN = 0,
	SCMP_CMP_NE = 1,		/**< not equal */
	SCMP_CMP_LT = 2,		/**< less than */
	SCMP_CMP_LE = 3,		/**< less than or equal */
	SCMP_CMP_EQ = 4,		/**< equal */
	SCMP_CMP_GE = 5,		/**< greater than or equal */
	SCMP_CMP_GT = 6,		/**< greater than */
	SCMP_CMP_MASKED_EQ = 7,		/**< masked equality */
	_SCMP_CMP_MAX,
};

/**
 * Argument datum
 */
typedef uint64_t scmp_datum_t;

/**
 * Argument / Value comparison definition
 */
struct scmp_arg_cmp {
	unsigned int arg;	/**< argument number, starting at 0 */
	enum scmp_compare op;	/**< the comparison op, e.g. SCMP_CMP_* */
	scmp_datum_t datum_a;
	scmp_datum_t datum_b;
};


http://example.com/2024/01/29/seccomp/

Author

yring

Posted on

January 29, 2024
