Programs generation

Bytecode generation

bf_program is used to represent a BPF program. It contains the BPF bytecode, as well as the required maps and metadata.

Workflow

The program is composed of different steps:

  1. Initialize the generic context

  2. Preprocess the packet’s headers: gather information about the packet’s size, the protocols available, the input interface…

  3. Execute the filtering rules: execute all the rules defined in the program sequentially. If a rule matches the packet, apply its verdict and return.

  4. Apply the policy if no rule matched: if no rule matched the packet, return the chain’s policy (default action).
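The control flow of steps 3 and 4 can be modeled in plain C. This is only a sketch, not bpfilter code: `struct demo_pkt_info`, `struct demo_rule`, and `demo_run_chain` are hypothetical names, and a rule is reduced to a single L4 protocol match for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts, for illustration only. */
enum demo_verdict { DEMO_ACCEPT, DEMO_DROP };

/* Step 2 output: information gathered while preprocessing the headers. */
struct demo_pkt_info {
    uint64_t size;
    uint32_t ifindex;
    uint32_t l3_proto;
    uint32_t l4_proto;
};

/* Hypothetical rule: matches on the L4 protocol and carries a verdict. */
struct demo_rule {
    uint32_t l4_proto;
    enum demo_verdict verdict;
};

/* Steps 3 and 4: evaluate the rules in order, fall back to the policy. */
enum demo_verdict demo_run_chain(const struct demo_pkt_info *pkt,
                                 const struct demo_rule *rules,
                                 size_t n_rules, enum demo_verdict policy)
{
    assert(pkt != NULL && rules != NULL);

    for (size_t i = 0; i < n_rules; ++i) {
        if (rules[i].l4_proto == pkt->l4_proto)
            return rules[i].verdict; /* First matching rule wins. */
    }

    return policy; /* No rule matched: apply the chain's policy. */
}
```
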

Memory layout

The program uses the BPF registers in the following way:

  • r0 : return value

  • r1 to r5 (included): general purpose registers

  • r6 : address of the header currently filtered on

  • r7 : L3 protocol ID

  • r8 : L4 protocol ID

  • r9 : unused

  • r10 : frame pointer

This convention is followed throughout the project and must always be respected to prevent incompatibilities. Debugging this kind of issue is not fun, so stick to it.
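One way to keep such a convention explicit in generator code is to name each register role. The aliases below are hypothetical (they are not taken from bpfilter); only the numeric values, which are the standard BPF register numbers, are fixed. In the BPF calling convention, r1 to r5 are clobbered by helper calls while r6 to r9 are preserved, which fits keeping long-lived values in r6 to r8.

```c
#include <assert.h>

/* Hypothetical aliases for the register roles listed above. The values are
 * the standard BPF register numbers (r0 = 0, ..., r10 = 10). */
enum demo_reg_role {
    DEMO_REG_RET = 0,           /* r0: return value */
    DEMO_REG_SCRATCH_FIRST = 1, /* r1..r5: general purpose */
    DEMO_REG_SCRATCH_LAST = 5,
    DEMO_REG_HDR = 6,           /* r6: header currently filtered on */
    DEMO_REG_L3_PROTO = 7,      /* r7: L3 protocol ID */
    DEMO_REG_L4_PROTO = 8,      /* r8: L4 protocol ID */
    DEMO_REG_UNUSED = 9,        /* r9: unused */
    DEMO_REG_FP = 10,           /* r10: frame pointer */
};

/* Return 1 if generated code may freely overwrite `reg`, 0 otherwise. */
int demo_reg_is_scratch(int reg)
{
    assert(reg >= DEMO_REG_RET && reg <= DEMO_REG_FP);

    return reg >= DEMO_REG_SCRATCH_FIRST && reg <= DEMO_REG_SCRATCH_LAST;
}
```
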

bf_program_context is used to represent the layout of the first stack frame in the program. It is filled during preprocessing and contains data required for packet filtering.

About preprocessing

The packets are preprocessed according to the program type (i.e. BPF flavor). Each flavor needs to perform the following steps during preprocessing:

  • Store the packet size and the input interface index into the runtime context

  • Create a BPF dynamic pointer for the packet

  • Preprocess the L2, L3, and L4 headers

Header preprocessing is required to discover the protocols used in the packet: processing L2 provides information about L3, and so on. The logic used to process layer X is responsible for discovering layer X+1: the L2 header preprocessing logic discovers the L3 protocol ID. When processing layer X, if the protocol is not supported, the protocol ID is reset to 0 (so we won’t execute the rules for this layer) and subsequent layers are not processed (because we can’t discover their protocol).

For example, assuming IPv6 and TCP are the only supported protocols:

  • L2 processing: discover the packet’s ethertype (IPv6), and store it into r7.

  • L3 processing: the protocol ID in r7 is supported (IPv6), so a slice is created, and the L4 protocol ID is read from the IPv6 header into r8.

  • L4 processing: the protocol ID in r8 is supported (TCP), so a slice is created.

  • The program can now start executing the rules.

However, assuming only IPv6 and UDP are supported:

  • L2 processing: discover the packet’s ethertype (IPv6), and store it into r7.

  • L3 processing: the protocol ID in r7 is supported (IPv6), so a slice is created, and the L4 protocol ID is read from the IPv6 header into r8.

  • L4 processing: the protocol ID in r8 is not supported (TCP), so r8 is set to 0 and we stop processing this layer.

  • The program can now start executing the rules. No layer 4 rule will be executed as r8 won’t match any protocol ID.
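The two scenarios above can be reproduced with a small host-side model. This is a sketch under simplifying assumptions: `struct demo_regs` and `demo_preprocess` are hypothetical, the supported protocols are passed as plain arrays, and the L4 protocol is given directly instead of being parsed from the L3 header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_IPV6 0x86dd /* IPv6 ethertype */
#define DEMO_IPPROTO_TCP 6
#define DEMO_IPPROTO_UDP 17

/* Hypothetical model of the two protocol registers. */
struct demo_regs {
    uint32_t r7; /* L3 protocol ID, 0 if unsupported */
    uint32_t r8; /* L4 protocol ID, 0 if unsupported or not discovered */
};

/* Return 1 if `id` is one of the `n` entries of `ids`, 0 otherwise. */
static int demo_supported(const uint32_t *ids, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; ++i) {
        if (ids[i] == id)
            return 1;
    }

    return 0;
}

struct demo_regs demo_preprocess(uint32_t ethertype, uint32_t next_hdr,
                                 const uint32_t *l3_ids, size_t n_l3,
                                 const uint32_t *l4_ids, size_t n_l4)
{
    struct demo_regs regs = {0, 0};

    assert(l3_ids != NULL && l4_ids != NULL);

    /* L2 processing discovers the L3 protocol ID. */
    regs.r7 = ethertype;

    /* L3 processing: if unsupported, reset r7 and skip the next layers. */
    if (!demo_supported(l3_ids, n_l3, regs.r7)) {
        regs.r7 = 0;
        return regs;
    }

    /* L3 processing discovers the L4 protocol ID. */
    regs.r8 = next_hdr;

    /* L4 processing: if unsupported, reset r8. */
    if (!demo_supported(l4_ids, n_l4, regs.r8))
        regs.r8 = 0;

    return regs;
}
```

With only IPv6 and UDP supported, an IPv6/TCP packet leaves r7 set to the IPv6 ethertype and r8 set to 0, so no layer 4 rule can match.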

Warning

L3 and L4 protocol IDs must be stored in registers, not on the stack, as older verifiers aren’t able to keep track of scalar values located on the stack. Verification would then fail because the verifier can’t verify branches properly.

Defines

PIN_PATH_LEN
BF_PROG_ID_LEN
BF_PROG_CTX_OFF(field)

Convenience macro to get the offset of a field in bf_program_context based on the frame pointer in BPF_REG_10.

BF_PROG_SCR_OFF(offset)

Convenience macro to get an address in the scratch area of bf_program_context.
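As an illustration of how frame-pointer-relative offsets can be computed, here is a minimal sketch. It assumes the context structure sits at the very top of the stack frame, so every field has a negative offset from r10. `struct demo_ctx`, `DEMO_CTX_OFF`, and `DEMO_SCR_OFF` are hypothetical stand-ins, not the actual bpfilter definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for bf_program_context. */
struct demo_ctx {
    uint64_t pkt_size;
    uint32_t l3_offset;
    uint32_t l4_offset;
    uint8_t scratch[64];
};

/* Keep the structure size a multiple of 8 so that 8-byte fields stay
 * 8-byte aligned relative to the frame pointer. */
static_assert(sizeof(struct demo_ctx) % 8 == 0, "context must be 8-byte aligned");

/* The BPF stack grows down from the frame pointer (r10): a structure placed
 * at the top of the frame starts at r10 - sizeof(struct), so the offset of
 * a field relative to r10 is negative. */
#define DEMO_CTX_OFF(field) \
    ((int)offsetof(struct demo_ctx, field) - (int)sizeof(struct demo_ctx))

/* Offset, relative to r10, of byte `offset` in the scratch area. */
#define DEMO_SCR_OFF(offset) (DEMO_CTX_OFF(scratch) + (int)(offset))
```

A generated load of `pkt_size` would then use r10 as the base register and `DEMO_CTX_OFF(pkt_size)` as the instruction offset.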

EMIT(program, x)
EMIT_KFUNC_CALL(program, function)
EMIT_FIXUP(program, type, insn)
EMIT_FIXUP_CALL(program, function)
EMIT_FIXUP_JMP_NEXT_RULE(program, insn)
EMIT_LOAD_COUNTERS_FD_FIXUP(program, reg)
EMIT_LOAD_SET_FD_FIXUP(program, reg, index)

Load a specific set’s file descriptor.

Note

Like every EMIT_* macro, it must be called from a function returning an int: if the call fails, the macro makes the calling function return a negative errno value.

Parameters:
  • program – Program to generate the bytecode for. Can’t be NULL.

  • reg – Register to store the set file descriptor in.

  • index – Index of the set in the program.
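The constraint described in the note above (the enclosing function must return an int) follows from the way such macros are typically written. The sketch below shows the general pattern on a hypothetical fixed-size instruction buffer. `struct demo_emit_prog`, `demo_emit`, and `DEMO_EMIT` are illustrative names, not the bpfilter implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_EMIT_CAP 4

/* Hypothetical miniature program: a fixed-capacity instruction buffer. */
struct demo_emit_prog {
    int insns[DEMO_EMIT_CAP];
    size_t len;
};

/* Append one instruction, or fail with -ENOMEM when the buffer is full. */
int demo_emit(struct demo_emit_prog *prog, int insn)
{
    assert(prog != NULL);

    if (prog->len == DEMO_EMIT_CAP)
        return -ENOMEM;

    prog->insns[prog->len++] = insn;

    return 0;
}

/* EMIT-style wrapper: on failure, return the error from the *calling*
 * function. This is why the caller must return an int. */
#define DEMO_EMIT(prog, insn)                      \
    do {                                           \
        int demo_r_ = demo_emit((prog), (insn));   \
        if (demo_r_ < 0)                           \
            return demo_r_;                        \
    } while (0)

/* Emit `n` placeholder instructions, stopping at the first failure. */
int demo_emit_many(struct demo_emit_prog *prog, int n)
{
    for (int i = 0; i < n; ++i)
        DEMO_EMIT(prog, i);

    return 0;
}
```
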

_cleanup_bf_program_

Functions

int bf_program_new(struct bf_program **program, enum bf_hook hook, enum bf_front front, const struct bf_chain *chain)
void bf_program_free(struct bf_program **program)
int bf_program_marsh(const struct bf_program *program, struct bf_marsh **marsh)
int bf_program_unmarsh(const struct bf_marsh *marsh, struct bf_program **program, const struct bf_chain *chain)
void bf_program_dump(const struct bf_program *program, prefix_t *prefix)
int bf_program_grow_img(struct bf_program *program)
int bf_program_emit(struct bf_program *program, struct bpf_insn insn)
int bf_program_emit_kfunc_call(struct bf_program *program, const char *name)
int bf_program_emit_fixup(struct bf_program *program, enum bf_fixup_type type, struct bpf_insn insn, const union bf_fixup_attr *attr)
int bf_program_emit_fixup_call(struct bf_program *program, enum bf_fixup_func function)
int bf_program_generate(struct bf_program *program)
int bf_program_load(struct bf_program *new_prog, struct bf_program *old_prog)

Load and attach the program to the kernel.

Perform the loading and attaching of the program to the kernel in one step. If a similar program already exists, old_prog should be a pointer to it, and will be replaced.

Parameters:
  • new_prog – New program to load and attach to the kernel. Can’t be NULL.

  • old_prog – Existing program to replace.

Returns:

0 on success, or negative errno value on failure.

int bf_program_unload(struct bf_program *program)
int bf_program_get_counter(const struct bf_program *program, uint32_t counter_idx, struct bf_counter *counter)
int bf_program_set_counters(struct bf_program *program, const struct bf_counter *counters)
struct bf_program_context
#include <bpfilter/cgen/program.h>

BPF program runtime context.

This structure is used to easily read and write data from the program’s stack. At runtime, the first stack frame of each generated program will contain data according to bf_program_context.

The generated programs use BPF dynamic pointer slices to safely access the packet’s data. bpf_dynptr_slice requires a user-provided buffer into which it might copy the requested data, depending on the BPF program type: that is the purpose of the anonymous unions, big enough to store the supported protocol headers. bpf_dynptr_slice returns the address of the requested data, which is either the address of the user-provided buffer, or the address of the data in the packet (if the data hasn’t been copied). The program stores this address into the runtime context (i.e. l2_hdr, l3_hdr, and l4_hdr), and uses it to access the packet’s data.
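The "pointer into the packet or pointer to the buffer" behaviour can be pictured with a host-side model. This is not the kernel implementation: `struct demo_pkt` and `demo_slice` are hypothetical, and the rule used to decide when a copy is needed (data beyond a `linear_len` boundary) is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet model: only the first `linear_len` bytes can be
 * referenced in place, the rest has to be copied out. */
struct demo_pkt {
    const uint8_t *data;
    size_t len;
    size_t linear_len;
};

/* Model of a slice request: return a pointer to `len` bytes at `offset`,
 * either directly in the packet or in `buf` after a copy. Return NULL if
 * the request is out of bounds. */
const void *demo_slice(const struct demo_pkt *pkt, size_t offset, void *buf,
                       size_t len)
{
    assert(pkt != NULL && buf != NULL);

    if (offset > pkt->len || len > pkt->len - offset)
        return NULL;

    if (offset + len <= pkt->linear_len)
        return pkt->data + offset; /* Directly addressable: no copy. */

    memcpy(buf, pkt->data + offset, len);

    return buf; /* Data was copied into the caller's buffer. */
}
```

In both cases the caller only keeps the returned address, which is why the context stores a header pointer next to each header-sized buffer.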

While earlier versions of this structure contained the L3 and L4 protocol IDs, they have been moved to registers instead, as old versions of the verifier can’t keep track of scalar values in the stack, leading to verification failures.

Warning

Not all BPF verifier versions are born equal: older ones might require stack accesses to be 8-byte aligned to work properly.

Public Members

void *arg

Argument passed to the BPF program, its content depends on the BPF program type.

struct bpf_dynptr dynptr

BPF dynamic pointer representing the packet data. Dynamic pointers are used with every program type.

uint64_t pkt_size

Total size of the packet.

uint32_t l3_offset

Offset of the layer 3 protocol.

uint32_t l4_offset

Offset of the layer 4 protocol.

uint32_t ifindex

On ingress, index of the input interface. On egress, index of the output interface.

void *l2_hdr

Pointer to the L2 protocol header.

void *l3_hdr

Pointer to the L3 protocol header.

void *l4_hdr

Pointer to the L4 protocol header.

union bf_program_context._bf_l2 l2
union bf_program_context._bf_l3 l3
union bf_program_context._bf_l4 l4
uint8_t scratch[64]
union _bf_l2
#include <bpfilter/cgen/program.h>

Layer 2 header.

Public Members

struct ethhdr eth
union _bf_l3
#include <bpfilter/cgen/program.h>

Layer 3 header.

Public Members

struct iphdr ip4
struct ipv6hdr ip6
union _bf_l4
#include <bpfilter/cgen/program.h>

Layer 4 header.

Public Members

struct icmphdr icmp
struct udphdr udp
struct tcphdr tcp
struct icmp6hdr icmp6
struct bf_program
#include <bpfilter/cgen/program.h>

Public Members

char id[(BPF_OBJ_NAME_LEN - 4)]
enum bf_hook hook
enum bf_front front
char prog_name[BPF_OBJ_NAME_LEN]
struct bf_printer *printer

Log messages printer.

struct bf_map *cmap

Counters map.

struct bf_map *pmap

Printer map.

bf_list sets

List of set maps.

Link objects attaching the program to a hook.

size_t num_counters

Number of counters in the counters map. Not all of them are used by the program, but this value is common for all the programs of a given codegen.

uint32_t functions_location[_BF_FIXUP_FUNC_MAX]
struct bpf_insn *img
size_t img_size
size_t img_cap
bf_list fixups
int prog_fd

File descriptor of the program.

const struct bf_flavor_ops *ops

Hook-specific ops to use to generate the program.

const struct bf_chain *chain

Chain the program is generated from. This is a non-owning pointer: the bf_program doesn’t have to manage its lifetime.

struct bf_program runtime

Runtime data used to interact with the program and cache information. This data is not serialized.

Switch-cases

bf_swich is used to generate switch-case logic in BPF bytecode. The workflow is the following:

  • Create a new bf_swich object and initialize it. Use bf_swich_get to simplify this step. A bf_swich object contains a pointer to the generated program, and the register to perform the switch comparison against.

  • Call EMIT_SWICH_OPTION to define the various cases for the switch, and the associated BPF bytecode to run.

  • Call EMIT_SWICH_DEFAULT to define the default case of the switch, this is optional.

  • Call bf_swich_generate to generate the BPF bytecode for the switch.

Once bf_swich_generate has been called, this is what the switch structure will look like in BPF bytecode:

if case 1 matches REG, jump to case 1 code
if case 2 matches REG, jump to case 2 code
else jump to default code
case 1 code
    jump after the switch
case 2 code
    jump after the switch
default code
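To make the layout above concrete, the sketch below computes the relative jump offsets for such a structure. It assumes one compare-and-jump instruction per case, one fallback jump to the default code, and one "jump after the switch" at the end of each case body. Offsets are relative to the instruction following the jump, as in BPF. `demo_swich_layout` is a hypothetical helper, not part of bpfilter.

```c
#include <assert.h>
#include <stddef.h>

/* Compute jump offsets for a switch with `n` cases.
 *   body_len[i]  : number of instructions in the body of case i
 *   default_len  : number of instructions in the default code
 *   case_off[i]  : offset of the compare-and-jump for case i
 *   exit_off[i]  : offset of the "jump after the switch" ending case i
 *   default_off  : offset of the fallback jump to the default code */
void demo_swich_layout(const size_t *body_len, size_t n, size_t default_len,
                       int *case_off, int *exit_off, int *default_off)
{
    size_t start = n + 1; /* Bodies start after n compares + 1 fallback jump. */
    size_t end;

    assert(body_len != NULL && case_off != NULL);
    assert(exit_off != NULL && default_off != NULL);

    for (size_t i = 0; i < n; ++i) {
        case_off[i] = (int)(start - (i + 1));
        start += body_len[i] + 1; /* Body plus its trailing jump. */
    }

    /* `start` is now the index of the first default instruction. */
    *default_off = (int)(start - (n + 1));
    end = start + default_len;

    start = n + 1;
    for (size_t i = 0; i < n; ++i) {
        size_t jmp_idx = start + body_len[i];

        exit_off[i] = (int)(end - (jmp_idx + 1));
        start = jmp_idx + 1;
    }
}
```

For two cases of 2 and 3 instructions and a 1-instruction default, the compares sit at indexes 0 and 1, the fallback jump at index 2, the case bodies start at indexes 3 and 6, the default code at index 10, and the first instruction after the switch is at index 11.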

Note

I am fully aware it’s supposed to be spelled switch and not swich, but both switch and case are reserved keywords in C, so I had to come up with a solution to avoid clashes, and swich could be pronounced similarly to switch, at least to my non-native speaker’s ear.

Defines

_cleanup_bf_swich_

Cleanup attribute for a bf_swich variable.

bf_swich_get(program, reg)

Create, initialize, and return a new bf_swich object.

Parameters:
  • program – bf_program object to create the switch in.

  • reg – Register to use to compare the cases values to.

Returns:

A new bf_swich object.

EMIT_SWICH_OPTION(swich, imm, ...)

Add a case to the bf_swich.

Parameters:
  • swich – Pointer to a valid bf_swich.

  • imm – Immediate value to compare against the switch’s register.

  • ... – BPF instructions to execute if the case matches.

EMIT_SWICH_DEFAULT(swich, ...)

Set the default instructions to execute if no case of the switch matches the register.

Defining a default option to a bf_swich is optional. If this macro is called twice, the existing default options will be replaced by the new ones.

Parameters:
  • swich – Pointer to a valid bf_swich.

  • ... – BPF instructions to execute if no case matches.

Functions

int bf_swich_init(struct bf_swich *swich, struct bf_program *program, int reg)

Initialise a bf_swich object.

Parameters:
  • swich – bf_swich object to initialize. Can’t be NULL.

  • program – bf_program object to generate the switch-case for. Can’t be NULL.

  • reg – Register to compare to the cases of the switch.

Returns:

0 on success, or negative errno value on failure.

void bf_swich_cleanup(struct bf_swich *swich)

Cleanup a bf_swich object.

Once this function returns, the swich object can be reused by calling bf_swich_init.

Parameters:
  • swich – The bf_swich object to clean up.

int bf_swich_add_option(struct bf_swich *swich, uint32_t imm, const struct bpf_insn *insns, size_t insns_len)

Add an option (case) to the switch object.

Parameters:
  • swich – bf_swich object to add the option to. Can’t be NULL.

  • imm – Immediate value to compare the switch’s register to. If the values are equal, the option’s instructions are executed.

  • insns – Array of BPF instructions to execute if the case matches.

  • insns_len – Number of instructions in insns.

Returns:

0 on success, or negative errno value on failure.

int bf_swich_set_default(struct bf_swich *swich, const struct bpf_insn *insns, size_t insns_len)

Set the switch’s default actions if no case matches.

Parameters:
  • swich – bf_swich object to set the default action for. Can’t be NULL.

  • insns – Array of BPF instructions to execute.

  • insns_len – Number of instructions in insns.

Returns:

0 on success, or negative errno value on failure.

int bf_swich_generate(struct bf_swich *swich)

Generate the bytecode for the switch.

The BPF program doesn’t contain any of the instructions of the bf_swich until this function is called.

Parameters:
  • swich – bf_swich object to generate the bytecode for. Can’t be NULL.

Returns:

0 on success, or negative errno value on failure.

struct bf_swich
#include <bpfilter/cgen/swich.h>

Context used to define a switch-case structure in BPF bytecode.

Public Members

struct bf_program *program

Program to generate the switch-case in.

int reg

Register to compare to the various cases of the switch.

bf_list options

List of options (cases) for the switch.

struct bf_swich_option *default_opt

Default option, if no case matches the switch’s register.

Error handling

bf_jmpctx is a helper structure to manage jump instructions in the program. A bf_jmpctx will insert a new jump instruction in the BPF program and update its jump offset when the bf_jmpctx is deleted.

Example:

// Within a function body
{
    _cleanup_bf_jmpctx_ struct bf_jmpctx ctx =
        bf_jmpctx_get(program, BPF_JMP_IMM(BPF_JEQ, BPF_REG_2, 0, 0));

    EMIT(program,
        BPF_MOV64_IMM(BPF_REG_0, program->runtime.ops->get_verdict(
            BF_VERDICT_ACCEPT)));
    EMIT(program, BPF_EXIT_INSN());
}

ctx is a variable local to the scope, marked with _cleanup_bf_jmpctx_. The second argument to bf_jmpctx_get is the jump instruction to emit, with the correct condition. When the scope is exited, the jump instruction is automatically updated to point to the first instruction outside of the scope.

Hence, all the instructions emitted within the scope will be executed if the condition is not met. If the condition is met, then the program execution will skip the instructions defined in the scope and continue.
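The back-patching mechanism described above can be sketched on a toy program using the `cleanup` variable attribute (a GCC/Clang extension). Everything here is hypothetical (`struct demo_jmp_prog`, `demo_jmpctx_get`, `demo_jmpctx_cleanup`), and instructions are reduced to their jump offset. The offset formula assumes BPF semantics, where a jump offset is relative to the instruction following the jump.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_JMP_CAP 16

/* Hypothetical miniature program: each instruction only stores a jump
 * offset, which is enough to show the back-patching. */
struct demo_jmp_prog {
    int off[DEMO_JMP_CAP];
    size_t len;
};

struct demo_jmpctx {
    struct demo_jmp_prog *prog;
    size_t insn_idx; /* Index of the jump instruction to patch. */
};

/* Emit a placeholder jump and remember its index. */
struct demo_jmpctx demo_jmpctx_get(struct demo_jmp_prog *prog)
{
    assert(prog != NULL && prog->len < DEMO_JMP_CAP);

    struct demo_jmpctx ctx = {prog, prog->len};

    prog->off[prog->len++] = 0;

    return ctx;
}

/* Called automatically at scope exit: make the jump land on the first
 * instruction emitted after the scope. */
void demo_jmpctx_cleanup(struct demo_jmpctx *ctx)
{
    ctx->prog->off[ctx->insn_idx] =
        (int)(ctx->prog->len - ctx->insn_idx - 1);
}

#define DEMO_CLEANUP_JMPCTX __attribute__((cleanup(demo_jmpctx_cleanup)))

/* Emit a jump guarding `n` body instructions; return the patched offset. */
int demo_guarded_block(struct demo_jmp_prog *prog, size_t n)
{
    assert(prog != NULL && prog->len + n + 1 <= DEMO_JMP_CAP);

    size_t jmp_idx = prog->len;

    {
        DEMO_CLEANUP_JMPCTX struct demo_jmpctx ctx = demo_jmpctx_get(prog);

        for (size_t i = 0; i < n; ++i)
            prog->off[prog->len++] = 0; /* Body instruction. */
    } /* demo_jmpctx_cleanup(&ctx) runs here. */

    return prog->off[jmp_idx];
}
```
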

Defines

_cleanup_bf_jmpctx_

Cleanup attribute for a bf_jmpctx variable.

bf_jmpctx_get(program, insn)

Create a new bf_jmpctx variable.

Parameters:
  • program – The program to emit the jump instruction to. It must be non-NULL.

  • insn – The jump instruction to emit.

Returns:

A new bf_jmpctx variable.

Functions

void bf_jmpctx_cleanup(struct bf_jmpctx *ctx)

Cleanup function for bf_jmpctx.

Parameters:
  • ctx – The bf_jmpctx variable to clean up.

struct bf_jmpctx
#include <bpfilter/cgen/jmp.h>

A helper structure to manage jump instructions in the program.

Public Members

struct bf_program *program

The program to emit the jump instruction to.

size_t insn_idx

The index of the jump instruction in the program’s image.

Printing debug messages

bpfilter defines a way for generated BPF programs to print log messages through bpf_trace_printk. This requires:

  • A set of bf_printer_* primitives to manipulate the printer context during the bytecode generation.

  • An EMIT_PRINT macro to insert BPF instructions to print a given string.

  • A BPF map, created by bpfilter before the BPF programs are attached to the kernel.

The printer context bf_printer stores all the log messages to be printed by the generated BPF programs. Log messages are deduplicated to limit memory usage.

During the BPF programs generation, EMIT_PRINT is used to print a given log message from a BPF program. Under the hood, this macro will insert the log message into the global printer context, so it can be used by the BPF programs at runtime.

Before the BPF programs are attached to their hook in the kernel, bpfilter will create a BPF map to contain a single string, which is the concatenation of all the log messages defined during the generation step. The various BPF programs will be updated to request their log messages from this map directly.

Note

All the message strings are stored in a single BPF map entry in order to benefit from BPF_PSEUDO_MAP_VALUE, which allows lookup-free direct value access for maps. Hence, using a single instruction, bpfilter can load the map’s file descriptor and get the address of a message in the buffer. See https://lore.kernel.org/bpf/20190409210910.32048-2-daniel@iogearbox.net.
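The deduplication and concatenation scheme can be sketched as follows. This is a simplified model with fixed-size storage. `struct demo_printer` and `demo_printer_add` are hypothetical and do not mirror the bf_printer internals; they only show how each message ends up identified by its offset in one nul-separated buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_MSGS 8
#define DEMO_BUF_LEN 256

/* Hypothetical printer: messages are stored back to back, nul-separated,
 * in a single buffer (the content of the single map entry). */
struct demo_printer {
    char buf[DEMO_BUF_LEN];
    size_t buf_len;
    size_t offsets[DEMO_MAX_MSGS];
    size_t n_msgs;
};

/* Add `str` and return its offset in the buffer. If an identical message
 * is already stored, its offset is reused. Return -1 if there is no room. */
long demo_printer_add(struct demo_printer *p, const char *str)
{
    size_t len;

    assert(p != NULL && str != NULL);

    /* Deduplicate: look for an identical message first. */
    for (size_t i = 0; i < p->n_msgs; ++i) {
        if (strcmp(p->buf + p->offsets[i], str) == 0)
            return (long)p->offsets[i];
    }

    len = strlen(str) + 1; /* Include the nul terminator. */
    if (p->n_msgs == DEMO_MAX_MSGS || len > DEMO_BUF_LEN - p->buf_len)
        return -1;

    memcpy(p->buf + p->buf_len, str, len);
    p->offsets[p->n_msgs++] = p->buf_len;
    p->buf_len += len;

    return (long)(p->buf_len - len);
}
```

A generated program then only needs the address of the buffer plus the offset of its message.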

Defines

_cleanup_bf_printer_
EMIT_PRINT(program, msg)

Emit BPF instructions to print a log message.

This macro will insert multiple instructions into the BPF program to load a given log message from a BPF map into a register, store its size, and call bpf_trace_printk() to print the message.

Warning

Like every EMIT_* macro, EMIT_PRINT() will call return if an error occurs. Hence, it must be used within a function that returns an integer.

Parameters:
  • program – Program to emit the instructions to. Must not be NULL.

  • msg – Log message to print.

Functions

int bf_printer_new(struct bf_printer **printer)

Allocate and initialise a new printer context.

Parameters:
  • printer – On success, contains a valid printer context.

Returns:

0 on success, or negative errno value on failure.

int bf_printer_new_from_marsh(struct bf_printer **printer, const struct bf_marsh *marsh)

Allocate a new printer context and initialise it from serialised data.

Parameters:
  • printer – On success, points to the newly allocated and initialised printer context. Can’t be NULL.

  • marsh – Serialised data to use to initialise the printer context.

Returns:

0 on success, or negative errno value on error.

void bf_printer_free(struct bf_printer **printer)

Deinitialise and deallocate a printer context.

Parameters:
  • printer – Printer context. Can’t be NULL.

int bf_printer_marsh(const struct bf_printer *printer, struct bf_marsh **marsh)

Serialise a printer context.

Parameters:
  • printer – Printer context to serialise. Can’t be NULL.

  • marsh – On success, contains the serialised printer context. Can’t be NULL.

Returns:

0 on success, or negative errno value on failure.

void bf_printer_dump(const struct bf_printer *printer, prefix_t *prefix)

Dump the content of the printer structure.

Parameters:
  • printer – Printer object to dump. Can’t be NULL.

  • prefix – Prefix to use for the dump. Can be NULL.

size_t bf_printer_msg_offset(const struct bf_printer_msg *msg)

Return the offset of a specific printer message.

Parameters:
  • msg – Printer message. Can’t be NULL.

Returns:

Offset of msg in the concatenated messages buffer.

size_t bf_printer_msg_len(const struct bf_printer_msg *msg)

Return the length of a specific printer message.

Parameters:
  • msg – Printer message. Can’t be NULL.

Returns:

Length of msg, including the trailing nul termination character.

const struct bf_printer_msg *bf_printer_add_msg(struct bf_printer *printer, const char *str)

Add a new message to the printer.

Parameters:
  • printer – Printer context. Can’t be NULL.

  • str – Message to add to the context. A copy of the buffer is made.

Returns:

The printer message if it was successfully added to the context, NULL otherwise.

int bf_printer_assemble(const struct bf_printer *printer, void **str, size_t *str_len)

Assemble the messages defined inside the printer into a single nul-separated string.

Parameters:
  • printer – Printer containing the messages to assemble. Can’t be NULL.

  • str – On success, contains the pointer to the result string. Can’t be NULL.

  • str_len – On success, contains the length of the result string, including the nul termination character. Can’t be NULL.

Returns:

0 on success, or negative errno value on failure.