Programs generation

Bytecode generation

bf_program is used to represent a BPF program. It contains the BPF bytecode, as well as the required maps and metadata.

Workflow

The program is composed of different steps:

  1. Initialize the generic context

  2. Preprocess the packet’s headers: gather information about the packet’s size, the protocols available, the input interface…

  3. Execute the filtering rules: execute all the rules defined in the program sequentially. If a rule matches the packet, apply its verdict and return.

  4. Apply the policy if no rule matched: if no rule matched the packet, return the chain’s policy (default action).

Memory layout

The program will use the BPF registers to following way:

  • r0 : return value

  • r1 to r5 (included): general purpose registers

  • r6 : address of the header currently filtered on

  • r7 : L3 protocol ID

  • r8 : L4 protocol ID

  • r9 : unused

  • r10 : frame pointer

This convention is followed throughout the project and must be followed all the time to prevent incompatibilities. Debugging this kind of issues is not fun, so stick to it.

bf_runtime is used to represent the layout of the first stack frame in the program. It is filled during preprocessing and contains data required for packet filtering.

About preprocessing

The packets are preprocessed according to the program type (i.e. BPF flavor). Each flavor needs to perform the following steps during preprocessing:

  • Store the packet size and the input interface index into the runtime context

  • Create a BPF dynamic pointer for the packet

  • Preprocess the L2, L3, and L4 headers

The header’s preprocessing is required to discover the protocols used in the packet: processing L2 will provide us with information about L3, and so on. The logic used to process layer X is responsible for discovering layer X+1: the L2 header preprocessing logic will discover the L3 protocol ID. When processing layer X, if the protocol is not supported, the protocol ID is reset to 0 (so we won’t execute the rules for this layer) and subsequent layers are not processed (because we can’t discover their protocol).

For example, assuming IPv6 and TCP are the only supported protocols:

  • L2 processing: discover the packet’s ethertype (IPv6), and store it into r7 .

  • L3 processing: the protocol ID in r7 is supported (IPv6), so a slice is created, and the L4 protocol ID is read from the IPV6 header into r8 .

  • L4 processing: the protocol ID in r8 is supported (TCP), so a slice is created.

  • The program can now start executing the rules.

However, assuming only IPv6 and UDP are supported:

  • L2 processing: discover the packet’s ethertype (IPv6), and store it into r7 .

  • L3 processing: the protocol ID in r7 is supported (IPv6), so a slice is created, and the L4 protocol ID is read from the IPV6 header into r8 .

  • L4 processing: the protocol ID in r8 is no supported (TCP), r8 is set to 0 and we stop processing this layer.

  • The program can now start executing the rules. No layer 4 rule will be executed as r8 won’t match any protocol ID.

Warning

L3 and L4 protocol IDs must be stored in registers, no on the stack, as older verifier aren’t able to keep track of scalar values located on the stack. This means the verification will fail because the verifier can’t verify branches properly.

Defines

EMIT(program, x)
EMIT_KFUNC_CALL(program, function)
EMIT_FIXUP(program, type, insn)
EMIT_FIXUP_ELFSTUB(program, elfstub_id)
EMIT_FIXUP_JMP_NEXT_RULE(program, insn)
EMIT_LOAD_COUNTERS_FD_FIXUP(program, reg)
EMIT_LOAD_SET_FD_FIXUP(program, reg, index)

Load a specific set’s file descriptor.

Note

Similarly to every EMIT_* macro, it must be called from a function returning an int , if the call fails, the macro will return a negative errno value.

Parameters:
  • program – Program to generate the bytecode for. Can’t be NULL.

  • reg – Register to store the set file descriptor in.

  • index – Index of the set in the program.

_free_bf_program_

Functions

int bf_program_new(struct bf_program **program, const struct bf_chain *chain)
void bf_program_free(struct bf_program **program)
int bf_program_marsh(const struct bf_program *program, struct bf_marsh **marsh)
int bf_program_unmarsh(const struct bf_marsh *marsh, struct bf_program **program, const struct bf_chain *chain, int dir_fd)
void bf_program_dump(const struct bf_program *program, prefix_t *prefix)
int bf_program_grow_img(struct bf_program *program)
int bf_program_emit(struct bf_program *program, struct bpf_insn insn)
int bf_program_emit_kfunc_call(struct bf_program *program, const char *name)
int bf_program_emit_fixup(struct bf_program *program, enum bf_fixup_type type, struct bpf_insn insn, const union bf_fixup_attr *attr)
int bf_program_emit_fixup_elfstub(struct bf_program *program, enum bf_elfstub_id id)
int bf_program_generate(struct bf_program *program)
int bf_program_load(struct bf_program *prog)

Load the BPF program into the kernel.

Prior to loading the BPF program, multiple BPF maps are created to store the counters, the debug strings, and the sets. If the program can’t be loaded, all the maps are destroyed.

Once the loading succeeds, the program and the maps are pinned to the filesystem, unless the daemon is in transient mode. If the BPF objects can’t be pinned, the program is unloaded and the maps destroyed.

Parameters:
  • prog – Program to load into the kernel. Can’t be NULL and must contain instructions.

Returns:

0 on success, or negative errno value on failure.

int bf_program_attach(struct bf_program *prog, struct bf_hookopts **hookopts)

Attach a loaded program to a hook.

The program is attached to a hook using a bf_link object. In persistent mode, the link will be pinned to the filesystem. If the link can’t be pinned, the program will be detached from the hook.

Warning

If the program hasn’t been loaded (using bf_program_load), bf_program_attach will fail.

Parameters:
  • prog – Program to attach. Can’t be NULL.

  • hookopts – Hook-specific options to attach the program to the hook. Can’t be NULL.

Returns:

0 on success, or negative errno value on failure.

int bf_program_pin(struct bf_program *prog, int dir_fd)

Pin the BPF program.

The program and all the BPF objects it uses will be pinned into dir_fd. The BPF link is only pinned if the program is attached to a hook.

Parameters:
  • prog – Program to pin. Can’t be NULL.

  • dir_fd – File descriptor of the directory to pin the program and its BPF objects into.

Returns:

0 on success, or a negative errno value on error.

void bf_program_unpin(struct bf_program *prog, int dir_fd)

Unpin the BPF program.

This function never fails. If the program is not pinned, no file will be removed.

Parameters:
  • prog – Program to unpin. Can’t be NULL.

  • dir_fd – File descriptor of the directory containing the pinned objects.

void bf_program_detach(struct bf_program *prog)

Detach the program from the kernel.

The program is detached but not unloaded.

Parameters:
  • prog – Program to detach. Can’t be NULL.

void bf_program_unload(struct bf_program *prog)

Unload the program.

Parameters:
  • prog – Program to unload. Must not be attached. Can’t be NULL.

int bf_program_get_counter(const struct bf_program *program, uint32_t counter_idx, struct bf_counter *counter)
int bf_program_set_counters(struct bf_program *program, const struct bf_counter *counters)
size_t bf_program_chain_counter_idx(const struct bf_program *program)
size_t bf_program_error_counter_idx(const struct bf_program *program)
struct bf_program
#include <bpfilter/cgen/program.h>

Public Members

char prog_name[16U]
enum bf_flavor flavor
struct bf_printer *printer

Log messages printer.

struct bf_map *cmap

Counters map.

struct bf_map *pmap

Printer map.

bf_list sets

List of set maps.

Link objects attaching the program to a hook.

uint32_t elfstubs_location[_BF_ELFSTUB_MAX]
struct bpf_insn *img
size_t img_size
size_t img_cap
bf_list fixups
int prog_fd

File descriptor of the program.

int pindir_fd

File descriptor of the directory to pin the program into. Unused in transient mode.

const struct bf_flavor_ops *ops

Hook-specific ops to use to generate the program.

const struct bf_chain *chain

Chain the program is generated from. This is a non-owning pointer: the bf_program doesn’t have to manage its lifetime.

struct bf_program runtime

Runtime data used to interact with the program and cache information. This data is not serialized.

ELF stubs

ELF stubs are a mechanism to integrate clang-compiled BPF bytecode into bpfilter-generated BPF programs. Complex logic is more easily implemented in C and integrated into the final program that developed in BPF bytecode directly.

ELF stubs source code is part of bpfilter’s sources, they are compiled using clang, the ELF file is stored in a C array and accessible to the daemon at runtime.

Creating a new ELF stub

  1. Add a new source file for the BPF program in the daemon’s codebase (in the bpf folder, as $NAME.bpf.c).

  2. Declare the ELF stub in the daemon’s CMakeLists.txt (in bf_target_add_elfstubs()).

  3. Add a new ID for this stub in bf_elfstub_id.

  4. Write the BPF C code: define a single function (additional inline functions and macros are allowed).

Technicalities

At build time, ELF stubs are compiled by clang as BPF program. As such, they are bound to the same limitations as any other BPF program. xxd is used to convert the ELF file into a C array (with size), which is included in a generated rawstubs.h header file.

rawstubs.h defines an array of bf_rawstub structures containing:

  • const void *elf: pointer to the ELF data.

  • size_t len: size of the ELF file.

Each ELF stub can be manipulated through an instance of bf_rawstub, all the instances are stored in an array, which is as big as _BF_ELFSTUB_MAX.

However, the ELF stubs are not accessed directly through the bf_rawstub structure, as it’s not usable as-is. Instead, the bf_context will extract the actual bytecode from the ELF file, and relocate the kfunc calls. During generation, bf_ctx_get_elfstub is used to retrieve a pointer to bf_elfstub containing the BPF instructions to be copied in the program.

Limitations

Not any BPF program can be integrated into a bpfilter program, this section lists the current set of limitations:

  • Maps are not supported: BPF maps can be defined, but they are not integrated by bpfilter into the program, so the final BPF program won’t be verifiable.

  • Function are not supported: each ELF stub source code should contain a single function to be integrated. Inline functions are allowed, as they’re not real functions in the final ELF file.

Those limitations might evolve, as new BPF features are developed and the ELF stub implementation is improved.

While map are not supported, bpf_printk() can be used, as bpfilter is able to add the strings to its own map.

Defines

_free_bf_elfstub_

Enums

enum bf_elfstub_id

Identifiers for the ELF stubs.

Each identifier represents a valid ELF stub. If an ELF stub doesn’t have its identifier, it doesn’t exist from bpfilter’s standpoint.

Values:

enumerator BF_ELFSTUB_PARSE_IPV6_EH

Parse IPv6 extension headers.

__u8 bf_parse_ipv6(struct bf_runtime *ctx)

Parameters

  • ctx: address of the bf_runtime context of the program.

Return The L4 protocol on success, or 0 if the program fails creating a dynamic pointer slice.

enumerator BF_ELFSTUB_UPDATE_COUNTERS

Update the counters for a given rule.

__u8 bf_update_counters(struct bf_runtime *ctx, void *map, __u64 key)

Parameters

  • ctx: address of the bf_runtime context of the program.

  • map: address of the counters map.

  • key: key of the map to update.

Return 0 on success, or 1 on error.

enumerator _BF_ELFSTUB_MAX

Functions

int bf_elfstub_new(struct bf_elfstub **stub, enum bf_elfstub_id id)

Allocate and initialize a new ELF stub.

The corresponding raw ELF stub will be read, its text section will be copied into the final ELF, and the calls to kfuncs will be relocated according to the host’s kernel.

Parameters:
  • stub – ELF stub to allocate and initialize. Can’t be NULL.

  • id – Identifier of the raw ELF stub to initialize.

Returns:

0 on success, or negative errno value on failure.

void bf_elfstub_free(struct bf_elfstub **stub)

Deinitialise and deallocate an ELF stub.

Parameters:
  • stub – ELF stub. Can’t be NULL.

struct bf_printk_str
#include <bpfilter/cgen/elfstub.h>

Public Members

size_t insn_idx
const char *str
struct bf_elfstub
#include <bpfilter/cgen/elfstub.h>

Processed ELF stub to be integrated into a BPF program.

Public Members

enum bf_elfstub_id id
struct bpf_insn *insns
size_t ninsns
bf_list strs

Switch-cases

bf_swich is used to generate a switch-case logic in BPF bytecode, the logic is the following:

  • Create a new bf_swich object and initialize it. Use bf_swich_get to simplify this step. A bf_swich object contains a pointer to the generated program, and the register to perform the switch comparison against.

  • Call EMIT_SWICH_OPTION to define the various cases for the switch, and the associated BPF bytecode to run.

  • Call EMIT_SWICH_DEFAULT to define the default case of the switch, this is optional.

  • Call bf_swich_generate to generate the BPF bytecode for the switch.

Once bf_swich_generate has been called, this is what the switch structure will look like in BPF bytecode:

if case 1 matches REG, jump to case 1 code
if case 2 matches REG, jump to case 2 code
else jump to default code
case 1 code
    jump after the switch
case 2 code
    jump after the switch
default code

Note

I am fully aware it’s supposed to be spelled switch and not swich , but both switch and case are reserved keywords in C, so I had to come up with a solution to avoid clashes, and swich could be pronounced similarly to switch , at least to my non-native speak ear.

Defines

_clean_bf_swich_

Cleanup attribute for a bf_swich variable.

bf_swich_get(program, reg)

Create, initialize, and return a new bf_swich object.

Parameters:
  • programbf_program object to create the switch in.

  • reg – Register to use to compare the cases values to.

Returns:

A new bf_swich object.

EMIT_SWICH_OPTION(swich, imm, ...)

Add a case to the bf_swich

Parameters:
  • swich – Pointer to a valid bf_swich .

  • imm – Immediate value to compare against the switch’s register.

  • ... – BPF instructions to execute if the case matches.

EMIT_SWICH_DEFAULT(swich, ...)

Set the default instruction if no cases of the switch matches the register.

Defining a default option to a bf_swich is optional. If this macro is called twice, the existing default options will be replaced by the new ones.

Parameters:
  • swich – Pointer to a valid bf_swich .

  • ... – BPF instructions to execute if no case matches.

Functions

int bf_swich_init(struct bf_swich *swich, struct bf_program *program, int reg)

Initialise a bf_swich object.

Parameters:
  • swichbf_swich object to initialize, can’t be NULL.

  • programbf_program object to generate the switch-case for. Can’t be NULL.

  • reg – Register to compare to the cases of the switch.

Returns:

0 on success, or negative errno value on failure.

void bf_swich_cleanup(struct bf_swich *swich)

Cleanup a bf_swich object.

Once this function returns, the swich object can be reused by calling bf_swich_init .

Parameters:
  • swich – The bf_swich object to clean up.

int bf_swich_add_option(struct bf_swich *swich, uint32_t imm, const struct bpf_insn *insns, size_t insns_len)

Add an option (case) to the switch object.

Parameters:
  • swichbf_swich object to add the option to. Can’t be NULL.

  • imm – Immediate value to compare the switch’s register to. If the values are equal, the option’s instructions are executed.

  • insns – Array of BPF instructions to execute if the case matches.

  • insns_len – Number of instructions in insns .

Returns:

0 on success, or negative errno value on failure.

int bf_swich_set_default(struct bf_swich *swich, const struct bpf_insn *insns, size_t insns_len)

Set the switch’s default actions if no case matches.

Parameters:
  • swichbf_swich object to set the default action for. Can’t be NULL.

  • insns – Array of BPF instructions to execute.

  • insns_len – Number of instructions in insns .

Returns:

0 on success, or negative errno value on failure.

int bf_swich_generate(struct bf_swich *swich)

Generate the bytecode for the switch.

The BPF program doesn’t contain any of the instructions of the bf_swich until this function is called.

Parameters:
  • swichbf_swich object to generate the bytecode for. Can’t be NULL.

Returns:

0 on success, or negative errno value on failure.

struct bf_swich
#include <bpfilter/cgen/swich.h>

Context used to define a switch-case structure in BPF bytecode.

Public Members

struct bf_program *program

Program to generate the switch-case in.

int reg

Register to compare to the various cases of the switch.

bf_list options

List of options (cases) for the switch.

struct bf_swich_option *default_opt

Default option, if no case matches the switch’s register.

Error handling

bf_jmpctx is a helper structure to manage jump instructions in the program. A bf_jmpctx will insert a new jump instruction in the BPF program and update its jump offset when the bf_jmpctx is deleted.

Example:

// Within a function body
{
    _clean_bf_jmpctx_ struct bf_jmpctx ctx =
        bf_jmpctx_get(program, BPF_JMP_IMM(BPF_JEQ, BPF_REG_2, 0, 0));

    EMIT(program,
        BPF_MOV64_IMM(BPF_REG_0, program->runtime.ops->get_verdict(
            BF_VERDICT_ACCEPT)));
    EMIT(program, BPF_EXIT_INSN());
}

ctx is a variable local to the scope, marked with _clean_bf_jmpctx_ . The second argument to bf_jmpctx_get is the jump instruction to emit, with the correct condition. When the scope is exited, the jump instruction is automatically updated to point to the first instruction outside of the scope.

Hence, all the instructions emitted within the scope will be executed if the condition is not met. If the condition is met, then the program execution will skip the instructions defined in the scope and continue.

Defines

_clean_bf_jmpctx_

Cleanup attribute for a bf_jmpctx variable.

bf_jmpctx_default()
bf_jmpctx_get(program, insn)

Create a new bf_jmpctx variable.

Parameters:
  • program – The program to emit the jump instruction to. It must be non-NULL.

  • insn – The jump instruction to emit.

Returns:

A new bf_jmpctx variable.

Functions

void bf_jmpctx_cleanup(struct bf_jmpctx *ctx)

Cleanup function for bf_jmpctx.

Parameters:
  • ctx – The bf_jmpctx variable to clean up.

struct bf_jmpctx
#include <bpfilter/cgen/jmp.h>

Public Members

struct bf_program *program

A helper structure to manage jump instructions in the program.

The program to emit the jump instruction to.

size_t insn_idx

The index of the jump instruction in the program’s image.

Printing debug messages

bpfilter defines a way for generated BPF programs to print log messages through bpf_trace_printk . This requires:

  • A set of bf_printer_* primitives to manipulate the printer context during the bytecode generation.

  • A EMIT_PRINT macro to insert BPF instructions to print a given string.

  • A BPF map, created by bpfilter before the BPF programs are attached to the kernel.

The printer context bf_printer stores all the log messages to be printed by the generated BPF programs. Log messages are deduplicated to limit memory usage.

During the BPF programs generation, EMIT_PRINT is used to print a given log message from a BPF program. Under the hood, this macro will insert the log message into the global printer context, so it can be used by the BPF programs at runtime.

Before the BPF programs are attached to their hook in the kernel, bpfilter will create a BPF map to contain a unique string, which is the concatenation of all the log messages defined during the generation step. The various BPF programs will be updated to request their log messages from this map directly.

Note

All the message strings are stored in a single BPF map entry in order to benefit from BPF_PSEUDO_MAP_VALUE which allows lookup free direct value access for maps. Hence, using a unique instruction, bpfilter can load the map’s file descriptor and get the address of a message in the buffer. See https://lore.kernel.org/bpf/20190409210910.32048-2-daniel@iogearbox.net.

Defines

_free_bf_printer_
EMIT_PRINT(program, msg)

Emit BPF instructions to print a log message.

This function will insert mulitple instruction into the BPF program to load a given log message from a BPF map into a register, store its size, and call bpf_trace_printk() to print the message.

Warning

As every EMIT_* macro, EMIT_PRINT() will call return if an error occurs. Hence, it must be used within a function that returns an integer.

Parameters:
  • program – Program to emit the instructions to. Must not be NULL.

  • msg – Log message to print.

Functions

int bf_printer_new(struct bf_printer **printer)

Allocate and initialise a new printer context.

Parameters:
  • printer – On success, contains a valid printer context.

Returns:

0 on success, or negative errno value on failure.

int bf_printer_new_from_marsh(struct bf_printer **printer, const struct bf_marsh *marsh)

Allocate a new printer context and intialise it from serialised data.

Parameters:
  • printer – On success, points to the newly allocated and initialised printer context. Can’t be NULL.

  • marsh – Serialised data to use to initialise the printer message.

Returns:

0 on success, or negative errno value on error.

void bf_printer_free(struct bf_printer **printer)

Deinitialise and deallocate a printer context.

Parameters:
  • printer – Printer context. Can’t be NULL.

int bf_printer_marsh(const struct bf_printer *printer, struct bf_marsh **marsh)

Serialise a printer context.

Parameters:
  • printer – Printer context to serialise. Can’t be NULL.

  • marsh – On success, contains the serialised printer context. Can’t be NULL.

Returns:

0 on success, or negative errno value on failure.

void bf_printer_dump(const struct bf_printer *printer, prefix_t *prefix)

Dump the content of the printer structure.

Parameters:
  • printer – Printer object to dump. Can’t be NULL.

  • prefix – Prefix to use for the dump. Can be NULL.

size_t bf_printer_msg_offset(const struct bf_printer_msg *msg)

Return the offset of a specific printer message.

Parameters:
  • msg – Printer message. Can’t be NULL.

Returns:

Offset of msg in the concatenated messages buffer.

size_t bf_printer_msg_len(const struct bf_printer_msg *msg)

Return the length of a specific printer message.

Parameters:
  • msg – Printer message. Can’t be NULL.

Returns:

Length of msg, including the trailing nul termination character.

const struct bf_printer_msg *bf_printer_add_msg(struct bf_printer *printer, const char *str)

Add a new message to the printer.

Parameters:
  • printer – Printer context. Can’t be NULL.

  • str – Message to add to the context. A copy of the buffer is made.

Returns:

The printer message if it was successfuly added to the context, NULL otherwise.

int bf_printer_assemble(const struct bf_printer *printer, void **str, size_t *str_len)

Assemble the messages defined inside the printer into a single nul-separated string.

Parameters:
  • printer – Printer containing the messages to assemble. Can’t be NULL.

  • str – On success, contains the pointer to the result string. Can’t be NULL.

  • str_len – On success, contains the length of the result string, including the nul termination character. Can’t be NULL.

Returns:

0 on success, or negative errno value on failure.