Locking
Lock object(s) to prevent concurrent access to the ruleset and/or a chain.
Bundles the open file descriptors for $BPFFS, $BPFFS/bpfilter and an optional chain directory under it, and locks them if requested.
Usage
A bf_lock should be defined with bf_lock_default() to ensure it contains valid defaults, which prevents API misuse and ensures bf_lock_cleanup() can be called safely.
The bf_lock can then be initialized using bf_lock_init() to only open and lock the pin directory ($BPFFS/bpfilter). If BF_LOCK_NONE is used, the pin directory is not locked (useful if the caller already holds a compatible lock on it).
Alternatively, you can open (and lock) both the pin directory and a chain directory using bf_lock_init_for_chain(). The caller chooses the lock mode for both the pin directory (pindir_mode) and the chain directory (chain_mode) independently. If the chain directory doesn’t exist yet, create=true will create it atomically via stage-and-rename (see “Invariants” below).
bf_lock is cleaned up using bf_lock_cleanup(): it is safe to call on a properly defined lock (using bf_lock_default()), an initialized lock, or an already cleaned-up lock (i.e. bf_lock_cleanup() has already been called). Use the _clean_bf_lock_ variable attribute for automatic cleanup.
Locking matrix
Operations on the pin directory and the chain directories should follow this locking policy:
| Operation | Object | Pindir lock | Chain dir lock |
|---|---|---|---|
| Read | Ruleset | | |
| Read | Chain | | |
| Write | Ruleset | | |
| Write | Chain | | |
Rationale:
- Pindir WRITE is required for any operation that mutates the pin directory namespace (creating, removing, or renaming an entry). This excludes all other pindir lockers (readers and content mutators).
- Pindir READ is sufficient for operations that only resolve existing names (pure readers and content mutators). It is compatible with other readers/content mutators on different chains but mutually exclusive with pindir WRITE.
- Chain WRITE is required for any mutation of a chain directory’s contents. Chain READ is compatible with other chain readers.
Pindir WRITE is mutually exclusive with every other pindir lock, so it should be reserved for namespace-level mutations. For single-chain content mutations, take pindir READ + chain WRITE; for single-chain reads, take pindir READ + chain READ.
Ruleset-level operations iterate over every chain and must take a chain lock around each per-chain step (using bf_lock_acquire_chain() / bf_lock_release_chain()).
Invariants
The locking scheme relies on three invariants enforced by the bf_lock module:
- I1: pindir is immortal. The pin directory $BPFFS/bpfilter/ is created lazily by bf_lock_init() and never removed by the library. This prevents a “remove under reader” race where one process would unlinkat the pindir between another process’s openat and flock, leaving the second process holding a lock on an orphaned inode.
- I2: chain dirs are only removed under WRITE locks. A chain directory can only be removed while its owner holds BF_LOCK_WRITE on the chain itself and BF_LOCK_WRITE on the pin directory. bf_lock_release_chain() only attempts removal when the released chain lock is BF_LOCK_WRITE; callers must also hold the pindir WRITE lock for the removal to be race-free against concurrent readers (this is ensured by the locking matrix above).
- I3: chain dirs are created atomically. Creation goes through a “stage and rename” protocol: a uniquely-named staging directory is created and locked first, then atomically published to its final name via renameat2(RENAME_NOREPLACE). Two concurrent creators therefore cannot step on each other: the loser’s renameat2() returns EEXIST and it rolls back its own staging directory. The winner’s chain directory is never touched by the loser’s cleanup.
Ruleset-level operations iterate over every chain and must take a chain lock around each per-chain step (using bf_lock_acquire_chain / bf_lock_release_chain): the pindir lock alone does not protect against a chain-level writer that holds only the chain’s own WRITE lock.
API
-
struct bf_lock
- #include </__w/bpfilter/bpfilter/src/libbpfilter/core/lock.h>
Warning
The bf_lock structure should only be modified by the locking API, not directly, though callers can read any field safely (e.g. file descriptors).
Public Members
-
int bpffs_fd
File descriptor of the bpffs directory, -1 if unset.
-
int pindir_fd
File descriptor of the pin directory ($BPFFS/bpfilter), -1 if unset.
-
enum bf_lock_mode pindir_lock
Lock mode held on pindir_fd; BF_LOCK_NONE if unlocked.
-
int chain_fd
File descriptor of the chain directory, -1 if unset.
-
char *chain_name
Name of the open chain, only valid when chain_fd is set.
-
enum bf_lock_mode chain_lock
Lock mode held on chain_fd; BF_LOCK_NONE if unlocked or unset.
Enums
-
enum bf_lock_mode
File lock mode used by bf_lock.
Values:
-
enumerator BF_LOCK_NONE
BF_LOCK_NONE skips locking entirely (caller already holds a sufficient lock, e.g. an exclusive lock on a parent directory).
-
enumerator BF_LOCK_READ
BF_LOCK_READ requests a shared lock (multiple readers allowed).
-
enumerator BF_LOCK_WRITE
BF_LOCK_WRITE requests an exclusive lock (single writer, no readers).
-
enumerator _BF_LOCK_MAX
Defines
-
bf_lock_default()
Assign sane defaults to a bf_lock object.
This macro should always be used for a bf_lock object with the _clean_bf_lock_ attribute.
- Returns:
A bf_lock object with valid defaults.
-
BF_LOCK_STAGING_PREFIX
Prefix for staging names used by I3. Callers that walk the pindir (e.g. bf_ctx_get_cgens) must skip entries with this prefix.
The prefix cannot start with a . because bpffs rejects mkdir for names starting with a dot. It uses double underscore + “bf_staging_” to minimise the chance of colliding with a user-chosen chain name: the lexer that parses chain names accepts [a-zA-Z0-9_]+ so a user could technically create a chain with the same prefix, but in practice they won’t.
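A directory walk that honours this rule might look like the sketch below. The prefix value "__bf_staging_" is inferred from the description above (double underscore followed by “bf_staging_”) and should be treated as an assumption. DEMO_STAGING_PREFIX, demo_is_staging() and demo_count_entries() are hypothetical names.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: value inferred from the description of the real macro. */
#define DEMO_STAGING_PREFIX "__bf_staging_"

/* True if `name` is a staging entry rather than a published one. */
static bool demo_is_staging(const char *name)
{
    assert(name != NULL);

    return strncmp(name, DEMO_STAGING_PREFIX,
                   strlen(DEMO_STAGING_PREFIX)) == 0;
}

/* Count the entries of `path`, skipping ".", ".." and staging entries.
 * Returns the count, or a negative errno value if `path` can't be opened. */
static int demo_count_entries(const char *path)
{
    struct dirent *ent;
    int count = 0;
    DIR *dir;

    assert(path != NULL);

    dir = opendir(path);
    if (!dir)
        return -errno;

    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        if (demo_is_staging(ent->d_name))
            continue; /* not yet published: invisible to readers */
        ++count;
    }

    closedir(dir);
    return count;
}
```

Filtering on the name alone keeps the walk lock-free with respect to in-flight creations: a staging directory either has not been renamed yet and is skipped, or has been published under its final name.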
Functions
-
int bf_lock_init(struct bf_lock *lock, enum bf_lock_mode mode)
Open and lock the pin directory.
The pin directory ($BPFFS/bpfilter) is created if it doesn’t exist. Because of I1 (see “Invariants” above), it is never removed by the library, so the inode this function opens is stable for the lifetime of the bpffs mount.
- Parameters:
lock – Handle to populate. Must be initialised via bf_lock_default().
mode – Lock mode for the pin directory.
- Pre:
The runtime context has been initialized.
lock is not NULL, and contains sane defaults (see bf_lock_default()).
- Post:
On success: lock holds a valid file descriptor on the bpffs, a valid file descriptor on the pin directory, and an flock(2) of mode mode on the pin directory. lock->pindir_lock == mode.
On failure: lock is unchanged.
- Returns:
0 on success, or a negative errno value on failure.
-
int bf_lock_init_for_chain(struct bf_lock *lock, const char *name, enum bf_lock_mode pindir_mode, enum bf_lock_mode chain_mode, bool create)
Open and lock the pin directory and a chain directory.
Convenience wrapper that chains bf_lock_init() + bf_lock_acquire_chain(). The caller controls the lock mode for the pin directory (pindir_mode) and the chain directory (chain_mode) independently, per the locking matrix.
If the chain directory doesn’t exist and create=true, it is created atomically via stage-and-rename (I3). Creating a chain directory requires pindir_mode == BF_LOCK_WRITE and chain_mode == BF_LOCK_WRITE.
Note
If you already own a lock on the pin directory, use bf_lock_acquire_chain() instead.
- Parameters:
lock – Lock object to initialize.
name – Name of the chain to lock.
pindir_mode – Lock mode for the pin directory.
chain_mode – Lock mode for the chain directory.
create – If true, create the chain directory if it doesn’t exist. Requires pindir_mode == BF_LOCK_WRITE and chain_mode == BF_LOCK_WRITE.
- Pre:
The runtime context has been initialized.
lock is not NULL, and contains sane defaults (see bf_lock_default()).
name is not NULL.
create == true implies pindir_mode == BF_LOCK_WRITE and chain_mode == BF_LOCK_WRITE.
- Post:
On success: lock holds file descriptors on the bpffs, the pin directory (locked with pindir_mode), and the chain directory (locked with chain_mode). lock->chain_name == name (owned copy).
On failure: lock is unchanged.
- Returns:
0 on success, or a negative errno value on failure.
-
void bf_lock_cleanup(struct bf_lock *lock)
Clean up resources held by a lock.
Releases every lock held by lock, closes the open file descriptors, and removes the chain directory if BF_LOCK_WRITE was held on it (it might now be empty; unlinkat(AT_REMOVEDIR) silently fails if it isn’t).
Per invariant I1, this function does not remove the pin directory itself.
This function can be called if lock has been assigned sensible defaults (using bf_lock_default), initialized (using bf_lock_init*), or cleaned up (using bf_lock_cleanup); it is idempotent.
- Parameters:
lock – Handle to clean up.
- Pre:
lock is not NULL, and is in a valid state.
- Post:
lock is in the default state (all fds are -1, all modes are BF_LOCK_NONE, chain_name == NULL).
-
int bf_lock_acquire_chain(struct bf_lock *lock, const char *name, enum bf_lock_mode mode, bool create)
Lock a chain directory on an existing pin directory lock.
lock must have been successfully initialised by bf_lock_init (i.e. bpffs_fd and pindir_fd are valid) and must not already hold a chain lock. Depending on create:
- create == false: open and lock the existing chain directory. If the chain directory doesn’t exist, returns -ENOENT. Uses the “recheck-after-flock” protocol (P1) to detect and retry against a concurrent unlink + recreate of the name.
- create == true: create the chain directory atomically. Internally stages the new directory under a unique name starting with BF_LOCK_STAGING_PREFIX, acquires BF_LOCK_WRITE on the staged inode, then publishes it via renameat2(RENAME_NOREPLACE). Requires mode == BF_LOCK_WRITE and the caller must hold BF_LOCK_WRITE on the pin directory. If another process created the chain first, returns -EEXIST.
If creating, opening, or locking the directory fails, lock is left unchanged.
- Parameters:
lock – Initialized bf_lock.
name – Name of the chain to acquire.
mode – Lock mode for the chain directory.
create – If true, create the chain directory atomically if it doesn’t exist.
- Pre:
lock is not NULL, has been initialized, and doesn’t hold a chain lock.
name is not NULL.
create == true implies mode == BF_LOCK_WRITE and lock->pindir_lock == BF_LOCK_WRITE.
- Post:
On success: lock->chain_fd is a valid open (and locked with mode) file descriptor to the chain directory. lock->chain_name is a heap-allocated copy of name, owned by lock.
On failure: lock is unchanged and remains in a valid state.
- Returns:
0 on success, or a negative errno value on failure.
-
void bf_lock_release_chain(struct bf_lock *lock)
Release a chain lock.
Closes the chain file descriptor. If BF_LOCK_WRITE was held on the chain, also attempts to remove the (possibly empty) chain pin directory via unlinkat(AT_REMOVEDIR); the removal silently no-ops if the directory isn’t empty. Callers relying on that removal being race-free against concurrent readers must also hold the pindir BF_LOCK_WRITE (I2).
If lock doesn’t hold a chain lock, this is a no-op.
- Parameters:
lock – Handle to release the chain from.
- Pre:
lock is not NULL, has been initialized.
- Post:
lock no longer holds a chain lock; chain_fd == -1, chain_name == NULL, chain_lock == BF_LOCK_NONE. Other fields (bpffs/pindir fds and locks) are unchanged.