Pack files

Without pack files, each repository chunk is stored as a separate borgstore object. For large repositories this means millions of individual objects, each requiring its own I/O round trip to read or write. On high-latency backends (SFTP, cloud object storage) this overhead dominates backup and restore times.

Pack files address this by grouping multiple chunks into a single store object. A reader that needs one chunk does a partial read (range request) at a known offset instead of fetching a separate file. Store object count drops from one-per-chunk to one-per-pack.
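A minimal sketch of such a partial read, using a local file-like object as a stand-in for the store backend. The helper name `read_range` is hypothetical; borgstore's actual API for range requests is not shown here.

```python
import io


def read_range(pack_file, offset: int, length: int) -> bytes:
    """Read exactly `length` bytes starting at `offset` from a seekable file."""
    pack_file.seek(offset)
    data = pack_file.read(length)
    if len(data) != length:
        # The requested range extends past the end of the pack.
        raise EOFError("short read: pack is smaller than offset + length")
    return data


# Example: three "chunks" stored back to back in one object.
pack = io.BytesIO(b"aaa" + b"BBBB" + b"cc")
middle = read_range(pack, offset=3, length=4)
```

Only the requested bytes are transferred, which is what makes one large object cheaper than many small ones on a high-latency backend.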

Pack File Format

There is no separate file header. Each blob starts with the 8-byte OBJ_MAGIC (BORG_OBJ), so a forward scanner can locate blob boundaries and identify each chunk using only the pack file bytes with no external index.

Per-blob layout

Each blob is a self-contained unit:

Offset (relative to blob start)  Size              Type      Field
-------------------------------  ----------------  --------  -----
0                                len(OBJ_MAGIC)    bytes     OBJ_MAGIC = ASCII b"BORG_OBJ"
8                                1                 uint8     Format version: 0x01
9                                32                bytes     chunk_id
41                               4                 uint32le  meta_size
45                               4                 uint32le  data_size
49                               meta_size         bytes     encrypted_meta
49 + meta_size                   data_size         bytes     encrypted_data

chunk_id is the ID hash of the plaintext data (id_hash(plaintext_data)). Storing it in the unencrypted header lets a scanner rebuild the chunk_id → location index without decrypting any blob.

chunk_id is also written into encrypted_meta (the meta dict). The header copy enables key-free scanning and recovery; the meta copy lets future code read chunk_id through the normal meta dict API without parsing the raw header layout.

The fixed part of each blob header is 49 bytes (REPOOBJ_HEADER_SIZE): 8 bytes of OBJ_MAGIC, 1 version byte, 32 bytes of chunk_id, and 4 bytes each for meta_size and data_size:

REPOOBJ_HEADER_SIZE = len(OBJ_MAGIC) + 1 + 32 + 4 + 4 = 49
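The layout can be expressed as a single `struct` format string. The following is a sketch under stated assumptions, not borg's actual code: the helper names `pack_blob` and `parse_header` are hypothetical, and SHA-256 stands in for the repository's id_hash.

```python
import hashlib
import struct

OBJ_MAGIC = b"BORG_OBJ"
FORMAT_VERSION = 0x01
# "<" = little-endian, no padding: magic, version, chunk_id, meta_size, data_size
HEADER = struct.Struct("<8sB32sII")
REPOOBJ_HEADER_SIZE = HEADER.size  # 8 + 1 + 32 + 4 + 4


def pack_blob(chunk_id: bytes, encrypted_meta: bytes, encrypted_data: bytes) -> bytes:
    """Serialize one blob: fixed header followed by the two payloads."""
    if len(chunk_id) != 32:
        raise ValueError("chunk_id must be 32 bytes")
    header = HEADER.pack(OBJ_MAGIC, FORMAT_VERSION, chunk_id,
                         len(encrypted_meta), len(encrypted_data))
    return header + encrypted_meta + encrypted_data


def parse_header(buf: bytes, offset: int = 0):
    """Return (chunk_id, meta_size, data_size) for the blob starting at offset."""
    magic, version, chunk_id, meta_size, data_size = HEADER.unpack_from(buf, offset)
    if magic != OBJ_MAGIC:
        raise ValueError("bad magic")
    if version != FORMAT_VERSION:
        raise ValueError("unsupported format version")
    return chunk_id, meta_size, data_size


# Assumption: SHA-256 is used here only as a placeholder for id_hash.
chunk_id = hashlib.sha256(b"plaintext").digest()
blob = pack_blob(chunk_id, b"meta", b"data!")
```

Parsing needs no key: everything `parse_header` reads sits in the unencrypted fixed header.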

A reader locates the next blob by advancing:

next_blob_offset = current_blob_offset + REPOOBJ_HEADER_SIZE + meta_size + data_size

The per-blob magic limits the blast radius of corrupted length fields: if meta_size or data_size is damaged, the scanner loses at most one blob. Once it finds the next OBJ_MAGIC sequence it resumes. Other corruption (payload bit flips) is caught by AEAD on that blob without losing position.
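A forward scanner with re-synchronisation might look like the sketch below. The plausibility rule (a blob must end either at the end of the file or directly before another OBJ_MAGIC) is an assumption made for this example, not something the format mandates, and `scan_pack` is a hypothetical name.

```python
import struct

OBJ_MAGIC = b"BORG_OBJ"
HEADER = struct.Struct("<8sB32sII")  # magic, version, chunk_id, meta_size, data_size


def scan_pack(pack: bytes):
    """Yield (offset, chunk_id, meta_size, data_size) for each plausible blob."""
    pos = 0
    while True:
        pos = pack.find(OBJ_MAGIC, pos)  # (re-)sync on the next magic
        if pos < 0 or pos + HEADER.size > len(pack):
            return
        _, version, chunk_id, meta_size, data_size = HEADER.unpack_from(pack, pos)
        end = pos + HEADER.size + meta_size + data_size
        # Assumed sanity check: the blob must end at EOF or at another magic.
        plausible = version == 0x01 and (
            end == len(pack) or pack.startswith(OBJ_MAGIC, end)
        )
        if not plausible:
            pos += 1  # damaged header: skip this magic and search onward
            continue
        yield pos, chunk_id, meta_size, data_size
        pos = end
```

With a damaged length field the scanner drops that one blob and picks up again at the next magic. A magic sequence occurring by chance inside an encrypted payload is possible in principle; a production scanner would need further checks.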

Blobs follow one another contiguously with no padding:

OBJ_MAGIC | version=0x01 | chunk_id_0 | meta_size_0 | data_size_0 | encrypted_meta_0 | encrypted_data_0
OBJ_MAGIC | version=0x01 | chunk_id_1 | meta_size_1 | data_size_1 | encrypted_meta_1 | encrypted_data_1
...

Pack ID

The pack ID is the SHA-256 of the pack file’s bytes:

pack_id = sha256(pack_bytes)

Content-addressing the file by its own bytes makes the name commit to the content, so borgstore can verify and cache it and borg check can detect silent corruption of the stored file.
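As a brief illustration using Python's `hashlib` (the function names are hypothetical), computing and verifying a pack ID comes down to:

```python
import hashlib


def pack_id_for(pack_bytes: bytes) -> bytes:
    """Derive the pack ID from the complete pack file content."""
    return hashlib.sha256(pack_bytes).digest()


def verify_pack(pack_id: bytes, pack_bytes: bytes) -> bool:
    """Return True if the stored bytes still match the name they were stored under."""
    return hashlib.sha256(pack_bytes).digest() == pack_id


pid = pack_id_for(b"abc")
```

Any change to the stored bytes, even a single bit, makes `verify_pack` return False.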

Namespace

Pack files are stored under the packs/ namespace in borgstore, using a single directory level keyed on the first byte of the pack ID (hex-encoded):

packs/
  00/ .. ff/
    <pack_id_hex>

Pack Index Entry

In the current phase, each pack contains exactly one blob. The pack for a given chunk is always at:

packs/<hex(pack_id)>

A ChunkIndex entry maps a chunk to its pack:

chunk_id  →  pack_id

Since each pack holds exactly one blob, the blob is always at offset 0 and its length is the full file size. No offset or length field is stored in the index for this phase.

Write Order and Crash Safety

Pack data must be stored before any archive pointer references it. The required write order is:

1. Store the pack file to packs/<pack_id> via borgstore.
2. Store the partial index file to index/<index_id> (see Index Namespace).
3. Write the archive and archive pointer. This is the sole commit point.

A crash between steps 1 and 2 leaves orphan pack files in packs/. No archive references these chunks; borg compact removes them on the next run.

A crash between steps 2 and 3 leaves a partial index file covering packs not yet committed to any archive. The extra index entries point to valid, fully-written pack data; they are harmless and will be cleaned up by the next borg compact.

A crash after step 3 cannot leave the repository in an inconsistent state. The archive pointer write is the commit point: data not referenced by any archive pointer is unreachable and treated as garbage by borg compact.
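The ordering and its crash behaviour can be sketched with a plain dict standing in for the store. Everything here is illustrative: `write_session`, the `crash_after_step` switch, and the `archives/demo` key are hypothetical and do not reflect borg's real API or key layout.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def write_session(store: dict, packs: list, index_blob: bytes,
                  archive_key: str, archive_blob: bytes,
                  crash_after_step: int = 0) -> None:
    """Write packs, then the partial index, then the archive (commit point)."""
    # Step 1: pack files, named by the SHA-256 of their content.
    for pack in packs:
        store[f"packs/{sha256_hex(pack)}"] = pack
    if crash_after_step == 1:
        raise RuntimeError("simulated crash after step 1")

    # Step 2: partial index file, also content-addressed.
    store[f"index/{sha256_hex(index_blob)}"] = index_blob
    if crash_after_step == 2:
        raise RuntimeError("simulated crash after step 2")

    # Step 3: archive and archive pointer. Nothing is reachable before this.
    store[archive_key] = archive_blob


store = {}
try:
    write_session(store, [b"pack-a"], b"idx", "archives/demo", b"arch",
                  crash_after_step=1)
except RuntimeError:
    pass  # only an orphan pack exists now; no archive references it

# Retrying the session rewrites the same content-addressed names and commits.
write_session(store, [b"pack-a"], b"idx", "archives/demo", b"arch")
```

Because the first two steps use content-addressed names, the retry overwrites identical objects rather than creating duplicates.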

Only borg compact and borg check --repair delete pack files. When compact determines via mark-and-sweep that none of a pack’s blobs are referenced by any archive, it removes the whole file. Individual blobs cannot be removed without rewriting the entire pack, so deletion always operates at pack granularity.
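The pack-granularity rule reduces to a set test. In this sketch the two inputs (a mapping from pack ID to the chunk IDs it holds, and the set of chunk IDs marked as referenced) are assumed to be available already, and `packs_to_delete` is a hypothetical name.

```python
def packs_to_delete(pack_contents: dict, referenced: set) -> set:
    """Return the IDs of packs in which no blob is referenced by any archive.

    pack_contents: pack_id -> set of chunk_ids stored in that pack
    referenced:    chunk_ids reachable from some archive (the "mark" result)
    """
    return {
        pack_id
        for pack_id, chunk_ids in pack_contents.items()
        if referenced.isdisjoint(chunk_ids)  # sweep only fully dead packs
    }


contents = {
    "pack1": {"c1", "c2"},  # c1 still live: the whole pack stays
    "pack2": {"c3"},        # nothing live: removable
}
dead = packs_to_delete(contents, referenced={"c1"})
```

A pack with even one live blob is kept in full, so unreferenced blobs inside it continue to occupy space until the pack as a whole becomes dead or is rewritten.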

Index Namespace

Chunk-to-location mappings are stored as a separate set of encrypted partial index files under the index/ namespace.

Each partial index file covers the packs written in one backup session. Its name is the SHA-256 digest of its own content. A first backup of a large dataset may therefore produce a large partial index file. Having borg create use the same medium-sized file writer as borg compact would bound that size; this is the intended direction.

index/
  <sha256_of_content_hex>

Content-addressed naming makes each partial index file self-verifying and idempotent: writing the same index data twice produces the same filename, so a repeated write is a no-op.

Partial index files are write-once. A session stores new partial index files via borgstore; existing files are never modified. On repository open all files under index/ are loaded via borgstore, decrypted, and merged into the in-memory ChunkIndex (a borghash HashTableNT keyed on chunk_id). The merge is commutative and idempotent; order does not matter.
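The merge property can be illustrated with plain dicts in place of the borghash HashTableNT. The tie-break for a chunk_id that appears with two different pack IDs (keep the smaller one) is an assumption added here so that the result is order-independent in every case; the source does not say how such a conflict is resolved.

```python
def merge_partial_indexes(partials) -> dict:
    """Merge chunk_id -> pack_id mappings from several partial index files."""
    merged = {}
    for partial in partials:
        for chunk_id, pack_id in partial.items():
            current = merged.get(chunk_id)
            # Assumed tie-break: keep the smaller pack_id on conflict.
            if current is None or pack_id < current:
                merged[chunk_id] = pack_id
    return merged


a = {b"c1": b"p1", b"c2": b"p1"}
b = {b"c2": b"p1", b"c3": b"p2"}
index_ab = merge_partial_indexes([a, b])
index_ba = merge_partial_indexes([b, a])      # different load order
index_aba = merge_partial_indexes([a, b, a])  # same file loaded twice
```

All three calls produce the same mapping, which is why the order in which files under index/ are listed and loaded does not matter.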

borg compact rewrites the index/ namespace: it identifies live chunks via mark-and-sweep, consolidates the surviving mappings into medium-sized replacement files (targeting roughly 10–100 packs per file), and removes the files it supersedes. Medium-sized files keep the open-time merge cost bounded while avoiding the cache-invalidation traffic on other clients that a single all-in-one index would cause.

If the entire index/ namespace is lost or corrupt, the ChunkIndex can be rebuilt by scanning pack files directly; see Recovery Path.

Recovery Path

When borg check --repair detects a missing or incomplete ChunkIndex it rebuilds it by forward-scanning all pack files in packs/.

Each blob’s unencrypted header supplies the OBJ_MAGIC (for re-sync after corruption), the chunk_id, and the size fields needed to locate the next blob. The scan produces a complete chunk_id → (pack_id, offset, length) mapping without decrypting any blob and without the repository key.
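A simplified rebuild over a collection of pack files could be sketched as follows. It assumes undamaged length fields and stops at the first bad magic instead of re-syncing; `rebuild_index` is a hypothetical name, and `length` is taken to mean the whole blob including its header.

```python
import hashlib
import struct

OBJ_MAGIC = b"BORG_OBJ"
HEADER = struct.Struct("<8sB32sII")  # magic, version, chunk_id, meta_size, data_size


def rebuild_index(packs) -> dict:
    """Map chunk_id -> (pack_id, offset, length) from raw pack bytes, no key needed."""
    index = {}
    for pack in packs:
        pack_id = hashlib.sha256(pack).digest()
        offset = 0
        while offset + HEADER.size <= len(pack):
            magic, _version, chunk_id, meta_size, data_size = HEADER.unpack_from(
                pack, offset
            )
            if magic != OBJ_MAGIC:
                break  # simplification: a real scanner would re-sync here
            length = HEADER.size + meta_size + data_size
            index[chunk_id] = (pack_id, offset, length)
            offset += length
    return index
```

Only header fields are read, so the payloads stay encrypted and untouched throughout the scan.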

Repository Version and Feature Flags

Repositories using pack files require repository version 4. Clients that only accept version 3 refuse to open a version 4 repository with an unsupported-version error before any data is read.

In addition, the repository config.feature_flags must include pack_files in the mandatory set for all access modes:

config = {
    "feature_flags": {
        "read":  {"mandatory": ["pack_files"]},
        "write": {"mandatory": ["pack_files"]},
        "check": {"mandatory": ["pack_files"]},
    }
}

A client that does not recognise the pack_files feature flag will refuse to open the repository with a MandatoryFeatureUnsupported error regardless of the version number. The two guards cover different failure modes: the version bump stops clients that predate feature-flag support entirely; the feature flag gives a clearer error message to clients that understand feature flags but don’t know about packs yet.
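The feature-flag guard can be sketched as a set difference between the mandatory flags for an operation and the flags a client knows. The exception class below is a local stand-in defined for this example, and `check_feature_flags` is a hypothetical helper, not borg's actual function.

```python
class MandatoryFeatureUnsupported(Exception):
    """Local stand-in for the error named above."""


def check_feature_flags(config: dict, operation: str, supported: set) -> None:
    """Raise if the repository requires a feature this client does not know."""
    flags = config.get("feature_flags", {}).get(operation, {})
    missing = set(flags.get("mandatory", [])) - supported
    if missing:
        raise MandatoryFeatureUnsupported(sorted(missing))


config = {
    "feature_flags": {
        "read":  {"mandatory": ["pack_files"]},
        "write": {"mandatory": ["pack_files"]},
        "check": {"mandatory": ["pack_files"]},
    }
}

# A client that knows about packs passes the check for every access mode.
for op in ("read", "write", "check"):
    check_feature_flags(config, op, supported={"pack_files"})
```

A client whose supported set lacks pack_files fails on the first operation it attempts, before it interprets any pack data.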

There is no migration path from version 3 repositories to version 4. Users of the version 3 beta format must create a new repository with borg repo-create.