1b9e818b5SShawn Pearce# reftable 2b9e818b5SShawn Pearce 3b9e818b5SShawn Pearce[TOC] 4b9e818b5SShawn Pearce 5b9e818b5SShawn Pearce## Overview 6b9e818b5SShawn Pearce 7b9e818b5SShawn Pearce### Problem statement 8b9e818b5SShawn Pearce 9b9e818b5SShawn PearceSome repositories contain a lot of references (e.g. android at 866k, 10b9e818b5SShawn Pearcerails at 31k). The existing packed-refs format takes up a lot of 11b9e818b5SShawn Pearcespace (e.g. 62M), and does not scale with additional references. 12b9e818b5SShawn PearceLookup of a single reference requires linearly scanning the file. 13b9e818b5SShawn Pearce 14b9e818b5SShawn PearceAtomic pushes modifying multiple references require copying the 15b9e818b5SShawn Pearceentire packed-refs file, which can be a considerable amount of data 16b9e818b5SShawn Pearcemoved (e.g. 62M in, 62M out) for even small transactions (2 refs 17b9e818b5SShawn Pearcemodified). 18b9e818b5SShawn Pearce 19b9e818b5SShawn PearceRepositories with many loose references occupy a large number of disk 20b9e818b5SShawn Pearceblocks from the local file system, as each reference is its own file 21b9e818b5SShawn Pearcestoring 41 bytes (and another file for the corresponding reflog). 22b9e818b5SShawn PearceThis negatively affects the number of inodes available when a large 23b9e818b5SShawn Pearcenumber of repositories are stored on the same filesystem. Readers can 24b9e818b5SShawn Pearcebe penalized due to the larger number of syscalls required to traverse 25b9e818b5SShawn Pearceand read the `$GIT_DIR/refs` directory. 26b9e818b5SShawn Pearce 27b9e818b5SShawn Pearce### Objectives 28b9e818b5SShawn Pearce 29b9e818b5SShawn Pearce- Near constant time lookup for any single reference, even when the 30b9e818b5SShawn Pearce repository is cold and not in process or kernel cache. 31b9e818b5SShawn Pearce- Near constant time verification if a SHA-1 is referred to by at 32b9e818b5SShawn Pearce least one reference (for allow-tip-sha1-in-want). 33b9e818b5SShawn Pearce- Efficient lookup of an entire namespace, such as `refs/tags/`. 34b9e818b5SShawn Pearce- Support atomic push with `O(size_of_update)` operations. 35b9e818b5SShawn Pearce- Combine reflog storage with ref storage for small transactions. 36b9e818b5SShawn Pearce- Separate reflog storage for base refs and historical logs. 37b9e818b5SShawn Pearce 38b9e818b5SShawn Pearce### Description 39b9e818b5SShawn Pearce 40b9e818b5SShawn PearceA reftable file is a portable binary file format customized for 41b9e818b5SShawn Pearcereference storage. References are sorted, enabling linear scans, 42b9e818b5SShawn Pearcebinary search lookup, and range scans. 43b9e818b5SShawn Pearce 44b9e818b5SShawn PearceStorage in the file is organized into variable sized blocks. Prefix 45b9e818b5SShawn Pearcecompression is used within a single block to reduce disk space. Block 46b9e818b5SShawn Pearcesize and alignment is tunable by the writer. 47b9e818b5SShawn Pearce 48b9e818b5SShawn Pearce### Performance 49b9e818b5SShawn Pearce 50b9e818b5SShawn PearceSpace used, packed-refs vs. reftable: 51b9e818b5SShawn Pearce 52b9e818b5SShawn Pearcerepository | packed-refs | reftable | % original | avg ref | avg obj 53b9e818b5SShawn Pearce-----------|------------:|---------:|-----------:|---------:|--------: 54b9e818b5SShawn Pearceandroid | 62.2 M | 36.1 M | 58.0% | 33 bytes | 5 bytes 55b9e818b5SShawn Pearcerails | 1.8 M | 1.1 M | 57.7% | 29 bytes | 4 bytes 56b9e818b5SShawn Pearcegit | 78.7 K | 48.1 K | 61.0% | 50 bytes | 4 bytes 57b9e818b5SShawn Pearcegit (heads)| 332 b | 269 b | 81.0% | 33 bytes | 0 bytes 58b9e818b5SShawn Pearce 59b9e818b5SShawn PearceScan (read 866k refs), by reference name lookup (single ref from 866k 60b9e818b5SShawn Pearcerefs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs): 61b9e818b5SShawn Pearce 62b9e818b5SShawn Pearceformat | cache | scan | by name | by SHA-1 63b9e818b5SShawn Pearce------------|------:|--------:|---------------:|---------------: 64b9e818b5SShawn Pearcepacked-refs | cold | 402 ms | 409,660.1 usec | 412,535.8 usec 65b9e818b5SShawn Pearcepacked-refs | hot | | 6,844.6 usec | 20,110.1 usec 66b9e818b5SShawn Pearcereftable | cold | 112 ms | 33.9 usec | 323.2 usec 67b9e818b5SShawn Pearcereftable | hot | | 20.2 usec | 320.8 usec 68b9e818b5SShawn Pearce 69b9e818b5SShawn PearceSpace used for 149,932 log entries for 43,061 refs, 70b9e818b5SShawn Pearcereflog vs. reftable: 71b9e818b5SShawn Pearce 72b9e818b5SShawn Pearceformat | size | avg entry 73b9e818b5SShawn Pearce--------------|------:|-----------: 74b9e818b5SShawn Pearce$GIT_DIR/logs | 173 M | 1209 bytes 75b9e818b5SShawn Pearcereftable | 5 M | 37 bytes 76b9e818b5SShawn Pearce 77b9e818b5SShawn Pearce## Details 78b9e818b5SShawn Pearce 79b9e818b5SShawn Pearce### Peeling 80b9e818b5SShawn Pearce 81b9e818b5SShawn PearceReferences stored in a reftable are peeled, a record for an annotated 82b9e818b5SShawn Pearce(or signed) tag records both the tag object, and the object it refers 83b9e818b5SShawn Pearceto. 84b9e818b5SShawn Pearce 85b9e818b5SShawn Pearce### Reference name encoding 86b9e818b5SShawn Pearce 87b9e818b5SShawn PearceReference names are an uninterpreted sequence of bytes that must pass 88b9e818b5SShawn Pearce[git-check-ref-format][ref-fmt] as a valid reference name. 89b9e818b5SShawn Pearce 90b9e818b5SShawn Pearce[ref-fmt]: https://git-scm.com/docs/git-check-ref-format 91b9e818b5SShawn Pearce 927c75a68bSHan-Wen Nienhuys### Key unicity 937c75a68bSHan-Wen Nienhuys 947c75a68bSHan-Wen NienhuysEach entry must have a unique key; repeated keys are disallowed. 957c75a68bSHan-Wen Nienhuys 96b9e818b5SShawn Pearce### Network byte order 97b9e818b5SShawn Pearce 98b9e818b5SShawn PearceAll multi-byte, fixed width fields are in network byte order. 99b9e818b5SShawn Pearce 100b9e818b5SShawn Pearce### Ordering 101b9e818b5SShawn Pearce 102b9e818b5SShawn PearceBlocks are lexicographically ordered by their first reference. 103b9e818b5SShawn Pearce 104b9e818b5SShawn Pearce### Directory/file conflicts 105b9e818b5SShawn Pearce 106b9e818b5SShawn PearceThe reftable format accepts both `refs/heads/foo` and 107b9e818b5SShawn Pearce`refs/heads/foo/bar` as distinct references. 108b9e818b5SShawn Pearce 109b9e818b5SShawn PearceThis property is useful for retaining log records in reftable, but may 110b9e818b5SShawn Pearceconfuse versions of Git using `$GIT_DIR/refs` directory tree to 111b9e818b5SShawn Pearcemaintain references. Users of reftable may choose to continue to 112b9e818b5SShawn Pearcereject `foo` and `foo/bar` type conflicts to prevent problems for 113b9e818b5SShawn Pearcepeers. 114b9e818b5SShawn Pearce 115b9e818b5SShawn Pearce## File format 116b9e818b5SShawn Pearce 117b9e818b5SShawn Pearce### Structure 118b9e818b5SShawn Pearce 119b9e818b5SShawn PearceA reftable file has the following high-level structure: 120b9e818b5SShawn Pearce 121b9e818b5SShawn Pearce first_block { 122b9e818b5SShawn Pearce header 123b9e818b5SShawn Pearce first_ref_block 124b9e818b5SShawn Pearce } 125b9e818b5SShawn Pearce ref_block* 126b9e818b5SShawn Pearce ref_index* 127b9e818b5SShawn Pearce obj_block* 128b9e818b5SShawn Pearce obj_index* 129b9e818b5SShawn Pearce log_block* 130b9e818b5SShawn Pearce log_index* 131b9e818b5SShawn Pearce footer 132b9e818b5SShawn Pearce 133b9e818b5SShawn PearceA log-only file omits the `ref_block`, `ref_index`, `obj_block` and 134b9e818b5SShawn Pearce`obj_index` sections, containing only the file header and log block: 135b9e818b5SShawn Pearce 136b9e818b5SShawn Pearce first_block { 137b9e818b5SShawn Pearce header 138b9e818b5SShawn Pearce } 139b9e818b5SShawn Pearce log_block* 140b9e818b5SShawn Pearce log_index* 141b9e818b5SShawn Pearce footer 142b9e818b5SShawn Pearce 143b9e818b5SShawn Pearcein a log-only file the first log block immediately follows the file 144b9e818b5SShawn Pearceheader, without padding to block alignment. 145b9e818b5SShawn Pearce 146b9e818b5SShawn Pearce### Block size 147b9e818b5SShawn Pearce 148b9e818b5SShawn PearceThe file's block size is arbitrarily determined by the writer, and 149b9e818b5SShawn Pearcedoes not have to be a power of 2. The block size must be larger than 150b9e818b5SShawn Pearcethe longest reference name or log entry used in the repository, as 151b9e818b5SShawn Pearcereferences cannot span blocks. 152b9e818b5SShawn Pearce 153b9e818b5SShawn PearcePowers of two that are friendly to the virtual memory system or 154b9e818b5SShawn Pearcefilesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can 155b9e818b5SShawn Pearceyield better compression, with a possible increased cost incurred by 156b9e818b5SShawn Pearcereaders during access. 157b9e818b5SShawn Pearce 158b9e818b5SShawn PearceThe largest block size is `16777215` bytes (15.99 MiB). 159b9e818b5SShawn Pearce 160b9e818b5SShawn Pearce### Block alignment 161b9e818b5SShawn Pearce 162b9e818b5SShawn PearceWriters may choose to align blocks at multiples of the block size by 163b9e818b5SShawn Pearceincluding `padding` filled with NUL bytes at the end of a block to 164b9e818b5SShawn Pearceround out to the chosen alignment. When alignment is used, writers 165b9e818b5SShawn Pearcemust specify the alignment with the file header's `block_size` field. 166b9e818b5SShawn Pearce 167b9e818b5SShawn PearceBlock alignment is not required by the file format. Unaligned files 168b9e818b5SShawn Pearcemust set `block_size = 0` in the file header, and omit `padding`. 169b9e818b5SShawn PearceUnaligned files with more than one ref block must include the 170b9e818b5SShawn Pearce[ref index](#Ref-index) to support fast lookup. Readers must be 171b9e818b5SShawn Pearceable to read both aligned and non-aligned files. 172b9e818b5SShawn Pearce 173b9e818b5SShawn PearceVery small files (e.g. 1 only ref block) may omit `padding` and the 174b9e818b5SShawn Pearceref index to reduce total file size. 175b9e818b5SShawn Pearce 176b9e818b5SShawn Pearce### Header 177b9e818b5SShawn Pearce 178b9e818b5SShawn PearceA 24-byte header appears at the beginning of the file: 179b9e818b5SShawn Pearce 180b9e818b5SShawn Pearce 'REFT' 181b9e818b5SShawn Pearce uint8( version_number = 1 ) 182b9e818b5SShawn Pearce uint24( block_size ) 183b9e818b5SShawn Pearce uint64( min_update_index ) 184b9e818b5SShawn Pearce uint64( max_update_index ) 185b9e818b5SShawn Pearce 186b9e818b5SShawn PearceAligned files must specify `block_size` to configure readers with the 187b9e818b5SShawn Pearceexpected block alignment. Unaligned files must set `block_size = 0`. 188b9e818b5SShawn Pearce 189b9e818b5SShawn PearceThe `min_update_index` and `max_update_index` describe bounds for the 190b9e818b5SShawn Pearce`update_index` field of all log records in this file. When reftables 191b9e818b5SShawn Pearceare used in a stack for [transactions](#Update-transactions), these 192b9e818b5SShawn Pearcefields can order the files such that the prior file's 193b9e818b5SShawn Pearce`max_update_index + 1` is the next file's `min_update_index`. 194b9e818b5SShawn Pearce 195b9e818b5SShawn Pearce### First ref block 196b9e818b5SShawn Pearce 197b9e818b5SShawn PearceThe first ref block shares the same block as the file header, and is 198b9e818b5SShawn Pearce24 bytes smaller than all other blocks in the file. The first block 199b9e818b5SShawn Pearceimmediately begins after the file header, at position 24. 200b9e818b5SShawn Pearce 201b9e818b5SShawn PearceIf the first block is a log block (a log-only file), its block header 202b9e818b5SShawn Pearcebegins immediately at position 24. 203b9e818b5SShawn Pearce 204b9e818b5SShawn Pearce### Ref block format 205b9e818b5SShawn Pearce 206b9e818b5SShawn PearceA ref block is written as: 207b9e818b5SShawn Pearce 208b9e818b5SShawn Pearce 'r' 209b9e818b5SShawn Pearce uint24( block_len ) 210b9e818b5SShawn Pearce ref_record+ 211b9e818b5SShawn Pearce uint24( restart_offset )+ 212b9e818b5SShawn Pearce uint16( restart_count ) 213b9e818b5SShawn Pearce 214b9e818b5SShawn Pearce padding? 215b9e818b5SShawn Pearce 216b9e818b5SShawn PearceBlocks begin with `block_type = 'r'` and a 3-byte `block_len` which 217b9e818b5SShawn Pearceencodes the number of bytes in the block up to, but not including the 218b9e818b5SShawn Pearceoptional `padding`. This is always less than or equal to the file's 219b9e818b5SShawn Pearceblock size. In the first ref block, `block_len` includes 24 bytes 220b9e818b5SShawn Pearcefor the file header. 221b9e818b5SShawn Pearce 222b9e818b5SShawn PearceThe 2-byte `restart_count` stores the number of entries in the 223b9e818b5SShawn Pearce`restart_offset` list, which must not be empty. Readers can use 224b9e818b5SShawn Pearce`restart_count` to binary search between restarts before starting a 225b9e818b5SShawn Pearcelinear scan. 226b9e818b5SShawn Pearce 227b9e818b5SShawn PearceExactly `restart_count` 3-byte `restart_offset` values precedes the 228b9e818b5SShawn Pearce`restart_count`. Offsets are relative to the start of the block and 229b9e818b5SShawn Pearcerefer to the first byte of any `ref_record` whose name has not been 230b9e818b5SShawn Pearceprefix compressed. Entries in the `restart_offset` list must be 231b9e818b5SShawn Pearcesorted, ascending. Readers can start linear scans from any of these 232b9e818b5SShawn Pearcerecords. 233b9e818b5SShawn Pearce 234b9e818b5SShawn PearceA variable number of `ref_record` fill the middle of the block, 235b9e818b5SShawn Pearcedescribing reference names and values. The format is described below. 236b9e818b5SShawn Pearce 237b9e818b5SShawn PearceAs the first ref block shares the first file block with the file 238b9e818b5SShawn Pearceheader, all `restart_offset` in the first block are relative to the 239b9e818b5SShawn Pearcestart of the file (position 0), and include the file header. This 240b9e818b5SShawn Pearceforces the first `restart_offset` to be `28`. 241b9e818b5SShawn Pearce 242b9e818b5SShawn Pearce#### ref record 243b9e818b5SShawn Pearce 244b9e818b5SShawn PearceA `ref_record` describes a single reference, storing both the name and 245b9e818b5SShawn Pearceits value(s). Records are formatted as: 246b9e818b5SShawn Pearce 247b9e818b5SShawn Pearce varint( prefix_length ) 248b9e818b5SShawn Pearce varint( (suffix_length << 3) | value_type ) 249b9e818b5SShawn Pearce suffix 25044a75d9eSShawn Pearce varint( update_index_delta ) 251b9e818b5SShawn Pearce value? 252b9e818b5SShawn Pearce 253b9e818b5SShawn PearceThe `prefix_length` field specifies how many leading bytes of the 254b9e818b5SShawn Pearceprior reference record's name should be copied to obtain this 255b9e818b5SShawn Pearcereference's name. This must be 0 for the first reference in any 256b9e818b5SShawn Pearceblock, and also must be 0 for any `ref_record` whose offset is listed 257b9e818b5SShawn Pearcein the `restart_offset` table at the end of the block. 258b9e818b5SShawn Pearce 259b9e818b5SShawn PearceRecovering a reference name from any `ref_record` is a simple concat: 260b9e818b5SShawn Pearce 261b9e818b5SShawn Pearce this_name = prior_name[0..prefix_length] + suffix 262b9e818b5SShawn Pearce 263b9e818b5SShawn PearceThe `suffix_length` value provides the number of bytes available in 264b9e818b5SShawn Pearce`suffix` to copy from `suffix` to complete the reference name. 265b9e818b5SShawn Pearce 26644a75d9eSShawn PearceThe `update_index` that last modified the reference can be obtained by 26744a75d9eSShawn Pearceadding `update_index_delta` to the `min_update_index` from the file 26844a75d9eSShawn Pearceheader: `min_update_index + update_index_delta`. 26944a75d9eSShawn Pearce 270b9e818b5SShawn PearceThe `value` follows. Its format is determined by `value_type`, one of 271b9e818b5SShawn Pearcethe following: 272b9e818b5SShawn Pearce 273b9e818b5SShawn Pearce- `0x0`: deletion; no value data (see transactions, below) 274b9e818b5SShawn Pearce- `0x1`: one 20-byte object id; value of the ref 275b9e818b5SShawn Pearce- `0x2`: two 20-byte object ids; value of the ref, peeled target 276b9e818b5SShawn Pearce- `0x3`: symbolic reference: `varint( target_len ) target` 277b9e818b5SShawn Pearce 278b9e818b5SShawn PearceSymbolic references use `0x3`, followed by the complete name of the 279b9e818b5SShawn Pearcereference target. No compression is applied to the target name. 280b9e818b5SShawn Pearce 281b9e818b5SShawn PearceTypes `0x4..0x7` are reserved for future use. 282b9e818b5SShawn Pearce 283b9e818b5SShawn Pearce### Ref index 284b9e818b5SShawn Pearce 285b9e818b5SShawn PearceThe ref index stores the name of the last reference from every ref 286b9e818b5SShawn Pearceblock in the file, enabling reduced disk seeks for lookups. Any 287b9e818b5SShawn Pearcereference can be found by searching the index, identifying the 288b9e818b5SShawn Pearcecontaining block, and searching within that block. 289b9e818b5SShawn Pearce 290b9e818b5SShawn PearceThe index may be organized into a multi-level index, where the 1st 291b9e818b5SShawn Pearcelevel index block points to additional ref index blocks (2nd level), 292b9e818b5SShawn Pearcewhich may in turn point to either additional index blocks (e.g. 3rd 293b9e818b5SShawn Pearcelevel) or ref blocks (leaf level). Disk reads required to access a 294b9e818b5SShawn Pearceref go up with higher index levels. Multi-level indexes may be 295b9e818b5SShawn Pearcerequired to ensure no single index block exceeds the file format's max 296b9e818b5SShawn Pearceblock size of `16777215` bytes (15.99 MiB). To acheive constant O(1) 297b9e818b5SShawn Pearcedisk seeks for lookups the index must be a single level, which is 298b9e818b5SShawn Pearcepermitted to exceed the file's configured block size, but not the 299b9e818b5SShawn Pearceformat's max block size of 15.99 MiB. 300b9e818b5SShawn Pearce 301b9e818b5SShawn PearceIf present, the ref index block(s) appears after the last ref block. 302b9e818b5SShawn Pearce 303b9e818b5SShawn PearceIf there are at least 4 ref blocks, a ref index block should be 304b9e818b5SShawn Pearcewritten to improve lookup times. Cold reads using the index require 305b9e818b5SShawn Pearce2 disk reads (read index, read block), and binary searching < 4 blocks 306b9e818b5SShawn Pearcealso requires <= 2 reads. Omitting the index block from smaller files 307b9e818b5SShawn Pearcesaves space. 308b9e818b5SShawn Pearce 309b9e818b5SShawn PearceIf the file is unaligned and contains more than one ref block, the ref 310b9e818b5SShawn Pearceindex must be written. 311b9e818b5SShawn Pearce 312b9e818b5SShawn PearceIndex block format: 313b9e818b5SShawn Pearce 314b9e818b5SShawn Pearce 'i' 315b9e818b5SShawn Pearce uint24( block_len ) 316b9e818b5SShawn Pearce index_record+ 317b9e818b5SShawn Pearce uint24( restart_offset )+ 318b9e818b5SShawn Pearce uint16( restart_count ) 319b9e818b5SShawn Pearce 320b9e818b5SShawn Pearce padding? 321b9e818b5SShawn Pearce 322b9e818b5SShawn PearceThe index blocks begin with `block_type = 'i'` and a 3-byte 323b9e818b5SShawn Pearce`block_len` which encodes the number of bytes in the block, 324b9e818b5SShawn Pearceup to but not including the optional `padding`. 325b9e818b5SShawn Pearce 326b9e818b5SShawn PearceThe `restart_offset` and `restart_count` fields are identical in 327b9e818b5SShawn Pearceformat, meaning and usage as in ref blocks. 328b9e818b5SShawn Pearce 329b9e818b5SShawn PearceTo reduce the number of reads required for random access in very large 330b9e818b5SShawn Pearcefiles the index block may be larger than other blocks. However, 331b9e818b5SShawn Pearcereaders must hold the entire index in memory to benefit from this, so 332b9e818b5SShawn Pearceit's a time-space tradeoff in both file size and reader memory. 333b9e818b5SShawn Pearce 334b9e818b5SShawn PearceIncreasing the file's block size decreases the index size. 335b9e818b5SShawn PearceAlternatively a multi-level index may be used, keeping index blocks 336b9e818b5SShawn Pearcewithin the file's block size, but increasing the number of blocks 337b9e818b5SShawn Pearcethat need to be accessed. 338b9e818b5SShawn Pearce 339b9e818b5SShawn Pearce#### index record 340b9e818b5SShawn Pearce 341b9e818b5SShawn PearceAn index record describes the last entry in another block. 342b9e818b5SShawn PearceIndex records are written as: 343b9e818b5SShawn Pearce 344b9e818b5SShawn Pearce varint( prefix_length ) 345b9e818b5SShawn Pearce varint( (suffix_length << 3) | 0 ) 346b9e818b5SShawn Pearce suffix 347b9e818b5SShawn Pearce varint( block_position ) 348b9e818b5SShawn Pearce 349b9e818b5SShawn PearceIndex records use prefix compression exactly like `ref_record`. 350b9e818b5SShawn Pearce 351b9e818b5SShawn PearceIndex records store `block_position` after the suffix, specifying the 352b9e818b5SShawn Pearceabsolute position in bytes (from the start of the file) of the block 353b9e818b5SShawn Pearcethat ends with this reference. Readers can seek to `block_position` to 354b9e818b5SShawn Pearcebegin reading the block header. 355b9e818b5SShawn Pearce 356b9e818b5SShawn PearceReaders must examine the block header at `block_position` to determine 357b9e818b5SShawn Pearceif the next block is another level index block, or the leaf-level ref 358b9e818b5SShawn Pearceblock. 359b9e818b5SShawn Pearce 360b9e818b5SShawn Pearce#### Reading the index 361b9e818b5SShawn Pearce 362b9e818b5SShawn PearceReaders loading the ref index must first read the footer (below) to 363b9e818b5SShawn Pearceobtain `ref_index_position`. If not present, the position will be 0. 364b9e818b5SShawn PearceThe `ref_index_position` is for the 1st level root of the ref index. 365b9e818b5SShawn Pearce 366b9e818b5SShawn Pearce### Obj block format 367b9e818b5SShawn Pearce 368b9e818b5SShawn PearceObject blocks are optional. Writers may choose to omit object blocks, 369b9e818b5SShawn Pearceespecially if readers will not use the SHA-1 to ref mapping. 370b9e818b5SShawn Pearce 371b9e818b5SShawn PearceObject blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping 372b9e818b5SShawn Pearceto ref blocks containing references pointing to that object directly, 373b9e818b5SShawn Pearceor as the peeled value of an annotated tag. Like ref blocks, object 374b9e818b5SShawn Pearceblocks use the file's standard block size. The abbrevation length is 375b9e818b5SShawn Pearceavailable in the footer as `obj_id_len`. 376b9e818b5SShawn Pearce 377b9e818b5SShawn PearceTo save space in small files, object blocks may be omitted if the ref 378b9e818b5SShawn Pearceindex is not present, as brute force search will only need to read a 379b9e818b5SShawn Pearcefew ref blocks. When missing, readers should brute force a linear 380b9e818b5SShawn Pearcesearch of all references to lookup by SHA-1. 381b9e818b5SShawn Pearce 382b9e818b5SShawn PearceAn object block is written as: 383b9e818b5SShawn Pearce 384b9e818b5SShawn Pearce 'o' 385b9e818b5SShawn Pearce uint24( block_len ) 386b9e818b5SShawn Pearce obj_record+ 387b9e818b5SShawn Pearce uint24( restart_offset )+ 388b9e818b5SShawn Pearce uint16( restart_count ) 389b9e818b5SShawn Pearce 390b9e818b5SShawn Pearce padding? 391b9e818b5SShawn Pearce 392b9e818b5SShawn PearceFields are identical to ref block. Binary search using the restart 393b9e818b5SShawn Pearcetable works the same as in reference blocks. 394b9e818b5SShawn Pearce 395b9e818b5SShawn PearceBecause object identifiers are abbreviated by writers to the shortest 396b9e818b5SShawn Pearceunique abbreviation within the reftable, obj key lengths are variable 397b9e818b5SShawn Pearcebetween 2 and 20 bytes. Readers must compare only for common prefix 398b9e818b5SShawn Pearcematch within an obj block or obj index. 399b9e818b5SShawn Pearce 400b9e818b5SShawn Pearce#### obj record 401b9e818b5SShawn Pearce 402b9e818b5SShawn PearceAn `obj_record` describes a single object abbreviation, and the blocks 403b9e818b5SShawn Pearcecontaining references using that unique abbreviation: 404b9e818b5SShawn Pearce 405b9e818b5SShawn Pearce varint( prefix_length ) 406b9e818b5SShawn Pearce varint( (suffix_length << 3) | cnt_3 ) 407b9e818b5SShawn Pearce suffix 408b9e818b5SShawn Pearce varint( cnt_large )? 409b9e818b5SShawn Pearce varint( position_delta )* 410b9e818b5SShawn Pearce 411b9e818b5SShawn PearceLike in reference blocks, abbreviations are prefix compressed within 412b9e818b5SShawn Pearcean obj block. On large reftables with many unique objects, higher 413b9e818b5SShawn Pearceblock sizes (64k), and higher restart interval (128), a 414b9e818b5SShawn Pearce`prefix_length` of 2 or 3 and `suffix_length` of 3 may be common in 415b9e818b5SShawn Pearceobj records (unique abbreviation of 5-6 raw bytes, 10-12 hex digits). 416b9e818b5SShawn Pearce 417b9e818b5SShawn PearceEach record contains `position_count` number of positions for matching 418b9e818b5SShawn Pearceref blocks. For 1-7 positions the count is stored in `cnt_3`. When 419b9e818b5SShawn Pearce`cnt_3 = 0` the actual count follows in a varint, `cnt_large`. 420b9e818b5SShawn Pearce 421b9e818b5SShawn PearceThe use of `cnt_3` bets most objects are pointed to by only a single 422b9e818b5SShawn Pearcereference, some may be pointed to by a couple of references, and very 423b9e818b5SShawn Pearcefew (if any) are pointed to by more than 7 references. 424b9e818b5SShawn Pearce 425b9e818b5SShawn PearceA special case exists when `cnt_3 = 0` and `cnt_large = 0`: there 426b9e818b5SShawn Pearceare no `position_delta`, but at least one reference starts with this 427b9e818b5SShawn Pearceabbreviation. A reader that needs exact reference names must scan all 428b9e818b5SShawn Pearcereferences to find which specific references have the desired object. 429b9e818b5SShawn PearceWriters should use this format when the `position_delta` list would have 430b9e818b5SShawn Pearceoverflowed the file's block size due to a high number of references 431b9e818b5SShawn Pearcepointing to the same object. 432b9e818b5SShawn Pearce 433b9e818b5SShawn PearceThe first `position_delta` is the position from the start of the file. 434b9e818b5SShawn PearceAdditional `position_delta` entries are sorted ascending and relative 435b9e818b5SShawn Pearceto the prior entry, e.g. a reader would perform: 436b9e818b5SShawn Pearce 437b9e818b5SShawn Pearce pos = position_delta[0] 438b9e818b5SShawn Pearce prior = pos 439b9e818b5SShawn Pearce for (j = 1; j < position_count; j++) { 440b9e818b5SShawn Pearce pos = prior + position_delta[j] 441b9e818b5SShawn Pearce prior = pos 442b9e818b5SShawn Pearce } 443b9e818b5SShawn Pearce 444b9e818b5SShawn PearceWith a position in hand, a reader must linearly scan the ref block, 445b9e818b5SShawn Pearcestarting from the first `ref_record`, testing each reference's SHA-1s 446b9e818b5SShawn Pearce(for `value_type = 0x1` or `0x2`) for full equality. Faster searching 447b9e818b5SShawn Pearceby SHA-1 within a single ref block is not supported by the reftable 448b9e818b5SShawn Pearceformat. Smaller block sizes reduce the number of candidates this step 449b9e818b5SShawn Pearcemust consider. 450b9e818b5SShawn Pearce 451b9e818b5SShawn Pearce### Obj index 452b9e818b5SShawn Pearce 453b9e818b5SShawn PearceThe obj index stores the abbreviation from the last entry for every 454b9e818b5SShawn Pearceobj block in the file, enabling reduced disk seeks for all lookups. 455b9e818b5SShawn PearceIt is formatted exactly the same as the ref index, but refers to obj 456b9e818b5SShawn Pearceblocks. 457b9e818b5SShawn Pearce 458b9e818b5SShawn PearceThe obj index should be present if obj blocks are present, as 459b9e818b5SShawn Pearceobj blocks should only be written in larger files. 460b9e818b5SShawn Pearce 461b9e818b5SShawn PearceReaders loading the obj index must first read the footer (below) to 462b9e818b5SShawn Pearceobtain `obj_index_position`. If not present, the position will be 0. 463b9e818b5SShawn Pearce 464b9e818b5SShawn Pearce### Log block format 465b9e818b5SShawn Pearce 466b9e818b5SShawn PearceUnlike ref and obj blocks, log blocks are always unaligned. 467b9e818b5SShawn Pearce 468b9e818b5SShawn PearceLog blocks are variable in size, and do not match the `block_size` 469b9e818b5SShawn Pearcespecified in the file header or footer. Writers should choose an 470b9e818b5SShawn Pearceappropriate buffer size to prepare a log block for deflation, such as 471b9e818b5SShawn Pearce`2 * block_size`. 472b9e818b5SShawn Pearce 473b9e818b5SShawn PearceA log block is written as: 474b9e818b5SShawn Pearce 475b9e818b5SShawn Pearce 'g' 476b9e818b5SShawn Pearce uint24( block_len ) 477b9e818b5SShawn Pearce zlib_deflate { 478b9e818b5SShawn Pearce log_record+ 479b9e818b5SShawn Pearce uint24( restart_offset )+ 480b9e818b5SShawn Pearce uint16( restart_count ) 481b9e818b5SShawn Pearce } 482b9e818b5SShawn Pearce 483b9e818b5SShawn PearceLog blocks look similar to ref blocks, except `block_type = 'g'`. 484b9e818b5SShawn Pearce 485b9e818b5SShawn PearceThe 4-byte block header is followed by the deflated block contents 486b9e818b5SShawn Pearceusing zlib deflate. The `block_len` in the header is the inflated 487b9e818b5SShawn Pearcesize (including 4-byte block header), and should be used by readers to 488b9e818b5SShawn Pearcepreallocate the inflation output buffer. A log block's `block_len` 489b9e818b5SShawn Pearcemay exceed the file's block size. 490b9e818b5SShawn Pearce 491b9e818b5SShawn PearceOffsets within the log block (e.g. `restart_offset`) still include 492b9e818b5SShawn Pearcethe 4-byte header. Readers may prefer prefixing the inflation output 493b9e818b5SShawn Pearcebuffer with the 4-byte header. 494b9e818b5SShawn Pearce 495b9e818b5SShawn PearceWithin the deflate container, a variable number of `log_record` 496b9e818b5SShawn Pearcedescribe reference changes. The log record format is described 497b9e818b5SShawn Pearcebelow. See ref block format (above) for a description of 498b9e818b5SShawn Pearce`restart_offset` and `restart_count`. 499b9e818b5SShawn Pearce 500b9e818b5SShawn PearceBecause log blocks have no alignment or padding between blocks, 501b9e818b5SShawn Pearcereaders must keep track of the bytes consumed by the inflater to 502b9e818b5SShawn Pearceknow where the next log block begins. 503b9e818b5SShawn Pearce 504b9e818b5SShawn Pearce#### log record 505b9e818b5SShawn Pearce 506b9e818b5SShawn PearceLog record keys are structured as: 507b9e818b5SShawn Pearce 508b9e818b5SShawn Pearce ref_name '\0' reverse_int64( update_index ) 509b9e818b5SShawn Pearce 510b9e818b5SShawn Pearcewhere `update_index` is the unique transaction identifier. The 511b9e818b5SShawn Pearce`update_index` field must be unique within the scope of a `ref_name`. 512b9e818b5SShawn PearceSee the update transactions section below for further details. 513b9e818b5SShawn Pearce 514b9e818b5SShawn PearceThe `reverse_int64` function inverses the value so lexographical 515b9e818b5SShawn Pearceordering the network byte order encoding sorts the more recent records 516b9e818b5SShawn Pearcewith higher `update_index` values first: 517b9e818b5SShawn Pearce 518b9e818b5SShawn Pearce reverse_int64(int64 t) { 519b9e818b5SShawn Pearce return 0xffffffffffffffff - t; 520b9e818b5SShawn Pearce } 521b9e818b5SShawn Pearce 522b9e818b5SShawn PearceLog records have a similar starting structure to ref and index 523b9e818b5SShawn Pearcerecords, utilizing the same prefix compression scheme applied to the 524b9e818b5SShawn Pearcelog record key described above. 525b9e818b5SShawn Pearce 526b9e818b5SShawn Pearce``` 527b9e818b5SShawn Pearce varint( prefix_length ) 528b9e818b5SShawn Pearce varint( (suffix_length << 3) | log_type ) 529b9e818b5SShawn Pearce suffix 530b9e818b5SShawn Pearce log_data { 531b9e818b5SShawn Pearce old_id 532b9e818b5SShawn Pearce new_id 533b9e818b5SShawn Pearce varint( name_length ) name 534b9e818b5SShawn Pearce varint( email_length ) email 535b9e818b5SShawn Pearce varint( time_seconds ) 536b9e818b5SShawn Pearce sint16( tz_offset ) 537b9e818b5SShawn Pearce varint( message_length ) message 538b9e818b5SShawn Pearce }? 539b9e818b5SShawn Pearce``` 540b9e818b5SShawn Pearce 541b9e818b5SShawn PearceLog record entries use `log_type` to indicate what follows: 542b9e818b5SShawn Pearce 543b9e818b5SShawn Pearce- `0x0`: deletion; no log data. 544b9e818b5SShawn Pearce- `0x1`: standard git reflog data using `log_data` above. 545b9e818b5SShawn Pearce 546b9e818b5SShawn PearceThe `log_type = 0x0` is mostly useful for `git stash drop`, removing 547b9e818b5SShawn Pearcean entry from the reflog of `refs/stash` in a transaction file 548b9e818b5SShawn Pearce(below), without needing to rewrite larger files. Readers reading a 549b9e818b5SShawn Pearcestack of reflogs must treat this as a deletion. 550b9e818b5SShawn Pearce 551b9e818b5SShawn PearceFor `log_type = 0x1`, the `log_data` section follows 552b9e818b5SShawn Pearce[git update-ref][update-ref] logging, and includes: 553b9e818b5SShawn Pearce 554b9e818b5SShawn Pearce- two 20-byte SHA-1s (old id, new id) 555b9e818b5SShawn Pearce- varint string of committer's name 556b9e818b5SShawn Pearce- varint string of committer's email 557b9e818b5SShawn Pearce- varint time in seconds since epoch (Jan 1, 1970) 558b9e818b5SShawn Pearce- 2-byte timezone offset in minutes (signed) 559b9e818b5SShawn Pearce- varint string of message 560b9e818b5SShawn Pearce 561b9e818b5SShawn Pearce`tz_offset` is the absolute number of minutes from GMT the committer 562b9e818b5SShawn Pearcewas at the time of the update. For example `GMT-0800` is encoded in 563b9e818b5SShawn Pearcereftable as `sint16(-480)` and `GMT+0230` is `sint16(150)`. 564b9e818b5SShawn Pearce 565b9e818b5SShawn PearceThe committer email does not contain `<` or `>`, it's the value 566b9e818b5SShawn Pearcenormally found between the `<>` in a git commit object header. 567b9e818b5SShawn Pearce 568b9e818b5SShawn PearceThe `message_length` may be 0, in which case there was no message 569b9e818b5SShawn Pearcesupplied for the update. 570b9e818b5SShawn Pearce 571b9e818b5SShawn Pearce[update-ref]: https://git-scm.com/docs/git-update-ref#_logging_updates 572b9e818b5SShawn Pearce 573c517725bSHan-Wen NienhuysContrary to traditional reflog (which is a file), renames are encoded as a 574c517725bSHan-Wen Nienhuyscombination of ref deletion and ref creation. 575c517725bSHan-Wen Nienhuys 576c517725bSHan-Wen Nienhuys 577b9e818b5SShawn Pearce#### Reading the log 578b9e818b5SShawn Pearce 579b9e818b5SShawn PearceReaders accessing the log must first read the footer (below) to 580b9e818b5SShawn Pearcedetermine the `log_position`. The first block of the log begins at 581b9e818b5SShawn Pearce`log_position` bytes since the start of the file. The `log_position` 582b9e818b5SShawn Pearceis not block aligned. 583b9e818b5SShawn Pearce 584b9e818b5SShawn Pearce#### Importing logs 585b9e818b5SShawn Pearce 586b9e818b5SShawn PearceWhen importing from `$GIT_DIR/logs` writers should globally order all 587b9e818b5SShawn Pearcelog records roughly by timestamp while preserving file order, and 588b9e818b5SShawn Pearceassign unique, increasing `update_index` values for each log line. 589b9e818b5SShawn PearceNewer log records get higher `update_index` values. 590b9e818b5SShawn Pearce 591b9e818b5SShawn PearceAlthough an import may write only a single reftable file, the reftable 592b9e818b5SShawn Pearcefile must span many unique `update_index`, as each log line requires 593b9e818b5SShawn Pearceits own `update_index` to preserve semantics. 594b9e818b5SShawn Pearce 595b9e818b5SShawn Pearce### Log index 596b9e818b5SShawn Pearce 597b9e818b5SShawn PearceThe log index stores the log key (`refname \0 reverse_int64(update_index)`) 598b9e818b5SShawn Pearcefor the last log record of every log block in the file, supporting 599b9e818b5SShawn Pearcebounded-time lookup. 600b9e818b5SShawn Pearce 601b9e818b5SShawn PearceA log index block must be written if 2 or more log blocks are written 602b9e818b5SShawn Pearceto the file. If present, the log index appears after the last log 603b9e818b5SShawn Pearceblock. There is no padding used to align the log index to block 604b9e818b5SShawn Pearcealignment. 605b9e818b5SShawn Pearce 606b9e818b5SShawn PearceLog index format is identical to ref index, except the keys are 9 607b9e818b5SShawn Pearcebytes longer to include `'\0'` and the 8-byte 608b9e818b5SShawn Pearce`reverse_int64(update_index)`. Records use `block_position` to 609b9e818b5SShawn Pearcerefer to the start of a log block. 610b9e818b5SShawn Pearce 611b9e818b5SShawn Pearce#### Reading the index 612b9e818b5SShawn Pearce 613b9e818b5SShawn PearceReaders loading the log index must first read the footer (below) to 614b9e818b5SShawn Pearceobtain `log_index_position`. If not present, the position will be 0. 615b9e818b5SShawn Pearce 616b9e818b5SShawn Pearce### Footer 617b9e818b5SShawn Pearce 618b9e818b5SShawn PearceAfter the last block of the file, a file footer is written. It begins 619b9e818b5SShawn Pearcelike the file header, but is extended with additional data. 620b9e818b5SShawn Pearce 621b9e818b5SShawn PearceA 68-byte footer appears at the end: 622b9e818b5SShawn Pearce 623b9e818b5SShawn Pearce``` 624b9e818b5SShawn Pearce 'REFT' 625b9e818b5SShawn Pearce uint8( version_number = 1 ) 626b9e818b5SShawn Pearce uint24( block_size ) 627b9e818b5SShawn Pearce uint64( min_update_index ) 628b9e818b5SShawn Pearce uint64( max_update_index ) 629b9e818b5SShawn Pearce 630b9e818b5SShawn Pearce uint64( ref_index_position ) 631b9e818b5SShawn Pearce uint64( (obj_position << 5) | obj_id_len ) 632b9e818b5SShawn Pearce uint64( obj_index_position ) 633b9e818b5SShawn Pearce 634b9e818b5SShawn Pearce uint64( log_position ) 635b9e818b5SShawn Pearce uint64( log_index_position ) 636b9e818b5SShawn Pearce 637b9e818b5SShawn Pearce uint32( CRC-32 of above ) 638b9e818b5SShawn Pearce``` 639b9e818b5SShawn Pearce 640b9e818b5SShawn PearceIf a section is missing (e.g. ref index) the corresponding position 641b9e818b5SShawn Pearcefield (e.g. `ref_index_position`) will be 0. 642b9e818b5SShawn Pearce 643b9e818b5SShawn Pearce- `obj_position`: byte position for the first obj block. 644b9e818b5SShawn Pearce- `obj_id_len`: number of bytes used to abbreviate object identifiers 645b9e818b5SShawn Pearce in obj blocks. 646b9e818b5SShawn Pearce- `log_position`: byte position for the first log block. 647b9e818b5SShawn Pearce- `ref_index_position`: byte position for the start of the ref index. 648b9e818b5SShawn Pearce- `obj_index_position`: byte position for the start of the obj index. 649b9e818b5SShawn Pearce- `log_index_position`: byte position for the start of the log index. 650b9e818b5SShawn Pearce 651b9e818b5SShawn Pearce#### Reading the footer 652b9e818b5SShawn Pearce 653b9e818b5SShawn PearceReaders must seek to `file_length - 68` to access the footer. A 654b9e818b5SShawn Pearcetrusted external source (such as `stat(2)`) is necessary to obtain 655b9e818b5SShawn Pearce`file_length`. When reading the footer, readers must verify: 656b9e818b5SShawn Pearce 657b9e818b5SShawn Pearce- 4-byte magic is correct 658b9e818b5SShawn Pearce- 1-byte version number is recognized 659b9e818b5SShawn Pearce- 4-byte CRC-32 matches the other 64 bytes (including magic, and version) 660b9e818b5SShawn Pearce 661b9e818b5SShawn PearceOnce verified, the other fields of the footer can be accessed. 662b9e818b5SShawn Pearce 663b9e818b5SShawn Pearce### Varint encoding 664b9e818b5SShawn Pearce 665b9e818b5SShawn PearceVarint encoding is identical to the ofs-delta encoding method used 666b9e818b5SShawn Pearcewithin pack files. 667b9e818b5SShawn Pearce 668b9e818b5SShawn PearceDecoder works such as: 669b9e818b5SShawn Pearce 670b9e818b5SShawn Pearce val = buf[ptr] & 0x7f 671b9e818b5SShawn Pearce while (buf[ptr] & 0x80) { 672b9e818b5SShawn Pearce ptr++ 673b9e818b5SShawn Pearce val = ((val + 1) << 7) | (buf[ptr] & 0x7f) 674b9e818b5SShawn Pearce } 675b9e818b5SShawn Pearce 676b9e818b5SShawn Pearce### Binary search 677b9e818b5SShawn Pearce 678b9e818b5SShawn PearceBinary search within a block is supported by the `restart_offset` 679b9e818b5SShawn Pearcefields at the end of the block. Readers can binary search through the 680b9e818b5SShawn Pearcerestart table to locate between which two restart points the sought 681b9e818b5SShawn Pearcereference or key should appear. 682b9e818b5SShawn Pearce 683b9e818b5SShawn PearceEach record identified by a `restart_offset` stores the complete key 684b9e818b5SShawn Pearcein the `suffix` field of the record, making the compare operation 685b9e818b5SShawn Pearceduring binary search straightforward. 686b9e818b5SShawn Pearce 687b9e818b5SShawn PearceOnce a restart point lexicographically before the sought reference has 688b9e818b5SShawn Pearcebeen identified, readers can linearly scan through the following 689b9e818b5SShawn Pearcerecord entries to locate the sought record, terminating if the current 690b9e818b5SShawn Pearcerecord sorts after (and therefore the sought key is not present). 691b9e818b5SShawn Pearce 692b9e818b5SShawn Pearce#### Restart point selection 693b9e818b5SShawn Pearce 694b9e818b5SShawn PearceWriters determine the restart points at file creation. The process is 695b9e818b5SShawn Pearcearbitrary, but every 16 or 64 records is recommended. Every 16 may 696b9e818b5SShawn Pearcebe more suitable for smaller block sizes (4k or 8k), every 64 for 697b9e818b5SShawn Pearcelarger block sizes (64k). 698b9e818b5SShawn Pearce 699b9e818b5SShawn PearceMore frequent restart points reduces prefix compression and increases 700b9e818b5SShawn Pearcespace consumed by the restart table, both of which increase file size. 701b9e818b5SShawn Pearce 702b9e818b5SShawn PearceLess frequent restart points makes prefix compression more effective, 703b9e818b5SShawn Pearcedecreasing overall file size, with increased penalities for readers 704b9e818b5SShawn Pearcewalking through more records after the binary search step. 705b9e818b5SShawn Pearce 706b9e818b5SShawn PearceA maximum of `65535` restart points per block is supported. 707b9e818b5SShawn Pearce 708b9e818b5SShawn Pearce## Considerations 709b9e818b5SShawn Pearce 710b9e818b5SShawn Pearce### Lightweight refs dominate 711b9e818b5SShawn Pearce 712b9e818b5SShawn PearceThe reftable format assumes the vast majority of references are single 713b9e818b5SShawn PearceSHA-1 valued with common prefixes, such as Gerrit Code Review's 714b9e818b5SShawn Pearce`refs/changes/` namespace, GitHub's `refs/pulls/` namespace, or many 715b9e818b5SShawn Pearcelightweight tags in the `refs/tags/` namespace. 716b9e818b5SShawn Pearce 717b9e818b5SShawn PearceAnnotated tags storing the peeled object cost an additional 20 bytes 718b9e818b5SShawn Pearceper reference. 719b9e818b5SShawn Pearce 720b9e818b5SShawn Pearce### Low overhead 721b9e818b5SShawn Pearce 722b9e818b5SShawn PearceA reftable with very few references (e.g. git.git with 5 heads) 723b9e818b5SShawn Pearceis 269 bytes for reftable, vs. 332 bytes for packed-refs. This 724b9e818b5SShawn Pearcesupports reftable scaling down for transaction logs (below). 725b9e818b5SShawn Pearce 726b9e818b5SShawn Pearce### Block size 727b9e818b5SShawn Pearce 728b9e818b5SShawn PearceFor a Gerrit Code Review type repository with many change refs, larger 729b9e818b5SShawn Pearceblock sizes (64 KiB) and less frequent restart points (every 64) yield 730b9e818b5SShawn Pearcebetter compression due to more references within the block compressing 731b9e818b5SShawn Pearceagainst the prior reference. 732b9e818b5SShawn Pearce 733b9e818b5SShawn PearceLarger block sizes reduce the index size, as the reftable will 734b9e818b5SShawn Pearcerequire fewer blocks to store the same number of references. 735b9e818b5SShawn Pearce 736b9e818b5SShawn Pearce### Minimal disk seeks 737b9e818b5SShawn Pearce 738b9e818b5SShawn PearceAssuming the index block has been loaded into memory, binary searching 739b9e818b5SShawn Pearcefor any single reference requires exactly 1 disk seek to load the 740b9e818b5SShawn Pearcecontaining block. 741b9e818b5SShawn Pearce 742b9e818b5SShawn Pearce### Scans and lookups dominate 743b9e818b5SShawn Pearce 744b9e818b5SShawn PearceScanning all references and lookup by name (or namespace such as 745b9e818b5SShawn Pearce`refs/heads/`) are the most common activities performed on repositories. 746b9e818b5SShawn PearceSHA-1s are stored directly with references to optimize this use case. 747b9e818b5SShawn Pearce 748b9e818b5SShawn Pearce### Logs are infrequently read 749b9e818b5SShawn Pearce 750b9e818b5SShawn PearceLogs are infrequently accessed, but can be large. Deflating log 751b9e818b5SShawn Pearceblocks saves disk space, with some increased penalty at read time. 752b9e818b5SShawn Pearce 753b9e818b5SShawn PearceLogs are stored in an isolated section from refs, reducing the burden 754b9e818b5SShawn Pearceon reference readers that want to ignore logs. Further, historical 755b9e818b5SShawn Pearcelogs can be isolated into log-only files. 756b9e818b5SShawn Pearce 757b9e818b5SShawn Pearce### Logs are read backwards 758b9e818b5SShawn Pearce 759b9e818b5SShawn PearceLogs are frequently accessed backwards (most recent N records for 760b9e818b5SShawn Pearcemaster to answer `master@{4}`), so log records are grouped by 761b9e818b5SShawn Pearcereference, and sorted descending by update index. 762b9e818b5SShawn Pearce 763b9e818b5SShawn Pearce## Repository format 764b9e818b5SShawn Pearce 765b9e818b5SShawn Pearce### Version 1 766b9e818b5SShawn Pearce 767b9e818b5SShawn PearceA repository must set its `$GIT_DIR/config` to configure reftable: 768b9e818b5SShawn Pearce 769b9e818b5SShawn Pearce [core] 770b9e818b5SShawn Pearce repositoryformatversion = 1 771b9e818b5SShawn Pearce [extensions] 772b9e818b5SShawn Pearce refStorage = reftable 773b9e818b5SShawn Pearce 774b9e818b5SShawn Pearce### Layout 775b9e818b5SShawn Pearce 776b9e818b5SShawn PearceA collection of reftable files are stored in the `$GIT_DIR/reftable/` 777b9e818b5SShawn Pearcedirectory: 778b9e818b5SShawn Pearce 779cf11a03bSHan-Wen Nienhuys 00000001-00000001.log 780cf11a03bSHan-Wen Nienhuys 00000002-00000002.ref 781cf11a03bSHan-Wen Nienhuys 00000003-00000003.ref 782b9e818b5SShawn Pearce 783b9e818b5SShawn Pearcewhere reftable files are named by a unique name such as produced by 784cf11a03bSHan-Wen Nienhuysthe function `${min_update_index}-${max_update_index}.ref`. 785b9e818b5SShawn Pearce 786b9e818b5SShawn PearceLog-only files use the `.log` extension, while ref-only and mixed ref 787b9e818b5SShawn Pearceand log files use `.ref`. extension. 788b9e818b5SShawn Pearce 789b9e818b5SShawn Pearce 790*c217d33fSHan-Wen NienhuysThe stack ordering file is `$GIT_DIR/reftable/tables.list` and lists the current 791*c217d33fSHan-Wen Nienhuysfiles, one per line, in order, from oldest (base) to newest (most recent): 792*c217d33fSHan-Wen Nienhuys 793*c217d33fSHan-Wen Nienhuys $ cat .git/reftable/tables.list 794cf11a03bSHan-Wen Nienhuys 00000001-00000001.log 795cf11a03bSHan-Wen Nienhuys 00000002-00000002.ref 796cf11a03bSHan-Wen Nienhuys 00000003-00000003.ref 797b9e818b5SShawn Pearce 798*c217d33fSHan-Wen NienhuysReaders must read `$GIT_DIR/reftable/tables.list` to determine which files are 799*c217d33fSHan-Wen Nienhuysrelevant right now, and search through the stack in reverse order (last reftable 800*c217d33fSHan-Wen Nienhuysis examined first). 801b9e818b5SShawn Pearce 802*c217d33fSHan-Wen NienhuysReftable files not listed in `tables.list` may be new (and about to be added 803b9e818b5SShawn Pearceto the stack by the active writer), or ancient and ready to be pruned. 804b9e818b5SShawn Pearce 805*c217d33fSHan-Wen Nienhuys### Backward compatibility 806*c217d33fSHan-Wen Nienhuys 807*c217d33fSHan-Wen NienhuysOlder clients should continue to recognize the directory as a git repository so 808*c217d33fSHan-Wen Nienhuysthey don't look for an enclosing repository in parent directories. To this end, 809*c217d33fSHan-Wen Nienhuysa reftable-enabled repository must contain the following dummy files 810*c217d33fSHan-Wen Nienhuys 811*c217d33fSHan-Wen Nienhuys* `.git/HEAD`, a regular file containing `ref: refs/heads/.invalid`. 812*c217d33fSHan-Wen Nienhuys* `.git/refs/`, a directory 813*c217d33fSHan-Wen Nienhuys* `.git/refs/heads`, a regular file 814*c217d33fSHan-Wen Nienhuys 815b9e818b5SShawn Pearce### Readers 816b9e818b5SShawn Pearce 817b9e818b5SShawn PearceReaders can obtain a consistent snapshot of the reference space by 818b9e818b5SShawn Pearcefollowing: 819b9e818b5SShawn Pearce 820*c217d33fSHan-Wen Nienhuys1. Open and read the `tables.list` file. 821b9e818b5SShawn Pearce2. Open each of the reftable files that it mentions. 822b9e818b5SShawn Pearce3. If any of the files is missing, goto 1. 823b9e818b5SShawn Pearce4. Read from the now-open files as long as necessary. 824b9e818b5SShawn Pearce 825b9e818b5SShawn Pearce### Update transactions 826b9e818b5SShawn Pearce 827b9e818b5SShawn PearceAlthough reftables are immutable, mutations are supported by writing a 828b9e818b5SShawn Pearcenew reftable and atomically appending it to the stack: 829b9e818b5SShawn Pearce 830*c217d33fSHan-Wen Nienhuys1. Acquire `tables.list.lock`. 831*c217d33fSHan-Wen Nienhuys2. Read `tables.list` to determine current reftables. 832b9e818b5SShawn Pearce3. Select `update_index` to be most recent file's `max_update_index + 1`. 833cf11a03bSHan-Wen Nienhuys4. Prepare temp reftable `tmp_XXXXXX`, including log entries. 834cf11a03bSHan-Wen Nienhuys5. Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`. 835*c217d33fSHan-Wen Nienhuys6. Copy `tables.list` to `tables.list.lock`, appending file from (5). 836*c217d33fSHan-Wen Nienhuys7. Rename `tables.list.lock` to `tables.list`. 837b9e818b5SShawn Pearce 838b9e818b5SShawn PearceDuring step 4 the new file's `min_update_index` and `max_update_index` 839b9e818b5SShawn Pearceare both set to the `update_index` selected by step 3. All log 840b9e818b5SShawn Pearcerecords for the transaction use the same `update_index` in their keys. 841b9e818b5SShawn PearceThis enables later correlation of which references were updated by the 842b9e818b5SShawn Pearcesame transaction. 843b9e818b5SShawn Pearce 844*c217d33fSHan-Wen NienhuysBecause a single `tables.list.lock` file is used to manage locking, the 845b9e818b5SShawn Pearcerepository is single-threaded for writers. Writers may have to 846*c217d33fSHan-Wen Nienhuysbusy-spin (with backoff) around creating `tables.list.lock`, for up to an 847b9e818b5SShawn Pearceacceptable wait period, aborting if the repository is too busy to 848b9e818b5SShawn Pearcemutate. Application servers wrapped around repositories (e.g. Gerrit 849b9e818b5SShawn PearceCode Review) can layer their own lock/wait queue to improve fairness 850b9e818b5SShawn Pearceto writers. 851b9e818b5SShawn Pearce 852b9e818b5SShawn Pearce### Reference deletions 853b9e818b5SShawn Pearce 854b9e818b5SShawn PearceDeletion of any reference can be explicitly stored by setting the 855b9e818b5SShawn Pearce`type` to `0x0` and omitting the `value` field of the `ref_record`. 856b9e818b5SShawn PearceThis serves as a tombstone, overriding any assertions about the 857b9e818b5SShawn Pearceexistence of the reference from earlier files in the stack. 858b9e818b5SShawn Pearce 859b9e818b5SShawn Pearce### Compaction 860b9e818b5SShawn Pearce 861b9e818b5SShawn PearceA partial stack of reftables can be compacted by merging references 862b9e818b5SShawn Pearceusing a straightforward merge join across reftables, selecting the 863b9e818b5SShawn Pearcemost recent value for output, and omitting deleted references that do 864b9e818b5SShawn Pearcenot appear in remaining, lower reftables. 865b9e818b5SShawn Pearce 866b9e818b5SShawn PearceA compacted reftable should set its `min_update_index` to the smallest of 867b9e818b5SShawn Pearcethe input files' `min_update_index`, and its `max_update_index` 868b9e818b5SShawn Pearcelikewise to the largest input `max_update_index`. 869b9e818b5SShawn Pearce 870b9e818b5SShawn PearceFor sake of illustration, assume the stack currently consists of 871b9e818b5SShawn Pearcereftable files (from oldest to newest): A, B, C, and D. The compactor 872b9e818b5SShawn Pearceis going to compact B and C, leaving A and D alone. 873b9e818b5SShawn Pearce 874*c217d33fSHan-Wen Nienhuys1. Obtain lock `tables.list.lock` and read the `tables.list` file. 875b9e818b5SShawn Pearce2. Obtain locks `B.lock` and `C.lock`. 876b9e818b5SShawn Pearce Ownership of these locks prevents other processes from trying 877b9e818b5SShawn Pearce to compact these files. 878*c217d33fSHan-Wen Nienhuys3. Release `tables.list.lock`. 879cf11a03bSHan-Wen Nienhuys4. Compact `B` and `C` into a temp file `${min_update_index}-${max_update_index}_XXXXXX`. 880*c217d33fSHan-Wen Nienhuys5. Reacquire lock `tables.list.lock`. 881b9e818b5SShawn Pearce6. Verify that `B` and `C` are still in the stack, in that order. This 882b9e818b5SShawn Pearce should always be the case, assuming that other processes are adhering 883b9e818b5SShawn Pearce to the locking protocol. 884cf11a03bSHan-Wen Nienhuys7. Rename `${min_update_index}-${max_update_index}_XXXXXX` to 885cf11a03bSHan-Wen Nienhuys `${min_update_index}-${max_update_index}.ref`. 886*c217d33fSHan-Wen Nienhuys8. Write the new stack to `tables.list.lock`, replacing `B` and `C` with the 887b9e818b5SShawn Pearce file from (4). 888*c217d33fSHan-Wen Nienhuys9. Rename `tables.list.lock` to `tables.list`. 889b9e818b5SShawn Pearce10. Delete `B` and `C`, perhaps after a short sleep to avoid forcing 890b9e818b5SShawn Pearce readers to backtrack. 891b9e818b5SShawn Pearce 892b9e818b5SShawn PearceThis strategy permits compactions to proceed independently of updates. 893b9e818b5SShawn Pearce 894cf11a03bSHan-Wen NienhuysEach reftable (compacted or not) is uniquely identified by its name, so open 895cf11a03bSHan-Wen Nienhuysreftables can be cached by their name. 896cf11a03bSHan-Wen Nienhuys 897b9e818b5SShawn Pearce## Alternatives considered 898b9e818b5SShawn Pearce 899b9e818b5SShawn Pearce### bzip packed-refs 900b9e818b5SShawn Pearce 901b9e818b5SShawn Pearce`bzip2` can significantly shrink a large packed-refs file (e.g. 62 902b9e818b5SShawn PearceMiB compresses to 23 MiB, 37%). However the bzip format does not support 903b9e818b5SShawn Pearcerandom access to a single reference. Readers must inflate and discard 904b9e818b5SShawn Pearcewhile performing a linear scan. 905b9e818b5SShawn Pearce 906b9e818b5SShawn PearceBreaking packed-refs into chunks (individually compressing each chunk) 907b9e818b5SShawn Pearcewould reduce the amount of data a reader must inflate, but still 908b9e818b5SShawn Pearceleaves the problem of indexing chunks to support readers efficiently 909b9e818b5SShawn Pearcelocating the correct chunk. 910b9e818b5SShawn Pearce 911b9e818b5SShawn PearceGiven the compression achieved by reftable's encoding, it does not 912b9e818b5SShawn Pearceseem necessary to add the complexity of bzip/gzip/zlib. 913b9e818b5SShawn Pearce 914b9e818b5SShawn Pearce### Michael Haggerty's alternate format 915b9e818b5SShawn Pearce 916b9e818b5SShawn PearceMichael Haggerty proposed [an alternate][mh-alt] format to reftable on 917b9e818b5SShawn Pearcethe Git mailing list. This format uses smaller chunks, without the 918b9e818b5SShawn Pearcerestart table, and avoids block alignment with padding. Reflog entries 919b9e818b5SShawn Pearceimmediately follow each ref, and are thus interleaved between refs. 920b9e818b5SShawn Pearce 921b9e818b5SShawn PearcePerformance testing indicates reftable is faster for lookups (51% 922b9e818b5SShawn Pearcefaster, 11.2 usec vs. 5.4 usec), although reftable produces a 923b9e818b5SShawn Pearceslightly larger file (+ ~3.2%, 28.3M vs 29.2M): 924b9e818b5SShawn Pearce 925b9e818b5SShawn Pearceformat | size | seek cold | seek hot | 926b9e818b5SShawn Pearce---------:|-------:|----------:|----------:| 927b9e818b5SShawn Pearcemh-alt | 28.3 M | 23.4 usec | 11.2 usec | 928b9e818b5SShawn Pearcereftable | 29.2 M | 19.9 usec | 5.4 usec | 929b9e818b5SShawn Pearce 930b9e818b5SShawn Pearce[mh-alt]: https://public-inbox.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe+wGvEJ7A@mail.gmail.com/ 931b9e818b5SShawn Pearce 932b9e818b5SShawn Pearce### JGit Ketch RefTree 933b9e818b5SShawn Pearce 934b9e818b5SShawn Pearce[JGit Ketch][ketch] proposed [RefTree][reftree], an encoding of 935b9e818b5SShawn Pearcereferences inside Git tree objects stored as part of the repository's 936b9e818b5SShawn Pearceobject database. 937b9e818b5SShawn Pearce 938b9e818b5SShawn PearceThe RefTree format adds additional load on the object database storage 939b9e818b5SShawn Pearcelayer (more loose objects, more objects in packs), and relies heavily 940b9e818b5SShawn Pearceon the packer's delta compression to save space. Namespaces which are 941b9e818b5SShawn Pearceflat (e.g. thousands of tags in refs/tags) initially create very 942b9e818b5SShawn Pearcelarge loose objects, and so RefTree does not address the problem of 943b9e818b5SShawn Pearcecopying many references to modify a handful. 944b9e818b5SShawn Pearce 945b9e818b5SShawn PearceFlat namespaces are not efficiently searchable in RefTree, as tree 946b9e818b5SShawn Pearceobjects in canonical formatting cannot be binary searched. This fails 947b9e818b5SShawn Pearcethe need to handle a large number of references in a single namespace, 948b9e818b5SShawn Pearcesuch as GitHub's `refs/pulls`, or a project with many tags. 949b9e818b5SShawn Pearce 950b9e818b5SShawn Pearce[ketch]: https://dev.eclipse.org/mhonarc/lists/jgit-dev/msg03073.html 951b9e818b5SShawn Pearce[reftree]: https://public-inbox.org/git/CAJo=hJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg@mail.gmail.com/ 952b9e818b5SShawn Pearce 953b9e818b5SShawn Pearce### LMDB 954b9e818b5SShawn Pearce 955b9e818b5SShawn PearceDavid Turner proposed [using LMDB][dt-lmdb], as LMDB is lightweight 956b9e818b5SShawn Pearce(64k of runtime code) and GPL-compatible license. 957b9e818b5SShawn Pearce 958b9e818b5SShawn PearceA downside of LMDB is its reliance on a single C implementation. This 959b9e818b5SShawn Pearcemakes embedding inside JGit (a popular reimplemenation of Git) 960b9e818b5SShawn Pearcedifficult, and hoisting onto virtual storage (for JGit DFS) virtually 961b9e818b5SShawn Pearceimpossible. 962b9e818b5SShawn Pearce 963b9e818b5SShawn PearceA common format that can be supported by all major Git implementations 964b9e818b5SShawn Pearce(git-core, JGit, libgit2) is strongly preferred. 965b9e818b5SShawn Pearce 966b9e818b5SShawn Pearce[dt-lmdb]: https://public-inbox.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/ 967b9e818b5SShawn Pearce 968b9e818b5SShawn Pearce## Future 969b9e818b5SShawn Pearce 970b9e818b5SShawn Pearce### Longer hashes 971b9e818b5SShawn Pearce 972b9e818b5SShawn PearceVersion will bump (e.g. 2) to indicate `value` uses a different 973b9e818b5SShawn Pearceobject id length other than 20. The length could be stored in an 974b9e818b5SShawn Pearceexpanded file header, or hardcoded as part of the version. 975