I started getting involved in the retro computing scene back in 2020, first with Windows 3.1 running in an emulator, then with a Commodore 64, then with a Z-80 CP/M machine I built from a kit, and finally with a classic 68k Macintosh. For the past few years I have focused mainly on the Macintosh side of the hobby. To that end I run a small BBS from my G4 Mac Mini (running Mac OS 9.2). However, the author of the BBS software has decided to stop working on it, and I took that as a sign that I should finally try writing my own BBS from scratch - something that has been on my bucket list since I was a teenager using BBSes as my primary form of online social interaction.

This post is part of a series on implementing the various components used by this new BBS software. The goal is to end up with something that can compile on DOS, Classic Mac OS, and UN*Xes. The first component I will tackle is a simple database that will be used to store user data, messages, etc.

Overall Design

vDB is a simple database in the style of the database systems commonly used in the 80s and 90s. It's not a "relational" database in the strict sense, because no foreign key relationships are enforced by the database. It supports indexing for fast lookups, but offers no query language and no ability to join tables together for more complicated queries.

Its features include:

  1. Data is stored in fixed-size blocks (called pages) for efficient random access. The database tracks empty blocks that can be reused by future inserts. Records can use multiple adjacent pages if needed.
  2. Each database file maintains a journal and all changes are written to the journal before being written to the database file. This greatly reduces the likelihood of data corruption due to power outage or other system failure.
  3. Records can be indexed by various fields in the record to improve query times. Indexes are stored as B-Trees. A primary index maps record IDs to physical page numbers. Additional indexes use a hash function to map strings to record IDs.
  4. Pages for both data and indexes are 512 bytes for more efficient disk access.
  5. Deletion of data only marks the page as empty, which improves deletion performance. The database file is then regularly compacted, which involves moving active records into free pages to reduce the total size of the data files on disk.

The library is written in C89 for maximum portability and targets Classic Mac OS 7/8/9, DOS, Linux, and UNIX. Nearly everything lives on the stack or is caller-provided, so heap allocation is minimal. All multi-byte values are stored in little-endian byte order on disk, with automatic endian conversion so database files are portable between big-endian and little-endian systems.

Each database consists of several files. Using a "Users" database as an example, the files might look like this:

  • USERS.DAT - The actual record data
  • USERS.IDX - Primary index mapping Record ID → Page Number (B-Tree)
  • USERS.I00 - Secondary index 00 (e.g., username → Record ID)
  • USERS.I01 - Secondary index 01 (e.g., email → Record ID)
  • USERS.JNL - Journal file for transaction recovery

Data File Format

The .DAT file is a binary file organized into fixed-size 512-byte pages. This page-based architecture allows for random access of specific records and easy reuse of freed space when records are removed. The 512-byte page size matches the B-Tree index page size and aligns well with disk sector sizes on retro systems, which is important for efficient I/O on older hardware where disk access is a significant bottleneck.

All on-disk structures use explicit byte-by-byte serialization to avoid problems with compiler alignment and padding differences across platforms.
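As a minimal sketch of what such serialization helpers might look like (the function names here are hypothetical, not the actual vDB API), writing values one byte at a time sidesteps both struct padding and host endianness:

```c
/* Store a 32-bit value into a buffer in little-endian order,
   regardless of the host machine's native byte order. */
void le_put_u32(unsigned char *buf, unsigned long v)
{
    buf[0] = (unsigned char)(v & 0xFF);
    buf[1] = (unsigned char)((v >> 8) & 0xFF);
    buf[2] = (unsigned char)((v >> 16) & 0xFF);
    buf[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Read a 32-bit little-endian value back out of a buffer. */
unsigned long le_get_u32(const unsigned char *buf)
{
    return (unsigned long)buf[0]
         | ((unsigned long)buf[1] << 8)
         | ((unsigned long)buf[2] << 16)
         | ((unsigned long)buf[3] << 24);
}
```

Because every field is read and written this way, the same database file works unchanged on a big-endian 68k Mac and a little-endian PC.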

Database Header

The first two pages of the .DAT file (pages 0 and 1) contain the database header. Page 0 holds metadata and index definitions. Page 1 holds the free page list.

Page 0: Metadata and Index Definitions

The first 32 bytes of page 0 are the header proper:

Field Type Size Notes
signature char[4] 4 bytes "VDB" + null terminator for validation
version uint16 2 bytes Format version (currently 1)
page_size uint16 2 bytes Always 512
record_size uint16 2 bytes Size of each record in bytes
record_count int32 4 bytes Total number of active records
next_record_id int32 4 bytes Next available Record ID (auto-incrementing)
last_compacted int32 4 bytes Timestamp of last compaction
journal_pending bool 1 byte True if journal needs replay on open
index_count byte 1 byte Number of secondary indexes (0–15)
reserved byte[8] 8 bytes Padding to reach 32 bytes

The remaining 480 bytes of page 0 hold an array of up to 15 secondary index definitions (DBIndexInfo), each 32 bytes:

Field Type Size Notes
field_name char[30] 30 bytes Name of the indexed field
index_type byte 1 byte 0 = integer key, 1 = string key (CRC16 hash)
index_number byte 1 byte Maps to the .I?? file extension (00–14)

This comes to exactly 32 + (15 × 32) = 512 bytes, fitting neatly into a single page.
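The two tables above might translate into in-memory structures along these lines (field names follow the tables; the struct is never written to disk directly, since serialization is done field by field, so compiler padding here is harmless):

```c
#define VDB_PAGE_SIZE     512
#define VDB_MAX_INDEXES   15
#define VDB_HEADER_BYTES  32  /* serialized size of the header proper */
#define VDB_IDXINFO_BYTES 32  /* serialized size of one DBIndexInfo */

typedef struct {
    char          field_name[30]; /* name of the indexed field */
    unsigned char index_type;     /* 0 = integer key, 1 = string key */
    unsigned char index_number;   /* maps to .I00 .. .I14 */
} DBIndexInfo;

typedef struct {
    char           signature[4];   /* "VDB" + null terminator */
    unsigned short version;        /* format version, currently 1 */
    unsigned short page_size;      /* always 512 */
    unsigned short record_size;    /* size of each record in bytes */
    long           record_count;   /* total active records */
    long           next_record_id; /* auto-incrementing */
    long           last_compacted; /* timestamp of last compaction */
    unsigned char  journal_pending;
    unsigned char  index_count;    /* number of secondary indexes */
    unsigned char  reserved[8];    /* pad serialized header to 32 bytes */
    DBIndexInfo    indexes[VDB_MAX_INDEXES];
} DBHeader;
```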

Page 1: Free Page List

Page 1 contains the DBFreeList structure, which tracks empty pages available for reuse:

Field Type Size Notes
free_page_count uint16 2 bytes Total count of free pages in the database
free_page_list_len uint16 2 bytes Number of entries in the free_pages array
free_pages int32[127] 508 bytes Page numbers of known empty pages

This also comes to exactly 512 bytes: 4 bytes of header plus 127 entries at 4 bytes each. The free_pages array acts as a stack - pages are allocated by decrementing free_page_list_len and freed by incrementing it.
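The stack behavior might be sketched like this (helper names are assumptions; note that the global free_page_count is still bumped when a page is freed even if the array is full, which is what later forces the rescan described below):

```c
#define VDB_FREELIST_CAP 127

typedef struct {
    unsigned short free_page_count;    /* all free pages in the file */
    unsigned short free_page_list_len; /* entries currently in the array */
    long free_pages[VDB_FREELIST_CAP];
} DBFreeList;

/* Pop a known-free page, or return -1 if the array is empty
   (the caller then appends a new page or rescans the file). */
long freelist_pop(DBFreeList *fl)
{
    if (fl->free_page_list_len == 0)
        return -1;
    fl->free_page_list_len--;
    fl->free_page_count--;
    return fl->free_pages[fl->free_page_list_len];
}

/* Record a freed page if there is room in the array; the global
   count is always incremented either way. */
void freelist_push(DBFreeList *fl, long page)
{
    fl->free_page_count++;
    if (fl->free_page_list_len < VDB_FREELIST_CAP)
        fl->free_pages[fl->free_page_list_len++] = page;
}
```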

Data Pages

Data pages start at page 2 and continue to the end of the file. Each page is a DBPage with this layout:

Field Type Size Notes
id int32 4 bytes Record ID
status byte 1 byte 0 = Empty, 1 = Active, 2 = Continuation
reserved byte 1 byte Padding
data byte[506] 506 bytes Record data

With 6 bytes of overhead per page, each page provides 506 bytes of usable data storage. Records larger than 506 bytes span multiple consecutive pages. The number of pages needed for a record is simply ceiling(record_size / 506). The first page of a multi-page record has status Active, and subsequent pages have status Continuation. All pages in a record share the same Record ID.
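The page-count calculation is the usual integer ceiling-division idiom (the function name is illustrative):

```c
#define VDB_PAGE_DATA 506  /* 512-byte page minus 6 bytes of overhead */

/* Number of consecutive pages needed to hold one record:
   ceiling(record_size / 506) without floating point. */
long vdb_pages_for_record(long record_size)
{
    return (record_size + VDB_PAGE_DATA - 1) / VDB_PAGE_DATA;
}
```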

Free Space Management

When a record is deleted, its pages are marked as Empty and the free_page_count in the free list is incremented. If there's room in the free_pages array (which holds up to 127 entries), the first page number of the deleted record is added. The remaining pages in a multi-page record don't need to be tracked individually because their positions can be calculated from the first page number and the record size.

When a new record needs to be added, the allocator first checks free_page_count. If it's zero, a new page is appended to the end of the file. If there are free pages available and entries in the free_pages array, the allocator searches for a consecutive run of pages large enough for the record. If the array is empty but free_page_count is positive, the database scans the .DAT file to repopulate the array before trying again.
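The "consecutive run" search could be done naively, which is fine given the array holds at most 127 entries. This sketch (names and approach are assumptions, not the actual vDB code) only locates a run; a real allocator would also remove the run's pages from the array:

```c
/* Search an unsorted free-page array for `need` consecutive page
   numbers. Returns the first page of such a run, or -1 if none. */
long find_consecutive_run(const long *pages, int count, long need)
{
    int i, j;
    long run;

    for (i = 0; i < count; i++) {
        run = 1;
        while (run < need) {
            /* look for pages[i] + run anywhere in the array */
            for (j = 0; j < count; j++)
                if (pages[j] == pages[i] + run)
                    break;
            if (j == count)
                break;          /* run broken */
            run++;
        }
        if (run >= need)
            return pages[i];
    }
    return -1;
}
```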

This approach has clear trade-offs. The upside is that deletion is fast since it just flips a status byte and updates a counter, and allocation is usually O(1) since we're just popping from a stack. The downside is that the file can accumulate dead space over time, especially with many delete/insert cycles. That's where compaction comes in - it's essentially a defragmentation process that moves active records to fill in the gaps left by deleted records, then truncates the file. Since records can move during compaction, it's important that the primary index provides a layer of indirection between Record IDs and physical page numbers (more on this below). After compaction, only the primary index needs updating; secondary indexes are unaffected because they reference Record IDs rather than physical locations.

Indexes

Without indexes, the only way to find a record would be to scan every page in the file. For a BBS with thousands of users and messages, a linear scan is unacceptable. Indexes provide a way to jump directly to the record you need.

Primary Index

Every vDB database has a primary index stored in a .IDX file. This index maps Record IDs (logical identifiers) to page numbers (physical locations in the .DAT file). This separation of logical identity from physical location is a key design decision.

Consider what happens during compaction: records get moved to new physical locations to reclaim space. If other parts of the system referenced records by page number directly, every reference would need updating whenever a record moved. With the primary index acting as an intermediary, only the primary index itself needs updating during compaction. Everything else continues to reference the Record ID, which never changes.

The lookup path is straightforward - given a Record ID, search the primary index B-Tree to get the page number, then seek to that position in the .DAT file and read the record.
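A toy model makes the indirection concrete. Here the "primary index" is just an array (the real one is a B-Tree), but the property is the same: when compaction moves a record, only the index entry changes, and every Record ID held elsewhere stays valid.

```c
#define MAX_RECS 16

/* Toy primary index: record ID -> physical page number. */
long primary_index[MAX_RECS];

long lookup_page(long record_id)
{
    return primary_index[record_id];
}

/* Compaction moves a record to a new page: only this one entry
   is updated; callers holding the Record ID are unaffected. */
void compact_move(long record_id, long new_page)
{
    primary_index[record_id] = new_page;
}
```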

B-Trees

The indexes are stored as B-Trees, which are balanced tree data structures optimized for systems that read and write large blocks of data. A B-Tree of order m allows each node to hold up to m keys, and the tree stays balanced by splitting nodes that overflow and merging nodes that underflow. This guarantees O(log n) lookup, insertion, and deletion times.

B-Trees are a natural fit here because each node maps to a single 512-byte page on disk, so traversing the tree requires one disk read per level. For a B-Tree with order 60 (which is what fits in a 512-byte page given 4-byte keys and 4-byte values), a tree with a million entries would only be about 3 to 4 levels deep. That's 3 to 4 disk reads to find any record, which is quite reasonable even on older hardware.
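The depth claim is easy to sanity-check. A node with up to 60 keys has up to 61 children, so the number of levels needed for n entries is roughly the smallest d with 61^d ≥ n (a simplification that ignores partially full nodes):

```c
/* Rough B-Tree depth estimate: how many levels are needed before
   the fan-out (order + 1 children per node) covers n entries? */
int btree_levels(long n, long order)
{
    long reach = 1;   /* entries reachable at the current depth */
    int levels = 0;

    while (reach < n) {
        reach *= (order + 1);   /* each level multiplies fan-out */
        levels++;
    }
    return levels;
}
```

For a million entries at order 60 this gives 4 levels, and for 100,000 it gives 3, matching the "3 to 4 disk reads" figure above.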

The B-Tree implementation in vDB supports duplicate keys, which is important for secondary indexes where hash collisions can occur. When a leaf node's value list contains more values than can fit in a single page, it spills into overflow pages linked from the leaf entry.

Index File Format

Each B-Tree index file uses the same 512-byte page structure. Page 0 is the B-Tree header:

Field Type Size Notes
magic char[4] 4 bytes File type identifier
version uint16 2 bytes Format version
order uint16 2 bytes Maximum keys per node
root_page int32 4 bytes Page number of the root node
next_free_page int32 4 bytes Next available page for allocation
page_count int32 4 bytes Total pages in the file

The remaining pages are internal nodes, leaf nodes, or overflow pages, identified by a page type byte at the start of each page.

Secondary indexes (the .I?? files) have the same B-Tree structure but map differently. For integer fields, the field value is used directly as the key. For string fields, a CRC16 hash of the (case-insensitive) string is used as the key. For the primary index, the value stored is the physical page number. For all other indexes, the values stored are Record IDs rather than page numbers. This means looking up a record by any key other than its Record ID requires two B-Tree lookups: first on the secondary index to get the Record ID, then on the primary index to get the page number. When reading from an index with a string key, the caller verifies that the actual string matches the key to handle hash collisions (which have roughly a 1-in-65536 chance per entry with CRC16).
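One plausible key function for string indexes looks like this: fold the string to lower case and run it through CRC-16. The post doesn't specify which CRC-16 variant vDB uses, so the reflected 0xA001 polynomial here is an illustrative stand-in:

```c
#include <ctype.h>

/* Hash a string (case-insensitively) to a 16-bit index key.
   The CRC-16 variant (reflected, polynomial 0xA001) is an
   assumption, not necessarily what vDB itself uses. */
unsigned short string_index_key(const char *s)
{
    unsigned short crc = 0;
    int i;

    while (*s) {
        crc ^= (unsigned char)tolower((unsigned char)*s++);
        for (i = 0; i < 8; i++) {
            if (crc & 1)
                crc = (unsigned short)((crc >> 1) ^ 0xA001);
            else
                crc = (unsigned short)(crc >> 1);
        }
    }
    return crc;
}
```

Because "Alice" and "ALICE" hash to the same key, case-insensitive lookups fall out for free; the collision check against the actual stored string happens after the B-Tree lookup.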

Journaling

Power failures and system crashes are a real concern, especially on retro hardware that lacks the protections modern systems take for granted. If the system dies partway through writing a record, you could end up with a half-written page in the data file and indexes that don't match the data. Journaling prevents this by recording what you intend to do before you do it.

Journal File Format

The journal file (.JNL) contains a sequence of fixed-size entries, each 518 bytes:

Field Type Size Notes
operation byte 1 byte 0 = None, 1 = Update, 2 = Delete, 3 = Add
page_num int32 4 bytes Target page number (−1 for Add operations)
record_id int32 4 bytes Record ID for verification
data byte[507] 507 bytes New record data (for Update and Add)
checksum uint16 2 bytes CRC16 of the preceding fields

The checksum is crucial - it lets the recovery process detect and skip corrupted entries that were only partially written.
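Verification of a raw entry might look like the following sketch: recompute the CRC over the first 516 bytes and compare it with the little-endian checksum stored in the last two. As with the string keys, the exact CRC-16 variant is an assumption:

```c
#define JNL_ENTRY_SIZE 518

/* CRC-16 over a byte buffer (reflected 0xA001 polynomial; the
   variant vDB actually uses is not specified in this post). */
unsigned short crc16_buf(const unsigned char *p, long n)
{
    unsigned short crc = 0;
    long k;
    int i;

    for (k = 0; k < n; k++) {
        crc ^= p[k];
        for (i = 0; i < 8; i++) {
            if (crc & 1)
                crc = (unsigned short)((crc >> 1) ^ 0xA001);
            else
                crc = (unsigned short)(crc >> 1);
        }
    }
    return crc;
}

/* Returns 1 if a raw 518-byte journal entry's stored checksum
   matches its contents, 0 if the entry was partially written. */
int journal_entry_valid(const unsigned char *entry)
{
    unsigned short stored = (unsigned short)
        (entry[516] | ((unsigned short)entry[517] << 8));
    return crc16_buf(entry, JNL_ENTRY_SIZE - 2) == stored;
}
```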

Write Operation Sequence

Every write operation follows the same sequence:

  1. Append to journal - Write a DBJournalEntry describing the operation.
  2. Set pending flag - Set journal_pending = true in the database header (page 0).
  3. Flush to disk - Ensure the journal is physically written to disk.
  4. Perform the operation - Execute the actual changes on the .DAT file.
  5. Update indexes - Modify all affected B-Tree indexes.
  6. Clear pending flag - Set journal_pending = false in the header.
  7. Truncate journal - Set the journal file size to 0 bytes.

The key insight is that step 3 (the flush) is the commit point. If the system fails before the flush, the journal is incomplete and can be safely discarded. If the system fails after the flush but before the database is fully updated, the journal contains a complete record of what needs to happen and can be replayed. For operations affecting multiple records, all journal entries are written before the pending flag is set, creating an atomic multi-operation transaction.

Journal Recovery on Database Open

When a database is opened, the first thing that happens is a check of the journal_pending flag. If it's false, the database opens normally. If it's true, recovery kicks in:

  1. Open the journal file.
  2. Read each journal entry and verify its CRC16 checksum.
  3. Replay valid entries: writes go to their target pages, deletes mark pages as empty, and adds find available pages for new records.
  4. Skip any entries with invalid checksums (these were partially written and are incomplete).
  5. Rebuild all indexes from the .DAT file. This is simpler and safer than trying to figure out which index updates completed before the crash.
  6. Clear the journal_pending flag and truncate the journal.

The full index rebuild during recovery is a deliberate trade-off. It's slower than trying to replay partial index updates, but it guarantees index consistency without having to reason about which indexes were updated before the crash. For the database sizes typical of a retro BBS, the rebuild is fast enough that this simplicity is well worth it.

Performance Considerations

For the kinds of workloads this database is designed for - a retro BBS with maybe a few thousand users and tens of thousands of messages - performance is more than adequate. Here's the general picture:

Primary index lookups are O(log n) via the B-Tree. With order-60 nodes, even 100,000 records only requires about 3 levels of tree traversal, meaning 3 disk reads to find any record by its ID. Secondary index lookups require two B-Tree traversals (secondary index → Record ID, then primary index → page number), so roughly double the disk reads. This is a deliberate trade-off: slightly slower reads in exchange for much simpler compaction and updates. Since secondary indexes reference Record IDs rather than page numbers, they never need updating when records move during compaction.

Free page allocation is O(1) in the common case, just popping from the free page array. It degrades to O(n) when the array is empty but free pages exist (requiring a file scan to refill it), but after that scan, the next 127 allocations are back to O(1).

The journal adds overhead to every write - an extra file append and flush, plus two header updates and a truncate. For a BBS where writes are relatively infrequent compared to reads (users log in and browse messages far more often than they post), this overhead is acceptable in exchange for crash safety.

One thing to keep in mind is that each insert, update, or delete has to touch all secondary indexes as well as the primary index. For a table with many indexes, this multiplies the write cost. In practice, keeping the index count reasonable (the library supports up to 15 secondary indexes per table, but most tables will only use 2 or 3) keeps this manageable.

Next Steps

Now that I've thoroughly considered the design of the database, it is time to implement it. Rather than writing it all by hand, my next post will demonstrate how to implement it using an AI coding assistant such as Claude Code.