
How CouchDB Prevents Data Corruption: Append-Only Database Files

Posted Wednesday, September 18, 2024 by The Neighbourhoodie Team

CouchDB takes keeping your data safe very seriously. It does everything in its power not to accidentally lose any of your data. Let’s look at one of the things that keep your data safe: append-only database files.

CouchDB stores all data in files in a file system. Each database shard is represented by a single file. You are probably used to working with files for day-to-day tasks on a computer. What you can usually expect is that something you saved in a file (e.g. a text document or an image) is going to be there and that you can retrieve it at a later point.

However, things are not that simple. A lot of things can go wrong, especially when updating data in files that are already stored on disk. Imagine you are editing a photo, and you are applying a blur filter to a certain region. Then you save your photo.

To see what can go wrong, we have to look under the hood. Programs work with files through the operating system kernel. Usually, a program tells the kernel to open a file, find a position inside the file, and then write a fixed-length buffer of bytes to disk.

So when applying your blur filter, the program needs to figure out which bytes in the photo file need to be replaced with the newly blurred pixels, put those pixels into one or more buffers, and then tell the kernel to write them to the correct place in the file.
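The open, seek, and write steps described above can be sketched in a few lines of Python. This is only an illustration: the function name `overwrite_in_place`, the file name `photo.bin`, and the 16-byte stand-in for a photo are made up for this example and have nothing to do with how a real image editor or CouchDB lays out its files.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, new_bytes: bytes) -> None:
    """Overwrite len(new_bytes) bytes of an existing file, starting at offset."""
    with open(path, "r+b") as f:   # read/write mode, does not truncate the file
        f.seek(offset)             # find the position inside the file
        f.write(new_bytes)         # hand the buffer over for writing
        f.flush()                  # push Python's own buffer to the kernel
        os.fsync(f.fileno())       # ask the kernel to push the data to disk


def demo_in_place() -> bytes:
    """Create a hypothetical 16-byte 'photo' and overwrite 4 bytes in the middle."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "photo.bin")
        with open(path, "wb") as f:
            f.write(b"A" * 16)                  # the original "pixels"
        overwrite_in_place(path, 4, b"bbbb")    # the "blurred" region
        with open(path, "rb") as f:
            return f.read()


print(demo_in_place())  # b'AAAAbbbbAAAAAAAA'
```

Note that the old bytes at that position are gone as soon as the write starts. That is what makes an interrupted in-place update dangerous.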

So far so good. But what happens when, halfway through writing a buffer, your laptop runs out of battery? Did the buffer make it to disk? Did it not make it? Did half of it make it? You can’t know, and now, upon recharging and restarting your laptop, you might have a broken photo file because the buffer couldn’t be saved properly.
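To make the "did half of it make it?" case concrete, here is a small simulation that works purely in memory. It assumes, for simplicity, that an interrupted write leaves exactly the first few bytes of the buffer on disk. Real storage gives no such guarantee, and the function name `simulate_torn_write` is invented for this sketch.

```python
def simulate_torn_write(original: bytes, offset: int,
                        new_bytes: bytes, bytes_written: int) -> bytes:
    """Return the file contents if only the first `bytes_written` bytes
    of `new_bytes` reached the disk before the power went out."""
    data = bytearray(original)
    partial = new_bytes[:bytes_written]           # the part that made it
    data[offset:offset + len(partial)] = partial  # same-length overwrite
    return bytes(data)


original = b"A" * 8
print(simulate_torn_write(original, 2, b"bbbb", 4))  # b'AAbbbbAA' (complete write)
print(simulate_torn_write(original, 2, b"bbbb", 2))  # b'AAbbAAAA' (half old, half new)
```

The second result is neither the old photo nor the new one, and nothing in the file itself says which bytes belong to which version.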

This is just one of many ways to lose data when writing bytes to disk. CouchDB avoids all of these scenarios by never updating data it has already written to disk.

Writing new data to the end of a file is relatively error-proof, so that is the only thing CouchDB ever does. Even if you delete a document, that’s information that gets appended to the end of the database file.
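The idea can be illustrated with a toy append-only store. To be clear, this is not CouchDB's actual file format: the one-JSON-record-per-line layout and the names `put`, `delete`, and `load` are assumptions made purely for this sketch. It shows the two properties described above: updates and deletions are appended as new records, and an append that was cut short cannot damage records that were already written.

```python
import json
import os
import tempfile


def append_record(path: str, record: dict) -> None:
    """Append one record to the end of the file; never touch existing bytes."""
    with open(path, "ab") as f:    # append mode: writes always go to the end
        f.write(json.dumps(record).encode("utf-8") + b"\n")
        f.flush()
        os.fsync(f.fileno())


def put(path: str, doc_id: str, body: dict) -> None:
    append_record(path, {"id": doc_id, "deleted": False, "body": body})


def delete(path: str, doc_id: str) -> None:
    # A deletion is just another appended record.
    append_record(path, {"id": doc_id, "deleted": True})


def load(path: str) -> dict:
    """Replay the file from the start; later records win over earlier ones."""
    docs = {}
    with open(path, "rb") as f:
        for line in f:
            if not line.endswith(b"\n"):
                break              # unfinished last record from an interrupted append
            record = json.loads(line)
            if record["deleted"]:
                docs.pop(record["id"], None)
            else:
                docs[record["id"]] = record["body"]
    return docs


def demo_append_only() -> dict:
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "toy.log")
        put(path, "a", {"n": 1})
        put(path, "a", {"n": 2})   # an "update" is a new record, not an overwrite
        put(path, "b", {"n": 3})
        delete(path, "b")
        with open(path, "ab") as f:
            f.write(b'{"id": "c", "del')   # simulate a crash in the middle of an append
        return load(path)


print(demo_append_only())  # {'a': {'n': 2}}
```

In this toy, the half-written record for "c" is simply ignored when the file is read back, and everything written before it is still intact.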

Eventually, your database could get so large that it fills your disk. CouchDB usually keeps it from growing that far with a process called compaction, which creates a new, smaller database file containing only the most recent information.
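As a rough sketch of the idea (and not of CouchDB's real compaction code), the following toy compacts a file of JSON records, one per line, by copying only the most recent record for each document ID into a new file and then swapping the new file into place. The record layout, the `.compact` suffix used for the temporary file, and the function names are all assumptions made for this illustration.

```python
import json
import os
import tempfile


def compact(path: str) -> None:
    """Write a new file holding only the latest record per ID, then swap it in."""
    latest = {}
    with open(path, "rb") as f:
        for line in f:
            if not line.endswith(b"\n"):
                break                        # ignore an unfinished trailing record
            record = json.loads(line)
            latest[record["id"]] = record    # later records replace earlier ones
    new_path = path + ".compact"             # hypothetical temporary file name
    with open(new_path, "wb") as f:
        for record in latest.values():
            f.write(json.dumps(record).encode("utf-8") + b"\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(new_path, path)               # the old file is untouched until this swap


def demo_compaction():
    records = [
        {"id": "a", "deleted": False, "body": {"n": 1}},
        {"id": "a", "deleted": False, "body": {"n": 2}},
        {"id": "b", "deleted": False, "body": {"n": 3}},
        {"id": "b", "deleted": True},
    ]
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "toy.log")
        with open(path, "wb") as f:
            for record in records:
                f.write(json.dumps(record).encode("utf-8") + b"\n")
        size_before = os.path.getsize(path)
        compact(path)
        size_after = os.path.getsize(path)
        with open(path, "rb") as f:
            remaining = [json.loads(line) for line in f]
        return size_before, size_after, remaining


size_before, size_after, remaining = demo_compaction()
print(size_after < size_before)  # True
print(remaining)
```

The compacted file is itself only ever written by appending, so the same safety argument applies: if the process is interrupted before the swap, the original file is still complete.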