Always Compact
Posted Wednesday, January 29, 2025 by The Neighbourhoodie Team
CouchDB’s revision control system can sometimes lead to the assumption that CouchDB works like git, where you can access all versions of a document for eternity. Though it’s tempting, it’s inaccurate, and knowing why is important for understanding database health.
CouchDB and git both use what they know about revisions to synchronise seamlessly, but CouchDB does not guarantee that old versions of documents are kept around. The reason is simple: storing all that data would quickly exceed what you'd reasonably expect a database to hold.
Originally, CouchDB required operators to periodically start a compaction task for each database and each view index. Several versions ago, a crude compaction scheduler was introduced that periodically checked all databases and indexes against a compaction threshold and, if the threshold was exceeded, started compaction.
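For illustration, here is a minimal Python sketch of what the manual and threshold-based approaches amount to. It assumes CouchDB's HTTP API: `GET /{db}` reports `sizes.file` and `sizes.active`, and `POST /{db}/_compact` (or `POST /{db}/_compact/{ddoc}` for a design document's view indexes) with a `Content-Type: application/json` header starts compaction. The 70% threshold and the helper names `fragmentation`, `compaction_request` and `maybe_compact` are hypothetical; they are not the settings or internals of any CouchDB scheduler. The functions only build requests and send nothing.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical threshold for illustration only; not a CouchDB default.
FRAGMENTATION_THRESHOLD = 0.7


def fragmentation(db_info: dict) -> float:
    """Fraction of the database file that no longer holds live data.

    `db_info` is assumed to be the parsed JSON body of `GET /{db}`, whose
    `sizes` object reports `file` (bytes on disk) and `active` (live data).
    """
    sizes = db_info["sizes"]
    file_size = sizes["file"]
    if file_size == 0:
        return 0.0
    return (file_size - sizes["active"]) / file_size


def compaction_request(
    base_url: str, db: str, ddoc: Optional[str] = None
) -> urllib.request.Request:
    """Build POST /{db}/_compact, or POST /{db}/_compact/{ddoc} for view indexes."""
    # Database names may contain "/", which must be percent-encoded in the path.
    path = "/" + urllib.parse.quote(db, safe="") + "/_compact"
    if ddoc is not None:
        path += "/" + urllib.parse.quote(ddoc, safe="")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=b"",
        method="POST",
        # CouchDB expects a JSON content type on this endpoint.
        headers={"Content-Type": "application/json"},
    )


def maybe_compact(
    base_url: str, db: str, db_info: dict, threshold: float = FRAGMENTATION_THRESHOLD
) -> Optional[urllib.request.Request]:
    """Return a compaction request if the database exceeds the threshold, else None."""
    if fragmentation(db_info) > threshold:
        return compaction_request(base_url, db)
    return None
```

To actually start compaction, the request would be passed to `urllib.request.urlopen` with admin credentials; CouchDB accepts it with `202 Accepted` and compacts in the background.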
CouchDB 3.0.0 introduced an all-new scheduler that uses far fewer resources for its background tasks.
You may now be wondering whether you could disable the compaction scheduler and skip manual compaction tasks, so that CouchDB behaves like git. It's hard to make our recommendation here sound any less foreboding. Simply put: don't ever disable compaction.
Running CouchDB this way is not recommended under any circumstances, and sooner rather than later it will cause several problems:
- You'll run out of disk space. CouchDB only ever appends information to a database file, even when documents are deleted. Flexible block storage on cloud providers can take you a long way, but eventually you will hit the limit, and without adequate free disk space, compaction becomes impossible. You also can't compact onto another disk.
- Only the latest version of a document is replicated. Older versions never reach the other side, so as soon as you have two replication sites, you lose access to your document history.
- Performance will become abysmal. CouchDB uses B+-trees as its underlying storage, and B+-trees rely on an operation called rebalancing to keep average performance good. CouchDB only runs these operations during compaction, so if you never compact, your CouchDB will get slower and slower over time.
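The first two points can be illustrated with a toy model. The sketch below is not CouchDB's storage engine: `ToyAppendOnlyFile`, `Entry` and their methods are hypothetical, revisions are plain integers, and there is no B+-tree. It only shows that updates and deletions both append to the file, that compaction keeps just the latest entry per document (a deleted document survives as a tombstone), and that replication, as described above, carries only those latest entries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    doc_id: str
    rev: int  # toy revision counter, not a real CouchDB revision id
    body: Optional[dict]  # None marks a deletion (a tombstone)


@dataclass
class ToyAppendOnlyFile:
    """Toy model of an append-only database file: writes never overwrite."""

    log: List[Entry] = field(default_factory=list)

    def _latest_rev(self, doc_id: str) -> int:
        return max((e.rev for e in self.log if e.doc_id == doc_id), default=0)

    def put(self, doc_id: str, body: Optional[dict]) -> int:
        rev = self._latest_rev(doc_id) + 1
        self.log.append(Entry(doc_id, rev, body))
        return rev

    def delete(self, doc_id: str) -> int:
        # A deletion grows the file too: it appends a tombstone.
        return self.put(doc_id, None)

    def latest(self) -> Dict[str, Entry]:
        """The most recent entry per document, tombstones included."""
        current: Dict[str, Entry] = {}
        for entry in self.log:
            current[entry.doc_id] = entry
        return current

    def compact(self) -> None:
        """Keep only the latest entry per document; older revisions are gone.

        CouchDB writes the compacted data to a new file next to the old one,
        which is why it needs free disk space; this toy just rebuilds the list.
        """
        self.log = list(self.latest().values())

    def replicate_to(self, target: "ToyAppendOnlyFile") -> None:
        """Copy only the latest entry of each document to an empty target."""
        for entry in self.latest().values():
            target.log.append(Entry(entry.doc_id, entry.rev, entry.body))
```

Writing two versions of `a`, then creating and deleting `b`, leaves four entries in the log. A replica and the compacted file both end up with two: the latest `a` and the tombstone for `b`.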
In summary: don’t ever disable compaction.
There are multiple techniques to maintain document history in a CouchDB-friendly fashion, and we’ll introduce those in another post.