
CouchDB Data Modelling: Prefer Smaller Attachments

Posted Wednesday, February 19, 2025 by The Neighbourhoodie Team

We’ve already shared a CouchDB data modelling tip about preferring smaller docs. This one is about smaller attachments.

While document size affects the general operation of CouchDB and most of its features, attachments work a bit differently. Their size only matters when they are read or written, during replication, and during compaction.

The original design goal for attachments was to hold “arbitrary binary data”, and that’s what they delivered. Once attachments were implemented with a dedicated API, you could put large ISO images or even long video files, multiple GB in size, into CouchDB.

This is not to suggest that attaching huge files is a good idea, but it is possible.

Our better idea is: keep your attachments as small as makes sense in your use case. There are two reasons for keeping attachments small:

  1. replication performance
  2. compaction churn

Replication Performance

First, replication happens in concurrent batches. Batches are streamed as multipart HTTP messages, so no binary data conversion needs to happen. This is about as efficient as it can be, but there is a shortcoming: no attachment operation, replication included, can be resumed. If your connection breaks off half-way through transferring a large attachment, the whole attachment has to be sent again from the start. The smaller the attachment, the less often this happens, and the less data is re-sent when it does.
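To make the cost of non-resumable transfers concrete, here is a toy model in Python. It is a sketch under loud assumptions, not a description of how CouchDB's replicator works internally: the function name `bytes_on_wire` and the idea of a link that drops after a fixed number of bytes are invented purely for illustration.

```python
def bytes_on_wire(attachment_sizes, drop_every):
    """Count bytes sent for a sequence of attachments over a flaky link.

    Toy assumptions (hypothetical, for illustration only): the connection
    drops after every `drop_every` bytes, and an attachment that gets
    interrupted has to restart from byte zero.
    """
    sent = 0
    budget = drop_every  # bytes left before the next drop
    for size in attachment_sizes:
        if size > drop_every:
            raise ValueError("attachment can never finish on this link")
        if size > budget:
            # The drop hits mid-attachment: the bytes sent so far are wasted.
            sent += budget
            budget = drop_every
        sent += size
        budget -= size
    return sent


MB = 1024 * 1024
# The same 200 MB payload, over a link that drops every 125 MB.
two_large = bytes_on_wire([100 * MB] * 2, drop_every=125 * MB)
twenty_small = bytes_on_wire([10 * MB] * 20, drop_every=125 * MB)
print(two_large // MB, twenty_small // MB)
```

In this model, the two large attachments waste 25 MB on the restart, while the twenty small ones waste only 5 MB. A single attachment larger than the drop interval would never complete at all.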

Compaction Churn

Second, compaction takes the contents of an existing database, copies out all data that is still considered the most recent, and writes it into a new file. Then it swaps the old and new database files and throws the old one away. This way, CouchDB gets rid of operational data it doesn’t need anymore. When it comes to documents, only the latest version, along with all its attachments, is kept around. Attachments are usually larger than your document JSON, and they usually outlive any single document revision. But on each compaction, all attachments are copied over to the new database file verbatim, whether or not they have changed. If you use many attachments, or large ones, this causes a lot of unnecessary disk I/O.
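A quick back-of-the-envelope calculation shows how lopsided this can get. The sketch below is a simplification: the function name `compaction_copy_bytes` and all the numbers are hypothetical, and it ignores B-tree and file-header overhead.

```python
def compaction_copy_bytes(doc_count, json_bytes_per_doc, attachment_bytes_per_doc):
    """Bytes one compaction copies into the new database file.

    Counts only the latest revision of each document plus its attachments.
    Index and header overhead is deliberately ignored (simplification).
    """
    json_total = doc_count * json_bytes_per_doc
    attachment_total = doc_count * attachment_bytes_per_doc
    return json_total, attachment_total


KB = 1024
MB = 1024 * KB
# Hypothetical workload: 100,000 docs, 2 KB of JSON and 5 MB of attachments each.
json_total, attachment_total = compaction_copy_bytes(100_000, 2 * KB, 5 * MB)
print(f"JSON: {json_total / MB:.0f} MB, attachments: {attachment_total / MB:.0f} MB")
```

With these made-up numbers, every compaction run copies roughly 195 MB of JSON but close to 500 GB of attachment data, even if not a single attachment changed since the last run.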

There are a few operational mitigations for this. For example, you can store all attachments in a separate database, compact it once and then never change it, and only reference the attachments from your main database. But the biggest thing you can do to make better use of attachments is keeping them small.
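The separate-database mitigation could look roughly like this in Python. This is a minimal sketch: the database name `avatars`, the `avatar` field, and the shape of the reference object are all assumptions made for illustration. Only the URL layout `/{db}/{docid}/{attname}` comes from CouchDB's attachment API.

```python
from urllib.parse import quote


def attachment_ref(db, doc_id, name, content_type, length):
    """A small JSON-friendly pointer to an attachment kept in another database.

    The field names are hypothetical, not a CouchDB convention.
    """
    return {
        "db": db,
        "doc_id": doc_id,
        "name": name,
        "content_type": content_type,
        "length": length,
    }


def attachment_url(base_url, ref):
    """Build the GET URL for a referenced attachment: /{db}/{docid}/{attname}."""
    parts = (ref["db"], ref["doc_id"], ref["name"])
    return base_url.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


# The main database only stores a lightweight reference, so compacting it
# never has to copy the binary data itself.
user_doc = {
    "_id": "user:alice",
    "type": "user",
    "name": "Alice",
    "avatar": attachment_ref("avatars", "user:alice", "profile.png", "image/png", 482_113),
}
print(attachment_url("http://localhost:5984", user_doc["avatar"]))
```

The trade-off is that your application now has to resolve the reference itself, and the two databases need to be kept in step.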

How Small is Small?

We recommend attachments for binary data that is commonly associated with regular production data. Think: a user profile picture alongside a user record. As for size, we recommend keeping your attachments in the 1–10 MB range.
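One way to enforce the upper end of that range is a guard in application code, before the attachment is ever sent. The following Python sketch builds (but does not send) a `PUT /{db}/{docid}/{attname}` request. The helper name `build_attachment_put` and the constant `MAX_ATTACHMENT_BYTES` are assumptions for illustration, and it treats 10 MB as 10 × 1024 × 1024 bytes.

```python
import urllib.request
from urllib.parse import quote

# Upper end of the recommended 1 to 10 MB range (assumed to mean MiB here).
MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024


def build_attachment_put(base_url, db, doc_id, name, data, content_type, rev=None):
    """Build a PUT request for an attachment, refusing oversized payloads.

    `rev` is the current document revision; leave it out only when the
    document does not exist yet.
    """
    if len(data) > MAX_ATTACHMENT_BYTES:
        raise ValueError(
            f"attachment {name!r} is {len(data)} bytes, "
            f"limit is {MAX_ATTACHMENT_BYTES}"
        )
    path = "/".join(quote(part, safe="") for part in (db, doc_id, name))
    url = f"{base_url.rstrip('/')}/{path}"
    if rev is not None:
        url += f"?rev={quote(rev, safe='')}"
    return urllib.request.Request(
        url, data=data, method="PUT", headers={"Content-Type": content_type}
    )


req = build_attachment_put(
    "http://localhost:5984", "users", "user:alice",
    "profile.png", b"\x89PNG...", "image/png", rev="1-abc",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. Authentication and error handling are left out of this sketch.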
