
All Views in One Design Doc

Posted Wednesday, August 28, 2024 by The Neighbourhoodie Team

CouchDB stores the definitions of all secondary indexes in a database in design documents. This applies to JavaScript views, Erlang views, Lucene text indexes, and Mango indexes.

A design document can contain one view definition or several. Here are two example design documents with one definition each:

{
  "_id": "_design/one-view",
  "views": {
    "my-first-view": {
      "map": "function(doc) { emit(doc.example) }"
    }
  }
}

{
  "_id": "_design/another-view",
  "views": {
    "my-second-view": {
      "map": "function(doc) { emit(doc.next_example) }"
    }
  }
}

Here is a design document with two view definitions:

{
  "_id": "_design/two-views",
  "views": {
    "my-first-view": {
      "map": "function(doc) { emit(doc.example) }"
    },
    "my-example-view": {
      "map": "function(doc) { emit(doc.next_example) }"
    }
  }
}

As far as querying is concerned, CouchDB treats both layouts the same: each view is addressed by its design document and view name. So why do the two forms exist?

The difference shows up when the indexes are built. Each design document corresponds to one index file on disk per database shard. Two separate design documents with one view definition each therefore create two indexes, while a single design document with two view definitions creates a single index.

Why does this matter? Let’s look at how view indexes are built:

  1. When a view is queried, the view engine checks whether an up-to-date index exists for the requested view.
  2. If so, it returns the result from the index. If not, it asks the database’s changes feed for all documents that changed since the index was last updated (or since the beginning, if the index has never been built).
  3. The view engine then loads every document that is not yet represented in the view index, applies the index definitions to it (e.g. executes the map function(s)), and writes the results into the index.
  4. Finally, it replies with the result read from the index.
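The four steps above can be modelled in a few lines of JavaScript. This is a minimal in-memory sketch, not CouchDB’s actual implementation: it assumes the changes feed is an array of `{ seq, id, doc }` entries, and that map functions receive a hypothetical `emit` callback as their second argument (real CouchDB map functions call a global `emit`). The helper names `createDesignDocIndex`, `updateIndex` and `queryView` are made up for illustration. Deletions and key sorting are left out.

```javascript
// Minimal in-memory model of incremental indexing for ONE design document.
// Hypothetical names throughout; a sketch, not CouchDB's implementation.

function createDesignDocIndex(views) {
  // rows[viewName][docId] holds the [key, value] pairs emitted for that doc.
  return { views, lastSeq: 0, rows: {} };
}

function updateIndex(index, changes) {
  let docsLoaded = 0;
  for (const change of changes) {
    if (change.seq <= index.lastSeq) continue; // already represented in the index
    docsLoaded += 1; // each changed doc is loaded once for this design doc...
    for (const [name, map] of Object.entries(index.views)) {
      // ...and run through every map function the design doc defines.
      const emitted = [];
      map(change.doc, (key, value = null) => emitted.push([key, value]));
      index.rows[name] = index.rows[name] || {};
      index.rows[name][change.id] = emitted; // replaces this doc's old rows
    }
    index.lastSeq = change.seq;
  }
  return docsLoaded;
}

function queryView(index, changes, viewName) {
  updateIndex(index, changes); // steps 1-3: bring the index up to date
  return Object.values(index.rows[viewName] || {}).flat(); // step 4: reply
}

const changes = [
  { seq: 1, id: "a", doc: { example: 1, next_example: "x" } },
  { seq: 2, id: "b", doc: { example: 2, next_example: "y" } },
];
const index = createDesignDocIndex({
  "my-first-view": (doc, emit) => emit(doc.example),
  "my-example-view": (doc, emit) => emit(doc.next_example),
});
console.log(queryView(index, changes, "my-first-view")); // rows [1, null] and [2, null]
changes.push({ seq: 3, id: "a", doc: { example: 10, next_example: "z" } });
console.log(updateIndex(index, changes)); // 1: only the new change is processed
```

Both views share one `lastSeq` and one pass over the changes, which is the property that makes a combined design document cheaper to build.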

Let’s zoom in on one aspect: for each design document, the view engine loads every document from the database at least once and processes it. With two design documents, each document is loaded at least twice, and so on.

Looking at how long each step along this path takes shows why this matters:

  1. find the doc pointer in the by-seq index
  2. load the doc body from the file
  3. if it is a JS view, serialise the doc into JSON and send it over stdio to a couchjs process
  4. apply the document to the index definitions (map/reduce functions, Mango index) and return the results
  5. if it is a JS view, receive the result over stdio and decode it into Erlang terms
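To make the five steps concrete, here is a simplified JavaScript model of the work done for a single document and a single design document. Everything in it is a hypothetical stand-in: `bySeq` is a `Map` from update sequence to a storage offset, `storage` is an array playing the database file, and `fakeCouchjs` plays the couchjs process. `JSON.stringify`/`JSON.parse` stand in for the stdio round-trip of steps 3 and 5; CouchDB’s real on-disk format and protocol differ.

```javascript
// Simplified per-document pipeline for ONE design document (a sketch).
// `bySeq`, `storage` and `fakeCouchjs` are hypothetical stand-ins.

function fakeCouchjs(request, mapFns) {
  const doc = JSON.parse(request);
  // Step 4: one list of emitted rows per map function in the design doc.
  const results = mapFns.map((fn) => {
    const rows = [];
    fn(doc, (key, value = null) => rows.push([key, value]));
    return rows;
  });
  return JSON.stringify(results);
}

function indexDocument(seq, bySeq, storage, mapFns, stats) {
  const offset = bySeq.get(seq);                 // 1. find doc pointer in by-seq index
  const body = storage[offset];                  // 2. load doc body from "file"
  const request = JSON.stringify(body);          // 3. serialise doc to JSON ...
  stats.charsSent += request.length;             //    ... and "send" it to couchjs
  const response = fakeCouchjs(request, mapFns); // 4. apply all map functions
  stats.charsReceived += response.length;
  return JSON.parse(response);                   // 5. decode the results
}

const bySeq = new Map([[1, 0], [2, 1]]);
const storage = [{ _id: "a", example: 1 }, { _id: "b", example: 2 }];
const stats = { charsSent: 0, charsReceived: 0 };
const twoMapFns = [(doc, emit) => emit(doc.example), (doc, emit) => emit(doc._id)];
for (const seq of bySeq.keys()) {
  console.log(JSON.stringify(indexDocument(seq, bySeq, storage, twoMapFns, stats)));
}
// logs [[[1,null]],[["a",null]]] then [[[2,null]],[["b",null]]]
console.log(stats);
```

Each document crosses the couchjs boundary once per design document, no matter how many map functions that design document contains; only step 4 grows with the number of views.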

Of these steps, step 4 is by far the fastest. Even if you run a document through 20 or 100 map functions, steps 1, 2, 3, and 5 are considerably more work, and steps 1 and 2 remain the expensive part even if you don’t use JS views.

Informally, one could say that for each design document, you send the entire contents of your database through an external couchjs process. With multiple design docs, you do this as many times as you have design documents.
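A back-of-the-envelope model makes the difference tangible. The function below is a hypothetical sketch that counts, for a full index build, how many index files exist (one per design document per shard, as described above), how often the expensive per-document steps 1, 2, 3 and 5 run, and how often the cheap step 4 runs. The document count and the `q` value of 2 are made-up example inputs, and steps 3 and 5 only apply to JS views.

```javascript
// Back-of-the-envelope cost model for building indexes from scratch (a sketch).
function indexBuildCost({ docs, designDocs, viewsPerDesignDoc, q }) {
  return {
    indexFiles: designDocs * q,                      // one index file per design doc per shard
    docLoadsAndRoundTrips: docs * designDocs,        // steps 1, 2, 3 and 5
    mapCalls: docs * designDocs * viewsPerDesignDoc, // step 4
  };
}

// Ten views: spread over ten design docs vs. kept in a single design doc.
console.log(indexBuildCost({ docs: 1_000_000, designDocs: 10, viewsPerDesignDoc: 1, q: 2 }));
// indexFiles 20, docLoadsAndRoundTrips 10000000, mapCalls 10000000
console.log(indexBuildCost({ docs: 1_000_000, designDocs: 1, viewsPerDesignDoc: 10, q: 2 }));
// indexFiles 2, docLoadsAndRoundTrips 1000000, mapCalls 10000000
```

The number of map calls is identical in both layouts; what drops tenfold is the expensive load-and-round-trip work.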

If you only have a single design document with multiple index definitions, steps 1, 2, 3 and 5 happen only once per document. Over time, this uses far fewer computing resources, leaving CouchDB more capacity for other work.
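Consolidating existing design documents is mostly a matter of copying their `views` objects into one document. The helper below is a hypothetical sketch, not a CouchDB API: it assumes all inputs are JavaScript design documents (CouchDB’s default `language`), ignores every field other than `views` (such as `_rev` or `validate_doc_update`), and refuses duplicate view names. The target id `_design/all-views` is made up for the example.

```javascript
// Hypothetical helper: combine the views of several design docs into one.
function mergeDesignDocs(newId, designDocs) {
  const merged = { _id: newId, views: {} };
  for (const ddoc of designDocs) {
    const language = ddoc.language || "javascript"; // CouchDB's default language
    if (language !== "javascript") {
      throw new Error(`${ddoc._id}: only JavaScript views are handled here`);
    }
    for (const [name, def] of Object.entries(ddoc.views || {})) {
      if (Object.prototype.hasOwnProperty.call(merged.views, name)) {
        throw new Error(`duplicate view name "${name}" (from ${ddoc._id})`);
      }
      merged.views[name] = def; // copies map (and reduce, if present) as-is
    }
  }
  return merged;
}

const oneView = {
  _id: "_design/one-view",
  views: { "my-first-view": { map: "function(doc) { emit(doc.example) }" } },
};
const anotherView = {
  _id: "_design/another-view",
  views: { "my-second-view": { map: "function(doc) { emit(doc.next_example) }" } },
};
console.log(JSON.stringify(mergeDesignDocs("_design/all-views", [oneView, anotherView]), null, 2));
```

After saving the merged document, queries move from e.g. `/db/_design/one-view/_view/my-first-view` to `/db/_design/all-views/_view/my-first-view`, and the combined index is built from scratch the first time it is queried.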

Everything is a trade-off, so here are a few downsides of combining views into a single design doc. While the advice above is to consolidate all views into one design doc, these trade-offs may mean the right answer for you is “fewer” rather than “one”.

  1. Changing any view definition in a design document means the index for all of its views is rebuilt. Sometimes you also need to make a set of changes to your design document that you can’t do in one step, and you want to avoid building intermediate indexes while you make them.
  2. If you have many concurrent read requests to your view indexes, at some point you will hit a bottleneck, because each view index shard file is read through a single file descriptor. Increasing the database’s q value (its number of shards) is the only way to parallelise that load within a single design doc’s index.
  3. Depending on the nature of your views, your view index shard files can grow very large, which makes server administration harder in all the usual ways that large files are unwieldy.