Everything You Need to Know About CouchDB Database Names

Posted Tuesday, October 13, 2020 by The Neighbourhoodie CouchDB Team

Naming a database does not sound like an exciting activity. But it can be, if you know all the considerations that go into naming a database in CouchDB. Let’s start with the restrictions.

CouchDB restricts which characters a database name may contain. A database name:

  1. must begin with a lowercase letter from a to z (no diacritics or other accented letters);
  2. must otherwise consist only of the following characters:
    1. lowercase letters from a to z
    2. digits from 0 to 9
    3. the underscore (_), the dollar sign ($), and opening and closing parentheses, ( and )
    4. the plus (+) and minus (-) signs
    5. the slash (/)
  3. must be no longer than 238 characters.

Or, expressed as a regular expression: ^[a-z][a-z0-9_$()+/-]{0,237}$
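To make the rules concrete, here is a minimal Python sketch of a name check. It encodes the three rules listed above as one pattern: a lowercase letter followed by at most 237 allowed characters. The function name is_valid_db_name is our own and not part of any CouchDB client library. CouchDB itself remains the final authority on which names it accepts.

```python
import re

# One leading lowercase letter, then up to 237 more characters from the
# allowed set, for a total length of at most 238.
DB_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_$()+/-]{0,237}")


def is_valid_db_name(name: str) -> bool:
    """Return True if name satisfies the CouchDB naming rules listed above."""
    # fullmatch() anchors at both ends and, unlike a trailing `$`,
    # does not let a trailing newline slip through.
    return DB_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["people", "user/32/14/55187", "People", "1people", "a" * 239]:
        print(f"{candidate[:20]!r}: {is_valid_db_name(candidate)}")
```

In this example, people and user/32/14/55187 pass. People fails because of the uppercase letter, 1people fails because it starts with a digit, and the 239-character name fails because it is too long.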

The collection of special characters might seem unfamiliar at first. We’ll explain how they come together further down.

First, let's look at one of them: the slash, /. It is used in URLs and in UNIX-like file systems to denote hierarchy, such as a subdirectory.

In CouchDB 1.x, each database was represented by a single file in the file system. If you had a database called people, CouchDB would store all associated data in a file called people.couch. All .couch database files were stored in the same directory on your file system, which is set in the CouchDB configuration under [couchdb] database_dir.

CouchDB puts no practical limit on how many databases a server can hold (other than the theoretical limit of roughly 43^238 possible names), but file systems do. Those limits keep rising. In the days of CouchDB 1.x, however, you had to consider how many databases you would create to get good performance out of your file system. Some file systems get really slow when a single directory holds more than 2^16 or 2^32 files.
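The theoretical maximum comes from counting. The allowed set has 43 characters (26 letters, 10 digits, and 7 special characters), only the 26 letters may come first, and names can be up to 238 characters long. The short Python calculation below simply counts every character sequence the rules permit. It shows that the exact total is on the same order of magnitude as 43^238. The function name count_valid_names is hypothetical.

```python
import math

ALPHABET_SIZE = 43       # a-z (26) + 0-9 (10) + _ $ ( ) + - / (7)
FIRST_CHAR_CHOICES = 26  # the first character must be a lowercase letter
MAX_LENGTH = 238


def count_valid_names(max_length: int = MAX_LENGTH) -> int:
    """Count all names of length 1..max_length that satisfy the character rules."""
    # A name of length n has 26 choices for its first character
    # and 43 choices for each of the remaining n - 1 characters.
    return sum(FIRST_CHAR_CHOICES * ALPHABET_SIZE ** (n - 1) for n in range(1, max_length + 1))


if __name__ == "__main__":
    total = count_valid_names()
    # math.log10 accepts arbitrarily large Python ints.
    print(f"about 10^{math.log10(total):.1f} names; 43^238 is about 10^{238 * math.log10(43):.1f}")
```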

So that you can create more databases than your file system comfortably holds in one directory, CouchDB allows slashes in database names. Each slash creates an actual subdirectory in the file system, which keeps any single directory from holding too many files.

For example, a database called user/32/14/55187 will be stored in the database_dir as user/32/14/55187.couch.
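The mapping from names to files is straightforward. The following Python sketch assumes POSIX-style paths and uses /var/lib/couchdb purely as a hypothetical database_dir value. The helper names are ours. The code illustrates the naming scheme described here and does not reproduce CouchDB's own implementation.

```python
import posixpath
from collections import Counter


def couch1_db_file(database_dir: str, db_name: str) -> str:
    """Path of the .couch file for db_name under the CouchDB 1.x scheme.

    Slashes in the database name act as path separators, so each one adds
    a level of subdirectories below database_dir.
    """
    return posixpath.join(database_dir, db_name + ".couch")


def files_per_directory(database_dir: str, db_names: list[str]) -> Counter:
    """Count how many .couch files would land in each directory."""
    return Counter(posixpath.dirname(couch1_db_file(database_dir, n)) for n in db_names)


if __name__ == "__main__":
    data_dir = "/var/lib/couchdb"  # hypothetical database_dir value
    print(couch1_db_file(data_dir, "user/32/14/55187"))
    names = ["user/32/14/55187", "user/32/14/55188", "user/71/02/90311"]
    for directory, count in files_per_directory(data_dir, names).items():
        print(directory, count)
```

Spreading names over a few levels of numbered prefixes keeps each directory small, which is the whole point of allowing slashes.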

CouchDB 2.0 introduced database sharding: splitting a single database into multiple .couch files, each stored in the directory for its shard range. A shard range is a slice of the hexadecimal range 00000000 to ffffffff, and its subdirectory is named after the start and end of that slice. For example, a database with four shards (q=4) occupies the following shard ranges:

  • 00000000-3fffffff
  • 40000000-7fffffff
  • 80000000-bfffffff
  • c0000000-ffffffff
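These ranges follow from dividing the 32-bit range into q equal slices. The Python sketch below is our own approximation, and the name shard_ranges is hypothetical. It reproduces the four ranges above, and with q=1 it yields the single full range 00000000-ffffffff. When q does not divide 2^32 evenly, we assume that the last slice absorbs the remainder.

```python
HASH_SPACE = 2 ** 32  # shard ranges cover 00000000..ffffffff


def shard_ranges(q: int) -> list[str]:
    """Split the 32-bit range into q contiguous ranges named like shard directories.

    Each range is HASH_SPACE // q wide. The last range is extended to
    ffffffff so that no remainder is lost when q is not a power of two.
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    width = HASH_SPACE // q
    ranges = []
    for i in range(q):
        start = i * width
        end = HASH_SPACE - 1 if i == q - 1 else start + width - 1
        # Each bound is written as 8 lowercase hex digits.
        ranges.append(f"{start:08x}-{end:08x}")
    return ranges


if __name__ == "__main__":
    for r in shard_ranges(4):
        print(r)
```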

A database with just one shard occupies the full range:

  • 00000000-ffffffff

For databases with a single shard, which are common in the database-per-user pattern, all database files are stored in the same directory on the file system, and the same rules as with CouchDB 1.x apply.

The more shards each database has, the fewer files end up in each shard subdirectory. You will therefore reach the point where the file system slows down later, but the number of files per directory is still worth considering if you have a very large number of databases.

Incidentally, the shard ranges explain why database names are limited to 238 characters. Historically, file system paths could be at most 255 (2^8 - 1) characters long. Because CouchDB 2.0 and later always include the shard range, 17 characters have to be subtracted: 2 × 8 for the start and end of the shard range, plus 1 for the dash in the middle. That leaves 255 - 17 = 238.
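The arithmetic is easy to check. This tiny Python snippet restates the calculation from the paragraph above. The constant names are ours.

```python
# A shard range is written as 8 hex digits, a dash, and 8 more hex digits.
SHARD_RANGE = "00000000-ffffffff"
SHARD_RANGE_LENGTH = len(SHARD_RANGE)  # 2 * 8 + 1 = 17

# The historical path length limit cited in the text.
PATH_LIMIT = 255

MAX_DB_NAME_LENGTH = PATH_LIMIT - SHARD_RANGE_LENGTH
print(MAX_DB_NAME_LENGTH)  # 238
```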

And where do the other special characters come in? It's pretty simple, actually. When deciding which characters to allow in database names, the CouchDB developers surveyed all common file systems and collected each one's restrictions on which characters may appear in file names. The result is the list of characters allowed in CouchDB database names: every one of them is allowed in file names on any modern file system.

That said, we usually recommend sticking to [a-z][a-z0-9_/-]*, that is, lowercase letters, digits, underscores, slashes, and dashes.
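If your database names come from user input, such as usernames or email addresses, one way to stay within that subset is to hex-encode the identifier. As far as we know, this is also how CouchDB's couch_peruser feature names per-user databases: userdb- followed by the hex-encoded username. The Python helper below, hex_db_name, is our own illustration and not an API that CouchDB provides. Hex encoding doubles the length of the identifier, so the sketch also checks the 238-character limit.

```python
import re

# The recommended subset from the paragraph above.
RECOMMENDED = re.compile(r"[a-z][a-z0-9_/-]*")
MAX_LENGTH = 238


def hex_db_name(identifier: str, prefix: str = "userdb-") -> str:
    """Derive a database name from an arbitrary identifier via hex encoding.

    The UTF-8 bytes of the identifier are written as lowercase hex (0-9, a-f),
    so the name stays inside the recommended subset as long as the prefix does.
    """
    if not RECOMMENDED.fullmatch(prefix):
        raise ValueError("prefix must start with a-z and use only a-z, 0-9, _, / or -")
    name = prefix + identifier.encode("utf-8").hex()
    if len(name) > MAX_LENGTH:
        raise ValueError(f"database name would be {len(name)} characters, limit is {MAX_LENGTH}")
    return name


if __name__ == "__main__":
    print(hex_db_name("bob"))
    print(hex_db_name("Zoë"))
```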