Everything You Need to Know About CouchDB Database Names
Posted Tue Oct 13 2020 by The Neighbourhoodie CouchDB Team
Naming a database does not sound like an exciting activity. But it can be, if you know all the considerations that go into naming a database in CouchDB. Let’s start with the restrictions.
CouchDB database names have restrictions in terms of which characters can be used. Based on these restrictions, a database name:
- Must begin with a lowercase letter from a to z (no diacritics etc.).
- Each character in the name must be one of:
  - a lowercase letter from a to z
  - a number from 0 to 9
  - an underscore, dollar sign, open or closed parenthesis
  - the plus and minus signs
  - a slash
- May be no longer than 238 characters.
Or expressed as a regular expression: ^[a-z][a-z0-9_$()+/-]*$
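As a rough illustration, the rules above could be checked in Python along the following lines. This is a minimal sketch, not CouchDB's own implementation: the names `DB_NAME_RE`, `MAX_DB_NAME_LENGTH` and `is_valid_db_name` are made up for this example, and system databases such as `_users`, which start with an underscore, are deliberately not covered.

```python
import re

# Character rules from the list above: the first character must be a-z,
# every following character must be one of a-z, 0-9, _ $ ( ) + - /
DB_NAME_RE = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")

# Length limit stated in the list above.
MAX_DB_NAME_LENGTH = 238


def is_valid_db_name(name: str) -> bool:
    """Return True if `name` satisfies the character and length rules."""
    if len(name) > MAX_DB_NAME_LENGTH:
        return False
    # fullmatch() avoids accepting a name with a trailing newline,
    # which a plain match() with `$` would let through.
    return DB_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["people", "user/32/14/55187", "People", "1people"]:
        print(candidate, is_valid_db_name(candidate))
```

Checking names on the client side like this is only a convenience; the server remains the authority on what it accepts.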
The collection of special characters might seem unfamiliar at first. We’ll explain how they come together further down.
First, let’s talk about one of them: the slash, or /. It is used in URLs and in UNIX-like file systems to denote hierarchy, like a subdirectory.
In CouchDB 1.x, a database was represented by a single file in the file system. If you had a database called people, CouchDB would store all associated data in a file called people.couch. All .couch database files are stored in the same directory on your file system (its location is set in the CouchDB configuration).
CouchDB does not put a practical limit on how many databases there can be on a server (other than the theoretical ~43^238), but file systems do. While those limits keep getting higher, in the times of CouchDB 1.x you had to consider how many databases you would create to get good performance out of your file system. Some file systems get really slow when there are more than 2^16 or 2^32 files in a single directory.
To make sure you can create more databases than a file system limits you to, CouchDB allows you to add slashes to database names. It will create actual subdirectories in the file system, so you can avoid having too many files in a single directory.
For example, a database called user/32/14/55187 will be stored in the user/32/14/ subdirectory of the database directory, as a file called 55187.couch.
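The mapping from a database name with slashes to a file on disk in CouchDB 1.x can be sketched as follows. This is only an illustration of the idea: `couch1_file_path` is a hypothetical helper, and the directory `/var/lib/couchdb` is an assumed example value, not necessarily where your installation keeps its data.

```python
from pathlib import PurePosixPath


def couch1_file_path(database_dir: str, db_name: str) -> PurePosixPath:
    """Sketch of where a CouchDB 1.x database file would live.

    Each slash in the database name becomes a directory separator,
    and the last segment gets the .couch extension.
    """
    return PurePosixPath(database_dir) / (db_name + ".couch")


if __name__ == "__main__":
    # "/var/lib/couchdb" is an assumed database directory for this example.
    path = couch1_file_path("/var/lib/couchdb", "user/32/14/55187")
    print(path)         # full path of the database file
    print(path.parent)  # the nested subdirectory that holds it
```

Spreading a large set of databases over such prefixes keeps the number of files in any one directory small.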
CouchDB 2.0 introduced database sharding: a single database is split into multiple .couch files, each stored in its own directory per shard range. A shard range is expressed as a subdirectory named after the range, which goes from 00000000 to ffffffff. For example, a database with four shards (q=4) occupies the following shard ranges:
- 00000000-3fffffff
- 40000000-7fffffff
- 80000000-bfffffff
- c0000000-ffffffff
A database with just one shard occupies the full range:
- 00000000-ffffffff
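How a shard count maps onto range names can be sketched in a few lines of Python. This is a simplified illustration rather than CouchDB's actual code: `shard_ranges` is a hypothetical helper, and it assumes q is a power of two so that the 32-bit range divides evenly.

```python
def shard_ranges(q: int) -> list[str]:
    """Split the 32-bit range 00000000-ffffffff into q equal shard ranges.

    Assumption for this sketch: q is a power of two, so the range
    divides without a remainder.
    """
    if q < 1 or (q & (q - 1)) != 0:
        raise ValueError("this sketch only handles q as a power of two")
    step = 2**32 // q
    # Each range is written as two 8-digit hex numbers joined by a dash.
    return [f"{i * step:08x}-{(i + 1) * step - 1:08x}" for i in range(q)]


if __name__ == "__main__":
    for name in shard_ranges(4):
        print(name)
    print(shard_ranges(1))
```

Each returned string is 17 characters long, which becomes relevant for the name length limit discussed further down.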
For databases with a single shard, which are common in the database-per-user pattern, all database files are stored in the same directory on the file system, and the same rules as with CouchDB 1.x apply.
But the more shards you have per database, the fewer actual files there are in each shard subdirectory in the file system. So you will reach the point at which the file system starts to slow down later, but it is still worth considering if you have a very large number of databases.
Incidentally, the shard ranges explain why the database name is limited to 238 characters. In the past, file system paths could be at most 255 (2^8 - 1) characters long. Since CouchDB 2.0 and later always include the shard range, we have to subtract 17 characters (2 × 8 for the beginning and end of the shard range, plus 1 for the dash in the middle), which leaves 255 - 17 = 238.
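The arithmetic behind the limit can be spelled out in a short Python sketch. The constant names are made up for this example; the numbers are the ones given in the paragraph above.

```python
# Historical path length limit mentioned above.
LEGACY_PATH_LIMIT = 255

# A shard range name: 8 hex digits, a dash, 8 hex digits.
SHARD_RANGE_EXAMPLE = "00000000-ffffffff"

# What remains for the database name once the shard range is accounted for.
max_db_name_length = LEGACY_PATH_LIMIT - len(SHARD_RANGE_EXAMPLE)

if __name__ == "__main__":
    print(len(SHARD_RANGE_EXAMPLE))  # 17
    print(max_db_name_length)        # 238
```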
And where do the other special characters come in? It is pretty simple actually. When deciding which characters should be allowed in database names, the CouchDB developers surveyed all common file systems and collected all their respective restrictions about what characters could be included in file names. The result is the list of characters allowed in a CouchDB database: all these characters are allowed as part of a file name on any modern file system.
That said, we usually recommend keeping it to