This is an interview with Joan Touzet. Joan is Head of CouchDB Support at Neighbourhoodie, is a CouchDB committer, and also sits on the Apache Software Foundation Board of Directors.
I was working on doctoral research at the Ontario Institute for Studies in Education at the University of Toronto. Part of our research was to take posts students made on a prototype web forum, personal blog or chat room and perform various semantic analyses of the text. But the websites themselves made extracting the content difficult.
A friend knew about CouchDB (version 0.8 at the time!) and suggested I try it. I managed to extract each post from the three services and insert them into CouchDB. The REST-based API was fantastic and so much easier to use than a SQL-based backend. Writing analysis scripts in Python to run through each record was a snap, plus I could write the findings back into CouchDB. I could then replicate the database to a colleague, who could run their own analyses.
I was hooked.
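To make the workflow above concrete, here is a minimal sketch (not Joan's actual research code) of that kind of analysis pass: walk every post document, compute a simple word-frequency result, and prepare a findings document ready to write back to CouchDB over its HTTP API. The document shapes and field names here are illustrative assumptions.

```python
from collections import Counter

def analyse_posts(posts):
    """posts: a list of CouchDB-style documents, each with a 'text' field.

    Returns a findings document you could POST back to the database
    (e.g. via requests.post(f"{server}/{db}", json=findings)).
    """
    counts = Counter()
    for doc in posts:
        counts.update(doc.get("text", "").lower().split())
    return {"type": "findings", "word_counts": dict(counts)}

# Hypothetical documents, as they might come back from GET /{db}/_all_docs
posts = [
    {"_id": "post:1", "text": "CouchDB makes replication easy"},
    {"_id": "post:2", "text": "replication to a colleague is easy"},
]
findings = analyse_posts(posts)
print(findings["word_counts"]["replication"])  # 2
```

Because CouchDB speaks plain HTTP and JSON, the fetch and write-back steps are ordinary GET and POST requests, and the replicated copy on a colleague's machine answers the same queries.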
My early contributions to CouchDB weren't in code or documentation: they were in community building. I gained so much from being able to talk to its core developers on IRC daily that I decided to give back by sitting in IRC and answering questions, too, as I learned the answers.
At the time, CouchDB and the open source community at large weren’t as keen on recognizing non-code contributors with merit, so it took some time (and a few small code patches) before I became a CouchDB committer. Now, while I work on code all over CouchDB, my main focus is on release engineering, continuous integration and packaging.
Jan may say the same thing, but it'd have to be #745. This was a super gnarly bug that we couldn't always reproduce, where replication of large attachments would simply crash out and never finish. The error code pointed to some of the ugliest code in CouchDB, namely the HTTP multipart parsing code.

At first, we thought it was a race condition in how an HTTP 413 (request body too large) error result was handled. The socket was being closed before the 413 was returned to the client, so the client never knew why the attachment upload had failed.
Later, once we fixed that bug, we still got internal Erlang process crashes with the same error, or situations where CouchDB would consume all of the RAM on a machine and explode. We discovered the recurrence was related to enforcing stricter HTTP request size limits, which was preventing successful replication. Raising the size limit or reducing the attachment size fixed the issue.
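For context, the request size limits mentioned above are configurable. In modern CouchDB releases the relevant settings live in the server's `local.ini`; a rough sketch follows, but exact names and defaults vary by version, so check the configuration reference for your release:

```ini
; local.ini -- illustrative values; consult your version's docs
[couchdb]
; maximum size of a single document body
max_document_size = 8000000

[chttpd]
; maximum size of an entire HTTP request, attachments included
max_http_request_size = 4294967296
```

Keeping `max_http_request_size` comfortably larger than your biggest attachment is what "raising the size limit" amounts to in practice.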
It’s got to be Mango.
At the second CouchDB summit in 2012, shortly after I joined Cloudant (I was employee #20), I brought up the idea of a Domain-Specific Language (DSL). So many of the views people wrote simply looked to see if a field existed, then emitted a (key, value) pair of (value, 1) to get an index on a different "column" than the document's primary _id. I figured, if we could write a DSL for basic secondary indexes, we'd be able to eliminate approximately 60% of people's views. We could also reduce CPU usage on the cluster, and improve performance, because we wouldn't have to run every document through the external JavaScript view server. The idea was met with lukewarm interest, but it was added to the backlog.
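To illustrate the pattern described above: the "old way" was a JavaScript map function that checked for a field and emitted `(value, 1)`, while Mango declares the same secondary index as JSON posted to the `_index` and `_find` endpoints. The two helpers below are purely illustrative (not part of any client library); they just build the JSON bodies.

```python
import json

# Old-style JavaScript view this replaces (shown here as a comment):
#
#   function (doc) {
#     if (doc.type) { emit(doc.type, 1); }
#   }

def mango_index(field):
    """Illustrative: body for POST /{db}/_index -- index one field."""
    return {"index": {"fields": [field]}, "name": f"{field}-index"}

def mango_find(field, value):
    """Illustrative: body for POST /{db}/_find -- docs where field == value."""
    return {"selector": {field: {"$eq": value}}}

print(json.dumps(mango_find("type", "post")))
# {"selector": {"type": {"$eq": "post"}}}
```

The `_index` and `_find` endpoints and the `$eq` selector operator are real Mango API; because the selector is evaluated natively rather than through the JavaScript view server, you get the CPU and performance wins mentioned above.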
If you want to learn more about Mango, there’s plenty of information in the documentation as well as in my ApacheCon 2018 talk. Online you can find my slides from the talk and a recording of the session, which starts at 17:30.
Experiment locally, but bring your ideas back to the CouchDB community through the user mailing list or Slack channel (both linked from https://couchdb.apache.org/), both for another pair of eyes and to learn some best practices. If you're looking for more personalised attention, Neighbourhoodie Software is always here to help! ;)