Use JSON Patch to Resolve Conflicts

Posted Tuesday, September 15, 2020 by The Neighbourhoodie CouchDB Team

CouchDB is unique in the database world because it makes data conflicts first-class citizens of its data model. Most databases, and the applications built on them, do a large amount of work to avoid or hide conflicts instead. In many scenarios, this leads to subtle errors and occasional data loss.

In CouchDB, this means that no data is ever randomly lost and you can always make sure you have access to your users’ data. The only downside is that you have to actively embrace conflicts and prepare for them. But don’t worry, it is not a lot of work.

A document conflict shows up as a _conflicts field in your doc. CouchDB only returns this field if you request the doc with the conflicts=true option and a conflict actually exists. We also use the revs=true option to get a little more information about what is going on.

Normally, you don’t see any conflicts:

GET /db/doc?revs=true
{
  "_id": "doc",
  "_rev": "3-WXYZ9876",
  "_revisions": [
    "2-BCDE2345",
    "1-ABCD1234"
  ],
  "x": 3,
  "y": 1
}

With the conflicts option you do:

GET /db/doc?revs=true&conflicts=true
{
  "_id": "doc",
  "_rev": "3-WXYZ9876",
  "_conflicts": ["3-CDEF3456"],
  "_revisions": [
    "2-BCDE2345",
    "1-ABCD1234"
  ],
  "x": 3,
  "y": 1
}
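If you want to script the two requests above, here is a minimal Python sketch. The helper names doc_url and leaf_revisions and the http://localhost:5984 address are assumptions made for illustration; only the revs, conflicts and rev query options come from CouchDB itself.

```python
from urllib.parse import quote, urlencode


def doc_url(base_url, db, doc_id, **params):
    """Build a document URL such as /db/doc?revs=true&conflicts=true."""
    # CouchDB expects lowercase true/false in the query string.
    query = urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    )
    url = f"{base_url.rstrip('/')}/{quote(db, safe='')}/{quote(doc_id, safe='')}"
    return f"{url}?{query}" if query else url


def leaf_revisions(doc):
    """Return the winning revision followed by any conflicting revisions."""
    return [doc["_rev"]] + list(doc.get("_conflicts", []))


# Hypothetical server address; the response body is the one shown above.
url = doc_url("http://localhost:5984", "db", "doc", revs=True, conflicts=True)
doc = {"_id": "doc", "_rev": "3-WXYZ9876", "_conflicts": ["3-CDEF3456"]}
revs = leaf_revisions(doc)
```

Each entry in revs can then be fetched on its own by passing rev=... to doc_url.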

But now what? We can see that this doc has four revisions in total: 1 and 2 are uncontroversial, but then there are two revisions 3-… and they are in conflict. So we can deduce that two clients tried to update revision 2 at the same time and they disagree on its contents. Let’s look at the other 3-… revision:

GET /db/doc?revs=true&rev=3-CDEF3456
{
  "_id": "doc",
  "_rev": "3-CDEF3456",
  "_revisions": [
    "2-BCDE2345",
    "1-ABCD1234"
  ],
  "x": 2,
  "y": 4
}

Now we can see that revision 3-WXYZ9876 set our fields x and y to the values 3 and 1, while revision 3-CDEF3456 set them to the values 2 and 4. Clearly a conflict, but what is the “right” solution now?

Without knowing more about the application that produced this, we can’t really do much. But what if we knew what revision 2-BCDE2345 looked like? By default, we don’t know what it looked like, because CouchDB does not guarantee that old document revisions are still around: compaction removes their bodies.

Introducing JSON Patch

With a little trick, we can keep just enough information around to reconstruct previous revisions. To do this, we are going to use something very neat: JSON Patch (RFC 6902). It is a way to describe the differences between two JSON documents as a list of operations.

Here is a small example. Say we have two JSON objects that look like this:

{
  "a": 1
}

{
  "a": 2
}

JSON Patch can describe the difference between the two objects. If we want to know what changed in between the first and the second object, this JSON Patch describes the difference:

[
  { "op": "replace", "path": "/a", "value": 2 }
]

If we want to know the difference between the second and the first, this is the corresponding JSON Patch:

[
  { "op": "replace", "path": "/a", "value": 1 }
]
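To make this concrete, here is a minimal Python sketch of applying such a patch. It is deliberately not a full JSON Patch implementation: it only handles the add, replace and remove operations on top-level fields, and the function names are our own. For real applications, an existing JSON Patch library is likely the better choice.

```python
import copy


def pointer_to_key(path):
    """Turn a top-level JSON Pointer such as "/a" into the key "a"."""
    if not path.startswith("/") or "/" in path[1:]:
        raise ValueError(f"only top-level paths are supported: {path!r}")
    # Undo JSON Pointer escaping: "~1" means "/", "~0" means "~".
    return path[1:].replace("~1", "/").replace("~0", "~")


def apply_patch(doc, patch):
    """Return a copy of doc with every operation in patch applied in order."""
    result = copy.deepcopy(doc)
    for op in patch:
        key = pointer_to_key(op["path"])
        if op["op"] == "replace":
            if key not in result:
                raise KeyError(f"cannot replace missing field {key!r}")
            result[key] = copy.deepcopy(op["value"])
        elif op["op"] == "add":
            result[key] = copy.deepcopy(op["value"])
        elif op["op"] == "remove":
            del result[key]
        else:
            raise ValueError(f"unsupported operation: {op['op']!r}")
    return result


first = {"a": 1}
second = apply_patch(first, [{"op": "replace", "path": "/a", "value": 2}])
back = apply_patch(second, [{"op": "replace", "path": "/a", "value": 1}])
```

Here second equals the second object from the example and back equals the first one again, while first itself is left unmodified.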

Using JSON Patch

Now, if we write our documents so that each update not only sets the fields to the values we want but also records the change in JSON Patch format, we can reconstruct earlier revisions from the latest revision. The one trick here is that we don’t store the JSON Patch that gets us from the older to the newer revision, but the other way around: from the newer to the older.

Here is an example:

GET /db/doc?revs=true
{
  "_id": "doc",
  "_rev": "3-WXYZ9876",
  "_revisions": [
    "2-BCDE2345",
    "1-ABCD1234"
  ],
  "x": 3,
  "y": 1,
  "history": [
    [
      { "op": "replace", "path": "/x", "value": 2 }
    ],
    [
      { "op": "replace", "path": "/x", "value": 1 }
    ]
  ]
}
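How would an application produce such a history field? One possible approach, sketched below in Python, is to compute the reverse patch at the moment of each update and put it at the front of the list. The function name update_with_history is hypothetical, and the sketch assumes top-level fields whose names contain neither "/" nor "~".

```python
import copy


def update_with_history(doc, changes):
    """Return a copy of doc with changes applied and the undo patch recorded.

    The newest reverse patch goes first in doc["history"], so history[0]
    always leads from the current body back to the previous one.
    """
    new_doc = copy.deepcopy(doc)
    reverse_patch = []
    for key, value in changes.items():
        if key in doc and doc[key] == value:
            continue  # nothing changed, nothing to record
        path = f"/{key}"
        if key in doc:
            # Undoing a changed field means restoring the old value.
            reverse_patch.append(
                {"op": "replace", "path": path, "value": copy.deepcopy(doc[key])}
            )
        else:
            # Undoing a newly added field means removing it again.
            reverse_patch.append({"op": "remove", "path": path})
        new_doc[key] = value
    new_doc["history"] = [reverse_patch] + new_doc.get("history", [])
    return new_doc


rev1 = {"_id": "doc", "x": 1, "y": 1, "history": []}
rev2 = update_with_history(rev1, {"x": 2})
rev3 = update_with_history(rev2, {"x": 3})
```

After these two updates, rev3 carries the same x, y and history values as the document above. The _rev and _revisions fields are assigned by CouchDB when the document is saved, so the sketch leaves them alone.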

From this, we can deduce the document bodies for revision 2-BCDE2345 (by applying the first patch, which sets x=2) and for revision 1-ABCD1234 (by then applying the second patch, which sets x=1).

2-BCDE2345:
{
  "_id": "doc",
  "_rev": "2-BCDE2345",
  "_revisions": [
    "1-ABCD1234"
  ],
  "x": 2,
  "y": 1,
  "history": [
    [
      { "op": "replace", "path": "/x", "value": 1 }
    ]
  ]
}

1-ABCD1234:
{
  "_id": "doc",
  "_rev": "1-ABCD1234",
  "_revisions": [ ],
  "x": 1,
  "y": 1,
  "history": [ ]
}
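The reconstruction itself is mechanical. Here is a small Python sketch, assuming the history layout used above (newest reverse patch first, top-level fields only). The function name previous_body is our own, and the sketch only rebuilds the document body: it drops _rev, _revisions and _conflicts rather than guessing at them.

```python
import copy

METADATA = ("_rev", "_revisions", "_conflicts")


def previous_body(doc):
    """Rebuild the body one revision back from the newest history entry."""
    if not doc.get("history"):
        raise ValueError("no history left to rewind")
    newest_patch, *older_patches = doc["history"]
    body = {k: copy.deepcopy(v) for k, v in doc.items() if k not in METADATA}
    for op in newest_patch:
        key = op["path"][1:]  # strip the leading "/" of a top-level path
        if op["op"] == "replace":
            body[key] = copy.deepcopy(op["value"])
        elif op["op"] == "remove":
            body.pop(key, None)
        else:
            raise ValueError(f"unsupported operation: {op['op']!r}")
    # The patch we just applied no longer belongs to the older revision.
    body["history"] = copy.deepcopy(older_patches)
    return body


rev3 = {
    "_id": "doc",
    "_rev": "3-WXYZ9876",
    "x": 3,
    "y": 1,
    "history": [
        [{"op": "replace", "path": "/x", "value": 2}],
        [{"op": "replace", "path": "/x", "value": 1}],
    ],
}
rev2_body = previous_body(rev3)
rev1_body = previous_body(rev2_body)
```

Calling previous_body once yields the body of 2-BCDE2345, and calling it again yields the body of 1-ABCD1234 with an empty history.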

Why is this useful? If we now have the JSON Patch history for both revisions 3-WXYZ9876 and 3-CDEF3456, we can perform what is called a three-way merge: we compare both conflicting revisions against their common ancestor. It allows us to resolve the conflict without any further information.

Here is how. First, let’s look at the history of revision 3-CDEF3456:

{
  "_id": "doc",
  "_rev": "3-CDEF3456",
  "_revisions": [
    "2-BCDE2345",
    "1-ABCD1234"
  ],
  "x": 2,
  "y": 4,
  "history": [
    [
      { "op": "replace", "path": "/y", "value": 1 }
    ],
    [
      { "op": "replace", "path": "/x", "value": 1 }
    ]
  ]
}

From this, we can now reconstruct revision 2-BCDE2345, the common ancestor of both conflicting revisions:

{
  "_id": "doc",
  "_rev": "2-BCDE2345",
  "_revisions": [
    "1-ABCD1234"
  ],
  "x": 2,
  "y": 1,
  "history": [
    [
      { "op": "replace", "path": "/x", "value": 1 }
    ]
  ]
}

Using this as a base, we can produce the two JSON Patches that lead to the two 3- revisions, first the one for 3-WXYZ9876 and then the one for 3-CDEF3456:

[
  { "op": "replace", "path": "/x", "value": 3 }
]

[
  { "op": "replace", "path": "/y", "value": 4 }
]

And now we can compare the patches themselves. We see that the changes for each revision happened on separate fields. That means we can safely apply both patches to revision 2-BCDE2345 and get no conflict.
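The comparison described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a general merge algorithm: it only looks at top-level fields, ignores the history field and CouchDB metadata, and gives up as soon as both revisions touch the same field, even if they agree on the new value. The names diff and three_way_merge are our own.

```python
import copy

IGNORED = {"_rev", "_revisions", "_conflicts", "history"}


def diff(base, target):
    """Forward JSON Patch (top-level fields only) that turns base into target."""
    patch = []
    for key in sorted((set(base) | set(target)) - IGNORED):
        path = f"/{key}"
        if key not in target:
            patch.append({"op": "remove", "path": path})
        elif key not in base:
            patch.append({"op": "add", "path": path, "value": target[key]})
        elif base[key] != target[key]:
            patch.append({"op": "replace", "path": path, "value": target[key]})
    return patch


def three_way_merge(base, ours, theirs):
    """Merge two revisions that share base, or raise if they overlap."""
    patch_ours = diff(base, ours)
    patch_theirs = diff(base, theirs)
    clashing = {op["path"] for op in patch_ours} & {op["path"] for op in patch_theirs}
    if clashing:
        raise ValueError(f"both revisions changed {sorted(clashing)}")
    merged = {k: copy.deepcopy(v) for k, v in base.items() if k not in IGNORED}
    for op in patch_ours + patch_theirs:
        key = op["path"][1:]
        if op["op"] == "remove":
            merged.pop(key, None)
        else:  # "add" and "replace" both set the field
            merged[key] = copy.deepcopy(op["value"])
    return merged


base = {"_id": "doc", "_rev": "2-BCDE2345", "x": 2, "y": 1}
ours = {"_id": "doc", "_rev": "3-WXYZ9876", "x": 3, "y": 1}
theirs = {"_id": "doc", "_rev": "3-CDEF3456", "x": 2, "y": 4}
merged = three_way_merge(base, ours, theirs)
```

When both patches touch the same path, the sketch raises an error instead of guessing, and the application has to decide what to do, for example by asking the user.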

The resulting document looks like this and is stored as revision 4-XYXY8877:

GET /db/doc
{
  "_id": "doc",
  "_rev": "4-XYXY8877",
  "x": 3,
  "y": 4
}
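The post does not show how the merged document gets written back. One common way to do it in CouchDB is a single POST to /db/_bulk_docs that updates the winning revision with the merged body and deletes the losing revision. The Python sketch below only builds the request body; the function name resolution_payload is hypothetical.

```python
def resolution_payload(merged, winning_rev, losing_revs):
    """Build a _bulk_docs body that stores the merge and removes the losers."""
    # Updating on top of the winning revision creates the new 4-... revision.
    docs = [{**merged, "_rev": winning_rev}]
    # Each losing leaf is deleted so that the conflict disappears.
    docs += [
        {"_id": merged["_id"], "_rev": rev, "_deleted": True} for rev in losing_revs
    ]
    return {"docs": docs}


payload = resolution_payload(
    {"_id": "doc", "x": 3, "y": 4},
    winning_rev="3-WXYZ9876",
    losing_revs=["3-CDEF3456"],
)
```

Once CouchDB has accepted this payload, a request with conflicts=true should no longer return a _conflicts field for the document.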

Voilà.