Automatic Conflict Resolution with CouchDB and Svelte

Posted Wednesday, December 11, 2024 by Alex

This is the second part of a blog post series on building a real-time, multi-user Kanban board with CouchDB and Svelte. We’ve previously made design decisions that should help reduce the opportunities for conflicts, but since we can’t rule them out completely, we do need to provide ways to resolve them. In this post, we’ll be covering the possibilities of automatic conflict resolution, which means:

  • A machine can resolve the conflict without any user input
  • The resolution happens in the background without anyone even noticing

The app exists on GitHub, and you can follow along as we add features by checking out different tags. This post covers step 2.

The Goal and How to Get There

Let’s briefly recap the whole conflict situation:

  1. We know that, in a multi-user context, conflicts can happen: two users change the same resource at roughly the same time. In most databases, the first change is simply overwritten and lost (“Last Write Wins”).
  2. We know that CouchDB behaves differently and allows us to identify and work with these conflicting data sets.
  3. In the worst case, both users’ changes interfere with each other, and can’t be resolved automatically. Think of a git merge or rebase where git asks you to “Resolve all conflicts manually” before continuing.
  4. We also know that some of these changes might not interfere with each other. Staying with version control as an example, this would be git cleanly applying two users’ changes onto each other in a merge or rebase. We would like our kanban board to behave similarly.

So we won’t be able to resolve all conflicts automatically. There will be instances where a human needs to decide what the final state of a document will be, and no amount of code, however clever, will change that. But before we annoy humans with the chore of resolving conflicts, we can at least try to have the machine resolve them as far as possible, and only if that fails will we turn to the users.

Our Approach to Automatic Conflict Resolution

The method we’ll be employing here is called a Three Way Merge. It has a number of requirements:

  1. We need to actually know a conflict exists on a document. CouchDB handles this bit.
  2. We need two[1] conflicting revisions of the document. CouchDB also handles this by keeping the conflicting revisions around until the conflict has been resolved.
  3. We also need the revision the two conflicting revisions are based on. CouchDB does not guarantee that this will be around, since its compaction mechanism will periodically prune the database of unnecessary data, and the base revision of a conflict is technically speaking not necessary to resolve a conflict manually. We can either add functionality that will enable us to reconstruct the base revision, or simply only try the merge if the base revision still happens to exist by sheer coincidence. For this post, we can actually ignore this problem, because we’re not trying to solve for all types of conflicts, but only for one that isn’t affected by this issue. More on that later though[2].

Since Three Way Merges of JavaScript objects haven’t really changed much recently, we can in good conscience use a… 14-year-old implementation that was actually written for use with CouchDB, slap a bit of TypeScript on top, and off we go.

This merge function takes the base revision plus two conflicting revisions and returns a merged revision. However, the Three Way Merge can’t always resolve the conflicts. There’s a fourth requirement, relating to the nature of the changes in the data itself:

  4. The two conflicting revisions must not change the same keys to different values:
// Each case sits in its own block scope so the repeated names don’t clash

// This works:
{
  const baseRev = {x: 1, y: 1}
  const change1 = {x: 2, y: 1} // Only changes x
  const change2 = {x: 1, y: 2} // Only changes y
  const merged  = {x: 2, y: 2} // Merge can be performed, yay
}

// This also works:
{
  const baseRev = {x: 1, y: 1}
  const change1 = {x: 2, y: 1} // Changes x
  const change2 = {x: 2, y: 1} // Also changes x, but makes the same change
  const merged  = {x: 2, y: 1} // Merge can be performed, yay
}

// This does not work:
{
  const baseRev = {x: 1, y: 1}
  const change1 = {x: 2, y: 1} // Changes x
  const change2 = {x: 7, y: 1} // Also changes x, but makes a different change
  // No automatic merging for you!
  // The computer does not know whether x should be 2 or 7
}

Our merge library will attempt to merge as many changes as it can, and if it runs into something it cannot resolve, it will return the merged revision with every change it could manage, plus a conflicts object that describes whatever it couldn’t deal with. So we can chuck all of our eligible conflicts into this function, and if we still get conflicts, we need to notify a human. If we don’t, we can quietly resolve the conflict in the background.
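To make the mechanics concrete, here is a minimal sketch of a three way merge. It is not the library’s implementation: it only looks at top-level keys, compares values via JSON.stringify, and keeps the base value for conflicting keys (an arbitrary choice). The names threeWayMerge, Doc and Conflict are ours, made up for illustration.

```typescript
type Doc = Record<string, unknown>

// One entry per key that both sides changed to different values
type Conflict = { a: unknown; o: unknown; b: unknown }

type MergeResult = {
  merged: Doc
  conflicts: Record<string, Conflict>
}

// Structural comparison; good enough for plain JSON values with stable key order
const same = (x: unknown, y: unknown): boolean =>
  JSON.stringify(x) === JSON.stringify(y)

// Simplified three way merge for flat objects (top-level keys only)
function threeWayMerge(base: Doc, a: Doc, b: Doc): MergeResult {
  const merged: Doc = {}
  const conflicts: Record<string, Conflict> = {}
  const keys = new Set([
    ...Object.keys(base),
    ...Object.keys(a),
    ...Object.keys(b),
  ])

  keys.forEach((key) => {
    const o = base[key]
    const va = a[key]
    const vb = b[key]
    let winner: unknown

    if (same(va, vb)) {
      winner = va // both sides agree, or neither side changed this key
    } else if (same(va, o)) {
      winner = vb // only b changed this key
    } else if (same(vb, o)) {
      winner = va // only a changed this key
    } else {
      conflicts[key] = { a: va, o, b: vb } // both changed it, differently
      winner = o // keep the base value until someone decides
    }

    // A key that ends up undefined was deleted, so we leave it out
    if (winner !== undefined) merged[key] = winner
  })

  return { merged, conflicts }
}

// The first example from above: the two changes don't touch the same key
const clean = threeWayMerge({ x: 1, y: 1 }, { x: 2, y: 1 }, { x: 1, y: 2 })
// clean.merged is { x: 2, y: 2 } and clean.conflicts is {}
```

Unlike the clean merge result shown later in this post, which has no conflicts key at all, this sketch always returns a conflicts object, possibly empty. That keeps the check down to counting keys.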

Checking for Conflicts

First off, there are actually two types of conflicts:

  1. Immediate conflicts that occur because a user is trying to directly update a remote CouchDB doc that has already been changed. This request will fail immediately with a 409 conflict error.
  2. Deferred conflicts that occur because a user was offline while making a change to their local database instance, and that change was then later replicated (synced) to another instance after the target document has already been modified by someone else. Since our app isn’t offline capable, doesn’t use multiple database instances and doesn’t use replication, we need not cover this. If you want to explore this avenue, the previously mentioned blog post will point you in the right direction.

Since the use of the Three Way Merge is basically the same for both cases, we’re just going to implement the former, because it introduces less complexity and is sufficient to make the point of this blog post.

Since we’re only dealing with the first type, we will always immediately know about conflicts, since CouchDB will complain with a 409 whenever we try to cause one. That’s pretty convenient.
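Spotting the 409 in code could look roughly like the sketch below. It assumes the database client rejects failed writes with an error object that carries a numeric status property, as PouchDB-style clients typically do. The helper name isConflictError is made up for this example.

```typescript
// Returns true if a rejected write looks like a CouchDB update conflict.
// Assumption: the client exposes the HTTP status code as `err.status`.
function isConflictError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false
  const status = (err as { status?: unknown }).status
  return status === 409
}
```

Anything that isn’t a 409 (network trouble, validation errors, missing permissions) should be rethrown and handled elsewhere, since a merge won’t fix it.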

Collecting the Three Revisions

Whatever the type of conflict, if we want to resolve it, we need the base revision and two conflicting revisions. When we make a request to change a document and it returns a 409:

  • The base revision is our document state before the user started making a change. Not before the request was sent, that’s too late: the conflicting change will probably have arrived via the _changes feed and been merged into our local app state after the user started the local change, but before it was completed and sent to CouchDB. We need to make a copy of a card the moment the user clicks edit or starts dragging the card, and keep it until our request succeeds or the edit is cancelled.
  • The first conflicting revision is the payload of our failed request. We should have that already.
  • The second conflicting revision is the current state of the document in CouchDB. We can fetch that with a GET, since we know the document’s _id. We might even have already received this updated version via the _changes feed, but let’s not try to be clever here and just fetch it, brute-force style.
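Keeping the base revision around can be as simple as the following sketch. The BaseSnapshots class and its method names are hypothetical and not taken from the app’s code. The important bit is the deep copy, so that later updates to the local app state can’t leak into our snapshot.

```typescript
type CardDoc = { _id: string; _rev: string; [key: string]: unknown }

// Holds one base revision per card that is currently being edited
class BaseSnapshots {
  private snapshots = new Map<string, CardDoc>()

  // Call this the moment the user clicks edit or starts dragging
  begin(card: CardDoc): void {
    // Deep copy via JSON round trip; fine here because cards are plain JSON
    this.snapshots.set(card._id, JSON.parse(JSON.stringify(card)) as CardDoc)
  }

  // The base revision for a three way merge, if an edit is in progress
  get(id: string): CardDoc | undefined {
    return this.snapshots.get(id)
  }

  // Call this once the write has succeeded or the edit was cancelled
  end(id: string): void {
    this.snapshots.delete(id)
  }
}
```

In a component, begin() would go into the edit and drag-start handlers, and end() into whatever runs after a successful save or a cancel.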

If you’re wondering about the _revs, we don’t need to do anything special here. Since our change got rejected, the base revision and our local first conflicting revision will have the same _rev, and the second conflicting revision from CouchDB will have a new, higher _rev. This will get merged in without a conflict and thus always be correct[3].

To make the whole process more understandable, let’s walk through two three-way-merges, one that resolves cleanly and one that doesn’t:

Happy-Path Three Way Merge

We start with the base revision of our card (we’ll omit createdAt and updatedAt in this example, because datetime strings are so cluttery). We are looking at this from the perspective of Alice’s machine.

The base revision, before any changes are made:

{
  "_id": "card-08d46168-3f39-492a-b311-8500595f85c3",
  "_rev": "5-d31749d5d1ac33fa79159a996da23fc9",
  "type": "card",
  "title": "One More Lime",
  "column": "column-0",
  "createdBy": "Alice",
  "position": 2,
  "updatedBy": "Charlie"
}

Now, Alice clicks on the edit button, and then Bob drags the card to column-1. Bob’s write to CouchDB succeeds.

Bob’s successful change (this happens first, but since we’re on Alice’s machine, we get this last, so we call it the second conflicting revision):

{
  "_id": "card-08d46168-3f39-492a-b311-8500595f85c3",
  "_rev": "6-fa79d1ac33159a23fc9a9d31749d596d",
  "type": "card",
  "title": "One More Lime",
  "column": "column-1",
  "createdBy": "Alice",
  "position": 2,
  "updatedBy": "Bob"
}

Alice is updating the title, since somebody introduced a typo she wants to fix: obviously it should read “One More Time”. This produces the following PUT payload:

Alice’s attempted change (since we have this straight away, we call it the first conflicting revision):

{
  "_id": "card-08d46168-3f39-492a-b311-8500595f85c3",
  "_rev": "5-d31749d5d1ac33fa79159a996da23fc9",
  "type": "card",
  "title": "One More Time",
  "column": "column-0",
  "createdBy": "Alice",
  "position": 2,
  "updatedBy": "Alice"
}

She clicks save, and CouchDB returns a 409 conflict error: she was trying to update the document with the _rev: "5-d31749…", but Bob’s change has already bumped the _rev to 6-fa79d1…. We catch that error and will now collect the data needed to attempt a three way merge.

We kept the base revision, and we kept the first conflicting revision (Alice’s attempted change). Now we just need to get the second conflicting revision, Bob’s successful change. This has probably already arrived in our local client via the _changes feed, but we’re going to be explicit about it and just fetch it from CouchDB again. Then, we chuck those three revisions into our merge function, and get this back:

{
  "merged": {
    "_id": "card-08d46168-3f39-492a-b311-8500595f85c3",
    "_rev": "6-fa79d1ac33159a23fc9a9d31749d596d",
    "type": "card",
    "title": "One More Time",
    "column": "column-1",
    "createdBy": "Alice",
    "position": 2,
    "updatedBy": "Bob"
  },
  "added": {
    "a": {},
    "b": {}
  },
  "updated": {
    "a": {
      "title": "One More Time",
      "updatedBy": "Alice"
    },
    "b": {
      "_rev": "6-fa79d1ac33159a23fc9a9d31749d596d",
      "column": "column-1",
      "updatedBy": "Bob"
    }
  },
  "conflicts": {
    "updatedBy": {
      "a": "Alice",
      "o": "Charlie",
      "b": "Bob"
    }
  }
}

But… what? Why is there a conflict? Isn’t this the happy path, where one party changes title and the other changes column and everything is all flowers and unicorns? Well, almost. A decision we made in the past has come back to haunt us: adding the updatedBy (and updatedAt) keys to the card documents. These are always modified by anyone changing the document. This means we never have the case where the conflicting parties make changes that don’t affect each other, so we can never have automatic conflict resolution. Meh.

There would be a simple workaround, since we’re technically making an update in Alice’s name here. We’re merging Alice’s change on top of Bob’s, so updatedBy: "Alice" and updatedAt: new Date().toISOString() will necessarily be correct. There are many ways to go about this, but we could opt to simply ignore any conflicts that pertain to updatedBy or updatedAt. Instead of just doing the merge and checking for conflicts like so:

const mergeResult = merge(base, first, second)
if (!mergeResult.conflicts) {
  // PUT the new revision of this document into CouchDB 
}

…we omit the two keys we don’t need, spread the rest of the conflicts object into a new conflicts variable, and check how many keys it has. We don’t need to check for its existence beforehand, since we provide a default in the destructuring call below, so the ...conflicts rest spread will always, at the very least, be {}.

const mergeResult = merge(base, first, second)
const {updatedBy, updatedAt, ...conflicts} = mergeResult!.conflicts || {}
if (Object.keys(conflicts).length === 0) {
  // PUT the new revision of this document into CouchDB
}

Now when we want to PUT our new revision into CouchDB, we force those two values to be what we want them to be:

const resolution = await db.put({
  ...mergeResult.merged,
  updatedBy: first.updatedBy,
  updatedAt: first.updatedAt, // or new Date().toISOString() also works
})

Hm. Hm Hm Hm. This technically works. But does it feel like a good idea? It does look like some weird hoop-jumping is going on. And consider: if this were an Offline-First app that properly utilises the distributed nature of CouchDB, using updatedBy and updatedAt would sabotage CouchDB's capability to identify and resolve sync conflicts itself. We previously noted that two users making the exact same changes to a document would generate the exact same document, and CouchDB would resolve that neatly if these changes were replicated on top of each other. But with updatedBy and updatedAt, this can no longer occur. Even though our app (currently) isn’t using replication, it feels like we’re working against the grain somehow. CouchDB doesn’t want us to structure our data this way, it’s an antipattern that we’ve subconsciously imported from other tech stacks.

Before we even think about how to solve this, it’s worth thinking about whether we even need these keys at all. Is there a user need that requires this data in order to be fulfilled? Currently not, this information isn’t used or displayed anywhere. So maybe we should simply omit updatedBy and updatedAt. However, in the next post, we’ll be building a manual conflict resolution UI, and in that, it would be beneficial if users could see who made which change. So let’s forge on and make this work now, while we’re already in the correct mindset.

What then would be the couchy way of doing this? Well, in the previous post we posited that one of the ways of avoiding conflicts is making data more granular, so maybe our updatedBy and updatedAt should be in a different document, and not in the card document itself. So maybe we introduce a new document type, something like this:

type ActivityLog = {
  type: "activityLog"
  _id: string             // We’re not sure what this would look like yet…
  updatedBy: string
  updatedAt: string
}

However, we can’t just reference these in the card documents, that would introduce the exact same problem all over again when that reference changes. So the card document cannot know about its log document, which means each log document must hold the reference to its card. Fine. But we don’t want to be iterating over many log documents to find the one we’re interested in, this lookup should be as cheap as possible. The cheapest lookup is a straightforward document GET by _id, without any sort of query, so we’d want the log doc to have a deterministic _id that we can construct from the data we already have in the card document. This would work:

"_id": `log-${card._id}`

But now conflicts can arise when these activity logging documents are written! So let’s get even more granular: instead of every card having its own log document, how about every card revision has its own log document?

"_id": `log-${card._id}-${card._rev}`

Now, whenever a user successfully writes to a card document, they would immediately fire off a write-once-update-never activity log document with an _id that guarantees no-one else will be writing to the same document. Cool. And when we need to find out who made an update to a card, we just GET the document with the _id we've constructed from the card’s _id and _rev.

“BUT WAIT” I hear you say, “do we really want to write a new document on every single change?” To which the couchy answer is: “yeah, sure, why not, it’ll be fine.”[4] The documents are tiny, we’re not indexing them, and if our app gets astronomically successful and they actually do end up becoming troublesome, we can always prune or checkpoint them, or store them in separate databases. Our future selves have a lot of options.

So. Inside our tryToPut() function, whenever we successfully perform a PUT request on a card, we immediately send the log doc after it:

const putResponse = await db.put(newVersion)
if (putResponse.ok) {
  const log: ActivityLog = {
    type: "activityLog",
    updatedAt: new Date().toISOString(),
    updatedBy: currentUserName,
    _id: `log-${putResponse.id}-${putResponse.rev}`,
  }
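  // Fire and forget: the card write already succeeded, so we only report a failed log write
  db.put(log).catch(console.error)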
}

In our future conflict resolution UI, we will make one extra GET request for the log document belonging to the remote conflict, so we know who made that change. We already know who made the other change: our local user.
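That extra GET could be sketched as follows. The LogReader interface is a hypothetical, minimal slice of a database client, modelled on a promise-based get() that rejects with a status of 404 when the document doesn’t exist. The function names are ours as well. Since the log write is fire-and-forget, the lookup has to cope with a missing log document.

```typescript
type ActivityLog = {
  type: "activityLog"
  _id: string
  updatedBy: string
  updatedAt: string
}

// Assumption: get() resolves with the document, or rejects with
// an error carrying `status: 404` if there is no such document
interface LogReader {
  get(id: string): Promise<unknown>
}

type CardRef = { _id: string; _rev: string }

// Builds the deterministic log _id from a card's _id and _rev
function activityLogId(card: CardRef): string {
  return `log-${card._id}-${card._rev}`
}

// Finds out who wrote a given card revision; undefined if no log exists
async function findActivityLog(
  db: LogReader,
  card: CardRef,
): Promise<ActivityLog | undefined> {
  try {
    return (await db.get(activityLogId(card))) as ActivityLog
  } catch (err) {
    const status = (err as { status?: number } | null)?.status
    // The log write may have failed, or simply not have landed yet
    if (status === 404) return undefined
    throw err
  }
}
```

The UI can then fall back to something like “someone else” when the lookup comes back empty, instead of treating a missing log as an error.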

Right. After that little detour, let’s get back on the happy track. We’ve gotten rid of updatedBy and updatedAt, and when we now repeat the merge we tried before we went off on this little logging adventure, we get this:

{
  "merged": {
    "_id": "card-08d46168-3f39-492a-b311-8500595f85c3",
    "_rev": "6-fa79d1ac33159a23fc9a9d31749d596d",
    "type": "card",
    "title": "One More Time",
    "column": "column-1",
    "createdBy": "Alice",
    "position": 2
  },
  "added": {
    "a": {},
    "b": {}
  },
  "updated": {
    "a": {
      "title": "One More Time"
    },
    "b": {
      "column": "column-1",
      "_rev": "6-fa79d1ac33159a23fc9a9d31749d596d"
    }
  }
}

Much better! We can clearly see that the two updates don’t touch each other: one updates title, and the other updates column and _rev. Which means the merge could be performed cleanly! The object in merged is now our PUT payload for the new and improved, conflict-free revision of our card. We can db.put() this into CouchDB, which will generate a new 7- revision with both Alice’s and Bob’s changes in it. That revision will propagate to all clients via the _changes feed, and everyone’s happy, because nobody noticed this dance was even happening in the background. Excellent.
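Put together, the whole dance might look like the sketch below. This is not the app’s actual tryToPut(): the Db interface, the Outcome type and the function name are assumptions for illustration, and the merge function is passed in so the sketch doesn’t depend on any particular library. It also doesn’t retry if the second PUT hits yet another conflict.

```typescript
type Doc = { _id: string; _rev?: string; [key: string]: unknown }
type PutResponse = { ok: boolean; id: string; rev: string }

// Same shape as the merge results shown in this post: `conflicts` is optional
type MergeFn = (
  base: Doc,
  a: Doc,
  b: Doc,
) => { merged: Doc; conflicts?: Record<string, unknown> }

// Assumption: put() rejects with an error carrying `status: 409` on conflict
interface Db {
  get(id: string): Promise<Doc>
  put(doc: Doc): Promise<PutResponse>
}

type Outcome =
  | { status: "written"; rev: string }
  | { status: "needsHuman"; conflicts: Record<string, unknown>; remote: Doc }

async function putWithAutoMerge(
  db: Db,
  merge: MergeFn,
  base: Doc, // snapshot taken when the user started editing
  attempted: Doc, // what the user wants to save
): Promise<Outcome> {
  try {
    const firstTry = await db.put(attempted)
    return { status: "written", rev: firstTry.rev }
  } catch (err) {
    const status = (err as { status?: number } | null)?.status
    if (status !== 409) throw err // not a conflict, not our problem here
  }

  // Someone else got there first: fetch their revision and try to merge
  const remote = await db.get(attempted._id)
  const result = merge(base, attempted, remote)
  const conflicts = result.conflicts ?? {}

  if (Object.keys(conflicts).length > 0) {
    return { status: "needsHuman", conflicts, remote }
  }

  // Clean merge: `merged` carries the remote _rev, so this PUT is accepted
  const retry = await db.put(result.merged)
  return { status: "written", rev: retry.rev }
}
```

Returning a needsHuman outcome instead of throwing keeps the door open for the manual resolution UI: the caller gets the conflicts and the remote revision it needs to show both sides.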

And that… was it? There’s no UI to build, no server-side code, just running the merge and a few extra requests. All conflicts that can be resolved automatically will be resolved automatically, invisibly, in the background. Our objective for this second post in the series is met. So what’s next? Manual conflict resolution. For that, we need conflicts that can’t be resolved automatically.

One of the great advantages of this approach is that it is completely independent of your business logic: the Three Way Merge works with any JSON structure. You have it run whenever CouchDB responds with a 409 error, and it will just sit there, being useful whenever it can, with no further intervention required on your part. Shipped a new feature, added new data types, restructured your documents? Don’t worry: it will just work. It’s a big, minimally-invasive and maintenance-free win.

Unhappy-Path Three Way Merge

We’ve already seen what a merge with conflicts looks like earlier: the merge function returns a result object with a conflicts key. In that example, we had both parties modifying the updatedBy key, and we fixed that, but they could still both, for example, modify the title key with a different change. When this occurs, we need to hand off the conflict to a human to resolve, since there’s no way for a machine to logically deduce which title is correct. Designing and building a UI for this will be the topic of the third blog post in this series.

Join us for the next posts, where we’ll deal with:

  1. Manual conflict resolution, a.k.a. asking a human to do it
  2. UI locking and its trade-offs
  3. More fun topics to flesh out the app, such as audit trails

Please check back soon! 👋

Footnotes

  1. Eagle-eyed readers will have noticed that it doesn’t say “We need the two conflicting revisions”, only “we need two conflicting revisions”. There may actually be more than two, but we can only deal with two at the same time. If there are more, the Three Way Merge needs to be repeated.

  2. You can also forego storing the base revision by keeping the diffs (json patches) instead, see our blog post on using JSON patches for conflict resolution. This is useful if your documents are huge.

  3. We go into _rev in more detail in the previous post.

  4. Relax™
