Explaining Distributed Data Consistency to IT novices? Well, …

[Image: Greek shepherd]

it’s all Greek to me. Bruce Stidston cited a post on Google+ where Yonatan Zunger, Chief Architect of Google+, tried to explain data consistency by way of Greeks enacting laws onto statute books on disparate islands. Very long post here. It highlights the challenges of maintaining data consistency when pieces of your data are distributed over many locations, and the logistics of trying to keep them all in sync – in a way that should be understandable to the lay – albeit patient – reader.

The treatise missed out the concept of two-phase commit, which is a way of doing handshakes between two identical copies of a database to ensure a transaction gets played successfully on both the master and a replica sited elsewhere on the network. So, if you get some sort of failure mid-transaction, both sides get returned to a consistent state and nothing falls down the cracks. Important if, for example, that data represents balance transfers between bank accounts.
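To make the handshake concrete, here’s a minimal, in-memory sketch of the idea in Python. The names (Participant, two_phase_commit, prepare/commit/rollback) are my own illustration rather than any particular database’s API: phase one asks every copy to stage the change and vote, and phase two commits everywhere only if every vote was “yes”.

```python
# Toy sketch of two-phase commit - illustrative names, not a real database API.

class Participant:
    """One copy of the database (e.g. the master or a replica)."""
    def __init__(self, name):
        self.name = name
        self.committed = {}   # durable state
        self.staged = None    # change staged during phase one
        self.failed = False   # simulate a node failing mid-transaction

    def prepare(self, txn):
        """Phase one: stage the change and vote on whether we can commit."""
        if self.failed:
            return False      # vote "no" - the coordinator must roll back
        self.staged = dict(txn)
        return True           # vote "yes, I can commit this"

    def commit(self):
        """Phase two (success): make the staged change durable."""
        self.committed.update(self.staged)
        self.staged = None

    def rollback(self):
        """Phase two (failure): discard anything staged."""
        self.staged = None


def two_phase_commit(txn, participants):
    votes = [p.prepare(txn) for p in participants]   # phase one
    if all(votes):                                   # phase two
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()          # nobody keeps a half-done change
    return "rolled back"


master, replica = Participant("master"), Participant("replica")
print(two_phase_commit({"acct_42": -100, "acct_99": +100}, [master, replica]))  # committed
replica.failed = True
print(two_phase_commit({"acct_42": -50}, [master, replica]))   # rolled back on both sides
print(master.committed == replica.committed)                   # True - still consistent
```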

The thing that impressed me most – and which I’d largely taken for granted – is how MongoDB (the most popular open source NoSQL database in the world) can handle virtually all the use cases cited in the article out of the box, with no add-ons. You can specify “happy go lucky” (no acknowledgement at all), a majority of replicas, or all replicas consistent before a write is confirmed complete. And if the definitive “Tyrant” (the primary) fails, there’s an automatic vote among the surviving instances to decide which secondary copy becomes the new primary (and when the old one rejoins, the changes are journaled back to consistency). And those instances can be distributed in different locations on the internet.
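For the curious, those options map onto MongoDB’s write concerns, which the standard drivers expose directly. A sketch using the Python driver (pymongo) follows; the replica-set hosts, database and collection names are placeholders of mine, but the write concern settings themselves (w=0, w="majority", a numeric w) are genuine MongoDB options.

```python
# Tunable consistency via MongoDB write concerns, using the pymongo driver.
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

# Placeholder connection string for a three-node replica set.
client = MongoClient("mongodb://node1,node2,node3/?replicaSet=rs0")
db = client.ledger

# "Happy go lucky": fire and forget, no acknowledgement of the write.
fast = db.get_collection("balances", write_concern=WriteConcern(w=0))

# Majority: the write is acknowledged only once most replicas have it.
safe = db.get_collection("balances", write_concern=WriteConcern(w="majority"))

# All three replicas: maximum consistency, slowest confirmation.
strict = db.get_collection("balances", write_concern=WriteConcern(w=3))

safe.insert_one({"account": 42, "balance": 100})
```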

Bruce contended that Google may not like the blocking mechanics (which slow down access while data is being written) needed to retain consistency on its own search database. However, I think Google’s workload is very read-heavy, and it won’t usually be a disaster if changes reach its readers a little late in new search results. No money falls between the cracks in their case; any changes just appear the next time you run the same search – one very big moving target.

Ensuring money doesn’t go down the cracks is exactly what blockchains design out (a majority of nodes must agree on a transaction, and once that’s achieved, later attempts to alter the record are declined). That’s why it can take up to 10 minutes for a Bitcoin transaction to get verified. I wrote introductory pieces about Bitcoin and potential blockchain applications some time back, if those are of interest.
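A toy hash chain shows the core of why tampering gets declined: each block embeds the hash of its predecessor, so altering any historical entry breaks every link after it. This sketch covers only the chaining part – real Bitcoin layers proof-of-work and the majority vote across nodes on top – and all the names here are illustrative.

```python
# Toy hash chain: edit any past "transaction" and verification fails.
import hashlib
import json

def block_hash(block):
    """Hash the block's contents, including its link to the previous block."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def add_block(chain, transaction):
    """Append a block that records the hash of the block before it."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "txn": transaction})

def verify(chain):
    """Recompute every link; any edit to history breaks the chain."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
add_block(chain, "Alice pays Bob 1 BTC")
add_block(chain, "Bob pays Carol 0.5 BTC")
print(verify(chain))                        # True
chain[0]["txn"] = "Alice pays Bob 100 BTC"  # tamper with history
print(verify(chain))                        # False - tampering is evident
```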

So, I’m sure there must be a more pithy summary someone could draw, but it would add blockchains to the discussion, and probably relate some of the artistry behind hashes and Git/GitHub for managing large, multi-user, multi-location code, data and writing projects – a taste of which is sketched below. However, that’s for the IT guys. They should know this stuff, and know what to apply in any given business context.
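For a flavour of that artistry: Git names every piece of content by the SHA-1 hash of its bytes (plus a small header), so identical content gets exactly the same identity on every machine in the world – which is what makes distributed, multi-location projects mergeable at all. A short Python sketch, reproducing what `git hash-object` prints for a blob:

```python
# How Git derives a blob's identity: SHA-1 over "blob <size>\0" + content.
import hashlib

def git_blob_hash(content: bytes) -> str:
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_hash(b"hello world\n"))
# -> 3b18e512dba79e4c8300dd08aeb37f8e728b8dad (the same on every machine)
```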

Footnote: I’ve related this to MongoDB as that is the one NoSQL database I hold accreditations in, having completed two excellent online courses with them (while I’m typically a senior manager, I like to dip into new technologies to understand their capabilities – and to act as a bullshit repellent!). Details of said courses here. The same functionality may well be available with other NoSQL databases.