In this demo tutorial, we show how the diff and patch operations can be applied to monitor changes in TerminusDB schemas, TerminusDB documents, JSON schemas, and other document databases such as MongoDB. Save time comparing large JSON documents, and build in data collaboration features with this free open-source tool.
A Little Background on JSON diff and patch
A fundamental tool in Git's strategy for distributed management of source code is the concept of the diff and the patch. These foundational operations are what make Git possible. A diff compares two states of an object and constructs a patch; applying that patch to the earlier state yields the later one.
But what about structured data? Do similar situations arise with structured data that require diff and patch operations? Sure they do.
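To make the idea concrete for structured data, here is a minimal sketch in plain Python. It is not TerminusDB's implementation: the function names `diff` and `apply_patch` are hypothetical, and the sketch assumes both documents have exactly the same keys. The operation shape (`@op`, `@before`, `@after`) mirrors the fields used by the patches later in this tutorial.

```python
def diff(before, after):
    """Build a patch that turns `before` into `after`.

    Assumption: both documents share the same keys. Changed leaf values
    become SwapValue operations; nested dicts are compared recursively.
    """
    patch = {}
    for key in before:
        old, new = before[key], after[key]
        if isinstance(old, dict) and isinstance(new, dict):
            sub_patch = diff(old, new)
            if sub_patch:  # keep only sub-documents that actually changed
                patch[key] = sub_patch
        elif old != new:
            patch[key] = {"@op": "SwapValue", "@before": old, "@after": new}
    return patch


def apply_patch(document, patch):
    """Return a patched copy of `document`; the input is left untouched."""
    result = dict(document)
    for key, operation in patch.items():
        if "@op" in operation:
            result[key] = operation["@after"]
        else:
            # No "@op" key means this is a nested patch for a sub-document.
            result[key] = apply_patch(document[key], operation)
    return result


old_doc = {"name": "Jane", "address": {"city": "Dublin", "country": "Ireland"}}
new_doc = {"name": "Janine", "address": {"city": "Cork", "country": "Ireland"}}

example_patch = diff(old_doc, new_doc)
patched_doc = apply_patch(old_doc, example_patch)
```

The patch records only what changed (`name` and `address.city`), so it stays small even when the documents are large.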
In applications, when two or more people are updating the same object, such as a product record in an online store, this sort of curation operation is often handled with a lock on the object, which means only one person can win. Locks are a massive source of pain, not only because they prevent otherwise perfectly reasonable concurrent operations, but also because you risk stale locks and then have to figure out when to release them.
When more than one person is working on a dataset, conflicts are common. Without an adequate workflow and conflict handling, someone's change often gets squashed and the data starts to become inaccurate. In the long run, this causes all sorts of issues with reporting, customer service, and business intelligence. This is where diff and patch come in: users can see the before and after state each time they submit changes to the database. Any conflicts can be flagged for human review, which helps keep the data accurate in the long run. Better data, better decisions.
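One way to picture conflict flagging is to check each operation's recorded before-value against the current state before applying it. The sketch below is an illustration under stated assumptions, not how TerminusDB resolves conflicts: `PatchConflict` and `apply_checked` are hypothetical names, and patches are assumed to be flat mappings of SwapValue operations.

```python
class PatchConflict(Exception):
    """Raised when a document no longer matches a patch's recorded before-state."""


def apply_checked(current, patch):
    """Apply flat SwapValue operations, refusing the patch if any is stale."""
    conflicts = {
        key: {"expected": operation["@before"], "found": current.get(key)}
        for key, operation in patch.items()
        if current.get(key) != operation["@before"]
    }
    if conflicts:
        # Nothing is applied; the conflict details can go to a human reviewer.
        raise PatchConflict(conflicts)
    updated = dict(current)
    for key, operation in patch.items():
        updated[key] = operation["@after"]
    return updated


def swap(before, after):
    """Helper that builds one SwapValue operation."""
    return {"@op": "SwapValue", "@before": before, "@after": after}


# Three editors all start from the same stored record.
stored = {"item_name": "Blender", "price": 340, "max_discount": "10%"}
alice_patch = {"price": swap(340, 450)}
bob_patch = {"price": swap(340, 400)}
carol_patch = {"max_discount": swap("10%", "50%")}

stored = apply_checked(stored, alice_patch)  # applies cleanly
stored = apply_checked(stored, carol_patch)  # different field, no lock needed
try:
    stored = apply_checked(stored, bob_patch)  # price is no longer 340
except PatchConflict as conflict:
    flagged = conflict.args[0]
```

Non-overlapping edits go through without a lock, and only the genuinely conflicting change is held back for review.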
Using Diff and Patch with TerminusDB Python
Prerequisites
You will need to install the TerminusDB Python client; see the installation instructions here.
Ensure you have the TerminusDB Docker container running on localhost.
In this script, we demonstrate how diff gives you a Patch object back and how, with that object, you can apply patch to modify an object. We show this for TerminusDB schema, TerminusDB documents, and JSON schema.
In TerminusDB, documents and schemas are represented in JSON-LD format. With diff and patch, we can easily compare any documents and schemas to see what has been changed.
Let us look at a document as a Python object:
class Person(DocumentTemplate):
    name: str
    age: int

jane = Person(name="Jane", age=18)
janine = Person(name="Janine", age=18)
You can directly apply a diff to get a patch object:
result_patch = client.diff(jane, janine)
pprint(result_patch.content)
With the patch object (result_patch here), you can either review its content or apply it to an object to get an after object back.
after_patch = client.patch(jane, result_patch)
pprint(after_patch)
assert after_patch == janine._obj_to_dict()
As you can see, the after_patch object (document) is the same as janine. You can put this document back in the database using replace_document to commit this change.
Diff and patch also work with JSON-LD documents:
jane = { "@id" : "Person/Jane", "@type" : "Person", "name" : "Jane"}
janine = { "@id" : "Person/Jane", "@type" : "Person", "name" : "Janine"}

result_patch = client.diff(jane, janine)
pprint(result_patch.content)
It is not limited to JSON-LD documents either; it also works with schemas:
class Company(DocumentTemplate):
    name: str
    director: Person

schema1 = WOQLSchema()
schema1.add_obj("Person", Person)
schema2 = WOQLSchema()
schema2.add_obj("Person", Person)
schema2.add_obj("Company", Company)

result_patch = client.diff(schema1, schema2)
pprint(result_patch.content)
Note that diff and patch will work on most JSON formats.
Another application is comparing two JSON schemas:
schema1 = {
    "type": "object",
    "properties": {
        "name": { "type": "string" },
        "birthday": { "type": "string", "format": "date" },
        "address": { "type": "string" },
    }
}

schema2 = {
    "type": "object",
    "properties": {
        "first_name": { "type": "string" },
        "last_name": { "type": "string" },
        "birthday": { "type": "string", "format": "date" },
        "address": {
            "type": "object",
            "properties": {
                "street_address": { "type": "string" },
                "city": { "type": "string" },
                "state": { "type": "string" },
                "country": { "type" : "string" }
            }
        }
    }
}

result_patch = client.diff(schema1, schema2)
pprint(result_patch.content)
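Before reading a full patch for two schemas like these, it can help to get a quick summary of which top-level properties were added, removed, or changed. The following is a local sketch using only the standard library; `summarize_properties` is a hypothetical helper, not part of the TerminusDB client, and it inspects only the top-level `properties` mapping.

```python
def summarize_properties(old_schema, new_schema):
    """Classify top-level JSON schema properties as added, removed, or changed."""
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    shared = old_props.keys() & new_props.keys()
    return {
        "added": sorted(new_props.keys() - old_props.keys()),
        "removed": sorted(old_props.keys() - new_props.keys()),
        "changed": sorted(key for key in shared if old_props[key] != new_props[key]),
    }


# Compact copies of the two schemas compared above.
old_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "birthday": {"type": "string", "format": "date"},
        "address": {"type": "string"},
    },
}
new_schema = {
    "type": "object",
    "properties": {
        "first_name": {"type": "string"},
        "last_name": {"type": "string"},
        "birthday": {"type": "string", "format": "date"},
        "address": {
            "type": "object",
            "properties": {
                "street_address": {"type": "string"},
                "city": {"type": "string"},
                "state": {"type": "string"},
                "country": {"type": "string"},
            },
        },
    },
}

summary = summarize_properties(old_schema, new_schema)
```

Here `name` is split into `first_name` and `last_name`, `address` changes from a string to an object, and `birthday` is untouched, so it does not appear in the summary.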
See the full script here
Using Diff and Patch with MongoDB
In this script, we demonstrate how diff and patch can be used in your MongoDB workflow. The first part of the script is the MongoDB tutorial on how to use Pymongo and in the second part, we demonstrate the extra step to review the changes before applying a patch to your MongoDB collection.
As we saw in the last section, diff and patch can be applied to any JSON format. Since MongoDB also describes its data as JSON-like documents, we can use diff and patch in much the same way.
Here we use the tutorial for Pymongo as an example:
import datetime as dt
import os

from pymongo import MongoClient

client = MongoClient(os.environ["MONGO_CONNECTION_STRING"])
connection = client['user_shopping_list']
collection_name = connection["user_1_items"]

item_1 = {
    "_id" : "U1IT00001",
    "item_name" : "Blender",
    "max_discount" : "10%",
    "batch_number" : "RR450020FRG",
    "price" : 340,
    "category" : "kitchen appliance"
}

item_2 = {
    "_id" : "U1IT00002",
    "item_name" : "Egg",
    "category" : "food",
    "quantity" : 12,
    "price" : 36,
    "item_description" : "brown country eggs"
}
collection_name.insert_many([item_1,item_2])

expiry_date = '2021-07-13T00:00:00.000'
expiry = dt.datetime.fromisoformat(expiry_date)
item_3 = {
    "item_name" : "Bread",
    "quantity" : 2,
    "ingredients" : "all-purpose flour",
    "expiry_date" : expiry
}
collection_name.insert_one(item_3)
Imagine we want to change item_1:
new_item_1 = {
    "_id" : "U1IT00001",
    "item_name" : "Blender",
    "max_discount" : "50%",
    "batch_number" : "RR450020FRG",
    "price" : 450,
    "category" : "kitchen appliance"
}
We can compare the old and new item 1 with diff and patch:
tbd_endpoint = WOQLClient("http://localhost:6363/")
item_1 = collection_name.find_one({"item_name" : "Blender"})
patch = tbd_endpoint.diff(item_1, new_item_1)
pprint(patch.content)
Again, we can review the patch before making the change in MongoDB:
collection_name.update_one(patch.before, {"$set": patch.update})
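To see what that call receives, the sketch below derives a filter and a `$set` document from a flat SwapValue patch. It is an assumption-laden illustration: `before_and_update` is a hypothetical helper, the patch content is what a diff of the two Blender documents would plausibly look like, and the client's own `before` and `update` properties may be implemented differently.

```python
def before_and_update(patch_content):
    """Split flat SwapValue operations into old values and new values."""
    before, update = {}, {}
    for key, operation in patch_content.items():
        if isinstance(operation, dict) and operation.get("@op") == "SwapValue":
            before[key] = operation["@before"]
            update[key] = operation["@after"]
    return before, update


# Assumed patch content for the Blender change (discount and price only).
patch_content = {
    "max_discount": {"@op": "SwapValue", "@before": "10%", "@after": "50%"},
    "price": {"@op": "SwapValue", "@before": 340, "@after": 450},
}

query_filter, new_values = before_and_update(patch_content)
update_document = {"$set": new_values}
```

Filtering on the old values acts as a guard: if someone else has already changed the price or discount, the filter matches nothing and the update is not applied. With Pymongo, the `matched_count` attribute of the result returned by `update_one` shows whether a document was found.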
This is another more complicated example:
expiry_date = '2021-07-15T00:00:00.000'
expiry = dt.datetime.fromisoformat(expiry_date)
new_item_3 = {
    "item_name" : "Bread",
    "quantity" : 5,
    "ingredients" : "all-purpose flour",
    "expiry_date" : expiry
}

item_3 = collection_name.find_one({"item_name" : "Bread"})
item_id = item_3.pop('_id')
patch = tbd_endpoint.diff(item_3, new_item_3)
pprint(patch.content)

before = patch.before
before['_id'] = item_id
collection_name.update_one(before, {"$set": patch.update})
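The example above removes `_id` before diffing, because `new_item_3` carries no `_id`, and then puts it back on the filter so the update targets exactly that document. A small helper can do the same without mutating the fetched document. This is only a sketch: `split_id` is a hypothetical name, and the identifier and `before` values are placeholders standing in for what MongoDB and the patch would supply.

```python
def split_id(document):
    """Return (_id, copy of the document without _id); the input is not modified."""
    body = {key: value for key, value in document.items() if key != "_id"}
    return document.get("_id"), body


# Placeholder for a document fetched with find_one.
stored = {"_id": "hypothetical-id-123", "item_name": "Bread", "quantity": 2}
item_id, body = split_id(stored)

# Placeholder for patch.before after diffing `body` against the edited document.
before = {"quantity": 2}

# Re-attach the identifier so the filter pins down a single document.
query_filter = {**before, "_id": item_id}
```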
See the full script here.
Using Diff and Patch with MongoDB JavaScript
Just like the last section, diff and patch can be used to compare documents and schemas to see what has been changed using the JavaScript client.
In this script, we will demonstrate it.
We created a function called mongoPatch:
const mongoPatch = function(patch){
    let query = {};
    let set = {};

    if('object' === typeof patch){
        for(var key in patch){
            const entry = patch[key];

            if( entry['@op'] == 'SwapValue'){
                // A changed value: match on the old value, set the new one
                query[key] = entry['@before'];
                set[key] = entry['@after'];
            }else if(key === '_id'){
                query[key] = ObjectId(entry);
            }else{
                let [sub_query,sub_set] = mongoPatch(entry);
                query[key] = sub_query;
                // Only record nested changes when the sub-patch produced some
                if(sub_set !== null){
                    set[key] = sub_set;
                }
            }
        }
        return [query,set]
    }else{
        return [patch,null]
    }
}
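For readers following along in Python, the same traversal can be sketched as below. This is a variation rather than a line-for-line port: `mongo_patch` is a hypothetical name, nested changes are flattened into MongoDB dot-notation paths so that a `$set` touches only the changed fields, and the `_id` value is passed through unchanged instead of being converted to an ObjectId.

```python
def mongo_patch(patch, prefix=""):
    """Turn a nested SwapValue patch into a (filter, new_values) pair.

    Keys of nested documents are joined with dots, e.g. "address.city".
    """
    query, new_values = {}, {}
    for key, entry in patch.items():
        path = f"{prefix}{key}"
        if isinstance(entry, dict) and entry.get("@op") == "SwapValue":
            query[path] = entry["@before"]
            new_values[path] = entry["@after"]
        elif isinstance(entry, dict):
            # Nested sub-patch: recurse and merge its dotted paths.
            sub_query, sub_values = mongo_patch(entry, prefix=f"{path}.")
            query.update(sub_query)
            new_values.update(sub_values)
        else:
            # Plain value such as an identifier: use it in the filter only.
            query[path] = entry
    return query, new_values


nested_patch = {
    "_id": "hypothetical-id-123",
    "name": {"@op": "SwapValue", "@before": "Jane", "@after": "Janine"},
    "address": {
        "city": {"@op": "SwapValue", "@before": "Dublin", "@after": "Cork"},
    },
}

query, new_values = mongo_patch(nested_patch)
```

The resulting pair can be passed to an update as the filter and the body of `$set`.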
We created an object that we can put back to update the data in MongoDB:
let patchPromise = client.getDiff(jane,janine,{});
patchPromise.then( async patch => {
    let [q,s] = mongoPatch(patch)
    console.log([q,s]);

    // await is needed if updateOne returns a promise, and harmless if it does not
    const res = await db.inventory.updateOne(q, { $set : s});
    console.log(res);
    if (res.modifiedCount == 1){
        console.log("yay!")
    }else{
        console.log("boo!")
    }
    console.log(patch);
});
See the full script here.
We hope you found this tutorial useful.
History
- 2nd March, 2022: Initial version