We are using lots of distributed systems today, the systems running on a group of nodes, some systems even setup machines around the world. The cost is high, paying power and networks charges every hour. We knew how to spend money, in this article, we will discuss a lightweight globally distributed architecture. It is low cost, more importantly, it is movable, you don't need to rent a room to place servers.
Background
Think about a small company, having only three employees, they have big ambition, and run a global business, so they assigned employees to New York, London, and Hong Kong with three smart phones, new phones output PC screen when it is connected with TV and keyboard. This company just started, employees don't come to office every day, they are more often in airports, highways, coffee houses, beaches, or at a village talking business. Often they don't have a stable Internet, but they need the numbers all the time, where on Earth we should choose to place or buy the servers?
Their smart phones are the most trusted place to store data. Can the phone be a part of the software system?
Charge Free, Globally Distributed Architecture
What you are seeing is a Phone, an Email and a World. In the rest of this article, we will discuss how to use software to connect them as a whole. We will use two components, NoSQL database iBoxDB to store data, Email client MailKit to send data, and use Xamarin Forms to build App UI.
Zoom in this architecture, will see lots of boxes.
Data will be packed as Boxes, and use Email Servers that are already deployed around the world to deliver. The qualities of Email servers are variant, delivery order and time are not guaranteed. How to make sure we can unpack back the Data.
Message's ID
Since Internet appeared, nodes have communicated with each other, ID design became as important as Font design. We had many solutions, here introduce a solution, called "Dependent ID
", it means if you want to process this message, first you need to process the message, if the message got lost, ask to send it again.
As the picture above, the traditional point to point messages have a ordered ID, easy to know which got lost. distributed messages also have ordered ID, but don't have a ID Server to create IDs for all Nodes, not easy to figure out which ID had lost, when we added a dependent-id, thing is getting easier.
Implement
To implement this architecture, we need to register four email addresses from four Email Service Providers located in four cities around the world. For simplicity, we install an Email Server locally, and create four domains to represents four locations.
Stand-Alone Mobile App
Before getting distributed, first we build a stand-alone APP. Start a Xamarin Forms solution, and add two components to the Android project.
Install-Package iBoxDB
Install-Package MailKit
Xamarin Forms use lots of binding techniques, we follow the convention, add a Box
property and a PropertyChanged
event. In iBoxDB, data are encapsulated in Boxes.
public class BoxSpace : INotifyPropertyChanged
{
Box box = null;
public Box Box
{
get { return box; }
set
{
box?.Dispose();
box = value;
PropertyChanged?.Invoke(this, new PropertyChangedEventArgs("Box"));
}
}
public event PropertyChangedEventHandler PropertyChanged;
}
When the Box
property is changed, it triggers the Collection
to reload records from database.
public class ObservableTableCollection<T> : ObservableCollection<T> where T : class, new()
{
BoxSpace boxSpace;
string tableName;
public ObservableTableCollection(BoxSpace boxSpace, string tableName)
{
Init();
boxSpace.PropertyChanged += BoxSpace_PropertyChanged;
}
private void BoxSpace_PropertyChanged
(object sender, System.ComponentModel.PropertyChangedEventArgs e)
{
Init();
}
private IEnumerator<T> objects;
private void Init()
{
Clear();
objects = null;
objects = boxSpace.Box?.Select<T>("from " + tableName).GetEnumerator();
}
}
And this Collection
is binding with UI:
public ObservableTableCollection<Item> GetItemTable()
{
return new ObservableTableCollection<Item>(BoxSpace, "Item");
}
public partial class ItemsPage : ContentPage
{
public ObservableTableCollection<Item> Items { get; set; }
public ItemsPage()
{
InitializeComponent();
Items = DataStore.Current.GetItemTable();
BindingContext = this;
}
}
when we want to update the UI, we just update the Box
property:
public void Update()
{
BoxSpace.Box = Auto.Cube();
}
We don't want to load all data to UI at once. Add a limitation to LoadNext
() method, load 10 records each time.
public int LoadNext()
{
...
int startIndex = Count;
List<T> list = new List<T>();
while (objects.MoveNext())
{
Items.Add(objects.Current);
list.Add(objects.Current);
if (list.Count >= 10)
{
break;
}
}
if (list.Count > 0)
OnCollectionChanged(new NotifyCollectionChangedEventArgs
(NotifyCollectionChangedAction.Add, list, startIndex));
}
This method will be called on the UI appearing or scrolling to the end of List
.
void OnAppearing(object sender, ItemVisibilityEventArgs e)
{
if (Items.Count == 0)
Items.LoadNext();
}
void OnItemAppearing(object sender, ItemVisibilityEventArgs e)
{
if (Items != null && e.Item == Items[Items.Count - 1])
{
Items.LoadNext();
}
}
Main Database Definition
In this article, we store Item's data only, create a database table for it. Here are the definitions:
DB db = new DB(...);
db.GetConfig().EnsureTable<Item>("Item", "Id", "DatabaseId");
db.GetConfig().EnsureTable<Confirmed>("Confirmed", "DatabaseId");
Auto = db.Open();
For stand-alone APP, using ID as the Key of database is enough, but we are ready for distributing, it will have many databases in the System, the Key consists of Id and DatabaseId
and what the Confirmed
table is doing?
For distributed Application, network may become unstable, like in a forest without GPS, the simple solution to make sure which Path we had walked through is to make a mark. We will use Confirmed table later. Take a look at the classes first.
public class GlobalObject
{
public long Id { get; set; }
public Guid DatabaseId { get; set; }
public DateTime Time { get; set; } = DateTime.UtcNow;
}
public class Item : GlobalObject
{
public string Text { get; set; }
public string Description { get; set; }
public decimal Price { get; set; }
}
public class Confirmed : GlobalObject
{
}
Now, we can save Item
's data to database.
public bool AddItem(Item item)
{
using (var box = Auto.Cube())
{
item.Id = Tick.GetNextTick();
item.DatabaseId = DatabaseId();
box["Item"].Insert(item);
CommitResult cr = box.Commit();
Update();
return cr == CommitResult.OK;
}
}
DatabaseId
is a Guid
(Globally Unique Identifier), it was created when creating a new database. Main database's Table Id is generated from Time
, not a sequenceId
. The generator's details in SourceCode
.
Start the APP.
What Email Service Can Do
Everyone has a stand-alone APP now, recording prices around the world. how they know prices from each other, sending email by hand. Recall how we store data:
using (var box = Auto.Cube())
{
box["Item"].Insert(item);
CommitResult cr = box.Commit();
}
We put Item
into a box, can we ship boxes out to other places by software. The answer is Yes, that is what this article is trying to do.
Email service is 3rd part service, the good news is that they are standardized. It is easy to switch from one provider to another provider, you can choose the service which has the best quality from your location without changing any code. We use several email's addresses to build a network to deliver Boxes.
DataStore.Current.AccountSpace.Account = new Account
{
Network = new string[] { "andy@newyork.net", "kelly@london.net",
"kevin@hongkong.net", "backup@backup.net" },
}
public static void SendEmail(MimeMessage message)
{
var account = DataStore.Current.AccountSpace.Account;
using (var client = new SmtpClient())
{
...
message.From.Add(new MailboxAddress(account.UserName, account.UserName));
foreach (var toName in account.Network)
{
message.To.Add(new MailboxAddress(toName, toName));
}
client.Send(message);
}
}
As the code above, we send email to all addresses. In this article, we use email to deliver messages only, messages are filtered inside APP through two simple rules, is myself or others.
if (log.DatabaseId == DBVar.DatabaseId)
{
...
}
else
{
...
}
The next question is where to collect the Box
?
Log Database Design
When the Box
is committing to database, it will trigger a OnReceived()
method, but we can't send the Box
out directly, because we are using 3rd service and don't know if Internet is stable. First, we put Box
es to the database's log, and wait to connect with email server. In this article, we store logs in another iBoxDB
database, to distinguish from the Main Database, we call it Log Database.
Following is the definition of Log Database:
public AutoBox Auto { get; private set; }
public Variable DBVar { get; private set; }
public DataStoreSync(long logAddr)
{
DB db = new DB(logAddr);
db.GetConfig().EnsureTable<Log>("Log", "Id", "DatabaseId");
db.GetConfig().EnsureTable<Log>("RemoteLog", "Id", "DatabaseId");
db.GetConfig().EnsureTable<Log>("WaitingSendLog", "Id", "DatabaseId");
db.GetConfig().EnsureTable<Confirmed>("Confirmed", "DatabaseId");
db.GetConfig().EnsureTable<Variable>("DBVar", "Id");
Auto = db.Open();
using (var box = Auto.Cube())
{
if (box["DBVar", 0L].Select<Object>() == null)
{
box["DBVar"].Insert(new Variable
{
Id = 0L,
DatabaseId = Guid.NewGuid(),
SentId = 0,
ReceivedId = 0,
});
box.Commit().Assert();
}
}
using (var box = Auto.Cube())
{
DBVar = box["DBVar", 0L].Select<Variable>();
}
}
The Log
Table stores Box
es collected at committing. RemoteLog
Table stores Box
es downloaded from remote APP through Email Service. WaitingSendLog
Table stores the Box
es lost on the way, waiting to re-send.
RemoteLog
Table and WaitingSendLog
Table are Buffers, and will be cleaned after they get processed. We set Variable.ReceivedId
equal to zero, the APP downloads email from beginning, you can set another value if the email account is not empty.
Logging
All boxes are logged in a Log
format.
class GlobalObject {
long Id { get; set; }
Guid DatabaseId { get; set; }
}
public class Log : GlobalObject
{
public const byte IncGroup = 1;
public long DependentId;
public Guid DependentDatabaseId;
public Guid GlobalId;
public MemoryStream MBox;
}
The Id in Log is a sequence number. Logs have two dependent Logs, the explicit one has written on two fields, DependentId and DependentDatabaseId
. the implicit one is the previous Log, the Id minus 1
.
Each Log
always has a previous Log
, but only the Log
triggered by remote Box
has explicit Dependent Log
. Following is how we write Box
in Log
.
public void OnReceived(Socket socket, BoxData outBox, bool normal)
{
if (socket.Actions == 0) { return; }
using (var box = Auto.Cube())
{
if (!normal)
{
foreach (var lastLog in box.Select<Log>("from Log limit 0,1"))
{
if (socket.ID.Equals(lastLog.GlobalId))
{
return;
}
}
}
Log log = new Log()
{
Id = box.NewId(Log.IncGroup, 1),
DatabaseId = DBVar.DatabaseId,
GlobalId = socket.ID
};
if (socket.DestAddress != FromRemoteAddress)
{
log.DependentId = 0;
log.DependentDatabaseId = Guid.Empty;
log.MBox = new MemoryStream(outBox.ToBytes());
}
else
{
Confirmed confirmed = DB.To<Confirmed>(new MemoryStream(socket.Tag));
log.DependentId = confirmed.Id;
log.DependentDatabaseId = confirmed.DatabaseId;
log.MBox = null;
}
box["Log"].Insert(log);
box.Commit().Assert();
}
}
Socket.ID
is a unique Id for each box, first we check if this Box
had logged by using the socket.ID
. What is the "FromRemoteAddress
" ? Recall again how we store data in box.
using (var box = Auto.Cube()) {
box["Item"].Insert(item);
CommitResult cr = box.Commit();
}
The parameter of Cube()
method is empty, it means from local user in this article. What about from remote user.
var confirmed = new Confirmed()
{
DatabaseId = log.DatabaseId,
Id = log.Id
};
if (log.MBox != null)
{
using (var mainBox =
DataStore.Current.Auto.GetDatabase()
.Cube(FromRemoteAddress, DB.From(confirmed).ToArray()))
{
var lastConfirmedId = mainBox["Confirmed", log.DatabaseId].Select<Confirmed>()?.Id;
if (lastConfirmedId == null || lastConfirmedId < log.Id)
{
BoxReplication.MasterReplicate(mainBox, new BoxData(log.MBox.ToArray()));
mainBox["Confirmed"].Replace(confirmed);
mainBox.Commit().Assert();
}
}
}
box["Confirmed"].Replace(confirmed);
box.Commit().Assert();
It uses FromRemoteAddress
as parameter, and after Replicated remote data to local by using BoxReplication.MasterReplicate
, we update the Confirmed
Table to record this Log from the remote database had processed. The Confirmed Table looks like below:
databaseId(Primary Key) | Id(Had Confirmed) |
Guid(5F854E71-50B8-490A-BA24-0499A7D355D4) | 8 |
Guid(70403B66-A463-45FE-A692-F4CD2839E501) | 1 |
Guid(F7F2464F-8D07-4EF0-9A9E-07063672E691) | 21 |
Before entering data replication, we have some dependencies checking
if (box["Confirmed", log.DatabaseId].Select<Confirmed>()?.Id >= log.Id)
{
return true;
}
if (log.Id == 1L || box["Confirmed",
log.DatabaseId].Select<Confirmed>()?.Id >= (log.Id - 1L))
if (log.DependentDatabaseId == Guid.Empty ||
log.DependentDatabaseId == DBVar.DatabaseId ||
log.DependentId == 0L || box["Confirmed",
log.DependentDatabaseId].Select<Confirmed>()?.Id >= log.DependentId)
{
}
}
We use a Loop to scan RemoteLog
Buffer until no more Log
can be processed, it makes the messages' received order
become not important.
while (count != logs.Count)
{
count = logs.Count;
foreach (var remoteLog in logs)
{
bool success = ProcessRemoteLog(remoteLog);
if (success)
{
result++;
Auto.Delete("RemoteLog", remoteLog.Id, remoteLog.DatabaseId);
}
}
logs = Auto.Select<Log>("from RemoteLog order by Id");
}
Pack Logs in one Message
After the operations above, we have many logs, instead of sending it one by one, we pack logs together by using email's attachments.
var message = new MimeMessage();
message.Subject = CreateLogEmailTitle();
var builder = new BodyBuilder();
foreach (var log in all)
{
string fileName = $"{++num}_{log.Id}_{log.DatabaseId}.bin";
fileName = (log.DatabaseId == DBVar.DatabaseId ? "LOG_" : "ASK_") + fileName;
builder.Attachments.Add(fileName, DB.From(log),
ContentType.Parse("application/octet-stream"));
}
message.Body = builder.ToMessageBody();
EmailService.SendEmail(message);
The rest of SourceCode are about how to deal with lost. Today email services don't lose message, this code maybe never be executed. you can read from SourceCode.
Now you can synchronize APPs around the world.
Lock System
If the Application is about sharing information, no two users edit/delete the same article at the same time, we don't need a Lock System. If you want to add one, a simple solution is to send a Lock
message to all nodes, waiting all nodes accepted. The Lock
message can ask to lock all system or several ID(s) only.
Improvement
We didn't use the backup@backup.net address in this demo. That was designed for when a node is shutdown or the old logs got cleaned, we can copy backup data to new node.
Summary
In this article, we used iBoxDB to store data, and recycle Box to another APP through Email services. This architecture is simple and clean, we just added several lines of code, then a stand-alone application becomes a distributed application.
We used global email system as the global message delivery system, because it is free and thousands upon thousands of servers supporting this huge system. Actually, you have many choices, any global service that can send and receive message is an option, including SMS, etc.
If you're preparing a big cool system, before paying millions for servers and networks, maybe you should try this architecture as a homework. Some things you need to know, if each piece is 99.999% available, when you added up all, the answer is not 1000%, hardware problems are hard to debug, there is no 100% reliable device. Time makes devices weaker, we can take times to make system stronger.
References
History
- Version 1.0
- Version 2.0
- Version 2.0.1
- Version 3.0
- Pack
Logs
- Use
Socket.Tag
to pass Object