Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

Movable Globally Distributed System with Database

0.00/5 (No votes)
7 Nov 2017 1  
A distributed system architecture, mobile devices around the world, free charge to construct your global application
We are using lots of distributed systems today, the systems running on a group of nodes, some systems even setup machines around the world. The cost is high, paying power and networks charges every hour. We knew how to spend money, in this article, we will discuss a lightweight globally distributed architecture. It is low cost, more importantly, it is movable,  you don't need to rent a room to place servers.

Background

Think about a small company, having only three employees, they have big ambition, and run a global business, so they assigned employees to New York, London, and Hong Kong with three smart phones, new phones output PC screen when it is connected with TV and keyboard. This company just started, employees don't come to office every day, they are more often in airports, highways, coffee houses, beaches, or at a village talking business. Often they don't have a stable Internet, but they need the numbers all the time, where on Earth we should choose to place or buy the servers?

Their smart phones are the most trusted place to store data. Can the phone be a part of the software system?

Charge Free, Globally Distributed Architecture

What you are seeing is a Phone, an Email and a World. In the rest of this article, we will discuss how to use software to connect them as a whole. We will use two components, NoSQL database iBoxDB to store data, Email client MailKit to send data, and use Xamarin Forms to build App UI.

Zoom in this architecture, will see lots of boxes.

Data will be packed as Boxes, and use Email Servers that are already deployed around the world to deliver. The qualities of Email servers are variant, delivery order and time are not guaranteed. How to make sure we can unpack back the Data.

Message's ID

Since Internet appeared, nodes have communicated with each other, ID design became as important as Font design. We had many solutions, here introduce a solution, called "Dependent ID", it means if you want to process this message, first you need to process the message, if the message got lost, ask to send it again.

As the picture above, the traditional point to point messages have a ordered ID, easy to know which got lost. distributed messages also have ordered ID, but don't have a ID Server to create IDs for all Nodes, not easy to figure out which ID had lost, when we added a dependent-id, thing is getting easier.

Implement

To implement this architecture, we need to register four email addresses from four Email Service Providers located in four cities around the world. For simplicity, we install an Email Server locally, and create four domains to represents four locations.

Stand-Alone Mobile App

Before getting distributed, first we build a stand-alone APP. Start a Xamarin Forms solution, and add two components to the Android project.

Install-Package iBoxDB
Install-Package MailKit

Xamarin Forms use lots of binding techniques, we follow the convention, add a Box property and a PropertyChanged event. In iBoxDB, data are encapsulated in Boxes.

public class BoxSpace : INotifyPropertyChanged
{
   Box box = null;
   public Box Box
   {
      get { return box; }
      set
      {
         box?.Dispose();
         box = value;
         PropertyChanged?.Invoke(this, new PropertyChangedEventArgs("Box"));
      }
   }

  public event PropertyChangedEventHandler PropertyChanged;
}

When the Box property is changed, it triggers the Collection to reload records from database.

public class ObservableTableCollection<T> : ObservableCollection<T> where T : class, new()
{
    BoxSpace boxSpace;
    string tableName;

    public ObservableTableCollection(BoxSpace boxSpace, string tableName)
    {
        Init();
        boxSpace.PropertyChanged += BoxSpace_PropertyChanged;
    }

    private void BoxSpace_PropertyChanged
    (object sender, System.ComponentModel.PropertyChangedEventArgs e)
    {
       Init();
    }

    private IEnumerator<T> objects;
    private void Init()
    {
       Clear();
       objects = null;
       objects = boxSpace.Box?.Select<T>("from " + tableName).GetEnumerator();
    }
}

And this Collection is binding with UI:

public ObservableTableCollection<Item> GetItemTable()
{
   return new ObservableTableCollection<Item>(BoxSpace, "Item");
}     
public partial class ItemsPage : ContentPage
{
   public ObservableTableCollection<Item> Items { get; set; }

   public ItemsPage()
   {
      InitializeComponent();

      Items = DataStore.Current.GetItemTable(); 
      BindingContext = this;
   }
}

when we want to update the UI, we just update the Box property:

public void Update()
{
   BoxSpace.Box = Auto.Cube();
}

We don't want to load all data to UI at once. Add a limitation to LoadNext() method, load 10 records each time.

public int LoadNext()
{ 
            ...
  int startIndex = Count;
  List<T> list = new List<T>();
  while (objects.MoveNext())
  {
      Items.Add(objects.Current);
      list.Add(objects.Current);
      if (list.Count >= 10)
      {
         break;
      }
  }
  if (list.Count > 0)
    OnCollectionChanged(new NotifyCollectionChangedEventArgs
    (NotifyCollectionChangedAction.Add, list, startIndex));
               
}

This method will be called on the UI appearing or scrolling to the end of List.

void OnAppearing(object sender, ItemVisibilityEventArgs e)
{
   if (Items.Count == 0)
     Items.LoadNext();       
}
void OnItemAppearing(object sender, ItemVisibilityEventArgs e)
{
  if (Items != null && e.Item == Items[Items.Count - 1])
  {
     Items.LoadNext();
  }
}

Main Database Definition

In this article, we store Item's data only, create a database table for it. Here are the definitions:

  DB db = new DB(...);
  db.GetConfig().EnsureTable<Item>("Item", "Id", "DatabaseId");
  db.GetConfig().EnsureTable<Confirmed>("Confirmed", "DatabaseId");

  Auto = db.Open();

For stand-alone APP, using ID as the Key of database is enough, but we are ready for distributing, it will have many databases in the System, the Key consists of Id and DatabaseId and what the Confirmed table is doing?

For distributed Application, network may become unstable, like in a forest without GPS, the simple solution to make sure which Path we had walked through is to make a mark. We will use Confirmed table later. Take a look at the classes first.

public class GlobalObject
{
   public long Id { get; set; }
   public Guid DatabaseId { get; set; }

   public DateTime Time { get; set; } = DateTime.UtcNow;
}

public class Item : GlobalObject
{
   public string Text { get; set; }
   public string Description { get; set; }
   public decimal Price { get; set; }
}   
  
public class Confirmed : GlobalObject
{
}

Now, we can save Item's data to database.

public bool AddItem(Item item)
{
  using (var box = Auto.Cube())
  {
      item.Id = Tick.GetNextTick();
      item.DatabaseId = DatabaseId();
      box["Item"].Insert(item);
      CommitResult cr = box.Commit();
      Update();
      return cr == CommitResult.OK;
  }
}

DatabaseId is a Guid(Globally Unique Identifier), it was created when creating a new database. Main database's Table Id is generated from Time, not a sequenceId. The generator's details in SourceCode.

Start the APP.

What Email Service Can Do

Everyone has a stand-alone APP now, recording prices around the world. how they know prices from each other, sending email by hand. Recall how we store data:

using (var box = Auto.Cube())
{
   box["Item"].Insert(item);
   CommitResult cr = box.Commit();
}

We put Item into a box, can we ship boxes out to other places by software. The answer is Yes, that is what this article is trying to do.

Email service is 3rd part service, the good news is that they are standardized. It is easy to switch from one provider to another provider, you can choose the service which has the best quality from your location without changing any code. We use several email's addresses to build a network to deliver Boxes.

DataStore.Current.AccountSpace.Account = new Account
{ 
   Network = new string[] { "andy@newyork.net", "kelly@london.net", 
                            "kevin@hongkong.net", "backup@backup.net" },
}
public static void SendEmail(MimeMessage message)
{
   var account = DataStore.Current.AccountSpace.Account;

   using (var client = new SmtpClient())
   {
      ...
      message.From.Add(new MailboxAddress(account.UserName, account.UserName));
      foreach (var toName in account.Network)
      {
         message.To.Add(new MailboxAddress(toName, toName));
      }
      client.Send(message);
      
   }
}

As the code above, we send email to all addresses. In this article, we use email to deliver messages only, messages are filtered inside APP through two simple rules, is myself or others.

if (log.DatabaseId == DBVar.DatabaseId)
{ 
   ...
}
else
{ 
  ...
}

The next question is where to collect the Box?

Log Database Design

When the Box is committing to database, it will trigger a OnReceived() method, but we can't send the Box out directly, because we are using 3rd service and don't know if Internet is stable. First, we put Boxes to the database's log, and wait to connect with email server. In this article, we store logs in another iBoxDB database, to distinguish from the Main Database, we call it Log Database.

Following is the definition of Log Database:

public AutoBox Auto { get; private set; }
public Variable DBVar { get; private set; }
public DataStoreSync(long logAddr)
{
   DB db = new DB(logAddr);

   //Key={Id,DatabaseId}, Id before DatabaseId for faster Search.
   db.GetConfig().EnsureTable<Log>("Log", "Id", "DatabaseId");

   //Downloaded from Email Service, delete row after processed
   db.GetConfig().EnsureTable<Log>("RemoteLog", "Id", "DatabaseId");
   //Waiting for sending to others, clear all after synchronized
   db.GetConfig().EnsureTable<Log>("WaitingSendLog", "Id", "DatabaseId");

   //Confirmed Log-Id for all databases in the Network,  count(*)==databases.length
   db.GetConfig().EnsureTable<Confirmed>("Confirmed", "DatabaseId");

   //database's variables, only one record, Id==0L
   db.GetConfig().EnsureTable<Variable>("DBVar", "Id");

   Auto = db.Open();
   using (var box = Auto.Cube())
   {
       if (box["DBVar", 0L].Select<Object>() == null)
       {
           // this Table only have one record, Id==0L.
           box["DBVar"].Insert(new Variable
           {
              Id = 0L,
              DatabaseId = Guid.NewGuid(),//new databaseId
              SentId = 0, //which Log from Log-Table had sent
              ReceivedId = 0, //email sequenceId had downloaded
           });
           box.Commit().Assert();
       }
   }

   using (var box = Auto.Cube())
   {
       DBVar = box["DBVar", 0L].Select<Variable>();
   }
}

The Log Table stores Boxes collected at committing. RemoteLog Table stores Boxes downloaded from remote APP through Email Service. WaitingSendLog Table stores the Boxes lost on the way, waiting to re-send.

RemoteLog Table and WaitingSendLog Table are Buffers, and will be cleaned after they get processed. We set Variable.ReceivedId equal to zero, the APP downloads email from beginning, you can set another value if the email account is not empty.

Logging

All boxes are logged in a Log format.

class GlobalObject { 
  long Id { get; set; }  
  Guid DatabaseId { get; set; } 
}
public class Log : GlobalObject
{
   public const byte IncGroup = 1;

   public long DependentId;
   public Guid DependentDatabaseId;

   public Guid GlobalId;
   public MemoryStream MBox; //Binary Box

}

The Id in Log is a sequence number. Logs have two dependent Logs, the explicit one has written on two fields, DependentId and DependentDatabaseId. the implicit one is the previous Log, the Id minus 1.

Each Log always has a previous Log, but only the Log triggered by remote Box has explicit Dependent Log. Following is how we write Box in Log.

public void OnReceived(Socket socket, BoxData outBox, bool normal)
{
    if (socket.Actions == 0) { return; }
    using (var box = Auto.Cube())
    {
        if (!normal)
        {
            //limit 0,1, only check the last, descending order
            foreach (var lastLog in box.Select<Log>("from Log limit 0,1"))
            {
                //had logged
                if (socket.ID.Equals(lastLog.GlobalId))
                {
                    return;
                }
            }
        }

        Log log = new Log()
        {
            Id = box.NewId(Log.IncGroup, 1),
            DatabaseId = DBVar.DatabaseId,

            GlobalId = socket.ID
        };

        if (socket.DestAddress != FromRemoteAddress)
        {
            //Current user operates, no remote dependency, local dependency is Log.Id-1L
            log.DependentId = 0;
            log.DependentDatabaseId = Guid.Empty;
            log.MBox = new MemoryStream(outBox.ToBytes());
        }
        else
        {
            //Replicates from remote user, set dependency
            Confirmed confirmed = DB.To<Confirmed>(new MemoryStream(socket.Tag));
            log.DependentId = confirmed.Id;
            log.DependentDatabaseId = confirmed.DatabaseId;
            //Current database doesn't store the remote Box's data Again.
            log.MBox = null;
        }
        box["Log"].Insert(log);

        box.Commit().Assert();
    }
}

Socket.ID is a unique Id for each box, first we check if this Box had logged by using the socket.ID. What is the "FromRemoteAddress" ? Recall again how we store data in box.

using (var box = Auto.Cube()) {   
  box["Item"].Insert(item);   
  CommitResult cr = box.Commit(); 
}

The parameter of Cube() method is empty, it means from local user in this article. What about from remote user.

var confirmed = new Confirmed()
{
    DatabaseId = log.DatabaseId,
    Id = log.Id
};

if (log.MBox != null)
{
    using (var mainBox =
        DataStore.Current.Auto.GetDatabase()
        .Cube(FromRemoteAddress, DB.From(confirmed).ToArray()))
    {
        var lastConfirmedId = mainBox["Confirmed", log.DatabaseId].Select<Confirmed>()?.Id;
        if (lastConfirmedId == null || lastConfirmedId < log.Id)
        {
            BoxReplication.MasterReplicate(mainBox, new BoxData(log.MBox.ToArray()));

            //this Replace(confirmed) records which remote Log had replicated to current DB, 
            //also trigger OnReceived() event to log it by using DependentId
            mainBox["Confirmed"].Replace(confirmed);
            mainBox.Commit().Assert();
        }
    }
}
box["Confirmed"].Replace(confirmed);
box.Commit().Assert();

It uses FromRemoteAddress as parameter, and after Replicated remote data to local by using BoxReplication.MasterReplicate, we update the Confirmed Table to record this Log from the remote database had processed. The Confirmed Table looks like below:

databaseId(Primary Key) Id(Had Confirmed)
Guid(5F854E71-50B8-490A-BA24-0499A7D355D4) 8
Guid(70403B66-A463-45FE-A692-F4CD2839E501) 1
Guid(F7F2464F-8D07-4EF0-9A9E-07063672E691) 21

Before entering data replication, we have some dependencies checking

if (box["Confirmed", log.DatabaseId].Select<Confirmed>()?.Id >= log.Id)
{
    return true;
}

if (log.Id == 1L || box["Confirmed", 
    log.DatabaseId].Select<Confirmed>()?.Id >= (log.Id - 1L))
    if (log.DependentDatabaseId == Guid.Empty || 
    log.DependentDatabaseId == DBVar.DatabaseId ||
        log.DependentId == 0L || box["Confirmed", 
        log.DependentDatabaseId].Select<Confirmed>()?.Id >= log.DependentId)
    {
        //Do Relication
    } 
}

We use a Loop to scan RemoteLog Buffer until no more Log can be processed, it makes the messages' received order become not important.

while (count != logs.Count)
{
    count = logs.Count;
    foreach (var remoteLog in logs)
    {
        bool success = ProcessRemoteLog(remoteLog);
        if (success)
        {
            result++;
            Auto.Delete("RemoteLog", remoteLog.Id, remoteLog.DatabaseId);
        }
    }
    logs = Auto.Select<Log>("from RemoteLog order by Id");
}

Pack Logs in one Message

After the operations above, we have many logs, instead of sending it one by one, we pack logs together by using email's attachments.

var message = new MimeMessage();
message.Subject = CreateLogEmailTitle();

var builder = new BodyBuilder();
foreach (var log in all)
{
   string fileName = $"{++num}_{log.Id}_{log.DatabaseId}.bin";
   fileName = (log.DatabaseId == DBVar.DatabaseId ? "LOG_" : "ASK_") + fileName;
   builder.Attachments.Add(fileName, DB.From(log), 
   ContentType.Parse("application/octet-stream"));
}

message.Body = builder.ToMessageBody();
EmailService.SendEmail(message); 

The rest of SourceCode are about how to deal with lost. Today email services don't lose message, this code maybe never be executed. you can read from SourceCode.

Now you can synchronize APPs around the world.

Lock System

If the Application is about sharing information, no two users edit/delete the same article at the same time, we don't need a Lock System. If you want to add one, a simple solution is to send a Lock message to all nodes, waiting all nodes accepted. The Lock message can ask to lock all system or several ID(s) only.

Improvement

We didn't use the backup@backup.net address in this demo. That was designed for when a node is shutdown or the old logs got cleaned, we can copy backup data to new node.

Summary

In this article, we used iBoxDB to store data, and recycle Box to another APP through Email services. This architecture is simple and clean, we just added several lines of code, then a stand-alone application becomes a distributed application.

We used global email system as the global message delivery system, because it is free and thousands upon thousands of servers supporting this huge system. Actually, you have many choices, any global service that can send and receive message is an option, including SMS, etc.

If you're preparing a big cool system, before paying millions for servers and networks, maybe you should try this architecture as a homework. Some things you need to know, if each piece is 99.999% available, when you added up all, the answer is not 1000%, hardware problems are hard to debug, there is no 100% reliable device. Time makes devices weaker, we can take times to make system stronger.

References

History

  • Version 1.0
  • Version 2.0
  • Version 2.0.1
  • Version 3.0
    • Pack Logs
    • Use Socket.Tag to pass Object

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here