I recently saw a question in a forum about SOA (Service Oriented Architecture) and a very "strange" deployment strategy, where each web application had a copy of the same service instead of sharing the service.
This question made me think about the scalability of SOA and about new deployment strategies (or rather, old but rediscovered ones). These are actually two different topics and I could cover them in completely different posts but, well, as I thought about them both at the same time, I am posting about both at the same time too.
Deployment
There are effectively two "opposed" ideas on how to organize DLLs, services and dependencies in general. One is to maximize reuse, so every file/library/service exists only once and is referenced by many different applications; the other is to isolate things, shipping all the needed files with every application, effectively having many copies of the same files when many applications need them.
Having many copies of the same files may look like a terrible idea but this is the strategy that's becoming more and more popular and the big question is: Why?
The short answer is simplicity. To get an idea, these are some of the advantages:
Installation
When installing an application, there's no need to verify whether its dependencies already exist. Simply putting all the files together in a folder works fine.
When uninstalling an application, there's no need to check whether a shared file is still used by other applications (to avoid deleting it), and there's no risk of leaving an unused file behind, forgotten forever in a shared folder. Simply deleting the application's folder, with all its dependencies inside, works fine.
When updating an application and its dependencies, there's no need to worry if the new version of a library has a breaking change that could affect other applications. As long as the application being installed works correctly with this particular library version, everything is fine. Other applications will not use this particular version of the library.
Development
Thinking as the developer of a "shared" resource (a resource that can be used by more than one application, even if it is actually copied), it is easier to refactor a library when we aren't required to keep backwards compatibility. Of course, we must try to avoid excessive changes, as we don't want people to abandon the library because it changes too much, but we can do the required refactoring without having to provide "obsolete" wrappers to keep old code working.
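As a hedged illustration (the class and method names are hypothetical, not from any real library), this is the kind of "obsolete" wrapper a truly shared library is often forced to keep, and which a per-application copy can simply drop:

    // Hypothetical shared library trying to keep backwards compatibility.
    public static class PriceCalculator
    {
        // The new, refactored API.
        public static decimal Calculate(decimal amount, decimal taxRate)
        {
            return amount * (1 + taxRate);
        }

        // The old API, kept only so existing callers don't break.
        [System.Obsolete("Use Calculate(amount, taxRate) instead.")]
        public static decimal Calc(decimal amount)
        {
            return Calculate(amount, 0.20m); // hypothetical old hard-coded tax rate
        }
    }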
Thinking as the developer of an application that uses a "shared" resource, we know for sure that our application will not be affected if another application installs a different version of the same library, as our application will keep using its local copy of the library, not some shared copy that may be changed at any time.
Well, I actually have some more points, like the consequences of keeping obsolete functions in libraries, but I think this is enough to show why having many copies of any dependency may be better than using a really "shared" one.
Yet, there are two main problems with this approach:
- In many cases, we will literally have the same file copied many, many times by different applications. Maybe the file system can detect that and reuse the same content but, by default, we would be wasting space when many applications require the same DLLs or files;
- Security fixes are harder to apply. With the shared solution, it is enough to "apply the fix" and all applications that use the shared library will have the improved security. With the isolated approach, you may have updated 10 applications already and still have other applications referencing the old version of the library. Actually, the time to get the fixes is also increased: it is not enough to have a new version of a DLL published by the DLL's publisher. Every application that uses the DLL must be updated by the application's publisher, which in many cases is a different publisher.
Yet, this second point is again an advantage to the developer of the library/shared resource. If a security fix causes a breaking change (a quite common situation, as most security fixes simply block an action from happening or add an extra parameter to a function call to provide the security information), it becomes the responsibility of the applications' publishers to test and fix their applications accordingly. If an application doesn't work with the fix, it is their choice to keep the application vulnerable, and it is their fault if the application crashes after moving to the new version of the DLL.
If the resource was really shared, it would be the responsibility of the shared resource's author to guarantee that nothing breaks. This often complicates security fixes, because something that could be done with an extra parameter may instead need some (thread-)static data or another kind of work-around to keep the existing APIs untouched.
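A minimal sketch of what I mean, with hypothetical names: the straightforward fix adds a parameter carrying the security information, while the compatibility-preserving fix hides it behind a thread-static field so the existing signature stays untouched.

    using System;

    public class SecurityContext
    {
        public static readonly SecurityContext Anonymous = new SecurityContext();

        public virtual void DemandWriteAccess(string path)
        {
            // A real implementation would check permissions here.
        }
    }

    public static class FileStore
    {
        // The straightforward fix: the security information is an explicit parameter.
        public static void Save(string path, byte[] data, SecurityContext context)
        {
            context.DemandWriteAccess(path);
            System.IO.File.WriteAllBytes(path, data);
        }

        // The backwards-compatible work-around: the old signature stays, and the
        // security information travels through thread-static state instead.
        [ThreadStatic]
        public static SecurityContext CurrentContext;

        public static void Save(string path, byte[] data)
        {
            Save(path, data, CurrentContext ?? SecurityContext.Anonymous);
        }
    }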
Of course, I am not trying to discuss those cases where we can't have separate copies of a given library/resource. I am only trying to show the benefits of using a copy instead of a shared file in those cases where it is possible to do so.
(Web) Services and SOA
Up to this moment, I was focusing more on libraries than on services. In many situations, having a library or a service achieves the same result, aside from some differences in performance and in isolation in case of exceptions. That is, the same way we can have one library copied into many different applications, we can have one service copied into many different (web) applications.
Yet, as I just said, in some cases, we really can't have multiple instances of the same service running in parallel, so this actually becomes the reason to create a service. Maybe we want to share a database between two or more applications. Maybe we simply want to cache data in memory, and having multiple copies of it in memory causes excessive memory consumption and can cause inconsistency if one of the instances updates the data and the others aren't aware of the change. Of course, each one of the cases may have alternative solutions but what I want to say is that some services must be shared and having multiple copies simply won't work.
So, how can we determine if a service can be shared or not?
In my opinion, any service that can be copied per application should exist as a library first. Maybe it must be exposed as a service because web pages need to access its functionality but, if the web application itself (running on the server) requires the functionality, it is better to avoid the overhead of calling a service and call it locally (I will discuss a little more of this later).
Only when the service can't be copied per application should it exist as a real service. But, in that case, it is already known that it must be shared by many applications instead of being copied.
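A hedged sketch of the "library first" idea (the class, route and hosting choice are hypothetical; I am assuming an ASP.NET Core Web API host just as one possible example): the logic lives in a plain class that server-side code can call directly, and a thin controller exposes it as a service only for the clients that really need remote access.

    // Plain library class: server-side code references and calls this directly,
    // with no service overhead.
    public class TaxCalculator
    {
        public decimal Calculate(decimal amount, decimal rate)
        {
            return amount * rate;
        }
    }

    // Thin service wrapper, only for callers (such as web pages) that really
    // need remote access to the same functionality.
    [Microsoft.AspNetCore.Mvc.ApiController]
    [Microsoft.AspNetCore.Mvc.Route("api/tax")]
    public class TaxController : Microsoft.AspNetCore.Mvc.ControllerBase
    {
        private readonly TaxCalculator _calculator = new TaxCalculator();

        [Microsoft.AspNetCore.Mvc.HttpGet]
        public decimal Get(decimal amount, decimal rate)
        {
            return _calculator.Calculate(amount, rate);
        }
    }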
That's Not SOA
I know, saying that we must use libraries whenever possible goes against SOA... or at least against what some people understand SOA to be.
Some interpretations of SOA consider that everything should be a separate service. Forget about different classes. Any two unrelated functionalities must be done by different web services.
That is, your application needs to get the date and time? Create a service to do that.
You want to save a file? Create a service to do that.
When saving the file, the date and time must be obtained? Don't allow the service that saves the file to get the date and time on its own. It must call the other service that's only responsible for getting the date and time.
You want to verify accesses? Well, a different service must do that.
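To make that decomposition concrete, here is a hedged caricature (all interfaces and names are hypothetical) of what this "everything is a service" style ends up looking like:

    // Each trivial responsibility becomes its own remotely-called service contract.
    public interface IDateTimeService
    {
        System.DateTime GetCurrentDateTime();
    }

    public interface IFileSaveService
    {
        void Save(string path, byte[] data);
    }

    public class FileSaveService : IFileSaveService
    {
        private readonly IDateTimeService _dateTimeService;

        public FileSaveService(IDateTimeService dateTimeService)
        {
            _dateTimeService = dateTimeService;
        }

        public void Save(string path, byte[] data)
        {
            // Not allowed to read the clock itself; it must ask the other service.
            var timestamp = _dateTimeService.GetCurrentDateTime();
            System.IO.File.WriteAllBytes(path, data);
            System.IO.File.SetLastWriteTimeUtc(path, timestamp);
        }
    }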
Benefits of Everything as a Different Service
Before starting this topic, remember: I don't think it is a good idea to make everything a different service and I would recommend using libraries more often than services. Yet, I will present some of the points made by those who defend the idea.
The first benefit is isolation (at least when you put each service in its own process). This single benefit can actually be divided into many smaller benefits, like:
- If there's a memory leak in one service, only the process that hosts that service will start to use more and more memory. This actually makes it much easier to identify the source of the problem. If a library was used as part of an application, the single process that included everything will be the source, without any clue about which component is really responsible for the leak;
- Similarly, if unsafe code is used and there's memory corruption, only the service that corrupts memory will have issues. All other services (and processes) will continue to work fine. There's no risk of service 2 crashing because service 1 corrupted its memory;
- Statistics? Well, they will also be collected per service. This makes it easier to know which services consume more CPU or more memory, or read or write to the disk too often, helping identify which ones need to be optimized or will require better hardware to run;
- Resets. If something really crashes and needs to be restarted, only the service that crashed needs to be restarted. All others can continue to run normally;
- Well, there are probably many others, but I think this is enough to get the idea.
The second and third benefits of making everything a service are scalability and workload distribution. As each service can run in a different process and even on a different computer, if it is identified that a server's utilization is reaching its maximum, it is possible to get another computer and at least split the load by putting some services on one computer and the remaining services on the new one. Considering a situation where we have 10 services instead of a single application, this means that we could use up to 10 different computers to split the workload, and this without even talking about farms (where 2 or more servers actually run the same services).
Scalability and Distribution Problem
I can't say anything bad about the benefit of isolation. It really works great.
The problem I see with SOA (at least when everything is a different service) is scalability and the unnecessary distribution of the workload. I know, it looks like I am contradicting myself because I just said that the second benefit is scalability, but that's because I was simply presenting what many people believe. And that's the point I actually disagree with.
The communication between two processes on the same computer is slower than simply calling a function inside the same process. Even worse, the communication between two computers is much slower still. So, when each service is really doing a lot of work, the cost of that communication may not be a problem: it is better to lose some time communicating with another computer and then be free to do something else than to lose that time doing all the work ourselves. But when the communication is for small actions, we may lose more time (and even consume more memory) on the communication than we would by doing the work locally.
I used the example of getting the date and time as a service. If we really need to guarantee that all accesses get the right date and time with microsecond precision (or with the guarantee that we never get the exact same value twice, for example), maybe we really do need a single service answering that. But in most cases, that's not what we need. So, some will say that here's where we have the extra advantage of services: each server may host the same date and time service, so the services could do a local communication instead of a remote one, and any application running outside the servers could communicate with any server to get the date and time. That is, we could have a farm of date and time services.
Yet, in my opinion, we can do even better. If every server can host the date and time service, that means that every other service running on those servers could get the date and time directly (maybe using a shared or copied library) and completely avoid the communication overhead. Notice that getting the date and time directly is much faster than simply sending a message (even to the same computer) to request it. And the truth is that when the service runs on the same computer, that computer still uses its CPU time to do the job anyway, meaning that either the computer spends time only getting the date, or it builds the communication messages and also spends time getting the date. So, having everything as a service doesn't improve scalability in that case. Actually, it does the opposite, using more resources to do the same job, making the action slower and less scalable.
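As a rough, hedged comparison (the service URL below is hypothetical): reading the clock locally is a single in-process call, while asking a service for it means building a request, crossing at least a process boundary, and still reading the same kind of clock on the other side.

    using System;
    using System.Globalization;
    using System.Net.Http;
    using System.Threading.Tasks;

    public static class TimeAccess
    {
        // Local: one in-process call, no messages, no serialization.
        public static DateTime GetLocal()
        {
            return DateTime.UtcNow;
        }

        // Remote: build a request, wait for a response... and the other side
        // still ends up reading its own clock anyway.
        public static async Task<DateTime> GetFromServiceAsync(HttpClient client)
        {
            // The URL is invented purely for this sketch.
            var body = await client.GetStringAsync("http://timeservice.example/api/now");
            return DateTime.Parse(body, CultureInfo.InvariantCulture);
        }
    }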
Analogy?
If you prefer an analogy, I can say that instead of looking at your own watch to know the time, you could ask a colleague near you for the time and simply wait for him to look at his watch and tell you the time (like a local communication), or you could open your e-mail application and send an e-mail to someone asking for the time. In the meantime you can do other things, but the action that needs the time is suspended until you get a message back telling you the time.
So, which of those options is more scalable? And which of those options better "distributes the work"?
Maybe you could say that asking your colleague for the time distributes the work better, but it actually increases your work and also gives some work to your colleague. As humans, we may do that if we want to chat. Computers don't have that need.
Real Case?
You may think that I am going too far talking about a service to get the date and time. Yet, I can say this is actually one of the most common cases. Maybe there is no real service and the database server is used to get the right date. Maybe there is a real service. Maybe it is even worse: there's a service to get the date and time, which in turn queries the database for the date and time.
Think about it. Considering how things are usually encapsulated and how records are often presented on the screen, when a new record is created, a request is made to a service asking for the date and time (first communication). That service then requests the date and time from the database (second communication). At this point we only have an empty record in memory with the date and time filled in. Then the record is edited and saved, doing the third communication.
Now consider importing a big text file, without any screen displaying the data. The importer could very easily fill the same date and time into all records, or execute queries directly; yet, to reuse code, a record instance is created per line. That is, one line of text is read, a record is created (doing the two communications to get the date and time), then the record is filled and saved, doing the third communication. In the end, for every record that would only require one database access to be inserted, there are 3 communications, two of them going to the database.
How well do you think that will scale?
Wouldn't things scale better if the application could get the date and time directly (using a local function) so we could only do one communication per inserted record, instead of three?
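Here is a hedged sketch of that import loop (the record type and data-access interfaces are hypothetical stand-ins), contrasting the three-communication pattern with a version that takes the timestamp locally and does a single database access per record.

    using System;
    using System.IO;

    // Hypothetical record type and data-access stand-ins, just for the sketch.
    public class Record
    {
        public DateTime CreatedAt { get; set; }
        public string Text { get; set; }
    }

    public interface IRecordService        // the service-based path
    {
        Record CreateNewRecord();          // communications 1 and 2 (service + database)
        void Save(Record record);          // communication 3
    }

    public interface IDatabase             // the direct path
    {
        void Insert(Record record);
    }

    public static class Importer
    {
        // Three communications per line.
        public static void ImportViaServices(string path, IRecordService recordService)
        {
            foreach (var line in File.ReadLines(path))
            {
                var record = recordService.CreateNewRecord(); // service asks the database for the time
                record.Text = line;
                recordService.Save(record);
            }
        }

        // One communication per line: the timestamp is taken locally.
        public static void ImportDirectly(string path, IDatabase database)
        {
            var importDate = DateTime.UtcNow;
            foreach (var line in File.ReadLines(path))
            {
                database.Insert(new Record { CreatedAt = importDate, Text = line });
            }
        }
    }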
I am not even going to discuss bulk inserts here. Having only one database access and no communication at all to get the date and time, the database will potentially support roughly double the inserts before reaching its limit, and the network will carry only one third of the traffic compared to the excessively-SOA approach.
Farms
One of the biggest scalability boosts when using SOA is the use of farms. Effectively, two or more computers (in some cases, even thousands of computers) run the same service(s) in parallel. This not only improves performance, it also makes the services more available, as one server may be shut down or restarted while the others continue to serve requests.
It is actually not necessary to use SOA to use a farm. You can simply create a web application and host it on two or more computers, completely ignoring any web service. Yet, considering how the internet evolved, most web applications are now divided into HTML, JavaScript and web services accessed by the JavaScript, so I will keep talking about the services.
As I said before, some services are created because we simply can't run them in parallel. In that case, having a farm will not help (and trying to do so will probably cause all kinds of errors). So, what usually happens?
Usually, there's a single computer used as the database server and the entire farm connects to that single computer. If we consider a case where the work done by the web service is about 10 times bigger than the work done inside the database, that means a farm of up to 10 computers can be served by a single database server. Putting 11 or more computers on the farm will not help (and would probably force the database to keep too many active connections in memory, reducing the benefits of the farm with every computer past the 10th).
The thing here is: How do we measure how much work the services are doing compared to the database? Also, we have problems like:
- The use of stored procedures: It is said that using stored procedures is better than running normal queries against the database because they are kept optimized/prepared inside the database and avoid transmitting too much data to the clients. Yet, doing this means more work done by the database and less work done by the services... and the services are the ones that benefit from the farm;
- Data caching: Many developers still don't understand the cost of communication and don't know how to cache data correctly. If each computer on the farm has a cached copy of the data, when one computer updates the data, all the other caches on the farm will still have the old data. In some cases this is acceptable; in many others, it is not. Communicating with every other computer on the farm to tell it there was a change will completely kill the benefits of the cache. So the "common" solution is that the cache pings the database to see if there are changes to a record (what most consider to be a small request) and, if there are, does a second query to get the data (the query that actually returns the data). If not, the data already present in the cache is used. Well, the error here is that pinging the database to see if there are changes consumes almost the same time as simply getting the data directly from the database: more time is spent sending the query, locating the record and sending the answer message than on the size of the answer. That is, such a strategy brings almost no benefit when the cache is up-to-date, but doubles the communication to the database when it is not (see the sketch after this list);
- Memory locks become useless: When we deal with a single process and many threads, we can use locks to perform what's called an "atomic" operation. When two or more computers may be doing the same job or dealing with the same shared data, we can't use in-memory locks. This means that those cases become either unprotected (the most common case, as the local locks make the unit tests pass without giving any real protection) or the lock must be managed in some common place... usually the database, increasing its load even more.
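Going back to the caching point above, here is a hedged sketch (the data-access interface and types are hypothetical) of the ping-then-fetch pattern versus simply fetching: the version check is itself a full round-trip, so the "optimization" saves almost nothing when the cache is current and doubles the round-trips when it isn't.

    using System.Collections.Generic;

    // Hypothetical data-access abstraction used only for this sketch.
    public interface ICustomerData
    {
        int GetVersion(int id);          // a full round-trip, even though the answer is tiny
        Customer Load(int id);           // another full round-trip
    }

    public class Customer
    {
        public int Id { get; set; }
        public int Version { get; set; }
        public string Name { get; set; }
    }

    public class CustomerCache
    {
        private readonly ICustomerData _data;
        private readonly Dictionary<int, Customer> _cache = new Dictionary<int, Customer>();

        public CustomerCache(ICustomerData data)
        {
            _data = data;
        }

        // "Ping then fetch": one round-trip when the cache is current, two when it is stale.
        public Customer GetWithVersionCheck(int id)
        {
            var currentVersion = _data.GetVersion(id);
            if (_cache.TryGetValue(id, out var cached) && cached.Version == currentVersion)
                return cached;

            var fresh = _data.Load(id);
            _cache[id] = fresh;
            return fresh;
        }

        // Simply fetching is always exactly one round-trip, and the extra payload
        // usually costs less than the extra query.
        public Customer GetDirectly(int id)
        {
            var fresh = _data.Load(id);
            _cache[id] = fresh;
            return fresh;
        }
    }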
Well, I am not saying that there are no ways to make SOA work. There are... and the proof is that we have many big sites working. Yet, simply creating a farm and putting the services there will not solve the issue. Some services can't run in parallel. And the database is usually one of those services.
I am not saying that databases can't benefit from a farm. But usually there are restrictions to doing so: either we have a chance of getting old data if a request goes to the wrong server, or we must force requests coming from an area to always be served by the same computer (which reduces the benefits of the farm if most requests come from the same area), and there are a lot of similar problems.
In the end, before using SOA because "it scales", consider if the particular service really scales or not. Creating new services and using SOA doesn't bring scalability for free.
The End for Now
Well... this is the end of this post. Maybe I will continue talking about SOA problems in another post.
I am sorry if this post lost focus, as I started talking about the deployment of libraries and services and ended up talking about the scalability problems of SOA... but, well, that's what I had in my mind.
CodeProject