Introduction
The rapid spread of Voice over IP (VoIP) technology is easy to predict today. VoIP solutions appear in more and more fields, and one typical use is building a simple Internet telephony program. For these reasons, I decided to build my own VoIP telephone application based on the following knowledge and requirements:
The code should support the latest stable .NET technologies, the C# programming language, and easy obfuscation [3].
The two essential protocols for VoIP calls are SIP [1] and H.323 [2]. Both are capable of establishing audio-visual communication between the participants with the help of other protocols.
I decided to use SIP because it is easy to implement and its communication processes are easier to understand. Moreover, SIP does not inherit anything from the features of the PSTN. Detailed comparisons have also dealt with this topic.
After this design decision, the protocol set was obvious: SIP, SDP, RTP and RTCP. SIP (Session Initiation Protocol) is used for creating sessions between the parties. SDP (Session Description Protocol) is used for describing multimedia communication sessions. RTP (Real-time Transport Protocol) defines the delivery of media data. A call is built up by using SIP first, then SDP, then RTP.
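To make the role of SDP more tangible, here is a minimal sketch that builds the kind of session description carried in the body of a SIP INVITE. The class name SdpSketch and all values passed in (user name, session id, IP address, RTP port) are illustrative assumptions; only the line layout follows RFC 4566 [5].

```csharp
using System;
using System.Text;

static class SdpSketch
{
    // Builds a minimal SDP offer for a single PCMU (G.711 u-law) audio stream.
    // The caller supplies placeholder values; nothing here comes from a real call.
    public static string BuildAudioOffer(string user, long sessionId, string ip, int rtpPort)
    {
        var sb = new StringBuilder();
        sb.Append("v=0\r\n");                                            // protocol version
        sb.Append($"o={user} {sessionId} {sessionId} IN IP4 {ip}\r\n");  // origin
        sb.Append("s=-\r\n");                                            // session name
        sb.Append($"c=IN IP4 {ip}\r\n");                                 // connection address
        sb.Append("t=0 0\r\n");                                          // unbounded session time
        sb.Append($"m=audio {rtpPort} RTP/AVP 0\r\n");                   // audio over RTP, payload type 0
        sb.Append("a=rtpmap:0 PCMU/8000\r\n");                           // payload 0 is PCMU at 8 kHz
        return sb.ToString();
    }

    static void Main()
    {
        // Documentation-range address and an arbitrary even port, for illustration only.
        Console.Write(BuildAudioOffer("alice", 2890844526, "192.0.2.10", 49170));
    }
}
```

The m= line tells the other party where RTP packets should be sent, which is how SIP, SDP and RTP hand over to each other during call setup.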
Experiments
In the first phase of my experiment, I decided to write my own softphone. I started with a minimal implementation of the SIP protocol, then developed a minimal representation of SIP messages (in other words, the SIP headers that are included in an average SIP message, such as Via, Contact, From, To and Call-ID). After that, I successfully established a call at the INVITE level.
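A minimal message representation of this kind could look like the following sketch. It is not the code I wrote at the time; the class name SipRequestSketch, the addresses, tags and branch value are all hypothetical, and only the message layout follows RFC 3261 [4].

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical minimal SIP request: a request line, ordered headers, optional body.
class SipRequestSketch
{
    private readonly string method;
    private readonly string requestUri;
    private readonly List<KeyValuePair<string, string>> headers =
        new List<KeyValuePair<string, string>>();

    public SipRequestSketch(string method, string requestUri)
    {
        this.method = method;
        this.requestUri = requestUri;
    }

    // Adds one header and returns the request so calls can be chained.
    public SipRequestSketch Header(string name, string value)
    {
        headers.Add(new KeyValuePair<string, string>(name, value));
        return this;
    }

    // Serializes the request; Content-Length is computed from the body bytes.
    public string Serialize(string body = "")
    {
        var sb = new StringBuilder();
        sb.Append(method + " " + requestUri + " SIP/2.0\r\n");
        foreach (var h in headers)
            sb.Append(h.Key + ": " + h.Value + "\r\n");
        sb.Append("Content-Length: " + Encoding.UTF8.GetByteCount(body) + "\r\n");
        sb.Append("\r\n");   // empty line separates headers from the body
        sb.Append(body);
        return sb.ToString();
    }

    static void Main()
    {
        // All header values below are made-up examples.
        string invite = new SipRequestSketch("INVITE", "sip:bob@example.com")
            .Header("Via", "SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds")
            .Header("Max-Forwards", "70")
            .Header("To", "<sip:bob@example.com>")
            .Header("From", "<sip:alice@example.com>;tag=1928301774")
            .Header("Call-ID", "a84b4c76e66710@192.0.2.10")
            .Header("CSeq", "314159 INVITE")
            .Header("Contact", "<sip:alice@192.0.2.10:5060>")
            .Serialize();
        Console.Write(invite);
    }
}
```

Such a sketch covers message formatting only. Transactions, retransmissions and dialog state are exactly the parts that make a full SIP stack laborious.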
Up to this point I could progress easily and quickly, but then I had to face two problems. First, INVITE messages that arrived in certain situations had no effect. Second, the SDP protocol, which negotiates the media, was still waiting for implementation, and the RTP protocols responsible for the media communication were missing as well. As a consequence, I envisioned the following software architecture:
+-----------------------------+
|        UserAgent [4]        |
+--------------+--------------+
|SIP/SDP [4][5]|   RTP [6]    |
+--------------+--------------+
|        Network layer        |
+-----------------------------+
Since only two fifths of this software architecture was ready, I started to search for components on the Internet. There are fine SDP implementations available, but the RTP implementations also include network communication, which would make standardized use difficult. I was working on the first problem when I found a SIP guide called "A Hitchhiker's Guide to the Session Initiation Protocol" [8]. That was the point when I decided to give up on building the required application from my own components.
In the second phase of my experiment, I looked for third-party components that offer a complete solution for handling the SIP protocol. Most of the SDKs available on the Internet do not meet the above requirements: they are difficult to use, they require too much technical knowledge, or they are wrapped COM objects.
As the result of my search, I found the solution offered by Ozeki. Ozeki VoIP SIP SDK provides an easy-to-use interface; furthermore, it helps testing with a mock softphone object. In the mock-up phase of the software development life cycle, this makes testing the relevant components and models much easier. The random events it generates resemble reality and create realistic situations without making a real phone call.
In the third phase of my experiment, I got to know and started to use the selected component. Here I would like to summarize the results and experiences. Since the aim of this article is not to present Windows sound management, sound handling is kept deliberately simple. On the basis of the sample code presented on the Ozeki VoIP SIP SDK website [9][10], you can get transparent, simple, easy-to-use code with the help of a component that is able to handle VoIP calls pragmatically.
Ozeki VoIP SIP SDK
To keep things simple, I am going to show a program that ignores both the GUI and the technical details of audio device handling. The GUI is avoided by demonstrating the SDK in a console application, and sound handling is avoided by immediately sending back the received audio data. Thus, the code focuses on event handling and on introducing the main building-block objects.
In order to do so, we need to be familiar with the available tools. At the center of the abstraction is IPhoneCall. You can find more information about IPhoneCall in the documentation available on the Ozeki website. A listener can be attached to PhoneCall objects, similarly to the Observer pattern, although attaching and detaching must be done by the programmer with the help of the AttachListener and DetachListener extension methods. Additionally, all event types are handled uniformly with the help of VoIPEventArgs.
public interface IPhoneCallListener
{
    void CallErrorOccured(object sender, VoIPEventArgs<CallError> e);
    void CallStateChanged(object sender, VoIPEventArgs<CallState> e);
    void DtmfReceived(object sender, VoIPEventArgs<DTMF> e);
    void MediaDataReceived(object sender, VoIPEventArgs<VoIPMediaData> e);
    void PlainMediaDataReceived(object sender, VoIPEventArgs<EncodedMediaData> e);
}
During the active life cycle of a telephone call object, various situations can occur. These situations are listed in IPhoneCallListener. The method names speak for themselves, so they will not be discussed here. The example shown below can give you guidance.
class PongCallListener : IPhoneCallListener
{
    public void DtmfReceived(object sender, VoIPEventArgs<DTMF> e)
    {
        var dtmf = e.Item;
        var call = (PhoneCall)sender;
        Console.WriteLine("Dtmf received");
        // Echo the received DTMF signal back to the other party.
        call.SendDTMFSignal(VoIPMediaType.Audio, e.Item);
    }
In this example, we create a simple PhoneCallListener object. It sends some of the received information back to the other party as soon as it is received; echoing the DTMF signal is one example.
    public void CallErrorOccured(object sender, VoIPEventArgs<CallError> e)
    {
        var call = (PhoneCall)sender;
        Console.WriteLine("Call error occurred: " + e.Item);
    }
If an error occurs while the call is being set up, the cause of the error is written to the screen.
    public void MediaDataReceived(object sender, VoIPEventArgs<VoIPMediaData> e)
    {
        var call = (PhoneCall)sender;
        // Send the received PCM data straight back to the sender.
        call.SendMediaData(e.Item.MediaType, e.Item.PCMData);
    }
Data has arrived in raw PCM format. The media type carried with it shows that the SDK can handle not just audio but other media types as well. Here, we simply send the received data back to the sender, which will come as a big surprise to them.
    public void CallStateChanged(object sender, VoIPEventArgs<CallState> e)
    {
        var call = (PhoneCall)sender;
        Console.WriteLine("Call state changed: " + e.Item);
        // States greater than InCall mean that the call has ended.
        if (e.Item > CallState.InCall)
            call.DetachListener(this);
    }
The state of the call may change. If the new state is beyond InCall, the call has ended somehow: either we hung up or the other party did. When this happens, it is worth removing the PhoneCallListener object from the PhoneCall object with the above-mentioned DetachListener method.
    public void PlainMediaDataReceived(object sender, VoIPEventArgs<EncodedMediaData> e)
    {
        // Intentionally empty: this example does not process encoded media data.
    }
}
If the data arrives from the caller in encoded form, then the IPhoneCall.PlainMediaData property was used while the application was running. In this case, we do not need to do anything.
Accordingly, what we mostly need to deal with is the object that implements the IPhoneCall interface and the IPhoneCallListener objects attached to it. This gives us adequate creative freedom.
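The attach and detach mechanism can be illustrated without the SDK. The following is a sketch under my own assumptions: IDemoCall, IDemoCallListener, DemoCall and the other types are hypothetical stand-ins and not the Ozeki types, and the real extension methods may be implemented differently. It only shows how extension methods can wire a listener object to the events of a call object and unwire it again.

```csharp
using System;

// Hypothetical stand-ins for illustration; these are NOT the Ozeki SDK types.
class DemoStateEventArgs : EventArgs
{
    public string State { get; }
    public DemoStateEventArgs(string state) { State = state; }
}

interface IDemoCall
{
    event EventHandler<DemoStateEventArgs> StateChanged;
}

interface IDemoCallListener
{
    void StateChanged(object sender, DemoStateEventArgs e);
}

static class DemoCallExtensions
{
    // Attaching subscribes the listener method to the matching event.
    public static void AttachListener(this IDemoCall call, IDemoCallListener listener)
    {
        call.StateChanged += listener.StateChanged;
    }

    // Detaching removes the same subscription again.
    public static void DetachListener(this IDemoCall call, IDemoCallListener listener)
    {
        call.StateChanged -= listener.StateChanged;
    }
}

class DemoCall : IDemoCall
{
    public event EventHandler<DemoStateEventArgs> StateChanged;

    // Simulates a state transition and notifies all subscribers.
    public void Raise(string state)
    {
        StateChanged?.Invoke(this, new DemoStateEventArgs(state));
    }
}

class CountingListener : IDemoCallListener
{
    public int Notifications { get; private set; }

    public void StateChanged(object sender, DemoStateEventArgs e)
    {
        Notifications++;
    }
}

static class ListenerDemo
{
    static void Main()
    {
        var call = new DemoCall();
        var listener = new CountingListener();

        call.AttachListener(listener);
        call.Raise("Ringing");      // delivered to the listener
        call.DetachListener(listener);
        call.Raise("Completed");    // no longer delivered

        Console.WriteLine(listener.Notifications);   // prints 1
    }
}
```

Grouping all callbacks in one listener interface keeps the handling of a call in a single class, at the cost of having to implement every method even when some of them stay empty.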
class Program
{
We also need to create telephone calls. In order to do this, we need a program.
    static Dictionary<string, IPhoneCall> Calls;
    static PongCallListener FunnyCallListener;
The program keeps the active calls in a Dictionary, and we use only one PongCallListener.
    static void Main(string[] args)
    {
        ISoftPhone SoftPhone = new SoftPhone("", 5000, 8000, 5060);
        SoftPhone.IncommingCall += (SoftPhone_IncommingCall);
        IPhoneLine PhoneLine = null;
        FunnyCallListener = new PongCallListener();
        Calls = new Dictionary<string, IPhoneCall>();
We instantiate a SoftPhone object that handles the calls. If we get bored of testing our application with real calls, we can use the provided ArbSoftPhone, which creates random situations.
Once our SoftPhone is ready, we subscribe to the IncommingCall event, which, as its name suggests, is raised when there is an incoming call. Its event argument carries only the incoming call object.
We also need a telephone line, represented by the IPhoneLine interface. Note that the SDK can handle multiple parallel lines, with multiple parallel calls on each. Then we instantiate our CallListener object, which we are going to attach to every call while the program is running.
The Calls dictionary maps call strings to the call objects.
        Console.WriteLine("Be funny!");
        Console.Write("Display name: "); string displayName = Console.ReadLine();
        Console.Write("Username: "); string username = Console.ReadLine();
        Console.Write("Register name: "); string registerName = Console.ReadLine();
        Console.Write("Register password: "); string registerPassword = Console.ReadLine();
        Console.Write("Domain server: "); string domainServer = Console.ReadLine();
We read the registration information from the user.
        string[] domains = domainServer.Split(':');
        int port = 5060;
        if (domains.Length == 2)
            port = Int32.Parse(domains[1]);
        SIPAccount account = new SIPAccount(true, displayName, username, registerName,
            registerPassword, domains[0], port);
        PhoneLine = SoftPhone.CreateAndRegisterPhoneLine(account);
We check whether the given domain contains a port; if not, we use the default port 5060 for the registration. We then create a SIPAccount object and use it to request an IPhoneLine object from the SoftPhone. The registration procedure is started on this IPhoneLine object.
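The port handling above throws an exception if the user types something that is not a number after the colon. A more forgiving variant is sketched below as a standalone helper; the name EndpointSketch and its Parse method are my own assumptions and not part of the SDK, and the sketch does not attempt to handle IPv6 literals, which contain colons themselves.

```csharp
using System;

static class EndpointSketch
{
    // Splits "host" or "host:port". Falls back to defaultPort when the port
    // is missing, is not a number, or lies outside the range 1..65535.
    public static void Parse(string input, int defaultPort, out string host, out int port)
    {
        string[] parts = input.Trim().Split(':');
        host = parts[0];
        port = defaultPort;

        int parsed;
        if (parts.Length == 2 && int.TryParse(parts[1], out parsed)
            && parsed >= 1 && parsed <= 65535)
        {
            port = parsed;
        }
    }

    static void Main()
    {
        string host;
        int port;

        Parse("sip.example.com:5070", 5060, out host, out port);
        Console.WriteLine(host + " " + port);   // explicit port is used

        Parse("sip.example.com:abc", 5060, out host, out port);
        Console.WriteLine(host + " " + port);   // invalid port falls back to the default
    }
}
```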
        while (true)
        {
Then comes the fun part...
            string statement = Console.ReadLine().Trim();
            if (statement.StartsWith("exit"))
                break;
...until we get bored of it. We read a string from the keyboard. If it starts with "exit", we quit; if it is something else...
            if (!Calls.ContainsKey(statement))
            {
                if (PhoneLine.RegisteredInfo == PhoneLineInformation.RegistrationSucceded)
                {
                    IPhoneCall Call = SoftPhone.CreateCallObject(
                        PhoneLine, statement, FunnyCallListener
                    );
                    Calls.Add(statement, Call);
                    Call.CallStateChanged += (Call_CallStateChanged);
                    Call.Start();
                }
            }
        }
... we check whether a call object is already associated with the received string. If so, nothing happens; if not, we check whether our telephone line has registered successfully. If it has, we create a call object with the SoftPhone, which starts calls only on our single telephone line. The call is placed to the typed phone number, which is stored in the DialInfo property.
We store the created object under the typed phone number and subscribe to the call state change event, so that when the other side hangs up we are notified and can remove the call from Calls.
Then we start the call. We keep repeating this until we get bored of it.
        foreach (IPhoneCall call in Calls.Values)
            call.HangUp();
        SoftPhone.Close();
    }
On exit, we hang up every active call.
After that, we also need event handlers. They handle the incoming calls and are also responsible for removing the call object from the dictionary when a given call ends.
    static void SoftPhone_IncommingCall(object sender, VoIPEventArgs<IPhoneCall> e)
    {
        e.Item.AttachListener(FunnyCallListener);
        Calls.Add(e.Item.DialInfo, e.Item);
        e.Item.CallStateChanged += (Call_CallStateChanged);
        e.Item.Accept();
    }
We immediately attach our CallListener to the incoming call and add the call to Calls. Then we subscribe the state change handler as well. Finally, we automatically accept the incoming call.
    static void Call_CallStateChanged(object sender, VoIPEventArgs<CallState> e)
    {
        if (e.Item > CallState.InCall)
        {
            IPhoneCall call = sender as IPhoneCall;
            if (call == null)
                return;
            Calls.Remove(call.DialInfo);
            call.DetachListener(FunnyCallListener);
            call.CallStateChanged -= (Call_CallStateChanged);
        }
    }
}
If the received CallState is greater than InCall, the call has ended, and this is the event we are interested in. We remove the call object from Calls by its dial info, and then we detach the CallListener from it, as well as the Call_CallStateChanged event handler.
CallState is an enum whose values are ordered along the call statuses. For example, it grows from Setup through InCall to Completed, which is why the comparison operators can be used on it: everything that signals the end of the call is greater than InCall.
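The ordering trick can be shown with a small standalone sketch. DemoCallState is a hypothetical enum: only Setup, InCall and Completed are named in this article, and the other members and the exact order are assumptions made for illustration, not the SDK's actual definition.

```csharp
using System;

// Hypothetical enum; members other than Setup, InCall and Completed are
// invented for this sketch and do not reflect the real SDK definition.
enum DemoCallState
{
    Setup,
    Ringing,
    InCall,
    Completed,
    Rejected
}

static class CallStateSketch
{
    // Enum members get increasing integer values in declaration order,
    // so every state declared after InCall compares as greater than it.
    public static bool HasEnded(DemoCallState state)
    {
        return state > DemoCallState.InCall;
    }

    static void Main()
    {
        foreach (DemoCallState s in Enum.GetValues(typeof(DemoCallState)))
            Console.WriteLine(s + " ended: " + HasEnded(s));
    }
}
```

The comparison is compact, but it depends on the declaration order of the enum, so it holds only as long as that order is kept.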
The example shows how simply and quickly a fairly complex application that handles phone calls can be developed. Extending it depends only on the IPhoneCallListener implementation.
Summary
To summarize, implementing your own VoIP SoftPhone is time-consuming and requires a lot of energy. Therefore, it is efficient to use previously written components. After studying many SDKs, I found the solution offered by Ozeki the most understandable. As the examples showed, with its design of interfaces implemented by classes, it handles phone calls in the easiest way. As Albert Einstein said: "Everything should be made as simple as possible, but no simpler."
You do not need to worry about the implementation details. Everything the programmer can do with a telephone call is centered on the call object, and its state is defined in one place. Therefore, your plans can be achieved easily, without getting lost in technical details. It is as easy as one, two, three: I ask for a telephone and for one or more lines, and I call or I am called. The article relies heavily on the documentation available on the website, where more information can be found. I can recommend this solution to everyone.
References
[1] http://en.wikipedia.org/wiki/Session_Initiation_Protocol
[2] http://en.wikipedia.org/wiki/H323
[3] http://en.wikipedia.org/wiki/Obfuscated_code
[4] http://tools.ietf.org/html/rfc3261
[5] http://tools.ietf.org/html/rfc4566
[6] http://tools.ietf.org/html/rfc3550
[7] http://tools.ietf.org/html/rfc3551
[8] http://tools.ietf.org/html/rfc5411
[9] http://www.voip-sip-sdk.com
[10] http://www.voip-sip-sdk.com/index.php?owpn=98
History
- 18th March, 2011: Initial post