Designing a Peer to Peer Voice Communication Software

Mukit


27 Mar 2013



Strictly speaking, there is no need to design peer-to-peer voice communication software when Skype already exists. But why let Skype go unchallenged year after year? Some of you may come up with better algorithms and offer users a better alternative, or present a better GUI than Skype currently offers. So let's give it some thought.

When you design peer-to-peer voice communication software, the main obstacle is the firewall. If you are on an intranet-based LAN that reaches the internet through a main server, you do not have a public IP address of your own, which makes it difficult for a contact on the other side of the world to find and connect to you.

Here, I am going to discuss some basic ideas about how such peer-to-peer connections can be established. First, a little introduction.

Before worrying about the firewall, you need to decide how your voice will reach the other side, where the listener plays it through a speaker.

An easy way is to use DirectX on Windows, which has its own mechanism for voice communication from one PC to another. A more involved way is to capture the voice using Windows or Linux APIs, compress it with a codec, send it to the other side, and decompress it there using the same codec.
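To make the capture, compress, send, decompress chain concrete, here is a minimal Python sketch. Everything in it is an assumption of mine for illustration: the `VoiceCodec` interface, the `ZlibCodec` class, and the 20 ms frame size are hypothetical, and zlib is only a lossless stand-in for a real speech codec. Capture and playback are replaced by an in-memory buffer.

```python
import zlib
from typing import Protocol


class VoiceCodec(Protocol):
    """Hypothetical interface that any codec wrapper would satisfy."""

    def encode(self, pcm: bytes) -> bytes: ...
    def decode(self, packet: bytes) -> bytes: ...


class ZlibCodec:
    """Stand-in codec: lossless zlib, NOT a real speech codec."""

    def encode(self, pcm: bytes) -> bytes:
        return zlib.compress(pcm)

    def decode(self, packet: bytes) -> bytes:
        return zlib.decompress(packet)


def split_frames(pcm: bytes, frame_bytes: int = 640) -> list[bytes]:
    """Cut captured PCM into fixed-size frames.

    640 bytes is 20 ms of 16-bit mono audio at 16 kHz (an assumed format).
    """
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]


# Sender side: captured audio -> frames -> compressed packets.
captured = bytes(3200)  # 100 ms of silence standing in for microphone input
codec = ZlibCodec()
packets = [codec.encode(frame) for frame in split_frames(captured)]

# Receiver side: packets -> decompressed frames -> playback buffer.
playback = b"".join(codec.decode(packet) for packet in packets)
```

Because both ends talk to the codec only through `encode` and `decode`, a real speech codec could replace the stand-in without touching the rest of the pipeline.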

You may want a wrapper around whatever library you use to capture the voice, compress it, and send it to the other side.

My suggestion is to write a Voice Server Manager that lets another party connect to you for voice communication. The server manager should hide the nitty-gritty details of the library you use to open a port on your side and listen for connection requests. It should also hide how voice is received from the other side and what happens when internal events are raised.
If there is a server then, obviously, there should be a client. So write a Voice Client Manager as well, which hides the details of how it connects to the server and how voice passes between client and server. Mind you, both managers may use the same internal library classes and functions (some open source libraries use the same class for server and client). The separate manager classes I am recommending are there to make the product easier to maintain.

Your Voice Server and Client Managers should be flexible enough to raise events for listeners whenever a connection is established between a server and a client. Events should also be raised when a party starts to speak and when there is silence, provided your internal library supports it. These events are useful for showing the voice communication status in the GUI.

In short, you have VoiceManagerServer, VoiceManagerClient, and an IEventListenerInterface for both client and server.
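One possible shape for these three pieces is sketched below in Python. Only the three class names come from the text above. The method names, the shared `VoiceManager` base, the peak-amplitude silence check, and the `StatusLog` listener are my own assumptions, and the networking is left out so that only the event flow is shown.

```python
import struct
from abc import ABC, abstractmethod


class IEventListenerInterface(ABC):
    """Callbacks a GUI (or logger) implements to follow the call status."""

    @abstractmethod
    def on_connected(self, peer: str) -> None: ...

    @abstractmethod
    def on_talking(self, peer: str) -> None: ...

    @abstractmethod
    def on_silence(self, peer: str) -> None: ...


class VoiceManager:
    """Hypothetical shared base: listener bookkeeping and silence detection."""

    def __init__(self, silence_threshold: int = 500) -> None:
        self._listeners: list[IEventListenerInterface] = []
        self._threshold = silence_threshold  # assumed peak-amplitude cutoff

    def add_listener(self, listener: IEventListenerInterface) -> None:
        self._listeners.append(listener)

    def _notify_connected(self, peer: str) -> None:
        for listener in self._listeners:
            listener.on_connected(peer)

    def handle_frame(self, peer: str, pcm: bytes) -> None:
        """Classify one frame of 16-bit little-endian PCM and raise an event."""
        count = len(pcm) // 2
        samples = struct.unpack(f"<{count}h", pcm[:count * 2])
        peak = max((abs(s) for s in samples), default=0)
        for listener in self._listeners:
            if peak >= self._threshold:
                listener.on_talking(peer)
            else:
                listener.on_silence(peer)


class VoiceManagerServer(VoiceManager):
    def accept(self, peer: str) -> None:
        # A real server would accept a socket here; this only raises the event.
        self._notify_connected(peer)


class VoiceManagerClient(VoiceManager):
    def connect(self, peer: str) -> None:
        # A real client would open a socket here; this only raises the event.
        self._notify_connected(peer)


class StatusLog(IEventListenerInterface):
    """Example listener that records events instead of updating a GUI."""

    def __init__(self) -> None:
        self.events: list[tuple[str, str]] = []

    def on_connected(self, peer: str) -> None:
        self.events.append(("connected", peer))

    def on_talking(self, peer: str) -> None:
        self.events.append(("talking", peer))

    def on_silence(self, peer: str) -> None:
        self.events.append(("silence", peer))


log = StatusLog()
server = VoiceManagerServer()
server.add_listener(log)
server.accept("alice")
server.handle_frame("alice", struct.pack("<2h", 0, 12))        # quiet frame
server.handle_frame("alice", struct.pack("<2h", 4000, -900))   # loud frame
# log.events is now:
# [('connected', 'alice'), ('silence', 'alice'), ('talking', 'alice')]
```

The GUI only ever sees the listener interface, so the underlying voice library can change without the status display being touched.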

Now the question is how one party learns the IP address of the other in order to connect. For this, you have to create a database on a web server that stores contact names (plus other details) and IP addresses. Whenever a user logs in to the voice communication software, the software stores the user's public IP on the server, provided the user has one. A user behind a firewall on a LAN will not have a public IP, but rather a private, administrator-assigned address, most likely starting with 192.168. Such users are stuck for now, until I provide the solution.
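As a small illustration, the client could check whether its own address is public before reporting it. Private IPv4 addresses fall in the ranges 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16, and Python's standard `ipaddress` module already knows them. The helper name `has_public_ip` is hypothetical.

```python
import ipaddress


def has_public_ip(ip: str) -> bool:
    """Return True when the address is globally routable (not private/reserved)."""
    return ipaddress.ip_address(ip).is_global


print(has_public_ip("8.8.8.8"))       # True
print(has_public_ip("192.168.1.20"))  # False
```

Note that this only inspects the address itself. It says nothing about whether a firewall in front of a public address will actually let incoming connections through.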

Once a user logs on to the voice communication software, the software queries the database on the web server for a list of available IPs to connect to. The user ID associated with each IP tells you whether that user is in the contact list of the user who just logged in. In practice, the query should be formulated so that, given the logged-in user's ID, it returns only the users in that user's contact list. Every user also has a status, stored in a corresponding field of the database, and only the users whose status shows them as logged in are the ones you would try to connect to.
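Such a query might look like the sketch below, which uses an in-memory SQLite database in place of the real web server database. The schema (`users`, `contacts`, the `online` flag) and the sample rows are assumptions made up for the example, and the IPs come from a range reserved for documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        ip     TEXT,                        -- last reported IP, NULL if unknown
        online INTEGER NOT NULL DEFAULT 0   -- 1 while the user is logged in
    );
    CREATE TABLE contacts (
        owner_id   INTEGER NOT NULL REFERENCES users(id),
        contact_id INTEGER NOT NULL REFERENCES users(id),
        PRIMARY KEY (owner_id, contact_id)
    );
""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    (1, "alice", "203.0.113.5", 1),
    (2, "bob",   "203.0.113.9", 1),
    (3, "carol", "203.0.113.7", 0),   # in alice's contacts, but offline
    (4, "dave",  "203.0.113.8", 1),   # online, but not in alice's contacts
])
conn.executemany("INSERT INTO contacts VALUES (?, ?)", [(1, 2), (1, 3)])


def online_contacts(db, user_id):
    """Return (name, ip) for each logged-in user in user_id's contact list."""
    rows = db.execute(
        """
        SELECT u.name, u.ip
        FROM contacts AS c
        JOIN users AS u ON u.id = c.contact_id
        WHERE c.owner_id = ? AND u.online = 1
        ORDER BY u.name
        """,
        (user_id,),
    )
    return rows.fetchall()


print(online_contacts(conn, 1))  # [('bob', '203.0.113.9')]
```

Filtering on both the contact list and the status inside the query keeps the server from handing out the addresses of users the caller has no business contacting.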

So far, this has been elementary stuff. Let's come to the point this topic is really about.

The real issue is connecting to a user behind a firewall. The main idea is to use logged-on users who have public IPs as a channel: using a tunnelling mechanism, the holder of a public IP acts as the medium that connects the two parties and propagates the voice of one party to the other, who is behind the firewall.

So, the basic idea for establishing a connection between Peer "A" and Peer "B" (both behind a firewall) is:

A user with a public IP creates a server "S", which opens a port and listens.
"A" connects to "S" as a client.
"A" sends his voice as raw data to "S".
"B" connects to "S" as a client.
"S" attaches a header to the raw data sent by "A" and sends it to "B". "B" parses the header and learns whose voice it is. The last step is to decompress the data (if required) and play it using system APIs.

In short,

A->S (public IP)<-B

Once a connection is established, voice data can flow in both directions between the server and the clients.
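The relay steps above can be sketched in Python as follows. The header layout (a 4-byte sender ID followed by a 2-byte payload length) is purely my assumption, since the article does not fix one, and the `Relay` class is an in-memory stand-in for "S" that uses lists where a real server would use sockets.

```python
import struct

# Hypothetical header: 4-byte sender id, 2-byte payload length (network order).
HEADER = struct.Struct("!IH")


def pack_frame(sender_id: int, payload: bytes) -> bytes:
    """Prefix the voice payload with the relay header."""
    return HEADER.pack(sender_id, len(payload)) + payload


def unpack_frame(frame: bytes) -> tuple[int, bytes]:
    """Split a frame back into (sender_id, payload)."""
    sender_id, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return sender_id, payload


class Relay:
    """In-memory stand-in for server "S": forwards each frame to every
    other connected peer."""

    def __init__(self) -> None:
        self.inboxes: dict[int, list[bytes]] = {}

    def connect(self, peer_id: int) -> None:
        self.inboxes[peer_id] = []

    def receive(self, sender_id: int, payload: bytes) -> None:
        frame = pack_frame(sender_id, payload)   # "S" attaches the header
        for peer_id, inbox in self.inboxes.items():
            if peer_id != sender_id:
                inbox.append(frame)


A, B = 1, 2
relay = Relay()
relay.connect(A)                                   # "A" connects to "S"
relay.connect(B)                                   # "B" connects to "S"
relay.receive(A, b"\x01\x02\x03\x04")              # "A" sends raw voice data
sender, voice = unpack_frame(relay.inboxes[B][0])  # "B" parses the header
```

Carrying the payload length in the header lets the receiver find frame boundaries on a stream connection, where several frames may arrive glued together.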

To make sure a connection can always be established, you can run a few servers ("S") of your own on the web server (otherwise used mainly for the contacts database) and channel the voice through them.

An important aspect here is to retrieve from the database the nearest available public IP for both A and B, in terms of distance. This is a challenge, and it is an area where many improvements and optimizations can be made and new algorithms introduced.
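As a starting point only, here is one naive way to pick a relay. It swaps "distance" for measured round-trip time (RTT), which is my assumption and not something the design above prescribes. The function name `pick_relay` and the sample measurements are hypothetical.

```python
def pick_relay(rtt_a: dict[str, float], rtt_b: dict[str, float]) -> str:
    """Pick the relay with the lowest combined round-trip time.

    rtt_a and rtt_b map relay address -> RTT in milliseconds as measured
    from peer A and peer B. Only relays both peers measured are considered.
    """
    candidates = rtt_a.keys() & rtt_b.keys()
    if not candidates:
        raise ValueError("no relay reachable by both peers")
    # Tie-break on the address so the choice is deterministic.
    return min(candidates, key=lambda relay: (rtt_a[relay] + rtt_b[relay], relay))


rtt_from_a = {"203.0.113.5": 40.0, "203.0.113.9": 120.0, "203.0.113.7": 15.0}
rtt_from_b = {"203.0.113.5": 50.0, "203.0.113.9": 20.0}
print(pick_relay(rtt_from_a, rtt_from_b))  # 203.0.113.5
```

The relay closest to A (203.0.113.7) loses here because B never measured it, and 203.0.113.5 wins on the combined figure (90 ms against 140 ms). Better selection strategies are exactly the kind of improvement this area invites.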

The discussion above highlights just one possibility. As a reader, having learned the basics, you may come up with better ways and algorithms to build a peer-to-peer network that may someday top Skype, Viber, and the like.

Waiting,
Mukit

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)