Where is SIP2?

by Colin Berkshire

It’s hard to pick the one thing that makes SIP so bad. There are so many little things to choose from. But I suppose my choice is the lack of any management of the media stream routing.

We all know SIP really has problems with media streams. One-way audio and zero-way audio are the bane of every VOIP installer. Yes, in a “properly designed network” it all works. But in the real world there is such a plethora of NATs and network configurations that it struggles to work at all.

I don’t understand why SIP hasn’t been adapted to manage the media streams better. At the very least it could ask the endpoints to send a test packet to each other and report the results, to confirm that there is two-way audio. This would test UPnP, NAT traversal, and everything in between.
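To make the test-packet idea concrete, here is a minimal sketch in Python. It is not part of SIP or RTP: the `PROBE:` marker, the per-call token, and every function name are assumptions made up for illustration, and the demo runs over loopback rather than through a real NAT.

```python
import socket

PROBE_PREFIX = b"PROBE:"  # hypothetical marker, not defined by SIP or RTP


def open_media_socket() -> socket.socket:
    """Bind a UDP socket on loopback with an OS-assigned port (demo only)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    return sock


def send_probe(sock: socket.socket, peer: tuple, token: bytes) -> None:
    """Send one test datagram carrying a token chosen for this call."""
    sock.sendto(PROBE_PREFIX + token, peer)


def probe_arrived(sock: socket.socket, token: bytes, timeout: float = 1.0) -> bool:
    """Return True if a probe with the expected token arrives before the timeout."""
    sock.settimeout(timeout)
    try:
        data, _sender = sock.recvfrom(2048)
    except socket.timeout:
        return False
    return data == PROBE_PREFIX + token


def run_demo() -> dict:
    """Probe in both directions between two local sockets and report the results."""
    a, b = open_media_socket(), open_media_socket()
    try:
        send_probe(a, b.getsockname(), b"call-1")
        a_to_b = probe_arrived(b, b"call-1")
        send_probe(b, a.getsockname(), b"call-1")
        b_to_a = probe_arrived(a, b"call-1")
        return {"a_to_b": a_to_b, "b_to_a": b_to_a}
    finally:
        a.close()
        b.close()
```

In a real deployment each endpoint would send its half of this report back over the signaling channel, so the server learns which directions actually carry media.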

But it could go one step further toward making VOIP better: it could negotiate with proxies and relays. That would be cool.

Let’s say we have two people who are both behind a NAT and who wish to talk with each other. Unless UPnP is really working, there are going to be problems. And, as we all know, there is no answer for a double-NAT. When you hear “double-NAT” you just give up.

It doesn’t have to be this way. It could work perfectly.

If SIP instructed the endpoints to try a call and test the connection, the SIP manager would know whether they could communicate. If not, the SIP manager could then pull in some intelligence to route the call. That intelligence could be a set of relays and proxies that could be threaded into the call.
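The decision the SIP manager would make can be sketched in a few lines of Python. This is only an illustration under assumptions: `ProbeReport`, `classify`, `plan_media_route`, and the relay names are hypothetical, and a real manager would weigh far more than a single pair of booleans.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeReport:
    """What the two endpoints reported after testing the media path."""
    a_to_b: bool  # did B receive A's test packet?
    b_to_a: bool  # did A receive B's test packet?


def classify(report: ProbeReport) -> str:
    """Label the path as 2-way, 1-way, or 0-way."""
    working_directions = int(report.a_to_b) + int(report.b_to_a)
    return {2: "2-way", 1: "1-way", 0: "0-way"}[working_directions]


def plan_media_route(report: ProbeReport, relays: List[str]) -> Optional[str]:
    """Return "direct", the name of a relay to thread in, or None if stuck."""
    if classify(report) == "2-way":
        return "direct"
    # Anything less than two-way audio gets routed through the first relay
    # the manager knows about (a hypothetical, deliberately naive policy).
    return relays[0] if relays else None
```

For example, `plan_media_route(ProbeReport(True, False), ["relay-east"])` picks the relay, while a report with both directions working stays direct.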

Take a simple case.

Two users, both behind double-NATs, want to have a conversation. The answer is to have each of them communicate through a bridge that is publicly available. A company could operate one or more SIP bridges strategically placed on the network. Then, the SIP manager would tell each endpoint to stream its audio through the bridge on an allocated port. The bridge would be stupid: It accepts connections and bridges the media. It makes no decisions. Now, double-NAT users can talk.
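Here is a rough Python sketch of just how stupid such a bridge could be. It assumes UDP media and treats the first two senders that show up on the allocated port as the two legs of the call; the `DumbBridge` class and its methods are hypothetical names, and the demo runs on loopback instead of a public address.

```python
import socket
from typing import List, Tuple

Address = Tuple[str, int]


class DumbBridge:
    """Forwards each datagram from one leg of a call to the other leg."""

    def __init__(self, host: str = "127.0.0.1", port: int = 0) -> None:
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind((host, port))  # port 0 lets the OS allocate one
        self.legs: List[Address] = []

    @property
    def address(self) -> Address:
        return self.sock.getsockname()

    def pump_once(self, timeout: float = 1.0) -> bool:
        """Handle one datagram. Return True if it was forwarded."""
        self.sock.settimeout(timeout)
        try:
            data, sender = self.sock.recvfrom(2048)
        except socket.timeout:
            return False
        if sender not in self.legs:
            if len(self.legs) == 2:
                return False  # a third party: ignore it
            self.legs.append(sender)
        if len(self.legs) < 2:
            return False  # the other leg has not connected yet: drop
        other = self.legs[1] if sender == self.legs[0] else self.legs[0]
        self.sock.sendto(data, other)
        return True

    def close(self) -> None:
        self.sock.close()


def run_demo() -> bytes:
    """Two local endpoints exchange a packet through the bridge."""
    bridge = DumbBridge()
    a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    a.bind(("127.0.0.1", 0))
    b.bind(("127.0.0.1", 0))
    b.settimeout(1.0)
    try:
        a.sendto(b"hello from a", bridge.address)  # registers leg A
        bridge.pump_once()
        b.sendto(b"hello from b", bridge.address)  # registers leg B
        bridge.pump_once()
        a.sendto(b"audio frame", bridge.address)   # now relayed to B
        bridge.pump_once()
        data, _sender = b.recvfrom(2048)
        return data
    finally:
        a.close()
        b.close()
        bridge.close()
```

Because both endpoints send outbound to the bridge, each NAT only ever sees traffic its own user started. The bridge never inspects the media and never makes a routing decision.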

Or, perhaps bridges could be set up within a company, and via a VPN they could have a tunnel to a remote branch office. Now, the SIP manager can understand the company network topology and know there is a media bridge and can route the call through it.

In a perfect world every device would have an IPv6 address and would be publicly facing. But that isn’t happening in my lifetime. So I think it is appropriate for SIP to take on more responsibility for setting up the media stream in a way that works.

Now, some of this could be done today. But it is hideously complicated. What we need is a SIP2 architecture that handles this transparently. First and foremost, SIP needs to know when a connection is 2-way or 1-way or 0-way.

I also think that a new SIP protocol should pay more attention to encryption of calls, too. I think we need more than TLS with a single certificate. Certificates should be unique for each conversation, and each direction. Each side should get the public key for that unique call from the opposite end. Then, calls can truly be private.
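One way to picture per-call, per-direction keys is the toy Python sketch below. It swaps in an ephemeral Diffie-Hellman exchange rather than certificates, and the numbers are deliberately unsafe: the 127-bit prime, the generator, the labels, and the function names are all assumptions chosen so the example runs on the standard library alone. A real design would use a vetted cryptographic library.

```python
import hashlib
import hmac
import secrets

# Toy parameters for illustration only. A 127-bit group is far too small
# for real use; this only shows the shape of the exchange.
P = 2**127 - 1  # a Mersenne prime
G = 3


def new_call_keypair() -> tuple:
    """Generate a fresh (private, public) pair to be used for one call only."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def direction_keys(my_private: int, their_public: int, call_id: str) -> dict:
    """Derive one key per direction from the shared secret for this call."""
    shared = pow(their_public, my_private, P)
    secret = shared.to_bytes(16, "big")  # shared < 2**127, so 16 bytes fit

    def derive(label: str) -> bytes:
        message = f"{call_id}|{label}".encode()
        return hmac.new(secret, message, hashlib.sha256).digest()

    return {
        "caller_to_callee": derive("caller->callee"),
        "callee_to_caller": derive("callee->caller"),
    }
```

Each side calls `new_call_keypair()` when the call starts, sends only the public half to the other end, and then calls `direction_keys` with its own private half. Both ends arrive at the same pair of keys, and nothing is reused on the next call.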

Let’s get SIP2 going. Who can do this?