TalkingHeadz with Aurangzeb Khan, SVP Jabra

by Dave Michels

You can subscribe to TalkingHeadz on most podcast apps.


Dave Michels 2:43
Today we have with us Aurangzeb Khan, the SVP of Intelligent Vision Systems at GN Jabra. Welcome, Aurangzeb.

Aurangzeb Khan 2:52
Thanks very much, Dave, how are you?

Dave Michels 2:54
Our podcast listeners cannot see you like we can. But I have to comment on that mustache. That is an impressive

Aurangzeb Khan 3:04
I’ve had it pretty much my whole life.

Evan Kirstel 3:06
So, all right, let’s get on with business. But thanks for that first question, Dave. So tell me, Aurangzeb: I know all about Jabra headsets, I have a couple at least. But when did Jabra get into video?

Aurangzeb Khan 3:23
About two years ago, Evan. I had started a company in California called Altia Systems, where we pioneered a new video technology, and I’d love to tell you more about that. About two years back, we were talking with Jabra about the idea of building a combined product. Actually, that product is now out; it’s the PanaCast 50. One thing led to another, and we really loved the vision the company was executing to and their view of where the world might go with intelligence and AI and all of those things. We were sort of on parallel journeys, you know: we were doing video and Jabra had been doing audio. And we felt that the world needed these to come together. So we brought the companies together, and that’s when I joined Jabra.

Dave Michels 4:01
I remember the announcement when Jabra announced that they were acquiring your company. It was surprising for a number of reasons, and we’ll get into that. But I just want to highlight here: you sold a 22-person company for $125 million. Is that correct?

Aurangzeb Khan 4:16
Yes, that’s right. You know, we had pioneered some unique technology, and as you’re seeing, it’s manifesting itself in all the new products coming out. The team was 90-plus percent engineering, a very tightly focused team. I try to build startups and keep them as small as possible, with just the right folks to get that core technology put together and the innovation in shape. And we are now benefiting from the massive business and manufacturing capabilities that GN has to scale up and scale out.

Evan Kirstel 4:44
Fantastic. So you’re the founder of Altia Systems.

Aurangzeb Khan 4:47
I’m the founder. Yeah, I’ve been in Silicon Valley for quite some time, a bit of a serial entrepreneur. I’ve also helped build very high-end systems. In some of my earlier work, at Tandem NonStop, we developed the six-sigma availability systems that run banks and stock exchanges. Then at my first startup, we did the Sony PlayStation 2 graphics chip, and then later on a video supercomputer. So I’ve always been interested in video, and it turned out to be a good place.

Dave Michels 5:12
So you’re hanging out in Silicon Valley, you’re a serial entrepreneur, and you’ve got all this experience in a bunch of important growing sectors like gaming. And you had this lightbulb moment, if I have this right, where you said, aha, the world needs another video company. Is that accurate?

Aurangzeb Khan 5:31
Yeah, pretty close. You know what happened: I work with a couple of entrepreneurial organizations, OPEN, the Organization of Pakistani Entrepreneurs, and TiE, which is a group of people originally from India and Pakistan, in many countries now around the world. We look at ideas, and we share ideas and technology. I met some folks who had been experimenting with multi-camera arrays. And you know, I’d been a photographer growing up, and I thought it was really strange that in technology we had blown through so many limitations, but cameras, when I compared them to how I see the world with my eyes, were just very limiting. With my eyes, I see about 180 degrees, I see stereo vision, and there’s a certain way that I can sense the world around me. But with cameras, I was getting this tunnel view. People have tried to change that using ultra-wide-angle lenses and fisheye lenses, and those create a lot of distortion. With multi-camera array technology, you can overcome many of those limitations and create something that is much more natural and closer to how we instinctively sense and view the world. So I was really intrigued by that, and I started to explore that technology. It turns out to be a hard problem to solve, particularly for collaboration, because of real-time constraints: information has to go from us to you and back very quickly for it to be a good experience.

Dave Michels 6:43
So I want to make sure I got that right, because you’re comparing these cameras to your eyes. The obvious, intuitive thing to do was to create stereo vision the way you described; you have two eyes. But you kind of went a little crazy: you went with three cameras. So explain that.

Aurangzeb Khan 7:01
Actually, you know, I’m gonna come back, if you’ll give me a chance, to the stereo, because we ended up building a product for Intel, for use at the Korean Olympics, which had six cameras and did real-time 3D stereo vision. I think originally we were even bigger; we had a five-camera array. The core idea was that we wanted to keep each unit camera at the right focal length and with the right field of view, so that we did not have a lot of distortion. And then to do 180, we needed overlap between the cameras, because the algorithms we invented dynamically stitch, in real time, the video feeds coming from these three cameras. So we built our own processor, called the PanaCast Vision Processor, which is inside our device, where we do all this math. Three cameras allow us to keep that degree of overlap between the cameras and keep the field of view about right, to where you don’t get massive optical distortion, and yet you can still get that 180.

[Interviewer: Interesting. So is the AI that does the processing in the camera, or the desktop, or the application software, or above?]

In the first product we built, the engine that we had built is basically a dedicated real-time video processing engine. It’s a streaming architecture: it brings all the pixels in, stitches them in real time, one pass through, and outputs a stitched frame. And that’s a 32:9 aspect ratio frame. In those days, this is back in 2014, 2015, we were using computer vision. Then in 2018, we started to experiment with AI. Initially, the AI was on Compute Sticks or, you know, in a PC. But now, with our latest product, the PanaCast 50, we actually have two built-in edge AI processors, one for audio and one for video. We think that gives a lot of power to the product, and we’ll talk about that, hopefully, here.

As we look ahead: for a long time there was a view that AI would be all in the cloud, and we didn’t really think that was the right answer. For low latency and ultra-high information flow, you want that AI in the room. We’re the oracle of truth in that room; we see everything, and we hear everything. If we can process all of that in the room instead of sending it, you know, 300 milliseconds away with lots of bandwidth to the cloud and back, it’s just a better, more pleasing experience. So we see AI becoming properly peered and hierarchically delivered, at the edge in the room and in the cloud.

Evan Kirstel 9:19
I actually understood everything you just said, I do have a degree in electrical engineering, which Dave has never believed. But I’m wonderful. That’s a fantastic approach. How do you get all that data from the three HD cameras down a single USB wire?

Aurangzeb Khan 9:34
Yeah, part of it is that we can identify the region of interest, and we don’t process everything. Our device actually has an incredible array of pixels: each camera has 13 megapixels, so we have almost 40 megapixels of potential data coming in. We identify a region of interest, and we process the pixels that conform to that region of interest. So when I’m doing a 180, I have a certain field of view that I need and a certain pixel density that I need from each camera to, let’s say, do Panoramic 4K or a 1080p or a 720. Now, when I’m doing deep zoom, imagine we don’t have people spread out across that whole field of view; there’s, let’s say, just me, like I am at home. Well, we can detect where the person is and automatically frame the shot. We were the first to deliver that capability, which we call Intelligent Zoom; I think people now call it auto framing. In that case, the pixels I need are different from the pixels I need for the 180. So really, it’s dynamic optimization of the pixel array input coming in, which is then packaged into that 32:9 feed inside a 16:9 frame.
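
[Editor’s illustration] The pixel budget described above (three 13-megapixel sensors, with a 32:9 panorama carried inside a 16:9 frame) can be worked through with a short Python sketch. The 3840x2160 output frame and the helper name `letterboxed_strip` are assumptions chosen for illustration; they are not details from the conversation or from any Jabra API.

```python
from fractions import Fraction

SENSORS = 3
MEGAPIXELS_PER_SENSOR = 13  # figure quoted in the interview


def letterboxed_strip(frame_w, frame_h, strip_aspect):
    """Return (width, height) of the largest strip with the given aspect
    ratio that fits inside a frame_w x frame_h frame."""
    strip_h = Fraction(frame_w) / strip_aspect
    if strip_h > frame_h:
        # The strip would be too tall, so the frame height is the limit.
        return int(frame_h * strip_aspect), frame_h
    return frame_w, int(strip_h)


total_mp = SENSORS * MEGAPIXELS_PER_SENSOR  # "almost 40 megapixels"
# Assumed output: a 4K UHD (16:9) frame carrying a 32:9 panorama.
w, h = letterboxed_strip(3840, 2160, Fraction(32, 9))
used_mp = w * h / 1e6

print(f"raw sensor data: {total_mp} MP")
print(f"32:9 strip inside 3840x2160: {w}x{h} ({used_mp:.2f} MP)")
```

Under these assumptions the panorama occupies only a few megapixels of the frame, far fewer than the sensors capture, which is consistent with the point that only the pixels in the region of interest need to be processed and sent.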

Dave Michels 10:35
You can call it auto framing in general. But if you have an electrical engineer on the call, you can use the term that’s appropriate. So help us understand: why would a headset maker buy a video company like yours?

Aurangzeb Khan 10:47
Yes. As you know, GN Audio, Jabra, has a storied history in the technology business. You know, 150 years ago it laid the first telegraph cable to China, and it has been innovating since that time. It has really deep expertise in audio products that are very well regarded; the Speak product is sort of synonymous with speakerphones. And as you said, I’m using the headsets and earbuds and so on. They were innovating in AI also. They were looking at cues they could pick up from audio, such as the pace and tonality and timbre of the voice, to understand what people’s feelings and emotions were, what they were sensing, and things of that kind. And we were both working in the collaboration space. They had products, for example, that were very successful with Microsoft Teams, with Zoom, and with other providers, as were ours. So I think these are two threads that came together. They believed, as we did, that in the future people would want these products combined, that there was a level of physical integration coming in the product space, and they wanted to find the complementary capability on the video side. Similarly, we had come to that view ourselves, that these integrated video bars, and of course now the AIOs and so on, would be coming along. So we came together initially from the conversation of, could we build a product together? Then as we got talking, we felt like we had a pretty good alignment on strategy and vision, and that it would make sense to join forces and scale out, because GN Jabra had built this incredible manufacturing and sales engine, a great brand, and very well recognized and respected products. And we were tiny, but we were starting to be recognized as having done something quite innovative and valuable in the market.

Evan Kirstel 12:19
That’s fantastic. I’m still shocked that GN Netcom was founded in 1869. That’s almost as old as Dave Michels. But were you actually surprised when you were approached by Jabra and not, you know, Apple or Google, for example?

Aurangzeb Khan 12:33
I mean, you know, a number of folks had approached us earlier, Evan. And I think, as you all know, this is one of those things, doing an M&A. I’ve both taken companies public and done M&A, and you can’t tell; it’s one of those unpredictable things, right? So I was visiting Europe for a customer visit and ended up going to Copenhagen and meeting Rene and Holger and some of the key folks. And it’s through that interaction, and through that idea of, you know, how are the folks? How do they think? How do they act? How do they decide? What are they trying to do? That’s when it started to feel right. It felt like, yeah, this could actually work and make sense. So I was surprised, yes. I mean, the market was changing, so it was not a total shock. But it was certainly not something I had been out, you know, looking for.

Dave Michels 13:16
I would have really prolonged that process. I love traveling to Copenhagen. I hope you got a lot of trips.

Aurangzeb Khan 13:23
It’s a beautiful city; I have been there many times. You know, I have to say, California is open again, so it’s really wonderful. And it’s funny how when you don’t get to travel at all for 18 months, you sort of start to miss

Dave Michels 13:36
your flight gets canceled, and you’ve any wonder. But thinking about it, it seems kind of strange that a headset company would be interested in video. I guess we saw the same thing when Plantronics bought Polycom, but I always thought that Plantronics was really going after the phone business. Headsets and phones go together; that makes sense. But video is the one thing we do in enterprise comms that doesn’t really use a headset. So are you finding many synergies? What’s going on there?

Aurangzeb Khan 13:59
I mean, if you look back over the last year, all of us around the world have been through a terrible time, right? And interestingly, it became this pivot moment for collaboration. You saw all the stats for Microsoft Teams and Zoom and Google and WebEx just explode, right? We all learned how to live and work the way we are doing right now. I’m sometimes at home, sometimes at the office. Today I’m at home, but because of this capability, because of how I can see and engage and read the room and read the body language, it works. So video became this glue that provided the human touch. We really think that’s why, I mean, if you look at video, it just completely exploded, right? And so, you know, with the PanaCast and Speak, it’s a very natural combo, and it ended up really enabling educators, for example, to teach from home. I’m just sitting here, but for an educator, it’s almost theater, right? They have to move around. They have to use whiteboards. They have to engage students. Being stuck in front of a screen is not a good experience for the student or the educator. With our device, they could move around. We had a whiteboard technology where they could create a second feed, in which we could present the whiteboard in a rectified view. And I use the headset at home because there are people walking around, and I don’t want to disturb everybody; I’m up early. So it turned out to be actually a pretty natural combination to put these together.

Evan Kirstel 15:14
Yeah, I would agree. And Jabra is a consumer, professional, and prosumer business. So do you focus on all three of those segments?

Aurangzeb Khan 15:24
Pretty much. Right now, we’ve been mainly focused on enterprise and business. Down the road, of course, the technology, as you know, applies in so many ways. I mean, people have been using PanaCasts on self-driving cars, to make movies, on drones, and in many, many other use cases. But our focus right now is business collaboration: helping people in education and the public sector, as well as in commercial business, be able to work together.

Dave Michels 15:50
So I’m just thinking about it: it’s pretty revolutionary, what you’ve done with three cameras, and we’ll talk a little more about that. But at a higher level, why? I mean, the world has done pretty well with pan-tilt-zoom and wide-angle lenses. What was the problem you were trying to fix?

Aurangzeb Khan 16:06
Well, originally, you know, when we were first starting out, we were seeing physical space getting re-architected in this archetype of what we call the huddle room. These are small physical spaces: good TVs were becoming cheap, so people were hanging a big monitor on the wall, putting a table abutting it right underneath, and then putting five chairs around that. The old-style cameras were designed for a layout of a long, skinny room with maybe seven feet between the screen and that first chair. So they didn’t really work. You lost, you know, half the room real estate when you tried to use those cameras in this environment. And with PTZ, actually, we have people in education, in higher ed, in university education, trying to use PTZs. Let’s say you and I are talking; we can have a very fluid conversation. But if you had to wait for a PTZ to sway over with electromechanical or electro-optical movement, it just kills that whole flow of conversation. So we really felt that you needed instant-response, fluid, no-noise, no-movement devices that could frame you, and frame you nicely, and that could include everybody. Because practically, in a huddle room, those front two chairs, well, 40% of that usable space was off camera. And of course, if you’re off camera, it doesn’t feel right; you don’t feel like a full and equal citizen or participant in that conversation. So that’s originally what we wanted to fix. And it turns out, interestingly, that as we went into a pandemic world, social distancing meant that now, instead of five people, maybe it’s three, and they were sitting very far apart, you know, six feet apart. The old cameras wouldn’t work: either they all had to have their own feeds, or they couldn’t be in a collaboration space and have a combined experience.

Dave Michels 17:48
Now, I’m an electrical engineer, but how do you overcome parallax distortion? I think that’s the term they use; I heard it on a Doctor Who episode.

Aurangzeb Khan 17:58
That’s exactly right. And in fact, parallax is very easy to illustrate: just hold a finger in front of your nose and close one eye, then close the other, and you’ll see it look like it shifts. That’s parallax. And we do have parallax; indeed, below a certain range, you would notice it. The key to parallax is what’s called the baseline, which is the separation between the cameras. To give you an example, in our first product, you know, PanaCast 1, it was about 10 feet. Many people who are doing VR or other kinds of products set it at infinity, which practically means 40 feet, so for anything closer than 40 feet, you’ll see parallax. With ours, it was 12 feet. With PanaCast 2, we brought it to three feet, and with PanaCast 3 and going forward, it’s 18 inches, which practically means it’s just not there for the kind of use cases we’re working on. The way to do that is we use the technology built for smartphone-scale cameras, you know, which is billions of dollars of R&D spend to miniaturize cameras. We worked with the guys who did that and found a way to bring these three cameras in very close to each other.

Evan Kirstel 18:56
That doesn’t add any delay with, you know, audio synchronizing with video?

Aurangzeb Khan 19:00
No, the electronics allow us to manage that very closely because of our low-latency architecture, Evan. That’s a great question. In the old days, people had to do a lot of work because the video pipeline was so much slower than the audio pipeline. But thanks to Moore’s Law, we can get these hyper-integrated chips that allow us to do the whole pipeline inside one chip.

Dave Michels 19:20
So I understand you can use these three cameras basically as one camera, but can you use them as three or two cameras?

Aurangzeb Khan 19:27
Yeah, that’s a great question. Let me first answer it in the following way. With PanaCast 50, one big innovation we brought is that the three cameras produce two video streams simultaneously, plus a data stream. They work as one, but they produce three kinds of outputs. The power of that is this: let’s say you’re in a meeting room. I love thinking with my hands and doodling on whiteboards and so on. But if you’re not in that room and somebody goes to a whiteboard, often you feel, oh, I’m lost, I have no idea what’s going on, right? Well, with our device mounted at the front of the room, one camera feed can bring in everybody and can follow, in what we call virtual director mode, the flow of conversation. The other camera feed, again coming from all three cameras at once, can be focused on a region of interest. You identify it by clicking four coordinates that represent a whiteboard. Now that whiteboard, you know, could be at a 90-degree angle to the camera, but we will extract it, rectify it, and present it to you as a second live video feed, as if you were standing directly in front of it. So if somebody walks to a whiteboard, you’re not lost.

Evan Kirstel 20:30
We’re still trying to trick you. But it doesn’t seem like we’re getting anywhere. What about if I sit right between the seam of the cameras?

Aurangzeb Khan 20:37
Yes, if you are within 18 inches, you will potentially see some distortion. With these, you’d have to get really close. So at least show it to people. I mean, physics is physics, and there’s a design space for the use of the products. In the old days, you know, you would have seen it if you were further away. Also, many people who do stitching just blur or blend the pixels. We don’t do that; we do something much more accurate than that. So try it. Hopefully, if you have one of our products, try it. Let me see. Let me know how you like it. Because we do actually try, even at that distance, to give you a good experience.

Dave Michels 21:19
So going back to that pan-tilt-zoom question, I guess you don’t have any moving parts in your PanaCast. [Khan: Exactly. Exactly right.] Is that better, in effect, than, say, an optical zoom? Because digital zoom gets distorted. Are you able to compete with an optical zoom?

Aurangzeb Khan 21:35
We are, because of the very high pixel density in the device. And because most collaboration services, let’s say, are 1080p or 720p, we have lossless digital zoom up to 6x. So if you go from a full 180 down to one person, it’s 6x effective zoom. Now, if you want more, then optical zoom would have an advantage, right? But as pixels become more and more plentiful going down the road here, digital zoom has a lot of headroom ahead of it. In practical use cases, this is a good number. One thing also on optical zoom: one challenge we’ve seen with, let’s say, PTZ is delay, right? It’s taking time to move those elements, or if it’s a PTZ, it’s taking time to swivel and pivot. That time interferes with the flow of conversation.
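
[Editor’s illustration] The idea of "lossless" digital zoom above can be sketched as simple arithmetic: a crop stays lossless as long as it still contains at least one captured pixel for every output pixel. The Python below is a minimal sketch under stated assumptions. The 11520-pixel stitched width is a hypothetical number picked so the result matches the 6x figure quoted in the interview, and treating field of view as shrinking linearly with zoom is a simplification.

```python
def max_lossless_zoom(source_width_px, output_width_px):
    """Largest zoom factor at which the cropped region still holds at
    least one captured pixel per output pixel (so no upscaling occurs)."""
    if source_width_px <= 0 or output_width_px <= 0:
        raise ValueError("widths must be positive")
    return source_width_px / output_width_px


def fov_after_zoom(full_fov_deg, zoom):
    """Approximate horizontal field of view left after zooming in,
    assuming the field of view shrinks linearly with the zoom factor."""
    if zoom < 1:
        raise ValueError("zoom must be at least 1")
    return full_fov_deg / zoom


# Hypothetical stitched panorama width; the output is 1080p (1920 wide).
zoom = max_lossless_zoom(source_width_px=11520, output_width_px=1920)
print(f"lossless zoom: {zoom:.0f}x")
print(f"view at that zoom: {fov_after_zoom(180, zoom):.0f} degrees of the 180")
```

With more captured pixels across the same view, or a lower-resolution output such as 720p, the same calculation gives a larger lossless factor, which is the "headroom" mentioned above.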

Evan Kirstel 22:21
So you’re like the Elon Musk of video conferencing? This is very impressive to hear.

Dave Michels 22:27
Nope, there’s no reason to call names.

Evan Kirstel 22:31
Well, that’s a nicer Elon Musk. What’s your background? Is it physics or AI or math or something else?

Aurangzeb Khan 22:36
Actually, a lot. I was a slow learner, right? So I went to school for a long time. I did a first degree in physics and mathematics, and then electrical engineering and computer sciences, and nuclear engineering. And then I chose to work in chip design.

Evan Kirstel 22:49
So, wow, you really took the easy path. Well, you’re good at numbers; maybe you could help Dave with his taxes as well, as he definitely could use some help.

Dave Michels 22:59
Nuclear, too. A lot of people in telecom have nuclear. Actually, your resume really is impressive. I was looking at LinkedIn: in addition to all that stuff you mentioned, you went to Stanford and Berkeley, and held several C-level positions. At what point did you realize that the pinnacle of your career was going to be video?

Aurangzeb Khan 23:19
You know, it’s one of those things; life is so wonderfully serendipitous, right? I had always been doing video stuff with the chips we did for Sony and others, and I just really loved the challenge of these massive-bandwidth, SIMD kind of problems: a massive amount of data coming through, going through a few simple instructions, and so on. Back in 2009, I had helped spin out Everspin from Freescale and ran the company for a while. I had pretty much always worked in Silicon Valley; that was the one time I was going to Arizona and back every week. I liked that company a lot. It’s done great. It’s gone ahead and created a unique place for itself in the industry and had a great IPO. But around that time I had been meeting these folks working on video, and I realized that I’m just drawn to hard problems that are worth solving. We figured that if we could solve this, the application of this technology is so wide. Collaboration is clearly the need of the moment. But if you look at the whole notion of ambient intelligence, where these technologies can allow us to have, for example, empathetic buildings, it can go into so many kinds of use cases. I was really drawn to that challenge: if we can figure this out, it’s going to help in so many ways.

Evan Kirstel 24:31
Nice. So I’m looking at your products. One of the benefits you’re citing here is vivid HDR, and HDR is kind of confusing. Is it a smartphone thing or a camera thing or a TV thing, or both? What is vivid HDR, and why is it a benefit?

Aurangzeb Khan 24:45
High dynamic range, basically. Because now, as people are using cameras everywhere, you might be in a situation where you’re backlit or side-lit, or you might be in an extreme range of luminance, basically.

Dave Michels 24:56
Evan’s definitely usually lit.

Evan Kirstel 24:59
Well, it is Massachusetts.

Aurangzeb Khan 25:03
As a result, high dynamic range is a technology that allows us to provide the best visual experience when you have a wide disparity in the amount of light coming into the video shot from different places. It’s a combination: in our camera pipeline, we have an image sensor SoC, an image signal pipeline processor, and then our own, and we combine the information across those chips to turn this on. So the core idea really is to help you get a better video experience in situations with extreme light variation.

Dave Michels 25:36
So I’ve been using the personal PanaCast; I guess that’s the original PanaCast. And all of a sudden you have this whole series of PanaCasts, the 50 and the 20, and you’ve moved to the room system. So tell us a little bit about how those are different from the personal one.

Aurangzeb Khan 25:52
Well, first of all, thank you. Hopefully it’s a good experience, and you like using the product. The main difference with the PanaCast 50 is this. We looked at the world today, and we said, you know, that original PanaCast is a great, versatile device. You can take it anywhere, you can do a fireside chat with it, you can do video collaboration, it’ll frame the shot nicely, and it’s tiny, as you know. With PanaCast 50, we had a lot of interest from customers saying they just wanted one thing to hang on the wall in the collaboration space that would provide great audio and video. And in the current environment, people are also very interested in data. Back even in 2017, we were using the cameras as AI sensors to create numerical information. For example, we can detect where a person is. Now, we don’t know who it is, and we don’t take or store any pictures or anything. But we can create a numerical count saying there are one, two, three, or five people visible in that 180 view. That data turns out to be really useful now. For example, maybe before you could have 12 people in a space, and now you can have five. With PanaCast 50, if the device has a count put into it to say this room should have a capacity of five, then if it sees more than five people, it can give you information in the room in real time, a visual cue or an aural cue, saying you’re over capacity. So we find that PanaCast 50’s ability to provide that information, to a network interface or to people in the room through the audio and video interfaces, is just a really good thing right now for people’s sense of safety and well-being.
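
[Editor’s illustration] The capacity check described above amounts to comparing an anonymous people count against a configured limit and raising a cue when the limit is exceeded. The Python below is a minimal sketch of that logic only. `RoomPolicy` and `occupancy_alert` are hypothetical names invented for illustration; they do not correspond to any Jabra device setting or API, and the counts are made-up inputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoomPolicy:
    capacity: int  # maximum number of people allowed in the room


def occupancy_alert(people_count: int, policy: RoomPolicy) -> Optional[str]:
    """Return a warning message if the anonymous head count exceeds the
    configured capacity, otherwise None. No identities are involved:
    the only input is a number."""
    if people_count < 0:
        raise ValueError("people_count cannot be negative")
    if people_count > policy.capacity:
        return (f"Over capacity: {people_count} people detected, "
                f"limit is {policy.capacity}")
    return None


policy = RoomPolicy(capacity=5)  # e.g. a room reduced from 12 seats to 5
for count in (3, 5, 6):
    print(count, occupancy_alert(count, policy))
```

In a real deployment the message would presumably drive the visual or audible cue in the room, or be reported over the network; that plumbing is outside the scope of this sketch.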

Evan Kirstel 27:23
Fantastic. And what about system support for the PanaCast, something like Microsoft Teams or other platforms we see?

Aurangzeb Khan 27:32
Yes, we work very closely with both of those platforms. We’re fully integrated with Zoom; actually, we have an API-level integration with Teams and Zoom. And it is a certified product with both those services; you know, it’s now shipping to customers. With Microsoft Teams, also, we’ve been looking ahead. On June 17 they announced Microsoft Teams’ vision for the future, what they are planning to do next, and we’re a part of that leadership announcement as products that enable that vision. So we’re very keen, and very closely collaborating with the service providers, because in the end, our products’ experience comes to people through those UC services.

Dave Michels 28:09
So those room systems work with Teams and Zoom, but the personal PanaCast that I have should work with those systems too. Is that correct?

Aurangzeb Khan 28:17
Yes. Yeah, exactly. And even the room system can work with many, many other services that we haven’t named here. But yes, indeed, it should. It’s a USB plug-and-play peripheral, so it should work, as the PanaCast does, with pretty much anything that you would like to use it with.

Dave Michels 28:31
Well, it seems to me like your work is done here at Jabra. Is there anything left to do?

Aurangzeb Khan 28:39
Oh, yes, plenty more to do. Thank God, you’re right. There’s always room for more engineering and more.

Evan Kirstel 28:43
For more cameras more. Yeah.

Aurangzeb Khan 28:45
There you go. So indeed, yes, there’s a lot more to do. Maybe I’ll just share one thought with you. With the power of the modern AI chips, you can start to infer information and take action. The core idea really is that great technology and great products fade into the background: they do the right thing, and you can focus on your conversation. So we think that there’s plenty more room to innovate in that space, in addition, of course, to the core audio and video capabilities of our devices.

Evan Kirstel 29:16
Well, thanks so much for joining us. We really enjoyed our conversation, and there was so much insight. We look forward to seeing you out in the real world at some point.

Aurangzeb Khan 29:23
Great catching up, Evan. Thank you, Dave. Looking forward to how this all comes out. And thank you for your interest.
