Picture This
The webcam conferencing of the 1990s was, from a functional perspective, little evolved beyond AT&T’s Picturephone of the 1960s. But what if video could be just as much about metadata as it is about pictures and sound? What if this metadata could help computers to interact with humans in more natural ways? This is the promise of perceptual computing, and it may be what “video” in the twenty-first century is truly about.
Tage Erlander, Swedish PM, uses the videophone to talk to Lennart Hyland, a popular TV show host, 1969. Source: Wikimedia Commons
Intel is serious about moving perceptual computing into the mainstream, but the company understands that rock star developers can greatly help the process by bringing their creative imagination and technical expertise to bear on today’s early perceptual computing building blocks. So when seven top developers made the cut for Intel’s Ultimate Coder Challenge: Going Perceptual and were told to go wild with the Intel® Perceptual Computing SDK, hopes ran high for amazing results. After seven grueling weeks, no one walked away disappointed.
Contest participant Lee Bamber took home the awards for Best Video and tied for Best Blog after his PerceptuCam application successfully married perceptual computing with videoconferencing. He demonstrated that what was a fanciful communication technology for wealthy enterprises only a few years ago is now within the reach of anyone.
PerceptuCam Main Screen
PerceptuCam: Form and Function
PerceptuCam paints a 3D virtual world centered on a conference table. Each participant operates a PC attached to a gesture camera, and the software digitizes each user into an avatar seated at the table. This blending of group videoconferencing and virtual reality would be innovative enough in its own right, but Bamber went further. Within the virtual space, users have access to a range of virtual objects that serve as analogs to real objects or functions. For example, the meeting room might have a “big screen display” at its back that users control with gestures, or a virtual file folder that can be tossed to another participant, transferring the folder’s file contents to that user in the process.
One of Bamber’s goals was to keep PerceptuCam free of any learning curve, so speech recognition figures into the interface. Rather than present users with an abundance of options or menus, the home screen simply offers two huge buttons: Host and Call. Head tracking through the gesture camera shifts the perspective so that users can look around the virtual room; no mouse or keyboard is necessary. So far, Bamber has created only one gesture control: users draw a square in front of the camera, then swipe a hand either to the left or right. On screen, the operator’s fingertip shoots out a burst of fireworks as a visual cue that an operation is in progress. Bamber designed this proof of functionality more as an experiment than as a fixed method for the future, but his point was clear: with perceptual computing, input can be as quick and visually engaging as it is effective and functional.
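Bamber has not published his head-tracking code, but the basic idea is straightforward: take the head position reported by the camera and map it onto a small rotation of the virtual camera. The sketch below assumes a head position normalized to the range -1 to +1 on each axis and applies light smoothing to damp jitter; the names, ranges, and constants are illustrative assumptions, not PerceptuCam’s actual implementation.

// Illustrative head-to-camera mapping, not PerceptuCam's actual code.
// headX/headY: head position from face tracking, normalized to -1..+1.
struct CameraPose { float yawDegrees; float pitchDegrees; };

CameraPose PoseFromHead(float headX, float headY)
{
    static float smoothX = 0.0f, smoothY = 0.0f;
    const float smoothing = 0.15f;   // low-pass filter to damp jitter
    const float maxYaw    = 30.0f;   // degrees the view may swing left/right
    const float maxPitch  = 15.0f;   // degrees the view may tilt up/down

    smoothX += (headX - smoothX) * smoothing;
    smoothY += (headY - smoothY) * smoothing;

    CameraPose pose;
    pose.yawDegrees   = smoothX * maxYaw;
    pose.pitchDegrees = smoothY * maxPitch;
    return pose;
}

The smoothing constant trades responsiveness for stability: a heavier filter feels steadier but makes the room appear to lag slightly behind the user’s head.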
In fact, Bamber had several opportunities to make the interface more complex, but he forced himself to keep things simple for Intel’s contest. “Because this was a teleconferencing app,” he noted, “I had to spend a good deal of time writing the networking stuff—stuff that allows two completely separate programs to talk to each other fast enough that you end up with this fluid conference experience. The other ideas I had, like pulling the document from the computer and turning it into a virtual object and then passing it around the table…that had to go. The floppy disk you see in some of the videos was sort of an early prototype to explore how sharing of a virtual object might work.”
Feeling fuzzy? There are plenty of artifacts here...besides the floppy disk.
Not surprisingly, PerceptuCam is a bit of a bandwidth pig. Bamber was thankful to have an Ultrabook™ device on hand for the system’s higher-end processing capabilities, although he worries he might have “melted it a bit.” (Each contest participant received an Ultrabook device with a Creative Interactive Gesture Camera Development Kit.) All other things being equal, Bamber prefers a wired network connection, which generally performs better under the application’s sustained bandwidth demands. He hopes that tablets will soon have the resources necessary to support a demanding 3D application such as his.
Challenges Addressed During Development
Like all of his competitors, Bamber found his top challenge in the seven-week schedule and the chokehold it put on his ability to deliver everything his imagination suggested. So many possibilities, so little time. In fact, apart from paying a third-party artist for three hours of work to create the final videoconferencing model, Bamber did all of the work on PerceptuCam himself while still carrying a day-job commitment. Most PerceptuCam work happened on weekends, sometimes in coding marathons up to 29 hours long.
But Bamber had another weakness that could have proven even more crippling: while he had ample experience with 3D programming, he had no background in perceptual computing. Fortunately, as a career programmer, Bamber understands the value of carrying a tool set custom fit to the craftsman. In 2000, Bamber’s company, The Game Creators, released a programming language called DarkBASIC, which let programmers code DirectX* games in the widely known and fairly simple BASIC language. DarkBASIC excels at 3D world creation, making it a great fit for designing a 3D conference space, but it completely lacked any perceptual computing capabilities. So Bamber added them.
“Because I had written the language, I could add anything I wanted to it,” said Bamber. “I added the theories of perceptual computing functions, so I could extend the functionality of the programming language. I added perceptual computing on top of all the science and capabilities I already had and got an even better toolkit.”
Interestingly, Bamber faced another problem not uncommon to pioneers in new tech fields: the lack of a common set of conventions. Consider universal routines such as right-click font controls or Ctrl-C for copying. Before such shortcuts existed, there was no quick way to perform those functions apart from taking the long road through the application menus. Similarly, today there are no commonly accepted gesture controls. How does one wave to bring up a command list? Nobody knows, because gesture libraries don’t exist yet.
Bamber didn’t attempt to create such a library. He was content to devise just one gesture—the square-and-swipe. But even getting that far required a significant amount of experimentation. He discovered that the gestures people tended to like the most had “instant visual cues” tied to them. This was why he devised the “fireworks” effect.
Fireworks! Cursor tracking never looked so good.
Bamber suggests that once you have the coordinates of the hand from the gesture functions, whether from the SDK or from your own implementation, you can run gesture detection on them. Here is Bamber’s source code for detecting when users swipe a hand across the camera’s field of view.
// Bamber's swipe detection is a small state machine. iNearestX[1] holds the
// horizontal position of the closest point in the depth frame (in practice,
// the user's hand); 160 appears to be the center column of a 320-pixel-wide
// frame. iSwipeMode tracks progress through a swipe across the frame, and
// iSwipeModeLives is a per-frame countdown that resets the gesture if the
// hand stalls between stages.
if ( bHaveDepthCamera )
{
    if ( iNearestX[1]!=0 )
    {
        // clear the completed-swipe flag each frame
        iNearestY[1] = 0;

        // stage 1: hand enters the right-hand zone (x > 240)
        if ( iSwipeMode==0 && iNearestX[1] > 160+80 )
        {
            iSwipeMode=1;
            iSwipeModeLives=25;
        }

        // stage 2: hand crosses into the center-right band (160..240)
        if ( iSwipeMode==1 && iNearestX[1] > 160+0 )
        {
            if ( iNearestX[1] < 160+80 )
            {
                iSwipeModeLives=25;
                iSwipeMode=2;
            }
        }
        else
        {
            // hand not where expected: burn a life, reset when exhausted
            iSwipeModeLives--;
            if ( iSwipeModeLives < 0 ) iSwipeMode=0;
        }

        // stage 3: hand crosses the center-left band (80..160)
        if ( iSwipeMode==2 && iNearestX[1] > 160-80 )
        {
            if ( iNearestX[1] < 160+0 )
            {
                iSwipeModeLives=25;
                iSwipeMode=3;
            }
        }
        else
        {
            iSwipeModeLives--;
            if ( iSwipeModeLives < 0 ) iSwipeMode=0;
        }

        // stage 4: hand exits on the left (x < 80) and the swipe is complete
        if ( iSwipeMode==3 && iNearestX[1] < 160-80 )
        {
            iNearestY[1] = 5;   // flags the completed swipe for the rest of the app
            iSwipeMode = 0;
        }
        else
        {
            iSwipeModeLives--;
            if ( iSwipeModeLives < 0 ) iSwipeMode=0;
        }
    }
}
Incorporating perceptual gesture commands doesn't have to be particularly complex or difficult. Bamber's swipe code is only a few dozen short lines.
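The iNearestX and iNearestY values that feed the state machine above can come from the SDK’s gesture data or from a hand-rolled scan of the raw depth frame. For readers curious about the second route, here is a minimal sketch of one way to locate the closest point in a 16-bit depth buffer; the function name and the zero-means-no-data convention are assumptions for illustration, not Bamber’s code.

#include <cstdint>

// Hypothetical helper: scan a depth frame and report the x/y position of the
// closest valid sample (in practice, the user's hand). Zero samples are
// treated as "no data," a common convention for depth maps.
bool findNearestPoint(const uint16_t* depth, int width, int height,
                      int& nearestX, int& nearestY)
{
    uint16_t nearest = UINT16_MAX;
    nearestX = 0;
    nearestY = 0;
    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            const uint16_t d = depth[y * width + x];
            if (d != 0 && d < nearest)
            {
                nearest = d;
                nearestX = x;
                nearestY = y;
            }
        }
    }
    return nearest != UINT16_MAX;
}

A production version would likely also reject isolated noisy pixels (for example, by requiring a minimum cluster of nearby samples around the candidate point), but a simple nearest-point scan is enough to drive the swipe detector.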
Gesture computing wasn’t built in a day, or even seven weeks, and Bamber found it expedient to fall back on touch screen technology in order to keep the sketch capabilities he wanted in his program. This approach proved trickier than expected.
“When you create a sketch in the app, you can see the sketch at the other end,” said Bamber. “Traditionally, that required a mouse, and everyone knew how to code for that. Then, when you go up to Windows* 8 and use touch on an Ultrabook screen, there are slightly different ways you have to handle it. Long story short, the mouse controls and touch don’t perfectly map. I had to go in and do some specific Windows 8 touch-related code in order to create a good sketch experience.”
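Bamber doesn’t publish his touch handling code, but on the Windows 8 Desktop the usual pattern is to register the window for raw touch input and handle WM_TOUCH messages instead of relying on the mouse events that Windows synthesizes from touches. The sketch below shows that pattern under those assumptions; AddSketchPoint is a hypothetical stand-in for the application’s own drawing routine.

#define _WIN32_WINNT 0x0601   // touch APIs require Windows 7 or later headers
#include <windows.h>
#include <vector>

// Ask Windows to deliver raw WM_TOUCH messages to this window rather than
// translating touches into mouse events.
void EnableTouchInput(HWND hwnd)
{
    RegisterTouchWindow(hwnd, 0);
}

// Called from the window procedure on WM_TOUCH: decode each touch point and
// feed it to the sketch layer.
LRESULT HandleTouch(HWND hwnd, WPARAM wParam, LPARAM lParam)
{
    const UINT count = LOWORD(wParam);                 // number of touch points
    std::vector<TOUCHINPUT> inputs(count);
    HTOUCHINPUT hTouch = reinterpret_cast<HTOUCHINPUT>(lParam);

    if (GetTouchInputInfo(hTouch, count, inputs.data(), sizeof(TOUCHINPUT)))
    {
        for (const TOUCHINPUT& ti : inputs)
        {
            // Touch coordinates arrive in hundredths of a pixel in screen
            // space; convert to pixels, then to client coordinates.
            POINT pt = { TOUCH_COORD_TO_PIXEL(ti.x), TOUCH_COORD_TO_PIXEL(ti.y) };
            ScreenToClient(hwnd, &pt);
            // AddSketchPoint(pt.x, pt.y);   // hypothetical drawing call
        }
    }
    CloseTouchInputHandle(hTouch);
    return 0;
}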
When Bamber talks about “melting” his Ultrabook, he’s really exposing a previously unknown (or at least unexplored) hiccup in the perceptual computing world. Obviously, two 30 frames-per-second camera streams (one for each of the two gesture camera lenses) require a formidable amount of bandwidth. While not as demanding, a voice input stream from a microphone adds more load. Then Bamber performed additional processing on the camera data, attempting to use eye motion as an input mode alongside head motion. All of this together proved too much—not necessarily for the Ultrabook device’s CPU but for the software stack’s bandwidth—and the voice stream would become too choppy for recognition to work. He dropped eye tracking in order to solve the problem and kept moving forward.
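Some rough arithmetic shows why the stack struggled. The stream formats below are back-of-envelope assumptions, not published specifications for the gesture camera, but they illustrate how quickly raw sensor data piles up before any application processing even begins.

// Illustrative raw data rates; the stream formats (720p 24-bit color, QVGA
// 16-bit depth, 16-bit mono audio) are assumptions, not published figures.
constexpr double colorBytesPerSec = 1280.0 * 720 * 3 * 30;  // ~83 MB/s
constexpr double depthBytesPerSec = 320.0 * 240 * 2 * 30;   // ~4.6 MB/s
constexpr double audioBytesPerSec = 44100.0 * 2;            // ~88 KB/s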
Bamber’s final challenge concerned one of his most intriguing program elements. In every PerceptuCam session, each user’s avatar is surrounded by “fuzzies,” which appear as quickly shifting, densely packed lines dancing around the edge of each participant. (Anyone who remembers the scene in Minority Report in which Tom Cruise’s character watches holographic home videos of his wife will find the effect strikingly similar.) The fuzzies are actually artifacts derived from infrared fluctuations picked up by the gesture camera. Bamber burned several fruitless hours trying to remedy the problem. Then observers reported that they actually liked the fuzzies, and the Intel contest judges agreed. Bamber gave way to public opinion and embraced the fuzzies as a feature (although he still believes they can be mitigated in software).
While Bamber believes there are software-based approaches that will mitigate the “fuzzies” around PerceptuCam avatars, many observers enjoy the effect, persuading Bamber to leave them in.
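One plausible software mitigation, sketched here purely for illustration and not as Bamber’s planned approach, is to smooth each depth pixel over time so that frame-to-frame infrared noise around the silhouette is damped before the avatar is rendered.

#include <cstdint>
#include <vector>

// Illustrative per-pixel temporal smoothing of a depth stream. An exponential
// moving average damps frame-to-frame infrared noise at the cost of a little lag.
class DepthSmoother
{
public:
    DepthSmoother(int width, int height)
        : m_accum(static_cast<std::size_t>(width) * height, 0.0f) {}

    // alpha near 1.0 favors the newest frame (less smoothing, less lag);
    // alpha near 0.0 favors history (more smoothing, more lag).
    void Smooth(const uint16_t* in, uint16_t* out, float alpha = 0.3f)
    {
        for (std::size_t i = 0; i < m_accum.size(); ++i)
        {
            if (in[i] == 0)               // 0 means "no depth data"; pass through
            {
                out[i] = 0;
                continue;
            }
            if (m_accum[i] == 0.0f)       // first valid sample for this pixel
                m_accum[i] = static_cast<float>(in[i]);
            m_accum[i] = alpha * in[i] + (1.0f - alpha) * m_accum[i];
            out[i] = static_cast<uint16_t>(m_accum[i]);
        }
    }

private:
    std::vector<float> m_accum;
};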
Lessons Learned, Advice Given
Bamber started from ground zero with perceptual computing, so detailing everything he learned across his seven-week coding adventure would require far more space than we have here. Fortunately, his award-winning blog series covers plenty of these advances, and we encourage readers to follow his Ultimate Coder Challenge: Going Perceptual journey.
One area Bamber singles out as particularly noteworthy is the Intel Perceptual Computing SDK. The SDK made available to contestants was still in beta when they adopted it, rough spots and all. But all participants understood the importance of helping to refine the beta SDK and improve it for the wider community. One of the things that marks Bamber as an experienced, capable coder is his ability to find SDK weaknesses, work around them, and, when possible, craft improvements.
“Don’t assume any SDK does everything,” he cautioned. “Also, don’t assume that because something is in the documentation, it’s actually working in the SDK. I think those two assumptions will save any developer weeks of pain. If you start writing your application on the belief that these features are going to be available when you need them, and then when you go to find them they’re not there, you’ll end up with a half-baked app, and it’ll be too late to turn back.”
Bamber’s personal best practices involve getting the SDK, then writing small, low-level prototypes to test for functionality. These become building blocks. He doesn’t begin to write the application proper until all of the needed building blocks have been developed and tested.
Another strategy point is to probe the SDK and become intimately acquainted with both its strengths and weaknesses. Even a beta SDK can still yield remarkable advantages.
“If you just get the depth data from the perceptual computing SDK, there are a million, million things you can do with that data. It’s a wonderful, undiluted, unfiltered capacity of data, coming straight from the user, 30 frames a second, in real time. Get really familiar with that depth data and you can do practically anything. It’s like the one ring that controls them all.”
Bamber offers one other bit of wisdom: Don’t think your application can handle everything. Especially in a new area like perceptual computing, it’s tempting to go in a dozen directions and try to tackle the entire field. Narrow your perceptual computing objectives, even to only one thing. Make sure it can be accomplished—and then do it well. Asking for more than that only increases the odds of frustration and project failure.
Looking Ahead
Camera resolution will soon improve, and Bamber already anticipates being able to isolate which PerceptuCam menu the user is looking at. If there is a menu on both the left and the right, for example, and the user is clearly looking at the left menu, then the speech recognition engine could restrict itself to the words contained in that menu, thus trimming the compute load. In the interim, Bamber wonders whether nose tracking might serve the same purpose.
Bamber has run a business for over a decade and notes that such virtual conferencing solutions have until now been accessible only to large enterprises with purchasing budgets “containing lots of zeroes.” With a gesture camera costing under USD 200, the Intel Perceptual Computing SDK, and a bit of elbow grease, developers can tear down yesterday’s price barriers and bring products like PerceptuCam to everyone. An industry-wide marketing push may be needed, but the audience is out there and waiting.
“Perceptual computing is really a jump,” said Bamber. “Other things—the development of DirectX 10 graphics cards, packing ten sensors into an Ultrabook—are really just extensions and combining things that already exist. Perceptual computing is absolutely new territory. The fact that you could almost know what people are going to do before they do it, every little gesture picked up by the computer…that’s exciting. I don’t think we’ve even seen 10 percent of what this technology is going to be able to achieve.”
Resources
The Ultimate Coder Challenge: Going Perceptual contestants were remarkably supportive of one another throughout the contest, and Bamber cites his interactions with them as one of the top resources that made his own accomplishments possible. He also relied on the Intel Perceptual Computing SDK, his own DarkBASIC Professional language, Microsoft Visual Studio* 2012, and C++ for adding perceptual computing elements to the DarkBASIC language.
Bamber notes that most SDK examples are in C++. Some are in C#. Fewer still are in Unity*. But he recommends C++ because that’s where most of the necessary resources currently reside. Similarly, he coded PerceptuCam for the Windows 8 Desktop since that’s what nearly his entire existing code base runs in. Coding for the more touch-centric Windows Runtime (“Metro”) environment would have meant starting from scratch.
Intel does not make any representations or warranties whatsoever regarding quality, reliability, functionality, or compatibility of third-party vendors and their devices. For optimization information, see software.Intel.com/en-us/articles/optimizationnotice/. All products, dates, and plans are based on current expectations and subject to change without notice. Intel, the Intel logo, and Ultrabook are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Copyright © 2013 Intel Corporation. All rights reserved.