
Augmented World

Augmented World is an animation app that allows you to create animation movies by augmenting yourself into the scene

This article is an entry in our App Innovation Contest. Articles in this sub-section are not required to be full articles, so care should be taken when voting.

Image 1 

Platform: All in One 
Category: Entertainment  
Coding Language: C#, WPF
Special Hardware: Creative Interactive Gesture Camera, Mic (optional)
Input Methods: Touch, Voice, Head Movement, Hand Gesture, Mouse, Keyboard  

Overview of the Application  


Download the App for free from here

Augmented World is the most amazing app you will ever use to entertain yourself and others while exploring your creativity. First, I will give a brief overview of what the app does:

1. Augmented World lets you capture real-world animation and store it as animation objects.

2. It gives you a Gallery option to create and manage all your animations.

3. An Animation Editor lets you build the animation timeline.

4. Record the animation as a story, then watch and share the movie!

So you will have backgrounds that you can navigate like a panorama, place animated characters in them, and animate the characters using event triggers, rules or input. You can augment yourself (insert your live stream) into the scene so that you are right there with the animated characters. Your face can be converted into unique animated facetoons (explained later). The app supports live voice recording, and you can make animation videos of any length by recording the scene. You can insert multiple characters at the same time and automate the animation, then view and share your creation. Turn your stories into reality.

Why an All-in-One?

What else? The app will be a mega saga of virtual reality. It is not like a game that you play and then move on from; it is one that lets you create. The wider the screen, the better the experience of creating and living in an augmented world becomes. But should that mean you are deprived of the fun of playing around with the characters with your friends? No! Never! A game becomes more fun when played with friends, and similarly, a creation becomes more entertaining when your friends join in. Detach the device, place it on a table, and play around with the characters. What is more amazing is that you can still plug in the camera so that you and your friends are all augmented into the scene. Can you imagine the fun of seeing yourself walking along a beach with butterflies moving around you?

Why a Gesture Camera?

The whole purpose is to entertain: to envision and conceive an idea that is feasible to implement and gives that ultimate user experience. The Gesture Camera and PerC together are essentially a beast of an entertainment platform if used wisely. People generally misunderstand the objective of the Gesture Camera as something you wave your hands at to control things. It is never a replacement for the mouse, but it is certainly a platform that enables a better experience. One innovation should always meet another to produce that wow experience.

Modes: 

Desktop Mode

In desktop mode, you can capture animation, compose movies, and work with gestures and voice. Imagine a wide screen in front of you: you just move your hands to control the characters and navigate the scene. Gesture gives the application an awesomeness that would really be missing without it. You can never enjoy a wide screen to the fullest if you sit close to it; you need to be a good half meter away to feel the realism, and gesture gives you that. But gesture is not always fun, as navigating the menus gets tougher, so voice comes in. If you prefer the conventional way, mouse and keyboard are always ready for you. The app engine is designed so that there is no half-done input support: everything you can do with the mouse can also be done with the keyboard, touch and gestures. So nobody is forced into using the app in a particular way; it performs exactly the way they want.

 Tablet Mode 

Tablet mode will be more of a composing mode where many users can sit around the device, use touch to move and animate the characters, record voice and play back the animation. Create and share animations that include you and your friends in the scene.

Image 2 

What if the User Does Not Have a Creative Camera? Can They Still Enjoy the App?

Yes, of course. The principal logic of augmentation here is head perspective. EmguCV is used for head tracking, and it can segment the face without any special camera, using just the device's front camera. So the Creative camera will never be "the most essential" part of the app; having it simply allows the user to acquire animation. Hundreds of puppet animations will come preloaded with the app, so capturing animation is optional. One of the major advantages of an AIO is that it supports multi-touch and multiple users out of the box, so many users can control the characters simultaneously. We use ffmpeg for audio and video recording. Thus even PerC is optional. Once I have the device, I can check the feasibility of every option and iterate the design accordingly.

So basically, the app is made purely with multi-touch and multiple users in mind. Augmentation works with or without PerC, but PerC complements the system.
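As a rough illustration of the head-tracking idea mentioned above, here is a minimal sketch of face detection with Emgu CV's Haar cascade classifier. The helper class and the cascade file name are assumptions for illustration only; the app's actual tracker may work differently.

```csharp
using System.Drawing;
using Emgu.CV;
using Emgu.CV.Structure;

public static class HeadTracker
{
    // Hypothetical helper: find a face rectangle in a frame from the front
    // camera using a Haar cascade. The cascade XML is assumed to ship with
    // the app; Augmented World's real tracker may differ.
    private static readonly CascadeClassifier FaceCascade =
        new CascadeClassifier("haarcascade_frontalface_default.xml");

    public static Rectangle? DetectFace(Image<Bgr, byte> frame)
    {
        using (Image<Gray, byte> gray = frame.Convert<Gray, byte>())
        {
            Rectangle[] faces = FaceCascade.DetectMultiScale(
                gray, 1.1, 10, new Size(30, 30), Size.Empty);
            return faces.Length > 0 ? faces[0] : (Rectangle?)null;
        }
    }
}
```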

I created the above image to give you a better visual understanding of the fun factor of the App.   

 

Intended Users 

Everybody loves animation and creation, but animation has been like a dark room: what comes out is always cheered, yet very few have the courage to go in. With physics, complicated technical modules and jargon, animation becomes a thing for pros. It needs a lot of patience and an understanding of art to get anywhere close to a finished result. But wait! Would you not enjoy making an animation movie if you were told you could learn it in two minutes? Would you not want to make that great puppet show and animation for your kids? Yes, all of us would. So Augmented World is targeted at every individual who loves to smile, who wants to share that smile, who wants to be creative. There is no age bar, as there is no learning curve. Augmented World is going to redefine animation and entertainment once and for all, and every user will enjoy it. Hence, from "eight" to "eighty", from school to home, it will be used everywhere.

 

Approach taken to develop the Application  

Image 3 

Overview of the functional blocks

Looks simple? Well, that is the overall objective of the app. Animation is loved by one and all, but most people can't animate because of the learning curve and the minute details involved. All of that is going to change with Augmented World.

The app is built on a powerful animation engine I developed to augment objects into the scene. We shall shortly see that any complicated theory or physics can be simplified if it is designed with the end user in mind rather than to show off the millions of things the concept can do.

 Animation Theory 

Everyone knows what an animation is, and most of us love animation because it is entertaining and gives us a chance to relive our childhood. In the early days, animation was perceived as sequential drawing, which is easily understood through the word "frame". When an object is displaced by a small distance from the previous frame and the pattern is continued over several frames, it creates an animation sequence. When the frames are displayed very quickly in front of the eye (the rate is technically known as frames per second), our mind fails to detect them as independent images and perceives them as a single moving entity. That is basic animation. Now, when the object changes not only within its frame but there is also a change in the positional vector of the frame, you have an animation timeline. For instance, to see an animation of a character raising one hand from rest to the top, you need to draw several "frames"; in each frame the hand will have to be drawn a little higher.

When you want the character to move or "walk across" the scene, merely redrawing the hand at a different position will not help; the character needs to be redrawn at a different part of the scene. When your character moves across the scene while also moving its hand, it is called a composite animation (a change within the frame as well as within the scene).

Now consider two characters approaching each other from two different parts of the scene. They are different characters, so in every picture you not only need to draw both of them, you also need to maintain a perfect sequence for each. That is a little too much work for the artist. Imagine if you could draw on see-through sheets (say, plastic or glass frames), draw the characters on separate sheets, and physically place them at different positions over the scene; you could then reuse the same sequence any number of times (when we move a hand, it is displaced a certain distance, comes back to the rest position, and then starts the same displacement again). Wouldn't you call such a sheet placed over the painted scene a layer? Absolutely, that is a layer. But there are two objects on that layer. What if you hold one sheet a little closer, higher in the stack? The object on it will look a bit bigger, so that is another layer. And what are the independent objects called? Cells.

So now one frame will contain several cells, each cell representing one distinct animation sequence; cells are positioned on different layers, and layers are placed over a scene. Several such sequential frames together produce an animation.

So we have learnt about frames, layers and displacement, as well as FPS and the timeline. These are the basic components of any animation.
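To make that vocabulary concrete, here is a purely illustrative object model in C# for cells, layers, frames and the timeline. The class and property names are hypothetical; the app's actual classes may be organised differently.

```csharp
using System.Collections.Generic;

// Illustrative only: a tiny object model for the terms introduced above.
public class Cell                 // one drawing of one character at one instant
{
    public string ImagePath { get; set; }          // e.g. a transparent PNG
}

public class Layer                // a see-through sheet holding cells at a depth
{
    public double Depth { get; set; }              // nearer layers are drawn bigger
    public List<Cell> Cells { get; } = new List<Cell>();
}

public class Frame                // one instant of the whole scene
{
    public List<Layer> Layers { get; } = new List<Layer>();
}

public class Timeline             // frames played back at a fixed rate
{
    public int FramesPerSecond { get; set; } = 24;
    public List<Frame> Frames { get; } = new List<Frame>();
}
```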

The theory reads so simply, so why is animation called a difficult art, and why are animation movies some of the biggest-budget movies? Try a simple trick yourself. Take a blank notebook. On the first page, make a simple drawing. On the second page, repeat the drawing with a small change, and continue until the last page. Now, if you flip the pages quickly, you will see your first animation coming to life. But how long did it take? Hours, or maybe days. And that is just a simple drawing. Imagine how long it takes to build a complex textured background and manipulate characters on top of it. Whether you are drawing digitally or with paint and brushes, it is time consuming. Hence animation movies typically cost thousands to millions of dollars per minute of animation, depending on the complexity of the scene.

This is the main reason not everybody can make entertaining animation: it needs time and a lot of patience, which most of us lack thanks to our stressful routines. But don't you worry! We are going to change that soon.

Make a black coffee, or better, open a can of beer to get started on this joy ride!

 

Introducing Augmented World 

1. Conceptualizing Animation  

Imagine a puppet in your hand that you want to animate. What would you do? You can do it with a process called "stop frame animation": place the object against a white background, move it a little at a time, and keep taking still photos of the puppet. Finally, combine the photos to build the complete animation. Following are four such stills of a character puppet my son calls "Poo".

 

 Image 4Image 5

Image 6Image 7

Do you find any trace or hint of animation in the Poo sequence? I can't, and neither will you. Now look at the image below.

Image 8

 

Yes, this is animation. It is not very professional because I cut a 27-frame sequence down to 4 frames, but it is good enough to give you the idea. Give your thoughts some wings: how much time do you guess this little animation would take with stop-motion photography? Shooting the stills, removing the background in Photoshop, making PNGs, and finally combining them? No idea? It would take anywhere from 30 minutes to two hours, depending on your skills. And how much time did it take me to capture this? 10 seconds. That's absolutely amazing, isn't it?

2. Augmenting Real-World Objects

Won't you be interested to know the algorithm? It is done with the Perceptual Computing Camera and the Intel Perceptual Computing SDK. The camera has two sensors: one is a regular RGB sensor, and the other is a masterpiece, a depth sensor. The depth sensor is an IR sensor containing an emitter and a receiver. The emitted wave is reflected off an object and comes back; the farther it travels before hitting the object, the more energy the light loses. Hence the receiver can calculate the depth, or distance from the camera, of every point on the object. Therefore, if we can find a simple mechanism for telling the system exactly how far in front of the camera we expect the object to be, we can remove everything else, namely the background. This technique is called background separation. But wait, did I tell you that the resolutions of the depth camera and the RGB camera are entirely different? A standard RGB stream returns a 640x480 frame, whereas the depth camera returns a 320x240 frame. Therefore, even if you use a distance threshold to cut off part of the depth frame and separate the background, the result alone is still useless, because a depth image does not carry any real-world visual data. Here is a depth map I captured from the depth camera.

 Image 9

So what do you do? Simple! Resize the depth image, lay it over the RGB image, and use it to cut the wanted portion out of the RGB image. I wish the theory were as simple as that; unfortunately, it is not. Why? Simply because the depth sensor is located about an inch away from the RGB sensor, just like our eyes, which is what gives us the sense of depth. Do a simple test: put an object in front of you, close one eye and observe it, then close the other eye and observe it, then check it with both eyes open. If you repeat this sequence quickly, you will realise that even the human brain perceives two images a couple of inches apart, and then maps them together with the most complicated algorithm. I wonder who wrote that!
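Before looking at the correction, here is a minimal sketch, under stated assumptions, of the naive resize-and-threshold cut just described: the 320x240 depth map is scaled up with a nearest-neighbour lookup, and every RGB pixel farther than a chosen distance is made transparent. The array layout and the helper are illustrative, and the sensor-offset correction discussed next is deliberately left out.

```csharp
// Illustrative sketch: cut the background by depth threshold.
// depth : 320x240 depth map in millimetres (0 = invalid reading)
// rgba  : 640x480 colour frame as BGRA bytes (4 bytes per pixel)
// Anything farther than maxDistanceMm becomes transparent.
// Note: no correction yet for the offset between the two sensors.
public static class BackgroundCutter
{
    public static void CutBackground(ushort[] depth, byte[] rgba, ushort maxDistanceMm)
    {
        const int dw = 320, dh = 240;   // depth resolution
        const int cw = 640, ch = 480;   // colour resolution

        for (int y = 0; y < ch; y++)
        {
            for (int x = 0; x < cw; x++)
            {
                int dx = x * dw / cw;                   // nearest-neighbour mapping
                int dy = y * dh / ch;
                ushort d = depth[dy * dw + dx];

                if (d == 0 || d > maxDistanceMm)        // invalid or too far away
                    rgba[(y * cw + x) * 4 + 3] = 0;     // clear the alpha channel
            }
        }
    }
}
```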

Coming back to the SDK, it does a fair job of this, as you will see in the image below. Image 10

Closely observe the larger image. Don't you see that the projection is deviated by several millimeters? This deviation is not constant: the closer you are to the centre of the image, the smaller it is, and the farther out you go, the larger it becomes. So if I cut the RGB image with this projection map, towards the right side of the image the result is pure rubbish data. Fortunately, there is physics to help. The light rays create an angle between the transmitter and the receiver. We know the distance between the sensors and the distance from the depth map, so the whole transmission and reception forms a triangle. Check out the symbolic image below.

Image 11   

The green elements are known and the red elements are unknown; now it is simple trigonometry. The projection image above falters because it assumes that the receiver is at the same position as the RGB sensor. If we displace the depth data by the angle calculated as above, it will be accurately mapped.
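As a rough, hedged sketch of that trigonometry: with the sensor baseline and the measured depth as the two known sides of the triangle, the deviation angle and the resulting pixel shift can be estimated as below. The focal length in pixels is an assumed calibration value, and the article's actual correction may differ in detail.

```csharp
using System;

// Illustrative parallax correction: how far (in pixels) a depth sample must
// be shifted before it is overlaid on the RGB image.
// baselineMm : known distance between the depth and RGB sensors
// depthMm    : distance of the point reported by the depth map
// focalPx    : RGB camera focal length expressed in pixels (from calibration)
public static class DepthMapping
{
    public static double ParallaxShiftPixels(double baselineMm, double depthMm, double focalPx)
    {
        double angle = Math.Atan2(baselineMm, depthMm);   // deviation angle of the ray
        return focalPx * Math.Tan(angle);                 // ≈ focalPx * baselineMm / depthMm
    }
}
```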

Image 12 

Yes! Now you have started visualizing the whole concept, haven't you? We can move an object in front of the camera and capture that view without any trace of background. What does that lead us to? We can emulate the concept of stop-motion animation purely with the camera's depth sensing: no Photoshop, no making PNG images by hand. With the basic idea conceived, let us now move ahead to the next section.

 

3. Layering   

We have exactly the same principle discussed in the first section: a scene, layers, frames and cells. First, we capture still shots in sequence while you animate an object or puppet in front of the camera. The entire sequence is saved in a folder and mapped into the main program through a simple XML database schema for quick and easy access; in memory it could be a simple list, an array, or a class with a few extra fields.
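A minimal sketch of how such a mapping could look, using XmlSerializer. The CapturedSequence class and its element names are hypothetical, not the app's actual schema.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical sequence descriptor: one captured animation (a folder of still
// PNGs) plus the metadata the engine needs to play it back.
public class CapturedSequence
{
    public string Name { get; set; }
    public string Folder { get; set; }                 // folder holding the stills
    public int FramesPerSecond { get; set; }
    public List<string> Frames { get; set; } = new List<string>();

    public void Save(string path)
    {
        var serializer = new XmlSerializer(typeof(CapturedSequence));
        using (var stream = File.Create(path))
            serializer.Serialize(stream, this);
    }

    public static CapturedSequence Load(string path)
    {
        var serializer = new XmlSerializer(typeof(CapturedSequence));
        using (var stream = File.OpenRead(path))
            return (CapturedSequence)serializer.Deserialize(stream);
    }
}
```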

At run time, the sequence is shown to the user frame by frame, depending on the speed. But there could be many such sequences, and they may not all be animating at the same point on the screen or at the same speed, so there should be a manager to monitor them. We will call this manager an AnimationBehevior. Before going deep into animation behaviour, I would like to elaborate a little on visual physics. An animation is supposed to be frames captured by a camera, so virtually, somewhere, there must be a camera! We assume that the camera is located at the bottom centre. Now, if an object moves away from the camera, it must appear smaller.

  Image 13 

See the picture above: my hand appears bigger than my face because it is closer to the camera. But while capturing animation, you will only be animating the object in x-y-z space and cannot actually vary its size, because the capture camera has a different perspective, or view, than the virtual movie camera.

 

So AnimationBehevior must have a mechanism to adjust the size of the object depending upon its position. Also, not all characters will be front facing; they may be rotated by an angle or shown side-on. When a character goes to the other end of the scene, you may want to mirror the view, so the manager should be able to produce both mirrored and non-mirrored images. See below for an idea.

Image 14  

The animation will never be continuous. Sometimes an animation event will be triggered by inputs like keyboard, touch, mouse or voice, and sometimes it will be autonomous. An autonomous animation is one where the object moves from one part of the scene to another by following a path, either one provided to it or one it calculates itself. As it moves, its distance from the camera will vary, so the manager must adjust its size automatically. Since paths are involved, the manager must also be able to remember the path it was fed. Each manager is an instance, so we run it as a background worker that keeps presenting its next frame, either by itself or in response to an event triggered by input or by another animation event. A minimal sketch of the idea is shown below.
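Here is an illustrative sketch of the manager just described; only the class name follows the article, and all members are assumptions. It holds the captured frames, scales the character by its distance from the virtual camera at the bottom centre, and advances frames on a timer or in response to an event.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch of the animation manager; member names are hypothetical.
public class AnimationBehevior
{
    public List<string> FramePaths { get; set; }  // captured sequence, in order
    public int CurrentFrame { get; private set; }

    public double X { get; set; }                 // position in scene coordinates
    public double Y { get; set; }                 // distance "into" the scene
    public double Width { get; set; }             // unscaled character size
    public double Height { get; set; }
    public bool Mirrored { get; set; }            // flip when walking the other way

    // Scale falls off with distance from the virtual camera at (cameraX, 0).
    public double ScaleFor(double cameraX, double referenceDistance)
    {
        double dx = X - cameraX;
        double distance = Math.Sqrt(dx * dx + Y * Y);
        return referenceDistance / Math.Max(distance, referenceDistance);
    }

    // Called by a timer/background worker, or by an input-triggered event.
    public string NextFrame()
    {
        CurrentFrame = (CurrentFrame + 1) % FramePaths.Count;
        return FramePaths[CurrentFrame];
    }
}
```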

4.  Scene Manager:

This is the component that controls the background. Huh? The background is supposed to be a simple image, right? Yes and no. The background is definitely an image, but not all of it is shown to the camera at once; it is like a panorama. Consider a large landscape: can you capture the whole of it in one shot? No, but if you move far enough away, you can. And when you go far away from a scene, things look smaller, right?

Image 15   

This is the part of the scene that the camera can see

Image 16

This is the entire scene. Observe carefully that the huts look smaller, because we have moved farther away than before. So you can now easily see that the manager must not only present the scene but also report the camera's distance from the scene as well as its displacement. Now, what is displacement? Look at the first image: do you see the sky? No! But in the full image, the sky is present. So if you want to see the sky and clouds without moving away from the scene, what do you do? Move the camera up. But remember, your coordinate system is not changing. The background is a layer, and if you move only the background, the foreground objects remain in the same position, which is a completely dead situation. Imagine some birds are flying, and you bring the camera down while the birds stay in the same screen position: what will you see? Birds on the ground. How do we tackle this? I hope you still have that beer mug in your hand. Put it on the table and look at it. Now, without changing your head position, stand up from your chair. Don't you see the object going down relative to your eyes? That is exactly the logic used by our SceneManager. It must display a part of the scene, and when you move the virtual camera, the change must be propagated to all objects, with the help of AnimationBehevior, so they update their relative scale and position. A sketch of this idea follows.
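This is a sketch of the propagation idea, reusing the hypothetical AnimationBehevior sketch above: when the virtual camera moves, every object is shifted the opposite way so its scale and position stay consistent with the background. The class and members are assumptions.

```csharp
using System.Collections.Generic;
using System.Windows;   // Rect

// Illustrative only: the background is bigger than the viewport, the virtual
// camera picks the visible region, and camera moves are pushed to the objects.
public class SceneManager
{
    private readonly List<AnimationBehevior> _objects = new List<AnimationBehevior>();

    public double CameraX { get; private set; }
    public double CameraY { get; private set; }
    public double ViewportWidth { get; set; }
    public double ViewportHeight { get; set; }

    public void Add(AnimationBehevior obj) => _objects.Add(obj);

    public void MoveCamera(double dx, double dy)
    {
        CameraX += dx;
        CameraY += dy;

        // Like the beer mug: stand up (camera goes up) and the mug appears
        // to move down, so objects shift opposite to the camera.
        foreach (var obj in _objects)
        {
            obj.X -= dx;
            obj.Y -= dy;
        }
    }

    // The region of the full background image that is drawn this frame.
    public Rect VisibleRegion()
    {
        return new Rect(CameraX, CameraY, ViewportWidth, ViewportHeight);
    }
}
```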

5. Special Effects:

Animations can get very boring if there are no special effects. Two characters moving across the scene and speaking in animated voices does nothing for the entertainment factor; special effects attract. Special effects are created using a concept well known as the particle effect. Most of you have heard of or used it; it is a treasure for game developers, but developers have rarely used it to as good effect in animation engines as in game engines. There are plenty of good animation apps in the Apple store, but very few use a comprehensive particle engine well. You want to simulate a rainy scene and there is no rain: how would that look? Or you have a scene where a house has caught fire and there is no fire to play with? Two characters meet and "fall in love" and there are no hearts circling them in the air? Would you be entertained as much as the animator wants you to be? Never. So we are cleverly using the Neon particle engine for these effects. Why Neon? It is a particle engine written specifically for WPF, inspired by the most famous particle engine for XNA, Mercury Particles. Most importantly, it separates rendering from the main engine. Particles are points that change with time along a predefined path that follows some physics; when appropriate brushes with textures or colours are used to paint those points, they cumulatively give a special effect. This drawing is called rendering. As we have our own layers and managers for drawing, any particle engine that draws on its own would be unusable. So what do we do? We let it draw into memory and copy the memory buffer right into the scene, with controls that let us set the position, speed and lifetime of the particles. A bare-bones sketch of the particle idea appears below.
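As a bare-bones illustration of the particle idea (this is not Neon's actual API), each particle is just a point with a velocity and a lifetime, updated every tick and later painted with a textured brush by the renderer.

```csharp
// Illustrative particle step; Neon's real engine is far more elaborate.
public struct Particle
{
    public double X, Y;     // position
    public double Vx, Vy;   // velocity per second
    public double Life;     // remaining lifetime in seconds
}

public static class ParticleSimulation
{
    public static void Update(Particle[] particles, double dt, double gravity)
    {
        for (int i = 0; i < particles.Length; i++)
        {
            particles[i].Vy   += gravity * dt;   // simple physics pulls rain/sparks down
            particles[i].X    += particles[i].Vx * dt;
            particles[i].Y    += particles[i].Vy * dt;
            particles[i].Life -= dt;             // emitter respawns particles whose Life <= 0
        }
    }
}
```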

 Have a look at some of the effects generated by Neon.  

 Image 17

6. Facetoons 

We are augmenting the live user into the scene along with the other animated characters, so that users feel they are in the scene rather than merely controlling it. That is perfect, but wouldn't you really want some more fun added? What's the fun if your face appears exactly as it is? The augmentation must be able to change and morph your face. The morphing is of two types: either you take a predefined character and put your facial features, like mouth, eyes and nose, onto the character, or you do the reverse, i.e. put a character's features onto your face's live feed.

 Image 18

The images are not really polished yet, because I am still working on the morphing algorithms; once they are perfected, these toons will make you smile more. The augmentation does not stop at morphing. It goes a step further and allows your face to be put on an animated character's shoulders, so the character animates with all the size, scale and light physics along with your live face.

Image 19 

This whole thing will get better with more and more polishing.

7. Character Control: 

This is where the entire engine is tested. You may want a character to keep animating automatically, to flip, or to fade away. How do you do that? You have multiple characters in the scene, so you need to select one. How do you select it? Simple: by double-tapping the character, or by hand gesture. As each character in the scene is an AnimationBehevior object, and that class stores the character's current scale and position, when you double tap the scene the tap point is checked against the boundary of each animation object, and when it matches an object, a menu is populated for that object. Through the menu you can define various actions such as automatic animation, moving along a path, or jumping. A sketch of the hit test follows.
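This is a sketch of that boundary check, building on the hypothetical AnimationBehevior sketch from earlier; the scaling parameters and helper class are assumptions.

```csharp
using System.Collections.Generic;

// Illustrative hit test: return the first character whose scaled bounds
// contain the double-tap point, or null if the tap hit empty scene.
public static class CharacterPicker
{
    public static AnimationBehevior HitTest(IEnumerable<AnimationBehevior> objects,
                                            double tapX, double tapY,
                                            double cameraX, double referenceDistance)
    {
        foreach (var obj in objects)
        {
            double scale = obj.ScaleFor(cameraX, referenceDistance);
            double w = obj.Width * scale;
            double h = obj.Height * scale;

            if (tapX >= obj.X && tapX <= obj.X + w &&
                tapY >= obj.Y && tapY <= obj.Y + h)
                return obj;                    // populate this character's menu
        }
        return null;
    }
}
```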

8. Interface: 

The app will use WPF and Metro design principles: a borderless design with a lot of flexibility in the layout is what is expected. It will have a four-pane layout, with the left side providing controls and assets to work with, the middle being the scene and augmentation pane, the bottom for frames and current character control, and the right pane for properties, as usual. WPF's Image control will be used for rendering rather than a Canvas. Rendering will be done in the background using GDI+ and DirectShow, and frames will be converted to a WPF-compatible BitmapImage.
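The GDI+-to-WPF hand-off is a standard pattern; here is a minimal sketch, assuming frames arrive as System.Drawing.Bitmap objects from the background renderer (the helper class name is illustrative).

```csharp
using System.Drawing;                  // GDI+ bitmap
using System.Drawing.Imaging;
using System.IO;
using System.Windows.Media.Imaging;    // WPF BitmapImage

// Standard pattern for handing a GDI+ rendered frame to a WPF Image control:
// encode the System.Drawing.Bitmap into a memory stream and load it into a
// frozen BitmapImage that can safely cross thread boundaries.
public static class FrameConverter
{
    public static BitmapImage ToBitmapImage(Bitmap frame)
    {
        using (var stream = new MemoryStream())
        {
            frame.Save(stream, ImageFormat.Png);
            stream.Position = 0;

            var image = new BitmapImage();
            image.BeginInit();
            image.CacheOption = BitmapCacheOption.OnLoad;  // copy the stream now
            image.StreamSource = stream;
            image.EndInit();
            image.Freeze();                                // usable on the UI thread
            return image;
        }
    }
}
```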

 

Other components of the App: 

Along with movie recording, the app will come with several utility features for voice recording, audio mixing, a playback gallery and so on.

 

Overall design 

Image 20 

For  purists who can't live without UML, the above class diagram should be helpful. It presents most of the main classes that would be needed for the project. 

Where is the code?   

All code and detailed algorithms will be discussed in round 2 of the contest.

Few of my accomplishments:  

* Finalist, App Innovation Contest 1 

* Second Prize winner in Codeproject Ultrabook Article Competition  

* Winner of three Second Prizes in the Perceptual Computing Challenge I

* Finalist, Perceptual Computing Challenge II (ongoing)

* Published several apps with Intel AppUp

* Experience of prototyping over 1000 products in Embedded, DSP, Image Processing, Biometrics and Robotics

* Author of over ten international publications

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)