You can't directly convert the distance between two pixels into real-world units.
You will need some more information which may be obtained in several ways:
- Assuming the person passes the camera roughly horizontally across the field of view (not walking away from or toward the camera), you can use the size of some object in the image that you know. E.g. if you know the length of the floor (visible in the image) the person is walking on, you can easily work out the corresponding walking distance. You may know this method from typical crime scene photographs, where a ruler of known size is placed next to the relevant objects.
- Using some kind of depth information and camera parameters: You can use the depth information of an image to calculate distances and dimensions of objects in the image. Given the distance to an object and the camera parameters (focal length), you can calculate the real-world dimensions of the object with a bit of geometry. This depth information may come from several sources: a depth camera will give you the depth directly, or you can calculate a depth map if you're looking at the same scene from different positions (e.g. with a stereo-camera setup). If you just have a video of a scene from one point of view, you can use consecutive frames to calculate the depth information. The latter is also known as structure from motion:
http://en.wikipedia.org/wiki/Structure_from_motion
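To illustrate the first approach (a reference object of known size), here is a minimal sketch. It assumes the reference object and the person lie at roughly the same distance from the camera and that lens distortion is negligible. The function names and all pixel coordinates are hypothetical and chosen only for illustration.

```python
import math

def metres_per_pixel(ref_length_m, ref_p1, ref_p2):
    """Scale factor derived from a reference object of known real-world length.

    ref_p1 and ref_p2 are the (x, y) pixel coordinates of the object's two ends.
    """
    ref_length_px = math.dist(ref_p1, ref_p2)
    if ref_length_px == 0:
        raise ValueError("reference end points must differ")
    return ref_length_m / ref_length_px

def pixel_distance_to_metres(p1, p2, scale):
    """Convert the pixel distance between p1 and p2 to metres using the scale."""
    return math.dist(p1, p2) * scale

# Assumed example values: a 5 m stretch of floor spans 1000 px in the image.
scale = metres_per_pixel(5.0, (100, 400), (1100, 400))

# The person's position in two frames (hypothetical pixel coordinates).
walked = pixel_distance_to_metres((300, 380), (700, 380), scale)
print(f"walked roughly {walked:.2f} m")
```

The scale is only valid in the plane where the reference was measured, so the estimate degrades quickly if the person moves toward or away from the camera.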
You should understand the pinhole camera model, which contains all the necessary variables:
http://en.wikipedia.org/wiki/Pinhole_camera_model
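For the depth-based approach, the geometry can be sketched as follows. It assumes an ideal pinhole camera without lens distortion, with the focal length expressed in pixels and a known depth for each pixel (e.g. from a depth camera or a stereo setup). The intrinsics `fx`, `fy`, `cx`, `cy` and all numbers below are made-up example values, not taken from any particular camera.

```python
import math

def back_project(u, v, depth_m, fx, fy, cx, cy):
    """Map pixel (u, v) with known depth to a 3D point in the camera frame.

    Pinhole model: u = fx * X / Z + cx  and  v = fy * Y / Z + cy,
    solved here for X and Y given Z = depth_m.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# Assumed intrinsics: focal length in pixels and principal point.
fx = fy = 800.0
cx, cy = 640.0, 360.0

# Two pixels, both measured at 4 m depth (hypothetical values).
p1 = back_project(440, 360, 4.0, fx, fy, cx, cy)
p2 = back_project(840, 360, 4.0, fx, fy, cx, cy)

# Euclidean distance between the two 3D points, in metres.
distance_m = math.dist(p1, p2)
print(f"real-world distance: {distance_m:.2f} m")
```

Unlike the reference-object method, this also works when the two points are at different depths, since each pixel is converted to a full 3D point before the distance is taken.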
There are of course many other ways if you can influence the recordings.