[Update March 2015]
I've released a major update to the engine and written a new article about it as well. Kindly view it at THIS link.
From now on, I will not answer questions asked on this article; kindly refer to the other version. V2 is much better, and most of the bugs you reported have been fixed.
Introduction
OMR (wiki) sheets are answer sheets that are intended to be read by a machine rather than by a human being. This project eliminates the need to buy an OMR reading machine, or even a photo scanner for the computer. Any >3MP mobile phone camera with autofocus will do the job.
Background
I Googled for a good OMR engine, but in vain, so I decided to make my own. Schools and colleges normally use specialized machines to read OMR answer sheets. In my case I needed to eliminate the need to buy an OMR reader machine, and even the scanner! Pictures taken with a 3MP mobile phone camera (autofocus required) can be read by this engine. As a first step, I have created my own sheet types that can be read.
The image processing part uses AForge.NET's image processing libraries (included in the download).
[FAQ] Q: 1 Will it work with other OMR sheets?
A lot of people have asked the same question repeatedly in the feedback:
"Can we use any other OMR sheet with this engine?"
And the answer is "NO".
Why? Because of the paper image extraction algorithm used in the method ExtractPaperFromFlattened(). It identifies the paper in a scanned image by finding the 4 crossed circular symbols located on the paper. THESE SYMBOLS DEFINE THE BOUNDARIES OF THE PAPER, AND FROM THEM THE CROPPING, RESCALING AND SKEWING OF THE PAPER ARE ESTIMATED.
So, no symbols, NO DETECTION of the paper.
[FAQ] Q: 2 Can I create my own OMR sheet?
In V1, you can't. At least not without my help, unless you are a super geek.
In V2, yes! View THIS link.
Using the Code
After adding all the references (AForge and OMR), you can use the simplest method overload to extract a warped (perspective-corrected) OMR sheet from a camera/scanner image.
A raw image must contain one clear view of one of the supported sheet formats (a printable PDF is in the download), e.g.:
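// Load the raw camera/scanner image (here taken from a panel's background).
Bitmap unf = new Bitmap(panel1.BackgroundImage);
OpticalReader reader = new OpticalReader();
// Extract the sheet using the A550 layout described in sheets.xml.
panel1.BackgroundImage = (System.Drawing.Image)reader.ExtractOMRSheet(unf,
    "sheets.xml", OMREnums.OMRSheet.A550);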
This extracts the sheet as shown here:
Once the sheet is extracted, you can process it using methods like:
OpticalReader rr = new OpticalReader();
// Read the registration number from the already extracted sheet.
MessageBox.Show(rr.getRegNumOfSheet(panel1.BackgroundImage,
    OMREnums.OMRSheet.A550, "sheets.xml", false).ToString());
Sheet Detection From Camera/Scanner Image
Detecting a sheet comes down to detecting its corners. In the printed document, the corners are marked with specific binary images. If we detect them, we have detected the sheet.
- First of all, we need to flatten the picture using the correct contrast, fill, threshold and invert filters. As a starting point, raw images with no contrast, brightness or fill correction are inverted. Given a threshold, the image is then converted to binary. This image is called a "Flattened Image" and is obtained by using the "OMR.OpticalReader.flatten" method.
- Once an image is flattened, blob detection starts. In the first stage, blobs of all sizes and kinds above a minimum blob size are detected (the minimum size removes blobs caused by noise grains).
- The left edge of the sheet is detected first, and then the right edge.
- Out of the hundreds of blobs detected in a picture, the first filter removes wrongly sized blobs by checking the ratio of each blob's size to the size of the camera/scanner image.
- In the second filter, blobs placed on the wrong side of the image are filtered out.
- In the third filter, blobs with wildly wrong aspect ratios are filtered out (this rejects the blobs produced by bends and lines on the sheet).
- As a last filter, all remaining blobs are compared against a mirrored corner image (mirrored because we inverted the image in the first step).
- The blobs that pass the filters are verified once more: there must be exactly four of them, they must lie on the correct sides of the sheet, and the left and right edges must not differ too much in length.
- The verified blobs represent the real locations of the sheet corners in the image coordinate system.
- The unflattened image can be cropped at these points and warped to produce a perfectly rectangular image, called the OMR Sheet.
- If the above filters yield exactly 4 corner blobs, the process continues. Otherwise, a recursive call is made to the same function with the same parameters but with an altered contrast correction, which may yield a better result.
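The flattening step from the first bullet (invert, then threshold to binary) can be sketched as follows. This is a minimal illustration rather than the engine's actual flatten method: it assumes the image is already an 8-bit grayscale byte[,] array, and the names FlattenSketch and Flatten are hypothetical. AForge.NET ships its own filters for this kind of work; plain arrays are used here only so the sketch runs without any references.

```csharp
using System;

public static class FlattenSketch
{
    // Inverts an 8-bit grayscale image and converts it to binary.
    // Assumption: gray[y, x] holds brightness, 0 = black, 255 = white.
    // After inversion, dark ink is bright, so marks end up as 255 (white)
    // and the paper background ends up as 0 (black).
    public static byte[,] Flatten(byte[,] gray, byte threshold)
    {
        int rows = gray.GetLength(0);
        int cols = gray.GetLength(1);
        var result = new byte[rows, cols];

        for (int y = 0; y < rows; y++)
        {
            for (int x = 0; x < cols; x++)
            {
                int inverted = 255 - gray[y, x];   // invert brightness
                result[y, x] = inverted >= threshold ? (byte)255 : (byte)0;
            }
        }
        return result;
    }
}
```

Calling Flatten again with a different threshold is a cheap stand-in for the "altered contrast correction" retry described in the last bullet.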
See the code, with line-by-line comments, in the "OMR.OpticalReader.ExtractOMRSheet" method.
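The size, aspect-ratio and "exactly four" checks from the list above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the Blob struct, the method names and the numeric limits are hypothetical and are not taken from the engine, and the comparison against the mirrored corner image is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical bounding box of a detected blob, in image coordinates.
public struct Blob
{
    public int X, Y, Width, Height;
    public Blob(int x, int y, int width, int height)
    {
        X = x; Y = y; Width = width; Height = height;
    }
}

public static class CornerFilterSketch
{
    // Keeps blobs whose width relative to the image width lies within
    // [minWidthRatio, maxWidthRatio] and whose aspect ratio is not extreme.
    // The limits are supplied by the caller; suitable values are an assumption.
    public static List<Blob> FilterCandidates(IEnumerable<Blob> blobs, int imageWidth,
        double minWidthRatio, double maxWidthRatio, double maxAspectRatio)
    {
        return blobs.Where(b =>
        {
            if (b.Width <= 0 || b.Height <= 0) return false;

            double widthRatio = (double)b.Width / imageWidth;
            if (widthRatio < minWidthRatio || widthRatio > maxWidthRatio) return false;

            // Long thin blobs (folds, ruled lines) have a large aspect ratio.
            double aspect = Math.Max(b.Width, b.Height) / (double)Math.Min(b.Width, b.Height);
            return aspect <= maxAspectRatio;
        }).ToList();
    }

    // Final verification: exactly four blobs, two in each half of the image.
    public static bool LooksLikeFourCorners(List<Blob> candidates, int imageWidth)
    {
        if (candidates.Count != 4) return false;
        int leftCount = candidates.Count(b => b.X + b.Width / 2 < imageWidth / 2);
        return leftCount == 2;
    }
}
```

When LooksLikeFourCorners returns false, the caller would retry with a different contrast or threshold setting, mirroring the recursive call described above.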
Reading the Extracted Sheet
Most of the image processing lies in the sheet extraction part. The next stage is to read the OMR sheet.
Normally, OMR sheets offer multiple choices for each question. All the options for the same question are printed side by side on the paper, forming a "block". The location, size and number of options of each block are all that needs to be saved to the XML file. Locations are recorded in the coordinate system usually followed in .NET, i.e. the upper left corner is the origin O(0,0), with the positive x axis pointing right and the positive y axis pointing down.
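To make the idea of a block description concrete, here is a small sketch that loads block definitions from XML. The element and attribute names (sheet, block, x, y, width, height, options) and the AnswerBlock class are hypothetical; the real layout of sheets.xml may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical description of one block: position, size, number of options.
public sealed class AnswerBlock
{
    public int X;        // left edge, measured from the upper left corner
    public int Y;        // top edge, positive y points down
    public int Width;
    public int Height;
    public int Options;  // number of choices printed side by side
}

public static class BlockSpecSketch
{
    // Parses every <block> element under the root into an AnswerBlock.
    public static List<AnswerBlock> Parse(string xml)
    {
        return XDocument.Parse(xml).Root
            .Elements("block")
            .Select(e => new AnswerBlock
            {
                X = (int)e.Attribute("x"),
                Y = (int)e.Attribute("y"),
                Width = (int)e.Attribute("width"),
                Height = (int)e.Attribute("height"),
                Options = (int)e.Attribute("options")
            })
            .ToList();
    }
}
```

For example, a block at (120, 340) that is 200 pixels wide with 4 options would be split into four 50-pixel-wide slices when it is read.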
To read the blocks in the sheet (the sheet here means the one extracted in the first section), OMR.OpticalReader.getScoreOfSheet can be called. This method executes the above procedure repeatedly to read all the lines in all 4 of the BigAnswerBlocks printed on the sheet.
Reading The Selected Option Out of the Given Choices
Once a multiple choice block has been sliced out of the document, it is time to read the selected choice. For reading, the block is divided into as many equal parts as there are options in it. The image is then converted to binary, using the mean color of the block as the threshold. This turns white paper into pure white and pixels that are more than half filled with ink into pure black.
The black pixel count is recorded for each subdivision of the block.
The darkest subdivision is compared with the others, and if there is a remarkable difference between them, the darker one is recorded as "Marked". Depending on the number of Marked choices, it can be decided which option was selected.
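The reading procedure above can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: the block is assumed to be an 8-bit grayscale byte[,] array, the "remarkable difference" is modelled as a simple ratio (minDarkRatio, a hypothetical parameter), and the names are made up for the example.

```csharp
using System;

public static class OptionReaderSketch
{
    // Returns the zero-based index of the marked option, or -1 when no
    // single option stands out (nothing marked, or several marked).
    public static int ReadMarkedOption(byte[,] block, int options, double minDarkRatio)
    {
        int rows = block.GetLength(0);
        int cols = block.GetLength(1);

        // Mean brightness of the whole block is used as the binarization threshold.
        long sum = 0;
        foreach (byte p in block) sum += p;
        double mean = (double)sum / block.Length;

        // Count "black" pixels (darker than the mean) in each equal-width slice.
        int sliceWidth = cols / options;
        var black = new int[options];
        for (int y = 0; y < rows; y++)
            for (int x = 0; x < sliceWidth * options; x++)
                if (block[y, x] < mean) black[x / sliceWidth]++;

        // Find the darkest slice.
        int darkest = 0;
        for (int i = 1; i < options; i++)
            if (black[i] > black[darkest]) darkest = i;
        if (black[darkest] == 0) return -1;   // blank block

        // The darkest slice must clearly beat every other slice.
        for (int i = 0; i < options; i++)
            if (i != darkest && black[darkest] < minDarkRatio * black[i]) return -1;

        return darkest;
    }
}
```

Returning -1 for both "nothing marked" and "several marked" keeps the sketch short; a real reader would probably want to distinguish the two cases.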
Note
Take a look at the other methods as well. They can read all the choices from a single paper sheet in one method call, and create the XML specification file for the two kinds of sheets provided.
The heart of the camera image sheet recognizer lies in the following method.
Points of Interest
The next step is to make an application that takes a folder full of answer sheets for a test conducted in a class of 50 students or more. Alternatively, it could take the address of a scanner attached to a PC and process the images one after another. Based on the registration number written on each sheet, the program should create an XLS file or a PDF so that the results are compiled entirely electronically.