Algorithms

3-Jan-07 6:04

I'm looking to convert a JPG image to a text file, or at least into some readable format, even if it's hex or ASCII. Basically, I want to convert the JPG to data and then compare a portion of it against another, similar data set. I'm really new to programming, so I'm not sure whether I'm asking for the impossible.

If anyone could give me some advice on this, it would be grand. The language I'm using is C++, but before I can even write the code I need to find the conversion method.

Ed.Poore 3-Jan-07 7:00

Please don't cross-post. :mad:

I have no idea what I just said. But my intentions were sincere.

Lowki 3-Jan-07 23:08

Ed.Poore wrote:
Are you wanting to compare parts of the original file (which may change due to compression etc) or are you trying to compare pixel colours etc?

Sorry about the cross-posting; I wasn't sure where the best place for this type of question would be.

But since you ask...

Yes, parts of the original against parts of a stored copy. The compression should be the same, since both will have the same format and size, and, if I have this right in my head, the vectors will cover a very specific portion of the image. Pixel colours won't help much, as it's all in black and white. At least until I get told otherwise, hahaha. :WTF:

eggsovereasy 3-Jan-07 10:45

A jpg IS data!

Is the jpg an image of text that you want to convert into editable text (OCR)?

Or do you just want to convert the raw binary data into a string?

Or do you just want to compare two JPGs?
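For the "raw binary data into a string" option, here is a minimal C++ sketch, assuming the goal is only to view the file's bytes as hex text. The names readBytes and toHex are made up for illustration; they do not come from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole file into memory as raw bytes.
// Returns an empty vector if the file cannot be opened.
std::vector<std::uint8_t> readBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}

// Render bytes as two-digit uppercase hex values separated by spaces.
std::string toHex(const std::vector<std::uint8_t>& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i != 0) out += ' ';
        out += digits[bytes[i] >> 4];    // high nibble
        out += digits[bytes[i] & 0x0F];  // low nibble
    }
    return out;
}
```

A JPEG file starts with the bytes FF D8, so toHex(readBytes("symbol.jpg")) should begin that way for a valid file (the file name is only an example). Keep in mind that this shows the compressed bytes, not pixel values.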

Lowki 3-Jan-07 23:01

eggsovereasy wrote:
Or do you just want to convert the raw binary data into a string?

Or do you just want to compare two JPGs?

A bit of both, really. I only want to use a portion of the data in the JPG, not all of it. If I can extract only the part I need, I can discard the unwanted data (possibly saving space and time) and use the converted data for my comparison.

I've found lots of information on how a JPG's data is structured, but not on how to convert it to a form that can be read. There is no text in the image.

Ed.Poore 3-Jan-07 23:04

Lowki wrote:
I've found lots of information on how a JPG's data is structured, but not on how to convert it to a form that can be read. There is no text in the image.

Can you explain (carefully) what you're trying to accomplish here? Do you want to perform OCR on the image, or are you just getting confused? :rolleyes:

Lowki 3-Jan-07 23:20

Confusion is always part of the problem; it's what makes it fun. OCR seemed to me to be a longer way around what I'm trying to do. I have samples of 106 images, each with a unique identifying symbol. I want to take the bit of data containing the symbol and compare it against a database of all the other symbols until it finds the one that matches. Recognition of sorts.

I thought (perhaps wrongly) that it would be simpler to convert the images to raw data, like ASCII, hex, or even just raw text, and cut out the bit that has the symbol in it. If I took another image of the same size, made it blank, and experimented to see where the data changes as lines are added to it, I could in theory work out which portion of the code I would need to cut out. I'm very new to all this, so it may not be feasible, which is why I'm asking the question.

Ed.Poore 3-Jan-07 23:44

To my understanding, this is not possible for JPEGs because of the nature of the compression. I seem to remember reading somewhere that it compresses in swirls rather than lines, and I don't think two images that look the same will look similar on disk once they're compressed.

So I think you'll have to decompress the whole image first. After that you could use either OCR or plain pattern recognition.

Lowki 4-Jan-07 0:37

Decompress? :WTF:

Ah... I see. Right then.

Ed.Poore 4-Jan-07 3:36

Understand now? :rolleyes:

Lowki 5-Jan-07 4:12

I get the gist of it. Decompressing a JPG seems to be a bit problematic. I've been looking for info on how that's done and it's proving a bit fruitless, so I'll just have to keep looking. I did manage to open the JPGs with a hex editor, but that's not going to help me much. As you said, the compression on JPGs is quite involved. If even one pixel is different, it changes the entire hex code aside from the first chunk, which I assume holds the size and thumbnail information. Beyond that, the image data in hex changes even if you re-save the same image without making any changes. I'm hoping that decompressing the JPG will give me a more standardised result; otherwise the number of variables is going to be enormous.

Ed.Poore 5-Jan-07 4:37

Take a look at http://www.ijg.org/; I think there are some links there to C code for decompressing JPEGs.

El Corazon 5-Jan-07 6:04

Lowki wrote:
I'm hoping that decompressing the JPG will give me a more standardised result; otherwise the number of variables is going to be enormous.

I guess that depends on how you do the comparison and what you are trying to compare. Comparing byte for byte under a lossy graphics compression is definitely problematic. However, pattern matching and image tracking/finding can be done if the sub-images you are looking for have high contrast and are few in number (that is, you are looking for a few possible symbols rather than every possible symbol in a plethora of possibilities). You can grade the matching results through a Bayesian or other fuzzy comparison and then pick the "best match", which is as much as you can hope for when dealing with JPEG (lossy) compression.
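As a rough illustration of grading matches and picking the best one, here is a C++ sketch. It assumes the symbol region has already been decompressed into 8-bit grayscale patches of identical size, and it uses a plain sum of absolute differences as the score, which is a much simpler stand-in for the Bayesian or fuzzy grading mentioned above. Patch, Match, sumAbsDiff and bestMatch are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

using Patch = std::vector<std::uint8_t>;  // grayscale pixels, one byte each

struct Match {
    std::size_t index;    // position of the best template in the database
    unsigned long score;  // sum of absolute differences (lower is better)
};

// Score two equally sized patches: 0 means pixel-for-pixel identical.
unsigned long sumAbsDiff(const Patch& a, const Patch& b) {
    assert(a.size() == b.size());
    unsigned long total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        total += static_cast<unsigned long>(std::abs(int(a[i]) - int(b[i])));
    }
    return total;
}

// Grade the query against every stored symbol and keep the lowest score.
Match bestMatch(const Patch& query, const std::vector<Patch>& database) {
    assert(!database.empty());
    Match best{0, sumAbsDiff(query, database[0])};
    for (std::size_t i = 1; i < database.size(); ++i) {
        const unsigned long score = sumAbsDiff(query, database[i]);
        if (score < best.score) best = Match{i, score};
    }
    return best;
}
```

With 106 stored symbols the database would hold 106 patches. In practice you would probably also reject a "best" match whose score is still too high, since the lowest score only means the least bad candidate.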

_________________________
Asu no koto o ieba, tenjo de nezumi ga warau.
Talk about things of tomorrow and the mice in the ceiling laugh. (Japanese Proverb)

Rilhas 30-Jan-07 12:39

Hi,

In general you can compare decompressed JPEG images. JPEG is a lossy compression scheme but, don't forget, its intent is to compress with minimal visual loss. So all you have to do is build a visual comparator.

The visual comparator is different from a normal comparator in the sense that it must tolerate local deviations. For example, you might consider two images the same if comparing their decompressed pixels yields an overall difference of 1% or less.
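One way to read the "1% overall difference" idea, sketched in C++: average the per-pixel absolute difference of two decompressed grayscale images and express it as a percentage of the full 0..255 range. The function names and the exact definition of "overall difference" are assumptions made for illustration, not something the post specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Mean absolute pixel difference as a percentage of the 0..255 range.
// 0.0 means identical; 100.0 means every pixel is black in one image
// and white in the other.
double percentDifference(const std::vector<std::uint8_t>& a,
                         const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size() && !a.empty());
    unsigned long long total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        total += static_cast<unsigned long long>(std::abs(int(a[i]) - int(b[i])));
    }
    return 100.0 * static_cast<double>(total) /
           (255.0 * static_cast<double>(a.size()));
}

// Treat two images as "the same" when the overall difference stays
// within the tolerance (1% by default, as in the example above).
bool looksTheSame(const std::vector<std::uint8_t>& a,
                  const std::vector<std::uint8_t>& b,
                  double tolerancePercent = 1.0) {
    return percentDifference(a, b) <= tolerancePercent;
}
```

The threshold would need tuning against real samples; 1% is only the figure used as an example here.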

The visual comparator should focus mainly on luminance (roughly, a weighted average of the three RGB components). This is because JPEG dedicates about 80% of the resulting bits to luminance and only about 20% to chrominance. So, in general, colour loses much more in JPEG than brightness does.
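A small C++ sketch of reducing an RGB buffer to luminance before comparing. It assumes interleaved 8-bit R, G, B bytes and uses the common 0.299/0.587/0.114 weighting in integer arithmetic; a plain average of the three components would also work as a rougher approximation. toLuminance is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert interleaved RGB (3 bytes per pixel) to one luminance byte per pixel.
std::vector<std::uint8_t> toLuminance(const std::vector<std::uint8_t>& rgb) {
    assert(rgb.size() % 3 == 0);
    std::vector<std::uint8_t> gray(rgb.size() / 3);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        const unsigned r = rgb[3 * i];
        const unsigned g = rgb[3 * i + 1];
        const unsigned b = rgb[3 * i + 2];
        // Weights are scaled by 1000; adding 500 rounds to the nearest integer.
        gray[i] = static_cast<std::uint8_t>((299 * r + 587 * g + 114 * b + 500) / 1000);
    }
    return gray;
}
```

If the blue component comes first in your decompressed data, swap the r and b reads accordingly.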

Another thing your comparator must take into account is that edges usually lose accuracy in JPEG. So where there are large pixel differences (edges), the comparator should tolerate a larger difference, not only for the pixel being compared but also for the surrounding pixels.

Also bear in mind that if your comparator is visual, you can even align misaligned JPEGs. The key when using decompressed JPEGs for analysis or comparison is to always be tolerant of local differences while paying more attention to (and being less tolerant of) wide-scale differences.
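Alignment can be sketched as a brute-force search in C++: try every small shift, score the overlapping area, and keep the shift with the lowest mean difference. This assumes two grayscale images of the same size and a small maximum shift; Shift and bestShift are hypothetical names, and real alignment code would likely be smarter than an exhaustive search.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

struct Shift {
    int dx;        // horizontal offset of b relative to a
    int dy;        // vertical offset of b relative to a
    double score;  // mean absolute difference over the overlap
};

// Compare a(x, y) with b(x + dx, y + dy) for every shift within
// [-maxShift, maxShift] and return the shift with the lowest score.
Shift bestShift(const std::vector<std::uint8_t>& a,
                const std::vector<std::uint8_t>& b,
                int width, int height, int maxShift) {
    assert(width > 0 && height > 0);
    assert(a.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    assert(b.size() == a.size());
    assert(maxShift >= 0 && maxShift < width && maxShift < height);

    Shift best{0, 0, std::numeric_limits<double>::max()};
    for (int dy = -maxShift; dy <= maxShift; ++dy) {
        for (int dx = -maxShift; dx <= maxShift; ++dx) {
            long long total = 0;
            long long count = 0;
            for (int y = 0; y < height; ++y) {
                for (int x = 0; x < width; ++x) {
                    const int bx = x + dx;
                    const int by = y + dy;
                    if (bx < 0 || bx >= width || by < 0 || by >= height) continue;
                    const std::size_t ia = static_cast<std::size_t>(y * width + x);
                    const std::size_t ib = static_cast<std::size_t>(by * width + bx);
                    total += std::abs(int(a[ia]) - int(b[ib]));
                    ++count;
                }
            }
            const double score = static_cast<double>(total) / static_cast<double>(count);
            if (score < best.score) best = Shift{dx, dy, score};
        }
    }
    return best;
}
```

Once the best shift is known, the tolerant comparison can be run on the overlapping region only.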

I hope this helps,

Rogério Rilhas

Rilhas 30-Jan-07 12:46

By the way: a JPEG can usually be decompressed to a raw bitmap format (a format where each pixel is represented by an 8-bit red intensity value, an 8-bit green value, and an 8-bit blue value). So a decompressed image of, say, 200x300 pixels will usually result in a flat, sequential array of 200x300x3 bytes.

To access the pixel at column 70 in line 20, you access the bytes at offsets (20*200+70)*3, (20*200+70)*3+1, and (20*200+70)*3+2. A luminance-only black and white decompression will usually result in a flat array of 200x300x1 bytes (one 8-bit grayscale intensity value per pixel). There are variations depending on whether line 20 is counted from the top or from the bottom. Also, the blue component may be the first of the 3 bytes rather than the last. A good JPEG decompressor will let you specify the resulting orientation and channel order.
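The offset arithmetic above, written out as a small C++ sketch. The helper names are made up for illustration, and the bottom-up variant assumes that line 0 of a bottom-up image is the last row stored in memory.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of the first component of the pixel at (column, line)
// in a flat, top-down array with `channels` bytes per pixel.
std::size_t pixelOffset(std::size_t width, std::size_t column,
                        std::size_t line, std::size_t channels = 3) {
    assert(column < width);
    return (line * width + column) * channels;
}

// Same lookup when lines are counted from the bottom of the image.
std::size_t pixelOffsetBottomUp(std::size_t width, std::size_t height,
                                std::size_t column, std::size_t line,
                                std::size_t channels = 3) {
    assert(line < height);
    return pixelOffset(width, column, height - 1 - line, channels);
}
```

For the 200x300 example, pixelOffset(200, 70, 20) gives 12210, so the three components sit at bytes 12210, 12211 and 12212; with one byte per pixel (grayscale) the offset is 4070.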

These can be considered raw formats, and comparisons can, thus, be made on a component-by-component basis.

Let me know if you need a way to decompress JPEGs to a raw bitmap.

Rogério Rilhas

Maximilien 5-Jan-07 5:40

I've been reading a bit of this topic, and I think you are taking the wrong approach to what you're trying to do.

I think you need to render the image to a bitmap (CImage, ...) and compare the resulting bitmaps, or only sections of them.
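Comparing "only sections" comes down to copying a rectangle out of the rendered bitmap first. A minimal C++ sketch, assuming the bitmap has already been turned into a flat 8-bit grayscale array stored top-down; cropGray is an illustrative helper, not part of CImage or any other library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy the rectangle starting at (x0, y0) with the given size out of a
// flat grayscale image that is `width` pixels wide.
std::vector<std::uint8_t> cropGray(const std::vector<std::uint8_t>& image,
                                   std::size_t width,
                                   std::size_t x0, std::size_t y0,
                                   std::size_t cropWidth, std::size_t cropHeight) {
    assert(width > 0 && image.size() % width == 0);
    const std::size_t height = image.size() / width;
    assert(x0 + cropWidth <= width && y0 + cropHeight <= height);

    std::vector<std::uint8_t> out;
    out.reserve(cropWidth * cropHeight);
    for (std::size_t y = y0; y < y0 + cropHeight; ++y) {
        // Each cropped row is a contiguous run inside the source row.
        const auto rowStart = image.begin() + static_cast<std::ptrdiff_t>(y * width + x0);
        out.insert(out.end(), rowStart, rowStart + static_cast<std::ptrdiff_t>(cropWidth));
    }
    return out;
}
```

The cropped patch for the symbol area can then be compared against the stored patches using whichever comparison you settle on.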

There are tons of image comparison algorithms on the web.

Maximilien Lincourt
Your Head A Splode - Strong Bad

Lowki 8-Jan-07 1:51

Thank you, that was simple and to the point. I appreciate it and will give it a shot.