Mark Salsbery
Microsoft MVP - Visual C++
|
|
|
|
|
|
Hi
I've got a jpg file which I'd like to read.
When using od in UNIX, I can get a hex version of its contents:
$ od -x file.jpg | head
0000000 d8ff e1ff 8711 7845 6669 0000 4949 002a
0000020 0008 0000 000b 010e 0002 0015 0000 0092
0000040 0000 010f 0002 0018 0000 00b2 0000 0110
0000060 0002 0005 0000 00d2 0000 0112 0003 0001
0000100 0000 0001 0000 011a 0005 0001 0000 00e2
0000120 0000 011b 0005 0001 0000 00ea 0000 0128
When using this simple program, most of the values I read have a strange value:
#include "stdafx.h"
#include <iostream>
#include <fstream>

using namespace std;
ifstream::pos_type size;
char * memblock;


int main ()
{
    ifstream file ("y:\\EXIF\\sanyo-vpcg250.jpg", ios::in|ios::binary);
    if (file.is_open())
    {
        size = file.tellg();
        memblock = new char [size];
        file.seekg (0, ios::beg);
        file.read (memblock, size);
        file.close();

        cout << "the complete file content is in memory";

        char x=memblock[0];
        delete[] memblock;
    }
    else cout << "Unable to open file";
    return 0;
}
The x value becomes 0xfd in the debugger (in vs.net using windows xp).
What's going wrong? And how can I get an output like the one using od?
-- modified at 19:58 Friday 28th September, 2007
Whoops, this should be moved to: http://www.codeproject.com/script/comments/forums.asp?forumid=1647
|
|
|
|
|
The first byte you should see is 0xFF.
Try changing the memblock type to BYTE (unsigned char) since you're
working in binary.
After the read() call, look at memblock in the debugger. Should be FF D8 FF E1...
Mark
Mark Salsbery
Microsoft MVP - Visual C++
|
|
|
|
|
Also
tellg() is probably returning 0.
To get the file length, you need to seek to the end before calling tellg...
...
file.seekg(0, ios_base::end);
size = file.tellg();
...
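A minimal sketch of that fix, for anyone following along. The helper name `readWholeFile` and the use of `std::vector` in place of `new[]`/`delete[]` are assumptions made for illustration, not part of the original program:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads an entire file into `out`.
// Returns false if the file cannot be opened or cannot be read completely.
bool readWholeFile(const std::string& path, std::vector<char>& out)
{
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file.is_open())
        return false;

    file.seekg(0, std::ios::end);                // move to the end first...
    const std::streamoff length = file.tellg();  // ...so tellg() reports the size
    if (length < 0)
        return false;

    out.resize(static_cast<std::size_t>(length));
    file.seekg(0, std::ios::beg);                // rewind before reading
    if (length > 0)
        file.read(&out[0], static_cast<std::streamsize>(length));
    return !file.fail();
}
```

Because the vector owns the buffer, there is no separate `delete[]` step to get wrong.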
Mark
Mark Salsbery
Microsoft MVP - Visual C++
|
|
|
|
|
That is indeed the solution.
thanks a lot
unsigned chars aren't to be used by read(), I guess (after having looked at the function's signature).
So that wasn't quite an option, I'm afraid.
|
|
|
|
|
GentooGuy wrote: unsigned chars aren't to be used by read(), I guess (after having looked at the function's signature)
Yes, but in binary mode, you really aren't dealing with char so a cast can be appropriate.
It depends on the data in the file....if it was really all char data then you probably wouldn't be
using binary mode.
Whatever works for you - in the end you're reading bytes and you'll need to cast them to
something else eventually
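One way the cast could look, as a rough sketch: keep the buffer as `unsigned char` and cast only at the `read()` call, since `istream::read()` takes a `char*`. The helper name `readBytes` is made up for this example.

```cpp
#include <cstddef>
#include <fstream>   // for std::ifstream callers; the helper accepts any std::istream
#include <istream>
#include <vector>

// Hypothetical helper: reads up to `count` raw bytes from a binary stream
// and returns only the bytes that were actually read.
std::vector<unsigned char> readBytes(std::istream& in, std::size_t count)
{
    std::vector<unsigned char> buffer(count);
    if (count > 0)
    {
        // read() wants char*, so the cast is confined to this one call.
        in.read(reinterpret_cast<char*>(&buffer[0]),
                static_cast<std::streamsize>(count));
        // gcount() is the number of bytes the last read() delivered.
        buffer.resize(static_cast<std::size_t>(in.gcount()));
    }
    return buffer;
}
```

With this shape, every value taken from the buffer is already in the range 0 to 255, so no further casts are needed when comparing against constants such as 0xFF.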
Cheers,
Mark
Mark Salsbery
Microsoft MVP - Visual C++
|
|
|
|
|
Okay thanks for the advice.
Currently, I'm having another small (I hope) problem:
#include "stdafx.h"
#include <iostream>
#include <fstream>

using namespace std;
ifstream::pos_type size;
char * memblock;

void swapByteOrder();
int readFile(char *filename);
void printFile(int nr);

int main ()
{
    if(!readFile("y:\\EXIF\\sanyo-vpcg250.jpg"))
    {
        cout <<"Some error occurred while opening the file"<< endl;
        return 1;
    }
    printFile(10);
    cout << endl;
    swapByteOrder();
    printFile(10);
    return 0;
}



int readFile(char *filename)
{
    ifstream file (filename, ios::in|ios::binary);
    if (file.is_open())
    {
        file.seekg(0, ios_base::end);
        size = file.tellg();
        memblock = new char [size];
        file.seekg (0, ios::beg);
        file.read (memblock, size);
        file.close();

        cout << "the complete file content is in memory\n";

        cout << "Size : "<< size << endl;
        for(int i=0;i<100;i++)
        {
            char x = memblock[i];
            cout << hex << (int)memblock[i]<<endl;
        }

        delete[] memblock;
    }
    else cout << "Unable to open file";
    return -1;
}


void swapByteOrder()
{
    long max = size;
    char temp;
    for(int i=0 ;i<max-2; i+=2)
    {
        temp=memblock[i];
        memblock[i]=memblock[i+1];
        memblock[i+1]=temp;
    }
}


void printFile(int nr)
{
    for(int i=0;i<nr ;i++)
    {
        cout << hex << memblock[i] << endl;
    }

}
The swapByteOrder function gets an access violation when reaching i==3992. This is strange because it should be able to run to 62096 (the length of the file, as indicated by size).
What's going wrong here?
|
|
|
|
|
I'm surprised it gets that far, since you delete memblock in readFile()
I'm curious....why are you reading bytes from a jpeg file as ints
cout << hex << (int)memblock[i]<<endl;
and why would you be swapping byte order? Are you trying to make the jpeg unreadable?
Actually, this whole loop doesn't make sense
for(int i=0;i<100;i++)
{
    char x = memblock[i];
    cout << hex << (int)memblock[i]<<endl;
}
You're indexing the array by bytes but casting to int (4 bytes)???
Mark
Mark Salsbery
Microsoft MVP - Visual C++
|
|
|
|
|
Well, I got confused too. I'm a Java developer (BSc in CS), but I'm getting quite stuck on this one.
Manual memory management isn't exactly my cup of tea.
Well, I want to retrieve some EXIF information from the file, and when not swapping the bytes (hey, I do NOT write the array to disk) I get the same output as when using od (UNIX tool for displaying files).
So I thought, I had a byte-order related problem.
od output:
0000000 d8ff e1ff 8711 7845 6669 0000 4949 002a
When running my own app, I found a ff first, then the d8, an ff, the e1, the 11, the 87. etc...
That's my reason to swap these bytes.
|
|
|
|
|
Hi,
A typical JPEG hex dump starts like this:
000000 FF D8 FF E0 00 10 4A 46 49 46 00 01 02 01 00 87
000010 00 87 00 00 FF ED 08 9E 50 68 6F 74 6F 73 68 6F
i.e. the very first byte is FF.
If you interpret that as a number of 16-bit words (as your od command seems to do)
then you would get D8FF E0FF etc. but that does not mean this is how you should look at it.
In fact JPEG coding is byte oriented, each FF XX pair of bytes marks the start of something
and may be preceded by an arbitrary number of FF bytes.
I suggest you:
- start by reading the JPEG standard, you can find it on the web;
- look at JPEG files with an unbiased tool, one that shows bytes, not larger integers.
BTW: if you read a JPEG file with Image.FromFile() the Image class will offer access
to a lot of metadata as well (e.g. GetPropertyItem() method)
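For a plain C++ route, the byte-oriented marker layout can be walked by hand. The following is only a sketch under a few assumptions: the function name `findExifTiffHeader` is invented here, each segment before the image data is taken to be `FF`, a marker byte, then a two-byte big-endian length that counts itself, and the Exif data is assumed to sit in an APP1 (`FF E1`) segment that begins with `Exif\0\0`, as in the dump at the top of the thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the offset of the first byte after the
// "Exif\0\0" identifier of an APP1 segment, or -1 if no such segment is
// found before the compressed image data starts.
long findExifTiffHeader(const std::vector<unsigned char>& d)
{
    if (d.size() < 4 || d[0] != 0xFF || d[1] != 0xD8)
        return -1;                                   // no SOI marker: not a JPEG

    std::size_t pos = 2;
    while (pos + 4 <= d.size())
    {
        if (d[pos] != 0xFF)
            return -1;                               // expected a marker here
        if (d[pos + 1] == 0xFF) { ++pos; continue; } // extra FF fill byte

        const unsigned char marker = d[pos + 1];
        if (marker == 0xDA || marker == 0xD9)
            return -1;                               // start of scan / end of image

        // Segment length is stored high byte first and includes its own 2 bytes.
        const std::size_t length =
            (static_cast<std::size_t>(d[pos + 2]) << 8) | d[pos + 3];
        if (length < 2)
            return -1;                               // malformed segment

        if (marker == 0xE1 && length >= 8 && pos + 10 <= d.size()
            && d[pos + 4] == 'E' && d[pos + 5] == 'x'
            && d[pos + 6] == 'i' && d[pos + 7] == 'f'
            && d[pos + 8] == 0 && d[pos + 9] == 0)
            return static_cast<long>(pos + 10);

        pos += 2 + length;                           // jump to the next marker
    }
    return -1;
}
```

For the sanyo file dumped earlier (`FF D8 FF E1 11 87 45 78 69 66 00 00 49 49 ...`), this walk would stop at offset 12, which is where the `49 49` bytes sit.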
Luc Pattyn [Forum Guidelines] [My Articles]
this week's tips:
- make Visual Studio display line numbers: Tools/Options/TextEditor/...
- show exceptions with ToString() to see all information
- before you ask a question here, search CodeProject, then Google
|
|
|
|
|
thanks for the info.
But the Image class is .NET based, and I just want plain C++ without MS-specific stuff.
|
|
|
|
|
It's your binary viewer utility that's swapping the bytes.
If you go through and swap bytes, you won't have a JPEG anymore.
If you want to see the actual bytes in order, change your byte viewer loop to
for(int i=0;i<100;i++)
{
    cout << hex << (int)(unsigned char)memblock[i] << endl;
}
And for your non-GC related issue - you don't want to use your array after you delete it
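The extra `(unsigned char)` cast in that loop matters because plain `char` is signed by default in Visual C++, so a byte such as 0xFF is the value -1 and gets sign-extended when converted straight to `int`. A small sketch of the difference follows; the function names `hexSigned` and `hexUnsigned` are made up for this example.

```cpp
#include <sstream>
#include <string>

// Formats a byte the way the original loop did: char converted directly to int.
// Where char is signed and int is 32 bits, 0xFF typically comes out as "ffffffff".
std::string hexSigned(char c)
{
    std::ostringstream s;
    s << std::hex << static_cast<int>(c);
    return s.str();
}

// Formats a byte via unsigned char first, so the value stays in 0..255
// and 0xFF comes out as "ff" regardless of whether char is signed.
std::string hexUnsigned(char c)
{
    std::ostringstream s;
    s << std::hex << static_cast<int>(static_cast<unsigned char>(c));
    return s.str();
}
```

Bytes below 0x80 print the same either way, which is why only some of the values looked strange.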
Mark
Mark Salsbery
Microsoft MVP - Visual C++
|
|
|
|
|
okay that's strange.
Thank you for the help, I really appreciate it.
|
|
|
|
|
it works fine now
thanks a lot!
|
|
|
|
|
I'm sorry, but I still think they're swapped.
My reason for this is the fact that I'm looking up 0x9003, which is a tag in a JPEG file indicating the date of the picture.
When using 'od', I see this:
$ od -x file.jpg | grep 9003
0000520 0004 0000 3230 3030 9003 0002 0014 0000
This is the only occurrence of '9003' in a file which does contain the information (so this must be the instance I'm looking for).
But, when running my program, and printing some lines, I get this output:
4
0
0
0
30
32
30
30
3
90
2
0
14
0
This is produced by the loop you've proposed. When comparing both outputs, I see this one has the bytes swapped when compared to od and (!) the exif standard. So I guess, od isn't wrong. Or am I indeed wrong?
|
|
|
|
|
I think we're confusing two different issues here...
First, your "od" is lumping pairs of bytes into 16-bit words and is assuming
little-endian byte order so, as you can see from the sample listings you've posted,
each pair of bytes appears swapped in the od-generated listing.
Second, you have to parse your file properly, depending on the byte order.
exif has some kind of tag to indicate whether multi-byte integers are stored
in "motorola" or "intel" order. This doesn't mean you can just go through the
entire file and swap every pair of bytes. This means when you encounter
multibyte-integer data in the file, you may need to swap bytes to work with
the data on your platform.
You need to parse the file bytes following its type and format. I don't have the jpeg/jfif
format memorized but it's well documented all over the internet
Again, the only swapping going on here is by your "od" utility. In your code you simply
have the bytes in the same order they occur in the file.
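To make the od effect concrete, here is a rough sketch that prints a byte sequence the way `od -x` does on a little-endian machine. The function name `wordsLikeOd` is invented for this example, and a trailing odd byte is simply ignored to keep it short.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: groups bytes into 16-bit words, treating the first
// byte of each pair as the low byte, and prints each word as 4 hex digits.
std::string wordsLikeOd(const std::vector<unsigned char>& bytes)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
    {
        // First byte of the pair is the low half, second byte the high half.
        const unsigned word = bytes[i] | (bytes[i + 1] << 8);
        if (i != 0)
            out << ' ';
        out << std::setw(4) << word;
    }
    return out.str();
}
```

Feeding it the file's first six bytes in their real order, `FF D8 FF E1 11 87`, yields `d8ff e1ff 8711`, which matches the first line of the od listing. The bytes in the file never moved; only the display grouped them.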
Mark
*edit* LOL I really meant "sample", not "ample"....ample sounded snotty LMAO
Last modified: 17mins after originally posted --
Mark Salsbery
Microsoft MVP - Visual C++
|
|
|
|
|
okay that sounds possible.
But when having a look at http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/EXIF.html
I can see the tags about the creation date are 0x9003 and 0x9004 (still have to decide which one to take).
When having a look at the jpg itself (which has a date and time of 01-01-1998), I see the strings about the date in a proper sequence... but there's no 9003. Just a 0390 before it.
I've looked it up in the exif documentation, and I haven't found anything about inverting the bytes on such markers. What am I missing?
|
|
|
|
|
GentooGuy wrote: but there's no 9003. Just a 0390 before it.
Where are you seeing that? In your od results? If so, then that IS 9003
because od is swapping every pair of bytes.
Mark Salsbery
Microsoft MVP - Visual C++
|
|
|
|
|
Nope, in the VS binary editor
|
|
|
|
|
I don't know what to tell you....
There are only three possibilities here:
1) The file was written incorrectly (not following specs)
2) There's a tag somewhere that indicates the byte order and you need to use it
3) You're interpreting the binary bytes incorrectly.
Which is it?
Mark
Mark Salsbery
Microsoft MVP - Visual C++
|
|
|
|
|
option 2.
I've read something about it yesterday, currently trying to find the document which described it.
|
|
|
|
|
JPEG is big-endian.
EXIF follows TIFF specs which can be big or little endian.
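This is determined by the first two bytes of the TIFF header (in a JPEG, the TIFF header
follows the "Exif\0\0" identifier in the APP1 segment; it is the very start of the file only
for a standalone TIFF):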
"II" (0x49 0x49) for little endian, "MM" (0x4D 0x4D) for big endian.
For a file with "MM" byte order, tags (and all other multi-byte fields
of tags) will need to be swapped on Intel machines.
For the tag 0x9003, I would expect the following storage in the file:
Big endian: 0x90 0x03
Little endian: 0x03 0x90
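A small sketch of how the byte-order mark could be applied when reading a tag. The names `ByteOrder`, `detectByteOrder` and `read16` are assumptions for illustration, and `pos` is whatever offset the caller has already located. Building the value with shifts means the code does not depend on the byte order of the machine it runs on.

```cpp
#include <cstddef>
#include <vector>

enum ByteOrder { LittleEndian, BigEndian, UnknownOrder };

// Interprets the two-byte order mark at `pos`: "II" (0x49 0x49) or "MM" (0x4D 0x4D).
ByteOrder detectByteOrder(const std::vector<unsigned char>& data, std::size_t pos)
{
    if (pos + 1 >= data.size())
        return UnknownOrder;
    if (data[pos] == 0x49 && data[pos + 1] == 0x49)
        return LittleEndian;
    if (data[pos] == 0x4D && data[pos + 1] == 0x4D)
        return BigEndian;
    return UnknownOrder;
}

// Assembles a 16-bit value from the two bytes at `pos` in the given order.
// The caller must make sure pos + 1 is inside the buffer.
unsigned read16(const std::vector<unsigned char>& data, std::size_t pos,
                ByteOrder order)
{
    const unsigned first = data[pos];
    const unsigned second = data[pos + 1];
    return order == BigEndian ? (first << 8) | second
                              : (second << 8) | first;
}
```

For the file in this thread, the mark is `49 49`, so the stored bytes `03 90` read back as 0x9003 with `LittleEndian`.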
Mark Salsbery
Microsoft MVP - Visual C++
|
|
|
|
|
thanks, I've got it working
|
|
|
|
|
BTW are you using Visual Studio? If so, open the jpeg file in the binary editor window.
It won't swap any bytes in the display.
File menu -> Open/File...
Select the file
Click the little drop arrow on the "Open" button and choose "Open with..."
Choose binary editor
Mark Salsbery
Microsoft MVP - Visual C++