Introduction
Ever wondered how to access the information in an AVI file, or wanted to extract information directly from a WAV file? Most people reference external COM objects (like AVIFIL32.DLL) to access the information (see A Simple C# Wrapper for the AviFile Library). The RIFF Parser lets you read the resource information stored in these files directly from C#.
Background
What is a RIFF file?
RIFF is the Resource Interchange File Format. It is a general-purpose format for exchanging multimedia data types, defined by Microsoft and IBM during their long-forgotten alliance.
RIFF File Format
A RIFF file consists of a RIFF header followed by zero or more lists (of chunks and other lists) and chunks (of data). For a specific example, see the description of the AVI RIFF form below.
The RIFF header has the following form:
'RIFF' fileSize fileType (data)
where 'RIFF' is the literal FourCC code 'RIFF', fileSize is a four-byte value giving the size of the data in the file, and fileType is a FourCC that identifies the specific file type. The value of fileSize includes the size of the fileType FourCC plus the size of the data that follows, but does not include the size of the 'RIFF' FourCC or the size of fileSize. The file data consists of chunks and lists, in any order.
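The header layout described above can be checked with a few lines of code. The following is a minimal sketch, not the RiffParser implementation: the class and method names are hypothetical, and it assumes a .NET version that provides InvalidDataException.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of the RiffParser class.
public static class RiffHeaderSketch
{
    // Reads the 12-byte RIFF header from the current stream position.
    // Returns the fileType FourCC as text. dataSize receives the number of
    // bytes of chunks and lists that follow the header, which is fileSize
    // minus the 4 bytes taken by the fileType FourCC.
    public static string ReadHeader(Stream stream, out int dataSize)
    {
        // BinaryReader decodes integers as little-endian, as RIFF requires.
        BinaryReader reader = new BinaryReader(stream);

        byte[] riff = reader.ReadBytes(4);
        if (riff.Length != 4 || Encoding.ASCII.GetString(riff) != "RIFF")
            throw new InvalidDataException("Not a RIFF file.");

        // fileSize excludes the 'RIFF' FourCC and the fileSize field itself.
        int fileSize = reader.ReadInt32();

        byte[] type = reader.ReadBytes(4);
        if (type.Length != 4)
            throw new InvalidDataException("Truncated RIFF header.");

        dataSize = fileSize - 4;
        return Encoding.ASCII.GetString(type);
    }
}
```

For a WAV file, a call such as RiffHeaderSketch.ReadHeader(stream, out dataSize) would be expected to return "WAVE", with dataSize holding the byte count to hand to an element-reading loop.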
A chunk has the following form:
ckID ckSize ckData
where ckID is a FourCC that identifies the data contained in the chunk, ckSize is a four-byte value giving the size of the data in ckData, and ckData is zero or more bytes of data. The data is always padded to the nearest WORD boundary. ckSize gives the size of the valid data in the chunk; it does not include the padding, the size of ckID, or the size of ckSize.
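Because ckSize excludes the padding, a reader has to round it up to an even number before moving on to the next element. A small sketch of that arithmetic follows; the class and method names are hypothetical and chosen only for illustration.

```csharp
// Hypothetical helper for illustration; not part of the RiffParser class.
public static class ChunkPaddingSketch
{
    // Rounds a chunk's data size up to the next WORD (2-byte) boundary.
    public static int PaddedSize(int ckSize)
    {
        return (ckSize + 1) & ~1;
    }

    // Given the file offset of a chunk's ckID, returns the offset of the
    // element that follows it: 4 bytes of ckID, 4 bytes of ckSize, then
    // the padded data.
    public static long NextElementOffset(long chunkOffset, int ckSize)
    {
        return chunkOffset + 8 + PaddedSize(ckSize);
    }
}
```

A chunk with 5 bytes of valid data therefore occupies 8 + 6 = 14 bytes in the file, while a chunk with 4 bytes of data occupies 12.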
A list has the following form:
'LIST' listSize listType listData
where 'LIST' is the literal FourCC code 'LIST', listSize is a four-byte value giving the size of the list, listType is a FourCC code, and listData consists of chunks or lists, in any order. The value of listSize includes the size of listType plus the size of listData; it does not include the 'LIST' FourCC or the size of listSize.
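Since lists can contain other lists, the chunk and list rules above lend themselves to a recursive walk. The sketch below is a stand-alone illustration and not the parser presented in this article: the class name, method name, and output format are hypothetical, and it assumes a seekable stream and a .NET version with generic collections.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-alone walker for illustration; not the RiffParser class.
public static class RiffWalkSketch
{
    // Walks bytesLeft bytes of chunks and lists starting at the reader's
    // current position and appends one indented line per element to output.
    public static void Walk(BinaryReader reader, long bytesLeft, int depth,
                            List<string> output)
    {
        while (bytesLeft >= 8)
        {
            long start = reader.BaseStream.Position;
            string id = Encoding.ASCII.GetString(reader.ReadBytes(4));
            int size = reader.ReadInt32();
            long padded = (size + 1L) & ~1L;   // data is WORD aligned

            if (size < 0 || 8 + padded > bytesLeft)
                throw new InvalidDataException("Element size exceeds its parent.");

            string indent = new string(' ', depth * 2);
            if (id == "LIST")
            {
                if (size < 4)
                    throw new InvalidDataException("LIST too small for listType.");
                string listType = Encoding.ASCII.GetString(reader.ReadBytes(4));
                output.Add(indent + "LIST '" + listType + "' (" + size + ")");
                // listSize covers listType (4 bytes) plus listData.
                Walk(reader, size - 4, depth + 1, output);
            }
            else
            {
                output.Add(indent + "'" + id + "' (" + size + ")");
            }

            // Jump to the next sibling whether or not the data was consumed.
            reader.BaseStream.Position = start + 8 + padded;
            bytesLeft -= 8 + padded;
        }
    }
}
```

To use it, position the reader just after the 12-byte RIFF header and pass fileSize minus 4 as bytesLeft.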
FourCCs
A FourCC (four-character code) is a 32-bit unsigned integer created by concatenating four ASCII characters. For example, the FourCC 'abcd' is represented on a little-endian system as 0x64636261. FourCCs can contain space characters, so ' abc' is a valid FourCC. The RIFF file format uses FourCC codes to identify stream types, data chunks, index entries, and other information.
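The 'abcd' example can be reproduced with plain shifts and masks. This is a sketch of one possible conversion, written for illustration; the class FourCCSketch is hypothetical, and the code is not claimed to match how the RiffParser class implements its own conversion methods.

```csharp
using System;

// Hypothetical helper for illustration; not the RiffParser implementation.
public static class FourCCSketch
{
    // Packs four ASCII characters into an int, first character in the
    // lowest byte, matching the little-endian layout described above.
    public static int ToFourCC(string code)
    {
        if (code == null || code.Length != 4)
            throw new ArgumentException("A FourCC needs exactly four characters.");
        return (code[0] & 0xFF)
             | ((code[1] & 0xFF) << 8)
             | ((code[2] & 0xFF) << 16)
             | ((code[3] & 0xFF) << 24);
    }

    // Unpacks the four bytes of a FourCC back into a string.
    public static string FromFourCC(int fourCC)
    {
        char[] chars = new char[4];
        chars[0] = (char)(fourCC & 0xFF);
        chars[1] = (char)((fourCC >> 8) & 0xFF);
        chars[2] = (char)((fourCC >> 16) & 0xFF);
        chars[3] = (char)((fourCC >> 24) & 0xFF);
        return new string(chars);
    }
}
```

With this sketch, FourCCSketch.ToFourCC("abcd") yields 0x64636261, and converting ' abc' to an int and back returns the same four characters, leading space included.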
What is the "AVI" file format?
AVI RIFF Form
AVI files are identified by the FourCC 'AVI ' in the RIFF header. All AVI files include two mandatory LIST chunks, which define the format of the streams and the stream data, respectively. An AVI file might also include an index chunk, which gives the location of the data chunks within the file. An AVI file with these components has the following form:
RIFF ('AVI '
LIST ('hdrl' ... )
LIST ('movi' ... )
['idx1' () ]
)
The 'hdrl' list defines the format of the data and is the first required LIST chunk. The 'movi' list contains the data for the AVI sequence and is the second required LIST chunk. The 'idx1' chunk contains the index. AVI files must keep these three components in the proper sequence.
Note: The OpenDML extensions define another type of index, identified by the FourCC 'indx'.
The 'hdrl' and 'movi' lists use subchunks for their data. The following example shows the AVI RIFF form expanded with the chunks needed to complete these lists:
RIFF ('AVI '
LIST ('hdrl'
'avih'(<MAIN AVI Header>)
LIST ('strl'
'strh'(<STREAM header>)
'strf'(<STREAM format>)
[ 'strd'(<ADDITIONAL header data>) ]
[ 'strn'(<STREAM name>) ]
...
)
...
)
LIST ('movi'
{SubChunk | LIST ('rec '
SubChunk1
SubChunk2
...
)
...
}
...
)
['idx1' (<AVI Index>) ]
)
For more information about the AVI format, see John McGowan's AVI Overview and the OpenDML AVI extensions.
What does the RIFF parser do?
Given a RIFF file, the parser iterates through the various elements in the file, calling your specific delegates when elements are encountered.
Two example programs are provided (both as Visual Studio .NET 2003 solutions):
- RIFFParserDemo: a console application that outputs all the elements in a given RIFF file.
- RIFFParserDemo2: a Windows application that examines RIFF files. If the file examined is an AVI or a WAV, the app displays additional information extracted from the RIFF elements.
Using the RIFF parser
First, create a new RiffParser object.
rp = new RiffParser();
Then, attempt to open the RIFF file.
rp.OpenFile(filename);
If no exception is thrown, the file is a valid RIFF file, and you can read its format and type information by accessing FileRIFF and FileType. Note that the file's RIFF format and file type are FourCC codes. To read the codes in string format, use the FromFourCC static method:
public static string FromFourCC(int FourCC)
For example:
txtFileFormat.Text = RiffParser.FromFourCC(rp.FileRIFF);
txtFileType.Text = RiffParser.FromFourCC(rp.FileType);
Once the file type is established, read the elements in the file using the ReadElement() method.
public bool ReadElement(ref int bytesleft,
ProcessChunkElement chunk, ProcessListElement list)
The ReadElement() method takes the following arguments:
- A ref int specifying the number of bytes left in the current data chunk (initially, the length of data in the file).
- A delegate to be called when a chunk element is encountered.
- A delegate to be called when a list element is encountered.
The method returns false when the end of data is reached.
Why is the bytesleft parameter passed by reference? ReadElement() reduces the byte count so that it always reflects the amount of data left in the current list or chunk. Passing the count by reference also lets the caller skip the rest of the data at this 'child' level and go on to read the next 'parent' level element.
An example using ReadElement():
int length = Parser.DataSize;
RiffParser.ProcessChunkElement pdc =
new RiffParser.ProcessChunkElement(ProcessAVIChunk);
RiffParser.ProcessListElement pal =
new RiffParser.ProcessListElement(ProcessAVIList);
while (length > 0)
{
if (false == Parser.ReadElement(ref length, pdc, pal)) break;
}
When done processing the file, call CloseFile().
Handling RIFF elements
Handling chunk data
public delegate void ProcessChunkElement(RiffParser rp, int FourCCType,
int unpaddedLength, int paddedLength);
When the ProcessChunkElement delegate is called, it receives four arguments:
- A reference to the RiffParser making the call.
- An int specifying the FourCC code for the chunk.
- Two ints specifying the unpadded and padded length of the chunk data. RIFF data is always WORD aligned, so even if the chunk contains an odd number of bytes, an even number of bytes must be skipped to access the next element.
The chunk data can either be read or skipped, depending on the circumstance.
Read a chunk:
if (AviRiffData.ckidAVIISFT == FourCC)
{
Byte[] ba = new byte[paddedLength];
rp.ReadData(ba, 0, paddedLength);
StringBuilder sb = new StringBuilder(unpaddedLength);
for (int i = 0; i < unpaddedLength; ++i)
{
if (0 != ba[i]) sb.Append((char)ba[i]);
}
m_isft = sb.ToString();
}
Skip a chunk:
rp.SkipData(paddedLength);
Handling LIST data
public delegate void ProcessListElement(RiffParser rp, int FourCCType, int length);
When the ProcessListElement() delegate is called, it receives three arguments:
- A reference to the calling RiffParser.
- An int specifying the FourCC code for the list.
- An int containing the length of the list data.
The list can then be skipped,
rp.SkipData(length);
or each element can be processed by calling ReadElement(), possibly with new delegates to handle the elements in the list.
RiffParser.ProcessChunkElement pnc =
new RiffParser.ProcessChunkElement(ProcessNestedChunk);
RiffParser.ProcessListElement pnl =
new RiffParser.ProcessListElement(ProcessNestedList);
while (length > 0)
{
if (false == rp.ReadElement(ref length, pnc, pnl)) break;
}
FourCC conversions
Four static methods are available to ease conversion from and to FourCC ints:
public static string FromFourCC(int FourCC)
public static int ToFourCC(string FourCC)
public static int ToFourCC(char[] FourCC)
public static int ToFourCC(char c0, char c1, char c2, char c3)
The method I use most is FromFourCC().
public static readonly int ckidAVIHeaderList = RiffParser.ToFourCC("hdrl");
public static readonly int ckidMainAVIHeader = RiffParser.ToFourCC("avih");
public static readonly int ckidODML = RiffParser.ToFourCC("odml");
public static readonly int ckidAVIExtHeader = RiffParser.ToFourCC("dmlh");
public static readonly int ckidAVIStreamList = RiffParser.ToFourCC("strl");
public static readonly int ckidAVIStreamHeader = RiffParser.ToFourCC("strh");
public static readonly int ckidStreamFormat = RiffParser.ToFourCC("strf");
public static readonly int ckidAVIOldIndex = RiffParser.ToFourCC("idx1");
public static readonly int ckidINFOList = RiffParser.ToFourCC("INFO");
public static readonly int ckidAVIISFT = RiffParser.ToFourCC("ISFT");
Unsafe and Fixed: are they needed?
RIFF files are binary files. Attempting to read RIFF files one character at a time carries a significant performance cost. The data structures stored in the files are designed to be loaded into memory and then referenced using fixed-size C structs. For example, an AVIMAINHEADER struct is defined as:
typedef struct _avimainheader {
FourCC fcc;
DWORD cb;
DWORD dwMicroSecPerFrame;
DWORD dwMaxBytesPerSec;
DWORD dwPaddingGranularity;
DWORD dwFlags;
DWORD dwTotalFrames;
DWORD dwInitialFrames;
DWORD dwStreams;
DWORD dwSuggestedBufferSize;
DWORD dwWidth;
DWORD dwHeight;
DWORD dwReserved[4];
} AVIMAINHEADER;
In C++ (or, with minor changes, C) you would write:
#include <istream>

void DecodeAVIHeader(std::istream& stream)
{
    // Read the raw bytes, then reinterpret them as the header struct.
    char data[sizeof(AVIMAINHEADER)];
    stream.read(data, sizeof(AVIMAINHEADER));
    AVIMAINHEADER* avi = (AVIMAINHEADER*)data;
    int totalFrames = avi->dwTotalFrames;
    ...
}
But in C#, in managed code, we cannot do such tricks. Are we limited to reading a single byte at a time and doing a lot of work to decode the data?
This is where fixed and /unsafe come in. The fixed keyword allows us to "fix" a piece of managed data in memory, guaranteeing that the data will not be moved or collected by the memory manager. Once the data is fixed in memory, pointers to the data can be (relatively safely) manipulated and the data directly accessed. fixed is like the Unix pin and unpin wrapped in a using statement. Using fixed requires compiling with the /unsafe switch (or setting "Allow Unsafe Code Blocks" to true in the Visual Studio project Configuration Properties page).
private unsafe void DecodeAVIHeader(RiffParser rp, int unpaddedLength, int length)
{
byte[] ba = new byte[length];
rp.ReadData(ba, 0, length);
fixed (Byte* bp = &ba[0])
{
AVIMAINHEADER* avi = (AVIMAINHEADER*)bp;
m_frameRate = avi->dwMicroSecPerFrame;
...
}
}
The managed data structure remains at the same memory location and is safe from collection as long as we are in the fixed block. Nothing is guaranteed once we leave the fixed block, so please do not keep any references to pointers or data that might no longer be there! Copy out the needed data and use the copy once outside the fixed block.
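For readers who cannot enable unsafe code, individual fields can also be decoded from the byte array with ordinary shifts. This is a different technique from the fixed approach used in this article, shown only as a sketch. The class and method names are hypothetical, and the offset of each field depends on whether the buffer includes the 8-byte chunk header, so the caller has to supply it.

```csharp
using System;

// Hypothetical helper for illustration; an alternative to unsafe/fixed,
// not the approach taken by the RiffParser class.
public static class SafeDecodeSketch
{
    // Decodes a little-endian DWORD starting at buffer[offset].
    public static uint ReadUInt32LE(byte[] buffer, int offset)
    {
        if (buffer == null)
            throw new ArgumentNullException("buffer");
        if (offset < 0 || offset > buffer.Length - 4)
            throw new ArgumentOutOfRangeException("offset");

        // The lowest byte comes first in the file, so it is shifted the least.
        return unchecked((uint)(buffer[offset]
            | (buffer[offset + 1] << 8)
            | (buffer[offset + 2] << 16)
            | (buffer[offset + 3] << 24)));
    }
}
```

This trades one method call per field for not needing the /unsafe switch; the fixed approach remains more convenient when many fields of a struct are needed at once.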
Reading RIFF data (file access)
Reading the RIFF header
m_stream = new FileStream(m_filename, FileMode.Open,
FileAccess.Read, FileShare.Read);
int FourCC;
int datasize;
int fileType;
ReadTwoInts(out FourCC, out datasize);
ReadOneInt(out fileType);
Reading a RIFF element.
int FourCC;
int size;
ReadTwoInts(out FourCC, out size);
...
string type = FromFourCC(FourCC);
if (0 == String.Compare(type, LIST4CC))
{
ReadOneInt(out FourCC);
if (null == list)
{
SkipData(size - 4);
}
else
{
list(this, FourCC, size - 4);
}
bytesleft -= size;
}
else
{
int paddedSize = size;
if (0 != (size & 1)) ++paddedSize;
if (null == chunk)
{
SkipData(paddedSize);
}
else
{
chunk(this, FourCC, size, paddedSize);
}
bytesleft -= paddedSize;
}
Reading two ints (note the use of the unsafe and fixed keywords):
public unsafe void ReadTwoInts(out int FourCC, out int size)
{
try {
int readsize = m_stream.Read(m_eightBytes, 0, TWODWORDSSIZE);
if (TWODWORDSSIZE != readsize) {
throw new RiffParserException("Unable to read. Corrupt RIFF file " +
FileName);
}
fixed (byte* bp = &m_eightBytes[0]) {
FourCC = *((int*)bp);
size = *((int*)(bp + DWORDSIZE));
}
}
catch (Exception ex)
{
throw new RiffParserException("Problem accessing RIFF file " + FileName, ex);
}
}
A basic RIFF parser
Following is the complete source code for a simple parser which displays all the elements in a RIFF file:
using System;
using System.Text;
namespace RiffParserDemo
{
class RiffParserDemo
{
static void Main(string[] args)
{
RiffParser rp = new RiffParser();
try
{
string filename = @"C:\Program Files\Microsoft" +
" Visual Studio .NET 2003\Common7\Graphics\videos\BLUR24.avi";
if (0 != args.Length)
{
filename = args[0];
}
rp.OpenFile(filename);
Console.WriteLine("File " + rp.ShortName +
" is a \"" + RiffParser.FromFourCC(rp.FileRIFF)+
"\" with a specific type of \"" +
RiffParser.FromFourCC(rp.FileType) + "\"");
int size = rp.DataSize;
RiffParser.ProcessChunkElement pc =
new RiffParser.ProcessChunkElement(ProcessChunk);
RiffParser.ProcessListElement pl =
new RiffParser.ProcessListElement(ProcessList);
while (size > 0)
{
Console.Write(RiffParser.FromFourCC(rp.FileType) +
" (" + size.ToString() + "): ");
if (false == rp.ReadElement(ref size, pc, pl)) break;
}
rp.CloseFile();
Console.WriteLine();
}
catch (Exception ex)
{
Console.WriteLine("-----------------");
Console.WriteLine("Problem: " + ex.ToString());
}
Console.WriteLine("\r\nDone. Press 'Enter' to exit.");
Console.ReadLine();
}
public static void ProcessList(RiffParser rp, int FourCC, int length)
{
string type = RiffParser.FromFourCC(FourCC);
Console.WriteLine("Found list element of type \"" +
type + "\" and length " + length.ToString());
RiffParser.ProcessChunkElement pc =
new RiffParser.ProcessChunkElement(ProcessChunk);
RiffParser.ProcessListElement pl =
new RiffParser.ProcessListElement(ProcessList);
try {
while (length > 0) {
Console.Write(type + " (" + length.ToString() + "): ");
if (false == rp.ReadElement(ref length, pc, pl)) break;
}
}
catch (Exception ex)
{
Console.WriteLine("Problem: " + ex.ToString());
}
}
public static void ProcessChunk(RiffParser rp,
int FourCC, int length, int paddedLength)
{
string type = RiffParser.FromFourCC(FourCC);
Console.WriteLine("Found chunk element of type \"" +
type + "\" and length " + length.ToString());
rp.SkipData(paddedLength);
}
}
}
Extras
The file AviRiffData.cs contains C# compatible definitions for many AVI and WAV data structures. The file also contains many FourCC constants used in AVI and WAV files.
History