C# RIFF Parser

gtamir

6 Jun 2005

Decode Resource Interchange Files (AVI, WAV, RMID...) using this pure C# parser.

[Figure: RiffParserDemo2, the Windows App demo]

[Figure: RiffParserDemo, the console app demo]

Introduction

Ever wondered how to access the information in an AVI file, or wanted to extract information directly from a WAV file? Most people reference external COM objects (like AVIFIL32.DLL) to access the information (see A Simple C# Wrapper for the AviFile Library). The RIFF Parser allows you to access the resource information stored in the files directly from C#.

Background

What is a RIFF file?

RIFF is the Resource Interchange File Format, a general-purpose format for exchanging multimedia data types. It was defined by Microsoft and IBM during their long-forgotten alliance.

RIFF File Format

A RIFF file consists of a RIFF header followed by zero or more lists (of chunks and other lists) and chunks (of data). For a specific example, see the description of the AVI RIFF form below.

The RIFF header has the following form:

'RIFF' fileSize fileType (data)

where 'RIFF' is the literal FourCC code 'RIFF', fileSize is a four byte value giving the size of the data in the file, and fileType is a FourCC that identifies the specific file type. The value of fileSize includes the size of the fileType FourCC plus the size of the data that follows, but does not include the size of the 'RIFF' FourCC or the size of fileSize. The file data consists of chunks and lists, in any order.
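To make the header layout concrete, here is a minimal sketch that reads the three header fields from any stream. It is not the RiffParser implementation; the class and method names (RiffHeaderSketch, ReadHeader, BuildSample) are made up for this illustration, and the sample file is built in memory so the snippet does not depend on a file on disk.

```csharp
using System.IO;
using System.Text;

public static class RiffHeaderSketch
{
    // Hypothetical helper: reads 'RIFF', fileSize and fileType from the
    // first 12 bytes of the stream.
    public static void ReadHeader(Stream stream, out int fileSize, out string fileType)
    {
        BinaryReader reader = new BinaryReader(stream, Encoding.ASCII);

        string riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
        if (riff != "RIFF")
            throw new InvalidDataException("Not a RIFF file");

        // BinaryReader always reads little-endian, which matches RIFF.
        // fileSize counts the fileType FourCC plus the data that follows.
        fileSize = reader.ReadInt32();
        fileType = Encoding.ASCII.GetString(reader.ReadBytes(4));
    }

    // Builds the smallest possible RIFF image: a header and no data.
    public static byte[] BuildSample()
    {
        MemoryStream ms = new MemoryStream();
        BinaryWriter writer = new BinaryWriter(ms);
        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        writer.Write(4);                                // only the fileType follows
        writer.Write(Encoding.ASCII.GetBytes("WAVE"));
        writer.Flush();
        return ms.ToArray();
    }
}
```

For the sample above, ReadHeader reports a fileSize of 4 and a fileType of "WAVE". The whole image is 12 bytes long, 8 more than fileSize, because the 'RIFF' FourCC and the size field itself are not counted.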

A chunk has the following form:

ckID ckSize ckData

where ckID is a FourCC that identifies the data contained in the chunk, ckSize is a four-byte value giving the size of the data in ckData, and ckData is zero or more bytes of data. The data is always padded to the nearest WORD (two-byte) boundary. ckSize gives the size of the valid data in the chunk; it does not include the padding, the size of ckID, or the size of ckSize.
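The padding rule is easy to get wrong, so here is a small sketch of the arithmetic. The helper names (ChunkPaddingSketch, PaddedSize, TotalChunkBytes) are hypothetical and are not part of the RiffParser class.

```csharp
public static class ChunkPaddingSketch
{
    // Rounds ckSize up to the next even number (WORD alignment).
    public static int PaddedSize(int ckSize)
    {
        return ckSize + (ckSize & 1);
    }

    // Bytes the whole chunk occupies in the file:
    // 4 (ckID) + 4 (ckSize) + the padded data.
    public static int TotalChunkBytes(int ckSize)
    {
        return 8 + PaddedSize(ckSize);
    }
}
```

A chunk with ckSize 5 therefore carries 6 bytes of stored data and occupies 14 bytes in the file, while a chunk with ckSize 6 needs no padding and also occupies 14 bytes.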

A list has the following form:

'LIST' listSize listType listData

where 'LIST' is the literal FourCC code 'LIST', listSize is a four byte value giving the size of the list, listType is a FourCC code, and listData consists of chunks or lists, in any order. The value of listSize includes the size of listType plus the size of listData; it does not include the 'LIST' FourCC or the size of listSize.
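Because lists can contain chunks and other lists, the natural way to traverse the data is recursively. The following sketch walks an in-memory buffer that holds the data following the 12-byte RIFF header and records one line per element. It only illustrates the layout rules described above; it is not how RiffParser is implemented, and RiffWalkSketch and its methods are names invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RiffWalkSketch
{
    // Reads a little-endian 32-bit value regardless of host byte order.
    static int ReadInt32LE(byte[] data, int offset)
    {
        return data[offset] | (data[offset + 1] << 8)
             | (data[offset + 2] << 16) | (data[offset + 3] << 24);
    }

    // Walks the elements in data[offset .. offset + length) and appends one
    // line per element to 'output', indented two spaces per nesting level.
    public static void Walk(byte[] data, int offset, int length, int depth, List<string> output)
    {
        int end = offset + length;
        string indent = new string(' ', depth * 2);

        while (offset + 8 <= end)
        {
            string id = Encoding.ASCII.GetString(data, offset, 4);
            int size = ReadInt32LE(data, offset + 4);
            if (size < 0 || size > end - offset - 8)
                throw new FormatException("Element size runs past its parent");

            if (id == "LIST")
            {
                // listSize includes the 4-byte listType, so listData is size - 4
                if (size < 4) throw new FormatException("LIST too short for listType");
                string listType = Encoding.ASCII.GetString(data, offset + 8, 4);
                output.Add(indent + "LIST " + listType + " " + size);
                Walk(data, offset + 12, size - 4, depth + 1, output);
            }
            else
            {
                output.Add(indent + id + " " + size);
            }

            // Skip the 8-byte element header, the data, and any pad byte
            offset += 8 + size + (size & 1);
        }
    }
}
```

Given a buffer that holds an 'INFO' list with one 3-byte 'ISFT' chunk, followed by a 2-byte 'data' chunk, the walker produces the lines "LIST INFO 16", "  ISFT 3" and "data 2". The list size of 16 is the 4-byte listType plus the 12 bytes taken by the padded 'ISFT' chunk.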

FourCCs

A FourCC (four-character code) is a 32-bit unsigned integer created by concatenating four ASCII characters. For example, the FourCC 'abcd' is represented on a Little-Endian system as 0x64636261. FourCCs can contain space characters, so ' abc' is a valid FourCC. The RIFF file format uses FourCC codes to identify stream types, data chunks, index entries, and other information.
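As an illustration of the byte order, here is one way the conversion could be written. This is a sketch of the idea only; the real RiffParser methods may be implemented differently, and FourCCSketch is a name made up for the example.

```csharp
using System;

public static class FourCCSketch
{
    // Packs four ASCII characters into an int, first character in the
    // lowest byte (the little-endian layout used by RIFF).
    public static int ToFourCC(string code)
    {
        if (code == null || code.Length != 4)
            throw new ArgumentException("A FourCC needs exactly four characters", "code");

        return (byte)code[0]
             | ((byte)code[1] << 8)
             | ((byte)code[2] << 16)
             | ((byte)code[3] << 24);
    }

    // Unpacks the four bytes back into a string, lowest byte first.
    public static string FromFourCC(int fourCC)
    {
        char[] chars = new char[4];
        for (int i = 0; i < 4; ++i)
        {
            chars[i] = (char)((fourCC >> (8 * i)) & 0xFF);
        }
        return new string(chars);
    }
}
```

With this sketch, ToFourCC("abcd") yields 0x64636261, matching the example in the text, and FromFourCC(0x64636261) returns "abcd". Trailing spaces survive the round trip, so "AVI " stays "AVI ".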

What is the "AVI" file format?

AVI RIFF Form

AVI files are identified by the FourCC 'AVI ' in the RIFF header. All AVI files include two mandatory LIST chunks, which define the format of the streams and the stream data, respectively. An AVI file might also include an index chunk, which gives the location of the data chunks within the file. An AVI file with these components has the following form:

RIFF ('AVI '
      LIST ('hdrl' ... )
      LIST ('movi' ... )
      ['idx1' () ]
     )

The 'hdrl' list defines the format of the data and is the first required LIST chunk. The 'movi' list contains the data for the AVI sequence and is the second required LIST chunk. The optional 'idx1' chunk contains the index. AVI files must keep these three components in the proper sequence.

Note: The OpenDML extensions define another type of index, identified by the FourCC 'indx'.

The 'hdrl' and 'movi' lists use subchunks for their data. The following example shows the AVI RIFF form expanded with the chunks needed to complete these lists:

RIFF ('AVI '
      LIST ('hdrl'
            'avih'(<MAIN AVI Header>)
            LIST ('strl'
                  'strh'(<STREAM header>)
                  'strf'(<STREAM format>)
                  [ 'strd'(<ADDITIONAL header data>) ]
                  [ 'strn'(<STREAM name>) ]
                  ...
                 )
             ...
           )
      LIST ('movi'
            {SubChunk | LIST ('rec '
                              SubChunk1
                              SubChunk2
                              ...
                             )
               ...
            }
            ...
           )
      ['idx1' (<AVI Index>) ]
     )

For more information about the AVI format, see John McGowan's AVI Overview and the OpenDML AVI extensions.

What does the RIFF parser do?

Given a RIFF file, the parser iterates through the various elements in the file, calling your specific delegates when elements are encountered.

Two example programs are provided (both as Visual Studio .NET 2003 solutions):

RIFFParserDemo: a console application that outputs all the elements in a given RIFF file.
RIFFParserDemo2: a Windows App that examines RIFF files. If the file examined is an AVI or a WAV, the app displays additional information extracted from the RIFF elements.

Using the RIFF parser

First, create a new RiffParser object.

rp = new RiffParser();

Then, attempt to open the RIFF file.

rp.OpenFile(filename);

If no exception is thrown, the file is a valid RIFF file, and you can read the file format and file type through FileRIFF and FileType. Both values are FourCC codes. To read the codes in string format, use the FromFourCC static method:

public static string FromFourCC(int FourCC)

For example:

txtFileFormat.Text = RiffParser.FromFourCC(rp.FileRIFF);
txtFileType.Text = RiffParser.FromFourCC(rp.FileType);

Once the file type is established, read the elements in the file using the ReadElement() method.

public bool ReadElement(ref int bytesleft, 
         ProcessChunkElement chunk, ProcessListElement list)

The ReadElement() method takes the following arguments:

A ref int specifying the number of bytes left in the current data chunk (initially, the length of data in the file).
A delegate to be called when a chunk element is encountered.
A delegate to be called when a list element is encountered.

The method returns false when the end of data is reached.

Why is the bytesleft parameter passed by reference? The byte count is reduced to correctly represent the amount of data left in the current list/chunk. Passing the byte count by reference allows the method caller to possibly skip the rest of the data at this 'child' level and go on to read the next 'parent' level element.

An example using ReadElement():

int length = Parser.DataSize;

RiffParser.ProcessChunkElement pdc = 
     new RiffParser.ProcessChunkElement(ProcessAVIChunk);
RiffParser.ProcessListElement pal = 
    new RiffParser.ProcessListElement(ProcessAVIList);

while (length > 0) 
{
    if (false == Parser.ReadElement(ref length, pdc, pal)) break;
}

When done processing the file, call CloseFile().

Handling RIFF elements

Handling chunk data

public delegate void ProcessChunkElement(RiffParser rp, int FourCCType, 
       int unpaddedLength, int paddedLength);

The ProcessChunkElement delegate is called with four arguments:

A reference to the RiffParser making the call.
An int specifying the FourCC code for the chunk.
Two ints specifying the unpadded and padded length for the chunk data. RIFF data is always WORD aligned, so even if the chunk contains an odd number of bytes, an even number of bytes must be skipped to access the next element.

The chunk data can either be read or skipped, depending on the circumstance.

Read a chunk:

if (AviRiffData.ckidAVIISFT == FourCC)
{
    Byte[] ba = new byte[paddedLength];
    rp.ReadData(ba, 0, paddedLength);
    StringBuilder sb = new StringBuilder(unpaddedLength);
    for (int i = 0; i < unpaddedLength; ++i) 
    {
        if (0 != ba[i]) sb.Append((char)ba[i]);
    }

    m_isft = sb.ToString();
}

Skip a chunk:

// Unknown chunk - skip

rp.SkipData(paddedLength);

Handling LIST data

public delegate void ProcessListElement(RiffParser rp, int FourCCType, int length);

The ProcessListElement delegate is called with three arguments:

A reference to the calling RiffParser.
An int specifying the FourCC code for the list.
An int containing the length of the list data.

The list can then be skipped,

rp.SkipData(length);

or each element can be processed by calling ReadElement(), possibly with new delegates to handle the elements in the list.

RiffParser.ProcessChunkElement pnc = 
    new RiffParser.ProcessChunkElement(ProcessNestedChunk);
RiffParser.ProcessListElement pnl = 
    new RiffParser.ProcessListElement(ProcessNestedList);

while (length > 0) 
{
    if (false == rp.ReadElement(ref length, pnc, pnl)) break;
}

FourCC conversions

Four static methods are available to ease conversion from and to FourCC ints:

public static string FromFourCC(int FourCC)
public static int ToFourCC(string FourCC)
public static int ToFourCC(char[] FourCC)
public static int ToFourCC(char c0, char c1, char c2, char c3)

The method I use most is FromFourCC(). ToFourCC() is useful for defining FourCC constants:

// AVI section FourCC codes

public static readonly int ckidAVIHeaderList = RiffParser.ToFourCC("hdrl");
public static readonly int ckidMainAVIHeader = RiffParser.ToFourCC("avih");
public static readonly int ckidODML = RiffParser.ToFourCC("odml");
public static readonly int ckidAVIExtHeader = RiffParser.ToFourCC("dmlh");
public static readonly int ckidAVIStreamList = RiffParser.ToFourCC("strl");
public static readonly int ckidAVIStreamHeader = RiffParser.ToFourCC("strh");
public static readonly int ckidStreamFormat = RiffParser.ToFourCC("strf");
public static readonly int ckidAVIOldIndex = RiffParser.ToFourCC("idx1");
public static readonly int ckidINFOList = RiffParser.ToFourCC("INFO");
public static readonly int ckidAVIISFT = RiffParser.ToFourCC("ISFT");

Unsafe and Fixed: are they needed?

RIFF files are binary files. Reading them one character at a time incurs a significant performance penalty. The data structures stored in the files are designed to be loaded into memory and then referenced using fixed-size C structs. For example, an AVIMAINHEADER struct is defined as:

typedef struct _avimainheader {
    FourCC fcc;
    DWORD  cb;
    DWORD  dwMicroSecPerFrame;
    DWORD  dwMaxBytesPerSec;
    DWORD  dwPaddingGranularity;
    DWORD  dwFlags;
    DWORD  dwTotalFrames;
    DWORD  dwInitialFrames;
    DWORD  dwStreams;
    DWORD  dwSuggestedBufferSize;
    DWORD  dwWidth;
    DWORD  dwHeight;
    DWORD  dwReserved[4];
} AVIMAINHEADER;

In C++ (or C) you would:

#include <istream>

void DecodeAVIHeader(std::istream& stream)
{
    char data[sizeof(AVIMAINHEADER)];

    stream.read(data, sizeof(AVIMAINHEADER));

    // Reinterpret the raw bytes as the struct
    AVIMAINHEADER* avi = (AVIMAINHEADER*)data;
    // Reference the struct members directly

    int totalFrames = avi->dwTotalFrames;
    ...
}

But in C#, in managed code, we cannot do such tricks. Are we limited to reading a single byte at a time and doing a lot of work to decode the data?

This is where fixed and /unsafe come in. The fixed keyword allows us to "fix" a piece of managed data in memory, guaranteeing that the memory manager will not move or collect it. Once the data is fixed in memory, pointers to it can be (relatively safely) manipulated and the data accessed directly. Conceptually, fixed is like a pin and unpin pair wrapped in a using statement: the data is pinned on entry to the block and unpinned on exit. Using fixed requires compiling with the /unsafe switch (or setting "Allow Unsafe Code Blocks" to true in the Visual Studio project Configuration Properties page).

private unsafe void DecodeAVIHeader(RiffParser rp, int unpaddedLength, int length)
{
    byte[] ba = new byte[length];

    rp.ReadData(ba, 0, length);

    fixed (Byte* bp = &ba[0]) 
    {
        AVIMAINHEADER* avi = (AVIMAINHEADER*)bp;
        m_frameRate = avi->dwMicroSecPerFrame;
        ...
    }
}

The managed data structure remains at the same memory location and is safe from collection as long as we are in the fixed block. Nothing is guaranteed once we leave the fixed block, so please do not keep any references to pointers or data that might no longer be there! Copy out the needed data and use the copy once outside the fixed block.
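For comparison, newer .NET versions offer a way to get the same "overlay a struct on a byte buffer" effect without unsafe code. The sketch below uses MemoryMarshal.Read from System.Runtime.InteropServices, which copies the bytes into a new struct value, so nothing dangles after the call returns. This assumes a runtime that provides MemoryMarshal (.NET Core 2.1 or later, or the System.Memory package), which was not available when this parser was written. The MainHeaderPrefix struct is hypothetical: it mirrors only the five DWORDs that follow fcc and cb in AVIMAINHEADER, and is not the definition found in AviRiffData.cs.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical struct for the example only.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct MainHeaderPrefix
{
    public int MicroSecPerFrame;
    public int MaxBytesPerSec;
    public int PaddingGranularity;
    public int Flags;
    public int TotalFrames;
}

public static class SafeStructReadSketch
{
    // Copies the first 20 bytes of the buffer into a struct value.
    // The bytes are interpreted in the host byte order, so this matches
    // RIFF only on little-endian machines. Throws if the buffer is
    // shorter than the struct.
    public static MainHeaderPrefix Decode(byte[] buffer)
    {
        return MemoryMarshal.Read<MainHeaderPrefix>(buffer);
    }
}
```

Because the result is a copy, it can be kept and used freely, which is the same advice as above: copy out what you need rather than holding on to a pointer into the buffer.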

Reading RIFF data (file access)

Reading the RIFF header

// Read the RIFF header

m_stream = new FileStream(m_filename, FileMode.Open, 
     FileAccess.Read, FileShare.Read);
int FourCC;
int datasize;
int fileType;

ReadTwoInts(out FourCC, out datasize);
ReadOneInt(out fileType);

Reading a RIFF element.

int FourCC;
int size;

ReadTwoInts(out FourCC, out size);

...

// Examine the element, is it a list or a chunk

string type = FromFourCC(FourCC);
if (0 == String.Compare(type, LIST4CC))
{
    // We have a list

    ReadOneInt(out FourCC);

    if (null == list)
    {
        SkipData(size - 4);
    }
    else
    {
         // Invoke the list method

         list(this, FourCC, size - 4);
    }

    // Adjust size

    bytesleft -= size;
}
else
{
    // Calculate padded size - padded to WORD boundary

    int paddedSize = size;
    if (0 != (size & 1)) ++paddedSize;

    if (null == chunk)
    {
        SkipData(paddedSize);
    }
    else
    {
        chunk(this, FourCC, size, paddedSize);
    }

    // Adjust size

    bytesleft -= paddedSize;
}

Reading two ints (note use of the unsafe and fixed keywords).

public unsafe void ReadTwoInts(out int FourCC, out int size)
{
  try {
    int readsize = m_stream.Read(m_eightBytes, 0, TWODWORDSSIZE);

    if (TWODWORDSSIZE != readsize) {
      throw new RiffParserException("Unable to read. Corrupt RIFF file " + 
         FileName);
    }

    fixed (byte* bp = &m_eightBytes[0]) {
      FourCC = *((int*)bp);
      size = *((int*)(bp + DWORDSIZE));
    }
  }
  catch (Exception ex)
  {
    throw new RiffParserException("Problem accessing RIFF file " + FileName, ex);
  }
}
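If compiling with /unsafe is not an option, the same eight bytes can be decoded with plain shifts. The following is a sketch of such an alternative, not the code used by RiffParser; SafeReadSketch and its members are hypothetical names, and the method takes the stream as a parameter so that the example stands alone. It also loops on Stream.Read, because a stream is allowed to return fewer bytes than requested even when more data is available.

```csharp
using System.IO;

public static class SafeReadSketch
{
    // Assembles a little-endian 32-bit value from four bytes.
    public static int ReadInt32LE(byte[] buffer, int offset)
    {
        return buffer[offset]
             | (buffer[offset + 1] << 8)
             | (buffer[offset + 2] << 16)
             | (buffer[offset + 3] << 24);
    }

    // Reads a FourCC and a size (8 bytes in total) from the stream.
    public static void ReadTwoInts(Stream stream, out int fourCC, out int size)
    {
        byte[] eight = new byte[8];
        int total = 0;

        // Keep reading until all 8 bytes have arrived
        while (total < eight.Length)
        {
            int n = stream.Read(eight, total, eight.Length - total);
            if (n == 0)
                throw new EndOfStreamException("Unexpected end of RIFF data");
            total += n;
        }

        fourCC = ReadInt32LE(eight, 0);
        size = ReadInt32LE(eight, 4);
    }
}
```

Reading the bytes 'R', 'I', 'F', 'F', 0x24, 0, 0, 0 this way gives a FourCC of 0x46464952 and a size of 36. The shift version is independent of the host byte order, at the cost of a few more instructions per value.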

A basic RIFF parser

Following is the complete source code for a simple parser which displays all the elements in a RIFF file:

using System;
using System.Text;

namespace RiffParserDemo
{
    class RiffParserDemo
    {
        // Parse a RIFF file

        static void Main(string[] args)
        {
            // Create a parser instance

            RiffParser rp = new RiffParser();
            try 
            {
                string filename = @"C:\Program Files\Microsoft" +
                   " Visual Studio .NET 2003\Common7\Graphics\videos\BLUR24.avi";
                //string filename = @"C:\WINNT\Media\Chimes.wav"

                if (0 != args.Length)  
                {
                    filename = args[0];
                }
                    
                // Specify a file to open

                rp.OpenFile(filename);

                // If we got here - the file is valid. 

                //Output information about the file

                Console.WriteLine("File " + rp.ShortName + 
                    " is a \"" + RiffParser.FromFourCC(rp.FileRIFF)+ 
                    "\" with a specific type of \"" + 
                    RiffParser.FromFourCC(rp.FileType) + "\"");

                // Store the size to loop on the elements

                int size = rp.DataSize;

                // Define the processing delegates

                RiffParser.ProcessChunkElement pc = 
                     new RiffParser.ProcessChunkElement(ProcessChunk);
                RiffParser.ProcessListElement pl = 
                     new RiffParser.ProcessListElement(ProcessList);

                // Read all top level elements and chunks

                while (size > 0)
                {
                    // Prefix the line with the current top level type

                    Console.Write(RiffParser.FromFourCC(rp.FileType) + 
                          " (" + size.ToString() + "): ");
                    // Get the next element (if there is one)

                    if (false == rp.ReadElement(ref size, pc, pl)) break;
                }
                // Close the stream

                rp.CloseFile();
                Console.WriteLine();
            }
            catch (Exception ex)
            {
                Console.WriteLine("-----------------");
                Console.WriteLine("Problem: " + ex.ToString());
            }
            Console.WriteLine("\r\nDone. Press 'Enter' to exit.");
            Console.ReadLine();
        }

        // Process a RIFF list element (list sub elements)

        public static void ProcessList(RiffParser rp, int FourCC, int length)
        {
            string type = RiffParser.FromFourCC(FourCC);
            Console.WriteLine("Found list element of type \"" + 
                  type + "\" and length " + length.ToString());

            // Define the processing delegates

            RiffParser.ProcessChunkElement pc = 
                new RiffParser.ProcessChunkElement(ProcessChunk);
            RiffParser.ProcessListElement pl = 
                new RiffParser.ProcessListElement(ProcessList);

            // Read all the elements in the current list

            try {
                while (length > 0) {
                    // Prefix each line with the type of the current list

                    Console.Write(type + " (" + length.ToString() + "): ");
                    // Get the next element (if there is one)

                    if (false == rp.ReadElement(ref length, pc, pl)) break;
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("Problem: " + ex.ToString());
            }
        }

        // Process a RIFF chunk element (skip the data)

        public static void ProcessChunk(RiffParser rp, 
              int FourCC, int length, int paddedLength)
        {
            string type = RiffParser.FromFourCC(FourCC);
            Console.WriteLine("Found chunk element of type \"" + 
                type + "\" and length " + length.ToString());

            // Skip data and update bytesleft

            rp.SkipData(paddedLength);
        }
    }
}

Extras

The file AviRiffData.cs contains C# compatible definitions for many AVI and WAV data structures. The file also contains many FourCC constants used in AVI and WAV files.

History

6-Jun-2005
Version 1.0 - Original release.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
