
Getting Video Stream from USB Web-camera on Arduino Due - Part 8: Streaming

Understanding streaming data

Introduction

Part 7 is here.

This is the final article. Here, I attempt to explain one of the ways I used to understand streaming myself: I will log key stream data to see how it changes over a period of time; with that in hand, a decision on how to handle the stream data can be made.

Stream Initialization and Some Thoughts

So far, only pipe #0 is initialized, and it is used for control transfers with the device's endpoint zero. As known from the previous articles, the streaming endpoint's number is 2, and it is an isochronous (periodic) endpoint. An isochronous transfer has only one transaction, consisting of an IN or OUT packet followed by a DATA0 packet, with no acknowledgements or stages like in control transfers; it is thus much simpler.

If a packet is lost, it is lost; no recovery or signaling about it is provided. And that is good enough for video or audio streaming - you cannot pause something that is going live to "recover" a loss.

To communicate with the device's endpoint number two, I will use the SAM3X's pipe #1. The initialization is similar to the one I did for control pipe #0, with several differences:

C++
uint32_t HCD_InitiateIsochronousINPipeOne(uint8_t Address, uint8_t EndpointNumber)
{
    //Enables a pipe
    UOTGHS->UOTGHS_HSTPIP |= UOTGHS_HSTPIP_PEN1;    
    //Triple-bank pipe
    UOTGHS->UOTGHS_HSTPIPCFG[1] |= (UOTGHS_HSTPIPCFG_PBK_3_BANK & UOTGHS_HSTPIPCFG_PBK_Msk);
    //Pipe size is encoded: 7 means 1024 bytes
    UOTGHS->UOTGHS_HSTPIPCFG[1] |= ((7 << UOTGHS_HSTPIPCFG_PSIZE_Pos) & UOTGHS_HSTPIPCFG_PSIZE_Msk);
    //Pipe token
    UOTGHS->UOTGHS_HSTPIPCFG[1] |= (UOTGHS_HSTPIPCFG_PTOKEN_IN & UOTGHS_HSTPIPCFG_PTOKEN_Msk);
    //Pipe type is ISOCHRONOUS                                                
    UOTGHS->UOTGHS_HSTPIPCFG[1] |= (UOTGHS_HSTPIPCFG_PTYPE_ISO & UOTGHS_HSTPIPCFG_PTYPE_Msk);
    //Pipe endpoint number
    UOTGHS->UOTGHS_HSTPIPCFG[1] |=  (((EndpointNumber) << UOTGHS_HSTPIPCFG_PEPNUM_Pos) 
                                                                & UOTGHS_HSTPIPCFG_PEPNUM_Msk);
    //Allocates memory
    UOTGHS->UOTGHS_HSTPIPCFG[1] |= UOTGHS_HSTPIPCFG_ALLOC;
    
    //Check if pipe was allocated OK
    if(0 == (UOTGHS->UOTGHS_HSTPIPISR[1] & UOTGHS_HSTPIPISR_CFGOK))
    {
        //Disables pipe and prints debug message about failure        
        UOTGHS->UOTGHS_HSTPIP &= ~UOTGHS_HSTPIP_PEN1;
        PrintStr("Pipe 1 was not allocated.\r\n");
        return 0;
    }
    
    //Address of this pipe
    UOTGHS->UOTGHS_HSTADDR1 = (UOTGHS->UOTGHS_HSTADDR1 & ~UOTGHS_HSTADDR1_HSTADDRP1_Msk) 
                                                              | UOTGHS_HSTADDR1_HSTADDRP1(Address);
    UOTGHS->UOTGHS_HSTPIPIER[1] = UOTGHS_HSTPIPIER_UNDERFIES;   //Underflow interrupt enable
    UOTGHS->UOTGHS_HSTIER = UOTGHS_HSTIER_PEP_1;                //Enables pipe interrupt    
    
    //Debug message about successful pipe initiation
    PrintStr("Pipe 1 has been allocated successfully.\r\n");
    return 1;
}

Article 5 has a description of the pipe initialization process.

Note the differences in the case of pipe #1:

  1. The pipe type is isochronous, not control.
  2. The token is always IN, as data will be transferred from device to host.
  3. Three data banks are used (in fact, two banks give the same result), as opposed to one bank in control transfers.
  4. The underflow interrupt is enabled; it fires if the host cannot read the data banks fast enough (i.e., faster than the data comes in).
  5. The pipe size is 1024 bytes - I set it to the maximum because I know from the previous articles that the camera has an endpoint alternate setting of that size.

Why are three (or two) banks used? Because during isochronous transfers, data comes in fast. Once the first bank is filled with data, the program can read it while the second bank is being filled. Then a bank switch happens: the program reads bank 2 while bank 1 is being filled, and so on.

Thoughts

The SAM3X switches banks regardless of the AUTO_SWITCH setting. Even if the data size returned from the device is 0, it switches to the next bank anyway, although the current bank is empty, and I could not find a way to stop it.

The consequence is that the host must handle the current bank's data before the switch to the next bank occurs - within 125 microseconds (the interval between USB bus micro-frames).
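
To put rough numbers on that budget (my own back-of-the-envelope arithmetic, not code from the project), here is what one 125-microsecond micro-frame allows on the Due's 84 MHz core:

C++
//Illustrative budget only, assuming the Arduino Due's 84 MHz core clock.
constexpr uint32_t CPU_HZ        = 84000000u;
constexpr uint32_t UFRAME_US     = 125u;                           //Micro-frame interval
constexpr uint32_t CYCLES_PER_UF = CPU_HZ / 1000000u * UFRAME_US;  //10500 cycles per micro-frame
constexpr uint32_t PER_BYTE_128  = CYCLES_PER_UF / 128u;           //~82 CPU cycles per byte
constexpr uint32_t PER_BYTE_1024 = CYCLES_PER_UF / 1024u;          //~10 CPU cycles per byte

About ten cycles per byte at 1024-byte transactions leaves no room for the slow TFT writes described next.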

If the endpoint size is 128 bytes, there is enough time to output directly to the TFT monitor. But if the endpoint size is 512 or 1024 bytes, the data cannot be output in time because the output operation takes too long (see article 3) due to the TFT shield's pins being scattered across the Arduino Due board (there might have been enough time if the shield were connected differently - all data pins to one port, not to three different ports as it is now). So I can either use 128-byte transactions and output the data directly to the TFT monitor, or buffer it and then output the whole video frame at once.

Later, I'll show that 128-byte data transfers are not sufficient to transfer the whole frame in time; thus the only way is to store the data into a buffer, as this operation happens in RAM and is therefore very fast.

The next problem is the buffer size. The SAM3X has 64 + 32 kB of RAM. As I'm going to output gray-scale video, I need to store only one of every two bytes per pixel, which gives 160x120=19200, 176x144=25344 or 320x240=76800 bytes for the first three frame sizes (there is no reason to try higher frame sizes like 640x480, as the TFT screen is only 320x240). Buffers of 19200 and 25344 bytes work fine, but when I allocate 76800 bytes, the project builds OK yet after the upload the Arduino Due becomes a brick: nothing works at all, no messages via the RS-232 port, nothing. I also tried commenting out all the Print functions, because their strings can be loaded into RAM and occupy much-needed space - it still did not help, so I suspect something is going on with hidden C-library code or other reasons I don't know of.
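
As a sketch, the frame buffer used by the buffering function later in this article could be declared as follows. The names ptrVideoFrame and Index match the later code, but the exact declarations in the download may differ (TOTAL_PIXELS is defined in a later section):

C++
//Gray-scale frame buffer: one byte per pixel, packed two pixels per 16-bit word.
//160x120 -> 19200 bytes (9600 words) and 176x144 -> 25344 bytes both fit,
//while 320x240 -> 76800 bytes exceeds the 64 kB SRAM0 bank - hence the "brick".
static uint16_t VideoFrameBuffer[TOTAL_PIXELS / 2];
static uint16_t* ptrVideoFrame = VideoFrameBuffer;
static uint32_t Index = 0;                     //Current write position, in words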

After many tries and failures, I decided to settle on the 160x120 and 176x144 frame sizes only. That is enough to learn USB streaming, and given a more powerful ARM processor, the task can be extended to handle bigger frame sizes.

Back to Stream Initialization

In the previous article, I ended on the function USBD_SetInterfaceAlternateSettingEnd and said that streaming processing starts there. Let me upgrade it:

C++
void USBD_SetInterfaceAlternateSettingEnd(uint16_t ByteReceived)
{
    PrintStr("Alternate setting has been set.\r\n");
    if(HCD_InitiateIsochronousINPipeOne(DeviceAddress, 2))
    {
        hcd_IsochronousControlStructure.TransferEnd = VP_ProcessPartialPayload_Logging;
        
        //Enables sending IN packets endlessly
        UOTGHS->UOTGHS_HSTPIPINRQ[1] = UOTGHS_HSTPIPINRQ_INMODE;
        //Enables data received interrupt
        UOTGHS->UOTGHS_HSTPIPIER[1] = UOTGHS_HSTPIPIER_RXINES;
        //Unfreezes pipe 1
        UOTGHS->UOTGHS_HSTPIPIDR[1] = UOTGHS_HSTPIPIDR_PFREEZEC;
    }    
}

Bold is what I added.

First, pipe #1 is initialized for the device's endpoint #2. If the pipe initialization was successful, the function pointer TransferEnd is assigned a function that will be called every time data comes from the USB device (VP_ProcessPartialPayload_Logging will be described later; "partial" means that it processes just a small part of the whole video frame data). Bit INMODE of the Host Pipe 1 IN Request Register (UOTGHS_HSTPIPINRQ[1]) controls the sending of IN packets on the pipe. I set it to 1, which makes the SAM3X send IN packets endlessly, every USB bus micro-frame, to the device endpoint specified for pipe #1 - unless pipe #1 is frozen (stopped). And at this point it is frozen, because I have never unfrozen it anywhere in the code. I then enable the "data received" interrupt on pipe #1, which will trigger data processing. Finally, pipe #1 is unfrozen, which makes the host start sending IN packets to the device.

At this point, the device sends back DATA0 packets with partial video stream data each time it receives an IN packet from the host. To handle this, let's return to the HCD interrupt handler and change it:

C++
//...

//Getting pipe interrupt number
uint8_t PipeInterruptNumber = HCD_GetPipeInterruptNumber();
if(0 == PipeInterruptNumber)
{
    HCD_HandleControlPipeInterrupt();                        //Pipe 0 interrupt
    return;
}    
if(1 == PipeInterruptNumber)
{
    HCD_HandlePipeInterrupt(PipeInterruptNumber);            //Other pipes interrupt
    return;
}

//Manage Vbus error
if (0 != (UOTGHS->UOTGHS_SR & UOTGHS_SR_VBERRI))
{
    UOTGHS->UOTGHS_SCR = UOTGHS_SCR_VBERRIC;                //Ack VBus error interrupt
    PrintStr("VBus error.\r\n");
    return;
}

//...

As usual, the bold code is what I added. If the interrupt belongs to pipe 1, the function HCD_HandlePipeInterrupt is called:

C++
void HCD_HandlePipeInterrupt(uint8_t PipeNumber)
{
    //Check if "IN data received" interrupt happened
    if(UOTGHS->UOTGHS_HSTPIPISR[PipeNumber] & UOTGHS_HSTPIPISR_RXINI)
    {
        UOTGHS->UOTGHS_HSTPIPICR[PipeNumber] = UOTGHS_HSTPIPICR_RXINIC;   //Ack IN DATA interrupt
        HCD_ProcessINData(PipeNumber);    
        return;
    }
    
    //Underflow condition - means all banks are full, cannot receive data anymore
    if(UOTGHS->UOTGHS_HSTPIPISR[PipeNumber] & UOTGHS_HSTPIPISR_UNDERFI)
    {
        UOTGHS->UOTGHS_HSTPIPICR[PipeNumber] = UOTGHS_HSTPIPICR_UNDERFIC; //Ack underflow
        UOTGHS->UOTGHS_HSTPIPINRQ[PipeNumber] = 0;                        //Stop IN packet generation
        UOTGHS->UOTGHS_HSTPIPIDR[PipeNumber] = UOTGHS_HSTPIPIDR_RXINEC;   //Stop IN data interrupts
        PrintStr("Error: underflow.\r\n");        
        return;
    }
    
    PrintStr("Uncaught Pipe ");
    PrintDEC(PipeNumber);
    PrintStr(" interrupt.\r\n");
}

This function handles and acknowledges two interrupts - the "data received" interrupt and the "underflow" interrupt. It also reports any other interrupt on pipe 1, which in theory should not happen - just in case.

Remember, the underflow interrupt was enabled during pipe 1 initialization. I did that because you hit it a lot when experimenting with USB code. It is a nice indication that your code is not fast enough to handle the incoming stream - whether because the processor is too slow, the stream is too fast, or the code is wrong. Regardless of the reason, I acknowledge it so the interrupt does not fire repeatedly, stop IN packet generation and print a failure message to the RS-232 port.

If the code reaches the end of this function, some other interrupt has happened that the code does not recognize and therefore does not handle; a notification message is printed.

And the most important part is where the "data received" interrupt is handled. As usual, the interrupt is acknowledged and the HCD_ProcessINData function is called:

C++
void HCD_ProcessINData(uint8_t PipeNumber)
{
    //Returns byte count received into FIFO buffer
    uint16_t ByteReceived = (UOTGHS->UOTGHS_HSTPIPISR[PipeNumber] & UOTGHS_HSTPIPISR_PBYCT_Msk)
                                                                 >> UOTGHS_HSTPIPISR_PBYCT_Pos;
    
    //Getting pointer to FIFO buffer
    register volatile uint32_t* ptrFIFOBuffer = 
                 (volatile uint32_t*)&(((volatile uint32_t(*)[0x8000/4])UOTGHS_RAM_ADDR)[PipeNumber]);
    
    //Calling Video processing and passing into it pointer to FIFO buffer and its size
    (*hcd_IsochronousControlStructure.TransferEnd)(ByteReceived, (uint32_t*)ptrFIFOBuffer);
}

Data processing at the HCD level consists of getting the quantity of bytes received, getting a pointer to the USB module's FIFO buffer, and calling the function that TransferEnd points to. Note that currently TransferEnd points to the VP_ProcessPartialPayload_Logging function; it's time to describe it.

Logging the Stream

Before streaming, we need to understand what we are getting from the camera. The function VP_ProcessPartialPayload_Logging serves exactly this purpose: it collects key data from a certain number of transactions, stops the stream and then outputs the collected data in a readable format. Let me define what "key data" is:

  1. A sequential transaction number. If I log 3000 transactions, I must receive records numbered from 0 to 2999.
  2. Total bytes received in the DATA0 packet from the device.
  3. Header size. According to paragraph 2.4 of [4], received data is divided into a header and a payload. The payload must be extracted and buffered because it is the video data; the header, on the other hand, must be read to find out where the current frame ends, to see whether any errors happened and for other supplemental information. The header size is always the first byte of the received data and should be 12 bytes.

    Image 1

  4. The second byte of the header (and of the received data) is a field called BitField. The EOF bit is set in the last transaction of a frame - it will serve as the indication to start outputting the buffered video frame data, and that is basically all I'll use from this field. As an alternative, the FID bit can be used, as it toggles every time a new frame starts: for example, if a frame consists of 50 transactions, FID will be 0 for 50 transactions, then 1 for the next 50, then 0 again, and so on. The remaining 10 bytes carry timing info and clock references which I do not use (a sketch of the full header layout follows this list).
  5. SOF (start of frame) number. The SAM3X assigns numbers to USB bus SOF packets; this is just an internal counter that means nothing outside the microcontroller. The numbers are located in register UOTGHS_HSTFNUM (Host Frame Number Register): field FNUM is an 11-bit field containing the frame number (incremented every 1 ms), and field MFNUM is a 3-bit field containing the micro-frame number (incremented every 125 microseconds, i.e., eight times per frame).
  6. Current bank - to see how the SAM3X toggles banks.

    Example:

    Image 2

    It is the 2835th transaction since the start; 1024 bytes were received, 12 of which are header; the EOF bit in BitField is zero, which means this transaction is somewhere inside a video frame; this is the 621st SOF and the second micro-SOF; bank 0 holds the data.
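
For reference, the 12-byte payload header from [4] can be sketched as a packed struct. The field names follow the UVC uncompressed payload specification; this struct is illustrative only, as the code in this article reads the header word-by-word instead:

C++
//UVC uncompressed payload header (12 bytes for this camera), per [4].
typedef struct __attribute__((packed))
{
    uint8_t  bHeaderLength;        //Always 12 here
    uint8_t  bmHeaderInfo;         //The BitField: bit 0 FID, bit 1 EOF,
                                   //bit 2 PTS present, bit 3 SCR present,
                                   //bit 6 ERR, bit 7 EOH
    uint32_t dwPresentationTime;   //PTS - timing info, not used here
    uint8_t  scrSourceClock[6];    //SCR - clock reference, not used here
} UVC_PayloadHeaderType;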

Structure to store such information:

C++
typedef struct
{
    uint16_t TotalSize;
    uint8_t HeaderSize;
    uint8_t HeaderBitfield;
    uint32_t Frame;
    uint32_t CurrentBank;
} VP_HeaderInfoType;

Function that prints it:

C++
void VP_PrintHeaderInfo(uint32_t Count);

and, as usual, I do not show print functions in these articles, as they are easy to understand without explanation and rather long. Please see them in the source code attached to the article.
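
Still, for readers who do not want to open the download, a minimal sketch of such a print function might look like this. It uses the PrintStr/PrintDEC helpers from the earlier parts; the actual function in the download may format the output differently:

C++
void VP_PrintHeaderInfo(uint32_t Count)
{
    for(uint32_t i = 0; i < Count; i++)
    {
        PrintStr("#");
        PrintDEC(i);                                    //Transaction number
        PrintStr(" Size: ");
        PrintDEC(ptrHeaderInfo[i].TotalSize);           //Bytes in DATA0 packet
        PrintStr(" Hdr: ");
        PrintDEC(ptrHeaderInfo[i].HeaderSize);          //Should be 12
        PrintStr(" BitField: ");
        PrintDEC(ptrHeaderInfo[i].HeaderBitfield);      //EOF is bit 1
        PrintStr(" SOF: ");
        PrintDEC((ptrHeaderInfo[i].Frame & UOTGHS_HSTFNUM_FNUM_Msk) >> UOTGHS_HSTFNUM_FNUM_Pos);
        PrintStr(".");
        PrintDEC(ptrHeaderInfo[i].Frame & UOTGHS_HSTFNUM_MFNUM_Msk);    //Micro-SOF
        PrintStr(" Bank: ");
        PrintDEC((ptrHeaderInfo[i].CurrentBank & UOTGHS_HSTPIPISR_CURRBK_Msk)
                                              >> UOTGHS_HSTPIPISR_CURRBK_Pos);
        PrintStr("\r\n");
    }
}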

For all video logging and processing functions and definitions, I created the files VideoProcessing.h and VideoProcessing.c. The number of transactions to log is defined at the top of VideoProcessing.c.

C++
#define LOG_SIZE            (3000)

Previously mentioned VP_ProcessPartialPayload_Logging function is here too:

C++
void VP_ProcessPartialPayload_Logging(uint16_t ByteReceived, uint32_t* ptrFIFOBuffer)
{
    uint32_t Temp32;
    
    ptrHeaderInfo[Index].Frame = UOTGHS->UOTGHS_HSTFNUM;
    ptrHeaderInfo[Index].CurrentBank = UOTGHS->UOTGHS_HSTPIPISR[1];
    if(0 != ByteReceived)
    {        
        Temp32 = *ptrFIFOBuffer++;
        ptrHeaderInfo[Index].TotalSize = ByteReceived;
        ptrHeaderInfo[Index].HeaderSize = Temp32;
        ptrHeaderInfo[Index].HeaderBitfield = Temp32 >> 8;
    }
    UOTGHS->UOTGHS_HSTPIPIDR[1] = UOTGHS_HSTPIPIDR_FIFOCONC;   //Release current bank
    Index++;

    if(LOG_SIZE <= Index)
    {
        UOTGHS->UOTGHS_HSTPIPINRQ[1] = 0;                       //Stop IN packet generation
        UOTGHS->UOTGHS_HSTPIPIDR[1] = UOTGHS_HSTPIPIDR_RXINEC;  //Disable data received interrupt
        VP_PrintHeaderInfo(LOG_SIZE);                           //Print the collected log
    }
}

For each transaction, it saves the Start-Of-Frame (SOF) number together with the micro-SOF number and the bank that holds the data; then it saves the header info if more than zero bytes were received from the device. Once everything I wanted is stored, the current bank is released by writing bit FIFOCONC into the UOTGHS_HSTPIPIDR register for pipe #1.

Index is increased with each transaction, and when it reaches LOG_SIZE, IN packet generation is stopped, the data received interrupt is disabled and the logged data is printed.

I'll run 3000 transactions to endpoint #2 applying different alternate settings: 1, 2 and 3, which mean 128, 512 and 1024 bytes respectively (see article 6). Just to remind, the alternate setting is set up here:

C++
void USBD_SetInterfaceAlternateSettingBegin(void)
{
    usb_setup_req_t ControlRequest;
    
    ControlRequest.bmRequestType = USB_REQ_DIR_OUT | USB_REQ_TYPE_STANDARD | USB_REQ_RECIP_INTERFACE;
    ControlRequest.bRequest = USB_REQ_SET_INTERFACE;
    
    //Alternate setting: ..., #3 - 1024 bytes, #2 - 512 bytes, #1 - 128 bytes
    ControlRequest.wValue = EP_SIZE;
    
    ControlRequest.wIndex = 1;                //Video Streaming interface number is #1
    ControlRequest.wLength = 0;               //No data
    
    HCD_SendCtrlSetupPacket(DeviceAddress, USB_REQ_DIR_OUT, ControlRequest, 
                                           NULL, 0, USBD_SetInterfaceAlternateSettingEnd);
}

Since I have shown this function previously, I only added a comment and defined the EP_SIZE constant so it can be changed to 1, 2 or 3 in one place for convenience.

I'll run the logging function for those three alternate settings and then discuss the output:

128 bytes alternate setting logging

Image 3

One full and two partially visible video frames are shown here. You can easily see the borders where a video frame ends - the EOF bit of BitField is set - and the inter-frame pause starts. Each 128-byte portion (transaction) comes every USB bus micro-frame (0.125 ms). The pause interval between video frames varies (46 and 76 ms here).

Note: To calculate time intervals, two SOF values can be subtracted; the resulting time is in milliseconds. For example, a full frame ends on line #2003, where SOF is 517 and micro-SOF is 2. Its beginning is on line #1843, where SOF equals 497 and micro-SOF is 2. Subtracting 497 from 517 gives 20, and 2 from 2 gives 0, so that frame's video data took exactly 20 ms to transmit.

Let's see how much data was transferred for the video frame: #2001 - #1843 = 158 transactions. Each transaction carries 128 - 12 = 116 bytes of video data, so 116 * 158 = 18328 bytes of video data in total were sent to the host. The full frame size for a 160x120 video frame is 160 * 120 * 2 = 38400 bytes. This proves that alternate setting #1 with 128-byte transfers won't work: the camera simply does not send the full frame data, only about a half. I will show later how such video looks.
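
The same arithmetic written as compile-time checks (the transaction counts are taken from my log above; this is verification only, not project code):

C++
//Numbers from the 128-byte log: the frame spans log lines #1843..#2001.
constexpr uint32_t TRANSACTIONS  = 2001 - 1843;                   //158 transactions
constexpr uint32_t PAYLOAD_BYTES = 128 - 12;                      //116 video bytes each
constexpr uint32_t RECEIVED      = TRANSACTIONS * PAYLOAD_BYTES;  //18328 bytes received
constexpr uint32_t FULL_FRAME    = 160 * 120 * 2;                 //38400 bytes needed (YUY2)
static_assert(RECEIVED < FULL_FRAME, "128-byte endpoint carries only about half a frame");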

512 bytes alternate setting logging

Image 4

One full and two partially visible video frames are shown here. You can easily see the border where a video frame ends - the EOF bit of BitField is set. There is no second 12-byte transaction like in the previous alternate setting. Note that one pause is about 2 ms and the next one is 31 ms, which means it won't be possible to output the whole frame from the buffer to the TFT monitor when the pause before the next video frame is only 2 ms. Thus some sort of frame skipping must occur. No frame skipping is needed during the second pause, as it allows enough time to output a video frame.

Also note the occurrence of empty transactions inside a frame, as opposed to the previous alternate setting. They would have given us more time if the SAM3X did not switch banks when empty data packets are received.

1024 bytes alternate setting logging

Image 5

This one is very similar to the previous case. It also has long and short pauses between neighboring video frame transfers, so frame skipping applies here as well. The difference is the quantity of empty transactions inside video frame transfers: it is slightly increased.

Summarizing

  1. The quantity of empty transactions grows as the transaction size grows, for the same video frame size. This fact cannot be relied upon, because that quantity varies from none to 7 - it is not stable and not always there (see lines #2359 and #2360 in the previous picture).
  2. The pause between video frame transfers can differ for the same transaction size - frame skipping is needed for short pauses.
  3. The EOF bit will be used to trigger frame output from the buffer to the TFT monitor. If the EOF bit happens to be set in consecutive transactions, the extra ones can simply be ignored.

So far the task looks easy.

Setting Up Frame Area on the Screen

Before any attempt to show video, the area for video on the screen must be set up. The area dimensions must be the same as the video frame dimensions. I will place the area in the middle of the TFT monitor; the code for this is located inside the main file:

C++
#define LEFT_PIXEL      60
#define TOP_PIXEL       80
#define RIGHT_PIXEL     179
#define BOTTOM_PIXEL    239
#define TOTAL_PIXELS    19200    //120x160=19200

//...

//Painting screen blue
for(uint32_t i=0; i<76800; i++)
{
    LCD_SET_DB(arColors[3]);
    LCD_CLR_WR();
    LCD_SET_WR();
}
    
//Setting frame area in the middle
LCD_SetArea(LEFT_PIXEL, TOP_PIXEL, RIGHT_PIXEL, BOTTOM_PIXEL);
    
//Cursor at left upper corner
LCD_SetCursor(RIGHT_PIXEL, TOP_PIXEL);
LCD_CLR_CS();
    
//Painting frame white
for(uint32_t i=0; i<TOTAL_PIXELS; i++)
{
    LCD_SET_DB(arColors[0]);
    LCD_CLR_WR();
    LCD_SET_WR();
}
    
HCD_SetEnumerationStartFunction(USBD_GetDeviceDescriptorBegin);
HCD_Init();
    
while(1);

//...

Earlier, we already painted the whole screen blue. Now I set the area, put the cursor at the beginning of the area and paint it white the same way - just by outputting 160x120=19200 white pixels; the TFT monitor itself wraps to the next line once the current line's end is reached.

Note that left, top, right and bottom might not match their names because the monitor is rotated.

Painting the frame white is not necessary because it will be overwritten by video. I did it just to show its size compared to the overall dimensions of the monitor.

Result looks like this:

Image 6

Buffering Frame Function

The device will either log or buffer incoming data. Logging has just been discussed, and now that we have seen what actually comes from the camera, it is time to look at buffering. Normally, a full frame is stored in the Arduino Due's RAM. If a stored frame turns out not to be complete by the time a new frame starts arriving, it is discarded (frame skipping). I named this function like the logging one, but with "buffering" instead of "logging":

C++
void VP_ProcessPartialPayload_Buffering(uint16_t ByteReceived, uint32_t* ptrFIFOBuffer)
{
    register uint32_t ByteReceived_DW, i, Temp32;
    register uint16_t Temp16;
    register uint8_t Bitfield;
    
    //Empty packet - return and release FIFO buffer
    if(0 == ByteReceived)
    {
        UOTGHS->UOTGHS_HSTPIPIDR[1] = UOTGHS_HSTPIPIDR_FIFOCONC;
        return;
    }
    
    //Reading the first 12 header bytes
    Temp32 = *ptrFIFOBuffer++;
    Bitfield = (Temp32 >> 8);                    //For last frame portion determination
    Temp32 = *ptrFIFOBuffer++;
    Temp32 = *ptrFIFOBuffer++;
    
    //Calculating data size in double words (4 bytes)
    ByteReceived_DW = ((ByteReceived >> 2) - 3);    
    
    //Saving 2 bytes at once
    for(i=0; i<ByteReceived_DW; i++)
    {
        Temp32 = *ptrFIFOBuffer++;        
        Temp16 = (Temp32 & 0xFFu);              //Byte 0
        Temp16 |= ((Temp32 >> 8) & 0xFF00u);    //Byte 2        
        ptrVideoFrame[Index++] = Temp16;        //Saving bytes 0 & 2 as 0 & 1
    }    
    
    //Releasing FIFO buffer
    UOTGHS->UOTGHS_HSTPIPIDR[1] = UOTGHS_HSTPIPIDR_FIFOCONC;
    
    //Checking if frame has been loaded
    if(Bitfield & 0x02)
    {
        ShowFrame();        
    }
}

During logging, I showed that there are cases when empty packets are received. They are handled by simply returning from the buffering function after releasing the current bank (otherwise the "underflow" interrupt happens):

C++
//Empty packet - return and release FIFO buffer
if(0 == ByteReceived)
{
    UOTGHS->UOTGHS_HSTPIPIDR[1] = UOTGHS_HSTPIPIDR_FIFOCONC;    //Release current bank
    return;
}

All other transactions were at least 12 bytes long, because the standard uncompressed payload header is 12 bytes according to [4]. Thus the full header (12 bytes) must be read before the video frame data can be read. The second byte (BitField) contains the EOF bit, which I use as the indication to start outputting the stored video frame:

C++
//Reading first 12 header bytes
Temp32 = *ptrFIFOBuffer++;
Bitfield = (Temp32 >> 8);      //For the last frame transaction determination
Temp32 = *ptrFIFOBuffer++;
Temp32 = *ptrFIFOBuffer++;

Note that to read 12 bytes, only 3 read operations are required - this is the power of 32-bit systems. That is also why the next piece of code is needed: the data size is given in bytes, but the SAM3X reads in double words:

C++
//Calculating data size in double words (4 bytes)
ByteReceived_DW = ((ByteReceived >> 2) - 3);

Now, knowing how many "double word" (4-byte) iterations are needed, the actual video data can be stored, saving two pixels at once:

C++
//Saving 2 bytes at once
for(i=0; i<ByteReceived_DW; i++)
{
    Temp32 = *ptrFIFOBuffer++;        
    Temp16 = (Temp32 & 0xFFu);              //Byte 0
    Temp16 |= ((Temp32 >> 8) & 0xFF00u);    //Byte 2        
    ptrVideoFrame[Index++] = Temp16;        //Saving bytes 0 & 2 as 0 & 1
}   

In the YUY2 video format, each pixel is coded with 2 bytes, but the color information is spread over 4 bytes. In other words, to decode one pixel's color, the next pixel also has to be read; and to decode the next pixel's color, the previous pixel has to be read - they work in pairs.

As I said, I don't output in color, only in gray-scale. The gray-scale (luminance) information is located in the first byte of every pixel. That is exactly what the above piece of code does: it extracts the first byte of every pixel, and because it processes 4 bytes at once, byte 0 and byte 2 are the first bytes of the two processed pixels. It then packs them into one word (2 bytes) and stores it into the video frame buffer.
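
Spelled out as a byte diagram, the extraction works on a YUY2 macropixel like this (a commented restatement of the two lines in the loop above, nothing new):

C++
//YUY2 packs two pixels into four bytes: [Y0][U][Y1][V].
//Read as one little-endian 32-bit word, that is:
//  bits  7..0  = Y0 (luma of pixel 0)
//  bits 15..8  = U  (chroma, shared by both pixels)
//  bits 23..16 = Y1 (luma of pixel 1)
//  bits 31..24 = V  (chroma, shared by both pixels)
//Gray-scale output keeps only Y0 and Y1:
Temp16  = (Temp32 & 0xFFu);             //Y0 -> low byte of the stored word
Temp16 |= ((Temp32 >> 8) & 0xFF00u);    //Y1 -> high byte of the stored word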

Once the data has been stored, the current data bank is released.

In the end, there is a check whether this was the last transaction of the video frame:

C++
//Checking if frame has been loaded
if(Bitfield & 0x02)
{
    ShowFrame();        
}

The EOF (end of frame) bit - the second bit in BitField - is tested. If it is set, the video frame must be shown on screen.

This function is assigned in the same place as the "logging" one. To use one of them, comment out the other, and vice versa:

C++
void USBD_SetInterfaceAlternateSettingEnd(uint16_t ByteReceived)
{
    PrintStr("Alternate setting has been set.\r\n");
    if(HCD_InitiateIsochronousINPipeOne(DeviceAddress, 2))
    {
        //hcd_IsochronousControlStructure.TransferEnd = VP_ProcessPartialPayload_Logging;
        hcd_IsochronousControlStructure.TransferEnd = VP_ProcessPartialPayload_Buffering;
        
        //Enables sending IN packets endlessly
        UOTGHS->UOTGHS_HSTPIPINRQ[1] = UOTGHS_HSTPIPINRQ_INMODE;
        //Enables data received interrupt
        UOTGHS->UOTGHS_HSTPIPIER[1] = UOTGHS_HSTPIPIER_RXINES;
        //Unfreezes pipe 1
        UOTGHS->UOTGHS_HSTPIPIDR[1] = UOTGHS_HSTPIPIDR_PFREEZEC;
    }    
}

Showing Frame Function

To show a video frame on screen, the video buffer must be read word-by-word; each byte is then converted into a pixel and sent to the TFT monitor:

C++
void ShowFrame()
{
    register uint8_t PixelPart;
    register uint16_t Pixel;
    register uint32_t i, Temp16;
    
    //When frame is not fully stored (frame skipping)
    if(Index < VIDEO_FRAME_SIZE)    
        return;    
    
    Index = 0;
    UOTGHS->UOTGHS_HSTPIPINRQ[1] = 0;                           //Stop IN packet generation
    UOTGHS->UOTGHS_HSTPIPIDR[1] = UOTGHS_HSTPIPIDR_RXINEC;      //Disable data received interrupt
    
    //Cursor at left upper corner
    LCD_SetCursor(RIGHT_PIXEL, TOP_PIXEL);
    LCD_CLR_CS();
    
    i = 0;
    do
    {
        Temp16 = ptrVideoFrame[i];
        
        PixelPart = Temp16;
        Pixel = RGB(PixelPart, PixelPart, PixelPart);
        LCD_SET_DB(Pixel);
        LCD_CLR_WR();
        LCD_SET_WR();
        
        PixelPart = (Temp16 >> 8);
        Pixel = RGB(PixelPart, PixelPart, PixelPart);
        LCD_SET_DB(Pixel);
        LCD_CLR_WR();
        LCD_SET_WR();
        i++;
    }
    while(i<VIDEO_FRAME_SIZE);
    
    UOTGHS->UOTGHS_HSTPIPINRQ[1] = UOTGHS_HSTPIPINRQ_INMODE;    //Resume IN packet generation
    UOTGHS->UOTGHS_HSTPIPIER[1] = UOTGHS_HSTPIPIER_RXINES;      //Re-enable data received interrupt
    UOTGHS->UOTGHS_HSTPIPIDR[1] = UOTGHS_HSTPIPIDR_PFREEZEC;    //Unfreeze pipe 1
}

The first thing it does is frame skipping. To explain it, note that this function stops sending IN packets to the camera for the period of time while it outputs video data:

C++
UOTGHS->UOTGHS_HSTPIPINRQ[1] = 0;                           //Stop IN packet generation
UOTGHS->UOTGHS_HSTPIPIDR[1] = UOTGHS_HSTPIPIDR_RXINEC;      //Disable data received interrupt
    
//... 
    
UOTGHS->UOTGHS_HSTPIPINRQ[1] = UOTGHS_HSTPIPINRQ_INMODE;    //Resume IN packet generation
UOTGHS->UOTGHS_HSTPIPIER[1] = UOTGHS_HSTPIPIER_RXINES;      //Re-enable data received interrupt
UOTGHS->UOTGHS_HSTPIPIDR[1] = UOTGHS_HSTPIPIDR_PFREEZEC;    //Unfreeze pipe 1

So when the pause between two video frames is short (see the "Logging the Stream" section), the host may miss several of the next video frame's transactions, leaving the stored data incomplete (Index less than VIDEO_FRAME_SIZE). In this case, the function just returns, effectively skipping the incomplete video frame:

C++
//When frame is not fully stored (frame skipping)
if(Index < VIDEO_FRAME_SIZE)    
    return;

Before the video data can be output, the TFT monitor's cursor has to be positioned at the upper-left corner:

C++
//Cursor at left upper corner
LCD_SetCursor(RIGHT_PIXEL, TOP_PIXEL);
LCD_CLR_CS();

Once the cursor is positioned, the function reads two bytes at a time, converts them into 16-bit pixels carrying only gray-scale information and sends them to the TFT monitor:

C++
i = 0;
do
{
    Temp16 = ptrVideoFrame[i];
      
    PixelPart = Temp16;
    Pixel = RGB(PixelPart, PixelPart, PixelPart);
    LCD_SET_DB(Pixel);
    LCD_CLR_WR();
    LCD_SET_WR();
        
    PixelPart = (Temp16 >> 8);
    Pixel = RGB(PixelPart, PixelPart, PixelPart);
    LCD_SET_DB(Pixel);
    LCD_CLR_WR();
    LCD_SET_WR();
    i++;
}
while(i<VIDEO_FRAME_SIZE);

Simple enough!
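
By the way, the RGB macro used above comes from the project's LCD code and is not shown in the article. Assuming the TFT expects 16-bit RGB565 pixels, a plausible definition would be (an assumption - check the download for the real one):

C++
//Plausible RGB565 packing: keep the top 5, 6 and 5 bits of R, G and B.
#define RGB(r, g, b)  ((uint16_t)((((r) & 0xF8u) << 8) | \
                                  (((g) & 0xFCu) << 3) | \
                                  (((b) >> 3) & 0x1Fu)))
//With R = G = B, as in ShowFrame, this yields a shade of gray.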

Putting All Settings Together

The code examples above use some constants, like LOG_SIZE, VIDEO_FRAME_SIZE, LEFT_PIXEL, etc. I put them all into WebCameraCapturing.h and grouped them by frame size:

C++
#define Screen_160x120
// #define Screen_176x144

#ifdef Screen_160x120
    #define LEFT_PIXEL      60
    #define TOP_PIXEL       80
    #define RIGHT_PIXEL     179
    #define BOTTOM_PIXEL    239
    #define TOTAL_PIXELS    19200    //120x160=19200
    #define FRAME_INDEX     5        //120x160
    #define EP_SIZE         3        //1024 bytes
#endif // Screen_160x120

#ifdef Screen_176x144
    #define LEFT_PIXEL      48
    #define TOP_PIXEL       72
    #define RIGHT_PIXEL     191
    #define BOTTOM_PIXEL    247
    #define TOTAL_PIXELS    25344    //176x144=25344
    #define FRAME_INDEX     4        //176x144
    #define EP_SIZE         3        //1024 bytes
#endif // Screen_176x144

#define LOG_SIZE            (3000)
#define VIDEO_FRAME_SIZE    (TOTAL_PIXELS/2)

So, by toggling which of the two top lines is commented out:

C++
#define Screen_160x120
// #define Screen_176x144

you'll have all parameters set for the specified video frame size - correct frame centering on screen, video buffer size, etc. The video parameters are also controlled here: endpoint size and frame index.

Streaming with 160x120 Frame Size

The first is the smallest size. I'll try setting the endpoint size to 128 bytes (also the smallest) and prove what I calculated in the "Logging the Stream" section - it must show only about a half of the frame:

Settings:

C++
#define Screen_160x120
// #define Screen_176x144

#ifdef Screen_160x120
    #define LEFT_PIXEL      60
    #define TOP_PIXEL       80
    #define RIGHT_PIXEL     179
    #define BOTTOM_PIXEL    239
    #define TOTAL_PIXELS    19200    //120x160=19200
    #define FRAME_INDEX     5        //120x160
    #define EP_SIZE         1        //128 bytes
#endif // Screen_160x120

Output:

Image 7

As predicted, the camera does not send enough data when the 128-byte alternate setting is selected.

If I use the 512- or 1024-byte alternate setting instead:

C++
#define Screen_160x120
// #define Screen_176x144

#ifdef Screen_160x120
    #define LEFT_PIXEL      60
    #define TOP_PIXEL       80
    #define RIGHT_PIXEL     179
    #define BOTTOM_PIXEL    239
    #define TOTAL_PIXELS    19200    //120x160=19200
    #define FRAME_INDEX     5        //120x160
    #define EP_SIZE         2        //512 bytes
#endif // Screen_160x120

Output:

Image 8

(Please note that the screenshot quality is a bit worse than how it looks in reality; I could not always capture it nicely with my Nokia phone.)

The video itself is uploaded to YouTube: https://www.youtube.com/watch?v=T4V321WtYuM

Streaming with 176x144 Frame Size

Comment out the first line and uncomment the second:

C++
// #define Screen_160x120
#define Screen_176x144

//...

#ifdef Screen_176x144
    #define LEFT_PIXEL      48
    #define TOP_PIXEL       72
    #define RIGHT_PIXEL     191
    #define BOTTOM_PIXEL    247
    #define TOTAL_PIXELS    25344    //176x144=25344
    #define FRAME_INDEX     4        //176x144
    #define EP_SIZE         3        //1024 bytes (#2 for 512 bytes will also work perfectly)
#endif // Screen_176x144

Output:

Image 9

Corresponding video: https://www.youtube.com/watch?v=TjBFTG5kr5w

Conclusion

Outputting a video stream requires knowledge of many things: the hardware (Arduino Due in my case), the USB standard, the UVC standard and the video format standard (uncompressed in my case). It is not easy to put all those things together, but it is not impossible even for a C# person like me with no prior experience in ARM Cortex-M3 processors, USB or UVC whatsoever.

Also, this task can now be completed with nicer quality, more easily and with bigger frame sizes given proper ARM hardware (something like a BeagleBone or Raspberry Pi), because the Arduino Due obviously does not have enough power and memory to properly handle video streams. Nevertheless, it was a pleasure to do my video study with the Arduino Due, thanks to the ease of programming with just Atmel Studio and without any additional programmers or extra steps.

The source code can be found here.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)