Introduction
The release of Microsoft Windows 10 IoT Core in 2015 created new opportunities for C# developers to explore the world of robotics using Visual Studio and one of the most popular single board computers - Raspberry Pi. In this article, I will go over my C# code that integrates the RPi with Pixy, a vision sensor designed by Charmed Labs for object tracking.
Based on Pixy's simplified data retrieval protocol, I'll show you how to receive and parse information about a visual object's size and position on the RPi over the I2C bus using a Windows 10 IoT Core application. In addition to implementing the technical side of the solution, I'll share my approach to architecting the codebase. We will break the project into distinct layers, leverage the power of LINQ to Objects for processing data received from Pixy and put common design patterns to work. I hope that both robotics enthusiasts learning .NET and seasoned C# developers new to robotics will find something of interest here.
Background
When we first dive into robotics, at some point we all have to pick our very first sensor to play with. Although I did not start with Pixy, it makes a reasonable first choice for moderately experienced programmers. It crunches a lot of visual information to deliver object positioning data to you in a compact format 50 times per second. Being able to track a visual object in your program with an investment of only $69 is pretty cool!
I completed this project over a year ago on a Raspberry Pi 2 with Visual Studio 2015, but these days you can use an RPi 3 Model B+ and VS 2017. At the time of this writing, the Pixy CMUcam5 that I used remains the latest version of the device.
For robotics enthusiasts new to Windows 10 IoT Core and C#, I'd like to add that the freely available development framework provided by Microsoft lets you master the same technology that numerous professional programmers use to build enterprise software and commercial web sites. Using VS.NET and applying Object Oriented Programming principles, you can build a large, well organized system positioned for growth. Standard design patterns, NuGet packages, code libraries and ready-to-use solutions are available to us, allowing an experimental app to be extended way beyond its original scope. If you consider separation of concerns, segregation of logic within layers and loose coupling between them early in your design, you will enjoy your growing project for years to come. This remains true whether you build robotics apps professionally or as a hobby.
Using Pixy Visual Sensor
Pixy delivers the coordinates of several visual objects with preset color signatures straight to the RPi in the format explained here.
There are several ways to use this information. You can find code samples that display object boxes on the screen and samples that make Pixy follow your object using two servo motors. I built the latter but servo control goes beyond the scope of this article.
The source code included with this article is intended for feeding the coordinates and size of an object to an autonomous robot. Based on a preset object size, the distance to the object captured by Pixy and the object angle translated from its coordinates, the RPi can then send signals to the motor drive to approach the object. Therefore, we will only be tracking a single object, but you can alter this logic to suit your needs.
Here is a picture of Pixy attached to Pan and Tilt mechanism mounted on my robot:
Pixy can store up to 7 different color signatures, in effect enabling tracking of 7 different objects with unique colors. Since in this application we are only interested in a single object, and because significant changes in lighting have an adverse effect on Pixy's color filtering algorithm, I use all 7 signatures to train Pixy on the same object under 7 different lighting conditions.
Prerequisites
You will need the following:
- Pixy camera - you can buy it on Amazon, RobotShop or SparkFun for $69
- Raspberry Pi 2 or 3 with a power supply and connection wires
- Visual Studio 2015 or 2017
The attached source code is wrapped into a ready-to-build Visual Studio solution; however, it is not expected to be your first Windows 10 IoT Core project. Those willing to experiment should already have a working project containing a Universal Windows Platform (UWP) application built for the ARM processor architecture and verified to work on Raspberry Pi. Note that my code assumes a headed application (see the comments on the Timer type).
If you haven't played with RPi and Windows IoT Core yet, then "Hello, Blinky" is a popular first time project. If you don't find it at this link, look it up at https://developer.microsoft.com/en-us/windows/iot/samples.
There are many other examples guiding developers through creating their very first Windows 10 IoT Core application for RPi. For example, check out the following article - Building Your First App for Windows 10 IoT using C#.
Connecting Pixy to Raspberry Pi
I strongly recommend using a ribbon cable for connecting your Pixy as opposed to breadboard jumper wires. It is far more secure when it comes to placing your Pixy on a pan-and-tilt mechanism. The Uxcell IDC Socket 10 Pins Flat Ribbon Cable works quite well for me.
In my robot, I soldered a header to a prototype board where I created an I2C hub along with 5V and ground for all my I2C devices. SDA and SCL are connected to the RPi's GPIO 2 and 3 via jumper wires soldered to the prototype board. Power is supplied by a separate NiMH battery with a 5V voltage regulator, although for playing with just Pixy, you can simply power it from the RPi.
Pixy I2C connection pinout (10-pin I/O connector; odd pins in the left column, even pins in the right):

 1         |  2  Power
 3         |  4
 5  SCL    |  6  Ground
 7         |  8
 9  SDA    | 10
Layered Design
The implementation is split into 3 layers:
- Data Access Layer - receives raw data from the data source. This layer hosts the PixyDataReaderI2C class.
- Repository - translates data received from the source into the object block entity data model. This is accomplished via PixyObjectFinder.
- App Logic - finds the biggest objects of interest and determines the target object using CameraTargetFinder.
In a large system involving many different sensors, I’d segregate these layers into separate projects or, at least, into separate project folders.
Control Flow
The Data Access Layer does the hard work of handling every timer event and looking for matching objects. The App Logic, on the other hand, is only interested in successful matches when they occur.
Information flows from the bottom layer to the top via 2 callbacks implemented as generic delegates. The following flow sums it up, but you may want to revisit this section once you have reviewed the details of each layer. Note that higher level objects do not directly create objects at a lower level but use interfaces instead.
- Camera Target Finder passes the m_dlgtFindTarget delegate to Pixy Object Finder via fFindTarget.
- Pixy Object Finder passes the m_dlgtExtractBlocks delegate to Pixy Data Reader via fParseBlocks.
- Camera Target Finder creates instances of the reader and object finder, then starts Pixy Object Finder.
- Pixy Object Finder calls Pixy Data Reader to create a timer and start listening to the device.
- When data has been read, Pixy Data Reader invokes m_dlgtExtractBlocks in Pixy Object Finder (via fParseBlocks) to translate the data into color signature objects.
- m_dlgtExtractBlocks invokes m_dlgtFindTarget in Camera Target Finder (via fFindTarget) to extract the biggest objects of each color signature and determine the target coordinates.
When used in combination with interfaces, this type of flow decouples our classes from their dependencies so that the dependencies can be replaced or updated with minimal or no changes to our classes' source code. More on this below.
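To make the flow concrete, here is a condensed sketch of the wiring. It uses the class and delegate names introduced in the sections below; the comments paraphrase the full listings that follow:

// App Logic: CameraTargetFinder owns m_dlgtFindTarget and a handler for the final result.
var targetFinder = new CameraTargetFinder(delegate (int x, int y, int w, int h)
{
    Debug.WriteLine("target at {0},{1} size {2}x{3}", x, y, w, h);
});

// StartCamera() asks the class factory to build the reader and the finder,
// passing m_dlgtFindTarget down as fFindTarget, then starts the finder.
// The finder, in turn, hands m_dlgtExtractBlocks to the reader via Init(fParseBlocks).
targetFinder.StartCamera();

// From this point on, every timer tick in PixyDataReaderI2C reads raw bytes,
// calls fParseBlocks to translate them into CameraObjectBlock items,
// and fParseBlocks calls fFindTarget with the translated list.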
Data Model and Interfaces
A Pixy object block includes x/y coordinates and width/height. In addition, we want to keep track of the time when it was detected:
public class CameraObjectBlock
{
public int signature = -1;
public int x = -1;
public int y = -1;
public int width = -1;
public int height = -1;
public DateTime dt;
}
The Camera Data Reader interface defines a signature for the higher level to decouple it from a dependency on the Reader implementation. While we have no intention of using other readers here, this leaves room for expansion: if we ever decide to use another Reader, the higher level logic will not have to change to instantiate a different class, because that other Reader would still conform to the established interface.
Next, we define an interface for Pixy Object Finder. It's a good idea to keep all interfaces together, separate from their implementations. That way, you have a distinct domain consisting of the data model and operations, in effect showing what functions the application performs and what type of data it deals with:
public interface ICameraDataReader
{
void Init(Func<byte[], int> fParseBlocks);
Task Start();
void Stop();
void Listen();
int GetBlockSize();
int GetMaxXPosition();
int GetMaxYPosition();
}
public interface ICameraObjectFinder
{
void Start();
void Stop();
List<CameraObjectBlock> GetVisualObjects();
}
public abstract class CameraDataReader
{
protected CameraDataReader(ILogger lf)
{}
}
public abstract class CameraObjectFinder
{
protected CameraObjectFinder(ICameraDataReader iCameraReader,
Func<List<CameraObjectBlock>, bool> fFindTarget,
ILogger lf)
{ }
}
public interface ILogger
{
void LogError(string s);
}
Two abstract classes have been created to enforce particular constructor parameters.
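To illustrate the decoupling, here is a minimal sketch of a hypothetical alternative reader (it is not part of the attached solution) that could be swapped in, for example for desktop testing. Deriving from CameraDataReader forces it to accept an ILogger in its constructor, and implementing ICameraDataReader keeps the higher layers unaware of the substitution:

public class SimulatedDataReader : CameraDataReader, ICameraDataReader
{
    private Func<byte[], int> m_fParseBlocks = null;

    public SimulatedDataReader(ILogger lf) : base(lf) { }

    public void Init(Func<byte[], int> fParseBlocks) { m_fParseBlocks = fParseBlocks; }

    public Task Start() { return Task.FromResult(0); }   // no device to open

    public void Listen()
    {
        // Feed a single canned object block (the sample frame from the parsing section)
        // instead of polling the I2C bus.
        if (m_fParseBlocks != null)
            m_fParseBlocks(new byte[] {
                0x55, 0xAA, 0x55, 0xAA, 0xBB, 0x01, 0x01, 0x00,
                0x3D, 0x01, 0x73, 0x00, 0x04, 0x00, 0x06, 0x00 });
    }

    public void Stop() { }

    public int GetBlockSize() { return 14; }
    public int GetMaxXPosition() { return 400; }
    public int GetMaxYPosition() { return 200; }
}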
Data Access Layer
Pixy processes an image frame every 1/50th of a second. This means that you get a full update of all detected objects' positions every 20 ms (PIXY_INTERVAL_MS = 20). See http://cmucam.org/projects/cmucam5 for more information.
The PixyDataReaderI2C class implements the ICameraDataReader interface:
public class PixyDataReaderI2C : CameraDataReader, ICameraDataReader
{
private DispatcherTimer m_timerRead = null;
private Windows.Devices.I2c.I2cDevice m_deviceI2cPixy = null;
private Func<byte[], int> m_fParseBlocks = null;
private const int PIXY_INTERVAL_MS = 20;
private const int BLOCK_SIZE_BYTES = 14;
private int m_maxNumberOfExpectedObjects = 50;
public int MaxNumberOfExpectedObjects {
get { return m_maxNumberOfExpectedObjects; }
set { m_maxNumberOfExpectedObjects = value; }
}
public int m_sizeLeadingZeroesBuffer = 100;
public int GetMaxXPosition() { return 400; }
public int GetMaxYPosition() { return 200; }
private ILogger m_lf = null;
public PixyDataReaderI2C(ILogger lf) : base(lf)
{
m_lf = lf;
}
public void Init(Func<byte[], int> fParseBlocks)
{
m_fParseBlocks = fParseBlocks;
}
The data reader takes the generic delegate fParseBlocks to allow invocation of the higher level translation method without having to alter the lower level logic, should the translator ever change.
Since my RPi communicates with Pixy via I2C, we first retrieve a device selector from the OS and then use it to enumerate I2C controllers. Finally, using the connection settings object, we obtain a handle to our device:
public async Task Start()
{
try
{
string deviceSelector = Windows.Devices.I2c.I2cDevice.GetDeviceSelector();
var devicesI2C = await DeviceInformation.FindAllAsync(deviceSelector).AsTask();
if (devicesI2C == null || devicesI2C.Count == 0)
return;
var settingsPixy = new Windows.Devices.I2c.I2cConnectionSettings(0x54);
settingsPixy.BusSpeed = Windows.Devices.I2c.I2cBusSpeed.FastMode;
m_deviceI2cPixy = await Windows.Devices.I2c.I2cDevice
.FromIdAsync(devicesI2C.First().Id, settingsPixy);
}
catch (Exception ex)
{
m_lf.LogError(ex.Message);
}
}
Next, we set up a timer and a handler to read raw data from Pixy into dataArray and call m_fParseBlocks to translate it:
public void Listen()
{
if (m_timerRead != null)
m_timerRead.Stop();
m_timerRead = new DispatcherTimer();
m_timerRead.Interval = TimeSpan.FromMilliseconds(PIXY_INTERVAL_MS);
m_timerRead.Tick += TimerRead_Tick;
m_timerRead.Start();
}
private void TimerRead_Tick(object sender, object e)
{
try
{
if (m_deviceI2cPixy == null)
return;
byte[] dataArray = new byte[MaxNumberOfExpectedObjects * BLOCK_SIZE_BYTES
+ m_sizeLeadingZeroesBuffer];
m_deviceI2cPixy.Read(dataArray);
m_fParseBlocks(dataArray);
}
catch (Exception ex)
{
m_lf.LogError(ex.Message);
}
}
Note that instead of timers, we could use async/await - the asynchronous design pattern - to build an alternative Reader. Such a Reader could then be injected into the flow via the Class Factory, as explained in the App Logic Layer section.
My code assumes a headed application, but if you are going to run it within a headless app, change the timer type from DispatcherTimer to ThreadPoolTimer. Please see the corresponding note in the source code.
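For reference, here is a minimal sketch of the headless variant, assuming the same tick logic as TimerRead_Tick above. ThreadPoolTimer comes from the Windows.System.Threading namespace and its callback runs on a thread pool thread rather than the UI thread:

// Headless alternative to DispatcherTimer (sketch only; drop into PixyDataReaderI2C).
// Requires: using Windows.System.Threading;
private ThreadPoolTimer m_timerReadHeadless = null;

public void ListenHeadless()
{
    if (m_timerReadHeadless != null)
        m_timerReadHeadless.Cancel();

    m_timerReadHeadless = ThreadPoolTimer.CreatePeriodicTimer(timer =>
    {
        try
        {
            if (m_deviceI2cPixy == null)
                return;
            byte[] dataArray = new byte[MaxNumberOfExpectedObjects * BLOCK_SIZE_BYTES
                                        + m_sizeLeadingZeroesBuffer];
            m_deviceI2cPixy.Read(dataArray);
            m_fParseBlocks(dataArray);
        }
        catch (Exception ex)
        {
            m_lf.LogError(ex.Message);
        }
    },
    TimeSpan.FromMilliseconds(PIXY_INTERVAL_MS));
}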
Repository Layer
Generally speaking, we use a Repository to separate data retrieval logic from the business or application logic by translating source data into the entity model - the data structure utilized by the business logic. This additional encapsulation layer is known as the Repository Pattern. In our use case, the translator processes raw data from the data source to extract visual objects of interest. This is accomplished in PixyObjectFinder, which converts the Pixy byte stream into objects with x/y/width/height properties:
public class PixyObjectFinder : CameraObjectFinder, ICameraObjectFinder
{
const UInt16 PixySyncWord = 0xaa55;
const int BlockRetentionSeconds = 3;
private ICameraDataReader m_pixy = null;
private ILogger m_lf = null;
public Object m_lockPixy = new Object();
private Func<List<CameraObjectBlock>, bool> m_fFindTarget;
private Func<byte[], int> m_dlgtExtractBlocks;
private List<CameraObjectBlock> m_pixyObjectBlocks = new List<CameraObjectBlock>();
public List<CameraObjectBlock> GetVisualObjects() { return m_pixyObjectBlocks; }
Pixy Object Finder
PixyObjectFinder translates the buffer from the Pixy object block format into our entity model of detected objects, so that the App Logic deals only with its own format and remains agnostic of the underlying source.
PixyObjectFinder uses the Start method to initialize Pixy and launch its timer within the data access layer.
public void Start()
{
m_pixy.Init(m_dlgtExtractBlocks);
Task.Run(async () => await m_pixy.Start());
m_pixy.Listen();
}
The translation is essentially implemented in m_dlgtExtractBlocks, which is passed to the Pixy data reader as a parameter via m_pixy.Init(m_dlgtExtractBlocks).
public PixyObjectFinder(ICameraDataReader ipixy,
Func<List<CameraObjectBlock>, bool> fFindTarget,
ILogger lf) : base(ipixy, fFindTarget, lf)
{
m_pixy = ipixy;
m_fFindTarget = fFindTarget;
m_lf = lf;
m_dlgtExtractBlocks = delegate (byte[] byteBuffer)
{
lock (m_lockPixy)
{
if (byteBuffer == null || byteBuffer.Length == 0)
return 0;
try
{
int blockSize = ipixy.GetBlockSize();
int lengthWords = 0;
int[] wordBuffer = ConvertByteArrayToWords(byteBuffer, ref lengthWords);
if (wordBuffer == null)
return 0;
List<int> blockStartingMarkers = Enumerable.Range(0, wordBuffer.Length)
.Where(i => wordBuffer[i] == PixySyncWord)
.ToList<int>();
m_pixyObjectBlocks = m_pixyObjectBlocks
    .SkipWhile(p => ((TimeSpan)(DateTime.Now - p.dt)).Seconds > BlockRetentionSeconds)
    .ToList();
blockStartingMarkers.ForEach(blockStart =>
{
if (blockStart < lengthWords - blockSize / 2)
m_pixyObjectBlocks.Add(new CameraObjectBlock()
{
signature = wordBuffer[blockStart + 2],
x = wordBuffer[blockStart + 3],
y = wordBuffer[blockStart + 4],
width = wordBuffer[blockStart + 5],
height = wordBuffer[blockStart + 6],
dt = DateTime.Now
});
});
m_fFindTarget(m_pixyObjectBlocks);
m_pixyObjectBlocks.Clear();
}
catch (Exception e)
{
m_lf.LogError(e.Message);
}
}
return m_pixyObjectBlocks.Count;
};
}
PixyObjectFinder takes the fFindTarget generic delegate to invoke the higher level processor, which converts detected objects into target coordinates.
The m_pixyObjectBlocks list contains the detected objects. The conversion follows the Pixy stream format, as seen in the code snippet above.
For more details on the Pixy data format, see Pixy Serial Protocol. Note that Serial and I2C deliver Pixy data in the same stream format.
In addition, I accumulate blocks in the list across read operations to smooth target detection over a longer time period, i.e., longer than 20 ms. This is done by SkipWhile dropping objects older than BlockRetentionSeconds.
Parsing the Data Stream
The finder method above must first convert the input stream of bytes into 16-bit words and place them in an array of integers. It is in that array that we find the x, y, width and height of the detected objects.
ConvertByteArrayToWords - a private method of PixyObjectFinder - converts the byte stream received from the Pixy I2C device into 16-bit words:
private int[] ConvertByteArrayToWords(byte[] byteBuffer, ref int lengthWords)
{
try
{
byteBuffer = byteBuffer.SkipWhile(s => s == 0).ToArray();
if (byteBuffer.Length == 0)
return new int[0];
int length = byteBuffer.Length;
lengthWords = length / 2 + 1;
int[] wordBuffer = new int[lengthWords];
int ndxWord = 0;
for (int i = 0; i < length - 1; i += 2)
{
if (byteBuffer[i] == 0 && byteBuffer[i + 1] == 0)
continue;
int word = ((int)(byteBuffer[i + 1])) << 8 | ((int)byteBuffer[i]);
if (word == PixySyncWord && ndxWord > 0 && PixySyncWord == wordBuffer[ndxWord - 1])
wordBuffer[ndxWord - 1] = 0;
wordBuffer[ndxWord++] = word;
}
if (ndxWord == 0)
return null;
return wordBuffer;
}
catch (Exception e)
{
m_lf.LogError(e.Message);
return null;
}
}
As you can see, I had to tweak the parser to skip potential leading zeroes and duplicate sync words. If you are using RPi 3 and/or a newer UWP Tools SDK, you may not have to deal with these issues.
Here is an example of a single object block byte sequence received from Pixy into the byte buffer:
00-00-00-00-55-AA-55-AA-BB-01-01-00-3D-01-73-00-04-00-06-00-00-00-
You should be able to review the byte buffer in your debugger via BitConverter.ToString(byteBuffer).
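Decoding that sample by hand (the parser assembles little-endian words, low byte first) gives the following; note that the checksum equals the sum of the five payload words, per the Pixy protocol:

// 00-00-00-00      leading zeroes, skipped by the parser
// 55-AA 55-AA      0xAA55 0xAA55 - two sync words marking the start of a frame
// BB-01            0x01BB checksum = 1 + 317 + 115 + 4 + 6 = 443
// 01-00            color signature = 1
// 3D-01            x = 317
// 73-00            y = 115
// 04-00            width = 4
// 06-00            height = 6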
App Logic Layer
The Target Finder determines the target based on the selected objects provided by the Repository layer. It is in this layer that we apply a creational design pattern - the Factory - to create and retain instances of lower level objects.
Class Factory
This pattern helps decouple our classes from being responsible for locating and managing the lifetime of their dependencies. Note how our class factory only exposes interfaces while calling constructors internally. Both the Data Reader and the Object Finder are created and stored here. We instantiate them using constructor dependency injection, which gives us the flexibility to drop in other implementations of readers and finders by creating them in the Class Factory.
By using a Factory, we apply the principle of Inversion of Control, which replaces direct dependencies between objects with dependencies on abstractions, i.e., interfaces. While this concept goes way beyond my example, quite often a simple class factory is all you need.
The Create function passes in the method for calculating the target via the delegate Func<List<CameraObjectBlock>, bool> fFindTarget:
public class MyDeviceClassFactory
{
private ICameraDataReader m_cameraDataReaderI2C = null;
private ICameraObjectFinder m_cameraObjectFinder = null;
private ILogger m_lf = new LoggingFacility();
public ILogger LF { get { return m_lf; } }
public void Create(Func<List<CameraObjectBlock>, bool> fFindTarget)
{
if (m_cameraObjectFinder != null)
return;
m_cameraDataReaderI2C = new PixyDataReaderI2C(m_lf);
m_cameraObjectFinder = new PixyObjectFinder(m_cameraDataReaderI2C, fFindTarget, m_lf);
}
public ICameraDataReader CameraDataReader { get { return m_cameraDataReaderI2C; } }
public ICameraObjectFinder CameraObjectFinder { get { return m_cameraObjectFinder; } }
}
Target Finder
At the top of this project is the CameraTargetFinder class, which filters the pre-selected objects looking for a single object - the target. It ignores objects with an area smaller than minAcceptableAreaPixels, orders the remaining objects by size and takes the one at the top. It could potentially apply other filters as well. Finally, it invokes the fSetTarget handler with the target position and size in pixels.
public class CameraTargetFinder
{
private const int minAcceptableAreaPixels = 400;
private MyDeviceClassFactory cf = new MyDeviceClassFactory();
private Func<List<CameraObjectBlock>, bool> m_dlgtFindTarget;
private Action<int, int, int, int> m_fSetTarget;
public CameraTargetFinder(Action<int, int, int, int> fSetTarget)
{
m_dlgtFindTarget = delegate (List<CameraObjectBlock> objectsInView)
{
try
{
if (objectsInView.Count == 0)
return false;
objectsInView = GetBiggestObjects(objectsInView);
CameraObjectBlock biggestMatch = (from o in objectsInView
where o.width * o.height > minAcceptableAreaPixels
select o)
.OrderByDescending(s => s.width * s.height)
.FirstOrDefault();
if (biggestMatch == null || biggestMatch.signature < 0)
return false;
m_fSetTarget(biggestMatch.x, cf.CameraDataReader.GetMaxYPosition() - biggestMatch.y,
biggestMatch.width, biggestMatch.height);
return true;
}
catch (Exception e)
{
cf.LF.LogError(e.Message);
return false;
}
};
m_fSetTarget = fSetTarget;
}
The resulting visual object list often contains a lot of false positives, i.e., tiny objects with the same color signature as the desired target. Besides making adjustments to improve accuracy, we drop them by calling GetBiggestObjects() to retain only the largest object for each color signature. This method first groups the objects by color signature, then finds the maximum area within each group and returns only those objects.
private List<CameraObjectBlock> GetBiggestObjects(List<CameraObjectBlock> objectsInView)
{
List<CameraObjectBlock> bestMatches = (from o in objectsInView
group o by o.signature into grpSignatures
let biggest = grpSignatures.Max(t => t.height * t.width)
select grpSignatures.First(p => p.height * p.width == biggest))
.ToList();
return bestMatches;
}
The GetBiggestObjects method is a good example of using LINQ for processing data in robotics apps. Note how compact and clean the query is compared to the nested loops often found in robotics sample code. Python developers will point out that the power of integrated queries is available to them too, albeit with different syntax and predicates.
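For comparison, here is roughly what the same grouping would look like without LINQ - a sketch using a dictionary keyed by signature:

// Equivalent imperative version (sketch): keep the largest block per color signature.
private List<CameraObjectBlock> GetBiggestObjectsNoLinq(List<CameraObjectBlock> objectsInView)
{
    var bestBySignature = new Dictionary<int, CameraObjectBlock>();
    foreach (CameraObjectBlock o in objectsInView)
    {
        CameraObjectBlock best;
        if (!bestBySignature.TryGetValue(o.signature, out best)
            || o.height * o.width > best.height * best.width)
        {
            bestBySignature[o.signature] = o;
        }
    }
    return new List<CameraObjectBlock>(bestBySignature.Values);
}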
The App Logic starts the camera and initiates target tracking via the StartCamera method:
public void StartCamera()
{
try
{
cf.Create(m_dlgtFindTarget);
cf.CameraObjectFinder.Start();
}
catch (Exception e)
{
cf.LF.LogError(e.Message);
throw;
}
}
Using the Code in Your Project
First off, you have to teach Pixy an object.
Next, create an instance of CameraTargetFinder, passing in a handler for processing target coordinates. Here is an example:
public class BizLogic
{
CameraTargetFinder ctf = new CameraTargetFinder(delegate (int x, int y, int w, int h)
{
Debug.WriteLine("x: {0}, y: {1}, w: {2}, h: {3}", x, y, w, h);
});
public void Test()
{
ctf.StartCamera();
}
}
If you know the actual size of your target object, you can convert its height and width into the distance to the target, and convert x and y into angles between the camera and the target, so that your controller can turn the servo motors to keep the camera pointed at the target.
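Here is a minimal sketch of that conversion based on a simple pinhole camera model. All of the constants are assumptions - the real width of your object, Pixy's nominal field of view (roughly 75° horizontal by 47° vertical) and the block coordinate ranges should be calibrated for your own setup:

// Rough sketch (not part of the attached solution): convert block size/position
// into distance and bearing. All constants are assumptions - calibrate for your rig.
public static class TargetGeometry
{
    const double RealTargetWidthMeters = 0.10;   // measured width of your trained object
    const double HorizontalFovDegrees  = 75.0;   // Pixy's nominal horizontal FOV
    const double VerticalFovDegrees    = 47.0;   // Pixy's nominal vertical FOV
    const double FrameWidthPixels      = 320.0;  // assumed horizontal coordinate range
    const double FrameHeightPixels     = 200.0;  // assumed vertical coordinate range

    // Pinhole model: focal length in pixels derived from the horizontal FOV.
    static readonly double FocalLengthPixels =
        (FrameWidthPixels / 2.0) / Math.Tan(HorizontalFovDegrees * Math.PI / 360.0);

    // distance ~ realWidth * focalLength / pixelWidth
    public static double DistanceMeters(int widthPixels)
    {
        return RealTargetWidthMeters * FocalLengthPixels / Math.Max(widthPixels, 1);
    }

    // Horizontal angle to the target: 0 when centered, negative to the left.
    public static double PanAngleDegrees(int x)
    {
        return (x - FrameWidthPixels / 2.0) * (HorizontalFovDegrees / FrameWidthPixels);
    }

    // Vertical angle to the target: 0 when centered, negative below center.
    public static double TiltAngleDegrees(int y)
    {
        return (y - FrameHeightPixels / 2.0) * (VerticalFovDegrees / FrameHeightPixels);
    }
}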
In order to run my source code, you can simply add the PixyCamera.cs file to your project and - for testing - work the above sample into the MainPage of your application.
If you'd rather use the attached solution, set the target platform to ARM in Visual Studio, build it, deploy it to the RPi and run it in debug mode. Once the Pixy camera initializes, bring your preset object in front of the camera. When Pixy detects the object, its LED indicator will light up and object positioning data will appear in the Visual Studio Output window, for example:
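Output similar to the lines below should show up; the numbers here are purely illustrative and simply follow the format of the Debug.WriteLine call in the BizLogic example above:

x: 160, y: 112, w: 45, h: 61
x: 158, y: 113, w: 46, h: 60
x: 161, y: 111, w: 44, h: 62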
Conclusion
Tracking an object using Pixy and the familiar Visual Studio environment is a very rewarding project, especially when it runs in an autonomous system on a small computer like RPi. It's even more fun when the underlying program is well organized and follows design patterns recognized by other developers. It's worth our time to properly structure and continuously refactor a solution keeping up with project growth.
Feel free to use the code in your personal or commercial projects. It has been thoroughly tested on my 20-pound six-wheeler guided by Pixy.
Useful Resources