In this article, we'll give a brief introduction to the data augmentation technique, select the augmentation methods best suited to our dataset, walk through the Python code that implements them, and finally show examples of the resulting images.
Introduction
Unruly wildlife can be a pain for businesses and homeowners alike. Animals like deer, moose, and even cats can cause damage to gardens, crops, and property.
In this article series, we’ll demonstrate how to detect pests (such as a moose) in real time (or near-real time) on a Raspberry Pi and then take action to get rid of the pest. Since we don’t want to cause any harm, we’ll focus on scaring the pest away by playing a loud noise.
You are welcome to download the source code of the project. We are assuming that you are familiar with Python and have a basic understanding of how neural networks work.
In the previous article, we prepared the initial version of our moose classification dataset. It included 200 samples for the moose class and 284 samples for the background class. Rather small, right? In this article, we’ll demonstrate how to enhance our dataset without gathering new images.
Data Augmentation
Data augmentation provides a way to derive new samples from existing images using various image modifications. The most common of these are geometric transformations, color space transformations, data noising, and image filtering. When applying data augmentation algorithms, we should ensure that the image modification is suitable for our dataset, and that it does not harm the samples.
Let’s consider our moose dataset. Which of the modification types would work for it?
Color space transformations and data noising won’t do. Color is an important feature for animal classification, so we should not change the color space of our sample images. Data noising is an effective technique for datasets containing handwritten digits or symbols. We don’t expect this to be useful for images of moose in their natural habitat.
This leaves us with geometric transformations and image filtering. We’ll consider each of these in detail as we implement them in Python.
Geometric Transformations
Quite a few geometric transformations could be used for image data augmentation: flipping, rotation, zooming, cropping, translation, and so on.
An obvious and very useful transformation is the horizontal flip, which mirrors an image relative to the vertical axis.
Another popular geometric transformation is rotation, which emulates viewing the sample from different angles. Note that this transformation should use only modest rotation angles to preserve the object shape.
We can easily implement geometric transformations using the Python OpenCV package:
import cv2

class FlipProcessor:
    # Mirrors an image around the vertical axis (horizontal flip)
    def name(self):
        return "flip"

    def process(self, image):
        return cv2.flip(image, 1)

class RotateProcessor:
    # Rotates an image around its center by the given angle,
    # scaling it to compensate for the empty borders rotation introduces
    def __init__(self, angle, scale):
        self.angle = angle
        self.scale = scale

    def name(self):
        if self.angle > 0:
            sa = str(self.angle)
        else:
            sa = "_" + str(-self.angle)
        return "rotate" + sa

    def process(self, image):
        (h, w, c) = image.shape
        center = (w / 2, h / 2)
        rmat = cv2.getRotationMatrix2D(center, self.angle, self.scale)
        rotated = cv2.warpAffine(image, rmat, (w, h))
        return rotated
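If you'd like to preview these transformations before wiring them into the full pipeline, a quick sanity check along these lines will do. The image path below is just a placeholder; point it at one of your own sample files:

import cv2

# Placeholder path: replace with one of your own sample images
img = cv2.imread(r"C:\PI_PEST\moose\sample.png")

flipped = FlipProcessor().process(img)
rotated = RotateProcessor(15, 1.2).process(img)

# Show the original and the two transformed versions side by side
cv2.imshow("original", img)
cv2.imshow("flipped", flipped)
cv2.imshow("rotated", rotated)
cv2.waitKey(0)
cv2.destroyAllWindows()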
Image Filtering
Image filtering is a useful but unintuitive type of data augmentation. It’s hard to estimate which filters will produce the most useful data sample from an existing image.
Our dataset consists of frames extracted from a video of wildlife, so it is fair to assume that some frames are sharp and some are blurred. Sharpening and smoothing the frames can therefore improve and diversify our dataset. Here is how this looks in Python code:
class SmoothProcessor:
    # Smooths (blurs) an image with a Gaussian filter of the given kernel size
    def __init__(self, filter_size):
        self.filter_size = filter_size

    def name(self):
        return "smooth" + str(self.filter_size)

    def process(self, image):
        smoothed = cv2.GaussianBlur(image, (self.filter_size, self.filter_size), 0)
        return smoothed

class SharpenProcessor:
    # Applies an edge-preserving bilateral filter with the given diameter
    def __init__(self, filter_size):
        self.filter_size = filter_size

    def name(self):
        return "sharpen" + str(self.filter_size)

    def process(self, image):
        sharpen = cv2.bilateralFilter(image, self.filter_size, 150, 150)
        return sharpen
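To get a feel for how the kernel size affects the result, you can run the two filter processors with a few sizes and compare the output files. Again, the input path below is just a placeholder:

import cv2

# Placeholder path: replace with one of your own sample images
img = cv2.imread(r"C:\PI_PEST\moose\sample.png")

# Gaussian kernel sizes must be odd, so we try 3, 5, and 7
for size in (3, 5, 7):
    cv2.imwrite("smooth" + str(size) + ".png", SmoothProcessor(size).process(img))
    cv2.imwrite("sharpen" + str(size) + ".png", SharpenProcessor(size).process(img))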
One more transformation we need simply copies the original image (we’ll explain why we need it later):
class OriginalProcessor:
    # Passes the image through unchanged; used to copy the originals
    def name(self):
        return ""

    def process(self, image):
        return image
And Now All the Tricks in a Single Algorithm
Now we have all the transformations we’re going to use. Here is the algorithm that will automatically pull all the images we have through the transformations:
import os

class Augmentor:
    def __init__(self, processors):
        self.processors = processors

    def generate(self, source, dest):
        # FileUtils.get_files is a helper from the project's utility code
        files = FileUtils.get_files(source)
        if not os.path.isdir(dest):
            os.mkdir(dest)
        for (i, fname) in enumerate(files):
            img = cv2.imread(fname)
            for (j, proc) in enumerate(self.processors):
                p_img = proc.process(img)
                # Build the new file name: original name + processor name + .png
                f = os.path.basename(fname)
                f = os.path.splitext(f)[0]
                f = f + proc.name() + ".png"
                dfname = os.path.join(dest, f)
                cv2.imwrite(dfname, p_img)
The generate method iterates over the files in the source folder, applies each of the specified transformations to every image, and saves the modified images to the destination directory under a new name (the original file name with the processor name appended).
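The FileUtils.get_files helper ships with the project's downloadable source code. If you're following along without it, a minimal stand-in that simply returns the image file paths in a folder could look like this (the class and method names are kept only so the Augmentor code above works unchanged):

import os

class FileUtils:
    # Minimal stand-in for the project's helper: list image files in a folder
    @staticmethod
    def get_files(folder):
        exts = (".png", ".jpg", ".jpeg", ".bmp")
        return [os.path.join(folder, f)
                for f in sorted(os.listdir(folder))
                if f.lower().endswith(exts)]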
We can apply the augmentation algorithm to our sample images with the following code:
folder = r"C:\PI_PEST\moose"
gen_folder = r"C:\PI_PEST\moose_gen"
processors = [OriginalProcessor(), FlipProcessor()]
gen = Augmentor(processors)
gen.generate(folder, gen_folder)
processors = [SmoothProcessor(5), SharpenProcessor(5)]
gen = Augmentor(processors)
gen.generate(gen_folder, gen_folder)
processors = [RotateProcessor(15, 1.2), RotateProcessor(-15, 1.2)]
gen = Augmentator(processors)
gen.generate(gen_folder, gen_folder)
Note how we implemented augmentation in three sequential steps.
First, we applied the flip and the "original" processor. Now we see why OriginalProcessor just copies the original image as-is: after this step, we want copies of both the original and the flipped images in the moose_gen folder.
In the second step, we applied the smoothing and sharpening processors to all the images generated in the first step.
In the last step, we applied two transformations (rotation by 15 degrees clockwise and counterclockwise) to all the images generated in the two previous steps. The resulting images for a sample are shown below:
Next Steps
How many samples do we have now? Eighteen times more! Each source image yields two images after the flip step, each of those yields three after the smoothing and sharpening step, and each of those yields three after the rotation step: 2 × 3 × 3 = 18.
We now have 3600 samples for the moose class and 5112 samples for the background class. For our purposes, a dataset of this size will provide acceptable moose detection results.
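If you want to double-check those numbers on your machine, a quick count of the generated files is enough. The background folder path below is an assumption, so substitute whatever path you used for that class:

import os

def count_images(folder):
    # Count the .png files written by the augmentation pipeline
    return sum(1 for f in os.listdir(folder) if f.lower().endswith(".png"))

print("moose:", count_images(r"C:\PI_PEST\moose_gen"))
# Hypothetical folder for the augmented background class
print("background:", count_images(r"C:\PI_PEST\background_gen"))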
But if you were developing a commercial pest detector, you’d want a much larger data set. A common theme you’ll encounter when working on AI projects is that acquiring good data is the most difficult task.
There are well-known, battle-tested DNN architectures for many common types of problems, but acquiring a large set of images, cleaning them, and augmenting them can take days, weeks, or even months depending on the size of the problem you are solving.
In the next article, we’ll discuss training our DNN classifier with the augmented dataset.