Here we give brief instructions on using a pretrained MobileNet SSD model to detect objects. We then provide and explain Python code for detecting animals in video with this model. Finally, we demonstrate the detection results on a video file for one animal type that the model is able to detect.
Introduction
Unruly wildlife can be a pain for businesses and homeowners alike. Animals like deer, moose, and even cats can cause damage to gardens, crops, and property.
In this article series, we’ll demonstrate how to detect pests (such as a moose) in real time (or near-real time) on a Raspberry Pi and then take action to get rid of the pest. Since we don’t want to cause any harm, we’ll focus on scaring the pest away by playing a loud noise.
You are welcome to download the source code of the project. We are assuming that you are familiar with Python and have a basic understanding of how neural networks work.
In the previous article in the series, we compared two DNN types we can use to detect pests: detectors and classifiers. The detectors won. In this article, we’ll develop Python code for detecting pests using a pre-trained detection DNN.
Selecting Network Architecture
There are several common network architectures for object detection, such as Faster-RCNN, Single-Shot Detector (SSD), and You Only Look Once (YOLO).
Since our network needs to run on an edge device that has limited memory and CPU, we’re going to use the MobileNet Single Shot Detector (SSD) architecture. MobileNet SSD is a lightweight object detector network that performs well on mobile and edge devices. It was trained on the Pascal VOC 2012 dataset, which contains some classes that may represent pests, such as cat, cow, dog, horse, and sheep.
We’ll use the same algorithm for pest detection on video as the algorithm used for human detection in this prior article series.
Code for Pest Detection
First, we need to modify the MobileNet code to make it detect pests.
Let’s start by creating some utility classes to make this task easier:
import cv2
import numpy as np
import os


class CaffeModelLoader:
    @staticmethod
    def load(proto, model):
        # Read the network definition (.prototxt) and weights (.caffemodel)
        net = cv2.dnn.readNetFromCaffe(proto, model)
        return net


class FrameProcessor:
    def __init__(self, size, scale, mean):
        self.size = size
        self.scale = scale
        self.mean = mean

    def get_blob(self, frame):
        img = frame
        (h, w, c) = frame.shape
        if w > h:
            # Center-crop landscape frames to a square
            dx = int((w - h) / 2)
            img = frame[0:h, dx:dx + h]
        # Pass interpolation by keyword: the third positional argument of cv2.resize is dst
        resized = cv2.resize(img, (self.size, self.size), interpolation=cv2.INTER_AREA)
        blob = cv2.dnn.blobFromImage(resized, self.scale, (self.size, self.size), self.mean, False, False)
        return blob


class Utils:
    @staticmethod
    def draw_object(obj, label, color, frame):
        (confidence, (x1, y1, w, h)) = obj
        x2 = x1 + w
        y2 = y1 + h
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        # Place the caption slightly above the rectangle
        y3 = y1 - 12
        text = label + " " + str(confidence) + "%"
        cv2.putText(frame, text, (x1, y3), cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 1, cv2.LINE_AA)

    @staticmethod
    def draw_objects(objects, label, color, frame):
        for (i, obj) in enumerate(objects):
            Utils.draw_object(obj, label, color, frame)
The CaffeModelLoader class loads a Caffe model from disk using the provided paths for the prototype and model files.
The next utility class, FrameProcessor, converts frames to blobs (specially structured data used as CNN input).
Finally, the Utils class draws bounding rectangles around any objects detected in a frame. Most of the methods our utility classes use come from the Python version of the OpenCV library.
That’s it for our utility classes. Next, we’ll write code that actually detects pests.
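To make the frame-to-blob conversion more concrete, here is a minimal NumPy-only sketch of the two steps that matter most: the center crop and the normalization. It is an approximation, not the OpenCV implementation: the resize step is omitted, and the helper names center_crop_square and to_blob are hypothetical. It relies on blobFromImage computing (pixel - mean) * scale and returning data in batch, channel, height, width order.

```python
import numpy as np


def center_crop_square(frame):
    """Crop a landscape frame (height x width x channels) to a centered square."""
    h, w = frame.shape[:2]
    if w > h:
        dx = (w - h) // 2
        return frame[0:h, dx:dx + h]
    return frame


def to_blob(image, scale, mean):
    """Approximate the normalization and layout change; no resize, no channel swap."""
    normalized = (image.astype(np.float32) - mean) * scale
    # HWC -> CHW, then add a leading batch dimension: (1, C, H, W)
    return normalized.transpose(2, 0, 1)[np.newaxis, ...]


# A synthetic 640x480 frame stands in for a real video frame
frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
square = center_crop_square(frame)
blob = to_blob(square, 1.0 / 127.5, 127.5)
print(square.shape, blob.shape)  # (480, 480, 3) (1, 3, 480, 480)
```

With a scale of 1.0/127.5 and a mean of 127.5, pixel values from 0 to 255 end up roughly in the range -1 to 1.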
We’ll start with the SSD class, which detects objects of a specified class in a frame:
class SSD:
    def __init__(self, frame_proc, ssd_net):
        self.proc = frame_proc
        self.net = ssd_net

    def detect(self, frame):
        blob = self.proc.get_blob(frame)
        self.net.setInput(blob)
        detections = self.net.forward()
        # One row of detection data per candidate object
        k = detections.shape[2]
        obj_data = []
        for i in np.arange(0, k):
            obj = detections[0, 0, i, :]
            obj_data.append(obj)
        return obj_data

    def get_object(self, frame, data):
        confidence = int(data[2] * 100.0)
        (h, w, c) = frame.shape
        # Landscape frames are center-cropped to h x h before detection, so both
        # axes scale by h; other frames are resized uncropped, so x scales by w.
        x_side = h if w > h else w
        r_x = int(data[3] * x_side)
        r_y = int(data[4] * h)
        r_w = int((data[5] - data[3]) * x_side)
        r_h = int((data[6] - data[4]) * h)
        if w > h:
            # Shift x back by the crop offset
            dx = int((w - h) / 2)
            r_x = r_x + dx
        obj_rect = (r_x, r_y, r_w, r_h)
        return (confidence, obj_rect)

    def get_objects(self, frame, obj_data, class_num, min_confidence):
        objects = []
        for (i, data) in enumerate(obj_data):
            obj_class = int(data[1])
            obj_confidence = data[2]
            if obj_class == class_num and obj_confidence >= min_confidence:
                obj = self.get_object(frame, data)
                objects.append(obj)
        return objects
The key methods in the class are detect and get_objects.
The detect method applies the loaded DNN model to each frame to detect objects of all possible classes.
The get_objects method looks at the detected objects and selects only those that both belong to the specified class and have a high probability of being correctly detected (confidence).
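The filtering and coordinate conversion can be tried without a model by feeding in synthetic detection rows. The sketch below assumes the row layout that the SSD class reads (class index at position 1, confidence at position 2, normalized box corners at positions 3 to 6) and a landscape frame that was center-cropped to a square. The function names filter_detections and to_pixel_rect are hypothetical and the rows are made up for illustration.

```python
def filter_detections(rows, class_num, min_confidence):
    """Keep rows of the requested class whose confidence reaches the threshold."""
    return [r for r in rows if int(r[1]) == class_num and r[2] >= min_confidence]


def to_pixel_rect(row, frame_w, frame_h):
    """Convert a normalized box to (x, y, w, h) pixels in the original frame.

    Assumes frame_w > frame_h, so the network saw a centered
    frame_h x frame_h crop of the frame.
    """
    side = frame_h
    dx = (frame_w - frame_h) // 2  # horizontal offset of the crop
    x = int(row[3] * side) + dx
    y = int(row[4] * side)
    w = int((row[5] - row[3]) * side)
    h = int((row[6] - row[4]) * side)
    return (x, y, w, h)


# Synthetic rows: [image_id, class_id, confidence, x1, y1, x2, y2]
rows = [
    [0.0, 12.0, 0.90, 0.25, 0.25, 0.75, 0.75],  # class 12, confident
    [0.0, 12.0, 0.10, 0.00, 0.00, 0.50, 0.50],  # class 12, below the threshold
    [0.0, 8.0, 0.95, 0.10, 0.10, 0.20, 0.20],   # a different class
]
kept = filter_detections(rows, class_num=12, min_confidence=0.2)
rects = [to_pixel_rect(r, 640, 480) for r in kept]
print(rects)  # [(200, 120, 240, 240)]
```

Only the first row survives: the second fails the confidence threshold and the third belongs to another class.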
Then, we’ll write the VideoSSD class, which runs pest detection on an entire video clip:
class VideoSSD:
    def __init__(self, ssd):
        self.ssd = ssd

    def detect(self, video, class_num, min_confidence, class_name):
        detection_num = 0
        capture = cv2.VideoCapture(video)
        img = None

        dname = 'Pest detections'
        cv2.namedWindow(dname, cv2.WINDOW_NORMAL)
        cv2.resizeWindow(dname, 1280, 960)

        while True:
            (ret, frame) = capture.read()
            if frame is None:
                break

            obj_data = self.ssd.detect(frame)
            class_objects = self.ssd.get_objects(frame, obj_data, class_num, min_confidence)
            p_count = len(class_objects)
            detection_num += p_count

            if len(class_objects) > 0:
                Utils.draw_objects(class_objects, class_name, (0, 0, 255), frame)

            cv2.imshow(dname, frame)
            # Press q to stop early
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        capture.release()
        cv2.destroyAllWindows()
        return detection_num
The only method in the class is detect. It processes all the frames extracted from a video file. In each frame, it detects all objects of the class specified by the class_num parameter and then displays the frame with bounding rectangles around the objects it detected.
Does It Work?
Let’s launch our code and see how it handles a video file. The following code loads a video file and tries to detect dogs:
proto_file = r"C:\PI_PEST\net\mobilenet.prototxt"
model_file = r"C:\PI_PEST\net\mobilenet.caffemodel"
ssd_net = CaffeModelLoader.load(proto_file, model_file)
mobile_proc_frame_size = 300
ssd_proc = FrameProcessor(mobile_proc_frame_size, 1.0/127.5, 127.5)
pest_class = 12
pest_name = "DOG"
ssd = SSD(ssd_proc, ssd_net)
video_file = r"C:\PI_PEST\video\dog_1.mp4"
video_ssd = VideoSSD(ssd)
detections = video_ssd.detect(video_file, pest_class, 0.2, pest_name)
We set the value of pest_class to 12 because "dog" is the class with index 12 in the MobileNet SSD model, counting from index 0 for the background class. Here is the video captured while running the above code.
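To avoid hard-coding class numbers, the index can be looked up by name. The list below is the label order commonly distributed with the Pascal VOC MobileNet SSD Caffe model. Treat it as an assumption and check it against the label map that ships with your own model files. The names VOC_CLASSES and class_index are hypothetical.

```python
# Assumed label order for a Pascal VOC MobileNet SSD model (index 0 is background)
VOC_CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow", "diningtable", "dog",
    "horse", "motorbike", "person", "pottedplant", "sheep", "sofa",
    "train", "tvmonitor",
]


def class_index(name):
    """Return the class number for a label; raises ValueError if it is unknown."""
    return VOC_CLASSES.index(name)


# Classes from the list that may represent pests
for pest in ["cat", "cow", "dog", "horse", "sheep"]:
    print(pest, class_index(pest))
```

With this lookup, pest_class could be set with class_index("dog") instead of the literal 12.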
Will It Work on an Edge Device?
As you can see, our SSD detector successfully detected dogs in the video when run on a PC. What about an edge device? Will the detector process the feed fast enough to detect objects in real-time? We can find out by testing the frame rate, measured in frames per second (FPS).
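One simple way to estimate the frame rate is to time the per-frame work over a batch of frames. The sketch below is a rough illustration, not a benchmark tool: measure_fps is a hypothetical helper, and fake_detect is a stand-in that sleeps instead of running the network. On a real device, you would pass a function that calls the SSD detector and an iterable of captured frames.

```python
import time


def measure_fps(process_frame, frames):
    """Run process_frame on every frame and return the average frames per second."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def fake_detect(frame):
    """Stand-in workload: pretend that detection takes about 10 ms per frame."""
    time.sleep(0.01)


fps = measure_fps(fake_detect, range(5))
print("FPS: %.2f" % fps)
```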
In the article series we referenced earlier, the model we borrowed ran at about 1.25 FPS on a Raspberry Pi 3 device. Is that enough to detect pests? We can assume that, on average, an animal would be captured on camera for at least 2 to 3 seconds. That means we’ll have 2 to 3 frames to detect a pest and react to it. Sounds like decent odds.
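The estimate is simple arithmetic: the number of frames processed while the animal is visible is the frame rate multiplied by the time on camera. Here is a small sketch with the numbers quoted above; the function name full_frames is hypothetical.

```python
import math


def full_frames(fps, seconds_visible):
    """Number of complete frames processed while the animal is in view."""
    return math.floor(fps * seconds_visible)


for seconds in (2, 3):
    print(seconds, "s ->", full_frames(1.25, seconds), "frames")
# 2 s -> 2 frames
# 3 s -> 3 frames
```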
Next Steps
So far, the results aren’t very promising for wildlife detection... But let’s not give up!
In the next article, we’ll talk about some ideas for detecting "exotic" pests, such as moose and armadillos.