This article outlines the process of creating a custom model for object detection using YOLOv5 architecture. It covers setting up the training environment, obtaining a large annotated dataset, training the model, and using the custom model in CodeProject.AI Server. The article presents observations and improvements to achieve higher accuracy in object detection.
Introduction
Object Detection is a common application of Artificial Intelligence. In CodeProject.AI Server, we have added a module that uses the YOLOv5 architecture for object detection. It's a brilliant system, but using the 'standard' YOLOv5 models means you are limited to the 80 classes available in the default models. To detect objects outside this default set, you need to train your own custom models.
While the YOLOv5 documentation walks you through the process of creating a new model, setting up a training session and producing good and accurate models is not as simple as they would lead you to believe.
This article will walk you through creating a custom model for the detection of backyard pests using the following steps:
- Setting up your training environment.
- Obtaining a sufficiently large and annotated collection of relevant and diverse images for training, testing, and validation of the model.
- Training your model, including critical steps in enabling GPU acceleration.
- Using your custom model in CodeProject.AI Server.
Some of my observations about the training process were unexpected, and I plan on expanding this article as I explore and improve the process and the resulting models.
There are a number of terms and metrics that are used to describe the performance of an AI Model. I will be using these to determine how 'good' a trained model is for the task for which it was trained. You can read about them in the Notes on Training Metrics.
Setting Up Your Training Environment
Hardware
The system I am using for my AI development and testing has the following specs:
| Spec | Value |
| --- | --- |
| CPU | 12th Gen Intel(R) Core(TM) i5-12400, 2.50 GHz |
| RAM | 16 GB |
| GPU | NVIDIA GeForce RTX 3060 |
| GPU RAM | 12 GB |
| Disk | 1 TB SSD |
| OS | Windows 11 |
Development IDE
I am using Visual Studio Code as my development IDE as it runs on both Windows and Linux. I have this configured for Python development and am using a Python Jupyter Notebook to execute and record results. I am running Python 3.9.13.
Setup Project Folder
For my project, I created a directory c:\Dev\YoloV5_Training and opened it in Visual Studio Code. I then created a Python Notebook named Custom Model Training.ipynb. Having got this out of the way, let's get things going.
Setup a Python Virtual Environment
In order to not pollute the global Python installation, I will be creating a Virtual Environment for this project using the Python venv command. This virtual environment will be created in the venv sub-directory.
Run the following command only once.
!python -m venv venv
Having done this, you need to tell Visual Studio Code that you want to use the virtual environment you just created. You do this by clicking on the Python Environment selector in the top-right of the Notebook document header.
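To confirm that the notebook is actually running from the new environment, a quick sanity check (not part of the original workflow, just a convenience) is to print the interpreter path from a cell:

import sys

# The path should point into the project's venv folder,
# e.g. c:\Dev\YoloV5_Training\venv\Scripts\python.exe on Windows.
print(sys.executable)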
Setup Tools for Dataset Creation
The most important part of training a custom YOLOv5, or any AI model, is obtaining a sufficiently large and varied set of annotated data with which to train the model. Fortunately, there are several dataset repositories available. One of these is Google's Open Images. This repository contains, for object detection, 16M bounding boxes for 600 object classes on 1.9M images.
In order to manage the selection of images and the creation of datasets from this repository, Google has worked with Voxel51 to integrate Open Images into their FiftyOne "open-source tool for building high-quality datasets and computer vision models." This tool includes both a Python package and UI for managing and viewing image datasets from a variety of sources, including Open Images.
In the attached Jupyter notebook you'll see, at the top, the following:
%pip install --upgrade pip
%pip install fiftyone
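A quick, optional way to confirm the install worked is to import the package and print its version from a notebook cell:

import fiftyone as fo

# If this import fails, the package was probably installed outside the venv.
print(fo.__version__)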
Setup the Model Training Tools
We will be using the tools supplied with the Ultralytics YOLOv5 GitHub Repository for training and validation of our models. We will clone a copy of this repository into our directory in the yolov5 subdirectory. Execute the following command:
!git clone https://github.com/ultralytics/yolov5
Setup the Dependencies
Did you set up the Virtual Environment for the notebook?
It is important that you set the Virtual Environment for the notebook so that you don't install packages into the global Python packages, potentially polluting other projects that rely on the global package store. In the upper-right of the Notebook, you can set the Virtual Environment used by Visual Studio Code to run Python from the Notebook. Select the Virtual Environment venv that was created earlier.
Then you can run one of the two following cells, depending on whether you have an Nvidia GPU or not.
# if running on a system with GPU
%pip install --upgrade pip
%pip install -r requirements-gpu.txt
%pip install ipywidgets
# if running on a system without GPU
%pip install --upgrade pip
%pip install -r requirements-cpu.txt
%pip install ipywidgets
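Whichever cell you ran, it is worth confirming that PyTorch can actually see your GPU before training; if CUDA is not reported as available here, training will fall back to the CPU and be much slower. A minimal sanity check (not part of the original workflow):

import torch

# Confirm the installed PyTorch build has CUDA support and can see the GPU.
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))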
Training a Model with a Small Dataset
Download a Subset from Open Images
Training a model from a large dataset with thousands or tens of thousands of images will take hours or days.
In order to evaluate the process of creating a dataset and training it, we will:
- create a dataset with a maximum of 1000 images each for Raccoons, Dogs, Cats, Squirrels, and Skunks
- export the dataset to YOLOv5 format
- train the dataset and evaluate its performance metrics.
The intent of this dataset is mainly to detect if our friendly Trash Pandas are invading our garbage containers. We have added classes other than Raccoons to help ensure that the resulting model does not end up thinking that anything with four legs is a Raccoon.
Running the following script will download a subset of the Open Images dataset into our /users/<username>/.FiftyOne directory. In addition, it will create a number of subsets of the data and store this information in a MongoDB database in our /users/<username>/FiftyOne directory.
import fiftyone as fo
import fiftyone.zoo as foz

splits = ["train", "validation", "test"]
numSamples = 1000
seed = 42

if fo.dataset_exists("open-images-critters"):
    fo.delete_dataset("open-images-critters")

dataset = foz.load_zoo_dataset(
    "open-images-v6",
    splits=splits,
    label_types=["detections"],
    classes="Raccoon",
    max_samples=numSamples,
    seed=seed,
    shuffle=True,
    dataset_name="open-images-critters")

print(dataset)

if fo.dataset_exists("open-images-cats"):
    fo.delete_dataset("open-images-cats")

cats_dataset = foz.load_zoo_dataset(
    "open-images-v6",
    splits=splits,
    label_types=["detections"],
    classes="Cat",
    max_samples=numSamples,
    seed=seed,
    shuffle=True,
    dataset_name="open-images-cats")

dataset.merge_samples(cats_dataset)

if fo.dataset_exists("open-images-dogs"):
    fo.delete_dataset("open-images-dogs")

dogs_dataset = foz.load_zoo_dataset(
    "open-images-v6",
    splits=splits,
    label_types=["detections"],
    classes="Dog",
    max_samples=numSamples,
    seed=seed,
    shuffle=True,
    dataset_name="open-images-dogs")

dataset.merge_samples(dogs_dataset)

if fo.dataset_exists("open-images-squirrels"):
    fo.delete_dataset("open-images-squirrels")

squirrels_dataset = foz.load_zoo_dataset(
    "open-images-v6",
    splits=splits,
    label_types=["detections"],
    classes="Squirrel",
    max_samples=numSamples,
    seed=seed,
    shuffle=True,
    dataset_name="open-images-squirrels")

dataset.merge_samples(squirrels_dataset)

if fo.dataset_exists("open-images-skunks"):
    fo.delete_dataset("open-images-skunks")

skunks_dataset = foz.load_zoo_dataset(
    "open-images-v6",
    splits=splits,
    label_types=["detections"],
    classes="Skunk",
    max_samples=numSamples,
    seed=seed,
    shuffle=True,
    dataset_name="open-images-skunks")

dataset.merge_samples(skunks_dataset)
print(fo.list_datasets())
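If you want to spot-check the downloaded images and their bounding boxes before exporting, the FiftyOne App can browse the merged dataset. This step is optional and is a sketch of standard FiftyOne usage rather than part of the training pipeline:

import fiftyone as fo

# Open the FiftyOne App to inspect the merged "open-images-critters" dataset
# and its detection annotations.
dataset = fo.load_dataset("open-images-critters")
session = fo.launch_app(dataset)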
Export the Dataset to YOLOv5 Format
Before we can train the model using YOLOv5, we need to export the open-images-critters dataset to the correct format.
Fortunately, FiftyOne supplies the tools to perform this conversion. The following code will export the dataset to the datasets\critters subfolder, creating the standard YOLOv5 layout of images and labels folders per split, plus a dataset.yaml file:
import fiftyone as fo

export_dir = "datasets/critters"
label_field = "detections"
splits = ["train", "validation", "test"]
classes = ["Raccoon", "Cat", "Dog", "Squirrel", "Skunk"]

dataset_or_view = fo.load_dataset("open-images-critters")

for split in splits:
    split_view = dataset_or_view.match_tags(split)
    split_view.export(
        export_dir=export_dir,
        dataset_type=fo.types.YOLOv5Dataset,
        label_field=label_field,
        split=split,
        classes=classes,
    )
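Before moving on, you can sanity-check the export. Assuming the exporter produced the usual images/<split> and labels/<split> layout (which is what the dataset.yaml below reflects), counting the label files per split is a quick way to confirm nothing obvious went wrong:

from pathlib import Path

# Count the exported YOLOv5 label files for each split.
for split in ["train", "validation", "test"]:
    labels = list(Path("datasets/critters/labels", split).glob("*.txt"))
    print(f"{split}: {len(labels)} label files")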
Correct the dataset.yaml file
The file datasets\critters\dataset.yaml is created during this process:
names:
- Raccoon
- Cat
- Dog
- Squirrel
- Skunk
nc: 5
path: c:\Dev\YoloV5_Training\datasets\critters
test: .\images\test\
train: .\images\train\
validation: .\images\validation\
The training code expects the label val rather than validation. Update the file to:
names:
- Raccoon
- Cat
- Dog
- Squirrel
- Skunk
nc: 5
path: c:\Dev\YoloV5_Training\datasets\critters
test: .\images\test\
train: .\images\train\
val: .\images\validation\
We are now ready to train the model.
Train the Small Dataset
To ensure that our process is correct, we will first train a model with a small number of epochs (complete passes over the training data). We will train for 50 epochs using the standard yolov5s.pt model as a starting point. The training results will be stored in train/critters/epochs50. In this directory, you will also find the resulting weights (model) as well as graphs and tables detailing the process and the resulting performance metrics.
Due to memory constraints, we had to reduce the batch size to 32. Make sure you have shut down any memory-hogging applications such as Docker; otherwise, your training may stop with no warning.
Run the following script to do the training. This took 51 minutes on my machine.
!python yolov5/train.py --batch 32 --weights yolov5s.pt --data datasets/critters/dataset.yaml --project train/critters --name epochs50 --epochs 50
Once this is complete, the train/critters/epochs50 directory will contain a number of interesting files.
- results.csv which contains performance information for each epoch
- several PNG files with graphs for various performance metrics. The PR_curve.png and results.png are of particular interest.
- The PR_curve.png shows the Precision vs. Recall curve for the best model.
- The results.png shows the values of the training metrics for each epoch of the training.
Precision/Recall Curve
Training Metrics
The resulting best.pt has a mAP@50 of 0.757, which is fairly good, but the low mAP@[0.5:0.95] of 0.55 indicates that it will miss detecting some objects.
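If you prefer to inspect these numbers programmatically rather than reading them off the graphs, results.csv can be loaded with pandas. This is a minimal sketch; note that the column names (and their whitespace padding) may vary between YOLOv5 versions:

import pandas as pd

# Load the per-epoch training metrics written by train.py.
df = pd.read_csv("train/critters/epochs50/results.csv")
df.columns = [c.strip() for c in df.columns]  # YOLOv5 pads column names with spaces
print(df[["epoch", "metrics/mAP_0.5", "metrics/mAP_0.5:0.95"]].tail())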
So let's verify that the model train\critters\epochs50\weights\best.pt actually will detect a Raccoon in an image. I will use the image datasets\critters\images\validation\8fbdeff053852ee7.jpg for this. You may wish to use a different image.
To get an idea of performance, run the detection twice. The first run incurs some setup overhead, resulting in longer inference times, so the second timing will be more representative.
Awww…who’s a cutey?
import torch
model2 = torch.hub.load('ultralytics/yolov5', 'custom', 'train/critters/epochs50/weights/best.pt', device="0")
result = model2("datasets/critters/images/validation/8fbdeff053852ee7.jpg")
result.print()
Using cache found in C:\Users\matth/.cache\torch\hub\ultralytics_yolov5_master
YOLOv5 2022-11-8 Python-3.9.13 torch-1.13.0+cu117 CUDA:0 (NVIDIA GeForce RTX 3060, 12288MiB)
Fusing layers...
Model summary: 157 layers, 7023610 parameters, 0 gradients, 15.8 GFLOPs
Adding AutoShape...
image 1/1: 768x1024 1 Raccoon
Speed: 13.0ms pre-process, 71.0ms inference, 4.0ms NMS per image at shape (1, 3, 480, 640)
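Beyond the printed summary, the Detections object returned by the hub model can be converted to a pandas DataFrame, which is handy for pulling out the predicted classes, confidences, and boxes. This assumes the result variable from the cell above:

# Convert the detections to a DataFrame: one row per detected object.
df = result.pandas().xyxy[0]
print(df[["name", "confidence", "xmin", "ymin", "xmax", "ymax"]])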
Improve the Model
Considering that the standard yolov5s.pt model has a mAP@50 of 0.568, our model's 0.757 appears to be impressive. However, previous runs of this process have shown that the generated model will miss some objects, particularly if they are small.
Improving the model's performance, according to the Ultralytics YOLOv5 documentation, should be done in two ways:
- increase the number of epochs to at least 300
- increase the number of images in the training set
We will do this in two stages so that we can see how each change impacts the performance metrics.
First, we will train with 300 epochs. In addition, we will use the model we just trained as a starting point. No sense wasting that work.
This will take several hours (5 hours on my machine), so you might want to do this in the evening.
!python yolov5/train.py --batch 32 --weights train/critters/epochs50/weights/best.pt --data datasets/critters/dataset.yaml --project train/critters --name epochs300 --epochs 300
Once this is complete, the train/critters/epochs300 folder will contain the results of the training.
The performance can be seen in the PR_curve.png and results.png graphs.
Precision/Recall Curve
Training Metrics
The new value for the mAP@50 is 0.777, which is an improvement of 2.6%. The improved mAP@[0.5:0.95] of 0.62 shows that the model will miss fewer objects.
Again, let's verify that it works.
import torch
model2 = torch.hub.load('ultralytics/yolov5', 'custom', 'train/critters/epochs300/weights/best.pt', device="0")
result = model2("datasets/critters/images/validation/8fbdeff053852ee7.jpg")
result.print()
Using cache found in C:\Users\matth/.cache\torch\hub\ultralytics_yolov5_master
YOLOv5 2022-11-8 Python-3.9.13 torch-1.13.0+cu117 CUDA:0 (NVIDIA GeForce RTX 3060, 12288MiB)
Fusing layers...
Model summary: 157 layers, 7023610 parameters, 0 gradients, 15.8 GFLOPs
Adding AutoShape...
image 1/1: 768x1024 1 Raccoon
Speed: 8.0ms pre-process, 71.4ms inference, 2.0ms NMS per image at shape (1, 3, 480, 640)
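If you want to see the detection rather than just the summary line, the same Detections object can render and save an annotated copy of the image (again assuming the result variable from the cell above; by default the output goes into a runs/detect/... folder):

# Save a copy of the image with the predicted bounding boxes drawn on it.
result.save()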
Training a Model with a Large Dataset
Having shown that we can train a custom YOLOv5 dataset and obtain reasonable performance with a small dataset, we want to try this with a larger dataset. We will then check to see if any additional performance merits the extra effort and time.
Download a Subset from Open Images
For this:
- create a dataset with a maximum of 25,000 total images for Raccoons, Dogs, Cats, Squirrels, and Skunks.
- export the dataset to YOLOv5 format
- train the dataset with 300 epochs starting with our last best.pt and evaluate its performance metrics.
As before, running the following script will download a subset of the Open Images dataset into our /users/<username>/.FiftyOne directory.
import fiftyone as fo
import fiftyone.zoo as foz

splits = ["train", "validation", "test"]
numSamples = 25000
seed = 42

if fo.dataset_exists("open-images-critters-large"):
    fo.delete_dataset("open-images-critters-large")

dataset = foz.load_zoo_dataset(
    "open-images-v6",
    splits=splits,
    label_types=["detections"],
    classes=["Raccoon", "Dog", "Cat", "Squirrel", "Skunk"],
    max_samples=numSamples,
    seed=seed,
    shuffle=True,
    dataset_name="open-images-critters-large")
print(fo.list_datasets())
print(dataset)
Export the Large Dataset
As before, we will export the dataset to YOLOv5 format. We will save it in datasets/critters-large.
import fiftyone as fo

export_dir = "datasets/critters-large"
label_field = "detections"
splits = ["train", "validation", "test"]
classes = ["Raccoon", "Cat", "Dog", "Squirrel", "Skunk"]

dataset_or_view = fo.load_dataset("open-images-critters-large")

for split in splits:
    split_view = dataset_or_view.match_tags(split)
    split_view.export(
        export_dir=export_dir,
        dataset_type=fo.types.YOLOv5Dataset,
        label_field=label_field,
        split=split,
        classes=classes,
    )
Correct the dataset.yaml file
As before, the file datasets\critters-large\dataset.yaml that is created during this process needs to be corrected:
names:
- Raccoon
- Cat
- Dog
- Squirrel
- Skunk
nc: 5
path: c:\Dev\YoloV5_Training\datasets\critters-large
test: .\images\test\
train: .\images\train\
validation: .\images\validation\
The training code expects the label val rather than validation. Update the file to:
names:
- Raccoon
- Cat
- Dog
- Squirrel
- Skunk
nc: 5
path: c:\Dev\YoloV5_Training\datasets\critters-large
test: .\images\test\
train: .\images\train\
val: .\images\validation\
We are now ready to train the model.
Train with the Larger Dataset
Again, we need to reduce the batch size due to memory constraints. In this case, I set it to 24.
!python yolov5/train.py --batch 24 --weights train/critters/epochs300/weights/best.pt --data datasets/critters-large/dataset.yaml --project train/critters-large --name epochs300 --epochs 300
I had to stop the run after 15 hours. Fortunately, the training can be resumed. After each epoch, a best.pt file is created in the weights directory which, surprisingly enough, contains the best model found so far.
We can get the performance metrics for this model using the validation script.
!python yolov5/val.py --weights train/critters-large/epochs300/weights/best.pt --data datasets/critters-large/dataset.yaml --project val/critters --name large --device 0
Once the validation is complete, the PR Curve, and other graphs, will be available in the val/critters/large folder.
We can already see that this model is much improved over the previous one, with a mAP@50 of 0.811, an additional 4.3%.
I can then restart the training using the --resume parameter on the training script:
!python yolov5/train.py --resume train/critters-large/epochs300/weights/last.pt
Once this is complete, the train/critters-large/epochs300 folder will contain the results of the training.
The performance can be seen in the PR_curve.png graph.
Precision/Recall Curve
Training Metrics
The new value for the mAP@50 is 0.878, which is an additional improvement of 13% over the 300-epoch small model, or 16% over the small model with 50 epochs. In addition, the mAP@[0.5:0.95] of 0.75 indicates that the model will detect most objects.
Furthermore, the upward slope of the mAP graphs at the end of training indicates that additional training may further improve the model.
Again, let's verify that it works.
import torch
model2 = torch.hub.load('ultralytics/yolov5', 'custom', 'train/critters-large/epochs300/weights/best.pt', device="0")
result = model2("datasets/critters/images/validation/8fbdeff053852ee7.jpg")
result.print()
Validate the Model
The YOLOv5 code provides tools for validating the performance of the custom model. We will validate the model with and without image augmentation during inferencing.
Augmentation manipulates the image being inferenced so that multiple images with different modifications are supplied to the model. Of course, this means that there is more processing time required for any additional accuracy.
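As an aside, the same test-time augmentation can be requested for single-image inference through torch.hub by passing augment=True. This is a sketch reusing the model weights and test image from earlier, purely if you want to compare timings yourself:

import torch

# Load the custom model and run a single inference with test-time augmentation.
model = torch.hub.load('ultralytics/yolov5', 'custom',
                       'train/critters-large/epochs300/weights/best.pt', device="0")
result = model("datasets/critters/images/validation/8fbdeff053852ee7.jpg", augment=True)
result.print()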
Run the following scripts at least twice as there is setup for the first inference.
!python yolov5/val.py --weights train/critters-large/epochs300/weights/best.pt --data datasets/critters-large/dataset.yaml --project val/critters-large --name augmentee --device 0 --augment
!python yolov5/val.py --weights train/critters-large/epochs300/weights/best.pt --data datasets/critters-large/dataset.yaml --project val/critters-large --name not
The results of these validation runs are shown in the PR_curve.png graphs.
Normal Inferencing Precision/Recall Curve
Augmented Inferencing Precision/Recall Curve
- The unaugmented validation has a mAP@50 of 0.877, a mAP@[0.5:0.95] of 0.756, and took 6.4 ms.
- The augmented validation has a mAP@50 of 0.85, a mAP@[0.5:0.95] of 0.734, and took 14.0 ms.
This indicates that augmented inference does not improve accuracy; in fact, it slightly reduces it while taking roughly twice as long. So, at least for this model, augmented inferencing is not recommended.
Conclusions
We have shown that using readily available tools and data, it is a relatively simple, if time-consuming, task to:
- create a large, annotated dataset suitable for YOLOv5 training
- train and verify the operation of a custom YOLOv5 model
While I didn't go through the process here, I did try training with:
- freezing all the layers but the last
- freezing the backbone (first 10 layers)
- hyper-parameter evolution
The first two did not result in any significant improvement, and in some cases actually decreased performance.
The last just took too long and was terminated. This has been left for a future exercise.
Next Steps
For you, train a custom model to detect something you would find useful. This could be detecting:
- Amazon packages at your front door
- the presence of birds at your bird feeder and the species
- the school bus arriving to pick up or drop off your children
- Guns, knives, and other weapons
- Face detection
- Face mask detection, or lack of face mask detection
The possibilities are endless.
For me, I would like to:
- evaluate modifications to the hyper-parameters to provide better and faster convergence on best models.
- evaluate the evolution of the hyper-parameters during training.
- as YOLOv5 now supports Classification and Segmentation, look at training and use of these types of models
- look at encapsulating the dataset creation and training process into a simple tool/UI.
Be on the lookout for follow-up articles.
As always, comments, bug reports, and improvement recommendations are welcome.
Notes on Training Metrics
These are explained in detail at The Confusing Metrics of AP and mAP for Object Detection / Instance Segmentation and Mean Average Precision (mAP) Explained: Everything You Need to Know. I've summarized this here:
| Term | Formula | Description |
| --- | --- | --- |
| Precision | Precision = Tp / (Tp + Fp) | A measurement of the probability that a positive result (object detection) is correct. |
| Recall | Recall = Tp / (Tp + Fn) | A measurement of the probability that a positive result is reported when it should have been. |
| AP | The area under the Precision-Recall Curve | Average Precision. |
| IoU | IoU = Ai / Au | Intersection over Union is a measure of the amount of overlap of two bounding boxes. When one of the bounding boxes is the Ground Truth and the other is the predicted value, the IoU is a measurement of the accuracy of the prediction. The larger the value, the more the predicted and Ground Truth values match. |
| mAP | mAP = (1/nc) ∑ APi, for i = 1 to nc | Mean Average Precision. The mean of the Average Precision over all object classes. This value is calculated for a particular IoU threshold for a positive prediction. |
| mAP@50 | | mAP for an IoU threshold of 0.5 for a positive prediction. |
| mAP@[0.5:0.95] | | The average of the mAP values for a number of IoU thresholds between 0.5 and 0.95. Typically 10 intervals with a step of 0.05 are used. |
| Ground Truth | | The true value of the bounding box for the object. This is typically defined by the manual or automatic labeling process which defines the classes and bounding boxes for objects in an image. |
Where:
- Tp is the number of correct Positive results
- Fp is the number of incorrect Positive results
- Tn is the number of correct Negative results
- Fn is the number of incorrect Negative results.
- Ai is the area of Intersection of two bounding boxes
- Au is the area of the Union of two bounding boxes
- APi is the AP for class i
- nc is the number of classes
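To make the IoU definition concrete, here is a small worked example (not part of the article's tooling) computing IoU for two axis-aligned boxes given as (xmin, ymin, xmax, ymax):

def iou(box_a, box_b):
    # Intersection rectangle (zero area if the boxes do not overlap)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    # Union = sum of the two areas minus the intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A prediction that overlaps the ground truth reasonably well:
print(iou((10, 10, 110, 110), (20, 20, 120, 120)))  # ~0.68, above the 0.5 threshold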