Labeling And Visualizing Images For Object Detection

The classic example of a deep learning project for computer vision starts with a dataset of images and labels. However, in most computer vision problems, labeling the data is a challenge. This article walks through image labeling at scale and the associated challenges.

Image from Google Maps with Annotated Bounding Boxes | Skanda Vivek

The classic example of a deep learning project for computer vision starts with a dataset of images and labels. Depending on the type of labels you have and the task you want to accomplish (image classification, object detection, or image segmentation), you select from an appropriate set of deep learning models. There are many resources to follow, including Kaggle datasets and notebooks, GitHub repos, and built-in example datasets such as MNIST, which is available in deep learning packages like TensorFlow and PyTorch. The focus in these example projects is mainly on choosing model architectures and tuning hyperparameters. Sometimes, for unique datasets, it makes sense to employ transfer learning, where you apply a pre-trained model (see, for example, my blog post Deep Transfer Learning Tutorial in PyTorch on Animals-10 Dataset).

However, in most computer vision problems that you encounter, whether at a company that tries to detect defects on large machinery or with a client that has specific needs, labeling the data is a challenge. As an example, let's say a client wants to detect houses in aerial images taken by their drone.

Image from Google Maps | Skanda Vivek

Unfortunately, images taken by a drone and satellite images from Google Maps do not come pre-labeled. Many of the existing deep learning tutorials that you might have followed therefore skip a crucial step that you need before you can start the project. How do you label these images in the first place?

It turns out that there are multiple image labeling service providers. However, there is a surprising lack of tutorials on how to label images. I found a good free online solution, Makesense. As you can see below, it is easy to get started.

Image from

Once you upload an image, make sure to choose the right labeling task, depending on whether you want to detect multiple objects in an image or classify ("recognize" in this tool's wording) the image as a whole.

Image from

Next, create a list of labels. Here I'm just detecting houses, so I will create only one label class.

Image from

Next, I annotate the image by drawing a bounding box around each individual house.

Image from

Finally, I download the annotations in a particular format. In this case I choose YOLO, a popular family of object detection models.

Image from

When I open the zip folder, the annotations are in a .txt file whose contents are below. Each row is a distinct house and has 5 columns: the first is the object class, and the following 4 are the center X and Y and the width and height of the bounding box, each normalized by the image width or height.

#YOLO annotations

0 0.100204 0.266547 0.122651 0.425760
0 0.245263 0.257603 0.122651 0.436494
0 0.373811 0.248658 0.110858 0.450805
0 0.502359 0.251342 0.117934 0.420394
0 0.633265 0.277281 0.113217 0.411449
0 0.761224 0.281753 0.119113 0.441860
0 0.229931 0.764758 0.103782 0.449016
0 0.367324 0.751342 0.104961 0.404293
0 0.499410 0.739714 0.123831 0.459750
0 0.766531 0.722719 0.129727 0.486583
0 0.633855 0.753131 0.119113 0.432916
0 0.909820 0.735242 0.126189 0.490161
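
To make the format concrete, here is a minimal sketch that parses one annotation row and converts it to pixel corner coordinates. The helper name yolo_to_corners and the 1000 x 500 image size are assumptions for illustration only; they are not taken from the actual image.

```python
from typing import Tuple


def yolo_to_corners(line: str, img_w: int, img_h: int) -> Tuple[int, float, float, float, float]:
    """Convert one YOLO row (class cx cy w h, all normalized) to
    (class, x0, y0, x1, y1) in pixels. Hypothetical helper."""
    cls, cx, cy, bw, bh = line.split()
    # Scale the normalized center and size to pixel units
    cx, bw = float(cx) * img_w, float(bw) * img_w
    cy, bh = float(cy) * img_h, float(bh) * img_h
    # Move from center/size to top-left and bottom-right corners
    return int(cls), cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2


# First row of the annotations above, with an ASSUMED 1000 x 500 image
row = "0 0.100204 0.266547 0.122651 0.425760"
cls, x0, y0, x1, y1 = yolo_to_corners(row, img_w=1000, img_h=500)
print(cls, round(x0, 1), round(y0, 1), round(x1, 1), round(y1, 1))
```

With a different image size the pixel values change, but the normalized row stays the same, which is why the format carries over when images are resized.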

How do we know that our annotations are correct? We can load our annotations and images in Python, and make a custom function to visualize our annotations as below:

from PIL import Image, ImageDraw
import numpy as np
import matplotlib.pyplot as plt
import os

#code adapted from

def plot_bounding_box(image, annotation_list):
    # atleast_2d keeps a file with a single annotation as one row of 5 columns
    annotations = np.atleast_2d(annotation_list)
    w, h = image.size
    plotted_image = ImageDraw.Draw(image)

    # Scale the normalized center and size values to pixels
    transformed_annotations = np.copy(annotations)
    transformed_annotations[:,[1,3]] = annotations[:,[1,3]] * w
    transformed_annotations[:,[2,4]] = annotations[:,[2,4]] * h

    # Convert (center x, center y, width, height) to corners (x0, y0, x1, y1)
    transformed_annotations[:,1] = transformed_annotations[:,1] - (transformed_annotations[:,3] / 2)
    transformed_annotations[:,2] = transformed_annotations[:,2] - (transformed_annotations[:,4] / 2)
    transformed_annotations[:,3] = transformed_annotations[:,1] + transformed_annotations[:,3]
    transformed_annotations[:,4] = transformed_annotations[:,2] + transformed_annotations[:,4]

    # Draw one rectangle per annotation (drawn in place on the image)
    for ann in transformed_annotations:
        obj_cls, x0, y0, x1, y1 = ann
        plotted_image.rectangle(((x0,y0), (x1,y1)), width = 10, outline="#0000ff")

    # Display the annotated image
    plt.imshow(np.array(image))
    plt.show()

#get an annotation file
annotation_file = './houses.txt'

#Get the corresponding image file
image_file = annotation_file.replace("txt", "png")
assert os.path.exists(image_file)

#Load the image
image = Image.open(image_file)

#Plot the Bounding Box
plot_bounding_box(image, np.loadtxt(annotation_file))

Great, our annotations look spot on!

Image from Google Maps with Annotated Bounding Boxes | Skanda Vivek
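
A visual check works well for one image, and a quick programmatic check can complement it. The sketch below flags rows that are not valid normalized YOLO boxes: a value out of range, or a box that extends past the image edge. The helper name find_invalid_rows and the second, made-up row are assumptions for illustration.

```python
import numpy as np


def find_invalid_rows(annotations, n_classes=1, tol=1e-6):
    """Return indices of rows that are not valid normalized YOLO boxes.
    Hypothetical helper; n_classes=1 matches the single house class."""
    ann = np.atleast_2d(np.asarray(annotations, dtype=float))
    cls, cx, cy, w, h = ann.T
    valid = (
        (cls >= 0) & (cls < n_classes) & (cls == np.floor(cls))  # integer class id in range
        & (w > 0) & (h > 0)                                       # non-empty box
        & (cx - w / 2 >= -tol) & (cx + w / 2 <= 1 + tol)          # inside image horizontally
        & (cy - h / 2 >= -tol) & (cy + h / 2 <= 1 + tol)          # inside image vertically
    )
    return np.flatnonzero(~valid).tolist()


rows = [
    [0, 0.100204, 0.266547, 0.122651, 0.425760],  # first row of the annotations above
    [0, 0.95, 0.50, 0.20, 0.20],                  # made-up row that spills past the right edge
]
print(find_invalid_rows(rows))  # -> [1]
```

The same function accepts the array returned by np.loadtxt(annotation_file), so it can run right before plotting.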

Conclusion and challenges

While Makesense is a great free platform, it is not scalable. To train a YOLO-style object detection model, you need at least hundreds of labeled images (thousands or hundreds of thousands are preferred) to get reasonable accuracy. Makesense does not save any annotations or incorporate workflows, so you need to do all of the labeling in one sitting. It would take an extremely long time for one person to label thousands of images this way. There are other services that provide customized quotes and labeling for a few cents per image. In my opinion, this is an area with vast room for improvement and untapped potential. As the general field of AI gets more popular and accessible, the need for easy, accurate, and cheap labeling at scale will become all the more important.
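
Since labeling many images produces many files, one small step toward scale is checking that every image has a matching annotation file. Here is a minimal sketch, assuming images and .txt files share a base name in one folder. The function name summarize_labels and that folder layout are hypothetical, not something the labeling tool guarantees.

```python
from pathlib import Path


def summarize_labels(folder, image_ext=".png"):
    """Count boxes per image and list images with no annotation file.
    Assumes 'name.png' pairs with 'name.txt' in the same folder."""
    counts, unlabeled = {}, []
    for image_path in sorted(Path(folder).glob(f"*{image_ext}")):
        label_path = image_path.with_suffix(".txt")
        if not label_path.exists():
            unlabeled.append(image_path.name)
            continue
        # One non-empty line per bounding box
        lines = [ln for ln in label_path.read_text().splitlines() if ln.strip()]
        counts[image_path.name] = len(lines)
    return counts, unlabeled
```

Calling counts, unlabeled = summarize_labels("./") on the folder that holds houses.png and houses.txt would report the number of boxes for that image and list any images that still need labels.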

I hope this showed you the tip of the iceberg for end-to-end computer vision projects, and that model training is only one piece of the puzzle. Look out for more blogs that talk about other key aspects of business-focused, end-to-end deep learning!

You can find the code from this post on GitHub:

GitHub - skandavivek/Visualizing-YOLO-annotations: Visualizing YOLO annotations on an image (created in

Thanks for reading! For Data Science and Machine Learning mentoring, please contact us! We develop custom learning pathways for individual clients and enable cutting edge AI based research. We also provide access to high-end computational servers based on needs.