The classic deep learning for computer vision project starts with a dataset of images and labels. Depending on the type of labels you have and the task you want to accomplish (image classification, object detection, or image segmentation), you would select from an appropriate set of deep learning models. There are many resources to follow, including Kaggle datasets and notebooks, GitHub repos, and built-in example datasets such as MNIST, available in deep learning packages like TensorFlow and PyTorch. The focus in these example projects is mainly on choosing model architectures and tuning hyperparameters. Sometimes, for unique datasets, it makes sense to employ transfer learning, where you apply a pre-trained model (see my blog post Deep Transfer Learning Tutorial in PyTorch on the Animals-10 Dataset for an example).
However, in most computer vision problems you encounter in practice, whether at a company trying to detect defects on large machinery or with a client that has specific needs, labeling the data is a challenge. As an example, let's say a client wants to detect houses in aerial images taken by their drone.
Unfortunately, images taken by a drone, or satellite images from Google Maps, do not come pre-labeled. Much of the existing deep learning tutorial material you might have followed is therefore missing a step that is crucial before the project can even start: how do you label these images in the first place?
It turns out that there are multiple image labeling service providers. However, there is a lack of tutorials on how to label images, which is quite surprising. I found a good free online solution: makesense.ai. As you can see below, it is easy to get started.
Once you upload an image, make sure to choose the right labeling task, depending on whether you want to detect multiple objects in an image or classify (called "recognize" here) the image as a whole.
Next, create a list of labels. Here I'm only detecting houses, so I will create a single label class.
Next, I annotate the images by drawing a bounding box around each individual house.
Finally, I download the annotations in a particular format. In this case I choose YOLO, a popular family of object detection models.
When I open the zip folder, the annotations are in a .txt file whose contents are shown below. Each row is a distinct house: the first column denotes the object class, and the four following columns denote the center X and Y coordinates as well as the width and height of the bounding box, all normalized to the image dimensions.
#YOLO annotations
0 0.100204 0.266547 0.122651 0.425760
0 0.245263 0.257603 0.122651 0.436494
0 0.373811 0.248658 0.110858 0.450805
0 0.502359 0.251342 0.117934 0.420394
0 0.633265 0.277281 0.113217 0.411449
0 0.761224 0.281753 0.119113 0.441860
0 0.229931 0.764758 0.103782 0.449016
0 0.367324 0.751342 0.104961 0.404293
0 0.499410 0.739714 0.123831 0.459750
0 0.766531 0.722719 0.129727 0.486583
0 0.633855 0.753131 0.119113 0.432916
0 0.909820 0.735242 0.126189 0.490161
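To make the format concrete, here is a minimal sketch of how one row maps to pixel coordinates. The first two rows of the downloaded file are inlined as a string, and the image size (1280 by 720) is a hypothetical value for illustration; in practice you would read the real size from the labeled image.

```python
import io
import numpy as np

# first two rows of the YOLO annotation file, inlined for illustration
yolo_txt = """0 0.100204 0.266547 0.122651 0.425760
0 0.245263 0.257603 0.122651 0.436494"""

ann = np.loadtxt(io.StringIO(yolo_txt))  # one row per box: class, cx, cy, w, h
img_w, img_h = 1280, 720  # hypothetical pixel size of the source image

cls, cx, cy, w, h = ann[0]
# convert the normalized center/size to pixel corner coordinates
x0, y0 = (cx - w / 2) * img_w, (cy - h / 2) * img_h  # top-left corner
x1, y1 = (cx + w / 2) * img_w, (cy + h / 2) * img_h  # bottom-right corner
print(cls, (x0, y0), (x1, y1))
```

This center-plus-size encoding, normalized by image width and height, is what makes YOLO annotations independent of image resolution.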
How do we know that our annotations are correct? We can load the annotations and image in Python and write a custom function to visualize the bounding boxes, as below:
from PIL import Image, ImageDraw
import numpy as np
import matplotlib.pyplot as plt
import os

# code adapted from https://blog.paperspace.com/train-yolov5-custom-data/
def plot_bounding_box(image, annotation_list):
    # ensure a 2-D array even when the file contains a single annotation
    annotations = np.atleast_2d(np.array(annotation_list))
    w, h = image.size
    plotted_image = ImageDraw.Draw(image)

    transformed_annotations = np.copy(annotations)
    # scale the normalized coordinates to pixels
    transformed_annotations[:, [1, 3]] = annotations[:, [1, 3]] * w
    transformed_annotations[:, [2, 4]] = annotations[:, [2, 4]] * h
    # convert (center x, center y, width, height) to corner coordinates
    transformed_annotations[:, 1] -= transformed_annotations[:, 3] / 2
    transformed_annotations[:, 2] -= transformed_annotations[:, 4] / 2
    transformed_annotations[:, 3] += transformed_annotations[:, 1]
    transformed_annotations[:, 4] += transformed_annotations[:, 2]

    # draw a blue rectangle for each annotation
    for obj_cls, x0, y0, x1, y1 in transformed_annotations:
        plotted_image.rectangle(((x0, y0), (x1, y1)), width=10, outline="#0000ff")

    plt.imshow(np.array(image))
    plt.show()

# get an annotation file and the corresponding image file
annotation_file = './houses.txt'
image_file = annotation_file.replace("txt", "png")
assert os.path.exists(image_file)

# load the image and plot the bounding boxes
image = Image.open(image_file)
plot_bounding_box(image, np.loadtxt(annotation_file))
Great, our annotations look spot on!
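Beyond eyeballing the plot, a quick programmatic sanity check can catch annotation errors such as out-of-range coordinates. This is a sketch using the same normalized YOLO layout, with the first two rows of the file inlined as a string for a self-contained example:

```python
import io
import numpy as np

# first two rows of the YOLO annotation file, inlined for illustration
yolo_txt = """0 0.100204 0.266547 0.122651 0.425760
0 0.245263 0.257603 0.122651 0.436494"""
ann = np.atleast_2d(np.loadtxt(io.StringIO(yolo_txt)))

# class ids should be valid (only one class here, id 0)
assert np.all(ann[:, 0] == 0)
# normalized centers and sizes must lie in [0, 1]
assert np.all((ann[:, 1:] >= 0) & (ann[:, 1:] <= 1))
# boxes must stay inside the image: center +/- half-extent within [0, 1]
assert np.all(ann[:, [1, 2]] - ann[:, [3, 4]] / 2 >= 0)
assert np.all(ann[:, [1, 2]] + ann[:, [3, 4]] / 2 <= 1)
print("all annotations pass")
```

Running a check like this over every annotation file scales much better than visual inspection once the dataset grows.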
Conclusion and challenges
While makesense.ai is a great free platform, it is not scalable. To train a YOLO-style object detection model, you need at least hundreds of labeled images (thousands or hundreds of thousands are preferred) to get reasonable accuracy. Makesense does not save annotations or support labeling workflows; you need to do all of your labeling in one sitting. Labeling thousands of images this way would take a single person an enormous amount of time. There are other services, such as ango.ai, that provide customized quotes and labeling services for a few cents per image. In my opinion this is an area with vast room for improvement and untapped potential. As AI becomes more popular and accessible, the need for easy, accurate, and cheap labeling at scale will only grow.
I hope this showed you the tip of the iceberg of end-to-end computer vision projects, and that model training is only one piece of the puzzle. Look out for more blog posts about other key aspects of business-focused, end-to-end deep learning!
You can find the code from this post on GitHub:
Thanks for reading! For Data Science and Machine Learning mentoring, please contact us! We develop custom learning pathways for individual clients and enable cutting-edge AI-based research. We also provide access to high-end computational servers based on your needs.