Basic Object Detection in Python

Reymund Virtus
4 min read · Dec 21, 2021

A guide on how to make object detection from scratch using YOLO

Source: [Canva]

Introduction

  • Object detection is one of the most well-known and well-researched subjects in Computer Vision.
  • In other words, object detection is concerned with detecting and localizing several classes in an image, such as a person, car, bus, or spoon. This is done by drawing a bounding box around each instance of the target class.
  • In this article, I’m gonna show you how to build your own object detection from scratch. We’re going to use OpenCV, an open-source computer-vision library for Python.

So buckle up and get ready for the ride :)

Install Some Requirements

  • First, you need to install OpenCV on your computer; you can skip this step if you already have OpenCV installed. To download OpenCV, head over to this link and select the machine you’re using (or simply run pip install opencv-python).
  • Second, let’s download the yolov3.cfg and yolov3.weights files. To download them, head over to this link.
  • Once you click the link, it will redirect you to the YOLO website. Find the YOLOv3-608 row to get the most accurate model, and click the two links labeled cfg and weights (marked with a red box) to download both files.
  • Lastly, the coco file. Head over to this link and copy the class names inside the file. On your computer, create a new file named “coco.names” and paste in the names you copied from the link.
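Before moving on, it helps to confirm that all three files actually sit in your working directory, since the code below loads them by name. A minimal sketch (the filenames are the ones used later in this article):

```python
import os

REQUIRED_FILES = ["yolov3.weights", "yolov3.cfg", "coco.names"]

def missing_requirements(directory="."):
    """Return the required files that are not present in `directory`."""
    return [f for f in REQUIRED_FILES
            if not os.path.isfile(os.path.join(directory, f))]

missing = missing_requirements()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All requirement files found.")
```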

Let’s Code!

  • Once you’ve downloaded all the requirements, we are ready to code. In your chosen directory (make sure the downloaded files are inside it), create a new Python file and open it in your favorite text editor or IDE.
  • Now let’s import the required Python libraries, OpenCV and NumPy.
  • Then we read the yolov3 weights and cfg files and assign the loaded network to the net variable. We open the coco.names file and read the names into the classes list. We also pass 0 to VideoCapture to activate our webcam and assign it to the webcam variable.
import cv2
import numpy as np

# Load the network from the weights and config files
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')

# Read the class names into a list
with open('coco.names', 'r') as f:
    classes = f.read().splitlines()

# 0 selects the default webcam
webcam = cv2.VideoCapture(0)
  • Once our webcam is working, we add a while loop set to True. Inside it we read a frame from the webcam and assign it to the image variable, then take the height and width from the image shape.
  • Then we use the blobFromImage() function to create the input that we pass to the setInput() function. We also call getUnconnectedOutLayersNames() to get the names of the output layers, which we feed to the forward() function to get the layer outputs.
while True:
    _, image = webcam.read()
    height, width, _ = image.shape

    # Preprocess the frame: scale to [0, 1], resize to 416x416, swap BGR -> RGB
    blob = cv2.dnn.blobFromImage(image, 1 / 255, (416, 416), (0, 0, 0),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    out_layers_names = net.getUnconnectedOutLayersNames()
    layer_output = net.forward(out_layers_names)
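If you’re curious what blobFromImage() actually produces, the preprocessing can be sketched in plain NumPy. This is a simplified stand-in, not OpenCV’s implementation: it assumes the image already matches the target size, so the resize step is skipped.

```python
import numpy as np

def blob_from_image_sketch(image, scale=1 / 255, swap_rb=True):
    """Rough NumPy equivalent of cv2.dnn.blobFromImage for an image
    that is already the target size. Returns an NCHW float32 blob."""
    img = image.astype(np.float32) * scale     # scale pixel values to [0, 1]
    if swap_rb:
        img = img[:, :, ::-1]                  # BGR -> RGB channel swap
    blob = img.transpose(2, 0, 1)[np.newaxis]  # HWC -> NCHW with a batch dim
    return blob

dummy = np.random.randint(0, 256, (416, 416, 3), dtype=np.uint8)
print(blob_from_image_sketch(dummy).shape)  # (1, 3, 416, 416)
```

That (1, 3, 416, 416) shape is exactly what YOLOv3 expects as input: one image, three channels, 416×416 pixels.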
  • Next, we make lists for the boxes, confidences, and class ids to store the results. Then we loop over each output in layer_output and, inside that, over each detection in the output. For each detection, the values from index 5 onward are the class scores; we take the index of the maximum score as class_id and the score itself as confidence.
  • We then check whether the confidence is greater than 0.5 (50%). If it is, we compute the object’s center x and y coordinates and its width and height in pixels, convert the center to a top-left corner, and append the box to the boxes list, the float value of confidence to the confidences list, and class_id to the class_ids list.
  • We also pass the boxes and confidences to the NMSBoxes() function and store the surviving indices in the indexes variable; this removes duplicate boxes when the same object is detected multiple times. Then we set the font for the label on each box.
    boxes = []        # store the bounding boxes
    confidences = []  # store the confidences
    class_ids = []    # store the class ids

    for output in layer_output:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:
                # YOLO returns center and size as fractions of the image
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                width_detection = int(detection[2] * width)
                height_detection = int(detection[3] * height)
                # Convert the center point to a top-left corner
                x_detection = int(center_x - width_detection / 2)
                y_detection = int(center_y - height_detection / 2)
                boxes.append([x_detection, y_detection, width_detection, height_detection])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    # Non-maximum suppression drops overlapping duplicate boxes
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    font = cv2.FONT_HERSHEY_PLAIN
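The coordinate arithmetic above is worth unpacking: YOLO outputs the box center and size as fractions of the image, so recovering a pixel-space top-left corner takes two steps. Here is the same math as a standalone sketch (the function name is mine, not part of the article’s code):

```python
def yolo_box_to_corner(detection, img_width, img_height):
    """Convert YOLO's normalized (cx, cy, w, h) to a pixel-space
    (x, y, w, h) box whose (x, y) is the top-left corner."""
    cx, cy, w, h = detection[:4]
    center_x = int(cx * img_width)        # scale fractions up to pixels
    center_y = int(cy * img_height)
    box_w = int(w * img_width)
    box_h = int(h * img_height)
    x = int(center_x - box_w / 2)         # shift from center to corner
    y = int(center_y - box_h / 2)
    return [x, y, box_w, box_h]

# A detection centered in a 416x416 frame, covering half of it each way:
print(yolo_box_to_corner([0.5, 0.5, 0.5, 0.5], 416, 416))  # [104, 104, 208, 208]
```

The corner form is what cv2.rectangle() and NMSBoxes() both expect, which is why the conversion happens before the boxes are stored.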
  • Finally, we reach the last for loop. In this section, we pull out each surviving box, build the label and confidence strings, and draw a rectangle fitted to the detected object. Then we use the putText() function to display the label and the confidence above the box, show the frame with imshow(), and break out of the loop when the Esc key (code 27) is pressed.
    # flatten() also handles older OpenCV versions that return an Nx1 array
    for i in np.array(indexes).flatten():
        x_detection, y_detection, width_detection, height_detection = boxes[i]
        label = str(classes[class_ids[i]])
        confidence = str(round(confidences[i], 2))
        cv2.rectangle(image, (x_detection, y_detection),
                      (x_detection + width_detection, y_detection + height_detection),
                      (0, 255, 0), 2)
        cv2.putText(image, label + " " + confidence,
                    (x_detection, y_detection - 20), font, 2, (0, 255, 0), 2)

    cv2.imshow('image', image)
    key = cv2.waitKey(1)
    if key == 27:  # Esc key
        break

webcam.release()
cv2.destroyAllWindows()
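NMSBoxes() is what stops one object from being drawn five times: it keeps the highest-confidence box and discards any overlapping box whose intersection-over-union (IoU) with a kept box exceeds a threshold. A greedy pure-Python sketch of the same idea (my own illustrative helpers, not OpenCV’s internals):

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))  # overlap width
    iy = max(0, min(ay2, by2) - max(ay1, by1))  # overlap height
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def nms_sketch(boxes, confidences, score_thr=0.5, iou_thr=0.4):
    """Greedy non-maximum suppression; returns the kept indices."""
    order = sorted(
        (i for i, c in enumerate(confidences) if c > score_thr),
        key=lambda i: confidences[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it doesn't overlap too much with any kept box
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in kept):
            kept.append(i)
    return kept

boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [300, 300, 50, 50]]
print(nms_sketch(boxes, [0.9, 0.8, 0.7]))  # [0, 2] -- the near-duplicate box is suppressed
```

The two thresholds in the sketch play the same roles as the 0.5 and 0.4 passed to cv2.dnn.NMSBoxes() in the article’s code.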

In this case, pressing the Esc key terminates the program; you can change the key if you want.

That’s it! You now have your own working Object Detection in Python. You can modify the code if you want and you can use it to experiment on your own.
