A Simple YOLO Implementation (OpenCV)

March 10, 2022, 4 min read

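Object Detection

Object detection is a subfield of computer vision concerned with locating the objects of interest in a given image.

Before building an object detection model, the first step is to create bounding boxes.

A bounding box is a rectangle that marks a target's location, expressed as (minimum X, minimum Y, maximum X, maximum Y).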
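The corner representation described above can be converted to the (x, y, width, height) form that appears later in this post. The following is a minimal sketch, assuming pixel coordinates with the origin at the top-left of the image; the function names `corners_to_xywh` and `xywh_to_corners` are hypothetical helpers, not part of any library.

```python
def corners_to_xywh(xmin, ymin, xmax, ymax):
    """Convert (xmin, ymin, xmax, ymax) to (x, y, width, height),
    where (x, y) is the top-left corner of the box."""
    if xmax < xmin or ymax < ymin:
        raise ValueError("max coordinates must not be smaller than min coordinates")
    return xmin, ymin, xmax - xmin, ymax - ymin


def xywh_to_corners(x, y, w, h):
    """Inverse conversion: (x, y, width, height) back to corner format."""
    return x, y, x + w, y + h


# Illustrative box, in whole pixels
box = (15, 187, 120, 236)
xywh = corners_to_xywh(*box)
print(xywh)                    # (15, 187, 105, 49)
print(xywh_to_corners(*xywh))  # (15, 187, 120, 236)
```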


What Is YOLO?

You Only Look Once


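YOLO uses a single network to output both the region (bounding box) and the name of each object it is asked to detect. It works as follows.

It takes an image as input. (A single image, a video, a webcam stream, or a screen capture: anything that can be converted to a NumPy array is accepted.)
The image is divided into an S x S grid. A prediction is made for each grid cell, and the predictions are then combined to form the bounding boxes.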
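To make the grid idea concrete, here is a small sketch that finds which cell of an S x S grid contains a given point, such as the center of an object. The function name `grid_cell`, the 416 x 416 image size, and S = 13 are illustrative assumptions, not values taken from this post.

```python
def grid_cell(center_x, center_y, img_w, img_h, s):
    """Return the (row, col) of the S x S grid cell that contains the point.

    Points on the right or bottom edge are clamped into the last cell.
    """
    col = min(int(center_x / img_w * s), s - 1)
    row = min(int(center_y / img_h * s), s - 1)
    return row, col


# A point at (100, 300) in a 416 x 416 image with a 13 x 13 grid
print(grid_cell(100, 300, 416, 416, 13))  # (9, 3)
```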

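How to Use YOLO

A deep learning framework is needed to run YOLO.

The three most widely used frameworks compatible with YOLO

Darknet: the framework built by the YOLO developer, made specifically for YOLO.

Pros: fast; works with a GPU or a CPU

Cons: only compatible with Linux…
Darkflow: a port of Darknet to TensorFlow

Pros: fast, works with a GPU or a CPU, and compatible with Linux, Windows, and Mac

Cons: complicated installation
OpenCV: requires at least version 3.4.2

Pros: nothing to install besides OpenCV

Cons: runs on the CPU by default, so it is not very fast for processing video in real time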

OpenCV DNN YOLO

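TensorFlow does not support YOLO out of the box, so the weight and cfg files have to be downloaded from the YOLO creator's site.

First, load the algorithm. Three files are needed to run it.

Weight file: the trained model
Cfg file: the configuration file, which holds all of the algorithm's settings
Names file: the names of the objects the algorithm can detect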
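Of the three files, the names file is the simplest: plain text with one class name per line. The sketch below reads such a file into a list. To stay self-contained it writes a tiny stand-in file to a temporary location; the helper name `load_class_names` and the three-line sample are assumptions for illustration, while the real coco.names is much longer.

```python
import os
import tempfile


def load_class_names(path):
    """Read one class name per line, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Write a small stand-in names file (not the real coco.names)
with tempfile.NamedTemporaryFile("w", suffix=".names", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("person\nbicycle\ncar\n\n")
    demo_path = tmp.name

names = load_class_names(demo_path)
os.remove(demo_path)
print(names)  # ['person', 'bicycle', 'car']
```

The index of each name in the list is the class id, which is why the order of the lines in the file must not be changed.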

https://pjreddie.com/darknet/yolo/

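Download the files from the site above.

The post this walkthrough is based on used yolov3.weights and yolov3.cfg.

Preparing the Dataset

Car detection dataset

Download the dataset from the following page:

https://www.kaggle.com/sshikamaru/car-object-detection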

import pandas as pd
import numpy as np
import cv2
import matplotlib.pyplot as plt

box = pd.read_csv('/content/drive/MyDrive/cardataset/train_solution_bounding_boxes (1).csv')
box

	image	xmin	ymin	xmax	ymax
0	vid_4_1000.jpg	281.259045	187.035071	327.727931	223.225547
1	vid_4_10000.jpg	15.163531	187.035071	120.329957	236.430180
2	vid_4_10040.jpg	239.192475	176.764801	361.968162	236.430180
3	vid_4_10020.jpg	496.483358	172.363256	630.020260	231.539575
4	vid_4_10060.jpg	16.630970	186.546010	132.558611	238.386422
...	...	...	...	...	...
554	vid_4_9860.jpg	0.000000	198.321729	49.235251	236.223284
555	vid_4_9880.jpg	329.876184	156.482351	536.664239	250.497895
556	vid_4_9900.jpg	0.000000	168.295823	141.797524	239.176652
557	vid_4_9960.jpg	487.428988	172.233646	616.917699	228.839864
558	vid_4_9980.jpg	221.558631	182.570434	348.585579	238.192196

559 rows × 5 columns

sample = cv2.imread('/content/drive/MyDrive/cardataset/training_images/vid_4_1000.jpg')
sample = cv2.cvtColor(sample, cv2.COLOR_BGR2RGB)
point = box.iloc[0]
pt1 = (int(point['xmin']), int(point['ymax']))
pt2 = (int(point['xmax']), int(point['ymin']))
cv2.rectangle(sample, pt1, pt2, color=(255,0,0), thickness=2)
plt.imshow(sample)

<matplotlib.image.AxesImage at 0x7f95257047d0>

sample = cv2.imread('/content/drive/MyDrive/cardataset/training_images/vid_4_10000.jpg')
sample = cv2.cvtColor(sample, cv2.COLOR_BGR2RGB)
point = box.iloc[1]
pt1 = (int(point['xmin']), int(point['ymax']))
pt2 = (int(point['xmax']), int(point['ymin']))
cv2.rectangle(sample, pt1, pt2, color=(255,0,0), thickness=2)
plt.imshow(sample)

<matplotlib.image.AxesImage at 0x7f95231328d0>

YOLO Implementation

# Load YOLO
net = cv2.dnn.readNet("/content/drive/MyDrive/yolov3.weights", "/content/drive/MyDrive/yolov3.cfg")
classes = []
with open("/content/drive/MyDrive/coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]
layer_names = net.getLayerNames()
# getUnconnectedOutLayers() returns an Nx1 array in older OpenCV releases and a flat array in newer ones; flatten() handles both
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]

# Load the image
img = cv2.imread('/content/drive/MyDrive/cardataset/training_images/vid_4_10000.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
height, width, channels = img.shape

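The network cannot take the image directly, so it must first be converted to a blob.

A blob is the preprocessed 4-D array the network takes as input; building it rescales the pixel values and resizes the image.

Three commonly used YOLO input sizes

320 × 320: small; lower accuracy but fast
608 × 608: higher accuracy but slow
416 × 416: in between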
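YOLOv3 downsamples its input by a factor of 32, so the input width and height are expected to be multiples of 32. The sketch below checks a candidate size and rounds it to the nearest valid one; the helper name `nearest_valid_size` is hypothetical.

```python
def nearest_valid_size(size, stride=32):
    """Round size to the nearest positive multiple of stride."""
    return max(stride, int(round(size / stride)) * stride)


for size in (320, 416, 609):
    is_valid = size % 32 == 0
    print(size, is_valid, nearest_valid_size(size))
# 320 True 320
# 416 True 416
# 609 False 608
```

A value such as 609 is not a multiple of 32, so 608 is the size to use for the largest setting.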

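# img was already converted to RGB above, so no further channel swap is needed; pixel values are scaled to 0..1
blob = cv2.dnn.blobFromImage(img, 1/255, (416, 416), (0, 0, 0), swapRB=False, crop=False)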
net.setInput(blob)

# outs holds the network output: all the information about the detected objects, including their positions.
outs = net.forward(output_layers)

# Collect the information to display
class_ids = []
confidences = []
boxes = []
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            # Object detected
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Top-left corner coordinates
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)
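The loop above treats each detection as a flat vector: normalized center x, center y, width, and height, then an objectness score, then one score per class. The following pure-Python sketch decodes one such vector the same way, without OpenCV or NumPy. The function name `decode_detection` and the sample vector (with only three classes) are made up for illustration.

```python
def decode_detection(detection, img_w, img_h):
    """Decode [cx, cy, w, h, objectness, class scores...] (all in 0..1)
    into a pixel box [x, y, w, h], a class id, and a confidence."""
    scores = detection[5:]
    class_id = max(range(len(scores)), key=scores.__getitem__)
    confidence = scores[class_id]
    center_x = int(detection[0] * img_w)
    center_y = int(detection[1] * img_h)
    w = int(detection[2] * img_w)
    h = int(detection[3] * img_h)
    # Shift from the box center to its top-left corner
    x = int(center_x - w / 2)
    y = int(center_y - h / 2)
    return [x, y, w, h], class_id, confidence


# Made-up detection for a 400 x 200 image with three classes
sample = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.05, 0.8]
print(decode_detection(sample, 400, 200))  # ([150, 50, 100, 100], 2, 0.8)
```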

The closer the confidence threshold is to 1, the more accurate the kept detections; the closer it is to 0, the lower the accuracy, but the more objects are detected.
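The effect of the threshold can be seen by counting how many detections survive at different settings. This is a toy sketch; the scores are invented for illustration and `keep_above` is a hypothetical helper.

```python
def keep_above(confidences, threshold):
    """Keep only the confidences strictly greater than the threshold."""
    return [c for c in confidences if c > threshold]


# Invented confidence scores for six candidate detections
scores = [0.95, 0.81, 0.62, 0.48, 0.30, 0.12]
for threshold in (0.3, 0.5, 0.9):
    print(threshold, len(keep_above(scores, threshold)))
# 0.3 4
# 0.5 3
# 0.9 1
```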

The detection step above produces many bounding boxes for the same object, so code to remove the duplicates (non-maximum suppression) is needed.

indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
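In the call above, 0.5 is the score threshold and 0.4 is the overlap (NMS) threshold. To show the idea behind non-maximum suppression, here is a simplified greedy version in pure Python. It is a sketch of the general technique, not OpenCV's actual implementation; the function names and sample boxes are assumptions, and boxes use the same [x, y, w, h] layout as above.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax1, ay1, aw, ah = a
    bx1, by1, bw, bh = b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold, iou_threshold):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box too much."""
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Boxes 0 and 1 cover nearly the same area; box 2 is elsewhere
boxes = [[22, 197, 92, 32], [25, 199, 90, 30], [300, 180, 100, 40]]
scores = [0.9, 0.6, 0.8]
print(nms(boxes, scores, 0.5, 0.4))  # [0, 2]
```

Box 1 is dropped because it overlaps the higher-scoring box 0 by far more than the 0.4 threshold, while box 2 survives because it overlaps neither.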

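Finally, extract all of the information and display it on the image.

Box: the coordinates of the rectangle enclosing the detected object
Label: the name of the detected object
Confidence: the confidence of the detection, from 0 to 1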

font = cv2.FONT_HERSHEY_PLAIN
colors = np.random.uniform(0, 255, size=(len(boxes), 3))

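# np.array() keeps the loop working when NMSBoxes returns an empty tuple (nothing detected)
for i in np.array(indexes).flatten():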
    x, y, w, h = boxes[i]
    print(x, y, w, h)
    label = str(classes[class_ids[i]])
    confidence = str(round(confidences[i], 2))
    color = colors[i]
    cv2.rectangle(img, (x, y), ((x+w), (y+h)), color, 2)
    cv2.putText(img, label + " " + confidence, (x, y+20), font, 2, (0, 255, 0), 2)

plt.imshow(img)

22 197 92 32

<matplotlib.image.AxesImage at 0x7efe54c00690>

Wrapping the Visualization in a Single Function

def predict_yolo(img_path):
  # Load the image
  img = cv2.imread(img_path)
  img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
  height, width, channels = img.shape

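  # img was already converted to RGB above, so no further channel swap is needed; pixel values are scaled to 0..1
  blob = cv2.dnn.blobFromImage(img, 1/255, (416, 416), (0, 0, 0), swapRB=False, crop=False)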
  net.setInput(blob) 
  outs = net.forward(output_layers)

  class_ids = []
  confidences = []
  boxes = []
  for out in outs:
      for detection in out:
          scores = detection[5:]
          class_id = np.argmax(scores)
          confidence = scores[class_id]
          if confidence > 0.5:
              # Object detected
              center_x = int(detection[0] * width)
              center_y = int(detection[1] * height)
              w = int(detection[2] * width)
              h = int(detection[3] * height)
              # Top-left corner coordinates
              x = int(center_x - w / 2)
              y = int(center_y - h / 2)
              boxes.append([x, y, w, h])
              confidences.append(float(confidence))
              class_ids.append(class_id)

  indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

  font = cv2.FONT_HERSHEY_PLAIN
  colors = np.random.uniform(0, 255, size=(len(boxes), 3))
  if len(indexes) > 0:
    for i in indexes.flatten():
        x, y, w, h = boxes[i]
        print(x, y, w, h)
        label = str(classes[class_ids[i]])
        confidence = str(round(confidences[i], 2))
        color = colors[i]
        cv2.rectangle(img, (x, y), ((x+w), (y+h)), color, 2)
        cv2.putText(img, label + " " + confidence, (x, y+20), font, 2, (0, 255, 0), 2)

    plt.imshow(img)
  
  else:
    print('No objects detected.')

import glob
import random

paths = glob.glob('/content/drive/MyDrive/cardataset/testing_images/*.jpg')

img_path = random.choice(paths)

predict_yolo(img_path)

211 197 112 29
2 201 52 23

img_path = random.choice(paths)

predict_yolo(img_path)

412 177 187 59
197 195 38 15
