Explaining Object Detection and Implementing Faster R-CNN
Object Detection
Object detection is a subfield of computer vision that locates the objects of interest within a given image.
Before building an object detection model, the first thing to prepare is the bounding boxes.
A bounding box describes a target's location as a rectangle, expressed as (xmin, ymin, xmax, ymax).
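As a quick illustration of this format (a hypothetical snippet, not part of the dataset code used later), the area of a box and the IoU between two boxes can be computed directly from these four coordinates:

# Boxes in (xmin, ymin, xmax, ymax) format; the numbers are arbitrary examples.
box_a = (281.3, 187.0, 327.7, 223.2)
box_b = (290.0, 190.0, 340.0, 230.0)

def area(b):
    # Width times height; zero if the corners are inverted (no valid box).
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def iou(a, b):
    # Intersection over Union of two boxes in the same format.
    inter = area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    return inter / (area(a) + area(b) - inter)

print(area(box_a), iou(box_a, box_b))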
Types of Object Detection Models
Object detection models can be broadly divided into one-stage and two-stage models.
One-Stage Detector
Models: YOLO, SSD, RetinaNet
Performs classification and region proposal at the same time, producing detections in a single pass.
Two-Stage Detector
Models: R-CNN, Fast R-CNN, Faster R-CNN
Generates region proposals first and then classifies them, i.e., the two steps run sequentially.
As a result, one-stage detectors are relatively fast but less accurate, while two-stage detectors are relatively slow but more accurate.
Faster R-CNN Explained
Training Process
- A convolutional layer stack, commonly called the backbone, extracts a feature map from the input image.
- The feature map is fed into the RPN layer, which generates region proposals.
- Using the feature map and the region proposals from the RPN, bounding box regression and classification are performed through RoI Pooling and the RoI head (this stage is essentially Fast R-CNN).

Training therefore uses two groups of losses: the RPN losses and the RoI head losses. They are summed into a single total loss, which is backpropagated to update the whole network, as the sketch below illustrates.
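As a concrete, runnable toy example (random image and a made-up ground-truth box; this mirrors the real training loop further below), torchvision's Faster R-CNN returns exactly these loss terms as a dictionary when called in training mode:

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Toy example: an untrained model, one random image, and one made-up ground-truth box.
model = fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False, num_classes=2)
model.train()
images = [torch.rand(3, 224, 224)]
targets = [{"boxes": torch.tensor([[20.0, 30.0, 120.0, 140.0]]),
            "labels": torch.tensor([1])}]

loss_dict = model(images, targets)
# RPN losses: loss_objectness, loss_rpn_box_reg
# RoI head (Fast R-CNN) losses: loss_classifier, loss_box_reg
total_loss = sum(loss for loss in loss_dict.values())  # single scalar used for the backward pass
total_loss.backward()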
Preparing the Dataset
We use a car detection dataset; download it from
https://www.kaggle.com/sshikamaru/car-object-detection
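If the Kaggle API package is installed and a ~/.kaggle/kaggle.json token is configured (an assumption; the original post does not show the download step), the dataset can be fetched programmatically. The output directory below is only an example; the rest of this post reads the files from Google Drive.

import kaggle  # assumes the kaggle package is installed and authenticated via kaggle.json

# Download and unzip the car object detection dataset (output path is an example).
kaggle.api.dataset_download_files("sshikamaru/car-object-detection",
                                  path="/content/cardataset", unzip=True)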
import pandas as pd
import numpy as np
import cv2
import matplotlib.pyplot as plt
import os
from tqdm import tqdm
import torch
from torch.utils.data import Dataset, DataLoader
# Each row pairs an image filename with one bounding box (xmin, ymin, xmax, ymax).
box = pd.read_csv('/content/drive/MyDrive/cardataset/train_solution_bounding_boxes (1).csv')
box
|   | image | xmin | ymin | xmax | ymax |
|---|---|---|---|---|---|
| 0 | vid_4_1000.jpg | 281.259045 | 187.035071 | 327.727931 | 223.225547 |
| 1 | vid_4_10000.jpg | 15.163531 | 187.035071 | 120.329957 | 236.430180 |
| 2 | vid_4_10040.jpg | 239.192475 | 176.764801 | 361.968162 | 236.430180 |
| 3 | vid_4_10020.jpg | 496.483358 | 172.363256 | 630.020260 | 231.539575 |
| 4 | vid_4_10060.jpg | 16.630970 | 186.546010 | 132.558611 | 238.386422 |
| ... | ... | ... | ... | ... | ... |
| 554 | vid_4_9860.jpg | 0.000000 | 198.321729 | 49.235251 | 236.223284 |
| 555 | vid_4_9880.jpg | 329.876184 | 156.482351 | 536.664239 | 250.497895 |
| 556 | vid_4_9900.jpg | 0.000000 | 168.295823 | 141.797524 | 239.176652 |
| 557 | vid_4_9960.jpg | 487.428988 | 172.233646 | 616.917699 | 228.839864 |
| 558 | vid_4_9980.jpg | 221.558631 | 182.570434 | 348.585579 | 238.192196 |

559 rows × 5 columns
# Draw the first ground-truth box on its image; pt1 and pt2 are opposite corners of the rectangle.
sample = cv2.imread('/content/drive/MyDrive/cardataset/training_images/vid_4_1000.jpg')
sample = cv2.cvtColor(sample, cv2.COLOR_BGR2RGB)
point = box.iloc[0]
pt1 = (int(point['xmin']), int(point['ymax']))
pt2 = (int(point['xmax']), int(point['ymin']))
cv2.rectangle(sample, pt1, pt2, color=(255, 0, 0), thickness=2)
plt.imshow(sample)
sample = cv2.imread('/content/drive/MyDrive/cardataset/training_images/vid_4_10000.jpg')
sample = cv2.cvtColor(sample, cv2.COLOR_BGR2RGB)
point = box.iloc[1]
pt1 = (int(point['xmin']), int(point['ymax']))
pt2 = (int(point['xmax']), int(point['ymin']))
cv2.rectangle(sample, pt1, pt2, color=(255,0,0), thickness=2)
plt.imshow(sample)
When building the dataset for model training, the target must contain the bounding boxes together with label, image_id, and area.

- "image_id" is the id of the image the box belongs to.
- "area" is the area of the box; it is stored so that the IoU between predicted and ground-truth boxes can be computed easily later.
- "iscrowd" indicates whether a group of objects that are too small and numerous was labeled as a single crowd box. In object detection, labeling conventions like this come up in a few situations, for example when objects are hidden or occluded.

The bounding boxes and labels are mandatory. Since there is only one class here, "car", every label is set to 1; a minimal example of the resulting target is shown right below.
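For concreteness, here is a minimal, hypothetical target for an image with a single box (the dataset class below builds the same structure from the CSV; "iscrowd" is optional for torchvision's Faster R-CNN and is omitted in that class):

import torch

boxes = torch.tensor([[281.26, 187.04, 327.73, 223.23]], dtype=torch.float32)  # shape (N, 4)
target = {
    "boxes": boxes,
    "labels": torch.ones((1,), dtype=torch.int64),    # class 1 = "car"
    "image_id": torch.tensor([0]),                    # index of the image this box belongs to
    "area": (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]),
    "iscrowd": torch.zeros((1,), dtype=torch.int64),  # 0 = individually labeled object
}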
class CarDataset(Dataset):
    def __init__(self, df, image_dir, transforms=None):
        super().__init__()
        self.image_ids = df["image"].unique()  # all image filenames
        self.df = df
        self.image_dir = image_dir  # directory containing the image files
        self.transforms = transforms

    def __getitem__(self, idx: int):
        image_id = self.image_ids[idx]
        # All rows (boxes) belonging to this image.
        records = self.df[self.df["image"] == image_id]

        image = cv2.imread(f"{self.image_dir}/{image_id}", cv2.IMREAD_COLOR)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32)
        image /= 255.0
        image = torch.tensor(image)
        image = image.permute(2, 0, 1)  # HWC -> CHW

        boxes = records[["xmin", "ymin", "xmax", "ymax"]].values
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        area = torch.as_tensor(area, dtype=torch.float32)

        # There is only one class, so every label is 1.
        labels = torch.ones((records.shape[0],), dtype=torch.int64)

        target = {}
        target["boxes"] = torch.as_tensor(boxes, dtype=torch.float32)
        target["labels"] = labels
        target["image_id"] = torch.tensor([idx])
        target["area"] = area

        if self.transforms:
            sample = {"image": image, "boxes": target["boxes"], "labels": labels}
            sample = self.transforms(**sample)
            image = sample["image"]
            target["boxes"] = torch.stack(tuple(map(torch.tensor, zip(*sample["boxes"])))).permute(1, 0)

        return image, target, image_id

    def __len__(self):
        return self.image_ids.shape[0]
def collate_fn(batch):
    # Images can have different numbers of boxes, so keep samples as tuples instead of stacking them.
    return tuple(zip(*batch))
dir_train = "/content/drive/MyDrive/cardataset/training_images"
train_ds = CarDataset(box, dir_train)
train_dl = DataLoader(train_ds, batch_size=8, shuffle=False, num_workers=2, collate_fn=collate_fn)
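A quick sanity check, not in the original notebook, confirms the shapes that __getitem__ returns:

# Fetch one sample and inspect it.
image, target, image_id = train_ds[0]
print(image.shape)            # torch.Size([3, H, W]) after the HWC -> CHW permute
print(target["boxes"].shape)  # (number of cars in this image, 4)
print(image_id)               # the image filename, e.g. 'vid_4_1000.jpg'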
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
Creating the Faster R-CNN Model
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection import fasterrcnn_resnet50_fpn
# Load model pretrained on COCO
model = fasterrcnn_resnet50_fpn(pretrained=True)
num_classes = 2 # 1 class (car) + background
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace pre-trained head with new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
print(model)
FasterRCNN(
  (transform): GeneralizedRCNNTransform(
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(...)  # ResNet-50 layers 1-4 with FrozenBatchNorm2d
    (fpn): FeaturePyramidNetwork(...)     # 256-channel pyramid with LastLevelMaxPool
  )
  (rpn): RegionProposalNetwork(
    (anchor_generator): AnchorGenerator()
    (head): RPNHead(
      (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (cls_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
      (bbox_pred): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (roi_heads): RoIHeads(
    (box_roi_pool): MultiScaleRoIAlign(featmap_names=['0', '1', '2', '3'], output_size=(7, 7), sampling_ratio=2)
    (box_head): TwoMLPHead(
      (fc6): Linear(in_features=12544, out_features=1024, bias=True)
      (fc7): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (box_predictor): FastRCNNPredictor(
      (cls_score): Linear(in_features=1024, out_features=2, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=8, bias=True)
    )
  )
)
model.to(device)
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(params, lr=0.0005, weight_decay=0.0005)

model.train()
num_epochs = 5

for epoch in range(num_epochs):
    for i, (images, targets, image_ids) in enumerate(train_dl):
        optimizer.zero_grad()
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # In training mode the model returns a dict of RPN and RoI head losses.
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        losses.backward()
        optimizer.step()

        if (i + 1) % 10 == 0:
            print(f'Epoch {epoch+1} - Total: {losses:.4f}, '
                  f'Regression: {loss_dict["loss_box_reg"]:.4f}, '
                  f'Classifier: {loss_dict["loss_classifier"]:.4f}')
Epoch 1 - Total: 0.1604, Regression: 0.0524, Classifier: 0.0684
Epoch 1 - Total: 0.5372, Regression: 0.2511, Classifier: 0.2091
Epoch 1 - Total: 0.2611, Regression: 0.0858, Classifier: 0.1268
Epoch 1 - Total: 0.3183, Regression: 0.1235, Classifier: 0.1529
Epoch 2 - Total: 0.1834, Regression: 0.0804, Classifier: 0.0849
Epoch 2 - Total: 0.4357, Regression: 0.2278, Classifier: 0.1599
Epoch 2 - Total: 0.4400, Regression: 0.1949, Classifier: 0.2288
Epoch 2 - Total: 0.3511, Regression: 0.1298, Classifier: 0.2001
Epoch 3 - Total: 0.1376, Regression: 0.0723, Classifier: 0.0566
Epoch 3 - Total: 0.4160, Regression: 0.2526, Classifier: 0.1377
Epoch 3 - Total: 0.4002, Regression: 0.1993, Classifier: 0.1778
Epoch 3 - Total: 0.1822, Regression: 0.0882, Classifier: 0.0737
Epoch 4 - Total: 0.1445, Regression: 0.0693, Classifier: 0.0615
Epoch 4 - Total: 0.4580, Regression: 0.3000, Classifier: 0.1404
Epoch 4 - Total: 0.3905, Regression: 0.2133, Classifier: 0.1617
Epoch 4 - Total: 0.1606, Regression: 0.0979, Classifier: 0.0531
Epoch 5 - Total: 0.1215, Regression: 0.0660, Classifier: 0.0499
Epoch 5 - Total: 0.3379, Regression: 0.2133, Classifier: 0.1068
Epoch 5 - Total: 0.3033, Regression: 0.1506, Classifier: 0.1336
Epoch 5 - Total: 0.1366, Regression: 0.0756, Classifier: 0.0420
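Saving the fine-tuned weights is not shown in the original notebook; a minimal sketch (the output path is only an example):

# Persist the trained weights so inference can be run later without retraining.
torch.save(model.state_dict(), "/content/drive/MyDrive/cardataset/fasterrcnn_car.pth")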
images = cv2.imread("/content/drive/MyDrive/cardataset/testing_images/vid_5_26640.jpg", cv2.IMREAD_COLOR)
images = cv2.cvtColor(images, cv2.COLOR_BGR2RGB).astype(np.float32)
images /= 255.0
sample = images

images = torch.tensor(images)
images = images.permute(2, 0, 1)      # HWC -> CHW
images = torch.unsqueeze(images, 0)   # add a batch dimension
images = images.to(device)

model.eval()
cpu_device = torch.device("cpu")

with torch.no_grad():
    outputs = model(images)
outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]

# Keep only detections with a confidence score above 0.5.
mask = outputs[0]['scores'] > 0.5
boxes = outputs[0]["boxes"][mask].detach().numpy().astype(np.int32)

for detected_box in boxes:
    # sample is a float image in [0, 1], so the rectangle color must also be in [0, 1].
    cv2.rectangle(sample,
                  (detected_box[0], detected_box[1]),
                  (detected_box[2], detected_box[3]),
                  (1.0, 0, 0), 3)

plt.imshow(sample)
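To also see how confident the model is about each detection, the scores can be drawn next to the boxes (a hypothetical extension of the cell above, reusing its outputs, mask, boxes, and sample variables):

# Draw each detection's confidence score above its box; scores are in [0, 1].
scores = outputs[0]["scores"][mask].detach().numpy()
for detected_box, score in zip(boxes, scores):
    cv2.putText(sample, f"{score:.2f}",
                (int(detected_box[0]), int(detected_box[1]) - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (1.0, 0, 0), 1)
plt.imshow(sample)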