
🍉 COCO2017 Dataset: Introduction and Usage


The MS COCO dataset, built by Microsoft, covers detection, segmentation, keypoint estimation, and other tasks, and is one of the mainstream object detection benchmarks.

Compared with the PASCAL VOC dataset, COCO images are natural, everyday scenes with more complex backgrounds, more objects per image, and smaller object sizes, so the tasks on COCO are harder. Today a detection model is usually judged on COCO.

🍉🍉 Dataset Download


COCO dataset paper

➡️ COCO dataset official website

🔄 Training set download link: 118k images / 18 GB

🔄 Validation set download link: 5k images / 1 GB

🔄 Test set download link: 41k images / 6 GB

🔄 Stuff annotations download link

What is a stuff category? ❓

🍉🍉 Category Information and Annotation File Format


🍉🍉🍉 Category Information

It should be emphasized that the COCO object detection annotations contain 80 categories, but the category IDs are not contiguous: they range from 1 up to 90, and there is no Background class.

What is the difference between the 80 object classes and the 91 stuff classes? ❓

The number of instances per category is distributed as shown in the figure below:

🍉🍉🍉 Annotation File Format

Taking the instances_train2017 annotation file as an example:

import json

annotation_file = '/data/code/Det_Code/COCO_2017/annotations/instances_train2017.json'
with open(annotation_file, 'r') as f:
    dataset = json.load(f)

After loading, we get a dict with five keys: ['info', 'licenses', 'images', 'annotations', 'categories'].
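
As a quick sanity check (a minimal sketch, reusing the annotation_file path loaded above), we can print the size of each list:

print(list(dataset.keys()))          # ['info', 'licenses', 'images', 'annotations', 'categories']
print(len(dataset['images']))        # number of images in train2017 (about 118k)
print(len(dataset['annotations']))   # number of annotated instances
print(len(dataset['categories']))    # 80 object categories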

annotations

is a list in which each element corresponds to one instance (not one image). Taking the first instance as an example:

{'segmentation': [[239.97, 260.24, 222.04, 270.49, 199.84, 253.41, 213.5, 227.79, 259.62, 200.46, 274.13, 202.17, 277.55, 210.71, 249.37, 253.41, 237.41, 264.51, 242.54, 261.95, 228.87, 271.34]], 'area': 2765.1486500000005, 'iscrowd': 0, 'image_id': 558840, 'bbox': [199.84, 200.46, 77.71, 70.88], 'category_id': 58, 'id': 156}

The main keys are: segmentation (the object's segmentation as polygons), area (the instance area in pixels), bbox (the bounding box as top-left x, y plus width and height), and category_id (the class ID). The remaining keys are iscrowd, image_id (the image this instance belongs to), and id (the annotation's own ID).

categories

is a list with one element per detection category. Each element is a dict describing one category: its id, its name, and the supercategory it belongs to. Note that there are only 80 object classes in total. For example, the last category:

79 {'supercategory': 'indoor', 'id': 90, 'name': 'toothbrush'}
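
Because the 80 IDs are scattered over the range 1 to 90, many codebases remap them to contiguous labels. A minimal sketch (not part of the original post), built from the dataset dict loaded above:

cat_ids = sorted(c['id'] for c in dataset['categories'])      # 80 ids in [1, 90], with gaps
cat_id_to_label = {cid: i for i, cid in enumerate(cat_ids)}   # e.g. maps id 90 -> label 79
print(len(cat_ids), min(cat_ids), max(cat_ids))               # 80 1 90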

images

is a list with one element per image. Each element is a dict describing one image, including its file name, width, height, and so on:

{'license': 3, 'file_name': '000000391895.jpg', 'coco_url': 'http://images.cocodataset.org/train2017/000000391895.jpg', 'height': 360, 'width': 640, 'date_captured': '2013-11-14 11:18:45', 'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg', 'id': 391895}

The most relevant keys are file_name, height, width, and the image id.

🍉🍉🍉 Inspecting the Data with the Official COCO API

The COCO team provides an official API for reading MS COCO annotations; it offers data loading and other important utilities.

pip install pycocotools  

Using pycocotools to visualize instance bounding boxes

import os
from pycocotools.coco import COCO
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt

json_path = "/data/code/Det_Code/COCO_2017/annotations/instances_val2017.json"  # annotation file path
img_path = "/data/code/Det_Code/COCO_2017/val2017"   # image folder path

# COCO is the class in the official API that loads an annotation file
coco = COCO(annotation_file=json_path)

# get all image ids in the annotation file, sorted
ids = list(sorted(coco.imgs.keys()))
print("number of images: {}".format(len(ids)))

# get all coco class labels
coco_classes = dict([(v["id"], v["name"]) for k, v in coco.cats.items()])

# iterate over the first two images
for img_id in ids[:2]:
    # get the ids of all annotations belonging to this image
    ann_ids = coco.getAnnIds(imgIds=img_id)

    # load the full annotation records from those ids
    targets = coco.loadAnns(ann_ids)

    # get image file name
    path = coco.loadImgs(img_id)[0]['file_name']

    # read image
    img = Image.open(os.path.join(img_path, path)).convert('RGB')
    draw = ImageDraw.Draw(img)
    # draw each box and class name on the image (explicit color so they are visible)
    for target in targets:
        x, y, w, h = target["bbox"]
        x1, y1, x2, y2 = x, y, int(x + w), int(y + h)
        draw.rectangle((x1, y1, x2, y2), outline='red', width=2)
        draw.text((x1, y1), coco_classes[target["category_id"]], fill='red')

    # show image
    plt.imshow(img)
    plt.savefig('example_{}.png'.format(img_id), dpi=50)
    plt.show()

This code follows the example given in a CSDN blog post (see the references); it calls quite a few API functions that are worth studying further.

Using pycocotools to visualize instance segmentation masks

Again, the CSDN blog post can be consulted; its code is not reproduced here.
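
As a minimal sketch of the idea (not the blog's code), masks can be rendered with coco.annToMask, reusing the coco, ids, and img_path objects from the box-drawing example above:

import numpy as np

img_id = ids[0]
img_info = coco.loadImgs(img_id)[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))

img = Image.open(os.path.join(img_path, img_info['file_name'])).convert('RGB')
plt.imshow(img)

# paint each instance mask with its category id and overlay it on the image
mask = np.zeros((img_info['height'], img_info['width']), dtype=np.uint8)
for ann in anns:
    mask = np.maximum(mask, coco.annToMask(ann) * ann['category_id'])
plt.imshow(mask, alpha=0.5)
plt.savefig('mask_example_{}.png'.format(img_id), dpi=50)
plt.show()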


🍉🍉 Evaluation Metrics

mAP

PASCAL VOC measures mAP at a single IoU threshold of 0.5. COCO's primary metric, AP, instead measures AP at 10 IoU thresholds from 0.5 to 0.95 in steps of 0.05 and averages the 10 results. In COCO terms, AP@0.5 therefore has the same meaning as mAP in PASCAL VOC; AP@0.75 is computed the same way but with the IoU threshold raised to 0.75, which demands more accurate localization, so its value is lower.
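
Put differently (a tiny illustration, not official evaluation code):

iou_thresholds = [0.50 + 0.05 * i for i in range(10)]   # 0.50, 0.55, ..., 0.95
# COCO AP = mean of the AP measured at each of these 10 thresholds;
# AP@0.5 and AP@0.75 each use only the single corresponding threshold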

The higher the IoU threshold, the lower the AP, so the averaged AP is considerably smaller than AP@0.5. This is also why only a handful of models exceed 50% AP on COCO: getting past 50% is genuinely difficult.

Evaluation across object sizes

COCO also reports metrics for three object sizes (small, medium, large). Roughly 41% of COCO objects are small (area < 32×32), 34% are medium (32×32 < area < 96×96), and 24% are large (area > 96×96). AP on small objects is notoriously hard to improve.
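
These fractions can be checked directly from the annotation dict loaded earlier (a small sketch, assuming dataset is the parsed instances_train2017.json):

import numpy as np

areas = np.array([a['area'] for a in dataset['annotations']])
print('small :', (areas < 32 ** 2).mean())                         # should be close to ~0.41
print('medium:', ((areas >= 32 ** 2) & (areas < 96 ** 2)).mean())  # ~0.34
print('large :', (areas >= 96 ** 2).mean())                        # ~0.24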

pycocotools also provides the code that computes all of these evaluation metrics.
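
For example, given detection results saved in the standard COCO result format, evaluation looks roughly like this (a sketch; detections_val2017.json is a hypothetical results file produced by your model):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("/data/code/Det_Code/COCO_2017/annotations/instances_val2017.json")
coco_dt = coco_gt.loadRes("detections_val2017.json")  # hypothetical file of model predictions

coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP, AP@0.5, AP@0.75 and AP for small/medium/large objects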

🍉🍉 Loading the COCO Dataset

🍉🍉🍉 The COCO Detection Dataset Class

Taking the data-loading code in DETR as an example, the COCO detection dataset class CocoDetection is defined as:


import torchvision


class CocoDetection(torchvision.datasets.CocoDetection):
    def __init__(self, img_folder, ann_file, transforms, return_masks):
        super(CocoDetection, self).__init__(img_folder, ann_file)
        self._transforms = transforms
        self.prepare = ConvertCocoPolysToMask(return_masks)

    def __getitem__(self, idx):
        # img is the PIL.Image read from disk; target holds all instance annotations of this image
        img, target = super(CocoDetection, self).__getitem__(idx)
        image_id = self.ids[idx]
        target = {'image_id': image_id, 'annotations': target}
        img, target = self.prepare(img, target)
        # further process img/target (resize, augment, normalize) so they are convenient for training
        if self._transforms is not None:
            img, target = self._transforms(img, target)
        return img, target

This class inherits most of its behavior from torchvision.datasets.CocoDetection, which torchvision implements as a wrapper around the COCO API (i.e. pycocotools) and which contains many helpers for reading images and annotations. We will not dig into those internals here and only describe the inputs and outputs of the CocoDetection class as we use it.

To initialize CocoDetection you only need: the image folder path img_folder of the prepared COCO2017 dataset (e.g. /data/code/Det_Code/COCO_2017/train2017 for the training images); the annotation file path ann_file (e.g. /data/code/Det_Code/COCO_2017/annotations/instances_train2017.json); transforms (the image augmentation/transform function); and return_masks (not needed for detection; it is segmentation-related).
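
A minimal construction sketch using the paths above (in DETR the transforms argument would come from its make_coco_transforms factory; it is left as None here purely for illustration, and ConvertCocoPolysToMask is the class defined in the next subsection):

dataset_train = CocoDetection(
    img_folder="/data/code/Det_Code/COCO_2017/train2017",
    ann_file="/data/code/Det_Code/COCO_2017/annotations/instances_train2017.json",
    transforms=None,       # DETR would pass its augmentation pipeline here
    return_masks=False)    # detection only, no segmentation masks

img, target = dataset_train[0]   # PIL image plus a dict of processed annotations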

🍉🍉🍉 Implementation of ConvertCocoPolysToMask

DETR's implementation is as follows:

import torch
from pycocotools import mask as coco_mask


def convert_coco_poly_to_mask(segmentations, height, width):
    masks = []
    for polygons in segmentations:
        rles = coco_mask.frPyObjects(polygons, height, width)
        mask = coco_mask.decode(rles)
        if len(mask.shape) < 3:
            mask = mask[..., None]
        mask = torch.as_tensor(mask, dtype=torch.uint8)
        mask = mask.any(dim=2)
        masks.append(mask)
    if masks:
        masks = torch.stack(masks, dim=0)
    else:
        masks = torch.zeros((0, height, width), dtype=torch.uint8)
    return masks


class ConvertCocoPolysToMask(object):
    def __init__(self, return_masks=False):
        self.return_masks = return_masks

    def __call__(self, image, target):
        w, h = image.size

        image_id = target["image_id"]
        image_id = torch.tensor([image_id])

        anno = target["annotations"]

        anno = [obj for obj in anno if 'iscrowd' not in obj or obj['iscrowd'] == 0]

        boxes = [obj["bbox"] for obj in anno]
        # guard against no boxes via resizing
        boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
        boxes[:, 2:] += boxes[:, :2]
        boxes[:, 0::2].clamp_(min=0, max=w)
        boxes[:, 1::2].clamp_(min=0, max=h)

        classes = [obj["category_id"] for obj in anno]
        classes = torch.tensor(classes, dtype=torch.int64)

        if self.return_masks:
            segmentations = [obj["segmentation"] for obj in anno]
            masks = convert_coco_poly_to_mask(segmentations, h, w)

        keypoints = None
        if anno and "keypoints" in anno[0]:
            keypoints = [obj["keypoints"] for obj in anno]
            keypoints = torch.as_tensor(keypoints, dtype=torch.float32)
            num_keypoints = keypoints.shape[0]
            if num_keypoints:
                keypoints = keypoints.view(num_keypoints, -1, 3)

        keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0])
        boxes = boxes[keep]
        classes = classes[keep]
        if self.return_masks:
            masks = masks[keep]
        if keypoints is not None:
            keypoints = keypoints[keep]

        target = {}
        target["boxes"] = boxes
        target["labels"] = classes
        if self.return_masks:
            target["masks"] = masks
        target["image_id"] = image_id
        if keypoints is not None:
            target["keypoints"] = keypoints

        # for conversion to coco api
        area = torch.tensor([obj["area"] for obj in anno])
        iscrowd = torch.tensor([obj["iscrowd"] if "iscrowd" in obj else 0 for obj in anno])
        target["area"] = area[keep]
        target["iscrowd"] = iscrowd[keep]

        target["orig_size"] = torch.as_tensor([int(h), int(w)])
        target["size"] = torch.as_tensor([int(h), int(w)])

        return image, target

This looks fairly complicated, so again we only describe what the function does. The image is unchanged before and after processing; what changes is the target. Before processing it looks like the following, containing the annotations of 8 instances:

[
    {'segmentation': [[500.49, 473.53, 599.73, 419.6, 612.67, 375.37, 608.36, 354.88, 528.54, 269.66, 457.35, 201.71, 420.67, 187.69, 389.39, 192.0, 19.42, 360.27, 1.08, 389.39, 2.16, 427.15, 20.49, 473.53]], 'area': 120057.13925, 'iscrowd': 0, 'image_id': 9, 'bbox': [1.08, 187.69, 611.59, 285.84], 'category_id': 51, 'id': 1038967}, 
    {'segmentation': [[357.03, 69.03, 311.73, 15.1, 550.11, 4.31, 631.01, 62.56, 629.93, 88.45, 595.42, 185.53, 513.44, 230.83, 488.63, 232.99, 437.93, 190.92, 429.3, 189.84, 434.7, 148.85, 410.97, 121.89, 359.19, 74.43, 358.11, 65.8]], 'area': 44434.751099999994, 'iscrowd': 0, 'image_id': 9, 'bbox': [311.73, 4.31, 319.28, 228.68], 'category_id': 51, 'id': 1039564}, 
    {'segmentation': [[249.6, 348.99, 267.67, 311.72, 291.39, 294.78, 304.94, 294.78, 326.4, 283.48, 345.6, 273.32, 368.19, 269.93, 385.13, 268.8, 388.52, 257.51, 393.04, 250.73, 407.72, 240.56, 425.79, 230.4, 441.6, 229.27, 447.25, 237.18, 447.25, 256.38, 456.28, 254.12, 475.48, 263.15, 486.78, 271.06, 495.81, 264.28, 498.07, 257.51, 500.33, 255.25, 507.11, 259.76, 513.88, 266.54, 513.88, 273.32, 513.88, 276.71, 526.31, 276.71, 526.31, 286.87, 519.53, 291.39, 519.53, 297.04, 524.05, 306.07, 525.18, 315.11, 529.69, 329.79, 529.69, 337.69, 530.82, 348.99, 536.47, 339.95, 545.51, 350.12, 555.67, 360.28, 557.93, 380.61, 561.32, 394.16, 565.84, 413.36, 522.92, 441.6, 469.84, 468.71, 455.15, 474.35, 307.2, 474.35, 316.24, 464.19, 330.92, 438.21, 325.27, 399.81, 310.59, 378.35, 301.55, 371.58, 252.99, 350.12]], 'area': 49577.94434999999, 'iscrowd': 0, 'image_id': 9, 'bbox': [249.6, 229.27, 316.24, 245.08], 'category_id': 56, 'id': 1058555}, 
    {'segmentation': [[434.48, 152.33, 433.51, 184.93, 425.44, 189.45, 376.7, 195.58, 266.94, 248.53, 179.78, 290.17, 51.62, 346.66, 16.43, 366.68, 1.9, 388.63, 0.0, 377.33, 0.0, 357.64, 0.0, 294.04, 22.56, 294.37, 56.14, 300.82, 83.58, 300.82, 109.08, 289.2, 175.26, 263.38, 216.9, 243.36, 326.34, 197.52, 387.03, 172.34, 381.54, 162.33, 380.89, 147.16, 380.89, 140.06, 370.89, 102.29, 330.86, 61.94, 318.91, 48.38, 298.57, 47.41, 287.28, 37.73, 259.51, 33.85, 240.14, 32.56, 240.14, 28.36, 247.57, 24.17, 271.46, 15.13, 282.11, 13.51, 296.96, 18.68, 336.34, 55.48, 391.55, 106.81, 432.87, 147.16], [62.46, 97.21, 130.25, 69.77, 161.25, 59.12, 183.52, 52.02, 180.94, 59.12, 170.93, 78.17, 170.28, 90.76, 157.05, 95.92, 130.25, 120.78, 119.92, 129.49, 102.17, 115.29, 64.72, 119.81, 0.0, 137.89, 0.0, 120.13, 0.0, 117.87]], 'area': 24292.781700000007, 'iscrowd': 0, 'image_id': 9, 'bbox': [0.0, 13.51, 434.48, 375.12], 'category_id': 51, 'id': 1534147}, 
    {'segmentation': [[376.2, 61.55, 391.86, 46.35, 424.57, 40.36, 441.62, 43.59, 448.07, 50.04, 451.75, 63.86, 448.07, 68.93, 439.31, 70.31, 425.49, 73.53, 412.59, 75.38, 402.92, 84.13, 387.71, 86.89, 380.8, 70.77]], 'area': 2239.2924, 'iscrowd': 0, 'image_id': 9, 'bbox': [376.2, 40.36, 75.55, 46.53], 'category_id': 55, 'id': 1913551}, 
    {'segmentation': [[473.92, 85.64, 469.58, 83.47, 465.78, 78.04, 466.87, 72.08, 472.84, 59.59, 478.26, 47.11, 496.71, 38.97, 514.62, 40.6, 521.13, 49.28, 523.85, 55.25, 520.05, 63.94, 501.06, 72.62, 482.6, 82.93]], 'area': 1658.8913000000007, 'iscrowd': 0, 'image_id': 9, 'bbox': [465.78, 38.97, 58.07, 46.67], 'category_id': 55, 'id': 1913746}, 
    {'segmentation': [[385.7, 85.85, 407.12, 80.58, 419.31, 79.26, 426.56, 77.94, 435.45, 74.65, 442.7, 73.66, 449.95, 73.99, 456.87, 77.94, 463.46, 83.87, 467.74, 92.77, 469.39, 104.63, 469.72, 117.15, 469.39, 135.27, 468.73, 141.86, 466.09, 144.17, 449.29, 141.53, 437.1, 136.92, 430.18, 129.67]], 'area': 3609.3030499999995, 'iscrowd': 0, 'image_id': 9, 'bbox': [385.7, 73.66, 84.02, 70.51], 'category_id': 55, 'id': 1913856}, 
    {'segmentation': [[458.81, 24.94, 437.61, 4.99, 391.48, 2.49, 364.05, 56.1, 377.77, 73.56, 377.77, 56.1, 392.73, 41.14, 403.95, 41.14, 420.16, 39.9, 435.12, 42.39, 442.6, 46.13, 455.06, 31.17]], 'area': 2975.276, 'iscrowd': 0, 'image_id': 9, 'bbox': [364.05, 2.49, 94.76, 71.07], 'category_id': 55, 'id': 1914001}]

After processing, the result is:

{
    'boxes': 
        tensor([[  1.0800, 187.6900, 612.6700, 473.5300],
                [311.7300,   4.3100, 631.0100, 232.9900],
                [249.6000, 229.2700, 565.8400, 474.3500],
                [  0.0000,  13.5100, 434.4800, 388.6300],
                [376.2000,  40.3600, 451.7500,  86.8900],
                [465.7800,  38.9700, 523.8500,  85.6400],
                [385.7000,  73.6600, 469.7200, 144.1700],
                [364.0500,   2.4900, 458.8100,  73.5600]]), 
    'labels': 
        tensor([51, 51, 56, 51, 55, 55, 55, 55]), 
    'image_id': tensor([9]), 
    'area': 
        tensor([120057.1406,  44434.7500,  49577.9453,  24292.7812,   2239.2925, 1658.8914,   3609.3030,   2975.2759]), 
    'iscrowd': 
        tensor([0, 0, 0, 0, 0, 0, 0, 0]), 
    'orig_size':  tensor([480, 640]), 
    'size': tensor([480, 640])}

🍉🍉🍉 Loading Images of Different Sizes

In DETR, the loaded images all have different sizes and cannot be stacked directly into a batch, so the DataLoader's collate_fn has to be implemented by hand, as follows:

def collate_fn(batch):
    # batch is a list in which each element is a tuple (image, target),
    # i.e. [(image_1, target_1), (image_2, target_2), ...]; collate_fn regroups it
    # into [(image_1, image_2, ...), (target_1, target_2, ...)]
    batch = list(zip(*batch))
    batch[0] = nested_tensor_from_tensor_list(batch[0])
    return tuple(batch)
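
A minimal sketch of how this is wired up (assuming the dataset_train built earlier, and that DETR's collate_fn and nested_tensor_from_tensor_list utilities are importable):

from torch.utils.data import DataLoader

data_loader_train = DataLoader(dataset_train, batch_size=2, shuffle=True,
                               collate_fn=collate_fn, num_workers=2)
samples, targets = next(iter(data_loader_train))
# samples is a NestedTensor (padded image batch + mask); targets is a tuple of per-image dicts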

Note that nested_tensor_from_tensor_list pads images of different sizes so that they all share the same size. How does it do that? ❓

First, it finds the maximum width W and maximum height H over all images in the batch and builds a zero-valued tensor of shape bs x c x H x W. Each image is then copied into this tensor starting from the top-left corner, and the uncovered region stays zero, much like image padding. Alongside the bs x c x H x W tensor, a mask of shape bs x H x W is produced (I am not yet sure what it is used for). The two are packed into a NestedTensor instance, which has two members: the padded tensor and the mask.

from typing import Optional

from torch import Tensor


class NestedTensor(object):
    def __init__(self, tensors, mask: Optional[Tensor]):
        self.tensors = tensors
        self.mask = mask

    def to(self, device):
        # type: (Device) -> NestedTensor # noqa
        cast_tensor = self.tensors.to(device)
        mask = self.mask
        if mask is not None:
            assert mask is not None
            cast_mask = mask.to(device)
        else:
            cast_mask = None
        return NestedTensor(cast_tensor, cast_mask)

    def decompose(self):
        return self.tensors, self.mask

    def __repr__(self):
        return str(self.tensors)
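
What nested_tensor_from_tensor_list does can be sketched as follows (a simplified version, not DETR's exact code; each image is assumed to be a C x H_i x W_i tensor, and True in the mask marks padded positions):

import torch

def nested_tensor_from_tensor_list_sketch(tensor_list):
    # maximum size along each dimension over the whole batch: (C, H_max, W_max)
    max_size = [max(img.shape[i] for img in tensor_list) for i in range(3)]
    b, (c, h, w) = len(tensor_list), max_size
    tensor = torch.zeros((b, c, h, w), dtype=tensor_list[0].dtype)
    mask = torch.ones((b, h, w), dtype=torch.bool)
    for img, pad_img, m in zip(tensor_list, tensor, mask):
        pad_img[:, :img.shape[1], :img.shape[2]].copy_(img)  # paste each image at the top-left
        m[:img.shape[1], :img.shape[2]] = False               # mark the valid (non-padded) region
    return NestedTensor(tensor, mask)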

Combined with the collate_fn above, batch[0] is a NestedTensor instance, and batch[1] holds the annotation targets of the images as a tuple. With batch_size = 2 it looks like this:

(
    {
        'boxes': 
            tensor([[0.6278, 0.6666, 0.2496, 0.1161],
            [0.4574, 0.4170, 0.3599, 0.7220],
            [0.7177, 0.4324, 0.4917, 0.2580],
            [0.5541, 0.6099, 0.6642, 0.2646],
            [0.3204, 0.6822, 0.1735, 0.2127],
            [0.6150, 0.6643, 0.0393, 0.1714],
            [0.4473, 0.3286, 0.0334, 0.0536],
            [0.5382, 0.4616, 0.0716, 0.0671]]), 
        'labels': tensor([18,  1,  1, 15, 27, 44, 84, 27]), 
        'image_id': tensor([151988]), 
        'area': tensor([10100.7021, 51015.5039, 29404.1680, 30322.9062, 13935.7568,  2856.5737, 512.4734,  1741.8429]), 
        'iscrowd': tensor([0, 0, 0, 0, 0, 0, 0, 0]), 
        'orig_size': tensor([480, 640]), 
        'size': tensor([608, 810])
        }, 
    {
        'boxes': tensor([[0.2746, 0.5719, 0.5491, 0.4382]]), 
        'labels': tensor([6]), 
        'image_id': tensor([116848]), 
        'area': tensor([82879.5156]), 
        'iscrowd': tensor([0]), 
        'orig_size': tensor([421, 640]), 
        'size': tensor([512, 778])
        }
)
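
Downstream, the model unpacks the NestedTensor through its decompose method; a small sketch continuing from the DataLoader example above:

tensors, mask = samples.decompose()
print(tensors.shape)  # e.g. torch.Size([2, 3, 608, 810]): padded to the largest H and W in the batch
print(mask.shape)     # e.g. torch.Size([2, 608, 810])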

At this point, the dataset loading pipeline and the form of its outputs should be roughly clear.

🍉🍉 References


MS COCO dataset introduction and basic use of pycocotools

Introduction to the MS COCO object detection dataset

PyTorch torchvision builds Faster R-CNN (1): COCO data loading