Annotating With Bounding Boxes

Published by onesixx on

How should you draw bounding boxes?

Ensure pixel-perfect tightness

The edges of the bounding box should touch the outermost pixels of the object, leaving no gap, so that the IoU against a perfectly tight box would be 1.

Bounding box performed using V7 on a road sign depicting a cow and UFO

Pay attention to box size variation

Box sizes should vary as little as possible across your dataset.

If an object is usually large, your model will perform worse in cases where the same type of object appears smaller.

Very large objects also tend to underperform. Their relative IoU is impacted less when they take up a large number of pixels than when medium or small objects take up a smaller number.
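The effect of box size on relative IoU is easy to check numerically. The sketch below (plain Python, assuming a hypothetical `(x1, y1, x2, y2)` box convention) shows that the same few pixels of misalignment cost a small box far more IoU than a large one:

```python
# Intersection over Union (IoU) of two axis-aligned boxes.
# Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 -- an
# illustrative convention, not tied to any particular library.

def iou(a, b):
    # Intersection rectangle (empty if the boxes don't overlap)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A perfectly tight box scores 1.0 against itself:
print(iou((0, 0, 100, 100), (0, 0, 100, 100)))  # 1.0

# The same 5-pixel misalignment hurts a small box far more than a large one:
print(iou((0, 0, 20, 20), (5, 5, 25, 25)))      # ~0.39
print(iou((0, 0, 200, 200), (5, 5, 205, 205)))  # ~0.91
```

This is why a sloppy edge is nearly invisible in the IoU of a huge object but devastating for a small one.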

Suppose your project contains a high number of large objects. In that case, you may want to consider labeling objects with polygons rather than bounding boxes, and running instance segmentation models rather than object detection.

Pro Tip: Check out A Gentle Introduction to Image Segmentation for Machine Learning and AI to learn more about different image segmentation techniques.
A comparison between image classification, localization, object detection and instance segmentation

Reduce box overlap

Since bounding box detectors learn with IoU in mind, you should avoid overlap whenever possible.

Boxes often overlap, as in the image below, where the objects are labeled with overlapping boxes.

Wrenches annotated with bounding boxes

If objects are labeled with overlapping bounding boxes, the model will perform significantly worse on them. It will struggle to associate a box with the item it encloses for as long as two boxes overlap frequently.

If you cannot avoid overlap due to the nature of your images, consider labeling the objects using polygons and using an instance segmentation model instead. You can expect a recall improvement of 10% or more.

Take box size limits into account

When establishing how large the objects you label should be, consider your model's input size and the downsampling (pooling, strided convolutions) performed by the network.

If objects are too small, their information may be lost during the downsampling stages of the network architecture.

When training on V7's built-in models, we recommend assuming potential failures on objects smaller than 10×10 pixels, or 1.5% of the image dimensions, whichever is larger.

For example, if your image is 2,000 by 2,000 pixels, objects below 30×30 pixels will perform significantly worse. They will still be identified, just less reliably.

While this holds for V7's models, it may not hold for other neural network architectures.
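The "10×10 pixels or 1.5% of the image dimensions, whichever is larger" rule of thumb quoted above can be expressed in a few lines. This is only an illustrative sketch; `min_reliable_box` and its parameter names are hypothetical, not part of any V7 API:

```python
# Minimum object size below which detection is likely to degrade,
# per the rule of thumb: max(10 px, 1.5% of the image dimension).

def min_reliable_box(width, height, abs_floor=10, rel_floor=0.015):
    # The relative floor is applied per dimension.
    return (max(abs_floor, round(width * rel_floor)),
            max(abs_floor, round(height * rel_floor)))

print(min_reliable_box(2000, 2000))  # (30, 30) -- matches the example above
print(min_reliable_box(500, 400))    # (10, 10) -- absolute floor dominates
```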

Pro tip: Looking for the perfect data annotation tool? Check out 13 Best Image Annotation Tools of 2021 [Reviewed] to compare your options.

Avoid diagonal items

Diagonally positioned objects, especially thin and elongated ones such as pencils or road signs, occupy only a small fraction of their bounding box relative to the surrounding background.

Bounding box annotation of a bridge using V7

To human eyes, it seems obvious that we are interested in the bridge, but if we enclose it in a bounding box, we are actually teaching the model to credit each pixel within the box equally.

As a result, the model may achieve a very high score simply by assuming that the background around the object is the object itself.

As with overlapping objects, diagonal objects are best labeled using polygons and instance segmentation instead. They will, however, still be identified by a bounding box detector given enough training data.
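To see how much of a diagonal object's bounding box is actually background, you can model a thin item as a rotated rectangle. This is an idealized sketch; the function name and shape model are assumptions for illustration, not from the original article:

```python
import math

# Fraction of an axis-aligned bounding box covered by a thin object
# of length L and thickness t, rotated by `angle_deg` degrees.
# The object is idealized as a rotated rectangle.

def fill_ratio(length, thickness, angle_deg):
    a = math.radians(angle_deg)
    # Axis-aligned bounding box of the rotated rectangle
    bbox_w = length * abs(math.cos(a)) + thickness * abs(math.sin(a))
    bbox_h = length * abs(math.sin(a)) + thickness * abs(math.cos(a))
    return (length * thickness) / (bbox_w * bbox_h)

# A 100x10 "pencil":
print(fill_ratio(100, 10, 0))   # 1.0   -- axis-aligned, the box is tight
print(fill_ratio(100, 10, 45))  # ~0.17 -- most of the box is background
```

At 45 degrees, over 80% of the pixels the model is told to treat as "pencil" are background, which is exactly the failure mode described above.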

Categories: vision

