Current status of model¶
Till now we've only used the final convolutional feature maps of grid size
(4 x 4) for
16 anchor boxes, which are of a fixed size and a fixed aspect ratio. Since the activations coming from the model can only modify the shape of these anchor boxes by
50%, the predicted bounding boxes can only do a good job on objects which are similar in size to these anchor boxes. Hence, as seen in the validation results last time, the model is not able to properly localize an object which is larger in size than the maximum possible bounding box.
One way to solve this problem would be to start with anchor boxes of varied shapes and sizes to begin with. We can also have these anchors lie on grids corresponding to different scales. Last time we had
16 anchors on a
4 x 4 grid. We can allow prediction of detections at multiple scales by adding more convolutional layers in the custom head and using their activations for predictions. So this time we'll use activations from the convolutional feature maps of grid sizes
4 x 4,
2 x 2, and
1 x 1.