Understanding Object Detection Part 4: More Anchors!

This post is fourth in a series on object detection. The other posts can be found here, here, and here. The last post covered the use of anchor boxes for detecting multiple objects in an image. I ended that one with a model that was doing fine at detecting the presence of various objects, but its predicted bounding boxes could not properly localize objects with non-square shapes. This post will detail techniques for further improving that baseline model. ...
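As a rough illustration of the kind of fix the title hints at, and not the post's actual code: one way to handle non-square objects is to tile the image with anchor boxes at several zooms and aspect ratios instead of a single square anchor per grid cell. The grid size, zooms, and ratios below are hypothetical choices.

```python
import itertools
import torch

# Hypothetical anchor settings: a 4x4 grid, with extra zoom levels and
# aspect ratios so that tall/wide boxes exist alongside square ones.
grid_size = 4
zooms = [0.7, 1.0, 1.3]
aspect_ratios = [(1.0, 1.0), (1.0, 0.5), (0.5, 1.0)]  # (height, width) multipliers

cell = 1.0 / grid_size
centres = [(cell * (i + 0.5), cell * (j + 0.5))
           for i in range(grid_size) for j in range(grid_size)]

# One anchor per (cell, zoom, aspect-ratio) combination: (cy, cx, h, w)
anchors = torch.tensor([
    (cy, cx, cell * z * ar_h, cell * z * ar_w)
    for (cy, cx), z, (ar_h, ar_w) in itertools.product(centres, zooms, aspect_ratios)
])
print(anchors.shape)  # torch.Size([144, 4]) -> 16 cells * 3 zooms * 3 ratios
```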

January 5, 2019

Understanding Object Detection Part 3: Single Shot Detector

This post is third in a series on object detection. The other posts can be found here, here, and here. This post will detail a technique for classifying and localizing multiple objects in an image using a single deep neural network. Going from single object detection to multiple object detection is a fairly hard problem, so this is going to be a long post. The approach used below is based on learnings from fastai’s Deep Learning MOOC (Part 2). ...

January 3, 2019

Understanding Object Detection Part 1: The Basics

This post is first in a series on object detection. The succeeding posts can be found here, here, and here. One of the primary takeaways for me after learning the basics of object detection was that the very backbone of the convnet architecture used for classification can also be utilised for localization. Intuitively, it does make sense, as convnets tend to preserve the spatial information present in the input images. I saw some of that in action (detailed here and here) while generating localization maps from the activations of the last convolutional layer of a ResNet-34, which was my first realization that a convnet really does take the spatial arrangement of pixels into account while coming up with a class score. ...
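As a rough sketch of pulling out those spatial activations (assuming a torchvision ResNet-34 with layer4 as the last convolutional stage; this is not the post's exact code):

```python
import torch
from torchvision import models

# Sketch: inspect the spatial feature maps produced by the last
# convolutional stage of a ResNet-34, via a forward hook.
model = models.resnet34(pretrained=True).eval()

activations = {}
def hook(module, inp, out):
    activations['last_conv'] = out.detach()

# layer4 is the final convolutional stage, just before pooling/fc.
model.layer4.register_forward_hook(hook)

x = torch.randn(1, 3, 224, 224)        # stand-in for a preprocessed image
with torch.no_grad():
    scores = model(x)

feats = activations['last_conv']        # shape: (1, 512, 7, 7)
# Averaging over the 512 channels gives a crude 7x7 "where is the network
# looking" map that can be upsampled onto the input image.
heatmap = feats.mean(dim=1)
print(feats.shape, heatmap.shape)
```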

December 27, 2018

Understanding Object Detection Part 2: Single Object Detection

This post is second in a series on object detection. The other posts can be found here, here, and here. This is a direct continuation of the last post, where I explored the basics of object detection. In particular, I learnt that a convnet can be used for localization by choosing appropriate output activations and an appropriate loss function. I built two separate models, for classification and localization respectively, and used them on the Pascal VOC dataset. ...
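A minimal sketch of the idea, not the post's actual models: the same pooled backbone features can feed a classification head (class scores, cross-entropy loss) or a localization head (4 box coordinates, L1 loss). The layer sizes and loss choices below are illustrative assumptions.

```python
import torch
import torch.nn as nn

num_classes = 20          # Pascal VOC has 20 object categories
backbone_features = 512   # e.g. pooled features from a ResNet-34

classifier_head = nn.Linear(backbone_features, num_classes)
localizer_head = nn.Linear(backbone_features, 4)       # (x_min, y_min, x_max, y_max)

features = torch.randn(8, backbone_features)            # a batch of pooled features
class_logits = classifier_head(features)
bbox_preds = torch.sigmoid(localizer_head(features))    # keep coords in [0, 1]

# Typical loss choices: cross-entropy for the class, L1 for the box.
targets_cls = torch.randint(0, num_classes, (8,))
targets_box = torch.rand(8, 4)
loss = nn.functional.cross_entropy(class_logits, targets_cls) \
       + nn.functional.l1_loss(bbox_preds, targets_box)
```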

December 27, 2018

Evolution of Grad-CAM heat-maps along a ResNet-34

This exercise is a continuation of my last post, which was an exploration in generating class discriminative localization maps for a convnet. In particular, I used the feature map activations of the last convolutional layer (after BatchNorm), along with gradients of a specific class score wrt these activations, to create heat-maps that help visualize the parts of the input image that contribute most to a prediction. I wanted to extend that approach to see how these heat-maps shape up as we move deeper into the network, starting with the very first convolutional layer. Similar to the last post, the inspiration for this comes from a fastai Deep Learning MOOC lecture, which is itself inspired by Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization by Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra. ...
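For reference, a compact Grad-CAM sketch along those lines (assuming a torchvision ResNet-34 with layer4 as the target layer, and a random tensor standing in for a preprocessed image; not the post's exact code):

```python
import torch
from torchvision import models

model = models.resnet34(pretrained=True).eval()

# Hooks to capture the target layer's activations and the gradients
# flowing back into it.
store = {}
model.layer4.register_forward_hook(
    lambda m, i, o: store.update(acts=o))
model.layer4.register_full_backward_hook(
    lambda m, gi, go: store.update(grads=go[0]))

x = torch.randn(1, 3, 224, 224)
scores = model(x)
scores[0, scores.argmax()].backward()   # gradient of the top class score

acts, grads = store['acts'], store['grads']      # both (1, 512, 7, 7)
weights = grads.mean(dim=(2, 3), keepdim=True)   # per-channel importance
cam = torch.relu((weights * acts).sum(dim=1))    # (1, 7, 7) heat-map
cam = cam / cam.max()                            # normalize for display
```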

December 3, 2018

Generating class discriminative heat-maps using Grad-CAM

I’m quite interested in understanding and interpreting how convnets “see” and process input images that we feed them. I first got a taste of this kind of work after reading Visualizing and Understanding Convolutional Networks by Matthew D Zeiler and Rob Fergus, which is 5 years old as of today. I’m guessing a lot of work has been/is being done by the deep learning research community to make convnets more intuitive and understandable. I’m trying to take strides towards understanding that work. ...

December 2, 2018

Understanding ResNets

I’m currently enrolled in fastai’s Deep Learning MOOC (version 3), and loving it so far. It’s only been 2 lectures as of today, but folks are already building awesome stuff based on the content taught so far. The course starts with the application of DL in Computer Vision, and in the very first lecture, course instructor Jeremy teaches us how to leverage transfer learning by making use of pre-trained ResNet models. I’ve been meaning to dive into the details of ResNets for a while, and this seems like a good time to do so. ...

November 7, 2018

Summary Notes: GRU and LSTMs

This post is sort of a continuation of my last post, which was a Summary Note on the workings of basic Recurrent Neural Networks. As I mentioned in that post, I’ve been learning about the workings of RNNs for the past few days, and how they deal with sequential data like text. An RNN can be built using either a basic RNN unit (described in the last post), a Gated Recurrent Unit (GRU), or an LSTM unit. This post will describe how GRUs/LSTMs learn long-term dependencies in the data, which is something basic RNN units are not so good at. ...
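For quick reference, one common formulation of the GRU update (notation and gate conventions vary between write-ups): the update gate decides how much of the previous hidden state is carried forward unchanged, which is what lets information persist over long ranges.

```latex
% One common GRU formulation (conventions differ across write-ups):
% z_t is the update gate, r_t the reset gate, \tilde{h}_t the candidate state.
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```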

October 20, 2018

Word Embeddings and RNNs

One of the simplest ways to convert words from a natural language into mathematical tensors is to simply represent them as one-hot vectors, where the length of these vectors is equal to the size of the vocabulary from which these words are fetched. For example, if we have a vocabulary of size 8 containing the words "a", "apple", "has", "matrix", "pineapple", "python", "the", and "you", then the word “matrix” can be represented as: [0,0,0,1,0,0,0,0] ...
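A tiny sketch of that same example in code (using PyTorch tensors purely for illustration):

```python
import torch

# With this vocabulary of 8 words, "matrix" sits at index 3, so its
# one-hot vector has a single 1 at that position.
vocab = ["a", "apple", "has", "matrix", "pineapple", "python", "the", "you"]
word_to_idx = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    vec = torch.zeros(len(vocab))
    vec[word_to_idx[word]] = 1.0
    return vec

print(one_hot("matrix"))  # tensor([0., 0., 0., 1., 0., 0., 0., 0.])
```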

October 20, 2018

Summary Notes: Basic Recurrent Neural Networks

I’ve been learning about Recurrent Neural Nets this week, and this post is a “Summary Note” for the same. A “Summary Note” is just a blog post version of the notes I make for something, primarily for my own reference (if I need to come back to the material in the future). These summary notes won’t go into the very foundations of whatever they’re about, but rather serve as a quick and practical reference for that particular topic. ...

October 9, 2018