ICASSP 2020 - A review on my favourite papers

On Network Science and Mutual Information for Explaining Deep Neural Networks

This paper works toward interpretable neural network models. This work is in part of a bigger move in the machine learning community, to open the so-called “black box” and be able to explain how the machine is learning. This study investigates how the information flows through feedforward networks. They propose using information theory on top of the network science to calculate an information measure that represents the amount of information that flows between two neurons.

The technique to codify this information flow is called Neural Information Flow (NIF). Basically, NIF weights the importance that edges of the neurons have in a multilayer perceptron (MLP) or Convolutional Neural Networks (CNN) while using the mutual information between nodes which is modelled as distribution. Feature attribution is computed as follows, an importance value is placed along all the edges of the network, a product of all these values in a given path is calculated, to finally sum all these products across all possible paths from an input and output.

NIF provides information on the most crucial paths of a network. Hence, less important parameters can be removed without loss of accuracy, facilitating network pruning at inference time. Furthermore, NIF can help in visualising edge communities, understanding how nodes form communities, for instance in an MLP. This could help in better training of a network, but needs to further be investigated. However, NIF is of a high computation complexity, which seems to be the main area for improvement.

Read the paper here

Towards High-Performance Object Detection: Task-Specific Design Considering Classification and Localization Separation

This paper tackles the efficiency of object detection. Object detection is a process of simultaneous localisation and classification. While the first one gives the category the object belongs to, the second one tells where this object is located. Both tasks require robust features that well represent an object.

However, these tasks have many non-shared characteristics. Classification concentrates on partial areas or the most prominent region during recognition, i.e. the head of a cat, whereas localisation considers a larger area of the image. Classification is translation invariant while localisation has translation variant characteristics. Hence, the authors propose a network that in addition to considering the common properties, also considers task-specific characteristics of both tasks.

They propose altering existing object detection in three stages. Having a lower layer that shares less semantic features between classification and localisation. Consequently, separating the backbone layers to learn task-specific semantic features. Finally, fuse these two separated features by concatenating and 1×1 convolution to have the same number of channels with the separated features. Experimental results show that such an approach can encode two-task specific features while improving performance. However, these improvements are not substantial and further detailed investigation is needed for the task-specific objective functions.

Read the paper here

Unsupervised Domain Adaptation for Semantic Segmentation with Symmetric Adaptation Consistency

Domain adaptation deals with learning a predictor when the training and test sets come from a different distribution. An example of this situation could be semantic segmentation. If a network trained in synthetic images, fully labelled, has to segment real-world images. These two types of distributions are very different; therefore, a mapping of features is needed.

Unsupervised domain adaptation uses the labels from the training time to solve tasks in the shifted distribution data with no labels. This paper utilizes adversarial learning and semi-supervised learning for domain adaptation in semantic segmentation. The two stages of this method are image-to-image translation and feature-level domain adaptation. Firstly, images from source domain are translated to the targeted domain using a translation model.

Finally, the semantic segmentation model is trained in an adversarial and semi-supervised manner at the same time. This is done by first symmetrically training two segmentation models with adversarial learning and then between the outputs of the two models introduce the consistency into semi-supervised learning to improve accuracy on pseudo labels that highly affect the final adaptation performance. They achieve state-of-the-art performance on semantic segmentation on the GTA-to-Cityscapes.

Read the paper here