Spatial Path & Context Path

4 min readAug 19, 2018

The recipe of the Success of BiSeNet (Bilateral Segmentation Network).

Bilateral Segmentation Network is a state-of-the-art novel approach to Real-time Semantic Segmentation which employs two main novel approaches:

Spatial Path
Context Path

Semantic segmentation is a fundamental task in computer vision. The main goal of it is to assign semantic labels to each pixel in a image.

The applications of this technology are very broad. It can be applied to the fields of Augmented Reality(AR), Autonomous Driving, Robotics and Video surveillance. But the applications have high demand for efficient inference speed for fast interaction or response.

Note: I assume you are familiar with Convolution Neural Networks(CNN).

So what is Spatial Path, Context Path and how do they affect the performance of real-time semantic segmentation system?

These two components are devised to confront with the loss of spatial information( high-resolution features in a image/video) which is crucial to predicting the detailed output and also the shrinkage of receptive field( the region in the input space that a particular CNN’s feature is looking at) which is also crucial to cover large objects which lead to rich discriminative ability.

With that said, lets get into it. Shall we ?!

What is Spatial Path?

Spatial Path is a method proposed to preserve spatial size of the original input image and encode affluent spatial information(features). The Spatial Path is made of three convolutions layers. Each convolution layer has a stride = 2, followed by batch normalization and ReLU(Rectified Linear Unit).

This path extracts the output feature maps that is 1/8 (about 12.5%) of the original image. It encodes rich spatial information due to the large spatial size of feature maps.

Example of rich spatial information but very poor receptive field

What is Context Path ?

Context Path is designed to working hand-to-hand with the Spatial path, providing sufficient receptive field. In the semantic segmentation task, receptive field is of great significance for the performance. To enlarge the receptive field, there are some approaches like:

Pyramid pooling module
Atrous spatial pyramid pooling or “large kernel” and etc.

But all the above have computational demanding and memory consuming operations, which result in low speed.

Considering the large receptive field and simultaneously efficient computation requirements, context path is the best fit to the task.

It utilizes lightweight model and global average pooling to provide large receptive field. A perfect example of a lightweight model would be Xception model which is deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions. Xception model can downsample the feature map fast to obtain large receptive field, which encodes high level semantic context information. Then add a global average pooling on the tail of the lightweight model, which can provide the maximum receptive field with global context information, therefore being able to capture more information from the input image/video regardless of the size of the input.

Thank you very much for reading this post. If you like it give it a clap.

Please comment bellow what you think. If there is any improvements, doubts or error too, I am open and glad to answer you.

Sources:

A guide to receptive field arithmetic for Convolutional Neural Networks

The receptive field is perhaps one of the most important concepts in Convolutional Neural Networks (CNNs) that deserves…

medium.com

Activation Functions: Neural Networks

Sigmoid, tanh, Softmax, ReLU, Leaky ReLU EXPLAINED !!!

towardsdatascience.com

[1610.02357] Xception: Deep Learning with Depthwise Separable Convolutions

Abstract: We present an interpretation of Inception modules in convolutional neural networks as being an intermediate…

arxiv.org

[1808.00897] BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

Abstract: Semantic segmentation requires both rich spatial information and sizeable receptive field. However, modern…

arxiv.org

[1512.00567] Rethinking the Inception Architecture for Computer Vision

Abstract: Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety…

arxiv.org

Spatial Path & Context Path

So what is Spatial Path, Context Path and how do they affect the performance of real-time semantic segmentation system?

What is Spatial Path?

What is Context Path ?

A guide to receptive field arithmetic for Convolutional Neural Networks

The receptive field is perhaps one of the most important concepts in Convolutional Neural Networks (CNNs) that deserves…

Activation Functions: Neural Networks

Sigmoid, tanh, Softmax, ReLU, Leaky ReLU EXPLAINED !!!

[1610.02357] Xception: Deep Learning with Depthwise Separable Convolutions

Abstract: We present an interpretation of Inception modules in convolutional neural networks as being an intermediate…

[1808.00897] BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

Abstract: Semantic segmentation requires both rich spatial information and sizeable receptive field. However, modern…

[1512.00567] Rethinking the Inception Architecture for Computer Vision

Abstract: Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety…

Written by Prince Canuma

No responses yet