BiSeNet for Real-Time Segmentation Part II
In this article, we are going to download the dataset that we will use to train our model.
This article is the second part of a series about BiSeNet, a network designed for real-time semantic segmentation. Its novel approach is to decouple spatial information preservation (high-resolution features) from receptive field size by using two paths. Specifically, the authors propose a Bilateral Segmentation Network (BiSeNet) with a Spatial Path (SP) and a Context Path (CP). These two paths address a weakness of previous semantic segmentation approaches, which sacrifice accuracy to gain speed.
I have already covered semantic segmentation and the introduction to BiSeNet, so they are outside the scope of this blog post. For more details about BiSeNet and its SP and CP, please click on the links to see my previous blog posts.
Without further ado, let’s get this show on the road!
Dataset
A data set (or dataset) is a collection of data. Most commonly, a dataset corresponds to the contents of a single database table. An Excel spreadsheet, for example, is a dataset.
First things first: we need data to train our Artificial Intelligence (AI) algorithm. For example, in supervised learning we want the algorithm to learn the mapping between the input (x) and the output (y), so that given a new input (x) that was not part of its training data, it can predict the output (y).
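To make the idea of learning a mapping from x to y concrete, here is a minimal sketch in plain Python. It is only an illustration and has nothing to do with BiSeNet itself: the toy data and the helper names fit_line and predict are made up for this example, and the "model" is just a straight line fitted by least squares.

```python
def fit_line(xs, ys):
    """Fit y = w * x + b to the training pairs using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b


def predict(model, x):
    """Apply the learned mapping to an input that was not in the training data."""
    w, b = model
    return w * x + b


# Toy training set (made up): the outputs follow y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)
print(predict(model, 10.0))  # x = 10 never appeared in the training data
```

A segmentation network follows the same pattern, only with images as x and per-pixel labels as y.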
To get a dataset, we can collect it ourselves, for example via web scraping, or download one that someone else has put together. There are some well-known datasets and dataset repositories covering many purposes, such as:
- ImageNet
- CamVid (which is the dataset we will be using)
- COCO-Stuff
- Kaggle, among others
Contrary to what many people think, Kaggle does not only host data science competitions. It is also a repository of datasets and kernels, where data scientists share their datasets along with kernels that give more insight into them.
For our implementation of BiSeNet, we are going to use CamVid, a dataset mentioned in the research paper by the researchers who invented BiSeNet.
The CamVid dataset is a street scene dataset captured from the perspective of a driving automobile. It is composed of 701 images in total: 367 for training, 101 for validation and 233 for testing. The images have a resolution of 960x720 and are labeled with 11 semantic categories.
To download the dataset and labels click here.
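As a quick sanity check on the split sizes quoted above, the short sketch below adds them up and prints the share of each split. The dictionary keys are just labels chosen for this example.

```python
# Split sizes as quoted in this post.
splits = {"train": 367, "val": 101, "test": 233}

total = sum(splits.values())
print(f"total: {total} images")

for name, count in splits.items():
    # Share of the whole dataset that falls into this split.
    print(f"{name}: {count} images ({count / total:.1%})")
```

Once the dataset is downloaded, comparing these numbers with the number of files in each folder is an easy way to spot an incomplete download.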
Preprocessing and Visualizing
Pre-processing refers to the transformations applied to our data before feeding it to the algorithm.
Data preprocessing is a technique used to convert raw data into a clean dataset. In other words, data gathered from different sources arrives in a raw format that is not suitable for analysis. Furthermore, we might not use the entire dataset, because some features are not important or relevant to our problem.
For example, if we want an algorithm to distinguish dogs from cats, and our dataset contains their pictures in one column and the names of their owners in another, we can discard the owners column because it contributes nothing to telling dogs from cats. All we need are features such as the shape of the ears, the nose and the type of fur, which are all found in the picture column.
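The dogs-and-cats example can be sketched in a few lines of plain Python. The records, column names and the helper drop_features are all made up for illustration. A real project would more likely use a dataframe library, but the idea is the same.

```python
# Hypothetical raw dataset: one row per picture, with an irrelevant "owner" column.
records = [
    {"picture": "cat_001.png", "owner": "Alice", "label": "cat"},
    {"picture": "dog_001.png", "owner": "Bob", "label": "dog"},
    {"picture": "dog_002.png", "owner": "Carol", "label": "dog"},
]


def drop_features(rows, unwanted):
    """Return a copy of the rows without the columns listed in `unwanted`."""
    return [
        {key: value for key, value in row.items() if key not in unwanted}
        for row in rows
    ]


cleaned = drop_features(records, {"owner"})
for row in cleaned:
    print(row)
```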
Implementation
For those of you who are newbies to the AI field, I used the following tools for this tutorial:
In this section, I’m going to present the code I use to load the files, convert them into NumPy arrays and plot them in a Jupyter notebook.
The first step is to import the libraries we are going to use to manipulate the data and to set the path where the dataset is saved, whether on your laptop or in the cloud.
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# image_path is defined as a global variable
image_path = "path to where you saved your dataset"
After this, we are going to write a function that receives path as a parameter, collects the .png files from each folder under that path and puts them in sorted lists.
def loadImages(path):
    # Collect the sorted .png file paths from each folder under `path`
    image_files = sorted([os.path.join(path, 'train', file)
                          for file in os.listdir(path + "/train") if file.endswith('.png')])
    # Use the `path` parameter here as well, not the global image_path
    annotation_files = sorted([os.path.join(path, 'trainannot', file)
                               for file in os.listdir(path + "/trainannot") if file.endswith('.png')])
    image_val_files = sorted([os.path.join(path, 'val', file)
                              for file in os.listdir(path + "/val") if file.endswith('.png')])
    annotation_val_files = sorted([os.path.join(path, "valannot", file)
                                   for file in os.listdir(path + "/valannot") if file.endswith('.png')])
    return image_files, annotation_files, image_val_files, annotation_val_files
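Because loadImages() sorts the image and annotation lists independently, the i-th image only lines up with the i-th annotation if both folders use matching filenames. Here is a minimal sketch of a sanity check for that. It assumes that each annotation file has the same filename as its image, which is an assumption about your copy of the dataset rather than something stated in this post, and pair_files is a hypothetical helper.

```python
import os


def pair_files(image_files, annotation_files):
    """Pair each image path with its annotation path, checking that they match.

    Assumption: an image and its annotation share the same filename and
    differ only in the folder they live in.
    """
    if len(image_files) != len(annotation_files):
        raise ValueError(
            f"{len(image_files)} images but {len(annotation_files)} annotations"
        )
    pairs = []
    for image_file, annotation_file in zip(image_files, annotation_files):
        if os.path.basename(image_file) != os.path.basename(annotation_file):
            raise ValueError(f"mismatch: {image_file} vs {annotation_file}")
        pairs.append((image_file, annotation_file))
    return pairs


# Example with made-up paths; in practice, pass in the lists from loadImages().
example_images = ["data/train/a.png", "data/train/b.png"]
example_annotations = ["data/trainannot/a.png", "data/trainannot/b.png"]
print(pair_files(example_images, example_annotations))
```

If the check raises an error, it is better to find out now than after training on mismatched pairs.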
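Then we can write the main function, which calls loadImages() and passes it the path. Algorithms can’t read image files as they are, so we need to convert each image into a numeric array that we can then feed to the algorithm and visualize. mpimg.imread() does this conversion, returning a 2D array for a grayscale image or a 3D array (height x width x channels) for a color one.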
def main():
    # calling global variable
    global image_path
    # The variable dataset holds 4 lists of files, one per folder,
    # so we can access the different folders using dataset[0...3]
    dataset = loadImages(image_path)
    train_set = dataset[0]
    r = [i for i in train_set]
    print(r[0])
    img_1 = mpimg.imread(r[0])
    display = plt.imshow(img_1)
    print(img_1)

main()
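One detail worth knowing when you print these arrays: matplotlib's imread returns PNG data as floats scaled to the range 0 to 1. If your annotation files store one class index per pixel as an 8-bit grayscale value (an assumption about how your copy of CamVid is encoded, so check your own files), the integer labels can be recovered as sketched below. The helper name to_class_ids is hypothetical, and the input here is a tiny synthetic array rather than a real annotation.

```python
import numpy as np


def to_class_ids(annotation):
    """Convert an annotation array loaded by mpimg.imread into integer class ids.

    Assumption: the file stores one class index per pixel as an 8-bit value.
    """
    annotation = np.asarray(annotation)
    if np.issubdtype(annotation.dtype, np.floating):
        # PNGs are returned as floats in [0, 1], so undo the 1/255 scaling.
        annotation = np.round(annotation * 255)
    return annotation.astype(np.uint8)


# Synthetic stand-in for a loaded annotation: three pixels with classes 0, 1 and 11.
fake_annotation = np.array([[0.0, 1 / 255, 11 / 255]], dtype=np.float32)

class_ids = to_class_ids(fake_annotation)
print(class_ids)
print(np.unique(class_ids))  # the distinct classes present in this annotation
```

Calling np.unique on a real annotation is a quick way to see which of the semantic categories appear in that image.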
For the full code used in this post, here is the link for my GitHub.
This concludes Part II of this series about BiSeNet. Stay tuned for more amazing content and for Part III, with the code for implementing this state-of-the-art real-time semantic segmentation research paper.
Thank you for reading. If you have any thoughts, comments or criticism, please comment down below.
If you like it, please give me a round of applause 👏👏👏 (+50) and share it with your friends.