The future of AI is in mobile & IoT devices (Part I)

In this article, I’m going to give a recap of the TensorFlow Dev Summit ’19, specifically the TensorFlow Lite talk.


This is a summit I longed for. TensorFlow (TF) was my first love, yet it lost the magic really fast when I met PyTorch and fastai (a library that sits on top of PyTorch).

Even though the Keras API sits on top of TensorFlow, it didn’t make things much easier. TensorFlow felt like it was trying to invent a whole new programming language, while PyTorch built upon the foundations of Python and feels very natural and Pythonic to work with. That’s why in January 2019 I made the switch to PyTorch v1.0.

TensorFlow was the first framework I used back in 2017, and it is what got me started in AI. I can say that back then I had no idea how young TF was; discovering Keras made me think TF was the best AI framework on the face of the earth!

“I know who I am. When I look in the mirror, I see me.” — Tracy Morgan

It took me a while to find out that wasn’t true, but at this Summit they addressed everything I had left them for and announced features that blew my mind. With TensorFlow 2.0, Google went above and beyond, borrowing from the best while making their core stronger. TensorFlow was and still is the best framework for serving and deploying ML-based apps, and TF 2.0 really drove that point home with amazing tools that I will be covering in next week’s article.

“It always seems impossible until it’s done.” —

Without further ado, let’s take a dive into the TF-Lite talk.

TF-Lite

TF-Lite is a lightweight solution for deploying machine learning models on mobile and IoT devices.

This framework is what Google uses internally to power many of its products, like Google Home’s voice recognition and NLP (natural language processing), and the amazing single-camera Pixel 3 bokeh effect.

As of last week, they announced that their speech-to-text functionality will run on the device: that means no server lag, and it will work offline. All of this is made possible by TF-Lite.

Major announcements:

  • Model Quantization

Quantizing a deep neural network means compacting its knowledge (weights) to work within storage and computation limits. This offers a lot of benefits, such as reduced memory cost and faster inference times, without losing too much of your model’s accuracy.
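To make the idea concrete, here is a minimal sketch of 8-bit affine quantization in NumPy (my own illustration, not TF-Lite’s internal code): float weights are mapped to int8 values through a scale and a zero point, shrinking storage 4x compared to float32, and can be mapped back with only a small rounding error.

```python
import numpy as np

def quantize_int8(x):
    """Affine-quantize a float array to int8 using a scale and zero point."""
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return scale * (q.astype(np.float32) - zero_point)

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(weights)
recovered = dequantize(q, scale, zp)
# int8 storage is 4x smaller than float32, at the cost of a small error
print(np.abs(weights - recovered).max())
```

The maximum reconstruction error is on the order of the scale (the width of one quantization step), which is why accuracy drops only slightly for most models.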

“They do say, the smaller the feet, the better the dancer.” —

TF-Lite offers two levels of quantization: post-training quantization and quantization-aware training. With the first, you train and save your model normally and then use a converter to produce a quantized TF-Lite model; with the second, you train your model with quantization enabled, which yields better overall accuracy than the first method.
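As a sketch of the post-training path (API names as in TensorFlow 2.x; the tiny Keras model here just stands in for your real trained one), converting a model to a quantized TF-Lite flatbuffer takes only a few lines:

```python
import tensorflow as tf

# A trained Keras model would go here; this toy model is a placeholder.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# Post-training quantization: convert the model to TF-Lite format,
# letting the converter shrink weights to 8 bits where it can.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

Quantization-aware training goes through a separate API that wraps the model before training, so the network learns to compensate for the rounding that quantization introduces.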

  • Improved Keras integration

Keras is being tightly integrated with the TF core and now has full access to all the low-level ops. Given this deep integration, it’s only natural that the other solutions in the TF library, such as TF-Lite, make their model optimization functions easily accessible via the Keras API.

  • Edge TPU devices
Coral Edge TPU devices

Edge devices are the future, and Google is very aware of it, so much so that it is investing heavily and pushing the bar higher and higher.

They launched two devices, namely the Dev Board and the USB Accelerator, under the Coral brand. Both use Google’s Edge TPU (tensor processing unit), an AI accelerator application-specific integrated circuit (ASIC) powering most if not all Google products.

Google released the TPU back in 2016. They created this processor specifically for AI, and since version 1 it has been rated incredibly fast. Now they have taken the same processor and fitted it into a Raspberry Pi-like form factor, reducing the inference time of ML models to 2 ms, with or without internet. This is unheard of!

“Technology, like art, is a soaring exercise of the human imagination.” —

  • TF-Lite for Microcontrollers

Microcontrollers are everywhere, and Google is too: they have services such as Google Assistant that run on millions of embedded devices. A great example of how they used TF-Lite for Microcontrollers is making keyword calling more efficient: a neural network of just a few KB runs on a microcontroller, always listening for the keyword, thereby reducing CPU usage, running offline, and increasing battery life.

They announced the first experimental support for embedded platforms in TF-Lite. To give you some perspective, during the presentation they used a SparkFun microcontroller board (beta version) with 384 KB of RAM and 1 MB of storage, powered by a single coin battery, running a 20 KB neural network that recognised the word ‘yes’. Please stop and think about this! An amazing feat of engineering; although not perfect, slowly we are getting there.

Last but not least, it is open source, so anyone can contribute, build on top of it, and come up with innovative ways of leveraging ML on microcontrollers.

“The science of today is the technology of tomorrow.” —

Summary

Things to remember from this article:

  • TF-Lite is the best solution currently available for mobile & IoT devices.
  • It allows you to train a model from scratch with quantization enabled, as well as take a trained model and quantize it with a few lines of code, four to be precise.
  • Edge TPU devices offer you fast on-device ML with no lag and no need for an internet connection; these devices can be great for projects that need dedicated hardware where size also matters.
  • TF-Lite for Microcontrollers can improve the overall performance of a system by offloading tasks that don’t need really big models to the microcontroller, freeing the CPU from trivial tasks such as the always-on keyword wake call (the famous ‘OK Google!’).

This article is part of a series where I will be sharing the highlights of the TF Dev Summit ’19, plus a surprise spin-off article where I will share a new AI race.

Computer Engineering Student, Web Dev. & AI/ML dev