New Book: Efficient Deep Learning

Subtitled “Fast, smaller, and better models”, this book goes through algorithms and techniques used by researchers and engineers at Google Research, Facebook AI Research (FAIR), and other eminent AI labs to train and deploy their models on devices ranging from large server-side machines to tiny microcontrollers. The book balances fundamentals with practical know-how, fully equipping you to optimize your model training and deployment workflows so that your models perform as well as or better than before, with a fraction of the resources.

Target Audience

The minimally qualified reader is someone who has a basic understanding of ML and at least some experience training deep learning models. They can do basic fine-tuning of models by changing common parameters, can make minor changes to model architectures, and can get the modified models to train to good accuracy. However, they are running into problems productionizing these models, or want to optimize them further. This is primarily because the book does not teach deep learning basics. For a basic introduction to the subject, see Deep Learning with Python. Any reader with this prerequisite knowledge will be able to enjoy the book.


  • Gaurav Menghani is a Staff Software Engineer / Tech Lead at Google Research, working on efficient deep learning, on-device machine learning, and AI. He was previously a senior software engineer at Facebook, working on search quality and ranking.
  • Naresh Singh graduated from Stony Brook University, where he focused on data analysis and machine learning. He has worked as a software engineer at Microsoft and Amazon.


The material covers the following topics: quantization, learning techniques and efficiency, data augmentation, smaller and faster models, efficient architectures, long-term dependencies, automation and AutoML, hyperparameter tuning, clustering and classification, contrastive learning, microcontrollers, NLP, computer vision, TensorFlow, PyTorch, and more.
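To give a flavor of the first topic in that list, here is a minimal sketch (not taken from the book) of the arithmetic behind post-training int8 quantization: a float tensor is mapped to 8-bit integers via a scale and zero point, then dequantized back with a small, bounded error. The function names are illustrative, not the book's code.

```python
import numpy as np

def quantize_int8(weights):
    """Affine (asymmetric) quantization of a float tensor to int8.

    Maps [w_min, w_max] onto the 256 representable int8 levels.
    """
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0              # step between adjacent levels
    zero_point = int(round(-w_min / scale)) - 128  # so that w_min maps near -128
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original floats."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
```

The round-trip error is at most about one quantization step (`scale`), which is the trade-off that buys a 4x size reduction over float32.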

Many projects and exercises are discussed throughout the book, including:

  • Compressing images from the Mars Rover
  • Quantizing a deep learning model
  • Increasing the accuracy of an image or text classification model with data augmentation
  • Increasing the accuracy of a speech identification model with distillation
  • Using pre-trained embeddings to improve accuracy on an NLP task
  • News classification using RNN and Attention Models
  • Snapchat-like filters for pets
  • Searching over model architectures for boosting model accuracy
  • Comparing compression techniques for optimizing a speech detection model
  • Learning to classify with 10% labels
  • Benchmarking a tiny on-device model with TFLite
  • Speech detection on a microcontroller with TFMicro
  • Face recognition on the web with TensorFlow.JS
  • Training BERT efficiently on Google's Tensor Processing Unit (TPU)
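As a taste of the distillation project mentioned above, here is a minimal numpy sketch (not the book's implementation) of the standard knowledge-distillation loss: a temperature-softened cross-entropy against the teacher's outputs, blended with the ordinary cross-entropy against the hard labels. The weights `T` and `alpha` are illustrative defaults.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax, computed stably."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha * soft-target CE (at temperature T) + (1 - alpha) * hard-label CE."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student_T = np.log(softmax(student_logits, T))
    # T**2 rescales gradients so the soft term's magnitude matches the hard term
    soft_ce = -(p_teacher * log_p_student_T).sum(axis=-1).mean() * (T ** 2)
    log_p_student = np.log(softmax(student_logits))
    hard_ce = -log_p_student[np.arange(len(labels)), labels].mean()
    return alpha * soft_ce + (1 - alpha) * hard_ce

student = np.array([[2.0, 0.5, -1.0]])
teacher = np.array([[3.0, 1.0, -2.0]])
loss = distillation_loss(student, teacher, labels=np.array([0]))
```

A small student trained against these soft targets inherits the teacher's inter-class similarity structure, which is why distillation can boost the accuracy of compact models.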

You can download the first four chapters (PDF format) from the official website. Projects, codelabs, and tutorials are available on GitHub, here.

For related books, visit our books section, here.
