Build an Image Classifier in 5 steps

By Ashwin Vijayakumar, October 23 2017

What is Image Classification?

Image classification is a computer vision problem that aims to classify a subject or an object present in an image into predefined classes. A typical real-world example of image classification is showing an image flash card to a toddler and asking the child to recognize the object printed on the card. Traditional approaches to providing such visual perception to machines have relied on complex computer algorithms that use feature descriptors, like edges, corners, colors, and so on, to identify or recognize objects in the image.

Click here for a community contributed Chinese translation of this blog.

Deep learning takes a rather interesting, and by far the most effective, approach to solving real-world imaging problems. It uses multiple layers of interconnected neurons, where each layer uses a specific computer algorithm to identify and classify a specific descriptor. For example, if you wanted to classify a traffic stop sign, you would use a deep neural network (DNN) that has one layer to detect edges and borders of the sign, another layer to detect the number of corners, the next layer to detect the color red, the next to detect a white border around red, and so on. The ability of a DNN to break down a task into many layers of simple algorithms allows it to work with a larger set of descriptors, which makes DNN-based image processing much more effective in real-world applications.

Stop sign

NOTE: the above image is a simplified representation of how a DNN would identify different descriptors of an object. It is by no means an accurate representation of a DNN used to classify STOP signs.

Image classification is different from object detection. Classification assumes there is only one object in the entire image, sort of like the ‘image flash card for toddlers’ example I referred to above. Object detection, on the other hand, can process multiple objects within the same image. It can also tell you the location of the object within the image.

Practical learning!

You will build…

A program that reads an image from a folder and reports the top 5 categories predicted for it.

You will learn…

  • How to use pre-trained networks to do image classification
  • How to use Intel® Movidius™ Neural Compute SDK’s API framework to program the Intel Movidius NCS

You will need…

  • An Intel Movidius Neural Compute Stick - Where to buy
  • An x86_64 laptop/desktop running Ubuntu 16.04

If you haven’t already done so, install the NCSDK on your development machine. Refer to the NCS Quick Start Guide for installation instructions.


If you would like to see the final output before diving into programming, download the code from our sample code repository (NC App Zoo) and run it.

cd ~/workspace
git clone
cd ncappzoo/apps/image-classifier

You should see an output similar to:

------- predictions --------
prediction 1 is n02123159 tiger cat
prediction 2 is n02124075 Egyptian cat
prediction 3 is n02113023 Pembroke, Pembroke Welsh corgi
prediction 4 is n02127052 lynx, catamount
prediction 5 is n02971356 carton

Let’s build!

Thanks to NCSDK’s comprehensive API framework, it takes only a few lines of Python to build an image classifier. Below are some of the user-configurable parameters of the script:

  1. GRAPH_PATH: Location of the graph file against which we want to run the inference
    • By default it is set to ~/workspace/ncappzoo/caffe/GoogLeNet/graph
  2. IMAGE_PATH: Location of the image we want to classify
    • By default it is set to ~/workspace/ncappzoo/data/images/cat.jpg
  3. IMAGE_DIM: Dimensions of the image as defined by the chosen neural network
    • ex. GoogLeNet uses 224x224 pixels, AlexNet uses 227x227 pixels
  4. IMAGE_STDDEV: Standard deviation (scaling value) as defined by the chosen neural network
    • ex. GoogLeNet uses no scaling factor, InceptionV3 uses 128 (stddev = 1/128)
  5. IMAGE_MEAN: Mean subtraction is a common technique used in deep learning to center the data
    • For the ILSVRC dataset, the mean is Blue = 102, Green = 117, Red = 123
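
As a rough sketch, these parameters might sit at the top of the script as shown here. The values come from the defaults and examples listed above; the exact form used in the NC App Zoo script may differ, and writing IMAGE_STDDEV as a plain multiplier of 1 for "no scaling" is an assumption.

```python
import os

# Illustrative defaults based on the list above; the real script may differ.
GRAPH_PATH = os.path.expanduser('~/workspace/ncappzoo/caffe/GoogLeNet/graph')
IMAGE_PATH = os.path.expanduser('~/workspace/ncappzoo/data/images/cat.jpg')

# GoogLeNet expects 224x224 input (AlexNet would need 227x227).
IMAGE_DIM = (224, 224)

# Assumption: 1 means "no scaling"; InceptionV3 would use 1/128 here.
IMAGE_STDDEV = 1

# ILSVRC per-channel mean in BGR order (Blue, Green, Red).
IMAGE_MEAN = [102, 117, 123]
```

Keeping the mean in BGR order matters because the image is converted from RGB to BGR before the mean is subtracted.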

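Before using the NCSDK API framework, we have to import the mvncapi module from the mvnc library. The snippets in the later steps also rely on numpy and skimage, so we import them here as well:

import mvnc.mvncapi as mvnc
import numpy
import skimage.io
import skimage.transform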

Step 1: Open the enumerated device

Just like any other USB device, when you plug the NCS into your application processor’s (Ubuntu laptop/desktop) USB port, it enumerates itself as a USB device. We will call an API to look for the enumerated NCS device.

# Look for enumerated Intel Movidius NCS device(s); quit program if none found.
devices = mvnc.EnumerateDevices()
if len( devices ) == 0:
    print( 'No devices found' )
    quit()

Did you know that you can connect multiple Neural Compute Sticks to the same application processor to scale inference performance? More about this in a later blog, but for now let’s call the APIs to pick just one NCS and open it (get it ready for operation).

# Get a handle to the first enumerated device and open it
device = mvnc.Device( devices[0] )
device.OpenDevice()

Step 2: Load a graph file onto the NCS

To keep this project simple, we will use a pre-compiled graph of a pre-trained GoogLeNet model (the default GRAPH_PATH above), which was downloaded and compiled when you ran make inside the ncappzoo folder. We will learn how to compile a pre-trained network in another blog, but for now let’s figure out how to load the graph into the NCS.

# Read the graph file into a buffer
with open( GRAPH_PATH, mode='rb' ) as f:
    blob = f.read()

# Load the graph buffer into the NCS
graph = device.AllocateGraph( blob )

Step 3: Offload a single image onto the Intel Movidius NCS to run inference

The Intel Movidius NCS is powered by the Intel Movidius visual processing unit (VPU). It is the same chip that provides visual intelligence to millions of smart security cameras, gesture controlled drones, industrial machine vision equipment, and more. Just like the VPU, the NCS acts as a visual co-processor in the entire system. In our case, we will use the Ubuntu system to simply read images from a folder and offload them to the NCS for inference. All of the neural network processing is done solely by the NCS, thereby freeing up the application processor’s CPU and memory resources to perform other application-level tasks.

In order to load an image onto the NCS, we will have to pre-process the image.

  1. Resize/crop the image to match the dimensions defined by the pre-trained network.
    • GoogLeNet uses 224x224 pixels, AlexNet uses 227x227 pixels.
  2. Subtract the per-channel mean (Blue, Green and Red), computed over the entire dataset, from the image.
    • This is a common technique used in deep learning to center the data.
  3. Convert the image into a half-precision floating point (fp16) array and use the LoadTensor function call to load the image onto the NCS.
    • The skimage library can do this in just one line of code.

# Read & resize image [Image size is defined during training]
img = print_img = skimage.io.imread( IMAGE_PATH )
img = skimage.transform.resize( img, IMAGE_DIM, preserve_range=True )

# Convert RGB to BGR [skimage reads image in RGB, but Caffe uses BGR]
img = img[:, :, ::-1]

# Mean subtraction & scaling [A common technique used to center the data]
img = img.astype( numpy.float32 )
img = ( img - IMAGE_MEAN ) * IMAGE_STDDEV

# Load the image as a half-precision floating point array
graph.LoadTensor( img.astype( numpy.float16 ), 'user object' )
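
To see what the channel flip, mean subtraction and fp16 conversion do to actual pixel values, here is a small self-contained sketch that needs neither an NCS nor an image file. The preprocess helper and the synthetic 2x2 image are hypothetical, introduced only for illustration; the resize step is left out, so the input is assumed to already have the right dimensions.

```python
import numpy

def preprocess(img_rgb, mean_bgr, scale):
    """Flip RGB to BGR, subtract the per-channel mean, scale, cast to fp16."""
    img = img_rgb[:, :, ::-1].astype(numpy.float32)  # RGB -> BGR
    img = (img - numpy.asarray(mean_bgr, dtype=numpy.float32)) * scale
    return img.astype(numpy.float16)

# A synthetic 2x2 RGB image whose every pixel equals the ILSVRC mean colour
# (Red = 123, Green = 117, Blue = 102).
img_rgb = numpy.full((2, 2, 3), (123, 117, 102), dtype=numpy.uint8)

tensor = preprocess(img_rgb, mean_bgr=[102, 117, 123], scale=1)
print(tensor.dtype, tensor.shape)  # float16 (2, 2, 3)
print(tensor[0, 0])                # every channel is zero after centering
```

Because the mean is given in BGR order, a pixel that matches the dataset mean ends up at exactly zero in every channel, which is what "centering the data" means here.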

Step 4: Read and print inference results from the NCS

Depending on how you want to integrate the inference results into your application flow, you can choose to use either a blocking or non-blocking function call to load the tensor (previous step) and read the inference results. We will learn more about this functionality in a later blog, but for now let’s just use the default, which is a blocking call (no need to call a specific API).

# Get the results from NCS
output, userobj = graph.GetResult()

# Print the results
print('\n------- predictions --------')

labels = numpy.loadtxt( LABELS_FILE_PATH, str, delimiter = '\t' )

# Sort class indices by score, highest first, and keep the top 5
order = output.argsort()[::-1][:5]
for i in range( 0, 5 ):
    print ('prediction ' + str(i + 1) + ' is ' + labels[order[i]])

# Display the image on which inference was performed
skimage.io.imshow( IMAGE_PATH )
skimage.io.show( )
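
The ranking logic can be tried without any hardware. The sketch below applies the same argsort idea to a made-up score vector; the top_k helper, the scores and the six-entry label list are all hypothetical stand-ins for the real NCS output and the ILSVRC labels file.

```python
import numpy

def top_k(scores, labels, k=5):
    """Return the k (label, score) pairs with the highest scores, best first."""
    order = numpy.argsort(scores)[::-1][:k]
    return [(labels[i], float(scores[i])) for i in order]

# Made-up scores for a 6-class problem, standing in for the NCS output.
scores = numpy.array([0.05, 0.40, 0.10, 0.30, 0.02, 0.13], dtype=numpy.float16)
labels = ['carton', 'tiger cat', 'lynx', 'Egyptian cat', 'tabby', 'corgi']

for rank, (label, score) in enumerate(top_k(scores, labels), start=1):
    print('prediction %d is %s (%.2f)' % (rank, label, score))
```

argsort returns indices in ascending order of score, so reversing the result and slicing the first k entries gives the best k classes.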

Step 5: Unload the graph and close the device

In order to avoid memory leaks and/or segmentation faults, we should close any open files or resources and deallocate any used memory.
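
A minimal sketch of this cleanup, assuming the Graph.DeallocateGraph() and Device.CloseDevice() methods of the NCSDK Python API used in the earlier steps. The release_ncs helper and the two stand-in classes are hypothetical; they exist only so the snippet runs without an NCS attached.

```python
def release_ncs(graph, device):
    """Unload the graph from the NCS, then close the device (in that order)."""
    graph.DeallocateGraph()
    device.CloseDevice()

# Stand-ins so the sketch runs without an NCS attached; with real hardware,
# pass the graph and device objects created in Steps 1 and 2 instead.
calls = []

class FakeGraph:
    def DeallocateGraph(self):
        calls.append('DeallocateGraph')

class FakeDevice:
    def CloseDevice(self):
        calls.append('CloseDevice')

release_ncs(FakeGraph(), FakeDevice())
print(calls)
```

The graph is released before the device is closed because the graph lives on the device it was allocated on.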


Inferred image

Congratulations! You just built a DNN-based image classifier.

Further experiments

  • This example script reads only one image; modify the script to read and infer multiple images from a folder
  • Use OpenCV to display the image(s) and their inference results on a graphical window
  • Replicate this project on an embedded board like RPI3 or MinnowBoard
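
As a possible starting point for the first experiment, the sketch below collects the image files in a folder so that each one can be passed through Steps 3 and 4 in turn. The list_images helper and the set of file extensions are assumptions made for illustration, not part of the sample code.

```python
import os

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')  # assumed set of file types

def list_images(folder):
    """Return sorted paths of image files found directly inside folder."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )

# Each returned path could then go through Steps 3 and 4 in turn:
# for path in list_images(os.path.expanduser('~/workspace/ncappzoo/data/images')):
#     ... pre-process, LoadTensor, GetResult, print ...
```

Opening the device and loading the graph only once, outside the loop, avoids repeating Steps 1 and 2 for every image.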

Further reading