Intel® Movidius™ Neural Compute SDK Python API v2

The Intel® Movidius™ Neural Compute SDK (Intel® Movidius™ NCSDK) includes a Python API that lets applications run hardware-accelerated deep neural networks on neural compute devices such as the Intel® Movidius™ Neural Compute Stick.

The Python API is provided as a single Python module (mvncapi), which is placed on the development computer when the NCSDK is installed. It has been validated with Python 2.7 and 3.5.

Python API Documentation

Enumerations
DeviceHwVersion: Contains neural compute device hardware versions.
DeviceOption: Contains neural compute device options.
DeviceState: Contains neural compute device NCAPI states.
FifoDataType: Contains FIFO queue element data types.
FifoOption: Contains FIFO queue options.
FifoState: Contains FIFO queue NCAPI states.
FifoType: Contains FIFO queue access types.
GlobalOption: Contains global (application-level) options.
GraphOption: Contains network graph options.
GraphState: Contains network graph NCAPI states.
LogLevel: Contains application logging levels.
Status: Contains status code return values for NCAPI functions.

Structures
TensorDescriptor: Holds information that describes the shape of a tensor.

Global Functions
enumerate_devices(): Returns a list of identifiers for neural compute devices present in the system.
global_get_option(): Gets the value of a global option for the application.
global_set_option(): Sets the value of a global option for the application.

Classes
Device: Represents a neural compute device and provides methods to communicate with the device.
Fifo: Represents a first in, first out (FIFO) queue for network input and output.
Graph: Represents a neural network graph and provides methods to perform inferences.

Python NCAPI Overview

1. Import the NCAPI module

The Python NCAPI is in the mvncapi module within the mvnc package.

from mvnc import mvncapi

You can get and set application-level information and options with global_get_option() and global_set_option() for options in the GlobalOption enumeration.

2. Set up a neural compute device

The Device class represents a neural compute device and provides methods to communicate with the device.

The global function enumerate_devices() is used to get a list of neural compute devices that are attached to your host system.

# Get a list of available device identifiers
device_list = mvncapi.enumerate_devices()

Initialize the Device with one of the device identifiers obtained from the call to enumerate_devices().

# Initialize a Device
device = mvncapi.Device(device_list[0])

Initialize the neural compute device and open communication with Device.open().

# Initialize the device and open communication
device.open()

You can get information about the device using Device.get_option() for options in the DeviceOption enumeration.

Note: If you are using more than one neural compute device, you must create and open a separate Device for each.
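The note above can be sketched as a small helper that creates and opens one Device per identifier. This is an illustration, not part of the NCSDK: open_all_devices is a hypothetical name, and the mvncapi module is passed in as the api argument so that the sketch can be exercised with a stand-in object on a machine without the NCSDK installed. It assumes Device.open() is the call that opens communication.

```python
def open_all_devices(api):
    """Create and open a Device for every attached neural compute device.

    `api` is assumed to be the imported mvncapi module
    (from mvnc import mvncapi). Hypothetical helper, not part of the NCSDK.
    """
    device_ids = api.enumerate_devices()
    if not device_ids:
        # Indexing device_list[0] on an empty list would raise IndexError,
        # so report the missing hardware explicitly instead
        raise RuntimeError('No neural compute devices found')

    devices = []
    for device_id in device_ids:
        device = api.Device(device_id)  # one Device object per identifier
        device.open()                   # open communication with that device
        devices.append(device)
    return devices
```

With the NCSDK installed, the call would look like devices = open_all_devices(mvncapi). Each returned Device still needs its own close() and destroy() during clean up.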

3. Set up a network graph and associated FIFO queues for the device

The NCSDK requires a neural network graph file compiled with the mvNCCompile NCSDK tool. Many network models from TensorFlow* and Caffe are supported. See Configuring Your Network for the Intel® Movidius™ Neural Compute SDK for more information about preparing your network model for use with the NCSDK.

When you have a compiled graph, load the graph file data to a buffer.

# Load graph file data
GRAPH_FILEPATH = './graph'
with open(GRAPH_FILEPATH, mode='rb') as f:
    graph_buffer = f.read()

The Graph class provides methods for utilizing your network graph.

Initialize the Graph with a name string. The name string can be anything you like up to mvncapi.MAX_NAME_SIZE characters, or just an empty string.

# Initialize a Graph object
graph = mvncapi.Graph('graph1')

Graph input and output is done with FIFO (first-in, first-out) queues. The Fifo class represents one of these queues and provides methods for managing it.

Create input and output Fifo queues for your Graph and allocate the graph to your device with Graph.allocate_with_fifos(). You can omit the keyword parameters to use default Fifo settings or you can specify other values as needed.

# Allocate the graph to the device and create input and output Fifos with default arguments
input_fifo, output_fifo = graph.allocate_with_fifos(device, graph_buffer)

# Alternatively, make the same call with keyword arguments (default values shown)
input_fifo, output_fifo = graph.allocate_with_fifos(device, graph_buffer,
        input_fifo_type=mvncapi.FifoType.HOST_WO,
        input_fifo_data_type=mvncapi.FifoDataType.FP32,
        input_fifo_num_elem=2,
        output_fifo_type=mvncapi.FifoType.HOST_RO,
        output_fifo_data_type=mvncapi.FifoDataType.FP32,
        output_fifo_num_elem=2)

Optional parameters:

  • input_fifo_type/output_fifo_type: This sets the read/write access for the Fifo. The input_fifo will be used to provide input to your network graph and should be a HOST_WO (write-only) FifoType, which allows the API (“HOST”) to write to the Fifo. The output_fifo will be used to get output from your network graph and should be a HOST_RO (read-only) FifoType, which allows the API to read from the Fifo.
  • input_fifo_data_type/output_fifo_data_type: This sets the type of data that the Fifo will store. The default data type for Fifos is 32-bit floating point (FP32). You can also set the data type to 16-bit floating point (FP16). Note: Regardless of the data types the input and output Fifos are configured for, the API converts tensors to FP16 while performing inferences.
  • input_fifo_num_elem/output_fifo_num_elem: This sets the size of the Fifo queue, that is, the maximum number of elements that each Fifo will hold. Choose a number that makes sense for your application flow and memory constraints. Keep in mind that the method to write to the input Fifo will block if the input Fifo is full, and the method to read from the output Fifo will block if the output Fifo is empty.
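The blocking behavior described in the last item can be illustrated without any hardware. The sketch below is only an analogy: it uses Python's standard queue.Queue with maxsize=2 to stand in for a Fifo that holds two elements, and it uses the non-blocking calls so that the "would block" condition shows up as an exception instead of a stalled program. It does not use or model the NCAPI Fifo class itself.

```python
import queue

# Stand-in for a Fifo with room for 2 elements (an analogy, not the NCAPI Fifo class)
fifo = queue.Queue(maxsize=2)

fifo.put('tensor 0')
fifo.put('tensor 1')

# A third write has no room. A blocking put() would wait here;
# put_nowait() raises queue.Full instead so the condition is visible.
try:
    fifo.put_nowait('tensor 2')
    write_would_block = False
except queue.Full:
    write_would_block = True

# Elements come back out in first-in, first-out order
first = fifo.get()
second = fifo.get()

# Reading from an empty queue is the mirror case: get() would wait,
# get_nowait() raises queue.Empty.
try:
    fifo.get_nowait()
    read_would_block = False
except queue.Empty:
    read_would_block = True

print(write_would_block, first, second, read_would_block)
```

In an application, this is the reason to size the Fifos for the number of inferences you expect to have in flight, and to read results regularly so that writes do not stall.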

See the Fifo class documentation for information about creating and allocating Fifos individually, which gives greater control over Fifo setup.

You can get information about a Graph using Graph.get_option() for options in the GraphOption enumeration. You can get and set Fifo options using Fifo.get_option() and Fifo.set_option() for options in the FifoOption enumeration.

Note: You must create and allocate a Graph for each network graph file that you wish to use. A Device can have more than one Graph allocated to it, but each Graph can only be allocated to a single Device.

4. Get an input tensor

The way that you obtain and pre-process your input tensor will depend on your individual application. If you are using Python 3, the cv2 module provides an easy way to load images from a file or a camera feed. GStreamer is a popular alternative.

Here is an example of using the cv2 module to read an image from a file. Pre-processing specific to the network model and the image, such as resizing the image to fit your network’s input requirements, is usually necessary as well.

import cv2

# Read an image from file
tensor = cv2.imread('img.jpg')
# Do pre-processing specific to this network model (resizing, subtracting network means, etc.)

You can also use numpy to manipulate tensors.

import numpy

# Convert the input tensor to the 32-bit floating point (FP32) data type
tensor = tensor.astype(numpy.float32)

Input tensor data must be the data type specified by the RW_DATA_TYPE option for the input Fifo. The default is 32-bit floating point, but Fifos can also be configured to store 16-bit floating point data. See the FifoDataType enumeration.

Tensor data should be stored in a numpy ndarray.
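Because inferences run in 16-bit floating point regardless of the Fifo data type (see the note in step 3), it can help to have a feel for how coarse that precision is. The sketch below is an illustration using only the standard library: to_fp16_and_back is a hypothetical helper that rounds a Python float to IEEE 754 half precision with struct's 'e' format code, which requires Python 3.6 or later. It says nothing about how the NCAPI performs its own conversion.

```python
import struct

def to_fp16_and_back(value):
    """Round a Python float to IEEE 754 half precision and return it as a float."""
    # The 'e' format code packs a 16-bit (binary16) value; available in Python 3.6+
    return struct.unpack('<e', struct.pack('<e', value))[0]

# 1.0 is exactly representable; 0.1 and 2049.0 are not and get rounded
for v in (1.0, 0.1, 2049.0):
    print(v, '->', to_fp16_and_back(v))
```

For whole tensors, tensor.astype(numpy.float16) applies the same rounding to every element of a numpy ndarray.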

5. Perform an inference

Use Graph.queue_inference_with_fifo_elem() to write the input tensor to your input Fifo and queue it for inference. When the inference is complete, the input tensor will be removed from the input_fifo queue and the result tensor will be placed in the output_fifo queue. The first two parameters are the input and output Fifos, and the third is the input tensor. The fourth parameter can be any object that you wish to have associated with this particular tensor when you read the inference results, such as the original tensor or a window handle, or None.

# Write the tensor to the input_fifo and queue an inference
graph.queue_inference_with_fifo_elem(input_fifo, output_fifo, tensor, 'user object')

If the input Fifo is full, this method call will block until there is room to write to the Fifo. You can check how many elements are in the input and output Fifos with Fifo.get_option() for RO_WRITE_FILL_LEVEL and RO_READ_FILL_LEVEL, respectively. Note that the inference will take some amount of time to complete, depending on network model speed and device communication latency, so you may need to wait to see updated levels.
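One way to wait for updated levels is to poll the output Fifo's read fill level with a timeout. The helper below is a minimal sketch under stated assumptions: wait_for_result is a hypothetical name, the option passed in is assumed to be mvncapi.FifoOption.RO_READ_FILL_LEVEL, and Fifo.get_option() is assumed to return the element count as a number. The option is passed as a parameter so the helper can be tried with a stand-in object.

```python
import time

def wait_for_result(output_fifo, read_fill_level_option, timeout_s=5.0, poll_s=0.01):
    """Poll the output Fifo until it holds at least one element or the timeout expires.

    `read_fill_level_option` is assumed to be mvncapi.FifoOption.RO_READ_FILL_LEVEL.
    Returns True if a result is waiting, False on timeout. Hypothetical helper.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # A fill level above zero means at least one result can be read
        if output_fifo.get_option(read_fill_level_option) > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A call such as wait_for_result(output_fifo, mvncapi.FifoOption.RO_READ_FILL_LEVEL) lets the application do other work or give up gracefully instead of blocking indefinitely on an empty output Fifo.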

After the inference is complete, you can get the inference result with Fifo.read_elem(). This will also return the user object that you passed to Graph.queue_inference_with_fifo_elem().

# Get the results from the output queue
output, user_obj = output_fifo.read_elem()

You can then use the output result as intended for your particular network model.
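What "as intended" means depends entirely on the network. As one common case, the sketch below assumes the output is a flat sequence of per-class scores from a classification network and picks the highest-scoring classes. top_k is a hypothetical helper, and the scores are made-up values for illustration.

```python
def top_k(output, k=5):
    """Return the k (index, score) pairs with the highest scores, best first.

    Assumes `output` is a flat sequence of per-class scores, as a
    classification network would typically produce.
    """
    indexed = list(enumerate(output))
    # Sort by score, highest first
    indexed.sort(key=lambda pair: pair[1], reverse=True)
    return indexed[:k]

scores = [0.05, 0.70, 0.10, 0.15]   # made-up values for illustration
print(top_k(scores, k=2))
```

The returned indices would then be mapped to class labels using whatever label file accompanies your model. Other network types, such as detectors, need their own output parsing.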

6. Clean up

Before closing communication with the device, use Graph.destroy() and Fifo.destroy() to destroy the Graph and Fifo objects and clean up associated memory. The Fifos must be empty before being destroyed. Then use Device.close() to close the device and Device.destroy() to destroy the Device object and clean up associated memory.

# Clean up
input_fifo.destroy()
output_fifo.destroy()
graph.destroy()
device.close()
device.destroy()
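To make sure clean up also happens when an inference raises an exception, the steps from this overview can be wrapped in try/finally. The following is a sketch rather than a reference implementation: run_single_inference is a hypothetical helper, the mvncapi module is passed in as api so the flow can be exercised with a stand-in object, and Device.open() is assumed to be the call that opens communication. It does not handle the case where a Fifo still holds elements after a failure.

```python
def run_single_inference(api, graph_buffer, tensor):
    """Run one inference and release NCAPI resources afterwards.

    `api` is assumed to be the imported mvncapi module. Hypothetical helper.
    """
    device = api.Device(api.enumerate_devices()[0])
    device.open()
    graph = None
    fifos = ()
    try:
        graph = api.Graph('graph1')
        fifos = graph.allocate_with_fifos(device, graph_buffer)
        input_fifo, output_fifo = fifos
        graph.queue_inference_with_fifo_elem(input_fifo, output_fifo, tensor, None)
        # Reading the result also removes it from the output Fifo,
        # which must be empty before it is destroyed
        output, _user_obj = output_fifo.read_elem()
        return output
    finally:
        # Same order as described above: Fifos and Graph first, then the Device
        for fifo in fifos:
            fifo.destroy()
        if graph is not None:
            graph.destroy()
        device.close()
        device.destroy()
```

For repeated inferences you would keep the device, graph, and Fifos open across calls and run only the clean up portion once at shutdown.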