============================================================

Movidius Neural Compute SDK Release Notes

V2.10.01 2019-01-27

============================================================

This is a 2.x release of the Intel NCSDK which is not backwards compatible with the 1.x releases of the Intel NCSDK. Please look at the documentation for differences in tools and APIs.

SDK Notes:

New features:

TensorFlow SSD networks added.
Multi-threaded execution on device. The graph option NC_RW_GRAPH_EXECUTORS_NUM, which was previously limited to values 1 or 2 for Myriad X based devices, can now be set to any value in the range 1-4 inclusive. This value corresponds to the number of executor threads to be used on the device for the graph. Each executor thread will use the number of shaves specified in the graph file (via the -s option on the compiler command). The number of executors times the number of shaves specified in the graph file cannot exceed the total number of shaves on the device (12 for Myriad 2 or 16 for Myriad X). This option does not apply to Myriad 2 based devices.
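
As a rough illustration of the executor and shave budget described above, the following Python sketch checks whether a given combination fits on a Myriad X. The function and constant names are hypothetical and are not part of the NCAPI; only the limits (1 to 4 executors, 16 shaves on Myriad X) are taken from the note above.

```python
# Illustrative only: these names are hypothetical, not NCAPI identifiers.
MYRIAD_X_SHAVES = 16                   # total shaves on a Myriad X, per the note above
MIN_EXECUTORS, MAX_EXECUTORS = 1, 4    # allowed range for NC_RW_GRAPH_EXECUTORS_NUM


def executors_fit(executors, shaves_per_executor, total_shaves=MYRIAD_X_SHAVES):
    """Return True if the executor count is in range and the shave budget holds."""
    if not MIN_EXECUTORS <= executors <= MAX_EXECUTORS:
        return False
    if shaves_per_executor < 1:
        return False
    # Each executor uses the shave count the graph was compiled with (-s).
    return executors * shaves_per_executor <= total_shaves


def max_executors(shaves_per_executor, total_shaves=MYRIAD_X_SHAVES):
    """Largest allowed executor count for a graph compiled with this many shaves."""
    fitting = [n for n in range(MIN_EXECUTORS, MAX_EXECUTORS + 1)
               if executors_fit(n, shaves_per_executor, total_shaves)]
    return max(fitting) if fitting else 0
```

Under these assumptions, a graph compiled with -s 4 could use up to four executors on a Myriad X, while one compiled with -s 8 could use at most two.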

Notable bug fixes:

Force numpy 1.15 to avoid known issue with 1.16 release.
Force scikit-image to >= 0.13.0 and <= 0.14.0 to address issue with 0.12 on RPi.
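
The version constraints above can be sketched as a small check. This is a minimal illustration, not the installer's actual logic: the helper names are hypothetical, "numpy 1.15" is assumed to mean any 1.15.x release, and versions are assumed to be purely numeric dotted strings.

```python
def parse_version(text):
    """Turn '0.13.1' into (0, 13, 1); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def numpy_ok(version):
    """Assumption: 'numpy 1.15' means any 1.15.x release."""
    return parse_version(version)[:2] == (1, 15)


def scikit_image_ok(version):
    """True for versions in the closed range 0.13.0 to 0.14.0."""
    v = parse_version(version)
    # Pad to three components so '0.14' compares equal to '0.14.0'.
    v = v + (0,) * (3 - len(v))
    return (0, 13, 0) <= v <= (0, 14, 0)
```

A script could call these with the `__version__` strings of the installed packages before running the SDK tools.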

API Notes:

Apps written with NCAPI v1 are not compatible with this release and need to be migrated to NCAPI v2. Refer to Migrating Applications from NCAPI v1 to NCAPI v2 for information about migrating apps to the new API.

Network Notes:

Caffe

Tested networks:

AlexNet
GoogLeNet v1
MobileNet v1
Resnet 18
Resnet 50
SqueezeNet v1.1
SSD MobileNet v1
Tiny Yolo v1
VGG 16 (Configuration D)
RefineDet (Hardware only, see erratum #21)

Untested networks that likely work:

LeNet
CaffeNet
VGG (Soumith VGG_A)
Tiny Yolo v2

TensorFlow (r1.09)

Tested networks:

Facenet based on inception-resnet-v1
inception-v1
inception-v2 (GoogLeNet v2)
inception-v3
inception-v4
Inception ResNet v2
Mobilenet_V1_1.0 variants:
- MobileNet_v1_1.0_224
- MobileNet_v1_1.0_192
- MobileNet_v1_1.0_160
- MobileNet_v1_1.0_128
- MobileNet_v1_0.75_224
- MobileNet_v1_0.75_192
- MobileNet_v1_0.75_160
- MobileNet_v1_0.75_128
- MobileNet_v1_0.5_224
- MobileNet_v1_0.5_192
- MobileNet_v1_0.5_160
- MobileNet_v1_0.5_128
- MobileNet_v1_0.25_224
- MobileNet_v1_0.25_192
- MobileNet_v1_0.25_160
- MobileNet_v1_0.25_128
TinyYolo v2 via Darkflow transformation
VGG 16 (Configuration D)
SSD Inception v2
SSD Mobilenet v1
SSD Mobilenet v2

Firmware Features in shaves:

Convolutions
- The following convolution cases have been extensively tested (for stride s): 1x1s1, 3x3s1, 5x5s1, 7x7s1, 7x7s2, 7x7s4
- Group convolution
- Depth Convolution
- Dilated convolution
Max Pooling: Radix NxM with Stride S (see erratum #15)
Average Pooling: Radix NxM with Stride S, Global average pooling (see erratum #15)
Local Response Normalization
Relu, Relu-X, Prelu, Leaky-Relu (see erratum #6)
Softmax
Sigmoid
Tanh (see erratum #6)
Deconvolution
Slice (in SW via crop layer)
Scale
ElmWise unit: supported operations are sum, prod, max
Fully Connected Layers (limited support, see erratum #10)
Reshape
Flatten
Power
Crop (SW in ChannelMinor format only)
ELU
Batch Normalization (fused)
L2 Normalization
Input Layer

Firmware Features in NCEs:

Convolution
Max Pooling: Radix NxM with Stride S (see erratum #15)
Average Pooling: Radix NxM with Stride S, Global average pooling
Fully Connected Layers
Relu, Relu-X, Prelu, Leaky-Relu
Fused Non-overlapping pooling

Bug Fixes:

Docker now works with multiple devices.

Errata:

Python 2.7 is fully supported for making user applications, but only the helloworld_py example runs as-is in both Python 2.7 and 3.5 due to dependencies on modules.
Depth-wise convolution may not be supported if channel multiplier > 1.
If working behind a proxy, proper proxy settings must be applied for the installer to succeed.
Although improved, the installer is known to take a long time on Raspberry Pi. Date/time must be correct for SDK installation to succeed on Raspberry Pi.
Convolution may fail to find a solution for very large inputs.
Depth-wise convolution is tested for 3x3 kernels.
A TanH layer’s “top” & “bottom” blobs must have different names. This is different from a ReLU layer, whose “top” & “bottom” should be named the same as its previous layer.
On upgrade from previous versions of the SDK, the installer will detect if OpenCV 3.3.0 was installed, for example from http://github.com/movidius/ncappzoo/apps/stream_ty_gn/install-opencv-from_source.sh. For this release, the installer will prompt to uninstall this specific version of OpenCV. This is required for ssd-caffe to run correctly. After installation is complete, OpenCV 3.3.0 can be re-installed and ssd-caffe will continue to function.
The MTCNN network in the app zoo is showing unexpected behaviour for this release, and is being investigated. To use MTCNN, please use version 1.12.00 of the SDK.
For Caffe networks, although mvNCCheck shows per-pixel error for some metrics for mobilenet_v1_224 and hardware GoogLeNet, classification results are not impacted.
Only Ubuntu 16.04 LTS is supported as a host OS for this release. Ubuntu 18.04 is being evaluated.
For this release, use of Myriad devices connected to some specific hubs can fail. If you encounter errors, please try connecting directly to a PC port, or try a different hub.
Layer optimizations for layers that run on HW are visible in the profiler graph. When the new parser is used, the profiler graph shows multiple connections into and out of depth-wise convolutions and some other implicit layers. Note that the different groups of depth-wise convolutions (optimized for HW) don’t show up explicitly in the profiler graph.
For this release, networks with small input channels on TensorFlow may experience a performance penalty.
Non-overlapping pooling can run as a post operation on HW and as a separate operation in SW. Overlapping pooling is supported as a separate operation on both HW and SW.
FC layers with input NxNxD, where N is greater than 1, are not supported natively on CNN Engines. Therefore, they run as a convolution, which requires modification of the prototxt for this release.
A Caffe Scale layer only supports 1 input tensor.
The --accuracy_adjust=VALUES flag should be used if accuracy for HW networks is low when the network is compiled with the default parser. A setting of --accuracy_adjust="ALL:256" has been found to improve the accuracy for most networks.
For some networks, compiling and running a graph with 5 and 15 shaves is not supported.
Average pooling in the CNN Engine computes incorrect values near the edges, because the scale factor applied is constant and depends only on the size of the kernel.
RefineDet must be compiled to run in hardware (with the --ma2480 flag) for this release.
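
The average pooling erratum above can be illustrated with a minimal pure-Python sketch. It compares a constant scale factor of 1/kernel against dividing by the number of real input elements under each window, using a zero-padded 1-D input. This is only one reading of "constant scale factor" and does not model the CNN Engine itself; the function name and parameters are hypothetical.

```python
def avg_pool_1d(values, kernel, stride, pad, constant_scale):
    """Average-pool a 1-D list with zero padding on both sides.

    constant_scale=True divides every window sum by `kernel`;
    constant_scale=False divides by the count of real (unpadded) elements.
    """
    if pad >= kernel:
        raise ValueError("pad must be smaller than kernel")
    n = len(values)
    out = []
    start = -pad
    while start + kernel <= n + pad:
        # Keep only indices that fall inside the real input; padding adds zero.
        window = [values[i] for i in range(start, start + kernel) if 0 <= i < n]
        divisor = kernel if constant_scale else len(window)
        out.append(sum(window) / divisor)
        start += stride
    return out


flat = [3.0] * 5
constant = avg_pool_1d(flat, kernel=3, stride=1, pad=1, constant_scale=True)
exact = avg_pool_1d(flat, kernel=3, stride=1, pad=1, constant_scale=False)
```

For this flat input, `exact` is 3.0 everywhere, while `constant` drops to 2.0 at the first and last positions, where one of the three window slots is padding. Interior values agree in both modes.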