Friday, July 21, 2017

Accelerating Deep Learning with the OpenCL™ Platform and Intel® Stratix® 10 FPGAs

https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/wp/wp-01269-accelerating-deep-learning-with-opencl-and-intel-stratix-10-fpgas.pdf



Introduction
Internet video traffic will grow fourfold from 2015 to 2020.[1] With this explosion
of visual data, it is critical to find effective methods for sorting, classifying, and
identifying imagery. Convolutional neural networks (CNNs), a machine learning
methodology based on the function of the human brain, are commonly used to
analyze images. Software separates the image into sections, often overlapping,
and then analyzes them to form an overall map of the visual space. This process
involves several complex mathematical steps to analyze, compare, and identify the
image with a low error rate.
Developers create CNNs using computationally intensive algorithms and
implement them on a variety of platforms. This white paper discusses a CNN
implementation on an Intel® Stratix® 10 FPGA that processes 14,000 images/second
at 70 images/second/watt for large batches and 3,015 images/second
at 18 images/second/watt for a batch size of 1. As these numbers show, Intel
Stratix 10 FPGAs are competitive with other high-performance computing (HPC)
devices such as GPUs for large batch sizes, and are significantly faster than other
devices at low batch sizes.
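As a rough reading of the figures above, dividing throughput by efficiency gives the power draw each operating point implies. This is only a back-of-the-envelope sketch: the function name is illustrative, the inputs are the rounded numbers quoted in the text, and the white paper excerpt does not state a power figure directly.

```python
def implied_power_watts(images_per_second, images_per_second_per_watt):
    """Power implied by a throughput figure and an efficiency figure.

    (images/second) / (images/second/watt) = watts
    """
    return images_per_second / images_per_second_per_watt


# Figures quoted in the text (rounded, so the results are approximate).
large_batch_power = implied_power_watts(14_000, 70)  # 200.0 W
batch_one_power = implied_power_watts(3_015, 18)     # 167.5 W

# Throughput retained when dropping from large batches to a batch size of 1.
batch_one_throughput_ratio = 3_015 / 14_000
```

On these numbers, a batch size of 1 keeps a little over a fifth of the large-batch throughput at a broadly similar implied power draw.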
CNN benchmarks
The Stanford Vision Lab has hosted the ImageNet Large Scale Visual Recognition
Challenge (ILSVRC) since 2010. Competitors are challenged to develop a CNN
algorithm that can analyze and classify objects in data sets comprising millions of
images or video clips. The 2012 contest-winning algorithm, dubbed AlexNet,*[2]
provided a huge leap forward in reducing the classification error rates compared
to previous algorithms.[3] In 2014, the winning entry (GoogLeNet*) reduced the
error rate even further.[4] Intel has developed a
novel design that implements these benchmark algorithms with modifications to
boost the performance on Intel FPGAs.
CNN algorithms consist of a series of operations or layers. For example, the AlexNet
algorithm has:

Convolution layers that perform a convolution operation on a 3-dimensional (3D)
data array (called a feature map) and a 3D filter. The operation uses a rectified
linear unit (ReLU) as an activation function.

Cross-channel local response normalization layers that scale the feature map
elements by a factor that is a function of the elements at the same location in
adjacent channels as the element being normalized.

Max pooling layers that read the data in 2-dimensional (2D) windows and output
the maximum values
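
The three layer types listed above can be sketched in plain Python as follows. This is a minimal reference sketch for illustration, not the FPGA design the white paper describes. The function names are hypothetical, the convolution assumes stride 1 with no padding, and the normalization constants (n=5, k=2, alpha=1e-4, beta=0.75) and the 3x3, stride-2 pooling window are the values reported in the original AlexNet publication rather than anything stated in this text.

```python
def conv_relu(fmap, filt, bias=0.0):
    """Convolve a C x H x W feature map with one C x K x K filter, then apply ReLU.

    Stride 1, no padding (an assumption). As is usual for CNNs, the filter
    is not flipped, so this is strictly a cross-correlation.
    Returns one (H-K+1) x (W-K+1) output channel.
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    K = len(filt[0])
    out = []
    for y in range(H - K + 1):
        row = []
        for x in range(W - K + 1):
            acc = bias
            for c in range(C):
                for dy in range(K):
                    for dx in range(K):
                        acc += fmap[c][y + dy][x + dx] * filt[c][dy][dx]
            row.append(acc if acc > 0.0 else 0.0)  # ReLU activation
        out.append(row)
    return out


def cross_channel_lrn(fmap, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Scale each element using the elements at the same (y, x) location
    in up to n adjacent channels. Defaults are assumed from AlexNet."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        lo, hi = max(0, c - n // 2), min(C - 1, c + n // 2)
        for y in range(H):
            for x in range(W):
                sq_sum = sum(fmap[j][y][x] ** 2 for j in range(lo, hi + 1))
                out[c][y][x] = fmap[c][y][x] / (k + alpha * sq_sum) ** beta
    return out


def max_pool(channel, size=3, stride=2):
    """Slide a size x size window over one 2D channel and keep each maximum."""
    H, W = len(channel), len(channel[0])
    return [
        [
            max(channel[y + dy][x + dx] for dy in range(size) for dx in range(size))
            for x in range(0, W - size + 1, stride)
        ]
        for y in range(0, H - size + 1, stride)
    ]
```

The nested loops make the data access pattern of each layer explicit: the convolution reads across all channels of a window, the normalization reads across neighboring channels at a single location, and the pooling reads a 2D window within one channel.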
