

Session 8 - Convolutional Neural Networks

Time: Thursday, 2019-04-11, 10:30AM - 12:00PM

Room: Wilhelm-Köhler-Saal, S1|03/283

Session chair: Andreas Koch

Filter-wise Pruning Approach to FPGA Implementation of Fully Convolutional Network for Semantic Segmentation

Masayuki Shimoda, Youki Sada, Hiroki Nakahara

This paper presents a hardware-aware sparse fully convolutional network (SFCN) for semantic segmentation on an FPGA. Semantic segmentation attracts interest because a self-driving car must recognize roads and obstacles at the pixel level. However, it is hard to implement such a system on embedded hardware, since the SFCN has so many weights that they cannot be stored in the limited on-chip memory. To realize a good trade-off between speed and accuracy, we construct an AlexNet-based SFCN which has neither skip connections nor deconvolution layers, reducing the computation costs and the latency. Furthermore, we propose a filter-wise pruning technique that sorts the weights of each filter by their absolute values and prunes them by a preset percentage, filter by filter, starting from the smallest. It is more suitable for hardware implementation since the number of computations for each filter becomes equal. We trained the AlexNet-based SFCN using the CamVid image dataset and implemented it on a Xilinx ZCU102 evaluation board. The results show that the FPGA system is 10.14 times faster than a mobile GPU implementation, and its performance per power consumption is 24.49 times higher than that of the GPU counterpart.
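The filter-wise pruning step described in the abstract above can be illustrated with a minimal sketch. It assumes each filter is given as a flat list of weights and that pruned weights are simply set to zero; the function name `prune_filterwise` and the data layout are hypothetical and not taken from the paper.

```python
def prune_filterwise(filters, prune_ratio):
    """Zero the smallest-magnitude weights in each filter independently.

    filters: list of filters, each a flat list of weights (assumed layout).
    prune_ratio: fraction of weights removed from every filter (0.0 to 1.0).
    """
    pruned = []
    for weights in filters:
        n_prune = int(len(weights) * prune_ratio)
        # Indices ordered by absolute value, smallest first.
        order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
        drop = set(order[:n_prune])
        pruned.append([0.0 if i in drop else w for i, w in enumerate(weights)])
    return pruned


filters = [
    [0.9, -0.1, 0.05, -0.7],
    [0.2, 0.3, -0.25, 0.01],
]
pruned = prune_filterwise(filters, 0.5)
# Every filter keeps the same number of non-zero weights.
nonzeros = [sum(1 for w in f if w != 0.0) for f in pruned]
```

Because the same fraction is removed from every filter, each filter ends up with an equal number of remaining weights, which is the property the abstract cites as helpful for hardware.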

Exploring Data Size to Run Convolutional Neural Networks in Low Density FPGAs

Ana Goncalves, Tiago Peres, Mário Véstias

Convolutional Neural Networks (CNNs) obtain very good results in several computer vision applications at the cost of high computational and memory requirements. Therefore, CNNs typically run on high-performance platforms. However, CNNs can be very useful in embedded systems, and their execution right next to the source of data has many advantages, such as avoiding the need for data communication and enabling real-time decisions, turning these systems into smart sensors. In this paper, we explore data quantization for fast CNN inference in low-density FPGAs. We redesign LiteCNN, an architecture for real-time inference of large CNNs in low-density FPGAs, to support hybrid quantization. We study the impact of quantization on the area, performance and accuracy of LiteCNN. LiteCNN with improved quantization of activations and weights improves on the best state-of-the-art results for CNN inference in low-density FPGAs. With our proposal, it is possible to infer an image in AlexNet in 7.4 ms on a ZYNQ7020 and in 14.8 ms on a ZYNQ7010 with 3% accuracy degradation. Other delay versus accuracy trade-offs were identified, permitting the designer to choose the most appropriate one.
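The abstract above does not define "hybrid quantization". As a rough illustration only, the sketch below assumes it means using different bit widths for weights and activations, and uses plain symmetric uniform quantization; the function names and the chosen bit widths (4 and 8) are hypothetical and not taken from the paper.

```python
def quantize(values, bits):
    """Symmetric uniform quantization to signed integer codes of `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


weights = [0.6, -1.0, 0.2, 0.0]
activations = [0.0, 1.5, 3.0, 0.75]

# Assumed "hybrid" setup: narrower weights, wider activations.
w_codes, w_scale = quantize(weights, 4)
a_codes, a_scale = quantize(activations, 8)

w_error = max(abs(w - r) for w, r in zip(weights, dequantize(w_codes, w_scale)))
```

Narrower codes shrink storage and arithmetic width at the cost of a larger rounding error, which is the area, performance and accuracy trade-off the abstract studies.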

Faster Convolutional Neural Networks in Low Density FPGAs using Block Pruning

Tiago Peres, Ana Goncalves, Mário Véstias

Convolutional Neural Networks (CNNs) are achieving promising results in several computer vision applications. Running these models is computationally very intensive and needs a large amount of memory to store weights and activations. Therefore, CNNs typically run on high-performance platforms. However, the classification capabilities of CNNs are very useful in many applications running on embedded platforms close to data production, since this avoids data communication for cloud processing and permits real-time decisions, turning these systems into smart embedded systems. In this paper, we improve the inference of large CNNs in low-density FPGAs using pruning. We propose block pruning and apply it to LiteCNN, an architecture for CNN inference that achieves high performance in low-density FPGAs. With the proposed LiteCNN optimizations, we have an architecture for CNN inference with an average performance of 275 GOPs for 8-bit data on an XC7Z020 FPGA. With our proposal, it is possible to infer an image in AlexNet in 5.1 ms on a ZYNQ7020 and in 13.2 ms on a ZYNQ7010 with only 2.4% accuracy degradation.
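The abstract above does not specify the block shape or the pruning criterion. The following sketch shows one common reading of block pruning, under the assumptions that weights are grouped into contiguous fixed-size blocks and that the blocks with the smallest summed magnitude are zeroed as a whole; `prune_blocks`, the block size and the ratio are hypothetical.

```python
def prune_blocks(weights, block_size, prune_ratio):
    """Zero whole blocks of weights, lowest total magnitude first."""
    blocks = [weights[i:i + block_size]
              for i in range(0, len(weights), block_size)]
    n_prune = int(len(blocks) * prune_ratio)
    # Rank blocks by the sum of absolute weight values, smallest first.
    order = sorted(range(len(blocks)),
                   key=lambda b: sum(abs(w) for w in blocks[b]))
    drop = set(order[:n_prune])
    out = []
    for b, block in enumerate(blocks):
        out.extend([0.0] * len(block) if b in drop else block)
    return out


weights = [0.5, -0.4, 0.01, 0.02, -0.3, 0.6, 0.05, -0.03]
pruned = prune_blocks(weights, block_size=2, prune_ratio=0.5)
```

Removing whole blocks rather than individual weights keeps the remaining weights in regular groups, which tends to be easier to index and schedule in hardware than unstructured sparsity.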

Important Dates:

► Paper Submission (original deadline):
23 November 2018
► Paper Submission (extended deadline):
07 December 2018
► Tutorial Proposals:
18 January 2019
► Author Notification:
18 January 2019
► Camera-ready:
10 February 2019
► Symposium:
09 - 11 April 2019


► 2019-02-11: Registration now open
Registration for the symposium is now open. Information about the registration and a link to the registration site are available.
ARC 2019 will feature a tutorial about the open-source TaPaSCo framework on Thursday afternoon.
► 2018-11-22: Deadline Extended
Due to popular demand, the paper submission deadline for ARC 2019 has been extended to December 7. We will not be able to offer any further extensions beyond that.
► 2018-11-01: Second CFP
The 2nd CFP announces the Program Committee and the planned tutorials.
► 2018-10-18: Submission open
Manuscripts can now be submitted as described in the author guidelines.
► 2018-09-11: Special issue confirmed
Extended versions of selected papers are invited to a special issue of Springer’s Journal of Signal Processing Systems.
► 2018-08-30: CFP published
The CFP topics have been published.
► 2018-08-22: Deadlines Fixed
The deadlines for paper submission, author notification, and camera ready submission are available.
► 2018-07-31: Hotel rooms reserved
A number of nearby hotel rooms with preferential prices are available.
► 2018-06-27: Schedule changed
The conference date was shifted by one week.



