ARC2019

TaPaSCo Tutorial

2019-04-11T13:15:00+02:00

TBD.

Lunch

2019-04-11T12:15:00+02:00

Conference Closing

2019-04-11T12:00:00+02:00

Session 8 - Convolutional Neural Networks

2019-04-11T10:30:00+02:00

Filter-wise Pruning Approach to FPGA Implementation of Fully Convolutional Network for Semantic Segmentation

Masayuki Shimoda, Youki Sada, Hiroki Nakahara

This paper presents a hardware-aware sparse fully convolutional network (SFCN) for semantic segmentation on an FPGA. Semantic segmentation attracts interest because a self-driving car must recognize roads and obstacles at the pixel level. However, it is hard to implement such a system on embedded platforms, since the SFCN has so many weights that they cannot be stored in the limited on-chip memory. To realize a good trade-off between speed and accuracy, we construct an AlexNet-based SFCN which has neither skip connections nor deconvolution layers, reducing the computation costs and the latency. Furthermore, we propose a filter-wise pruning technique that sorts the weights of each filter by their absolute values and prunes a preset percentage of them filter by filter, starting from the smallest. It is more suitable for hardware implementation since the number of computations for each filter becomes equal. We trained the AlexNet-based SFCN using the CamVid image dataset and implemented it on a Xilinx ZCU102 evaluation board. The results show that the FPGA system is 10.14 times faster than a mobile GPU implementation, and its performance per power consumption is 24.49 times higher than that of the GPU counterpart.
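The filter-wise pruning step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the representation of each filter as a flat list of weights, and the 50% pruning ratio are assumptions made only for illustration.

```python
def prune_filter(weights, prune_ratio):
    """Zero out the smallest-magnitude fraction of one filter's weights.

    `weights` is a flat list of floats (an assumed representation);
    `prune_ratio` is the preset fraction of weights to remove.
    """
    n_prune = int(len(weights) * prune_ratio)
    # Indices of the weights, ordered by ascending absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def prune_filterwise(filters, prune_ratio):
    """Apply the same pruning ratio to every filter independently."""
    return [prune_filter(f, prune_ratio) for f in filters]


# Two hypothetical filters with four weights each.
filters = [
    [0.9, -0.1, 0.05, -0.7],
    [0.2, 0.3, -0.4, 0.01],
]
pruned = prune_filterwise(filters, 0.5)

# Every filter keeps the same number of nonzero weights.
nonzeros = [sum(1 for w in f if w != 0.0) for f in pruned]
print(pruned)
print(nonzeros)
```

Because the ratio is applied per filter rather than across the whole layer, each filter ends up with the same number of remaining weights, which is the property the abstract cites as making the scheme hardware-friendly.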

Exploring Data Size to Run Convolutional Neural Networks in Low Density FPGAs

Ana Goncalves, Tiago Peres, Mário Véstias

Convolutional Neural Networks (CNNs) obtain very good results in several computer vision applications at the cost of high computational and memory requirements. Therefore, CNNs typically run on high-performance platforms. However, CNNs can be very useful in embedded systems, and their execution right next to the source of data has many advantages, such as avoiding the need for data communication and enabling real-time decisions, turning these systems into smart sensors. In this paper, we explore data quantization for fast CNN inference in low density FPGAs. We redesign LiteCNN, an architecture for real-time inference of large CNNs in low density FPGAs, to support hybrid quantization. We study the impact of quantization on the area, performance and accuracy of LiteCNN. LiteCNN with improved quantization of activations and weights improves on the best state-of-the-art results for CNN inference in low density FPGAs. With our proposal, it is possible to infer an image in AlexNet in 7.4 ms in a ZYNQ7020 and in 14.8 ms in a ZYNQ7010 with 3% accuracy degradation. Other delay-versus-accuracy ratios were identified, permitting the designer to choose the most appropriate one.

Faster Convolutional Neural Networks in Low Density FPGAs using Block Pruning

Tiago Peres, Ana Goncalves, Mário Véstias

Convolutional Neural Networks (CNNs) are achieving promising results in several computer vision applications. Running these models is computationally very intensive and needs a large amount of memory to store weights and activations. Therefore, CNNs typically run on high-performance platforms. However, the classification capabilities of CNNs are very useful in many applications running on embedded platforms close to data production, since this avoids data communication for cloud processing and permits real-time decisions, turning these systems into smart embedded systems. In this paper, we improve the inference of large CNNs in low density FPGAs using pruning. We propose block pruning and apply it to LiteCNN, an architecture for CNN inference that achieves high performance in low density FPGAs. With the proposed LiteCNN optimizations, we have an architecture for CNN inference with an average performance of 275 GOPs for 8-bit data in a XC7Z020 FPGA. With our proposal, it is possible to infer an image in AlexNet in 5.1 ms in a ZYNQ7020 and in 13.2 ms in a ZYNQ7010 with only 2.4% accuracy degradation.

Coffee Break

2019-04-11T10:00:00+02:00

Session 7 - Safety and Security

2019-04-11T09:00:00+02:00

Leveraging the Partial Reconfiguration Capability of FPGAs for Processor-Based Fail-Operational Systems

Tobias Dörr, Timo Sandmann, Florian Schade, Falco K. Bapp, Jürgen Becker

Processor-based digital systems are increasingly being used in safety-critical environments. To meet the associated safety requirements, these systems are usually characterized by a certain degree of redundancy. This paper proposes a concept to introduce a redundant processor on demand by using the partial reconfiguration capability of modern FPGAs. We describe a possible implementation of this concept and evaluate it experimentally. The evaluation focuses on the fault handling latency and the resource utilization of the design. It shows that an implementation with 32 KiB of local processor memory handles faults within 0.82 ms and, when no fault is present, consumes less than 46 % of the resources that a comparable static design occupies.

(ReCo)Fuse Your PRC or Lose Security: Finally Reliable Reconfiguration-based Countermeasures on FPGAs

Kenneth Schmitz, Buse Ustaoglu, Daniel Große, Rolf Drechsler

Partial reconfiguration is a powerful technique to adapt the functionality of Field Programmable Gate Arrays (FPGAs) at run time. When performing partial reconfiguration, a dedicated Intellectual Property (IP) component of the FPGA vendor, i.e., the Partial Reconfiguration Controller (PRC), has to be used among a wide range of IP components. While ensuring the functional safety of FPGA designs is well understood, ensuring hardware security is still very challenging. This applies in particular to reconfiguration-based countermeasures, which are intensively used to form a moving target for the attacker. However, from the system security perspective, a critical component is the above-mentioned PRC, as noted by many papers implementing reconfiguration-based countermeasures against SCA/DPA attacks. In this work, we leverage a newly proposed safety mechanism, which creates a container around an IP, to encapsulate and thereby protect and observe the PRC of an FPGA. The proposed encapsulation scheme results in an architecture consisting of so-called ReCoFuses (RCFs), each capturing a specific protective goal which has to be fulfilled at any time during PRC operation. The terminology follows the classical electric installation including a fuse box. In our scheme we employ formal verification to guarantee the correctness in detecting a security violation. Only after successful verification are the RCFs integrated into the ReCoFuse Container. Experimental results demonstrate the advantage of our approach by preventing attacks on the PRC of a system secured by reconfiguration.

Registration & Welcome

2019-04-11T08:00:00+02:00

Social 2 - Visit to ESOC & Dinner

2019-04-10T16:45:00+02:00

ESOC guided tour

The tour includes a short introduction film and a visit to ESOC’s operations facilities, e.g. the Main Control Room (MCR).

Participants can also take a look at the Rosetta engineering model and further mission-specific control rooms.

Dinner

After the guided tour, we will have dinner in the Comedy Hall. A bus is going to take all participants from the conference to ESOC and to the dinner location after the guided tour. If you do not participate in the guided tour, please come directly to Comedy Hall no later than 7:30pm.

Use tram lines 6, 7 or 8 towards Eberstadt/Alsbach and get off at stop Bessunger Straße.

Invited Talk: Third Party CAD Tools for FPGA Design - A Survey of the Current Landscape

2019-04-10T16:15:00+02:00

The FPGA community is at an exciting juncture in the development of 3rd party CAD tools for FPGA design. Much has been learned in the past decade in the development and use of 3rd party tools such as RapidSmith, Torc, and IceStorm. New independent open-source CAD tool projects are emerging which promise to provide alternatives to existing vendor tools. The recent release of the RapidWright tool suggests that Xilinx itself is interested in enabling the user community to develop new use cases and specialized tools for FPGA design. This talk provides a survey of the current landscape, discusses parts of what has been learned over the past decade in the author’s work with 3rd party CAD tool development, and provides some thoughts on the future.

Brent Nelson is department chair and a professor in the Department of Electrical and Computer Engineering at Brigham Young University. He received his PhD in computer science in 1984 from the University of Utah in the area of VLSI CAD. His current research interests focus on CAD tools for the design of digital electronic systems (especially FPGA-based systems) and high-performance computing applications using FPGAs and GPGPU devices.

Session 6 - Design Frameworks and Methodology

2019-04-10T15:15:00+02:00

Hybrid Prototyping for Manycore Design and Validation

Leonard Masing, Fabian Lesniak, Jürgen Becker

The trend towards more parallelism in information processing is unbroken. Manycore architectures provide both massive parallelism and flexibility, yet they raise the level of complexity in design and programming. Prototyping of such architectures helps in handling this complexity by evaluating the design space and discovering design errors. Several system simulators exist, but they can only be used for early software development and interface specification. FPGA-based prototypes, on the other hand, are restricted by available FPGA resources or require expensive multi-FPGA prototyping platforms. We present a hybrid prototyping approach for manycore systems that consists of an FPGA part and a virtual part of the architecture on a host system. The hybrid prototyping requires fewer FPGA resources while retaining its speed advantage and enabling flexible modeling in the virtual platform. We describe the concept, provide an analysis of timing accuracy and synchronization of the FPGA with the Virtual Platform (VP), and show an example in which the hybrid prototype is used for feature development and evaluation of a scientific manycore architecture. The hybrid prototype allows us to evaluate a 7x7 architecture on a Virtex-7 XC7VX485T FPGA board which otherwise could only fit a reduced 2x2 design of our architecture.

Evaluation of FPGA Partitioning Schemes for Time and Space Sharing of Heterogeneous Tasks

Umar Ibrahim Minhas, Roger Woods, Georgios Karakonstantis

Whilst FPGAs have been integrated in cloud ecosystems, strict constraints for mapping hardware to a spatially diverse distribution of heterogeneous resources at run time make their utilization for shared multitasking challenging. This work aims at analyzing the effects of such constraints on the achievable compute density, i.e., the efficiency in utilization of available compute resources. A hypothesis is proposed that uses static off-line partitioning and mapping of heterogeneous tasks to improve space sharing on the FPGA. The hypothetical approach allows the FPGA resource to be treated as a service from a higher level and supports multi-task processing without the need for low-level infrastructure support. To evaluate the effects of existing constraints on our hypothesis, we implement a relatively comprehensive suite of ten real high-performance computing tasks and produce multiple bitstreams per task for fair evaluation of the various schemes. We then evaluate and compare our proposed partitioning scheme to previous work in terms of achieved system throughput. The simulated results for large queues of mixed intensity (compute and memory) tasks show that the proposed approach can provide higher than 3× system speedup. The execution on the Nallatech 385 FPGA card for selected cases suggests that our approach can provide on average 2.9× and 2.3× higher system throughput for compute and mixed intensity tasks while 0.2× lower for memory intensive tasks.