Blog: 大工至善|大学至真 http://blog.sciencenet.cn/u/lcj2212916


[Repost] [Computer Science] [2016.08] [Source code included] ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network

Posted 2019-11-16 18:38 | Category: Research Notes | Source: Repost

This is a master's thesis from ETH Zurich, Switzerland (author: David Gschwend), 102 pages in total.

 

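Image understanding is becoming an important feature in a growing number of applications, from medical diagnosis to self-driving cars. Many of these applications require embedded solutions that can be integrated into existing systems with strict real-time and power constraints. Convolutional neural networks (CNNs) currently achieve record-breaking accuracy on all image understanding benchmarks, but their computational complexity is very high. Embedded CNNs therefore need small, efficient, yet powerful computing platforms.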

 

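This master's thesis explores the potential of FPGA-based CNN acceleration and demonstrates a fully functional proof-of-concept CNN implementation on a Zynq system-on-chip. The ZynqNet Embedded CNN is designed for image classification on ImageNet and consists of two parts: ZynqNet CNN, an optimized, custom CNN topology, and the ZynqNet FPGA Accelerator, an FPGA-based architecture for evaluating it.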

 

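ZynqNet CNN is a highly efficient CNN topology. Detailed analysis and optimization of earlier topologies with the custom-designed Netscope CNN Analyzer yielded a CNN with 84.5% top-5 accuracy at a computational complexity of only 530 million multiply-accumulate operations. The topology is highly regular and consists solely of convolutional layers, ReLU nonlinearities, and one global pooling layer. The CNN is an ideal fit for the FPGA accelerator.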

 

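The ZynqNet FPGA Accelerator allows ZynqNet CNN to be evaluated efficiently. It accelerates the entire network with a nested-loop algorithm that minimizes the number of arithmetic operations and memory accesses. The accelerator was synthesized with high-level synthesis for the Xilinx Zynq XC-7Z045 and reaches a clock frequency of 200 MHz at a device utilization of 80% to 90%.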

 

Image Understanding is becoming a vital feature in ever more applications ranging from medical diagnostics to autonomous vehicles. Many applications demand for embedded solutions that integrate into existing systems with tight real-time and power constraints. Convolutional Neural Networks (CNNs) presently achieve record-breaking accuracies in all image understanding benchmarks, but have a very high computational complexity. Embedded CNNs thus call for small and efficient, yet very powerful computing platforms.

This master thesis explores the potential of FPGA-based CNN acceleration and demonstrates a fully functional proof-of-concept CNN implementation on a Zynq System-on-Chip. The ZynqNet Embedded CNN is designed for image classification on ImageNet and consists of ZynqNet CNN, an optimized and customized CNN topology, and the ZynqNet FPGA Accelerator, an FPGA-based architecture for its evaluation.

ZynqNet CNN is a highly efficient CNN topology. Detailed analysis and optimization of prior topologies using the custom-designed Netscope CNN Analyzer have enabled a CNN with 84.5 % top-5 accuracy at a computational complexity of only 530 million multiply accumulate operations. The topology is highly regular and consists exclusively of convolutional layers, ReLU nonlinearities and one global pooling layer. The CNN fits ideally onto the FPGA accelerator.
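To make the complexity figure more concrete, here is a minimal sketch of how multiply-accumulate (MAC) operations are commonly counted for one convolutional layer. The function names and the example layer shape are hypothetical and are not taken from the thesis or from the Netscope CNN Analyzer; summing such per-layer counts over a whole network is how a total like 530 million MACs would typically be obtained.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def conv_macs(h_in: int, w_in: int, ch_in: int, ch_out: int,
              kernel: int, stride: int = 1, pad: int = 0) -> int:
    """MAC count of one convolutional layer with a square kernel.

    Each output pixel of each output channel needs
    ch_in * kernel * kernel multiply-accumulate operations.
    """
    h_out = conv_output_size(h_in, kernel, stride, pad)
    w_out = conv_output_size(w_in, kernel, stride, pad)
    return h_out * w_out * ch_out * ch_in * kernel * kernel


# Hypothetical layer (not a ZynqNet layer): 56x56 input, 64 -> 128 channels,
# 3x3 kernel, stride 1, padding 1.
example_macs = conv_macs(56, 56, 64, 128, kernel=3, stride=1, pad=1)
```

For this made-up layer the count is about 231 million MACs, which illustrates why keeping an entire ImageNet classifier at 530 million MACs requires careful choices of channel counts, kernel sizes, and strides.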

The ZynqNet FPGA Accelerator allows an efficient evaluation of ZynqNet CNN. It accelerates the full network based on a nested-loop algorithm which minimizes the number of arithmetic operations and memory accesses. The FPGA accelerator has been synthesized using High Level Synthesis for the Xilinx Zynq XC-7Z045, and reaches a clock frequency of 200 MHz with a device utilization of 80 % to 90 %.
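As an illustration of what a nested-loop formulation of such a network looks like, the sketch below computes a convolution followed by ReLU with plain nested loops, plus a global pooling step. It is a naive reference version and not the thesis's algorithm: the loop order, the function names, and the choice of average pooling for the global pooling layer are all assumptions, and none of the accelerator's optimizations for arithmetic or memory accesses are modeled.

```python
from typing import List

FeatureMaps = List[List[List[float]]]    # indexed [channel][row][col]
Kernels = List[List[List[List[float]]]]  # indexed [ch_out][ch_in][ky][kx]


def conv2d_relu(image: FeatureMaps, weights: Kernels, bias: List[float],
                stride: int = 1, pad: int = 0) -> FeatureMaps:
    """Naive nested-loop convolution with zero padding, followed by ReLU."""
    ch_in, h_in, w_in = len(image), len(image[0]), len(image[0][0])
    ch_out, k = len(weights), len(weights[0][0])
    h_out = (h_in + 2 * pad - k) // stride + 1
    w_out = (w_in + 2 * pad - k) // stride + 1
    out = [[[0.0] * w_out for _ in range(h_out)] for _ in range(ch_out)]

    for y in range(h_out):                  # output rows
        for x in range(w_out):              # output columns
            for co in range(ch_out):        # output channels
                acc = bias[co]
                for ci in range(ch_in):     # input channels
                    for ky in range(k):     # kernel rows
                        for kx in range(k): # kernel columns
                            iy = y * stride + ky - pad
                            ix = x * stride + kx - pad
                            # Positions outside the image count as zero.
                            if 0 <= iy < h_in and 0 <= ix < w_in:
                                acc += image[ci][iy][ix] * weights[co][ci][ky][kx]
                out[co][y][x] = max(acc, 0.0)  # ReLU nonlinearity
    return out


def global_average_pool(fmaps: FeatureMaps) -> List[float]:
    """Reduce each channel to one value (average pooling is an assumption)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmaps]
```

Because the topology described above uses only convolutions, ReLU, and one global pooling layer, a single loop nest of this shape can in principle be reused for every layer by changing its parameters, which is one reason a regular topology maps well onto a fixed hardware architecture.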


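1 Introduction

2 Background and Concepts

3 Convolutional Neural Network Analysis, Training and Optimization

4 FPGA Accelerator Design and Implementation

5 Evaluation and Results

6 Conclusion

Appendix A Declaration of Originality

Appendix B Task Description

Appendix C Convolutional Neural Network Visualization

Appendix D CNN Training Details and Results

Appendix E FPGA Accelerator Details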





http://blog.sciencenet.cn/blog-69686-1206398.html
