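Liu Xiaobang's personal blog http://blog.sciencenet.cn/u/iamliuzhiyong Drifting through a floating life, laughing at the bright moon; a thousand sorrows scattered, the sword feels light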


Random Decision Tree Body Part Recognition Using FPGAs

Read 3012 times | 2013-7-13 09:00 | Personal category: Paper Reading | System category: Research Notes

Abstract

Decision trees are used in the Kinect vision pipeline to recognize body parts and gestures. This tree-based classification imposes a heavy computational load and a heavy demand on memory bandwidth, which makes a highly optimized hardware implementation very attractive, especially under strict power and form-factor limitations. We present a complete architecture that connects the Kinect depth sensor to an FPGA-based implementation of the pixel classification algorithm. Key performance parameters, algorithmic improvements, and design trade-offs are discussed.

As vision-based systems become more ubiquitous, a software implementation is sometimes infeasible on low-end embedded platforms because of their limited computational capability and tight power budgets. Hardware acceleration can significantly boost system efficiency in terms of both power consumption and bandwidth. Beyond power considerations, low-level control and inherent parallelism make the FPGA a good candidate for hardware acceleration.

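The paper makes three main contributions. The first is a Verilog implementation of the decision-tree pixel classification algorithm. The second is a set of novel algorithmic and architectural optimizations for processing randomized decision trees that account for these limitations. The third is the development and debugging of a vision algorithm on the Xilinx ML605 board.

The evaluation at each tree node asks two questions. Does the target pixel, located near the current pixel, belong to the same player object? (The position of the target pixel relative to the current pixel is defined by the tree node.) Is the difference in depth between the two pixels greater than a threshold? If so, move to the right child; otherwise move to the left child.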
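The node test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's Verilog: the names `next_child`, `depth`, and `player` are hypothetical, the two questions are assumed to be combined with a logical AND, and a target pixel that falls outside the image is assumed to fail the test.

```python
def next_child(depth, player, x, y, dx, dy, threshold):
    """Decide which child to visit for the pixel at (x, y).

    depth  : 2D list of depth values, indexed depth[row][col]
    player : 2D list of player labels with the same shape (hypothetical input)
    dx, dy : pixel offset stored in the tree node
    threshold : depth-difference threshold stored in the tree node
    """
    height, width = len(depth), len(depth[0])
    tx, ty = x + dx, y + dy

    # Assumption: an out-of-image target fails the test and goes left.
    if not (0 <= tx < width and 0 <= ty < height):
        return "left"

    same_player = player[ty][tx] == player[y][x]
    depth_gap = abs(depth[ty][tx] - depth[y][x])

    # Assumption: both questions must be answered "yes" to go right.
    return "right" if same_player and depth_gap > threshold else "left"
```

For example, with a target pixel 40 depth units away from the current pixel and a threshold of 20, the sketch moves to the right child as long as both pixels carry the same player label.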

Forest Fire Memory Optimization

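As described in Section 2, the number of steps required to process one frame is the product N*T*L, where N is the number of pixels to classify, T is the number of trees, and L is the number of levels in each tree (assuming all trees have identical depth). In the Kinect system, N ranges from 0 to 19,200, T equals 3, and L equals 20. Within these steps, the traversal for each pixel is independent of the others. In theory, therefore, the algorithm can be parallelized N*T ways.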
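To put numbers on the N*T*L product, here is a small Python calculation using the Kinect values quoted above. The function names are hypothetical and exist only for this illustration.

```python
def classification_steps(num_pixels, num_trees=3, levels=20):
    """Steps per frame: one node evaluation per pixel, per tree, per level."""
    return num_pixels * num_trees * levels


def max_parallel_traversals(num_pixels, num_trees=3):
    """Independent (pixel, tree) traversals that could run side by side."""
    return num_pixels * num_trees


# Worst case quoted in the text: N = 19,200 pixels, T = 3 trees, L = 20 levels.
print(classification_steps(19_200))     # 1152000
print(max_parallel_traversals(19_200))  # 57600
```

So a frame in which every pixel needs classification takes about 1.15 million node evaluations, spread over at most 57,600 independent traversals.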

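However, look at the evaluation function in Section 2. Two critical memory accesses must be made at every step, and they become the bottleneck for a realistic physical implementation because memory has finite bandwidth. The first access reads the current tree node from the database. This lookup retrieves two values: a pixel offset and a comparison threshold. The second access reads the original depth image at an address derived from the pixel offset. The input depth image is relatively small, about 38KB, so it can be stored inside the FPGA. The database for the Forest Fire algorithm, on the other hand, is relatively large, so it is stored in external memory such as DDR. Because the decision trees are randomized, there is little predictable pattern in either the offset or the threshold values, which means the prescribed tree node must be consulted for every decision. Because bandwidth to external memory is a very limited resource, the achievable system performance is severely bounded by lookups to the database. In addition, on a system-on-chip, pixel classification is only one component and shares access to the DDR with other computations. For these reasons, we take a close look at reducing the number of accesses to the external database memory.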
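The split between on-chip and external traffic can be made concrete with a short Python sketch. The 2 bytes per depth pixel is an assumption chosen because it is consistent with the "about 38KB" figure for a 19,200-pixel image; the names below are hypothetical.

```python
# Assumption: each depth sample occupies 2 bytes (not stated in the text).
BYTES_PER_DEPTH_PIXEL = 2


def depth_image_bytes(num_pixels):
    """Size of the depth image that is kept in on-chip FPGA memory."""
    return num_pixels * BYTES_PER_DEPTH_PIXEL


def frame_memory_accesses(num_pixels, num_trees, levels):
    """Each step makes one tree-node read (external) and one depth read (on-chip)."""
    steps = num_pixels * num_trees * levels
    return {"external_node_reads": steps, "on_chip_depth_reads": steps}


print(depth_image_bytes(19_200))             # 38400 bytes, roughly 38KB
print(frame_memory_accesses(19_200, 3, 20))  # 1,152,000 reads of each kind
```

Under these assumptions the depth image fits comfortably on chip, while every one of the tree-node reads has to go to external memory unless the traversal is reorganized.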

Traversal Order

The algorithm specifies that every tree in the forest must be traversed to collect the probability distribution, but each pixel can proceed independently of the others. The order of traversal can therefore change without affecting the result. In a depth-first traversal, a single pixel is processed completely from root to leaf before the next pixel is handled. This is very attractive for a lightweight implementation because very little state needs to be stored: with one in-flight pixel, we only need a single pointer to its current tree node. However, this algorithmic decision has an impact on the number of external memory accesses required.
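A minimal Python sketch of depth-first traversal shows how little state it needs. It rests on several assumptions that are not in the source: each tree is a complete binary tree stored in a list with the children of node i at 2*i+1 and 2*i+2, only the depth-threshold test is applied, and out-of-image offsets are clamped to the border. `Node` and `classify_depth_first` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Node:
    dx: int         # pixel offset, x component
    dy: int         # pixel offset, y component
    threshold: int  # depth-difference threshold


def classify_depth_first(depth, pixels, forest, levels):
    """Walk every pixel root-to-leaf through every tree, one pixel at a time.

    Returns (leaf index reached per pixel and tree, total tree-node reads).
    """
    height, width = len(depth), len(depth[0])
    node_reads = 0
    leaves = []
    for x, y in pixels:
        per_tree = []
        for tree in forest:
            node = 0  # the only traversal state: pointer to the current node
            for _ in range(levels):
                n = tree[node]
                node_reads += 1  # one database lookup per decision
                # Assumption: clamp the target pixel to the image border.
                tx = min(max(x + n.dx, 0), width - 1)
                ty = min(max(y + n.dy, 0), height - 1)
                go_right = abs(depth[ty][tx] - depth[y][x]) > n.threshold
                node = 2 * node + (2 if go_right else 1)
            per_tree.append(node)
        leaves.append(per_tree)
    return leaves, node_reads
```

Running it on 4 pixels, 3 trees, and 2 levels performs 4*3*2 = 24 node reads, matching the N*T*L count, and each of those reads may land anywhere in the tree database.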

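For example, a DDR module is organized into pages. Consecutive accesses within a page are faster, and therefore consume less bandwidth, than accesses that cross pages. Take the DDR3 DIMM on the Xilinx ML605 board, mated with Xilinx's Memory Interface Generator controller, that we used in our experiment. We found that fully pipelined in-page reads can be served at a rate of one read every other cycle. In contrast, each read request that crosses an 8KB page boundary produces a 10-cycle bubble in the pipeline. Because any given tree is large and spans many pages, depth-first traversal all but guarantees that successive reads fall in different pages. In this case, each depth-first traversal sends three read requests, one each for the level 0 node, the level 1 node, and the level 2 node. Processing all the pixels requires N*T*L read requests, which touch N*T*L pages.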
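The cost of page crossings can be estimated with a simple model built from the two measurements quoted above. This is a rough sketch, assuming the 10-cycle bubble is simply added on top of the steady two-cycles-per-read rate; the real controller behaviour is more complicated, and the names are hypothetical.

```python
IN_PAGE_CYCLES_PER_READ = 2    # "one read every other cycle"
PAGE_CROSS_BUBBLE_CYCLES = 10  # bubble per read that crosses an 8KB page


def read_cycles(num_reads, num_page_crossings):
    """Estimated cycles for a stream of pipelined DDR reads (additive model)."""
    return (num_reads * IN_PAGE_CYCLES_PER_READ
            + num_page_crossings * PAGE_CROSS_BUBBLE_CYCLES)


reads = 19_200 * 3 * 20  # N*T*L node reads for a full frame

# Depth-first worst case: every read lands in a different page.
print(read_cycles(reads, reads))  # 13824000

# Idealized lower bound: the same reads with no page crossings at all.
print(read_cycles(reads, 0))      # 2304000
```

Under this model the worst-case depth-first order costs six times as many cycles as a stream of purely in-page reads, which is the gap a page-friendly traversal order tries to close.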

Forest Fire Hardware Architecture

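Because sorted breadth-first tree traversal has a clear bandwidth advantage for hardware-based systems, we use this approach to implement the Forest Fire algorithm on the FPGA. Fig 6 shows a block diagram of how the hardware fits into the complete Kinect vision pipeline, interfacing the FPGA with the Kinect camera and the other components. The system starts with the Kinect sensor connected to a PC over USB. Besides handling the USB 2.0 protocol to receive raw depth images from the camera, the host machine also performs the first stage of the Kinect vision pipeline: image segmentation. This stage identifies the pixels that represent moving objects and separates them from the static background. The depth image, with these active pixels tagged, is sent to an input buffer on the FPGA. Once the input buffer has been filled, the FPGA runs the Forest Fire algorithm. This phase classifies the active pixels, producing probabilities for 31 distinct body parts. These probabilities are written to an output buffer on the FPGA, which is in turn sent back to the host PC. At this point, the remaining part of the pipeline resumes operation to perform skeleton tracking. This process connects the body parts identified by the classification phase, and the result is then either displayed or passed to the user application.
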
http://research.microsoft.com/pubs/170804/fpl12_CameraReady.pdf



https://blog.sciencenet.cn/blog-942948-707657.html
