|||
深度学习是指一种人工神经网络的学习。这种神经网络由多个非线性处理层连成一个级联结构。深度学习近来引起了工业界的广泛兴趣,如谷歌、微软、IBM、三星、百度等。我汇报一个称为生长认知网(Cresceptron)的深度学习网的关键机制——现在所熟知的最大汇集(max-pooling)——并向读者请教是不是HMAX网剽窃了生长认知网。在这篇报道中我并不声称这就是剽窃。
2014 年8月,《国际新闻界》期刊发布了一则消息, 称北京大学博士研究生于艳茹女士在此期刊的2013 年第7期发表了一篇论文。此论文剽窃了妮娜·吉尔波特在《十八世纪研究》期刊的1984 年第4期上发表了的另一篇论文。《国际新闻界》撤销了这篇剽窃论文, 并对作者作了惩罚。这则消息被广为报道,包括了BBC 中文网站。于艳茹是一个研究生, 但以下牵涉到一个资深研究员。
梅里厄姆-韦伯斯特在线词典为剽窃词条的定义为:“偷窃或冒充(其他人的思想或语句) 当作自己的;使用(其他人的成果) 而没有指出来源。”
1991 年之前,深度网被用于识别单个的两维手写数字上。那时的三维物体识别还是使用基于三维模型的方法——找出两维图像与一个手工建造了的三维物体模型之间的匹配。
翁巨扬等人假设人脑内没有任何整块的三维物体模型。他们于1992 年发表了生长认知网(Cresceptron)[9]。其目的是从自然的和混杂的两维图像中检测和识别学习过的三维物体并从这些两维图像中分割出识别了的物体。机器学习了的物体的实验例子[10],[11]包括了人脸、人体、步行道、车辆、狗、消火栓、交通标志牌、电话机、椅子、桌面计算机。自然和混杂的实验场景[10],[11]包括了电视节目场景、大学校园户外场景、室内办公室场景。生长认知网内的表示是由很多物体共享的分布式特征检测器的反映。
生长认知网是全发育性的, 即它通过经验来增量地生长和适应。它由一个级联的多个非线性处理模块组成。每个模块由几个层组成。每个模块的前层由一或二层被称为模板匹配层的处理层构成。每个模板匹配层进行卷积运算——每个卷积核从一个位置学了然后用到所有其它位置上去,这样这个特征可以被用到其它所有位置上去检测。所以, 卷积是为了层内的位移不变性。
但是, 一个主要的挑战是训练图像的数目是有限的。为了识别相似但生长认知网没有观察到过的图像,它必须宽恕物体图像的变形。
生长认知网有一个宽恕物体图像变形的关键机理是在每个模块里用2x2 到1 的方法减少结点,用一个取最大值的运算。这相当于在每个2x2 结点组里对4 个发放率做了一个逻辑或。在1993 年发表的生长认知网论文[9] 给出了执行最大汇集的层次化最大运算的数学表达式。
现在这被称为最大汇集。譬如, 查看于尔根·史密贺伯(JuergenSchmidhuber) 关于深度学习的一篇综述文章[5]。根据这篇综述文章,生长认知网是第一次用了最大汇集。“最大汇集广泛地应用在今天的深度前馈神经网络”[5]。 譬如, 图像网(ImageNet) LSVRC-2010 和ILSVRC-2012 竞赛的第一名使用了由先卷积后最大汇集的模块而组成的级联结构[2]。
1994 年10月19 日, 应托马索·泼吉奥教授的友善的邀请,翁巨扬在在麻省理工学院的生物和计算学习中心给了一个演讲。在麻省理工学院的一个研讨会会场内, 几乎座无虚席, 他作了题为“视觉学习的框架”的演讲, 介绍了生长认知网。翁巨扬说他很感激这次旅行, 其机票和膳宿是由接待方支付的。
翁巨扬对我解释说, 这个层次最大汇集结构至少有四个优点:(1) 层次地宽恕局部的位置扭曲, (2) 增加感受野的大小的同时不一定要增加卷积核的大小,因为大卷积核在计算上很昂贵, (3) 减少特征检测的密度来宽恕特征模板匹配的误差,(4) 允许局部漏失(譬如因遮挡而造成的部件的缺失), 由于4 个数的最大值与其它三个较小的值无关。
尽管如此, 最大汇集不保证深度卷积网的输出不随着物体在像素平面上的平移而变。这一点在生长认知网的全细节期刊论文[10]内有解释。与此同时, 深度级联结构还是根本性地弱——因为它没有任何机制来像人脑能做的那样为训练集和测试集自动地进行图形-背景分割。而更加新的发育网(DN) [8][12]有这样的机能,是通过增量和自主的发育途径实现的。
在翁巨扬的1994 年10 月19日在麻省理工学院的访问后大约五年后, 马克思米兰·里森贺伯和托马索·泼吉奥在《自然神经科学》发表了一篇论文[4]。这篇投稿1999 年6 月17日收到。它的摘要写道:“令人惊奇地, 量化模型几乎还没有... 我们叙述一个新的层次模型... 这个模型是基于类似最大的操作。”它的图2 的图解引用了福岛邦彦[1], 但全文没有为这个模型的关键性最大运算引用过生长认知网或它的最大汇集方法。
福岛邦彦[1]手选了特别层来降低位置精度,但是没有用最大汇集的两个关键机理:(1)最大化运算(看[1] 的等式(4)),和(2)在整个网络里用机算机自动地逐级降低位置精度。
后来托马索·泼吉奥把他们自己的模型称作HMAX[6]但还是没有引用生长认知网。
为了调查是不是思想剽窃,譬如,比较[10] 的124 页的左列显示公式, [11] 的公式(17), [4] 的1024 页左列的最后一行里的公式, 和[6] 的公式(3)。也比较[11] 的图10(c) 和[4] 的图2 中的虚线箭头。
由于引入一些关键系统结构的机制, 如最大汇集, 和大规模平行计算机越来越实用,如显卡平行计算, 深度学习网络在一些模式识别任务的很多测试中展示了持续增加的性能,日益吸引了工业界的兴趣, 如谷歌、微软、IBM、三星、百度等。
自然出版集团的关于剽窃的政策文件规定:“关于已经出版了的结果的讨论: 当讨论其他人的出版了的结果时, 作者必须恰当地描述这些先前结果的贡献。知识的贡献和技术开发两者都必须相应承认和妥当地引用。”
例如, 有一篇文章[7] 的一个段落改述了一个贡献而没有引用此贡献的出处被两个独立的委员会, 审查委员会和调查委员会,判定为剽窃[3].
为了此问题翁巨扬曾尊重地并私下地几次和托马索·泼吉奥教授联系但他没有回答。翁巨扬说:“希望你提起这个问题不会激怒托马索·泼吉奥教授。他是我尊敬的老师之一,因为他的早期文章在我1983 年至1988 年期间当研究生时向我介绍了处于早期的计算脑科学。”1997 年托马索·泼吉奥教授光荣地成为一名美国艺术和科学院院士。
(此文作者:Juan L. Castro-Garcia)
参考文献
[1] K. 福岛(Fukushima).“Neocognitron: 一个自组织的神经网络模型为了一个不受位置平移影响的模式识别的机能,”生物控制论,36,193-202,1980.
[2] A. 科里兹夫斯基(Krizhevsky),I. 苏兹凯夫(Sutskever), and G.辛顿(Hinton).“用深度卷积网络归类图像网,”在神经信息处理系统的进展25,1106–1114, 2012 年.
[3] Z. 麦克米林(McMillin).“密西根州立大学一个教授承认在2008年的一篇文章内剽窃,”州消息报, 2010 年4 月6日.
[4] M. 里森贺伯(Riesenhuber),T.泼吉奥(Poogio). “脑皮层内物体识别的层次模型,”自然神经科学, 2(11):1019–1025, 1999.
[5] J. 史密贺伯(Schmidhuber).“在神经网络里的深度学习: 一个综述,”技术报告IDSIA-03-14, 瑞士人工智能实验室IDSIA, 瑞士, 马诺-路伽诺(Manno-Lugano),2014 年10 月8 日.
[6] T. 希瑞(Serre),L. 沃尔夫(Wolf),S.拜尔斯基(Bileschi),M. 瑞森哈勃(Riesenhuber),T. 泼吉奥(Poggio). “似皮层机制的鲁棒的对象识别,”IEEE 模式分析与机器智能学报,29(3),411-426 2007.
[7] M. B. 思狄克棱(Sticklen). “撤回: 生物燃料生产的植物基因工程: 面向負擔得起的纤维素乙醇,”自然综述基因学, 11(308), 2008.
[8] J. 翁(Weng).自然和人工智能: 计算脑心智导论, BMI 出版社, 密西根, 欧科模斯, 2012.
[9] J. 翁(Weng)N. 阿乎嘉(Ahuja), T. S. 黄(Juang).“Cresceptron: 一个自组织的神经网络适应性地生长,” 国际联合神经网络会议录(IJCNN), 美国, 马里兰州, 巴尔的摩市, 第1卷(576-581),1992 年6 月.
[10] J. 翁(Weng)N.阿乎嘉(Ahuja), T. S. 黄(Juang). “学习从两维图像识别和分割三维物体,”IEEE 第4 届国际计算机视觉会议录(ICCV)”121-128, 1993 年5 月.
[11] J. 翁(Weng)N.阿乎嘉(Ahuja), T. S. 黄(Juang). “用生长认知网学习识别和分割,”国际计算机视觉期刊(IJCV),25(2),109-143,1997 年11 月.
[12] J. 翁(Weng),M. D. 卢契(Luciw), “脑启发的概念网: 从混杂的场景中学习概念,”IEEE 智能系统杂志,29(6), 14-22, 2014 年.
Deep Learning is Hot: Max-Pooling Plagiarism?
By Juan L. Castro-Garcia
Deep learning is a term that describes learning by an artificial neural network that consists of acascade of nonlinear processing layers. Deep learning networks have recently attracted great interest from industries, such as Google, Microsoft, IBM,Samsung, and Baidu. I report a key architecture mechanism of deep learning network Cresceptron — well-known now as max-pooling — and ask the readerwhether HMAX plagiarized Cresceptron. In this report I do not claim that this is a plagiarism.
August 2014, the Chinese Journal of Journalism & Communication, announced that Ms. Yu,Yanru, a PhD student at Peking University, published an article in the journal,issue 7, 2013, that plagiarized from another article by Nina R. Gelbertpublished in the Eighteen-Century Studies journal, issue 4, 1984. The plagiarizing article was withdrawn from the journal and the author was disciplined by the journal. This announcement was widely reported, including BBC China online. Ms. Yu, Yanru was agraduate student, but the following involves a senior researcher.
The word “plagiarize”was defined in the Merriam-Webster online dictionary: “to steal and pass off(the ideas or words of another) as one’s own; use (another’s production) withoutcrediting the source.”
Until 1991, deep neuralnetworks were used for recognizing isolated two-dimensional (2-D) hand-writtendigits. Three dimensional (3-D) object recognition until then used 3-D model-based approaches— matching 2-D images with a handcrafted 3-D object model.
Juyang Weng et al. assumed that inside a human brain a monolithic 3-D object model does not exist, although one may subjectively feel otherwise. They published Cresceptron in 1992 [9] fordetecting and recognizing learned 3-D objects from natural and cluttered 2-D images and for segmenting the recognized objects from the 2-D images. Experimental examples of the learned objects [10], [11] included human faces,human bodies, walkways, cars, dogs, fire hydrants, traffic signs, telephones, chairs, and desktop computers. Experimental examples of the natural andcluttered scenes [10], [11] included TV program scenes, university campus outdoors, and indoor offices. Representations in Cresceptron are responses of distributed feature detectors that share among many objects.
A Cresceptron is fully developmental in the sense that it incrementally grows and adapts through experience. It consists of a cascade of nonlinear processing modules where each module consists of a number of layers. Early layers in each module consist ofone or two pattern matching layers where each layer performs convolution — each convolution kernel learned at one image location is applied to all otherlocations so that the same feature can be used to detect at all other locations. Therefore, the convolution is for within-layer shift-invariance.
However, a key challenge is that the number of training samples is limited. In order to recognize similar object views that Cresceptron has not observed, it must tolerate deformation in object views.
The key mechanism in Cresceptron to tolerate deformation is the (2x2) to 1 reduction of nodes in every module using a maximization operation, to implement a Logic-OR for the firing rates of each group of (2x2) neurons. The 1993 publication of Cresceptron [10] gave the mathematical expression forhierarchical max operations in the max-pooling.
This is now commonly called max-pooling, see, e.g., a deeplearning review by Juergen Schmidhuber [5]. According to the review, Cresceptronwas the first to use max-pooling. “Max-pooling is widely used in today’s deep feedforward neural networks” [5]. For example, the winner of ImageNet LSVRC-2010 and ILSVRC-2012 contests used an architecture of a cascade ofmodules in which convolution layer(s) are followed by a max-pooling layer [2].
Kindly invited by Prof. Tomaso Poggio, Weng gave a talk atthe Center for Biological and Computational Learning, Massachusetts Instituteof Technology, Cambridge, Massachusetts (MIT), Oct. 19, 1994. In a seminar roomat MIT that was an almost full audience, he presented Cresceptron under thetitle “Frameworks for Visual Learning.” Weng said that he greatly appreciatedthe visit with the host paying for the air ticket and accommodations.
Weng explained to me that the hierarchical max-pooling hasat least four advantages: (1) hierarchical tolerance of local location deformation, (2) increasing the size of receptive fields without necessarily increasing the size of the convolution kernels because large convolution kernels are computationally veryexpensive, (3) reduction of feature detection density to tolerate feature-template matching errors, and (4) permit local dropouts (absence ofcomponents due to, e.g., occlusions) because the maximum of the four values is independent with the three smaller values.
However, hierarchical max-pooling does not guarantee that theoutput of the deep convolutional networks is invariant to object shifts in the pixel plane, as explained in the fully detailed 1997 journal publication of Cresceptron [11]. Furthermore, the deep cascade architecture is still fundamentally weak — regardless the size of training set and the power of computers— because it does not have any mechanism to do, like what a brain can,figure-ground automatic segmentation on training sets and testing sets. Thenewer Developmental Network (DN) architecture has such a mechanism [8], [12] through autonomous and incremental development.
About five years after Weng’s MIT visit Oct. 19, 1994,Maximilian Riesenhuber and Tomaso Poggio published a paper [4] in NatureNeuroscience that was received June 17, 1999. Its abstract reads “Surprisingly,little quantitative modeling has been done ... We describe a new hierarchicalmodel ... The model is based on a MAX-like operation ... ” Its Fig. 2 captioncited Kunihiko Fukushima [1] but the entire paper did not cite Cresceptron or its max-pooling method for the key max operation in their model.
Fukushima handpicked particular layers to reduce thelocation precision, but he did not use the two major mechanisms of max-pooling:(1) maximization operation (see Eq. (4) in [1]) and (2) computer automatic reduction of the location resolution through every level of the network.
Later, Tomaso Poggio called their model HMAX [6] but still didnot cite Cresceptron.
To investigate whether idea plagiarism took place, forexample, compare the left-column display equation on page 124 of [10], Eq. (17)of [11], the last equation in the last line of the left column on page 1024 of [4], and Eq. (3) of [6].Also compare Fig. 10(c) of [11] and the dashed arrows in Fig. 2 of [4].
Due to the introduction of some key architecture mechanismslike max-pooling and the practicality of massively parallel computers such as GPUs, deep learning networks have shown increasing performance in many tests for some pattern recognition tasks and have attracted increasing interest from industries, suchas Google, Microsoft, IBM, Samsung, and Baidu.
The Nature Publishing Group’s policy document on plagiarism reads:“Discussion of published work: When discussing the published work of others,authors must properly describe the contribution of the earlier work. Both intellectual contributions and technical developments must be acknowledged assuch and appropriately cited.”
For example, a paragraph within a paper [7] that paraphraseda contribution without attribution to the contribution source was found by two independent committees, inquiry and investigative, to be a plagiarism [3].
Respectfully and privately, Weng contacted Prof. Poggio a few times with regard to this issue but he did not reply. Weng said: “I wish that your raising this issue does not upset Prof. Tommy Poggio. He is one of my respected teachers because his early papers introduced me to computational neuroscience at its early stage when I was a graduate student 1983-1988.” 1997 Prof. Poggio was elected as a fellow of the American Academy of Arts and Sciences (AAAS).
REFERENCES
[1] K. Fukushima. Neocognitron: A self-organizing neuralnetwork model for a mechanism of pattern recognition unaffected by shift inposition. Biological Cybernetics, 36:193–202, 1980.
[2] A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in NeuralInformation Processing Systems 25, pages 1106–1114, 2012.
[3] Z. McMillin. MSU professor admits to plagiarism in 2008 article. The State News, April 6,2010.
[4] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019–1025, 1999.
[5] J. Schmidhuber. Deep learning in neural networks: Anoverview. Technical Report IDSIA-03-14, The Swiss AI Lab IDSIA, Manno-Lugano,Switzerland, October 8 2014.
[6] T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T.Poggio. Robust object recognition with cortex-like mechanisms. IEEE Trans. Pattern Analysis and Machine Intelligence, 29(3):411–426, 2007.
[7] J. Weng. Natural and Artificial Intelligence: Introduction to Computational Brain-Mind. BMI Press, Okemos, Michigan, 2012.
[8] J. Weng, N. Ahuja, and T. S. Huang. Cresceptron: A self-organizing neural network which grows adaptively. In Proc. Int’l Joint Conference on Neural Networks, volume 1, pages 576–581, Baltimore, Maryland,June 1992.
[9] J. Weng, N. Ahuja, and T. S. Huang. Learning recognitionand segmentation of 3-D objects from 2-D images. In Proc. IEEE 4th Int’l Conf.Computer Vision, pages 121–128, May 1993.
[10] J. Weng, N. Ahuja, and T. S. Huang. Learning recognition and segmentation using the Cresceptron. International Journal of Computer Vision, 25(2):109–143, Nov. 1997.
[11] J. Weng and M. D. Luciw. Brain-inspired conceptnetworks: Learning concepts from cluttered scenes. IEEE Intelligent Systems Magazine, 29(6):14–22, 2014.
Archiver|手机版|科学网 ( 京ICP备07017567号-12 )
GMT+8, 2025-1-6 03:20
Powered by ScienceNet.cn
Copyright © 2007- 中国科学报社