大工至善|大学至真分享 http://blog.sciencenet.cn/u/lcj2212916

Blog post

[Repost] [Computer Science] [2017] A Novel Mathematical Framework for the Analysis of Neural Networks

Viewed 1743 times · 2019-1-19 11:16 | Category: Research Notes | Source: Repost


This is a 102-page master's thesis from the University of Waterloo, Canada (author: Anthony L. Caterini).

 

Over the past decade, Deep Neural Networks (DNNs) have become very popular models for processing large amounts of data, thanks to their successful application across a wide variety of fields. These models are layered, typically containing a parametrized linear transformation followed by a non-linear transformation at each layer of the network. However, we do not fully understand why DNNs are so effective. In this thesis, we explore one way to approach this question: we develop a generic mathematical framework for representing neural networks, and demonstrate how this framework can be used to represent specific neural network architectures.
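The layered structure described above can be sketched in a few lines. This is a minimal illustration, not code from the thesis; the layer sizes and the ReLU non-linearity are arbitrary choices made for the example:

```python
import numpy as np

def relu(x):
    # Elementwise non-linearity applied after each linear map
    return np.maximum(x, 0.0)

def layer(W, b, x):
    # One layer: a parametrized linear transformation, then a non-linearity
    return relu(W @ x + b)

rng = np.random.default_rng(0)
# A three-layer network with (hypothetical) dimensions 4 -> 8 -> 8 -> 2
params = [(rng.standard_normal((8, 4)), np.zeros(8)),
          (rng.standard_normal((8, 8)), np.zeros(8)),
          (rng.standard_normal((2, 8)), np.zeros(2))]

def network(params, x):
    # The DNN is the composition of its layers
    for W, b in params:
        x = layer(W, b, x)
    return x

y = network(params, rng.standard_normal(4))
print(y.shape)  # (2,)
```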

 

In Chapter 1, we begin by surveying mathematical contributions to neural networks. Some properties of DNNs can be explained rigorously, but these results fail to fully describe the mechanics of a generic neural network. We also note that most approaches to describing neural networks break the parameters and inputs down into scalars, rather than referencing their underlying vector spaces, which adds awkwardness to the analysis. Our framework operates strictly over these spaces, affording a more natural description of DNNs once the mathematical objects we use are well defined and understood.

 

We then develop the generic framework in Chapter 3. We describe an algorithm for computing each step of gradient descent directly over the inner product space in which the parameters are defined. Moreover, the error backpropagation step can be expressed in a concise, compact form. Beyond the standard squared loss and cross-entropy loss, we also show that our framework, including the gradient calculation, extends to more complex loss functions involving the first derivative of the network.
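The idea of a gradient descent step with backpropagation expressed in compact vector form can be sketched as follows. This is a generic two-layer illustration under the standard squared loss, not the thesis's own derivation; the dimensions, tanh non-linearity, and learning rate are assumptions of the example:

```python
import numpy as np

def forward(W1, b1, W2, b2, x):
    a1 = np.tanh(W1 @ x + b1)   # first layer: linear map, then non-linearity
    yhat = W2 @ a1 + b2         # second layer: linear output
    return a1, yhat

def loss(W1, b1, W2, b2, x, y):
    # Standard squared loss, one of the losses the framework treats
    _, yhat = forward(W1, b1, W2, b2, x)
    return 0.5 * np.sum((yhat - y) ** 2)

def gradient_step(W1, b1, W2, b2, x, y, lr=0.05):
    a1, yhat = forward(W1, b1, W2, b2, x)
    # Backpropagation in compact vector form: each delta lives in the
    # same space as the layer output it corresponds to
    delta2 = yhat - y                         # output-layer error
    delta1 = (W2.T @ delta2) * (1 - a1 ** 2)  # pulled back through tanh
    # Each gradient lives in the same inner product space as its parameter
    return (W1 - lr * np.outer(delta1, x), b1 - lr * delta1,
            W2 - lr * np.outer(delta2, a1), b2 - lr * delta2)

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((5, 3)), np.zeros(5)
W2, b2 = rng.standard_normal((2, 5)), np.zeros(2)
x, y = rng.standard_normal(3), np.array([1.0, -1.0])
before = loss(W1, b1, W2, b2, x, y)
after = loss(*gradient_step(W1, b1, W2, b2, x, y), x, y)
```

A single step with the exact gradient and a small learning rate reduces the loss, which is what the `before`/`after` values show.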

 

Having developed the generic framework, we apply it to three concrete neural network examples in Chapter 4. We start with the Multilayer Perceptron (MLP), the simplest type of DNN, and show how to generate a gradient descent step for it. We then represent the Convolutional Neural Network (CNN), which involves more complicated input spaces, parameter spaces, and per-layer transformations; the CNN nonetheless still fits within the generic framework. The last structure we consider is the Deep Auto-Encoder (DAE), whose parameters are not completely independent across layers. We are able to extend the generic framework to handle this case as well.
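The DAE's non-independent parameters can be illustrated with the common tied-weights construction, where the decoder reuses the transpose of the encoder's weight matrix, so the chain rule produces two contributions to one shared gradient. This is a hypothetical one-layer sketch, not the thesis's formulation:

```python
import numpy as np

def dae_loss(W, x):
    # Auto-encoder with tied weights: the decoder reuses W.T, so the
    # encoder and decoder parameters are not independent
    h = np.tanh(W @ x)   # encode
    xhat = W.T @ h       # decode with the transposed weights
    return 0.5 * np.sum((xhat - x) ** 2)

def dae_grad(W, x):
    h = np.tanh(W @ x)
    r = W.T @ h - x                              # reconstruction error
    # W appears twice in the loss, so its gradient has two terms
    g_dec = np.outer(h, r)                       # from the decoder W.T
    g_enc = np.outer((W @ r) * (1 - h ** 2), x)  # from the encoder W
    return g_dec + g_enc

rng = np.random.default_rng(2)
W = 0.1 * rng.standard_normal((3, 4))
x = rng.standard_normal(4)
g = dae_grad(W, x)
# Finite-difference check of one entry of the tied-weight gradient
eps = 1e-6
Wp = W.copy()
Wp[0, 0] += eps
numeric = (dae_loss(Wp, x) - dae_loss(W, x)) / eps
```

Dropping either term of `dae_grad` would give the wrong gradient, which is exactly the dependence-between-layers issue the framework is extended to handle.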

 

In Chapter 5, we use some of the results from the previous chapters to develop a framework for Recurrent Neural Networks (RNNs), the sequence-parsing DNN architecture. The parameters are shared across all layers of the network, so some additional machinery is required to describe RNNs. We first describe a generic RNN, and then the specific case of the vanilla RNN. We again compute gradients directly over inner product spaces.
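Parameter sharing across all layers is what distinguishes the RNN case: in backpropagation through time, every step contributes to the gradient of the same shared matrix. The sketch below illustrates this for a vanilla RNN with a loss depending only on the final hidden state; the dimensions, tanh non-linearity, and the linear loss are assumptions of the example, not the thesis's notation:

```python
import numpy as np

def rnn_forward(Wx, Wh, b, xs, h0):
    # Vanilla RNN: the SAME parameters (Wx, Wh, b) are applied at every step
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(Wx @ x + Wh @ hs[-1] + b))
    return hs

def grad_Wh(Wx, Wh, b, xs, h0, v):
    # Gradient of L = <v, h_T> w.r.t. the shared Wh via backpropagation
    # through time: per-step contributions are SUMMED because the
    # parameter is shared across all layers
    hs = rnn_forward(Wx, Wh, b, xs, h0)
    g = np.zeros_like(Wh)
    delta = v                            # dL/dh_T for the linear loss
    for t in range(len(xs), 0, -1):
        dz = delta * (1 - hs[t] ** 2)    # back through tanh at step t
        g += np.outer(dz, hs[t - 1])     # step t's contribution to shared Wh
        delta = Wh.T @ dz                # gradient w.r.t. the previous state
    return g

rng = np.random.default_rng(3)
n = 4
Wx = 0.3 * rng.standard_normal((n, n))
Wh = 0.3 * rng.standard_normal((n, n))
b, h0 = np.zeros(n), np.zeros(n)
xs = [rng.standard_normal(n) for _ in range(5)]
v = rng.standard_normal(n)
g = grad_Wh(Wx, Wh, b, xs, h0, v)
# Finite-difference check of one entry of the shared gradient
eps = 1e-6
Whp = Wh.copy()
Whp[1, 2] += eps
L = lambda W_: v @ rnn_forward(Wx, W_, b, xs, h0)[-1]
numeric = (L(Whp) - L(Wh)) / eps
```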

 


 

Introduction and Motivation

Mathematical Preliminaries

A Generic Representation of Neural Networks

Specific Network Descriptions

Recurrent Neural Networks (RNNs)

Conclusion and Future Work


Download link for the original English thesis:

http://page2.dfpan.com/fs/2lcj42216291267b292/ 





https://blog.sciencenet.cn/blog-69686-1157922.html
