Shared from the blog 大工至善|大学至真: http://blog.sciencenet.cn/u/lcj2212916


[Repost] [Computer Science] [2012.12] Training Deep Neural Networks for Bottleneck Feature Extraction

Viewed 1872 times · 2019-5-9 18:39 | Category: Research Notes | Source: repost


This is the master's thesis (52 pages) of Jonas Gehring at the Karlsruhe Institute of Technology (KIT), Germany.

 

In automatic speech recognition systems, preprocessing the audio signal to generate features is an important part of achieving a good recognition rate. Existing work has shown that artificial neural networks can be used to extract features that yield better recognition performance than manually designed feature extraction algorithms. One possible approach is to train a network with a small bottleneck layer and then use the activations of the units in that layer to generate feature vectors for the rest of the system.
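As a rough illustration of this idea (the layer sizes, the tanh/softmax activations and the use of NumPy below are my own assumptions for the sketch, not details from the thesis), the following code builds a small feed-forward network with a narrow bottleneck layer and reads out the activations of that layer as the feature vector:

import numpy as np

rng = np.random.default_rng(0)

class BottleneckMLP:
    def __init__(self, sizes=(360, 1000, 42, 1000, 140)):
        # input dim, hidden, bottleneck, hidden, output (e.g. phonetic states)
        self.weights = [rng.normal(0.0, 0.01, (m, n)) for m, n in zip(sizes, sizes[1:])]
        self.biases = [np.zeros(n) for n in sizes[1:]]
        self.bottleneck_index = 1  # position of the narrow (42-unit) layer

    def forward(self, x):
        activations = []
        h = x
        for i, (W, b) in enumerate(zip(self.weights, self.biases)):
            z = h @ W + b
            if i == len(self.weights) - 1:
                e = np.exp(z - z.max())          # classification layer: softmax
                h = e / e.sum()
            else:
                h = np.tanh(z)                   # hidden layers: tanh
            activations.append(h)
        return activations

    def extract_features(self, x):
        # After training, only the bottleneck activations are kept and passed
        # on as the feature vector for the rest of the recognition system.
        return self.forward(x)[self.bottleneck_index]

net = BottleneckMLP()
frame = rng.normal(size=360)                     # stand-in for a window of stacked acoustic frames
print(net.extract_features(frame).shape)         # -> (42,)

In a real system the weights would of course be learned first; the point is simply that the narrow layer forces the network into a compact representation whose activations can replace hand-crafted features.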

 

Deep learning is a field of machine learning concerned with efficient training algorithms for neural networks with many hidden layers and with the automatic discovery of relevant features from data. Although deep learning is usually applied in computer vision, several recent publications have demonstrated that deep networks can also achieve excellent performance on speech recognition tasks.

 

This thesis proposes a new method for extracting bottleneck features from deep neural networks. First, a stack of denoising auto-encoders is trained layer by layer in an unsupervised manner. The stack is then converted into a feed-forward neural network, to which a bottleneck layer, an additional hidden layer and a classification layer are added. Finally, the entire network is fine-tuned to estimate phonetic target states so that discriminative features are generated in the bottleneck layer. Multiple experiments on Cantonese conversational telephone speech show that this architecture can effectively exploit the increased capacity of deep neural networks to generate more useful features and thus achieve better recognition performance. The experiments confirm that this ability depends heavily on initializing the auto-encoder stack through pre-training. Compared with features derived from cepstral coefficients, extracting features from log mel-scale filterbank coefficients yields additional gains. Furthermore, small improvements can be obtained by pre-training the auto-encoders on more data, an interesting property for settings in which only a small amount of transcribed data is available. Evaluations on larger datasets lead to significant reductions in recognition error rate (8% to 10% relative) over baseline systems using standard features, demonstrating the general applicability of the proposed architecture.
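The sketch below mirrors the training scheme summarized above; PyTorch, the layer sizes, the sigmoid activations, the masking-noise level and all other hyperparameters are my own illustrative assumptions, not values from the thesis. It pre-trains a stack of denoising auto-encoders layer by layer, unrolls the encoders into a feed-forward network, appends a bottleneck layer, an extra hidden layer and a classification layer, fine-tunes the whole network on phonetic target states, and finally reads features out of the bottleneck layer:

import torch
import torch.nn as nn

def pretrain_dae_stack(data, layer_sizes, noise=0.2, epochs=10, lr=0.01):
    # Greedy layer-wise, unsupervised pre-training of denoising auto-encoders:
    # each layer learns to reconstruct its clean input from a corrupted copy.
    encoders, inputs, in_dim = [], data, data.shape[1]
    for out_dim in layer_sizes:
        enc = nn.Sequential(nn.Linear(in_dim, out_dim), nn.Sigmoid())
        dec = nn.Linear(out_dim, in_dim)
        opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()), lr=lr)
        for _ in range(epochs):
            corrupted = inputs * (torch.rand_like(inputs) > noise)   # masking noise
            loss = nn.functional.mse_loss(dec(enc(corrupted)), inputs)
            opt.zero_grad(); loss.backward(); opt.step()
        encoders.append(enc)
        inputs, in_dim = enc(inputs).detach(), out_dim   # feed encodings to the next layer
    return encoders

def build_bottleneck_network(encoders, bottleneck_dim, hidden_dim, num_states):
    # Unroll the pre-trained encoder stack and append bottleneck, hidden and
    # classification layers, as described in the abstract.
    last_dim = encoders[-1][0].out_features
    return nn.Sequential(
        *encoders,
        nn.Linear(last_dim, bottleneck_dim), nn.Sigmoid(),    # bottleneck layer
        nn.Linear(bottleneck_dim, hidden_dim), nn.Sigmoid(),  # additional hidden layer
        nn.Linear(hidden_dim, num_states),                    # phonetic state classifier
    )

# Random data stands in for windows of acoustic features and their target states.
frames = torch.rand(256, 360)
states = torch.randint(0, 140, (256,))

stack = pretrain_dae_stack(frames, layer_sizes=[1000, 1000])
net = build_bottleneck_network(stack, bottleneck_dim=42, hidden_dim=1000, num_states=140)

# Supervised fine-tuning of the whole network on the phonetic targets.
opt = torch.optim.SGD(net.parameters(), lr=0.01)
for _ in range(5):
    loss = nn.functional.cross_entropy(net(frames), states)
    opt.zero_grad(); loss.backward(); opt.step()

# Bottleneck features: everything up to and including the bottleneck activation.
with torch.no_grad():
    bottleneck = nn.Sequential(*list(net.children())[:-3])
    features = bottleneck(frames)    # shape (256, 42)

According to the abstract, the unsupervised pre-training step is essential: without initializing the stack this way, the extra capacity of the deeper network does not translate into better bottleneck features.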

 

Original English abstract:

In automatic speech recognition systems, preprocessing the audio signal to generate features is an important part of achieving a good recognition rate. Previous works have shown that artificial neural networks can be used to extract good, discriminative features that yield better recognition performance than manually engineered feature extraction algorithms. One possible approach for this is to train a network with a small bottleneck layer, and then use the activations of the units in this layer to produce feature vectors for the remaining parts of the system. Deep learning is a field of machine learning that deals with efficient training algorithms for neural networks with many hidden layers, and with automatic discovery of relevant features from data. While most frequently used in computer vision, multiple recent works have demonstrated the ability of deep networks to achieve superior performance on speech recognition tasks as well. In this work, a novel approach for extracting bottleneck features from deep neural networks is proposed. A stack of denoising auto-encoders is first trained in a layer-wise and unsupervised manner. Afterwards, the stack is transformed to a feed-forward neural network and a bottleneck layer, an additional hidden layer and the classification layer are added. The whole network is then fine-tuned to estimate phonetic target states in order to generate discriminative features in the bottleneck layer. Multiple experiments on conversational telephone speech in Cantonese show that the proposed architecture can effectively leverage the increased capacity introduced by deep neural networks by generating more useful features that result in better recognition performance. Experiments confirm that this ability heavily depends on initializing the stack of autoencoders with pre-training. Extracting features from log mel scale filterbank coefficients results in additional gains when compared to features from cepstral coefficients. Further, small improvements can be achieved by pre-training auto-encoders with more data, which is an interesting property for settings where only little transcribed data is available. Evaluations on larger datasets result in significant reductions of recognition error rates (8% to 10% relative) over baseline systems using standard features, and therefore demonstrate the general applicability of the proposed architecture.

 

Introduction

Background

Bottleneck Features from Deep Neural Networks

Experiments

Conclusion

Appendix: Training Algorithms

Appendix: Neural Network Training on Graphics Processing Units


Download link for the original English text:

http://page4.dfpan.com/fs/7lc4j232152931679f1/ 





https://blog.sciencenet.cn/blog-69686-1178076.html
