Shared from the blog 大工至善|大学至真: http://blog.sciencenet.cn/u/lcj2212916


[Repost] [Information Technology] [2011.10] Signal Processing for Excitation-Based Analysis of Acoustic Events in Speech

Posted 2019-2-6 11:29 | Category: Research Notes | Source: Repost


This is a doctoral thesis from the Indian Institute of Technology (author: Dhananjaya N.), 227 pages in total.

 

Traditionally, the information in a speech signal is represented as a sequence of feature vectors, each derived from a segment of about 20-30 ms with a shift of 5-10 ms. Typically, a feature vector represents the short-time spectral envelope information, which corresponds to the shape of the vocal tract system. In most cases the feature vector consists of mel-frequency cepstral coefficients (MFCCs), weighted linear prediction cepstral coefficients (wLPCCs), or some variants of these parameters. Many speech systems, such as speech and speaker recognition systems, are developed using these representations of speech as sequences of feature vectors belonging to a common feature space. However, knowledge of the speech production mechanism, especially the excitation source information, is not adequately utilized or captured in these representations.
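As a concrete illustration of this conventional front end (not part of the thesis; the file name, the 25 ms frame length, the 10 ms shift and the use of the librosa library are illustrative choices), a minimal sketch of frame-wise MFCC extraction might look like this:

import librosa

def mfcc_features(wav_path, n_mfcc=13, frame_ms=25, shift_ms=10):
    # Load the waveform at its native sampling rate.
    y, sr = librosa.load(wav_path, sr=None)
    n_fft = int(sr * frame_ms / 1000)   # ~25 ms analysis window
    hop = int(sr * shift_ms / 1000)     # ~10 ms frame shift
    # Each column is one short-time spectral-envelope feature vector.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop)

# feats = mfcc_features("speech.wav")   # shape: (13, number_of_frames)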

 

It is well known that speech is produced and perceived as a sequence of acoustic events. Information about these events can also be derived from the representation as a sequence of feature vectors in a common feature space. However, many of the features that make up an acoustic event depend on the distinct production characteristics of each event. These characteristics may require emphasis on different aspects of production, such as the nature of the excitation source and the coupling/decoupling of different parts of the vocal tract system. Representing these events may require different sets of features, and hence it may not be possible to represent them in a common feature space. In other words, the speech signal may carry a significant amount of important information that cannot be represented by the spectral envelope and gross excitation information (such as voiced/nonvoiced) alone. The performance of speech systems may be limited by the absence of this additional information in the feature vector representation.

 

The objective of this study is to identify some significant acoustic events and the features needed to represent them. To extract these features, new signal processing tools are needed in addition to spectral analysis tools. In particular, the information in the excitation source of the vocal tract system is important for describing many useful events. Some events may require a temporal or spectral resolution that cannot be achieved with standard discrete Fourier transform (DFT) based spectrum analysis. Interestingly, the spectral features sometimes need to be determined only around the instants of significant excitation, rather than over an arbitrarily chosen interval around an arbitrarily chosen instant.
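One of the phase-based tools named in the next paragraph is group delay processing, which works on the phase rather than the magnitude of the DFT. As a hedged sketch (not the thesis implementation; the function name, FFT size and floor value are illustrative), the group delay spectrum of a short frame can be computed from the standard identity tau(w) = (X_R*Y_R + X_I*Y_I) / |X|^2, where X is the DFT of x[n] and Y is the DFT of n*x[n]:

import numpy as np

def group_delay_spectrum(frame, n_fft=512, eps=1e-10):
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)        # spectrum of x[n]
    Y = np.fft.rfft(n * frame, n_fft)    # spectrum of n*x[n]
    denom = np.abs(X) ** 2 + eps         # small floor guards against spectral nulls
    # Negative derivative of the unwrapped phase spectrum, in samples.
    return (X.real * Y.real + X.imag * Y.imag) / denom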

 

In this study, a few steady acoustic events are chosen to examine the need for different ways of representing the information in different events. Events occurring in short bursts, such as stops, are not considered. Signal processing methods such as zero-frequency filtering, zero-time liftering and group delay processing are proposed to extract information that cannot be obtained with conventional short-time spectral analysis tools. The acoustic events considered for detailed analysis are voiced/nonvoiced regions, voice bars, trills, nasals and fricatives. Both excitation source and vocal tract system features are needed to describe these events adequately, and the use of these features for spotting acoustic events in continuous speech is demonstrated.
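Of these methods, zero-frequency filtering is the one aimed most directly at the excitation source. The sketch below follows the commonly published recipe rather than the thesis code, and the trend-removal window of about 15 ms and the three mean-removal passes are illustrative choices; positive zero crossings of the output are typically used to locate the instants of significant excitation (epochs).

import numpy as np
from scipy.signal import lfilter

def zero_frequency_filtered_signal(s, sr, trend_window_ms=15):
    # Difference the signal to remove any slowly varying DC component.
    x = np.diff(s, prepend=s[0])
    # Cascade of two zero-frequency resonators (double integrators):
    # y[n] = x[n] + 2*y[n-1] - y[n-2]
    y = lfilter([1.0], [1.0, -2.0, 1.0], x)
    y = lfilter([1.0], [1.0, -2.0, 1.0], y)
    # Remove the resulting polynomial trend by repeatedly subtracting a local
    # mean computed over roughly one to two average pitch periods.
    win = max(3, int(sr * trend_window_ms / 1000))
    kernel = np.ones(win) / win
    for _ in range(3):
        y = y - np.convolve(y, kernel, mode="same")
    return y   # zero-frequency filtered signal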

 

The importance of this additional information is demonstrated in the context of developing a phone recognizer. The information obtained from the excitation-based analysis of acoustic events is used to refine a baseline phone recognizer at different levels, namely the feature, constraint and decision levels. It is shown that appending this additional event information to the existing feature vectors can improve the performance of the phone recognizer. This is illustrated by first appending known event labels to the feature vectors, and then using the event information extracted by the proposed signal processing methods to study the improvement in performance. The limited acoustic-phonetic segmentation achieved in this thesis by detecting acoustic events in speech is shown to improve the performance of the baseline system in detecting the manner of articulation of a phone.
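At the feature level, the refinement described above amounts to concatenating per-frame event evidence with the conventional spectral vectors before training the recognizer. A minimal sketch is given below (illustrative only; the array names and shapes are assumptions, not the thesis recipe):

import numpy as np

def append_event_features(spectral_feats, event_feats):
    # spectral_feats: (n_frames, n_ceps) MFCC-like vectors
    # event_feats:    (n_frames, n_events) event labels or scores for the same frames
    assert spectral_feats.shape[0] == event_feats.shape[0], "frame counts must match"
    return np.hstack([spectral_feats, event_feats])   # (n_frames, n_ceps + n_events)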

 

The original English abstract follows:

Information in speech signal is traditionally represented as a sequence of feature vectors, each vector derived from a segment of about 20-30 ms for a shift of every 5-10 ms. Typically the feature vector represents the short-time spectral envelope information, which corresponds to the shape of the vocal tract system. In most cases the feature vector consists of either mel frequency cepstral coefficients (MFCCs) or weighted linear prediction cepstral coefficients (wLPCCs) or some variants of these parameters. Many speech systems such as speech and speaker recognition systems are developed using these representations of speech as a sequence of feature vectors belonging to a common feature space. However, the knowledge of the speech production mechanism, especially the excitation source information, is not adequately utilized or captured in these representations.

It is well known that speech is produced and perceived as a sequence of acoustic events. The information of the events is also derived from the representation in the form of a sequence of feature vectors in a common feature space. But many of the features that make up an acoustic event depend on the distinct production characteristics for each event. These characteristics may require emphasis on different aspects of production such as nature of excitation source and coupling/decoupling of different parts of the vocal tract system. Representation of these events may need different sets of features and hence may not be possible to represent them in a common feature space. In other words, speech signal may have significant amount of information which may not be possible to represent only by the information in the spectral envelope and gross excitation information such as voiced/nonvoiced. Performance of speech systems may be limited due to lack of the additional information in the speech signal in the feature vector representation.

The objective of this study is to identify some significant acoustic events and the features needed to represent them. To extract those features, new signal processing tools are needed, besides spectral analysis tools. In particular, the information in the source of excitation of the vocal tract system is important in describing many useful events. Some of the events may need adequate temporal or spectral resolution, which may not be possible to realize using the standard discrete Fourier transform (DFT) based spectrum analysis. It is also interesting to note that sometimes the spectral features need to be determined only around the instants of significant excitation, rather than over an arbitrarily chosen interval around an arbitrarily chosen instant.

In this study a few steady acoustic events are chosen to examine the need for different ways of representing the information in different events. Events occurring in short bursts such as stops are not considered. Signal processing methods such as zero-frequency filtering, zero-time liftering and group delay processing are proposed to extract information that is not possible to extract by the conventional short-time spectral analysis tools. The acoustic events considered for detailed analysis are: voiced/nonvoiced, voice bars, trills, nasals and fricatives. To describe these events adequately, both the excitation source and vocal tract system features are needed. Application of these features for spotting acoustic events in continuous speech is demonstrated.

Significance of the additional information derived by the analysis of acoustic events is demonstrated in the context of development of a phone recognizer. The information derived from the excitation-based analysis of acoustic events is considered for refinement of the baseline phone recognizer at various levels, namely, feature, constraint and decision levels. It is shown that appending this additional event knowledge to the existing feature vector can improve the performance of the phone recognizer. This is illustrated by first appending the feature vector with known labels of the events. The events information extracted by the proposed signal processing methods is then used to study the improvement in performance. The limited acoustic-phonetic segmentation achieved in this thesis by detecting acoustic events in speech is shown to improve the performance of the baseline system in detecting the manner of articulation of a phone.

 

1. Introduction

2. A review of speech analysis

3. Signal processing methods for excitation-based analysis of speech

4. Voiced/nonvoiced segmentation of speech

5. Analysis of voice bars

6. Analysis of trills

7. Analysis of nasals

8. Analysis of fricatives

9. Analysis of speech for phone recognition

10. Summary and conclusions


Download link for the English original:

http://page2.dfpan.com/fs/7lc0j2e21f29a1655c7/  





https://blog.sciencenet.cn/blog-69686-1160932.html
