大工至善|大学至真分享 http://blog.sciencenet.cn/u/lcj2212916



Posted 2020-8-23 16:44 | Category: Research notes | Source: Reposted

This is a PhD thesis from the University of New South Wales, Australia (author: Vidhyasaharan Sethu), 186 pages in total.




An essential step toward achieving human-machine speech communication with the naturalness of communication between humans is developing a machine that is capable of recognising emotions from speech. This thesis presents research addressing this problem by making use of acoustic and prosodic information. At the feature level, novel group delay and weighted frequency features are proposed. The group delay features are shown to emphasise information pertaining to formant bandwidths and to be indicative of emotions. The weighted frequency feature, based on the recently introduced empirical mode decomposition, is proposed as a compact representation of the spectral energy distribution and is shown to outperform other estimates of energy distribution. Feature-level comparisons suggest that detailed spectral measures are highly indicative of emotions while also exhibiting greater speaker specificity. Moreover, it is shown that all features are characteristic of the speaker and require some sort of normalisation prior to use in a multi-speaker situation. A novel technique for normalising speaker-specific variability in features is proposed, which leads to significant improvements in the performance of systems trained and tested on data from different speakers. This technique is also used to investigate the amount of speaker-specific variability in different features. A preliminary study of phonetic variability suggests that phoneme-specific traits are not modelled by the emotion models and that speaker variability is a more significant problem in the investigated setup. Finally, a novel approach to emotion modelling that takes into account temporal variations of speech parameters is analysed. An explicit model of the glottal spectrum is incorporated into the framework of the traditional source-filter model, and the parameters of this combined model are used to characterise speech signals. An automatic emotion recognition system that takes into account the shape of the contours of these parameters as they vary with time is shown to outperform a system that models only the parameter distributions. The novel approach is also empirically shown to be on par with human emotion classification performance.
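The group delay features mentioned in the abstract build on the group delay of a speech frame, i.e. the negative derivative of the Fourier phase with respect to frequency. The abstract does not spell out the thesis's exact feature recipe, so the following is only a minimal sketch of the standard textbook computation, assuming a single (already windowed) frame given as a list of samples; the function names `dft` and `group_delay` are hypothetical. It relies on the identity τ(ω) = Re{Y(ω)/X(ω)} = (X_R·Y_R + X_I·Y_I)/|X|², where X is the DFT of x[n] and Y is the DFT of n·x[n], which avoids unwrapping the phase. A naive DFT keeps the example stdlib-only; a practical system would use an FFT.

```python
import cmath
import math


def dft(x, n_fft):
    """Naive O(N^2) DFT of x, zero-padded to n_fft points (stdlib-only for clarity)."""
    padded = list(x) + [0.0] * (n_fft - len(x))
    return [
        sum(padded[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
        for k in range(n_fft)
    ]


def group_delay(frame, n_fft=None, eps=1e-12):
    """Group delay tau(k) = -d(phase)/d(omega), in samples, at each DFT bin.

    Uses tau = Re{Y/X} = (X_R*Y_R + X_I*Y_I) / |X|^2 with
    X = DFT(x[n]) and Y = DFT(n * x[n]), so no phase unwrapping is needed.
    """
    if n_fft is None:
        n_fft = len(frame)
    if len(frame) > n_fft:
        raise ValueError("frame is longer than n_fft")
    X = dft(frame, n_fft)
    Y = dft([n * v for n, v in enumerate(frame)], n_fft)
    tau = []
    for xk, yk in zip(X, Y):
        power = abs(xk) ** 2
        # Where |X| is (near) zero the group delay is undefined; report 0.0 there.
        tau.append((xk.real * yk.real + xk.imag * yk.imag) / power if power > eps else 0.0)
    return tau


# Sanity check: a unit impulse delayed by 3 samples has a constant group delay of 3.
impulse = [0.0] * 8
impulse[3] = 1.0
print([round(t, 6) for t in group_delay(impulse, n_fft=16)])
```

For a pure delay the phase is linear in frequency, so every bin should report approximately 3 samples. For real speech frames, peaks in the group delay tend to sit near formants, and the sharpness of those peaks relates to formant bandwidth, which is the property the abstract attributes to the proposed features.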




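1. Introduction

2. Speech and Emotion

3. Speech Features

4. Speaker Variability

5. Static Classification Methods

6. Speech Parameterisation for Emotion Recognition

7. Conclusions and Future Work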






