Blog: 大工至善|大学至真 http://blog.sciencenet.cn/u/lcj2212916


[Repost] [Computer Science] [2018] Single-Microphone Speech Enhancement and Separation Using Deep Learning

Posted 2020-3-31 16:15 | Category: Research notes | Keyword: scholar | Source: repost

This is a 255-page PhD thesis from Aalborg University, Denmark, by Morten Kolbæk.


The cocktail party problem comprises the challenging task of listening to and understanding a speech signal in a complex acoustic environment, where multiple speakers and background noise signals simultaneously interfere with the speech signal of interest. A signal processing algorithm that can effectively increase the intelligibility and quality of speech signals in such complicated acoustic situations is highly desirable. Especially for applications involving mobile communication devices and hearing assistive devices, increasing the intelligibility and quality of noisy speech signals has been a goal for scientists and engineers for more than half a century. With the re-emergence of machine learning techniques, today known as deep learning, the challenges involved with such algorithms might be overcome.

In this PhD thesis, we study and develop deep learning-based techniques for two major sub-disciplines of the cocktail party problem: single-microphone speech enhancement and single-microphone multi-talker speech separation. Specifically, we conduct an in-depth empirical analysis of the generalizability of modern deep learning-based single-microphone speech enhancement algorithms. We show that the performance of such algorithms is closely linked to the training data, and that good generalizability can be achieved with carefully designed training data. Furthermore, we propose utterance-level Permutation Invariant Training (uPIT), a deep learning-based algorithm for single-microphone speech separation, and we report state-of-the-art results on a speaker-independent multi-talker speech separation task. Additionally, we show that uPIT works well for joint speech separation and enhancement without explicit prior knowledge about the noise type or number of speakers, which, at the time of writing, is a capability only shown by uPIT.
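
The core idea behind uPIT can be illustrated with a small training-loss sketch. This is a minimal sketch under stated assumptions, not the thesis's implementation: it assumes the separation network has already produced one magnitude-spectrogram estimate per speaker for the whole utterance, and it uses plain mean squared error. The network itself is omitted, and the function names `mse` and `upit_loss` are hypothetical. Because the order of the network's outputs is arbitrary, the loss is evaluated for every assignment of outputs to reference speakers and the smallest value is kept; in the utterance-level variant, that assignment is chosen once for the entire utterance rather than separately for each frame.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-sized spectrograms (frames x bins)."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def upit_loss(estimates, targets):
    """Sketch of an utterance-level permutation invariant training (uPIT) loss.

    estimates[i] and targets[s] are whole-utterance magnitude spectrograms
    (lists of frames, each a list of bin values). A single speaker-to-output
    assignment is used for the entire utterance: the one with the lowest MSE.
    Returns (loss, perm), where perm[s] is the output index assigned to speaker s.
    """
    n = len(targets)
    if len(estimates) != n:
        raise ValueError("need exactly one estimate per target speaker")
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Error summed over all frames of the utterance for this fixed assignment.
        loss = sum(mse(estimates[p], targets[s]) for s, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# The outputs arrive in the "wrong" order; uPIT selects the swapped assignment.
speaker_1 = [[1.0, 2.0], [3.0, 4.0]]
speaker_2 = [[0.0, 0.5], [0.5, 0.0]]
print(upit_loss([speaker_2, speaker_1], [speaker_1, speaker_2]))  # → (0.0, (1, 0))
```

With S speakers the loop evaluates all S! assignments, which is inexpensive for small S such as two or three talkers. Choosing the best assignment frame by frame instead would let a speaker jump between output streams over time; fixing one assignment per utterance keeps each output tied to the same speaker.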
Finally, we show that deep learning-based speech enhancement algorithms designed to minimize the classical short-time spectral amplitude mean squared error lead to enhanced speech signals that are essentially optimal in terms of Short-Time Objective Intelligibility (STOI), a state-of-the-art speech intelligibility estimator. This is important because it suggests that no additional improvement in STOI can be achieved by a deep learning-based speech enhancement algorithm designed to maximize STOI.
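
For reference, the short-time spectral amplitude mean squared error compares only the magnitudes of the clean and enhanced short-time Fourier transform (STFT) coefficients and ignores their phase. The sketch below is a minimal illustration, assuming both STFTs have already been computed and are given as lists of frames of complex coefficients; the function name `stsa_mse` is hypothetical, and STOI itself is not implemented here.

```python
def stsa_mse(clean_stft, enhanced_stft):
    """Mean squared error between the STFT magnitudes of two signals.

    Each argument is a list of frames, each frame a list of complex STFT
    coefficients. Only the amplitudes |S(k, m)| are compared; phase is ignored.
    """
    if len(clean_stft) != len(enhanced_stft):
        raise ValueError("STFTs must have the same number of frames")
    total, count = 0.0, 0
    for clean_frame, enh_frame in zip(clean_stft, enhanced_stft):
        if len(clean_frame) != len(enh_frame):
            raise ValueError("frames must have the same number of frequency bins")
        for s, s_hat in zip(clean_frame, enh_frame):
            # Squared difference of spectral amplitudes in one time-frequency bin.
            total += (abs(s) - abs(s_hat)) ** 2
            count += 1
    if count == 0:
        raise ValueError("STFTs must not be empty")
    return total / count


# Coefficients with equal magnitudes but different phases give zero error.
print(stsa_mse([[3 + 4j, 1 + 0j]], [[5 + 0j, 1j]]))  # → 0.0
```

Comparing magnitudes only is what makes this criterion phase-insensitive: in the example, both bins have different phases but identical amplitudes, so the loss is zero.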


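1. Speech Enhancement and Separation

2. Deep Learning

3. Deep Learning for Enhancement and Separation

4. Scientific Contributions

5. Directions for Future Research

Appendix: Speech Intelligibility Potential of General and Specialized Deep Neural Network Based Speech Enhancement Systems

Appendix: Speech Enhancement Using Long Short-Term Memory Based Recurrent Neural Networks for Noise Robust Speaker Verification

Appendix: Permutation Invariant Training of Deep Models for Speaker-Independent Multi-Talker Speech Separation

Appendix: Multi-Talker Speech Separation Based on Deep Recurrent Neural Networks

Appendix: Joint Separation and Denoising of Noisy Multi-Talker Speech Using Recurrent Neural Networks and Permutation Invariant Training

Appendix: Monaural Speech Enhancement Using Deep Neural Networks Based on a Short-Time Objective Intelligibility Measure

Appendix: On the Relationship Between Short-Time Objective Intelligibility and Short-Time Spectral-Amplitude Mean-Square Error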





http://blog.sciencenet.cn/blog-69686-1226182.html
