Shared from the blog 大工至善|大学至真: http://blog.sciencenet.cn/u/lcj2212916


[Repost] [Computer Science] [2019.04] Machine Learning and Audio Processing

Posted 2021-5-27 20:29 | Category: Research notes | Source: Repost



This is a PhD thesis from Massey University, New Zealand (author: Junbo Ma), 150 pages.

 


In this thesis, we address two important theoretical issues, one in deep neural networks and one in clustering. We also develop a new approach to polyphonic sound event detection, one of the most important applications in audio processing. The three novel approaches developed are:

(i) The Large Margin Recurrent Neural Network (LMRNN), which improves the discriminative ability of the original Recurrent Neural Network by adding a large margin term to the widely used cross-entropy loss function. The large margin term applies the large margin discriminative principle as a heuristic that guides convergence during training, fully exploiting the information in the data labels by considering both the target category and the competing categories.

(ii) The Robust Multi-View Continuous Subspace Clustering (RMVCSC) approach, which performs clustering on a common view-invariant subspace learned from all views. The clustering result and the common representation subspace are optimised simultaneously by a single continuous objective function. In this objective function, a robust estimator automatically clips specious inter-cluster connections while preserving convincing intra-cluster correspondences. As a result, RMVCSC can untangle heavily mixed clusters without the number of clusters being set in advance.

(iii) A novel polyphonic sound event detection approach based on the Relational Recurrent Neural Network (RRNN), which uses the relational reasoning ability of RRNNs to untangle overlapping sound events in audio recordings. Unlike previous works, which mix and pack all historical information into a single common hidden memory vector, this approach lets pieces of historical information interact with one another across an audio recording, which untangles overlapping sound events effectively and efficiently.
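The abstract does not give the exact form of the LMRNN large margin term. Purely as an illustration of "considering both the target category and the competing categories", the sketch below assumes a hinge penalty on the gap between the target-class logit and the strongest competing logit, added to the standard softmax cross-entropy. The function names, the `margin` and `lam` parameters, and this particular form are assumptions, not the thesis's definition. Each row of `logits` could be, for example, one time step of an RNN's output.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-sample cross-entropy from raw logits [N, C] and integer labels [N]."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)  # log-sum-exp stability shift
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def large_margin_term(logits, labels, margin=1.0):
    """ASSUMED form: hinge on (target logit - strongest competing logit)."""
    logits = np.asarray(logits, dtype=float)
    rows = np.arange(len(labels))
    target = logits[rows, labels]
    competing = logits.copy()
    competing[rows, labels] = -np.inf          # exclude the target class
    strongest_rival = competing.max(axis=1)    # best competing category per sample
    return np.maximum(0.0, margin - (target - strongest_rival))

def large_margin_loss(logits, labels, margin=1.0, lam=0.5):
    """Cross-entropy plus a weighted large margin term (hypothetical weighting)."""
    ce = softmax_cross_entropy(logits, labels)
    lm = large_margin_term(logits, labels, margin)
    return float(np.mean(ce + lam * lm))
```

A well-separated sample such as logits `[3.0, 0.0, 0.0]` with label 0 incurs no margin penalty, while `[0.5, 0.0, 0.0]` is penalised by 0.5, because its gap to the nearest rival falls short of the margin of 1. The cross-entropy part alone would keep pushing both samples, whereas the hinge only acts on samples that are not yet separated by the margin.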
All three approaches are tested on widely used datasets and compared with recently published works. The experimental results have demonstrated the effectiveness and efficiency of the developed approaches.
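The abstract does not name the robust estimator used in RMVCSC. To illustrate how a robust estimator can "clip" connections, the sketch below assumes the Geman-McClure estimator, which Robust Continuous Clustering (Shah and Koltun, 2017) uses for this purpose; whether RMVCSC uses the same estimator is an assumption here. Each connection (p, q) between learned representations receives the iteratively reweighted least squares weight `(mu / (mu + d**2))**2`, obtained from the penalty's derivative. Short, intra-cluster connections keep a weight near 1, while long, likely spurious inter-cluster connections are driven towards 0. The function names, the scale `mu`, and the toy data are hypothetical.

```python
import numpy as np

def geman_mcclure(sq_dist, mu):
    """Geman-McClure penalty rho(d) = mu * d^2 / (mu + d^2), given d^2.

    Grows like d^2 for small distances but saturates at mu, so one long
    connection cannot dominate the objective.
    """
    return mu * sq_dist / (mu + sq_dist)

def connection_weights(reps, edges, mu):
    """IRLS weight (mu / (mu + d^2))^2 for each connection (p, q) in `edges`."""
    p, q = edges[:, 0], edges[:, 1]
    sq_dist = np.sum((reps[p] - reps[q]) ** 2, axis=1)  # squared distance per edge
    return (mu / (mu + sq_dist)) ** 2

# Toy data (hypothetical): points 0 and 1 are close, point 2 is far away.
reps = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
edges = np.array([[0, 1], [0, 2]])
print(connection_weights(reps, edges, mu=1.0))
```

Here the close pair keeps a weight of about 0.98, while the distant pair's weight drops below 0.001, so it effectively stops pulling the two points together without any threshold on the number of clusters.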

 

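1. Introduction

2. Large Margin Recurrent Neural Network

3. Robust Multi-View Continuous Subspace Clustering

4. Polyphonic Sound Event Detection

5. Conclusions and Future Work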






https://blog.sciencenet.cn/blog-69686-1288491.html

