This project describes the development of an audio segmentation and classification system. Much existing work on audio classification addresses only the classification of audio segments that are already known to be homogeneous. In this work, audio recordings are first divided into acoustically similar regions, and each region is then classified into a basic audio type such as speech, music or silence.
The audio features used in this project include Mel-Frequency Cepstral Coefficients (MFCC), zero-crossing rate and short-time energy (STE). These features were extracted from audio files stored in WAV format; the possible use of features extracted directly from MPEG audio files is also considered. Statistical methods based on these features are used to segment and classify the audio signals. The classification methods used include the Gaussian Mixture Model (GMM) and the k-Nearest Neighbour (k-NN) algorithm. Experimental results show that the implemented system achieves an accuracy of more than 95% for discrete audio classification.
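As an illustration of two of the features named above, the following is a minimal sketch of frame-level zero-crossing rate and short-time energy, together with a plain majority-vote k-NN rule. It is not the project's implementation: the function names, the framing scheme (fixed-length frames with a fixed hop), the choice to treat exact zeros as positive, and the Euclidean distance are all assumptions made for this example. MFCC extraction and GMM training are omitted.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (a trailing partial frame is dropped)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, computed per frame."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # assumption: exact zeros count as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def short_time_energy(frames):
    """Mean squared amplitude, computed per frame."""
    return np.mean(frames ** 2, axis=1)

def knn_predict(train_X, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    dists = np.linalg.norm(train_X - query, axis=1)  # Euclidean distance (assumed)
    nearest = train_y[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

if __name__ == "__main__":
    # Synthetic signals only; these stand in for real recordings.
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t)
    noise = np.random.default_rng(0).standard_normal(sr)
    for name, sig in [("tone", tone), ("noise", noise)]:
        frames = frame_signal(sig, frame_len=400, hop=160)
        print(name, zero_crossing_rate(frames).mean(), short_time_energy(frames).mean())
```

In a sketch like this, each frame yields a small feature vector (here zero-crossing rate and energy), and those vectors are what a classifier such as k-NN would compare. A pure tone gives a low zero-crossing rate and white noise a high one, which is the kind of contrast such features are meant to expose.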