Perceptual Domain Audio Coding Method Based on Gaussian Mixture Model

doi:10.3969/j.issn.1000-3428.2015.10.050

Computer Engineering

LV Yaping¹, GAO Ge¹, CHEN Yi², ZHANG Kang¹

(1. National Engineering Research Center for Multimedia Software, Computer College, Wuhan University, Wuhan 430072, China; 2. Computer College, Central China Normal University, Wuhan 430072, China)

Received: 2014-09-16  Online: 2015-10-15  Published: 2015-10-15
About the authors: LV Yaping (1990-), female, master's student, main research interest: audio coding and processing; GAO Ge and CHEN Yi, associate professors, Ph.D.; ZHANG Kang, master's student.
Funding:
National Natural Science Foundation of China (614712710).

Abstract

Traditional perceptual audio coding schemes exploit psychoacoustic masking to reduce the bit rate, but their vocal-tract-model-plus-excitation approach has difficulty coding both speech and audio signals with high quality at low and medium bit rates. To address this, a perceptual domain audio coding method based on a Gaussian Mixture Model (GMM) is proposed. The method simulates the human auditory system with a Gammatone filter bank, uses a multiplexed masking model with pulse replacement to reduce the number of envelope pulses so that the envelope can be fitted with a structured model, fits GMM parameters to the auditory envelope with the Gauss-Newton algorithm, and uses the GMM parameters in place of the audio signal features. Experimental results show that, compared with the audio coding method based on sparse envelope representation and reconstruction, the proposed method scores 0.5 to 0.8 points higher in subjective tests and 5 to 10 points higher in objective tests. The decoded speech and most of the decoded music signals can be restored to the quality of the original audio signals, so the method can be used for high-quality speech and audio coding at low and medium bit rates.

Key words: human auditory system, perceptual domain audio coding, Gaussian Mixture Model (GMM), Gammatone filter bank, Gauss-Newton algorithm
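The abstract states that GMM parameters are fitted to the auditory envelope with the Gauss-Newton algorithm, but it does not give the exact parameterization. The following is therefore only a minimal sketch under stated assumptions: a one-dimensional envelope y(t) (for example, the envelope of one Gammatone channel) is treated as a weighted sum of K Gaussian curves, each described by an assumed (amplitude, mean, standard deviation) triple, and fitted by least squares rather than estimated as a probability density by EM. Each Gauss-Newton step is solved with `numpy.linalg.lstsq`; the step-halving safeguard and the helper names `gmm_envelope`, `jacobian`, and `fit_gauss_newton` are hypothetical additions for illustration, not the authors' implementation.

```python
import numpy as np


def gmm_envelope(t, params):
    """Evaluate a sum of K Gaussian curves at times t.

    params has shape (K, 3); each row is (amplitude, mean, std), an assumed
    parameterization chosen for illustration.
    """
    a, mu, s = params[:, 0:1], params[:, 1:2], params[:, 2:3]
    return np.sum(a * np.exp(-((t - mu) ** 2) / (2.0 * s ** 2)), axis=0)


def jacobian(t, params):
    """Partial derivatives of gmm_envelope w.r.t. every parameter, shape (N, 3K)."""
    a, mu, s = params[:, 0:1], params[:, 1:2], params[:, 2:3]
    d = t - mu                              # (K, N) offsets from each mean
    g = np.exp(-(d ** 2) / (2.0 * s ** 2))  # (K, N) unit-height Gaussians
    d_a = g                                 # df / d amplitude
    d_mu = a * g * d / s ** 2               # df / d mean
    d_s = a * g * d ** 2 / s ** 3           # df / d std
    # Order columns like params.ravel(): a_0, mu_0, s_0, a_1, mu_1, s_1, ...
    return np.stack([d_a, d_mu, d_s], axis=1).reshape(-1, t.size).T


def fit_gauss_newton(t, y, init, iters=50, tol=1e-12):
    """Fit the Gaussian-mixture envelope to y with damped Gauss-Newton steps."""
    params = np.array(init, dtype=float)
    cost = np.sum((y - gmm_envelope(t, params)) ** 2)
    for _ in range(iters):
        r = y - gmm_envelope(t, params)       # current residual
        J = jacobian(t, params)
        # Gauss-Newton step: least-squares solution of J @ delta = r
        delta, *_ = np.linalg.lstsq(J, r, rcond=None)
        step = 1.0
        while step > 1e-6:
            cand = params + step * delta.reshape(params.shape)
            cand[:, 2] = np.abs(cand[:, 2])   # keep widths positive
            new_cost = np.sum((y - gmm_envelope(t, cand)) ** 2)
            if new_cost < cost:
                break
            step *= 0.5                       # halve the step if the error grew
        else:
            break                             # no improving step found; stop
        improvement = cost - new_cost
        params, cost = cand, new_cost
        if improvement < tol:
            break
    return params, cost


# Usage on a synthetic, noise-free two-component envelope (illustrative data only).
t = np.linspace(0.0, 1.0, 200)
true_params = np.array([[1.0, 0.3, 0.05], [0.6, 0.7, 0.08]])
y = gmm_envelope(t, true_params)
est, cost = fit_gauss_newton(t, y, [[0.8, 0.28, 0.07], [0.5, 0.72, 0.06]])
print(np.round(est, 4), cost)
```

Because each Gauss-Newton step linearizes the model around the current parameters, the initial guess (for example, one taken from envelope peaks) matters; the step-halving loop only guarantees that the squared error never increases, not that the global optimum is reached.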

CLC number: TN912

LV Yaping, GAO Ge, CHEN Yi, ZHANG Kang. Perceptual Domain Audio Coding Method Based on Gaussian Mixture Model[J]. Computer Engineering.

https://www.ecice06.com/CN/Y2015/V41/I10/265
