
Computer Engineering

• Multimedia Technology and Application •

Perceptual Domain Audio Coding Method Based on Gaussian Mixture Model

LV Yaping 1, GAO Ge 1, CHEN Yi 2, ZHANG Kang 1

  1. National Engineering Research Center for Multimedia Software, Computer College, Wuhan University, Wuhan 430072, China
  2. Computer College, Central China Normal University, Wuhan 430072, China
  • Received: 2014-09-16 Online: 2015-10-15 Published: 2015-10-15
  • About the authors: LV Yaping (b. 1990), female, master's student, main research interests: audio coding and processing; GAO Ge and CHEN Yi, associate professors, Ph.D.; ZHANG Kang, master's student.
  • Supported by:
    National Natural Science Foundation of China (614712710).

Abstract: Traditional perceptual audio coding schemes use psychoacoustic masking to reduce the bit rate, but their structure of a vocal tract model plus an excitation signal makes it difficult to code both speech and audio signals with high quality at medium and low bit rates. To address this, a perceptual domain audio coding method based on the Gaussian Mixture Model (GMM) is proposed. The method uses a Gammatone filter bank to simulate the human auditory system, applies substitution under a multiplexed masking model to reduce the number of envelope pulses so that a structured model can be fitted, fits GMM parameters to the auditory envelope with the Gauss-Newton algorithm, and uses the GMM parameters in place of the audio signal features. Experimental results show that, compared with an audio coding method based on sparse envelope representation and reconstruction, the proposed method scores 0.5 to 0.8 points higher in subjective tests and 5 to 10 points higher in objective tests. The decoded speech and most of the decoded music signals can be restored to the perceptual effect of the original audio signal, so the method can be used for high-quality speech and audio coding at medium and low bit rates.
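The Gauss-Newton fitting step named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact GMM parameterization, so the code below assumes each component is a pulse with an amplitude, a center, and a width, fits a sum of such pulses to a synthetic, noiseless envelope, and adds a simple step-halving safeguard that is also an assumption. All function names and numeric values are hypothetical and chosen only for illustration.

```python
import math

def gaussian_sum(params, t):
    """Sum of a_k * exp(-(t - mu_k)^2 / (2 s_k^2)); params = [a, mu, s, a, mu, s, ...]."""
    total = 0.0
    for k in range(0, len(params), 3):
        a, mu, s = params[k:k + 3]
        total += a * math.exp(-((t - mu) ** 2) / (2.0 * s * s))
    return total

def jacobian_row(params, t):
    """Partial derivatives of gaussian_sum with respect to every parameter at time t."""
    row = []
    for k in range(0, len(params), 3):
        a, mu, s = params[k:k + 3]
        d = t - mu
        g = math.exp(-(d * d) / (2.0 * s * s))
        row += [g, a * g * d / (s * s), a * g * d * d / (s ** 3)]
    return row

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def sse(params, ts, ys):
    """Sum of squared residuals between the samples and the model."""
    return sum((y - gaussian_sum(params, t)) ** 2 for t, y in zip(ts, ys))

def gauss_newton_fit(ts, ys, init, iters=20):
    """Gauss-Newton iterations; step halving is an added safeguard (assumption)."""
    params = list(init)
    n = len(params)
    for _ in range(iters):
        JtJ = [[0.0] * n for _ in range(n)]
        Jtr = [0.0] * n
        for t, y in zip(ts, ys):
            row = jacobian_row(params, t)
            r = y - gaussian_sum(params, t)
            for i in range(n):
                Jtr[i] += row[i] * r
                for j in range(n):
                    JtJ[i][j] += row[i] * row[j]
        step = solve(JtJ, Jtr)  # normal equations: (J^T J) step = J^T r
        scale, base = 1.0, sse(params, ts, ys)
        while scale > 1e-3:
            trial = [p + scale * d for p, d in zip(params, step)]
            if sse(trial, ts, ys) < base:
                params = trial
                break
            scale *= 0.5
        else:
            break  # no improving step found: treat as converged
    return params

# Synthetic two-pulse "envelope" (hypothetical values, not data from the paper).
true_params = [1.0, 2.0, 0.5, 0.6, 5.0, 0.8]
ts = [i * 0.05 for i in range(161)]
ys = [gaussian_sum(true_params, t) for t in ts]
fitted = gauss_newton_fit(ts, ys, init=[0.9, 2.1, 0.6, 0.5, 4.8, 0.9])
print([round(v, 4) for v in fitted])
```

In this sketch the envelope is summarized by six numbers instead of 161 samples, which is the sense in which fitted model parameters can stand in for signal features. Real envelopes are noisy, so a practical fit would need a careful initial guess and possibly a more robust solver.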

Key words: human auditory system, perceptual domain audio coding, Gaussian Mixture Model (GMM), Gammatone filter bank, Gauss-Newton algorithm

CLC number: