Computer Engineering ›› 2025, Vol. 51 ›› Issue (4): 97-106. doi: 10.19678/j.issn.1000-3428.0069185

• Artificial Intelligence and Pattern Recognition •

Multi-Feature Speech Emotion Recognition Based on Improved Efficient Channel Attention Mechanism

DU Chenyang, ZHANG Xueying, HUANG Lixia*, LI Juan

  1. College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China
  • Received: 2024-01-08 Online: 2025-04-15 Published: 2024-05-29
  • Contact: HUANG Lixia

  • Funding: National Natural Science Foundation of China (62271342)

Abstract:

The attention mechanism is widely used in Speech Emotion Recognition (SER). However, traditional attention modules improve model performance at the cost of a significantly larger parameter count. The Efficient Channel Attention (ECA) mechanism has few parameters, but it can generate attention weights only for the channel dimension. To address this limitation, an Improved ECA (IECA) module is proposed. The IECA module generates weights for each dimension of the input feature map with a relatively small number of parameters, enabling the model to focus on and exploit the important information in the feature map more effectively. In addition, to further improve the recognition rate, spectrogram and IS10 features are extracted separately from the speech data, and a fusion network combines the predictions of the different branches at the decision level to yield the final prediction. The proposed model achieves a Weighted Accuracy (WA) of 91.63% and 92.46% and an Unweighted Average Recall (UAR) of 91.25% and 92.33% on the EMODB and CASIA datasets, respectively, which are 2.69-8.43 and 4.16-10.69 percentage points higher, respectively, than the results reported in previous research.
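
To make the limitation of the baseline concrete, the following is a minimal sketch of standard ECA, not of the proposed IECA module, whose internals are not described in the abstract. It assumes a feature map stored as nested lists of shape (channels, height, width) and a hand-picked 1-D kernel; the function names `eca_channel_weights` and `eca` are hypothetical. The point to notice is that the only learnable part is the short kernel and that the output contains one weight per channel, with nothing for the other dimensions.

```python
import math


def eca_channel_weights(feature_map, kernel):
    """Return one attention weight per channel (standard ECA, illustrative).

    feature_map: nested lists with shape (C, H, W).
    kernel: 1-D list of odd length k; these k values are the only parameters.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2

    # Global average pooling: collapse each channel to a single number.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

    # 1-D convolution (cross-correlation) across neighboring channels,
    # with zero padding so the output keeps length C.
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(pooled))
    ]

    # Sigmoid squashes each mixed value into (0, 1).
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]


def eca(feature_map, kernel):
    """Rescale every channel of feature_map by its attention weight."""
    weights = eca_channel_weights(feature_map, kernel)
    return [
        [[w * value for value in row] for row in channel]
        for w, channel in zip(weights, feature_map)
    ]
```

Because every position inside a channel is multiplied by the same scalar, this baseline cannot emphasize particular locations along the other axes of the feature map, which is the gap the abstract says IECA is meant to close.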

Key words: deep learning, Speech Emotion Recognition (SER), attention mechanism, multi-feature fusion, decision level fusion
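
The two metrics reported in the abstract can be sketched as follows, assuming the definitions commonly used in the SER literature: WA is the overall fraction of correctly classified utterances, and UAR is the mean of the per-class recalls. The abstract does not state the evaluation protocol, so the helper names and the toy labels below are hypothetical and serve only to show how the two numbers can diverge on imbalanced data.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: each class counts in proportion to its size."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall: every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy example (made-up labels): four "angry" utterances, one "sad",
# and a classifier that always predicts "angry".
y_true = ["angry", "angry", "angry", "angry", "sad"]
y_pred = ["angry", "angry", "angry", "angry", "angry"]
wa = weighted_accuracy(y_true, y_pred)           # 4 of 5 correct
uar = unweighted_average_recall(y_true, y_pred)  # recalls 1.0 and 0.0
```

In this toy case WA is 0.8 while UAR is 0.5, which is why reporting both gives a fuller picture when emotion classes are unevenly represented.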
