Computer Engineering ›› 2025, Vol. 51 ›› Issue (2): 35-53. doi: 10.19678/j.issn.1000-3428.0068739

• Research Hotspots and Reviews •

Review of Research on Keyword Spotting in Low-Resource Environments

WANG Yuehao, ZHOU Ruohua*

  1. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
  • Received: 2023-11-01 Online: 2025-02-15 Published: 2025-02-28
  • Contact: ZHOU Ruohua

Original title (Chinese): 低资源环境下的语音唤醒研究综述

Authors (Chinese): 王月昊, 周若华*

  • Supported by:
    National Natural Science Foundation of China (11590774)

Abstract:

Keyword Spotting (KWS) is a key technology for human-computer interaction and has long been a focus of speech research. With the development of deep learning, the emphasis of KWS research has shifted from traditional approaches based on Large-Vocabulary Continuous Speech Recognition (LVCSR) to neural network-based approaches. Nevertheless, running KWS efficiently on small devices and training models from limited sample data remain the central challenges in designing low-resource KWS systems. This review first defines what "low-resource" means for KWS, distinguishes KWS from speech recognition and related terms, introduces classic KWS models and the scenarios they suit, and surveys the state of low-resource KWS research in China and abroad. Second, following the structure of a KWS system, it explains the mainstream techniques and optimization strategies for acoustic feature extraction and acoustic modeling. It then analyzes methods for making KWS models lightweight and compares their advantages and disadvantages, summarizes common solutions for data-scarce KWS, such as few-shot learning, zero-shot learning, and transfer learning, and introduces common KWS datasets and evaluation metrics. Finally, it discusses future research directions for low-resource KWS.

Key words: Keyword Spotting (KWS), low-resource, model quantization, few-shot learning, human-computer interaction
