1 |
杨世强, 梁丁宏, 傅卫平. 智能机器人语音远程控制系统的设计. 计算机工程与应用, 2009, 45 (25): 71-73, 88.
doi: 10.3778/j.issn.1002-8331.2009.25.022
|
|
YANG S Q , LIANG D H , FU W P . Design for speech remote control system of intelligent robot. Computer Engineering and Applications, 2009, 45 (25): 71-73, 88.
doi: 10.3778/j.issn.1002-8331.2009.25.022
|
2 |
DABBABI K , MARS A . Spoken utterance classification task of Arabic numerals and selected isolated words. Arabian Journal for Science and Engineering, 2022, 47 (8): 10731- 10750.
doi: 10.1007/s13369-022-06649-0
|
3 |
杨鹏, 谢磊, 张艳宁. 低资源语言的无监督语音关键词检测技术综述. 中国图象图形学报, 2015, 20 (2): 211- 218.
|
|
YANG P , XIE L , ZHANG Y N . Survey on unsupervised spoken term detection for low-resource languages. Journal of Image and Graphics, 2015, 20 (2): 211- 218.
|
4 |
TSAKALIDIS S, HSIAO R, KARAKOS D, et al. The 2013 BBN Vietnamese telephone speech keyword spotting system[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2014: 7829-7833.
|
5 |
YU D , DENG L . Automatic speech recognition. Berlin, Germany: Springer, 2016.
|
6 |
KËPUSKA V Z , KLEIN T B . A novel wake-up-word speech recognition system, wake-up-word recognition task, technology and evaluation. Nonlinear Analysis: Theory, Methods [WT《Times New Roman》] & Applications, 2009, 71 (12): 2772- 2789.
doi: 10.1016/j.na.2009.06.089
|
7 |
XIE G K , HAO S , ZHANG P P , et al. Research and implementation of intelligent home pension system based on speech and semantic recognition. Advances in Multimedia, 2022, (1): 6141295.
|
8 |
WEINTRAUB M. Keyword-spotting using SRI's DECIPHER large-vocabulary speech-recognition system[C]//Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. Washington D.C., USA: IEEE Press, 1993: 463-466.
|
9 |
DAHL G E, SAINATH T N, HINTON G E. Improving deep neural networks for LVCSR using rectified linear units and dropout[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D.C., USA: IEEE Press, 2013: 8609-8613.
|
10 |
DO C T, STYLIANOU Y. Weighting time-frequency representation of speech using auditory saliency for automatic speech recognition[C]//Proceedings of the Interspeech 2018. [S. l. ]: ISCA, 2018: 1591-1595.
|
11 |
WEINTRAUB M. LVCSR log-likelihood ratio scoring for keyword spotting[C]//Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. Washington D.C., USA: IEEE Press, 1995: 297-300.
|
12 |
AMODEI D, ANUBHAI R, BATTENBERG E, et al. Deep speech 2: end-to-end speech recognition in English and mandarin[EB/OL]. [2023-10-17]. http://arxiv.org/abs/1512.02595.
|
13 |
|
14 |
WANG Y Y, LONG Y H. Keyword spotting based on CTC and RNN for mandarin Chinese speech[C]//Proceedings of the 11th International Symposium on Chinese Spoken Language Processing (ISCSLP). Washington D.C., USA: IEEE Press, 2018: 374-378.
|
15 |
SZOKE I, SCHWARZ P, MATEJKA P, et al. Comparison of keyword spotting approaches for informal continuous speech[C]//Proceedings of the Interspeech 2005. [S. l. ]: ISCA, 2005: 633-663.
|
16 |
CHEN G G, YILMAZ O, TRMAL J, et al. Using proxies for OOV keywords in the keyword search task[C]//Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding. Washington D. C., USA: IEEE Press, 2013: 416-421.
|
17 |
PARLAK S, SARACLAR M. Spoken term detection for Turkish broadcast news[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D. C., USA: IEEE Press, 2008: 5244-5247.
|
18 |
WILPON J G , RABINER L R , LEE C H , et al. Automatic recognition of keywords in unconstrained speech using hidden Markov models. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1990, 38 (11): 1870- 1878.
doi: 10.1109/29.103088
|
19 |
GAUVAIN J L , LEE C H . Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, 1994, 2 (2): 291- 298.
doi: 10.1109/89.279278
|
20 |
|
21 |
YUE X H , LIN J R , GUTIERREZ F R , et al. Self-supervised learning with segmental masking for speech representation. IEEE Journal of Selected Topics in Signal Processing, 2022, 16 (6): 1367- 1379.
doi: 10.1109/JSTSP.2022.3191845
|
22 |
|
23 |
|
24 |
SEO D , OH H S , JUNG Y . Wav2KWS: transfer learning from speech representations for keyword spotting. IEEE Access, 2021, 9, 80682- 80691.
doi: 10.1109/ACCESS.2021.3078715
|
25 |
|
26 |
LIM H , KIM Y , KIM H . Cross-informed domain adversarial training for noise-robust wake-up word detection. IEEE Signal Processing Letters, 2020, 27, 1769- 1773.
doi: 10.1109/LSP.2020.3026947
|
27 |
ZHANG Y D, GLASS J R. Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams[C]//Proceedings of the IEEE Workshop on Automatic Speech Recognition [WT《Times New Roman》] & Understanding. Washington D.C., USA: IEEE Press, 2009: 398-403.
|
28 |
|
29 |
WILPON J G, MILLER L G, MODI P. Improvements and applications for key word recognition using hidden Markov modeling techniques[C]//Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. Washington D.C., USA: IEEE Press, 1991: 309-312.
|
30 |
ALDARMAKI H , ULLAH A , RAM S , et al. Unsupervised automatic speech recognition: a review. Speech Communication, 2022, 139, 76- 91.
doi: 10.1016/j.specom.2022.02.005
|
31 |
ROSENBERG A, AUDHKHASI K, SETHY A, et al. End-to-end speech recognition and keyword search on low-resource languages[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2017: 5280-5284.
|
32 |
杨润延, 程高峰, 刘建. 基于端到端语音识别的关键词检索技术研究. 计算机科学, 2022, 49 (1): 53- 58.
doi: 10.11896/jsjkx.210800269
|
|
YANG R Y , CHENG G F , LIU J . Study on keyword search framework based on end-to-end automatic speech recognition. Computer Science, 2022, 49 (1): 53- 58.
doi: 10.11896/jsjkx.210800269
|
33 |
马晗, 唐柔冰, 张义, 等. 语音识别研究综述. 计算机系统应用, 2022, 31 (1): 1- 10.
|
|
MA H , TANG R B , ZHANG Y , et al. Survey on speech recognition. Computer Systems [WT《Times New Roman》] & Applications, 2022, 31 (1): 1- 10.
|
34 |
SEKI H, YAMAMOTO K, NAKAGAWA S. A deep neural network integrated with filterbank learning for speech recognition[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2017: 5480-5484.
|
35 |
MISHRA J, MAHADEVA PRASANNA S R. Challenges in spoken language diarization in code-switched scenario[C]//Proceedings of the National Conference on Communications (NCC). Washington D.C., USA: IEEE Press, 2023: 1-6.
|
36 |
|
37 |
RIVIELLO A, DAVID J P. Binary speech features for keyword spotting tasks[C]//Proceedings of the Interspeech 2019. [S. l. ]: ISCA, 2019: 3460-3464.
|
38 |
LEI L , YUAN G S , ZHANG T L , et al. Low-power feature-attention Chinese keyword spotting framework with distillation learning. ACM Transactions on Asian and Low-Resource Language Information Processing, 2023, 22 (2): 1- 14.
|
39 |
IBRAHIM E A, HUISKEN J, FATEMI H, et al. Keyword spotting using time-domain features in a temporal convolutional network[C]//Proceedings of the 22nd Euromicro Conference on Digital System Design (DSD). Washington D.C., USA: IEEE Press, 2019: 313-319.
|
40 |
CHOI S, SEO S, SHIN B, et al. Temporal convolution for real-time keyword spotting on mobile devices[C]//Proceedings of the Interspeech 2019. [S. l. ]: ISCA, 2019: 3372-3337.
|
41 |
LIM H, KIM Y, YEOM K, et al. Lightweight feature encoder for wake-up word detection based on self-supervised speech representation[C]//Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2023: 1-5.
|
42 |
LEE S , KANG T , BELL J , et al. An eight-element frequency-selective acoustic beamformer and bitstream feature extractor. IEEE Journal of Solid-State Circuits, 2022, 57 (6): 1812- 1823.
doi: 10.1109/JSSC.2021.3103727
|
43 |
CHEN G G, PARADA C, HEIGOLD G. Small-footprint keyword spotting using deep neural networks[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2014: 4087-4091.
|
44 |
CHEN H T, WANG Y H, XU C J, et al. AdderNet: do we really need multiplications in deep learning? [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2020: 1465-1474.
|
45 |
|
46 |
HIGUCHI T, GHASEMZADEH M, YOU K, et al. Stacked 1D convolutional networks for end-to-end small footprint voice trigger detection[C]//Proceedings of the Interspeech 2020. Washington D.C., USA: IEEE Press, 2020: 1-7.
|
47 |
GRUENSTEIN A, ALVAREZ R, THORNTON C, et al. A cascade architecture for keyword spotting on mobile devices[EB/OL]. [2023-10-17]. http://arxiv.org/abs/1712.03603.
|
48 |
|
49 |
CHEN J J , WU Z , WANG Z , et al. Practical accuracy estimation for efficient deep neural network testing. ACM Transactions on Software Engineering and Methodology, 2020, 29 (4): 1- 35.
|
50 |
|
51 |
TANG R, LIN J. Deep residual learning for small-footprint keyword spotting[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2018: 5484-5488.
|
52 |
|
53 |
MAJUMDAR S, GINSBURG B. MatchboxNet: 1D time-channel separable convolutional neural network architecture for speech commands recognition[EB/OL]. [2023-10-17]. https://arxiv.org/abs/2004.08531.
|
54 |
HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2016: 770-778.
|
55 |
虞秋辰, 周若华, 袁庆升. 基于Ghost-SE-Res2Net的多模型融合语音唤醒词检测方法. 计算机工程, 2024, 50 (3): 52- 59.
doi: 10.19678/j.issn.1000-3428.0067197
|
|
YU Q C , ZHOU R H , YUAN Q S . Multi-model fusion speech wake-up word detection method based on Ghost-SE-Res2Net. Computer Engineering, 2024, 50 (3): 52- 59.
doi: 10.19678/j.issn.1000-3428.0067197
|
56 |
PAN J, SHAPIRO J, WOHLWEND J, et al. ASAPP-ASR: multistream CNN and self-attentive SRU for SOTA speech recognition[EB/OL]. [2023-10-17]. https://arxiv.org/abs/2005.10469.
|
57 |
SAINATH T N, PARADA C. Convolutional neural networks for small-footprint keyword spotting[C]//Proceedings of the Interspeech 2015. [S. l. ]: ISCA, 2015: 1478-1482.
|
58 |
LIMONOVA E, ALFONSO D, NIKOLAEV D, et al. ResNet-like architecture with low hardware requirements[C]//Proceedings of the 25th International Conference on Pattern Recognition (ICPR). Washington D.C., USA: IEEE Press, 2021: 6204-6211.
|
59 |
|
60 |
XU M L, ZHANG X L. Depthwise separable convolutional ResNet with squeeze-and-excitation blocks for small-footprint keyword spotting[EB/OL]. [2023-10-17]. https://arxiv.org/abs/2004.12200.
|
61 |
PETER D, ROTH W, PERNKOPF F. End-to-end keyword spotting using neural architecture search and quantization[C]//Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2022: 3423-3427.
|
62 |
MITTERMAIER S, KVRZINGER L, WASCHNECK B, et al. Small-footprint keyword spotting on raw audio data with Sinc-convolutions[C]//Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2020: 7454-7458.
|
63 |
OLALEYE K , ONEAŢǍ D , KAMPER H . Keyword localisation in untranscribed speech using visually grounded speech models. IEEE Journal of Selected Topics in Signal Processing, 2022, 16 (6): 1454- 1466.
doi: 10.1109/JSTSP.2022.3180220
|
64 |
WU J B , YILMAZ E , ZHANG M L , et al. Deep spiking neural networks for large vocabulary automatic speech recognition. Frontiers in Neuroscience, 2020, 14, 199.
doi: 10.3389/fnins.2020.00199
|
65 |
CÁMBARA G , LÓPEZ F , BONET D , et al. TASE: task-aware speech enhancement for wake-up word detection in voice assistants. Applied Sciences, 2022, 12 (4): 1974.
doi: 10.3390/app12041974
|
66 |
|
67 |
DENG X , ZHANG Z . Sparsity-control ternary weight networks. Neural Networks, 2022, 145, 221- 232.
doi: 10.1016/j.neunet.2021.10.018
|
68 |
|
69 |
王澳回, 张珑, 宋文宇, 等. 端到端流式语音识别研究综述. 计算机工程与应用, 2023, 59 (2): 22- 33.
|
|
WANG A H , ZHANG L , SONG W Y , et al. Review of end-to-end streaming speech recognition. Computer Engineering and Applications, 2023, 59 (2): 22- 33.
|
70 |
BAGCHI D, HARTMANN W. Learning from the best: a teacher-student multilingual framework for low-resource languages[C]//Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2019: 6051-6055.
|
71 |
TAKASHIMA R, LI S, KAWAI H. An investigation of a knowledge distillation method for CTC acoustic models[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2018: 5809-5813.
|
72 |
KURATA G, AUDHKHASI K. Improved knowledge distillation from bi-directional to uni-directional LSTM CTC for end-to-end speech recognition[C]//Proceedings of the IEEE Spoken Language Technology Workshop (SLT). Washington D.C., USA: IEEE Press, 2018: 411-417.
|
73 |
DIGHE D, MARCHI E, VISHNUBHOTLA S, et al. Knowledge transfer for efficient on-device false trigger mitigation[C]//Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2021: 6838-6842.
|
74 |
GHOSH A, FUHS M, BAGCHI D, et al. Low-resource low-footprint wake-word detection using knowledge distillation[EB/OL]. [2023-10-17]. https://arxiv.org/abs/2207.03331.
|
75 |
HOU J Y, SHI Y Y, OSTENDORF M, et al. Mining effective negative training samples for keyword spotting[C]//Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2020: 7444-7448.
|
76 |
PETER D, ROTH W, PERNKOPF F. Resource-efficient DNNs for keyword spotting using neural architecture search and quantization[C]//Proceedings of the 25th International Conference on Pattern Recognition (ICPR). Washington D.C., USA: IEEE Press, 2021: 9273-9279.
|
77 |
BLUCHE T, PRIMET M, GISSELBRECHT T. Small-footprint open-vocabulary keyword spotting with quantized LSTM networks[EB/OL]. [2023-10-17]. https://arxiv.org/abs/2002.10851.
|
78 |
LI Y H, GONG R H, TAN X, et al. BRECQ: pushing the limit of post-training quantization by block reconstruction[EB/OL]. [2023-10-17]. http://arxiv.org/abs/2102.05426.
|
79 |
TUCKER G, WU M H, SUN M, et al. Model compression applied to small-footprint keyword spotting[C]//Proceedings of the Interspeech 2016. [S. l. ]: ISCA, 2016: 1878-1882.
|
80 |
WANG Z W, WU Z Y, LU J W, et al. BiDet: an efficient binarized object detector[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2020: 2046-2055.
|
81 |
SINGH A , KABRA R , KUMAR R , et al. On-device system for device directed speech detection for improving human computer interaction. IEEE Access, 2021, 9, 131758- 131766.
doi: 10.1109/ACCESS.2021.3114371
|
82 |
SIMONS T , LEE D J . A review of binarized neural networks. Electronics, 2019, 8 (6): 661.
doi: 10.3390/electronics8060661
|
83 |
WANG P S , HE X Y , CHENG J . Toward accurate binarized neural networks with sparsity for mobile application. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35 (1): 272- 284.
|
84 |
|
85 |
YE J , WANG J , ZHANG S . Distillation-guided residual learning for binary convolutional neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33 (12): 7765- 7777.
|
86 |
SHAN W W , YANG M H , WANG T , et al. A 510 nW wake-up keyword-spotting chip using serial-FFT-based MFCC and binarized depthwise separable CNN in 28 nm CMOS. IEEE Journal of Solid-State Circuits, 2021, 56 (1): 151- 164.
|
87 |
|
88 |
LIU B, CAI H, WANG Z, et al. A 22 nm, 10.8 μW/15.1 μW dual computing modes high power-performance-area efficiency domained background noise aware keyword-spotting processor[J]. IEEE Transactions on Circuits and Systems I, 2020, 67(12): 4733-4746.
|
89 |
|
90 |
LEE D, KIM M, MUN S H, et al. Fully unsupervised training of few-shot keyword spotting[C]//Proceedings of the IEEE Spoken Language Technology Workshop (SLT). Washington D.C., USA: IEEE Press, 2023: 266-272.
|
91 |
陈良臣, 傅德印. 面向小样本数据的机器学习方法研究综述. 计算机工程, 2022, 48 (11): 1- 13.
doi: 10.19678/j.issn.1000-3428.0065347
|
|
CHEN L C , FU D Y . Survey on machine learning methods for small sample data. Computer Engineering, 2022, 48 (11): 1- 13.
doi: 10.19678/j.issn.1000-3428.0065347
|
92 |
FINN C, ABBEEL P, LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]// Proceedings of the 34th International Conference on Machine Learning. New York, USA: ACM Press, 2017: 1126-1135.
|
93 |
PARNAMI A, LEE M. Few-shot keyword spotting with prototypical networks[C]//Proceedings of the 2022 7th International Conference on Machine Learning Technologies (ICMLT). New York, USA: ACM Press, 2022: 277-283.
|
94 |
SNELL J, SWERSKY K, ZEMEL R S. Prototypical networks for few-shot learning[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2017: 4080-4090.
|
95 |
|
96 |
|
97 |
TIAN Y, YAO H T, CAI M, et al. Improving RNN transducer modeling for small-footprint keyword spotting[C]//Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2021: 5624-5628.
|
98 |
THOMAS S, GANAPATHY S, HERMANSKY H. Multilingual MLP features for low-resource LVCSR systems[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2012: 4269-4272.
|
99 |
|
100 |
PATTANAYAK B , ROUT J K , PRADHAN G . Adaptive spectral smoothening for development of robust keyword spotting system. IET Signal Processing, 2019, 13 (5): 544- 550.
|
101 |
|
102 |
WANG X, SUN S N, SHAN C H, et al. Adversarial examples for improving end-to-end attention-based small-footprint keyword spotting[C]//Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2019: 6366-6370.
|
103 |
LÓPEZ-ESPEJO I , TAN Z H , HANSEN J H L , et al. Deep spoken keyword spotting: an overview. IEEE Access, 2022, 10, 4169- 4199.
|
104 |
|
105 |
COUCKE A, CHLIEH M, GISSELBRECHT T, et al. Efficient keyword spotting using dilated convolutions and gating[C]//Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2019: 6351-6355.
|
106 |
PARDEDE H F , ADHI P , ZILVAN V , et al. Deep convolutional neural networks-based features for Indonesian large vocabulary speech recognition. IAES International Journal of Artificial Intelligence, 2023, 12 (2): 610.
|
107 |
ZHAO Z Y , ZHANG W Q . End-to-end keyword search system based on attention mechanism and energy scorer for low resource languages. Neural Networks, 2021, 139, 326- 334.
|
108 |
MISHCHENKO Y, GOREN Y, SUN M, et al. Low-bit quantization and quantization-aware training for small-footprint keyword spotting[C]//Proceedings of the 18th IEEE International Conference on Machine Learning and Applications (ICMLA). Washington D.C., USA: IEEE Press, 2019: 706-711.
|
109 |
|
110 |
HUANG Y T, HUGHES T, SHABESTARY T Z, et al. Supervised noise reduction for multichannel keyword spotting[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2018: 5474-5478.
|
111 |
|
112 |
COUCKE A, CHLIEH M, GISSELBRECHT T, et al. Efficient keyword spotting using dilated convolutions and gating[EB/OL]. [2023-10-17]. http://arxiv.org/abs/1811.07684.
|
113 |
|
114 |
HOU J Y , SHI Y Y , OSTENDORF M , et al. Region proposal network based small-footprint keyword spotting. IEEE Signal Processing Letters, 2019, 26 (10): 1471- 1475.
|
115 |
QIN X Y, BU H, LI M. Hi-Mia: a far-field text-dependent speaker verification database and the baselines[C]//Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2020: 7609-7613.
|