Soundscape Recognition: Explorations and Frontiers of Acoustic Scene Classification in the Digital Era

doi:10.19678/j.issn.1000-3428.0069005

Abstract

Acoustic Scene Classification (ASC) aims to enable computers to recognize different acoustic environments in the way the human auditory system does, and it remains one of the challenging tasks in computer audition. With rapid advances in intelligent audio processing and neural network learning algorithms, a series of new algorithms and techniques for ASC has emerged in recent years. To present the technical development and evolution of the field, this review examines both early work and recent developments in ASC. It first describes the application scenarios of ASC and the challenges it faces, then details the mainstream ASC frameworks, with a focus on the deep learning algorithms applied in this domain. It then summarizes frontier explorations, extension tasks, and publicly available datasets in ASC, and finally discusses future development trends.

Keywords: Acoustic Scene Classification (ASC), deep learning, audio classification, speech recognition, Data Augmentation (DA)

PANG Xin, GE Fengpei, LI Yanling. Soundscape Recognition: Explorations and Frontiers of Acoustic Scene Classification in the Digital Era[J]. Computer Engineering, 2025, 51(6): 1-19.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0069005

https://www.ecice06.com/EN/Y2025/V51/I6/1

Figures/Tables 13

Fig.1 Common framework of ASC systems

Fig.2 Development history of traditional ASC methods

Fig.3 Development history of ASC based on deep learning

Fig.4 Framework of ASC based on CNN

Fig.5 Xception architecture based on multi-scale fusion

Fig.6 Framework of attention-based ASC

Fig.7 Parallel branch structure based on Temporal & Spectral Attention
