Soundscape Recognition: Explorations and Frontiers of Acoustic Scene Classification in the Digital Era

Computer Engineering ›› 2025, Vol. 51 ›› Issue (6): 1-19. doi: 10.19678/j.issn.1000-3428.0069005

PANG Xin¹, GE Fengpei²,*, LI Yanling¹,³

1. School of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, Inner Mongolia, China
2. Library, Beijing University of Posts and Telecommunications, Beijing 100876, China
3. Key Laboratory of Infinite-dimensional Hamiltonian System and Its Algorithm Application, Ministry of Education, Inner Mongolia Normal University, Hohhot 010022, Inner Mongolia, China

Received: 2023-12-12 Online: 2025-06-15 Published: 2025-06-27
Contact: GE Fengpei
Supported by:
National Natural Science Foundation of China (12204062); National Natural Science Foundation of China (62266033); National Natural Science Foundation of China (61806103); National Natural Science Foundation of China (61562068); Open Project of the Key Laboratory of Infinite-dimensional Hamiltonian System and Its Algorithm Application, Ministry of Education (2023KFZD03); Natural Science Foundation of Inner Mongolia Autonomous Region (2022LHMS06001); Fundamental Research Funds of Inner Mongolia Normal University (2022JBQN106); Fundamental Research Funds of Inner Mongolia Normal University (2022JBQN111); Fundamental Research Funds of Inner Mongolia Normal University (2022JBTD016); Graduate Innovation Fund of Inner Mongolia Normal University (CXJJS23066)

Abstract

Acoustic Scene Classification (ASC) aims to enable computers to recognize different acoustic environments in the way human hearing does, and it is one of the challenging tasks in computer audition. With rapid advances in intelligent audio processing and neural network learning algorithms, a series of new ASC algorithms and techniques has emerged in recent years. To present the technical development and evolution of the field, this review surveys both early work and recent developments in ASC. It first describes the application scenarios of ASC and the challenges it faces. It then details the mainstream ASC frameworks, with a focus on the deep learning algorithms applied in this domain. Next, it systematically summarizes frontier explorations, extension tasks, and publicly available datasets. Finally, it discusses the development trends and prospects of ASC.

Key words: Acoustic Scene Classification (ASC), deep learning, audio classification, speech recognition, Data Augmentation (DA)

PANG Xin, GE Fengpei, LI Yanling. Soundscape Recognition: Explorations and Frontiers of Acoustic Scene Classification in the Digital Era[J]. Computer Engineering, 2025, 51(6): 1-19.

https://www.ecice06.com/CN/Y2025/V51/I6/1

Figures/Tables (13)

Fig.1 Common framework of ASC systems

Fig.2 Development history of traditional ASC methods

Fig.3 Development history of ASC based on deep learning

Fig.4 Framework of ASC based on CNN

Fig.5 Xception architecture based on multi-scale fusion

Fig.6 Framework of attention-based ASC

Fig.7 Parallel branch structure based on temporal-spectral attention
