CNN-Transformer-Based Lesion and Organ Segmentation Network for Electronic Laryngoscope

doi:10.19678/j.issn.1000-3428.0070192

Abstract


In electronic laryngoscopy, the morphology of lesions and organs varies as the lens moves, and the boundaries between lesions, organs, and mucosal tissue are unclear. As a result, the accuracy of simultaneously segmenting lesions and the major laryngeal organs is unsatisfactory. To address this problem, a CNN-Transformer two-stream hybrid network is proposed, in which the Convolutional Neural Network (CNN) branch extracts fine-grained features and the Transformer branch extracts global semantic features. Specifically, the CNN branch first extracts fine-grained features at multiple scales, and the features at each scale are then fused with the global semantic features extracted by the Transformer branch at the corresponding scale. This design captures both shallow, local fine-grained representations and deep, global information. In addition, before multilevel feature fusion, a dark feature enhancement module enhances feature details in the darker (shadowed) regions of the image. To validate the method, experiments are conducted on 2 425 laryngoscopic surgical images collected from different medical institutions, and the results are compared with those of nine recently proposed methods, demonstrating the superiority of the proposed approach.

Key words: electronic laryngoscope, image segmentation, hybrid two-stream network, multi-level feature fusion, dark feature enhancement
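The two-stream idea in the abstract (a CNN branch for multi-scale local detail, a Transformer branch for global context, and fusion of the two at each scale) can be illustrated with a minimal NumPy sketch. It is not the author's network: the abstract gives no layer configurations, so the function names (`cnn_branch`, `transformer_branch`, `fuse`), all shapes, and the random, untrained weights are hypothetical. The stand-in CNN branch keeps 3x3-convolution feature maps at 1/2, 1/4, and 1/8 resolution; the stand-in Transformer branch embeds 8x8 patches and applies one global self-attention layer; each CNN scale is then fused with the upsampled global map by concatenation and a 1x1 projection, which is only one plausible reading of the multi-scale fusion module in Fig.3.

```python
import numpy as np

rng = np.random.default_rng(0)  # random, untrained weights: illustration only


def conv3x3(x, w):
    """3x3 'same' convolution of a (C_in, H, W) map with weights (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero padding keeps H and W
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def conv1x1(x, w):
    """1x1 convolution: mix the channels of a (C_in, H, W) map with a (C_out, C_in) matrix."""
    return np.einsum("oc,chw->ohw", w, x)


def avg_pool2(x):
    """2x2 average pooling with stride 2 on a (C, H, W) map (H, W even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def cnn_branch(img, channels=8, num_scales=3):
    """Hypothetical CNN branch: conv + ReLU + pooling, keeping the map at every scale."""
    feats, x = [], img
    for _ in range(num_scales):
        w = rng.standard_normal((channels, x.shape[0], 3, 3)) * 0.1
        x = avg_pool2(np.maximum(conv3x3(x, w), 0.0))
        feats.append(x)  # 1/2, 1/4, 1/8 of the input resolution
    return feats


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def transformer_branch(img, patch=8, dim=16):
    """Hypothetical Transformer branch: patch embedding + one self-attention layer
    (no positional embedding or MLP, to keep the sketch short)."""
    c, h, w = img.shape
    gh, gw = h // patch, w // patch
    # Split the image into non-overlapping patches and flatten each one into a token.
    tokens = (img.reshape(c, gh, patch, gw, patch)
                 .transpose(1, 3, 0, 2, 4)
                 .reshape(gh * gw, c * patch * patch))
    x = tokens @ (rng.standard_normal((tokens.shape[1], dim)) * 0.05)
    wq, wk, wv = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
    attn = softmax((x @ wq) @ (x @ wk).T / np.sqrt(dim))  # every token attends to all tokens
    x = x + attn @ (x @ wv)  # residual connection
    return x.T.reshape(dim, gh, gw)  # tokens back to a (dim, gh, gw) grid


def fuse(cnn_feat, global_feat):
    """Fuse one CNN scale with the global map: upsample, concatenate, 1x1 projection."""
    s = cnn_feat.shape[1] // global_feat.shape[1]
    up = global_feat.repeat(s, axis=1).repeat(s, axis=2)  # nearest-neighbour upsampling
    cat = np.concatenate([cnn_feat, up], axis=0)
    w = rng.standard_normal((cnn_feat.shape[0], cat.shape[0])) * 0.1
    return conv1x1(cat, w)  # back to the CNN channel count


img = rng.random((3, 32, 32))  # toy RGB image, channels first
local_feats = cnn_branch(img)
global_feat = transformer_branch(img)  # (16, 4, 4): one token per 8x8 patch
fused = [fuse(f, global_feat) for f in local_feats]
print([f.shape for f in fused])  # [(8, 16, 16), (8, 8, 8), (8, 4, 4)]
```

Concatenation followed by a 1x1 projection is used here only because it keeps the CNN channel count at every scale; attention-based or gated fusion would be equally reasonable choices for such a sketch.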

Abstract (translated from the Chinese version):

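During electronic laryngoscopy, the morphology of lesions and organs changes in many ways as the lens moves, and the boundaries between lesions, organs, and mucosal tissue are unclear, so the accuracy of simultaneous image segmentation of lesions and the major laryngeal organs is unsatisfactory. To address this problem, a CNN-Transformer two-stream hybrid network is proposed. In this network, the Convolutional Neural Network (CNN) branch extracts fine-grained features, while the Transformer branch extracts global semantic features. Specifically, the hybrid network uses the CNN to mine fine-grained features at multiple scales in the image and then fuses the CNN features extracted at each scale with the global semantic features extracted by the Transformer branch at the corresponding scale. This two-stream hybrid structure effectively captures shallow, local detail while remaining sensitive to deep features and global information. In addition, before multilevel feature fusion, a dark feature enhancement module strengthens the feature details of shadowed image regions to maintain segmentation accuracy. To verify the effectiveness of the method, experiments are conducted on 2 425 laryngoscopic surgical images from different medical institutions, and the method is compared with nine recently proposed methods; the experimental results demonstrate the advantages of the proposed method.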

Key words: electronic laryngoscope, image segmentation, hybrid two-stream network, multi-scale feature fusion, dark feature enhancement
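The abstract says a dark feature enhancement module strengthens details in shadowed regions before fusion, but the module itself (Fig.4) is not specified here. The following NumPy sketch shows one hypothetical way such a step could work, assuming the enhancement is driven by image brightness: a Rec. 601 luma map Y is pooled to the feature resolution, turned into a darkness weight 1 - Y, and used to amplify features as F' = F * (1 + alpha * (1 - Y)). The names `dark_enhance`, `pool_to`, and `alpha` are illustrative only and are not taken from the paper.

```python
import numpy as np


def luminance(img):
    """Rec. 601 luma of a (3, H, W) RGB image with values in [0, 1]."""
    r, g, b = img
    return 0.299 * r + 0.587 * g + 0.114 * b


def pool_to(mask, h, w):
    """Average-pool an (H, W) map to (h, w); assumes H and W are multiples of h and w."""
    H, W = mask.shape
    return mask.reshape(h, H // h, w, W // w).mean(axis=(1, 3))


def dark_enhance(feat, img, alpha=1.0):
    """Hypothetical dark-region enhancement: F' = F * (1 + alpha * (1 - Y))."""
    _, h, w = feat.shape
    darkness = 1.0 - pool_to(luminance(img), h, w)  # near 1 in dark regions, near 0 in bright ones
    return feat * (1.0 + alpha * darkness)[None, :, :]  # broadcast over channels


img = np.full((3, 32, 32), 0.9)  # bright image...
img[:, :, :16] = 0.1             # ...with a dark left half
feat = np.ones((4, 8, 8))        # dummy features at 1/4 resolution
out = dark_enhance(feat, img)
print(out[0, 0, 0].round(2), out[0, 0, 7].round(2))  # 1.9 1.1
```

With `alpha=0` the features pass through unchanged, so the strength of the boost is a single, easily tuned parameter in this sketch; a trained module would more plausibly learn such a weighting than fix it from luma.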

LI Baiya. CNN-Transformer-Based Lesion and Organ Segmentation Network for Electronic Laryngoscope[J]. Computer Engineering, 2025, 51(6): 327-337.



URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0070192

https://www.ecice06.com/EN/Y2025/V51/I6/327

Figures/Tables 15

Fig.1 Segmentation results of CNN and Transformer

Fig.2 CNN-Transformer hybrid two-stream network framework

Fig.3 Structure of multi-scale feature fusion module

Fig.4 Structure of dark feature enhancement module

Fig.5 Dataset samples

Fig.6 Comparisons of segmentation results for representative examples

Fig.7 Comparison of segmentation results for two cases

Fig.8 Visualization results of ablation experiment

Fig.9 MeanDice and the number of parameters at different network depths

Fig.10 Accuracy of each class at different network depths

Fig.11 Gradient heat map analysis of different cases
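Fig.9 and Fig.10 report MeanDice and per-class accuracy, but the exact metric definitions are not given on this page. A minimal sketch, assuming the common per-class Dice coefficient 2|P ∩ G| / (|P| + |G|) computed on integer label masks and averaged over all classes (background included); the `eps` term, which scores a class absent from both masks as 1, is also an assumption.

```python
import numpy as np


def dice_per_class(pred, target, num_classes, eps=1e-7):
    """Dice = 2|P ∩ G| / (|P| + |G|) for each class label in integer masks."""
    scores = []
    for c in range(num_classes):
        p, g = pred == c, target == c
        inter = np.logical_and(p, g).sum()
        scores.append((2.0 * inter + eps) / (p.sum() + g.sum() + eps))
    return np.array(scores)


def mean_dice(pred, target, num_classes):
    """Unweighted mean of the per-class Dice scores."""
    return dice_per_class(pred, target, num_classes).mean()


# Toy 4x4 masks with classes 0 (background), 1 (lesion), 2 (organ).
target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 2, 2],
                   [2, 2, 2, 2]])
pred = target.copy()
pred[0, 2] = 0  # one lesion pixel mislabelled as background
print(round(mean_dice(pred, target, 3), 4))  # 0.9153
```

Here class 0 scores 8/9, class 1 scores 6/7, and class 2 scores 1, so the unweighted mean is about 0.9153; whether the paper excludes background or weights classes differently cannot be determined from this page.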

