
Computer Engineering ›› 2025, Vol. 51 ›› Issue (6): 327-337. doi: 10.19678/j.issn.1000-3428.0070192

• Development Research and Engineering Application •

CNN-Transformer-Based Lesion and Organ Segmentation Network for Electronic Laryngoscope

LI Baiya*

  1. Department of Otolaryngology, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, Shaanxi, China
  • Received: 2024-08-05 Online: 2025-06-15 Published: 2024-12-13
  • Contact: LI Baiya

  • Supported by:
    2022 Xi'an Jiaotong University Teaching Reform Project (22BJ07Z); Shaanxi Province 2023 "Teacher Development Research Program Special Project" (2023JSY027)

Abstract:

In electronic laryngoscopy, the morphology of lesions and organs changes as the lens moves, and the boundaries between lesions, organs, and mucosal tissue are unclear. As a result, simultaneous segmentation of lesions and major laryngeal organs achieves unsatisfactory accuracy. To address this problem, a CNN-Transformer two-stream hybrid network is proposed. The Convolutional Neural Network (CNN) branch extracts fine-grained features, whereas the Transformer branch extracts global semantic features. Specifically, the CNN branch extracts fine-grained features at multiple scales, and the features at each scale are fused with the global semantic features that the Transformer branch extracts at the corresponding scale. This two-stream structure captures shallow, local detail while remaining sensitive to deep features and global information. Before multilevel feature fusion, a dark feature enhancement module strengthens feature details in the shadowed regions of the image to preserve segmentation accuracy. To validate the method, experiments are conducted on 2,425 laryngoscopic surgical images from different medical institutions, and the results are compared with those of nine recently proposed methods, demonstrating the superiority of the proposed approach.

Key words: electronic laryngoscope, image segmentation, hybrid two-stream network, multi-level feature fusion, dark feature enhancement
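
The data flow described in the abstract (enhance dark regions, then fuse the two branches scale by scale) can be illustrated with a minimal sketch. The abstract does not specify how the dark feature enhancement module or the fusion step is implemented, so everything below is an assumption made for illustration: the function names, the `gain` parameter, the choice to enhance only the CNN features, and fusion by plain concatenation are all hypothetical, and the features are toy vectors rather than learned feature maps.

```python
from typing import List

Vector = List[float]


def enhance_dark(features: Vector, brightness: Vector, gain: float = 0.5) -> Vector:
    """Hypothetical stand-in for a dark feature enhancement step.

    Each feature is scaled up in proportion to how dark its region is.
    brightness values lie in [0, 1], where 0 is the darkest.
    """
    if len(features) != len(brightness):
        raise ValueError("features and brightness must have the same length")
    return [f * (1.0 + gain * (1.0 - b)) for f, b in zip(features, brightness)]


def fuse_scale(cnn_feat: Vector, transformer_feat: Vector) -> Vector:
    """Assumed fusion rule: concatenate the two branches at one scale."""
    return cnn_feat + transformer_feat


def fuse_multiscale(
    cnn_feats: List[Vector],
    transformer_feats: List[Vector],
    brightness: List[Vector],
) -> List[Vector]:
    """Enhance the CNN features at every scale, then fuse each scale with
    the Transformer features of the corresponding scale."""
    if not (len(cnn_feats) == len(transformer_feats) == len(brightness)):
        raise ValueError("all inputs must describe the same number of scales")
    fused = []
    for c, t, b in zip(cnn_feats, transformer_feats, brightness):
        fused.append(fuse_scale(enhance_dark(c, b), t))
    return fused
```

In this sketch each scale is handled independently, which mirrors the abstract's statement that CNN features are fused with Transformer features of the corresponding scale; a real network would use learned layers for both the enhancement and the fusion.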