Lin Junkai, Yu Jinghu, Wang Qimeng, Zhu Fangyong, Xu Haifeng
Accepted: 2026-06-17
Oral diseases are a serious public health burden, and timely, effective diagnosis and treatment are important for reducing the risk of disease progression. Conventional diagnosis of oral diseases relies mainly on manual interpretation of imaging data by experienced clinicians, which is often time-consuming and may overlook lesions with blurred boundaries. Image segmentation techniques are therefore needed to assist the clinical diagnosis of dental diseases. Dental panoramic radiographs present the overall morphology of the teeth and jawbone in a single image and are commonly used in clinical dental diagnosis. However, these radiographs commonly suffer from low gray-level contrast, blurred lesion boundaries, noise, and artifacts, so multi-class segmentation of dental diseases, including dental caries, periapical periodontitis, furcation involvement, and impacted teeth, remains highly challenging. To address these issues, this paper proposes Teeth-Net, a network for multi-class dental disease segmentation in dental panoramic radiographs. Built on the TransUNet architecture, Teeth-Net introduces targeted improvements in three key stages: feature extraction, feature reconstruction, and skip connections. In the feature extraction stage, a Cross-Scale Pyramid Fusion Module (CPFM) is introduced to optimize the original encoder. Multi-scale features are extracted through parallel group convolutions with different receptive fields, and the correlations among features at different scales are modeled with a cross-scale attention mechanism, which enhances the model’s ability to capture small lesions and alleviates the loss of detailed features. In the feature reconstruction stage, a Parallel Multi-Kernel Pooling Module (PMKP) is designed to extract local details and global contextual information in parallel through multi-scale max pooling and average pooling.
Channel compression and feature fusion are then performed to provide richer semantic information for the decoder. At each skip connection, a Spatial-Channel Collaborative Attention (SCCA) module is embedded to adaptively filter the shallow features passed from the encoder using spatial and channel attention, suppressing background noise and improving the quality of cross-layer feature fusion between the encoder and decoder. Comparative and ablation experiments are conducted on a self-built dental panoramic radiograph dataset. Teeth-Net achieves a mean Dice coefficient, Hausdorff distance (HD), precision, and recall of 84.22%, 18.546 mm, 94.13%, and 95.96%, respectively. Compared with the baseline TransUNet model, the mean Dice coefficient, precision, and recall are improved by 3.34, 2.89, and 4.21 percentage points, respectively, while the HD value is reduced by 6.869 mm. These results indicate that the proposed method improves overall segmentation accuracy, boundary consistency, and lesion detection capability. To further evaluate the generalization ability and cross-dataset adaptability of the model, external tests are conducted on two publicly available datasets. On the re-annotated MICCAI 2023 STS external test set, Teeth-Net achieves a mean Dice coefficient, HD value, precision, and recall of 80.26%, 19.520 mm, 92.58%, and 93.41%, respectively. Compared with the baseline TransUNet model, the mean Dice coefficient, precision, and recall are improved by 3.32, 4.33, and 3.89 percentage points, respectively, while the HD value is reduced by 6.705 mm. On the public Multi-Center Dental Panoramic Radiography Image (MCDP) dataset, Teeth-Net achieves a mean Dice coefficient, HD value, precision, and recall of 88.99%, 12.126 mm, 90.61%, and 92.45%, respectively.
Compared with the baseline TransUNet model, the mean Dice coefficient, precision, and recall are improved by 3.83, 4.03, and 3.33 percentage points, respectively, while the HD value is reduced by 7.222 mm. Across the self-built dataset and the two external test datasets, Teeth-Net achieves better segmentation accuracy, boundary delineation, and cross-domain adaptability than the baseline TransUNet model under different data sources and imaging conditions. The proposed method can provide effective technical support for computer-aided diagnosis of multi-class dental diseases in dental panoramic radiographs.
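For readers unfamiliar with the four reported metrics, the following is a minimal sketch of how the Dice coefficient, precision, recall, and Hausdorff distance can be computed for one binary lesion mask. It is not the authors' evaluation code. The function names, the toy masks, and the `spacing` parameter (an assumed isotropic pixel size in mm, used to convert pixel distances to millimetres) are all illustrative assumptions. The abstract does not state whether the maximum HD or a percentile variant is used; the sketch uses the maximum, computed over all foreground pixels.

```python
import math

def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives for binary masks."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn

def dice(pred, target):
    """Dice = 2*TP / (2*TP + FP + FN); defined as 1.0 when both masks are empty."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom

def precision(pred, target):
    """Precision = TP / (TP + FP); 0.0 when nothing is predicted."""
    tp, fp, _ = confusion_counts(pred, target)
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(pred, target):
    """Recall = TP / (TP + FN); 0.0 when the target mask is empty."""
    tp, _, fn = confusion_counts(pred, target)
    return tp / (tp + fn) if (tp + fn) else 0.0

def foreground_points(mask):
    """Return (row, col) coordinates of all non-zero pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]

def hausdorff(pred, target, spacing=1.0):
    """Symmetric (maximum) Hausdorff distance between foreground pixel sets.

    `spacing` is an assumed isotropic pixel size; pass the size in mm to get mm.
    Brute force, O(n*m), which is fine for a small illustration.
    """
    a, b = foreground_points(pred), foreground_points(target)
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty mask")

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return spacing * max(directed(a, b), directed(b, a))

# Toy 3x4 masks (hypothetical data, not taken from the paper).
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
target = [[0, 1, 1, 0],
          [0, 1, 0, 0],
          [0, 1, 0, 0]]

print(dice(pred, target), precision(pred, target), recall(pred, target))
print(hausdorff(pred, target, spacing=0.5))
```

In a multi-class setting, one would typically compute these per class and average them to obtain the "mean" values. Note also that the reported gains are in percentage points, that is, absolute differences: a mean Dice of 84.22% with a 3.34-point improvement implies a baseline of 80.88%.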