Marine Zooplankton Instance Segmentation Method Based on SAM

doi:10.19678/j.issn.1000-3428.0253370

Abstract

The Segment Anything Model (SAM) has been widely applied in diverse downstream tasks. Marine zooplankton have complex morphology, high transparency, and widely varying sizes across species, which makes it difficult for existing segmentation models to adapt and results in low segmentation accuracy. Moreover, the lack of marine zooplankton image datasets with pixel-level instance annotations has impeded the exploration of SAM for segmentation tasks in this field. To address these issues, this paper constructs the Marine Zooplankton Instance Segmentation (MZIS) dataset with fine-grained pixel-level annotations, which contains 1908 zooplankton images covering 25 species categories. It further proposes MZIS-SAM, a SAM-based instance segmentation method for marine zooplankton scenes. Specifically, to compensate for the lack of semantic category information about marine zooplankton, MZIS-SAM first introduces a Zooplankton Microimages Adaptive ViT (ZMA-ViT) encoder that extracts visual feature prompts of zooplankton and incorporates them into the network. Next, a Multi-Scale Dilated Attention Aggregation Module (MDAAM) progressively integrates multi-level features from the encoder to enhance multi-scale feature representation. Finally, a Feature Prompt Generation Module (FPGM) automatically generates visual feature prompts, enabling end-to-end prediction of instance segmentation masks. Experimental results on the MZIS dataset show that, compared with existing instance segmentation methods, MZIS-SAM achieves state-of-the-art performance, with scores of 77.0%, 97.7%, and 85.8% on the three reported metrics, respectively.

Zhongwei Li, Siyuan Nie, Leiquan Wang, Dekun Yuan, Yanping Qi. Marine Zooplankton Instance Segmentation Method Based on SAM[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0253370.

