[1] HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D. C., USA: IEEE Press, 2017: 2980-2988. DOI: 10.1109/ICCV.2017.322

[2] HUANG Z J, HUANG L C, GONG Y C, et al. Mask scoring R-CNN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2019: 6402-6411. DOI: 10.48550/arXiv.1903.00241

[3] CAI Z W, VASCONCELOS N. Cascade R-CNN: delving into high quality object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2018: 6154-6162. DOI: 10.1109/CVPR.2018.00644

[4] BOLYA D, ZHOU C, XIAO F Y, et al. YOLACT: real-time instance segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D. C., USA: IEEE Press, 2019: 9156-9165. DOI: 10.1109/ICCV.2019.00925

[5] BOLYA D, ZHOU C, XIAO F Y, et al. YOLACT++: better real-time instance segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(2): 1108-1121. DOI: 10.1109/TPAMI.2020.3014297

[6]

[7]

[8] TIAN Z, ZHANG B W, CHEN H, et al. Instance and panoptic segmentation using conditional convolutions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 669-680. DOI: 10.1109/TPAMI.2022.3145407

[9] CHENG B W, MISRA I, SCHWING A G, et al. Masked-attention mask transformer for universal image segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2022: 1280-1289. DOI: 10.1109/CVPR52688.2022.00135

[10] CHENG B W, SCHWING A G, KIRILLOV A. Per-pixel classification is not all you need for semantic segmentation[EB/OL]. [2023-05-10]. https://arxiv.org/abs/2107.06278

[11] TIAN Z, SHEN C H, CHEN H, et al. FCOS: fully convolutional one-stage object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D. C., USA: IEEE Press, 2019: 9626-9635. DOI: 10.48550/arXiv.1904.01355

[12] WU J Z, LIU B, ZHANG H, et al. Fault detection based on fully convolutional networks (FCN)[J]. Journal of Marine Science and Engineering, 2021, 9(3): 259. DOI: 10.3390/jmse9030259

[13] LIANG F, WU B C, DAI X L, et al. Open-vocabulary semantic segmentation with mask-adapted CLIP[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2023: 7061-7070.

[14] ZAREIAN A, DELA ROSA K, HU D H, et al. Open-vocabulary object detection using captions[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2021: 14388-14397.

[15] ZHANG Z, ZHAO Z, LIN Z, et al. Counterfactual contrastive learning for weakly-supervised vision-language grounding[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2020: 18123-18134.

[16] DAI X Y, CHEN Y P, YANG J W, et al. Dynamic DETR: end-to-end object detection with dynamic attention[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D. C., USA: IEEE Press, 2021: 2968-2977. DOI: 10.1109/ICCV48922.2021.00298

[17] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]//Proceedings of MICCAI 2015. Berlin, Germany: Springer, 2015: 234-241. DOI: 10.1007/978-3-319-24574-4_28

[18] LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D. C., USA: IEEE Press, 2021: 9992-10002. DOI: 10.1109/ICCV48922.2021.00986

[19]

[20] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[EB/OL]. [2023-05-10]. https://arxiv.org/abs/2010.11929

[21] FANG W, CHEN Y P, XUE Q Y. Survey on research of RNN-based spatio-temporal sequence prediction algorithms[J]. Journal on Big Data, 2021, 3(3): 97-110. DOI: 10.32604/jbd.2021.016993

[22] SMAGULOVA K, JAMES A P. A survey on LSTM memristive neural network architectures and applications[J]. The European Physical Journal Special Topics, 2019, 228(10): 2313-2324. DOI: 10.1140/epjst/e2019-900046-x

[23] HAN K, WANG Y H, CHEN H T, et al. A survey on vision transformer[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 87-110. DOI: 10.1109/TPAMI.2022.3152247

[24] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. DOI: 10.1109/TPAMI.2016.2577031

[25] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2016: 770-778. DOI: 10.1109/CVPR.2016.90

[26] TIAN Z, SHEN C H, CHEN H. Conditional convolutions for instance segmentation[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin, Germany: Springer, 2020: 282-298. DOI: 10.48550/arXiv.2003.05664

[27] KIRILLOV A, WU Y X, HE K M, et al. PointRend: image segmentation as rendering[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2020: 9796-9805.

[28] LI F, ZHANG H, LIU S L, et al. DN-DETR: accelerate DETR training by introducing query denoising[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2022: 1-10. DOI: 10.48550/arXiv.2203.01305

[29] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin, Germany: Springer, 2014: 740-755. DOI: 10.48550/arXiv.1405.0312

[30] CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2016: 3213-3223. DOI: 10.48550/arXiv.1604.01685

[31]

[32] GHIASI G, CUI Y, SRINIVAS A, et al. Simple copy-paste is a strong data augmentation method for instance segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D. C., USA: IEEE Press, 2021: 2917-2927. DOI: 10.48550/arXiv.2012.07177

[33]

[34]