
Computer Engineering, 2024, Vol. 50, Issue (4): 197-207. doi: 10.19678/j.issn.1000-3428.0067217

• Graphics and Image Processing •

Metro Platform Foreign Object Detection Based on Dual-channel Transformer

Ruikang LIU*, Weiming LIU, Mengfei DUAN, Wei XIE, Yuan DAI

  1. School of Civil Engineering and Transportation, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2023-03-20 Online: 2024-04-15 Published: 2023-08-09
  • Contact: Ruikang LIU

  • Funding: National Key Research and Development Program of China during the 13th Five-Year Plan Period (2016YFB1200402)

Abstract:

Transformers have recently achieved more competitive results than Convolutional Neural Networks (CNNs) in foreign object detection owing to their global self-attention. However, they still face high computational cost, a fixed input image patch scale, and insufficient interaction between local and global information. To address these challenges, this paper proposes DualFormer, a model built on a dual-channel Transformer backbone, pyramid lightweight Transformer blocks, and a channel cross-attention mechanism, for detecting foreign objects in the gap between metro platform screen doors and train doors. To handle the fixed patch size, a dual-channel strategy uses two feature extraction channels that process input image patches at different scales, strengthening the network's ability to extract both coarse-grained and fine-grained features and improving recognition accuracy for multiscale targets. To reduce computational cost, the pyramid lightweight Transformer block introduces cascaded convolutions into the Multi-Head Self-Attention (MHSA) module and exploits the dimensionality compression of convolution to lower the model's computational load. To enrich the interaction between local and global information, the channel cross-attention mechanism lets coarse-grained and fine-grained features interact at the channel level, optimizing the weighting of local and global information within the network. Experiments on a standardized metro foreign object detection dataset show that DualFormer reaches a mean average precision of 89.7% at 24 frames/s with 1.98×10⁷ parameters, outperforming existing Transformer-based detection algorithms.
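The abstract describes the three mechanisms only at a high level, so the sketches below are illustrative PyTorch reconstructions rather than the authors' implementation; all class names, dimensions, and patch sizes are assumptions. The first sketch shows the dual-channel idea: embedding the same image at two patch scales, so one channel carries coarse-grained (global) tokens and the other fine-grained (local) tokens:

    import torch
    import torch.nn as nn

    class DualChannelPatchEmbedding(nn.Module):
        # Embed one image at two patch scales: larger patches for
        # coarse-grained features, smaller patches for fine-grained ones.
        # Patch sizes 8 and 4 are illustrative, not from the paper.
        def __init__(self, in_channels=3, dim=64, coarse_patch=8, fine_patch=4):
            super().__init__()
            # A patch embedding is a convolution whose kernel size and
            # stride both equal the patch size.
            self.coarse = nn.Conv2d(in_channels, dim, coarse_patch, stride=coarse_patch)
            self.fine = nn.Conv2d(in_channels, dim, fine_patch, stride=fine_patch)

        def forward(self, img):
            # img: (B, 3, H, W) -> (B, dim, H/8, W/8) and (B, dim, H/4, W/4)
            return self.coarse(img), self.fine(img)

    # Example: a 224x224 image yields 28x28 coarse and 56x56 fine token grids.
    coarse, fine = DualChannelPatchEmbedding()(torch.randn(1, 3, 224, 224))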
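The pyramid lightweight Transformer block is said to introduce cascaded convolutions into MHSA and use their dimensionality compression to cut cost. A common way to realize this (used, for example, in pyramid vision Transformers) is to shrink the key/value token grid with strided convolutions before attention; the sketch below assumes that interpretation. Compressing the K/V grid by a factor r reduces the attention matrix from N×N to N×(N/r²), which is where the saving comes from:

    import torch
    import torch.nn as nn

    class PyramidLightweightMHSA(nn.Module):
        # MHSA whose keys/values come from a spatially compressed copy of
        # the input, produced by cascaded stride-2 convolutions.
        def __init__(self, dim=64, num_heads=4, reduction=4):
            super().__init__()
            assert reduction & (reduction - 1) == 0, "reduction must be a power of 2"
            self.num_heads = num_heads
            self.scale = (dim // num_heads) ** -0.5
            self.q = nn.Linear(dim, dim)
            self.kv = nn.Linear(dim, 2 * dim)
            convs = []
            while reduction > 1:  # cascade of stride-2 convolutions
                convs.append(nn.Conv2d(dim, dim, 3, stride=2, padding=1))
                reduction //= 2
            self.compress = nn.Sequential(*convs)
            self.proj = nn.Linear(dim, dim)

        def forward(self, x, h, w):
            # x: (B, N, C) token sequence with N == h * w
            b, n, c = x.shape
            d = c // self.num_heads
            q = self.q(x).reshape(b, n, self.num_heads, d).transpose(1, 2)
            # Compress the token grid, then form K and V from far fewer tokens.
            kv = self.compress(x.transpose(1, 2).reshape(b, c, h, w))
            kv = self.kv(kv.flatten(2).transpose(1, 2))      # (B, N', 2C)
            k, v = kv.chunk(2, dim=-1)
            k = k.reshape(b, -1, self.num_heads, d).transpose(1, 2)
            v = v.reshape(b, -1, self.num_heads, d).transpose(1, 2)
            attn = (q @ k.transpose(-2, -1)) * self.scale    # (B, heads, N, N')
            out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, n, c)
            return self.proj(out)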
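For the channel cross-attention mechanism, the abstract only states that coarse- and fine-grained features interact at the channel level to rebalance local and global information. A minimal squeeze-and-excitation-style reading, assumed here, derives channel weights for each branch from the pooled statistics of the other:

    import torch
    import torch.nn as nn

    class ChannelCrossAttention(nn.Module):
        # Each branch is reweighted along its channels using pooled
        # statistics of the *other* branch, so local (fine) and global
        # (coarse) information exchange importance at the channel level.
        def __init__(self, dim=64):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.gate_coarse = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
            self.gate_fine = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

        def forward(self, coarse, fine):
            # coarse, fine: (B, C, H, W), assumed already resized to a
            # common resolution and channel count
            b, c, _, _ = coarse.shape
            w_c = self.gate_coarse(self.pool(fine).flatten(1)).view(b, c, 1, 1)
            w_f = self.gate_fine(self.pool(coarse).flatten(1)).view(b, c, 1, 1)
            return coarse * w_c + fine * w_f  # fused feature map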

Key words: Vision Transformer (ViT), foreign object detection, dual-channel strategy, pyramid lightweight Transformer block, attention fusion
