
Computer Engineering ›› 2024, Vol. 50 ›› Issue (2): 1-14. doi: 10.19678/j.issn.1000-3428.0067514

• Research Hotspots and Reviews •

Survey of Text-based Visual Question Answering

Guide ZHU, Hai HUANG*

  1. School of Computer Science and Technology (School of Artificial Intelligence), Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China
  • Received: 2023-04-26 Online: 2024-02-15 Published: 2024-02-21
  • Contact: Hai HUANG
  • Supported by: National Natural Science Foundation of China General Program (No. 62272416)

Abstract:

Traditional Visual Question Answering (VQA) focuses only on the visual object information in an image and ignores the text information that the image contains. Text-based Visual Question Answering (TextVQA) attends to the text information in the image in addition to the visual information, allowing questions to be answered more accurately and efficiently. In recent years, TextVQA has become a research focal point in the multimodal field, and it has important application prospects in text-rich scenarios such as autonomous driving and scene understanding. This paper describes the concept of TextVQA along with its open problems and challenges, and systematically analyzes the TextVQA task in terms of methods, datasets, and future research directions. The analysis centers on existing TextVQA methods, which are summarized into three stages: feature extraction, feature fusion, and answer prediction. According to the method used in the fusion stage, TextVQA approaches are described from three perspectives: simple attention methods, Transformer-based methods, and pre-training-based methods. The advantages and disadvantages of the different methods are summarized, and their performance on public datasets is analyzed and compared. Four common public datasets are introduced, and their characteristics and evaluation metrics are analyzed. Finally, this paper discusses the problems and challenges facing the TextVQA task and outlines promising directions for future research.
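
To make the three-stage pipeline concrete, the sketch below shows one minimal way the stages could fit together in PyTorch. It is illustrative only: the module names, feature dimensions, and the plain classifier head are assumptions, not the design of any surveyed model. Real systems obtain the stage-one features from pretrained object detectors, OCR systems, and word embeddings, and the prediction stage of modern TextVQA models typically augments the fixed answer vocabulary with a dynamic pointer that can copy OCR tokens into the answer.

import torch
import torch.nn as nn

class TextVQASketch(nn.Module):
    """Hypothetical three-stage TextVQA model: extraction -> fusion -> prediction."""

    def __init__(self, vocab_size=5000, d_model=256, num_answers=3000):
        super().__init__()
        # Stage 1: feature extraction (stand-ins for pretrained encoders).
        self.question_embed = nn.Embedding(vocab_size, d_model)
        self.visual_proj = nn.Linear(2048, d_model)  # detector region features
        self.ocr_proj = nn.Linear(300, d_model)      # OCR token features (e.g. word vectors)
        # Stage 2: feature fusion via a Transformer encoder over the
        # concatenated question / visual / OCR token sequence.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=4)
        # Stage 3: answer prediction. Only a fixed-vocabulary classifier is
        # sketched; real models add a pointer over OCR tokens so that scene
        # text can appear in the answer.
        self.classifier = nn.Linear(d_model, num_answers)

    def forward(self, question_ids, visual_feats, ocr_feats):
        q = self.question_embed(question_ids)  # (B, Lq, d)
        v = self.visual_proj(visual_feats)     # (B, Lv, d)
        o = self.ocr_proj(ocr_feats)           # (B, Lo, d)
        tokens = torch.cat([q, v, o], dim=1)   # joint multimodal sequence
        fused = self.fusion(tokens)
        # Pool the fused sequence and score the answer vocabulary.
        return self.classifier(fused.mean(dim=1))

# Usage with random stand-in inputs:
model = TextVQASketch()
logits = model(
    torch.randint(0, 5000, (2, 12)),  # tokenized question
    torch.randn(2, 36, 2048),         # 36 detected visual regions
    torch.randn(2, 20, 300),          # 20 OCR tokens
)
print(logits.shape)  # torch.Size([2, 3000])

The simple attention, Transformer-based, and pre-training families surveyed here differ mainly in how the fusion stage above is realized, while the overall extraction-fusion-prediction structure is shared.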

Key words: Text-based Visual Question Answering (TextVQA), text information, natural language processing, computer vision, multimodal fusion
