[1] Bi Y, Jiang H, Hu Y, et al. See and learn more: Dense caption-aware representation for visual question answering[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(2): 1135-1146.
[2] Chen Q H, Xiang S X, Fang X, et al. A visual question answering method with cross-modal adaptive feature fusion[J/OL]. Journal of Harbin Institute of Technology, 1-13 [2025-03-25]. http://kns.cnki.net/kcms/detail/23.1235.T.20250314.1036.004.html. (in Chinese)
[3] Ge Y L, Sun H C, Yuan D Y. A visual question answering model integrating multimodal knowledge and supervised retrieval[J/OL]. Journal of Frontiers of Computer Science and Technology, 1-17 [2025-03-25]. (in Chinese)
[4] Ni Q, Liu S, Yu Y Z, et al. A video question answering cognitive model based on multi-angle fusion and joint memory network[J]. Journal of Shanghai Normal University (Natural Sciences), 2024, 53(05): 596-603. DOI: 10.20192/j.cnki.JSHNU(NS).2024.05.003. (in Chinese)
[5] Antol S, Agrawal A, Lu J, et al. VQA: Visual question
answering[C]//Proceedings of the IEEE international
conference on computer vision. 2015: 2425-2433.
[6] Zhang J, Liu X, Wang Z. Latent attention network with position perception for visual question answering[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024: 1-11.
[7] Goyal Y, Khot T, Summers-Stay D, et al. Making the V in
VQA matter: Elevating the role of image understanding in
visual question answering[C]//Proceedings of the IEEE
conference on computer vision and pattern recognition.
2017: 6904-6913.
[8] Gao D, Wang R, Shan S, et al. CRIC: A VQA dataset for
compositional reasoning on vision and commonsense[J].
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2022, 45(5): 5561-5578.
[9] Agrawal A, Batra D, Parikh D, et al. Don't just assume;
look and answer: Overcoming priors for visual question
answering[C]//Proceedings of the IEEE conference on
computer vision and pattern recognition. 2018: 4971-4980.
[10] Han X, Wang S, Su C, et al. Greedy gradient ensemble for
robust visual question answering[C]//Proceedings of the
IEEE/CVF international conference on computer vision.
2021: 1584-1593.
[11] Liu J, Fan C F, Zhou F, et al. Be flexible! Learn to debias
by sampling and prompting for robust visual question
answering[J]. Information Processing & Management,
2023, 60(3): 103296.
[12] Cho J W, Kim D J, Ryu H, et al. Generative bias for robust
visual question answering[C]//Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern
Recognition. 2023: 11681-11690.
[13] Nam J, Cha H, Ahn S, et al. Learning from failure: Training debiased classifier from biased classifier[J]. arXiv preprint arXiv:2007.02561, 2020.
[14] Anderson P, He X, Buehler C, et al. Bottom-up and
top-down attention for image captioning and visual
question answering[C]//Proceedings of the IEEE
conference on computer vision and pattern recognition.
2018: 6077-6086.
[15] Tan H, Bansal M. LXMERT: Learning cross-modality
encoder representations from transformers[J]. arXiv
preprint arXiv:1908.07490, 2019.
[16] Cadene R, Dancette C, et al. RUBi: Reducing unimodal
biases for visual question answering[J]. Advances in
neural information processing systems, 2019, 32.
[17] Chen L, Yan X, Xiao J, et al. Counterfactual samples
synthesizing for robust visual question
answering[C]//Proceedings of the IEEE/CVF conference
on computer vision and pattern recognition. 2020:
10800-10809.
[18] Si Q, et al. Towards robust visual question answering:
Making the most of biased samples via contrastive
learning[J]. arXiv preprint arXiv:2210.04563, 2022.
[19] Liu Y, Guo Y, Yin J, et al. Answer questions with right
image regions: A visual attention regularization
approach[J]. ACM Transactions on Multimedia Computing,
Communications, and Applications (TOMM), 2022, 18(4):
1-18.
[20] Clark C, Yatskar M, Zettlemoyer L. Don't take the easy
way out: Ensemble based methods for avoiding known
dataset biases[J]. arXiv preprint arXiv:1909.03683, 2019.
[21] Han X, Wang S, Su C, et al. General greedy de-bias
learning[J]. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 2023, 45(8): 9789-9805.
[22] Kolling C, More M, Gavenski N, et al. Efficient
counterfactual debiasing for visual question
answering[C]//Proceedings of the IEEE/CVF winter
conference on applications of computer vision. 2022: 3001-3010.
[23] Chen L, Zheng Y, Xiao J. Rethinking data augmentation
for robust visual question answering[C]//European
conference on computer vision. Cham: Springer Nature
Switzerland, 2022: 95-112.
[24] Liang Z, Jiang W, Hu H, et al. Learning to contrast the
counterfactual samples for robust visual question
answering[C]//Proceedings of the 2020 conference on
empirical methods in natural language processing
(EMNLP). 2020: 3285-3292.
[25] Zhu X, Mao Z, Liu C, et al. Overcoming language priors
with self-supervised learning for visual question
answering[J]. arXiv preprint arXiv:2012.11528, 2020.
[26] Si Q, Lin Z, Zheng M, et al. Check it again: Progressive
visual question answering via visual entailment[J]. arXiv
preprint arXiv:2106.04605, 2021.
[27] Guo Y, Nie L, Cheng Z, et al. Loss re-scaling VQA:
Revisiting the language prior problem from a
class-imbalance view[J]. IEEE Transactions on Image
Processing, 2021, 31: 227-238.
[28] Basu A, Addepalli S, Babu R V. RMLVQA: A margin loss
approach for visual question answering with language
biases[C]//Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. 2023:
11671-11680.
[29] Han S, Pool J, Tran J, et al. Learning both weights and
connections for efficient neural network[J]. Advances in
neural information processing systems, 2015, 28.
[30] Molchanov P, Tyree S, Karras T, et al. Pruning
convolutional neural networks for resource efficient
inference[J]. arXiv preprint arXiv:1611.06440, 2016.
[31] Liu Z, Li J, Shen Z, et al. Learning efficient convolutional
networks through network slimming[C]//Proceedings of
the IEEE international conference on computer vision.
2017: 2736-2744.
[32] Zhu M, Gupta S. To prune, or not to prune: exploring the
efficacy of pruning for model compression[J]. arXiv
preprint arXiv:1710.01878, 2017.
[33] Frankle J, Carbin M. The lottery ticket hypothesis: Finding
sparse, trainable neural networks[J]. arXiv preprint
arXiv:1803.03635, 2018.
[34] Sanh V, Wolf T, Belinkov Y, et al. Learning from others'
mistakes: Avoiding dataset biases without modeling
them[J]. arXiv preprint arXiv:2012.01300, 2020.
[35] Jang E, Gu S, Poole B. Categorical reparameterization
with Gumbel-Softmax[J]. arXiv preprint arXiv:1611.01144,
2016.
[36] Girshick R. Fast R-CNN[J]. arXiv preprint arXiv:1504.08083,
2015.
[37] Pennington J, Socher R, Manning C D. GloVe: Global
vectors for word representation[C]//Proceedings of the
2014 conference on empirical methods in natural language
processing (EMNLP). 2014: 1532-1543.
[38] Cho K, Van Merrienboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv preprint arXiv:1406.1078, 2014.
[39] Yang Z, He X, Gao J, et al. Stacked attention networks for
image question answering[C]//Proceedings of the IEEE
conference on computer vision and pattern recognition.
2016: 21-29.
[40] Kim J H, Jun J, Zhang B T. Bilinear attention networks[J].
Advances in neural information processing systems, 2018,
31.
[41] Bi Y, Jiang H, Hu Y, et al. See and learn more: Dense caption-aware representation for visual question answering[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(2): 1135-1146.
[42] Agrawal A, Batra D, Parikh D, et al. Don't just assume;
look and answer: Overcoming priors for visual question
answering[C]//Proceedings of the IEEE conference on
computer vision and pattern recognition. 2018: 4971-4980.
[43] Bi Y, Jiang H, Hu Y, et al. Fair attention network for robust visual question answering[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024: 1-12.
[44] Pan Y, Liu J, Jin L, et al. Unbiased visual question answering by leveraging instrumental variable[J]. IEEE Transactions on Multimedia, 2024.