[1] LIN Z H, ZHANG D H, TAO Q Y, et al. Medical visual question answering: a survey[J]. Artificial Intelligence in Medicine, 2023, 143: 102611. doi: 10.1016/j.artmed.2023.102611
[2] ISHMAM M F, SHOVON M S H, MRIDHA M F, et al. From image to language: a critical analysis of Visual Question Answering (VQA) approaches, challenges, and opportunities[J]. Information Fusion, 2024, 106: 102270.
[3]
[4] NGUYEN B D, DO T T, NGUYEN B X, et al. Overcoming data limitation in medical visual question answering[C]//Proceedings of the 22nd International Conference on Medical Image Computing and Computer Assisted Intervention. Berlin, Germany: Springer, 2019: 522-530.
[5] FINN C, ABBEEL P, LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//Proceedings of the International Conference on Machine Learning. [S. l. ]: PMLR, 2017: 1126-1135.
[6] MASCI J, MEIER U, CIRESAN D, et al. Stacked convolutional auto-encoders for hierarchical feature extraction[C]//Proceedings of the 21st International Conference on Artificial Neural Networks and Machine Learning. Berlin, Germany: Springer, 2011: 52-59.
[7] CONG F Z, XU S B, GUO L, et al. Anomaly matters: an anomaly-oriented model for medical visual question answering[J]. IEEE Transactions on Medical Imaging, 2022, 41(11): 3385-3397. doi: 10.1109/TMI.2022.3185113
[8] GONG H F, CHEN G Q, LIU S S, et al. Cross-modal self-attention with multi-task pre-training for medical visual question answering[C]//Proceedings of the 2021 International Conference on Multimedia Retrieval. New York, USA: ACM Press, 2021: 456-460.
[9] KHARE Y, BAGAL V, MATHEW M, et al. MMBERT: multimodal BERT pretraining for improved medical VQA[C]//Proceedings of the 18th International Symposium on Biomedical Imaging (ISBI). Washington D.C., USA: IEEE Press, 2021: 1033-1036.
[10]
[11] MOON J H, LEE H, SHIN W, et al. Multi-modal understanding and generation for medical images and text via vision-language pre-training[J]. IEEE Journal of Biomedical and Health Informatics, 2022, 26(12): 6070-6080. doi: 10.1109/JBHI.2022.3207502
[12] LI P F, LIU G, TAN L, et al. Self-supervised vision-language pretraining for medial visual question answering[C]//Proceedings of the IEEE 20th International Symposium on Biomedical Imaging (ISBI). Washington D.C., USA: IEEE Press, 2023: 1-5.
[13] RÜCKERT J, BEN ABACHA A, DE HERRERA A G S, et al. Overview of ImageCLEFmedical 2022 – caption prediction and concept detection[C]//Working Notes of CLEF 2022, CEUR Workshop Proceedings. [S. l. ]: CEUR-WS.org, 2022: 1294-1307.
[14] CHEN Z H, DU Y H, HU J P, et al. Mapping medical image-text to a joint space via masked modeling[J]. Medical Image Analysis, 2024, 91: 103018. doi: 10.1016/j.media.2023.103018
[15] SUBRAMANIAN S, WANG L L, MEHTA S, et al. MedICaT: a dataset of medical images, captions, and textual references[EB/OL]. [2024-05-17]. https://arxiv.org/abs/2010.06000.
[16] WU Z Q, XIE Q, LI L, et al. Graph neural network recommendation algorithm based on multimodal fusion[J]. Computer Engineering, 2024, 50(1): 91-100 (in Chinese). doi: 10.19678/j.issn.1000-3428.0066929
[17] ZHAN L M, LIU B, FAN L, et al. Medical visual question answering via conditional reasoning[C]//Proceedings of the 28th ACM International Conference on Multimedia. New York, USA: ACM Press, 2020: 2345-2354.
[18] PAN H W, HE S N, ZHANG K J, et al. AMAM: an attention-based multimodal alignment model for medical visual question answering[J]. Knowledge-Based Systems, 2022, 255: 109763.
[19] LIU K, REN H Y, LI Y, et al. Medical visual question answering based on cross-modal attention feature enhancement[J]. Computer Engineering, 2025, 51(6): 49-56 (in Chinese). doi: 10.19678/j.issn.1000-3428.0068910
[20] HUANG X F, GONG H F. A dual-attention learning network with word and sentence embedding for medical visual question answering[J]. IEEE Transactions on Medical Imaging, 2024, 43(2): 832-845. doi: 10.1109/TMI.2023.3322868
[21] LI Y, YANG Q H, WANG F, et al. Asymmetric cross-modal attention network with multimodal augmented mixup for medical visual question answering[J]. Artificial Intelligence in Medicine, 2023, 144: 102667. doi: 10.1016/j.artmed.2023.102667
[22] WU Z H. Research on medical visual question answering based on fine-grained feature extraction and cognitive reasoning[D]. Nanjing: Nanjing University of Information Science and Technology, 2024 (in Chinese).
[23] REN F J, ZHOU Y Y. CGMVQA: a new classification and generative model for medical visual question answering[J]. IEEE Access, 2020, 8: 50626-50636. doi: 10.1109/ACCESS.2020.2980024
[24]
[25] LAU J J, GAYEN S, BEN ABACHA A, et al. A dataset of clinically generated visual questions and answers about radiology images[J]. Scientific Data, 2018, 5: 180251. doi: 10.1038/sdata.2018.251
[26] LIU B, ZHAN L M, XU L, et al. Slake: a semantically-labeled knowledge-enhanced dataset for medical visual question answering[C]//Proceedings of the IEEE 18th International Symposium on Biomedical Imaging (ISBI). Washington D.C., USA: IEEE Press, 2021: 1650-1654.
[27] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2017: 618-626.
[28]
[29] YANG Z C, HE X D, GAO J F, et al. Stacked attention networks for image question answering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2016: 21-29.
[30]
[31] LIU B, ZHAN L M, WU X M. Contrastive pre-training and representation distillation for medical visual question answering based on radiology images[C]//Proceedings of the 24th International Conference on Medical Image Computing and Computer Assisted Intervention. Berlin, Germany: Springer, 2021: 210-220.
[32] ESLAMI S, MEINEL C, DE MELO G. PubMedCLIP: how much does CLIP benefit visual question answering in the medical domain?[C]//Findings of the Association for Computational Linguistics: EACL 2023. Stroudsburg, USA: ACL Press, 2023: 1181-1193.