[1] SURYAWANSHI S, CHAKRAVARTHI B R, ARCAN M, et al. Multimodal meme dataset (MultiOFF) for identifying offensive content in image and text[C]//Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying. ELRA, 2020: 32-41.
[2] HE P C, GAO J F, CHEN W Z. DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing[C]//Proceedings of the 11th International Conference on Learning Representations. ICLR, 2023.
[3] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale[C]//Proceedings of the 9th International Conference on Learning Representations. ICLR, 2021: 1-20.
[4] PRAMANICK S, SHARMA S, DIMITROV D, et al. MOMENTA: A multimodal framework for detecting harmful memes and their targets[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. ACL, 2021: 4439-4455.
[5] LIN H Z, LUO Z Y, MA J, et al. Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. ACL, 2023: 9114-9128.
[6] CAO R, HEE M S, KUEK A, et al. Pro-cap: Leveraging a frozen vision-language model for hateful meme detection[C]//Proceedings of the 31st ACM International Conference on Multimedia. ACM, 2023: 5244-5252.
[7] TSAI Y H H, BAI S J, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. ACL, 2019: 6558-6569.
[8] 余本功,陈明玥.基于细粒度图像-方面的情感增强方面级情感分析[J].计算机应用研究,2025,42(04):1073-1079.
YU B G, CHEN M Y. Aspect-oriented affective knowledge enhanced for aspect-based sentiment analysis[J]. Application Research of Computers, 2025, 42(04): 1073-1079.
[9] 刘洲,马立平,张海燕.基于深度图文细粒度对齐的弱监督多模态情感分析[J].计算机应用研究, 2025,42(02):419-424.
LIU Z, MA L P, ZHANG H Y. Weakly supervised multimodal sentiment analysis based on deep fine-grained alignment of image and text[J]. Application Research of Computers, 2025, 42(02): 419-424.
[10] VUORIO R, SUN S H, HU H X, et al. Multimodal model-agnostic meta-learning via task-aware modulation[C]//Advances in Neural Information Processing Systems. NeurIPS, 2019.
[11] ZHU D, CHEN J, SHEN X, et al. MiniGPT-4: Enhancing vision-language understanding with advanced large language models[J]. arXiv preprint arXiv:2304.10592, 2023.
[12] LIU H T, LI C Y, WU Q Y, et al. Visual instruction tuning[C]//Advances in Neural Information Processing Systems. NeurIPS, 2023: 34892-34916.
[13] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2016: 770-778.
[14] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]//Proceedings of the 38th International Conference on Machine Learning. PMLR, 2021: 8748-8763.
[15] DEEPSEEK-AI, LIU A X, MEI A X, et al. DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models[J]. arXiv preprint arXiv:2512.02556, 2025.
[16] VINYALS O, BLUNDELL C, LILLICRAP T, et al. Matching networks for one shot learning[C]//Advances in Neural Information Processing Systems. NeurIPS, 2016: 3630-3638.
[17] SNELL J, SWERSKY K, ZEMEL R. Prototypical networks for few-shot learning[C]//Advances in Neural Information Processing Systems. NeurIPS, 2017: 4077-4087.
[18] KOUTLIS C, SCHINAS M, PAPADOPOULOS S. Memefier: Dual-stage modality fusion for image meme classification[C]//Proceedings of the 2023 ACM International Conference on Multimedia Retrieval. ACM, 2023: 586-591.
[19] ZHANG L H, JIN L, SUN X, et al. TOT: topology-aware optimal transport for multimodal hate detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence. AAAI, 2023: 4884-4892.
[20] HUERTAS-TATO J, KOUTLIS C, PAPADOPOULOS S, et al. A CLIP-based siamese approach for meme classification[C]//Proceedings of the 2024 International Joint Conference on Neural Networks. IEEE, 2024: 1-8.
[21] HOSSAIN E, SHARIF O, HOQUE M M, et al. Align before attend: Aligning visual and textual features for multimodal hateful content detection[C]//Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop. ACL, 2024: 162-174.
[22] TIAN Y H, XIA F, SONG Y. Learning multimodal contrast with cross-modal memory and reinforced contrast recognition[C]//Proceedings of the 2024 Conference of the Association for Computational Linguistics. ACL, 2024: 6561-6573.
[23] SHAH S B, SHIWAKOTI S, BHUIYAN T, et al. Entity-Aware Optimal Transport and Residual Attention for Multimodal Content Moderation[C]//Companion Proceedings of the ACM on Web Conference 2025. ACM, 2025: 2306-2313.
[24] 刘晋文,王磊,马博,等.基于弱监督模态语义增强的多模态有害信息检测方法[J].计算机应用, 2025,45(10):3146-3153.
LIU J W, WANG L, MA B, et al. Multimodal harmful content detection method based on weakly supervised modality semantic enhancement[J]. Journal of Computer Applications, 2025, 45(10): 3146-3153.
[25] 徐超,浦悦,陈勇,等. 基于因果掩码控制与反事实干预机制融合的多模态仇恨模因检测[J]. 计算机应用研究, 2025, 42(12): 3559-3565.
XU C, PU Y, CHEN Y, et al. Multimodal hate meme detection based on fusion of causal mask control and counterfactual intervention mechanism[J]. Application Research of Computers, 2025, 42(12): 3559-3565.
[26] LIN H Z, LUO Z Y, WANG B, et al. GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse[J]. ACM Transactions on Intelligent Systems and Technology, 2024.
[27] PAN F J, WU X B, QUAN T, et al. Detecting Harmful Memes with Decoupled Understanding and Guided CoT Reasoning[J]. arXiv preprint arXiv:2506.08477, 2025.
[28] WANG P, BAI S, TAN S, et al. Qwen2-vl: Enhancing vision-language model's perception of the world at any resolution[J]. arXiv preprint arXiv:2409.12191, 2024.
[29] KIELA D, FIROOZ H, MOHAN A, et al. The hateful memes challenge: Detecting hate speech in multimodal memes[C]//Advances in Neural Information Processing Systems. NeurIPS, 2020: 2611-2624.