
Computer Engineering ›› 2023, Vol. 49 ›› Issue (6): 227-233, 241. doi: 10.19678/j.issn.1000-3428.0064591

• Graphics and Image Processing •

Logical Reasoning Based on Residual Attention Multi-scale Relation Network

XIONG Zhongmin, ZENG Qi, LU Peng, WANG Zhenhua, ZHENG Zongsheng

  1. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
  • Received: 2022-04-29  Revised: 2022-07-18  Published: 2022-09-20
  • About the authors: XIONG Zhongmin (1971-), male, associate professor, Ph.D.; his main research interests include computer vision and database theory and applications. ZENG Qi (corresponding author) is a master's degree candidate. LU Peng, WANG Zhenhua, and ZHENG Zongsheng are associate professors with Ph.D. degrees.
  • Funding: Shanghai Science and Technology Commission Project (20dz1203800); Shanghai Science and Technology Commission Local Capacity Building Project (19050502100).

Abstract: Logical reasoning is the ability to perceive patterns and connections between visual elements, and endowing computers with human-like reasoning ability is an important research topic. Driven by large amounts of data and deep models, state-of-the-art neural networks have achieved superhuman performance in image processing and other fields, yet their ability to reason logically from images still lags behind. To address the insufficient feature extraction capability and poor generalization of the Multi-scale Relation Network (MRNet) for logical reasoning, an improved method, the Residual Attention Multi-scale Relation Network (ResAMRNet), is proposed. In the backbone network, residual structures combined with skip connections and long skip connections integrate shallow features into the training of deeper layers, reducing the loss of feature information and improving the feature extraction capability of the model. In the reasoning module, a channel attention mechanism is fused with residual modules to detect the relation features among the images in each row; it differentiates the importance of each feature channel, learns attention weights adaptively, and extracts key features. In addition, a Double-pooled Efficient Channel Attention (DECA) mechanism is designed that incorporates global max pooling to capture additional object feature information and improve the generalization of the model. Experimental results on the representative logical reasoning datasets Relational and Analogical Visual rEasoNing (RAVEN) and Improved RAVEN (I-RAVEN) show that the classification accuracy of ResAMRNet is 8.3 and 18.1 percentage points higher, respectively, than that of MRNet, demonstrating strong logical reasoning capability.

Key words: logical reasoning, residual structure, attention mechanism, I-RAVEN dataset, multi-scale relation network
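
The abstract describes two architectural ingredients: residual blocks whose skip connections carry shallow features into deeper layers, and a Double-pooled Efficient Channel Attention (DECA) module that re-weights feature channels from pooled channel descriptors, with global max pooling added alongside the usual global average pooling. The PyTorch sketch below is a minimal, hypothetical reading of these two ideas for illustration only: the class names, the shared 1-D convolution (borrowed from the standard ECA design), the sum fusion of the two pooled branches, and all hyperparameters are assumptions rather than the authors' implementation.

    import torch
    import torch.nn as nn

    class DECA(nn.Module):
        """Double-pooled efficient channel attention (illustrative sketch)."""
        def __init__(self, channels: int, k_size: int = 3):
            super().__init__()
            self.avg_pool = nn.AdaptiveAvgPool2d(1)   # global average pooling
            self.max_pool = nn.AdaptiveMaxPool2d(1)   # global max pooling (the "double" pooling)
            # Shared 1-D convolution over the channel descriptor, as in ECA,
            # captures local cross-channel interaction without dimensionality reduction.
            self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
            self.sigmoid = nn.Sigmoid()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            # (b, c, 1, 1) -> (b, 1, c) so the convolution slides across channels.
            avg = self.conv(self.avg_pool(x).view(b, 1, c))
            mx = self.conv(self.max_pool(x).view(b, 1, c))
            w = self.sigmoid(avg + mx).view(b, c, 1, 1)  # per-channel attention weights
            return x * w  # re-weight feature channels

    class ResidualAttentionBlock(nn.Module):
        """Residual block with channel attention and an identity skip connection."""
        def __init__(self, channels: int):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
            )
            self.attn = DECA(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            out = self.attn(self.body(x))
            return self.relu(out + x)  # skip connection keeps shallow features in play

    # Quick shape check with dummy feature maps.
    feats = torch.randn(4, 64, 20, 20)
    out = ResidualAttentionBlock(64)(feats)
    print(out.shape)  # torch.Size([4, 64, 20, 20])

Summing the average-pooled and max-pooled descriptors before the sigmoid is only one plausible fusion choice consistent with the abstract; concatenating the branches or giving each its own convolution would fit the description equally well.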
