Computer Engineering, 2021, Vol. 47, Issue (4): 92-99. doi: 10.19678/j.issn.1000-3428.0057290

• Artificial Intelligence and Pattern Recognition •

A Grammar-Aided Copy Mechanism for Automatic Code Comment Generation

XU Boyan, CAI Ruichu, LIANG Zhihao

  1. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2020-01-27 Revised: 2020-04-11 Published: 2020-04-16
  • About the authors: XU Boyan (born 1991), male, Ph.D. candidate, main research interest: natural language processing; CAI Ruichu (corresponding author), professor and doctoral supervisor; LIANG Zhihao, master's student.
  • Funding:
    National Natural Science Foundation of China (61876043); Natural Science Foundation of Guangdong Province (2014A030306004, 2014A030308008); Guangdong Special Support Program (2015TQ01X140); NSFC-Guangdong Joint Fund (U1501254); Pearl River S&T Nova Program of Guangzhou (201610010101); Science and Technology Program of Guangzhou (201902010058).

Abstract: Copy mechanisms in existing code comment generation methods do not account for the complex and varied grammatical structure of source code, which leads to low copy accuracy and poor robustness. This paper modifies the pointer network so that it supports structured input and proposes a grammar-aided copy mechanism for automatic code comment generation. The mechanism consists of two parts: a node filtering strategy and a de-redundant generation strategy. The node filtering strategy introduces masking variables based on grammatical information to filter out invalid nodes, which reduces the cost of learning complex grammar in the pointer network. The de-redundant generation strategy dynamically adjusts node probabilities based on a time window, which addresses the problem of missing key information in automatically generated comments. Experimental results show that, compared with the baseline methods, the proposed mechanism improves BLEU, ROUGE-2, and ROUGE-L by 14.5%, 10.3%, and 5.5% respectively on the WikiSQL dataset, and by 2.8%, 6.6%, and 2.5% respectively on the ATIS dataset. These results verify the effectiveness of the mechanism and the necessity of introducing grammatical information.

Key words: code comment generation, pointer network, natural language generation, structured information, copy mechanism
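
As a rough illustration of the two strategies named in the abstract, the following Python sketch shows one way a copy distribution over syntax-tree nodes could be masked and then adjusted over a time window. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function names, the hard 0/1 mask, and the multiplicative penalty on recently copied nodes are hypothetical and should not be read as the authors' implementation.

```python
import math


def masked_copy_distribution(scores, valid_mask):
    """Softmax over pointer scores, restricted to nodes the mask allows.

    scores:     raw attention scores, one per syntax-tree node
    valid_mask: 1 if the node may be copied, 0 otherwise (assumed hard mask)
    """
    if not any(valid_mask):
        raise ValueError("mask filters out every node")
    # Subtract the largest valid score for numerical stability.
    peak = max(s for s, m in zip(scores, valid_mask) if m)
    weights = [math.exp(s - peak) if m else 0.0
               for s, m in zip(scores, valid_mask)]
    total = sum(weights)
    return [w / total for w in weights]


def damp_recent_copies(probs, copied_history, window=3, penalty=0.5):
    """Scale down nodes copied within the last `window` steps, then renormalize.

    The multiplicative `penalty` is an illustrative choice, not the paper's rule.
    """
    recent = set(copied_history[-window:]) if window > 0 else set()
    adjusted = [p * penalty if i in recent else p
                for i, p in enumerate(probs)]
    total = sum(adjusted)
    if total == 0.0:
        # Nothing left to renormalize; fall back to the unadjusted distribution.
        return list(probs)
    return [a / total for a in adjusted]


# Hypothetical example: four nodes, node 1 has an invalid type,
# and node 0 was copied at the previous decoding step.
scores = [2.0, 1.0, 0.5, 1.0]
mask = [1, 0, 1, 1]
filtered = masked_copy_distribution(scores, mask)
adjusted = damp_recent_copies(filtered, copied_history=[0])
```

In this sketch the masked node receives zero probability before normalization, so the pointer never has to learn to avoid it, and the window-based damping shifts probability mass away from a node that was just copied toward nodes not yet mentioned.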

CLC number: