
计算机工程 (Computer Engineering) ›› 2020, Vol. 46 ›› Issue (2): 304-308, 314. doi: 10.19678/j.issn.1000-3428.0053873

• Development Research and Engineering Application •

Code Annotation Automatic Generation Based on Structure Aware Dual Encoder

XU Shaofeng (徐少峰), PAN Wentao (潘文韬), XIONG Yun (熊赟), ZHU Yangyong (朱扬勇)

  1. Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai 201203, China
  • Received: 2019-01-31 Revised: 2019-03-11 Published: 2019-03-15
  • About the authors: XU Shaofeng (born 1994), male, master's student, whose main research interest is data mining; PAN Wentao, master's student; XIONG Yun and ZHU Yangyong, professors, Ph.D.
  • Funding:
    National Natural Science Foundation of China (U1636207, 91546105, 20873999); Scientific Research Program of the Science and Technology Commission of Shanghai Municipality (16JC1400801, 17511105502).

Abstract: In software development, effective code annotation tools can improve development efficiency and reduce maintenance costs. Some researchers treat automatic code annotation generation as a translation task from source code to natural-language annotations, but they consider only the sequential information of the source code and ignore its internal structure. To address this, building on common end-to-end translation models, this paper uses the abstract syntax tree (AST) of the code to embed structural information about the source code into an encoder-decoder translation model, and proposes a structure-aware dual-encoder decoder model that jointly considers the sequential information of the source code and its internal structural characteristics. Experimental results on real datasets show that, compared with the PBMT and Seq2seq models, the proposed model achieves a higher BLEU score and generates annotations that are more accurate and readable.

Key words: code annotation generation, abstract syntax tree, dual encoder and decoder model, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN)
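
The abstract distinguishes two kinds of input: the sequential information of the source code and the structural information carried by its abstract syntax tree. As a rough illustration only, the Python sketch below derives both views from a small snippet using the standard-library `tokenize` and `ast` modules. It is not the authors' implementation; the abstract does not state which programming language or AST encoding the paper uses, and the names `SOURCE`, `token_sequence`, and `ast_node_types` are hypothetical, introduced just for this example.

```python
import ast
import io
import tokenize
from typing import List

# Hypothetical input snippet, chosen only for illustration.
SOURCE = "def add(a, b):\n    return a + b\n"

# Layout-only token types that carry no lexical content of their own.
_SKIPPED = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT,
}


def token_sequence(source: str) -> List[str]:
    """Return the flat, left-to-right lexical tokens (the sequential view)."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type not in _SKIPPED]


def ast_node_types(source: str) -> List[str]:
    """Return AST node type names in pre-order (one possible structural view)."""
    def visit(node: ast.AST):
        yield type(node).__name__
        for child in ast.iter_child_nodes(node):
            yield from visit(child)

    return list(visit(ast.parse(source)))


if __name__ == "__main__":
    print(token_sequence(SOURCE))
    print(ast_node_types(SOURCE))
```

In a dual-encoder setup of the kind the abstract describes, the first list would plausibly feed a sequence encoder and the second (or a richer tree representation) a structure encoder. A pre-order list of node types is an assumption made here for brevity, not a detail taken from the paper.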

CLC Number: