Computer Engineering ›› 2023, Vol. 49 ›› Issue (2): 61-69. doi: 10.19678/j.issn.1000-3428.0063592

• Artificial Intelligence and Pattern Recognition •

Method for Generating Code Comments Based on Structure-aware Hybrid Encoding Model

CAI Ruichu, ZHANG Shengqiang, XU Boyan

  1. School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2021-12-21  Revised: 2022-02-18  Published: 2022-03-22
  • About the authors: CAI Ruichu (b. 1983), male, professor, Ph.D., doctoral supervisor; his main research interests are causality and machine learning. ZHANG Shengqiang, master's degree candidate. XU Boyan (corresponding author), Ph.D. candidate.
  • Funding: National Natural Science Foundation of China (61876043); National Science Fund for Excellent Young Scholars (62122022); Science and Technology Program of Guangzhou (201902010058).


Abstract: Code comments improve the readability of program code, thereby enhancing software development efficiency and reducing costs. Existing code comment generation methods feed either the sequence form or the Abstract Syntax Tree (AST) form of the program code into encoder networks of different structures, and therefore cannot fuse the structural characteristics of the code's different abstract forms, which results in comments with poor readability. This study proposes a Structure-aware Hybrid Encoding (SHE) model that considers both the sequence form and the structure form of the program code: a sequence encoding layer and a graph encoding layer capture the context information and the grammatical structure information of the code, respectively, and an aggregation encoding process fuses these two types of information into the decoder. This study further proposes a Structure-aware Graph Attention (SGAT) network that embeds the hierarchy and type information of the code's grammatical structure into the learning parameters of the graph attention network, effectively improving the SHE model's ability to learn the complex grammatical structure of program code. Experimental results show that, compared with the Structure-induced Transformer (SiT) baseline, the SHE model improves the Bi-Lingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation-Longest common subsequence (ROUGE-L), and Metric for Evaluation of Translation with Explicit Ordering (METEOR) scores by 2.68%, 1.47%, and 3.82% on the Python dataset and by 2.51%, 2.24%, and 3.55% on the Java dataset, respectively, demonstrating that the SHE model generates more accurate code comments than the baseline models.
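To make the mechanism concrete, below is a minimal PyTorch sketch of the two ideas the abstract describes: a graph attention layer whose keys carry AST hierarchy and node-type embeddings, and an aggregation step that fuses sequence-encoder and graph-encoder outputs before decoding. All class and function names, dimensions, the use of depth as the hierarchy signal, and the gated fusion operator are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' released code) of structure-aware
# graph attention and sequence/graph fusion, per the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureAwareGraphAttention(nn.Module):
    """One graph-attention layer conditioned on AST structure."""
    def __init__(self, d_model: int, num_types: int, max_depth: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Hierarchy (depth) and grammar-type embeddings enter the keys,
        # so attention scores depend on syntactic structure.
        self.depth_emb = nn.Embedding(max_depth, d_model)
        self.type_emb = nn.Embedding(num_types, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x, depth, node_type, adj):
        # x: (N, d) node states; depth, node_type: (N,) integer ids;
        # adj: (N, N) bool AST adjacency, assumed to include self-loops
        # so every softmax row has at least one valid entry.
        q, v = self.q(x), self.v(x)
        k = self.k(x) + self.depth_emb(depth) + self.type_emb(node_type)
        scores = (q @ k.t()) * self.scale
        # Restrict attention to syntactic neighbours.
        scores = scores.masked_fill(~adj, float('-inf'))
        return F.softmax(scores, dim=-1) @ v

def aggregate(seq_out: torch.Tensor, graph_out: torch.Tensor,
              gate: nn.Linear) -> torch.Tensor:
    """Gated fusion of sequence- and graph-encoder outputs; one of many
    plausible aggregation operators (the abstract does not specify it)."""
    g = torch.sigmoid(gate(torch.cat([seq_out, graph_out], dim=-1)))
    return g * seq_out + (1 - g) * graph_out
```

In a full model, the decoder would cross-attend to the fused memory returned by `aggregate`; the paper's exact aggregation operator and hyper-parameters are not given in the abstract.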

Key words: code comment generation, hybrid encoding model, graph attention network, deep self-attention network, natural language processing
