
Computer Engineering ›› 2023, Vol. 49 ›› Issue (4): 240-248. doi: 10.19678/j.issn.1000-3428.0064240

• Development Research and Engineering Application •

Automatic Code Annotation Generation Based on Multi-dimensional Heterogeneous Graph Structure

RONG Keyao, XIONG Yun

  1. Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai 200433, China
  • Received: 2022-03-21 Revised: 2022-05-30 Published: 2022-08-09
  • About the authors: RONG Keyao (born 1996), female, master's student, main research interest: deep learning; XIONG Yun, professor, Ph.D.

Abstract: Code annotations improve the readability of source code and assist the software development process, so automatic code annotation generation has become an active research topic. However, most existing work uses only the sequence information or the abstract syntax tree information of source code and fails to fully capture the multiple features specific to programming languages. To further exploit the multi-dimensional features of source code and improve annotation generation, this study proposes an automatic code annotation generation model based on a multi-dimensional heterogeneous graph structure. Using a heterogeneous graph structure and a graph neural network, the model fuses the abstract syntax tree, control flow graph, and data flow graph of the source code into a heterogeneous representation graph with multiple types of nodes and edges, which represents the semantic, sequential, syntactic, and structural features of the code. Experimental results on real datasets show that the proposed model outperforms models such as Hybrid-DRL, NeuralCodeSum, and SeqGNN, with maximum improvements of 1.6%, 3.2%, and 3.1% in the BLEU-4, METEOR, and ROUGE-L metrics, respectively, and that it produces more fluent and readable code annotations.

Key words: code annotation generation, heterogeneous graph, graph attention network, neural machine translation, multi-dimensional feature
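
To make the idea of a fused heterogeneous representation graph more concrete, the following is a minimal sketch and not the authors' implementation. It assumes Python source code and the standard `ast` module, and it uses hypothetical node kinds (`syntax`, `identifier`) and edge types (`ast_child`, `control_flow`, `data_flow`) chosen for illustration only. The control-flow and data-flow edges are deliberately crude approximations: statement order stands in for a real control flow graph, and a linear "last definition" rule stands in for a real data flow analysis.

```python
import ast
from collections import defaultdict


def _keep(node):
    # Keep modules, statements, and expressions; skip shared marker
    # nodes such as ast.Load or ast.Add.
    return isinstance(node, (ast.mod, ast.stmt, ast.expr))


def _own_names(stmt):
    """Yield Name nodes that belong to stmt itself, not to nested statements."""
    stack = list(ast.iter_child_nodes(stmt))
    while stack:
        node = stack.pop()
        if isinstance(node, ast.stmt):
            continue
        if isinstance(node, ast.Name):
            yield node
        stack.extend(ast.iter_child_nodes(node))


def build_hetero_graph(source):
    """Return (nodes, edges): node id -> (kind, label), edge type -> [(src, dst)]."""
    tree = ast.parse(source)
    nodes, ids = {}, {}
    edges = defaultdict(list)

    # Syntax view: one typed node per kept AST node.
    for node in ast.walk(tree):
        if _keep(node):
            ids[id(node)] = len(nodes)
            if isinstance(node, ast.Name):
                nodes[ids[id(node)]] = ("identifier", node.id)
            else:
                nodes[ids[id(node)]] = ("syntax", type(node).__name__)
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            if _keep(parent) and _keep(child):
                edges["ast_child"].append((ids[id(parent)], ids[id(child)]))

    # Control-flow view (approximation): entry into a block and
    # fall-through between consecutive statements. No loop back edges.
    for node in ast.walk(tree):
        for field in ("body", "orelse"):
            block = getattr(node, field, None)
            if not isinstance(block, list) or not block:
                continue
            if isinstance(node, ast.stmt):
                edges["control_flow"].append((ids[id(node)], ids[id(block[0])]))
            for first, second in zip(block, block[1:]):
                edges["control_flow"].append((ids[id(first)], ids[id(second)]))

    # Data-flow view (approximation): link each read of a variable to the
    # most recent write in source order, ignoring branching.
    stmts = sorted(
        (n for n in ast.walk(tree) if isinstance(n, ast.stmt)),
        key=lambda n: (n.lineno, n.col_offset),
    )
    last_def = {}
    for stmt in stmts:
        names = list(_own_names(stmt))
        for name in names:  # reads happen before the statement's writes
            if isinstance(name.ctx, ast.Load) and name.id in last_def:
                edges["data_flow"].append((last_def[name.id], ids[id(name)]))
        for name in names:
            if isinstance(name.ctx, ast.Store):
                last_def[name.id] = ids[id(name)]

    return nodes, dict(edges)


SNIPPET = """
total = 0
for x in items:
    if x > 0:
        total = total + x
print(total)
"""

nodes, edges = build_hetero_graph(SNIPPET)
for edge_type, pairs in edges.items():
    print(edge_type, len(pairs))
```

In a setup like the one the abstract describes, a typed graph of this kind would be the input to a graph neural network encoder; the sketch stops at graph construction and makes no claim about the model's actual node types, edge types, or encoder.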
