Automatic Code Annotation Generation Based on Multi-dimensional Heterogeneous Graph Structure

doi:10.19678/j.issn.1000-3428.0064240

Abstract

Abstract: Code annotations improve the readability of source code and support software development, which has made automatic code annotation generation an active research topic. Most existing work, however, uses only the sequence information or the abstract syntax tree of the source code and does not fully capture the multiple kinds of features specific to programming languages. To exploit these multi-dimensional features and improve generation quality, this study proposes an automatic code annotation generation model based on a multi-dimensional heterogeneous graph structure. Using a heterogeneous graph structure and graph neural networks, the model fuses the abstract syntax tree, control flow graph, and data flow graph of the source code into a single heterogeneous representation graph with multiple node and edge types, which represents the semantic, sequence, syntax, and structural features of the code. Experimental results on real datasets show that the proposed model outperforms models such as Hybrid-DRL, NeuralCodeSum, and SeqGNN, with maximum improvements of 1.6%, 3.2%, and 3.1% in BLEU-4, METEOR, and ROUGE-L, respectively, and produces more fluent and readable code annotations.

Key words: code annotation generation, heterogeneous graph, graph attention network, neural machine translation, multi-dimensional feature
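
As a rough illustration of the idea of a heterogeneous representation graph described in the abstract, the sketch below builds a toy graph from Python source with several node types and three edge types. It is not the authors' construction: the function name `build_hetero_graph`, the node kinds (`stmt`, `expr`, `other`), the edge names (`ast_child`, `next_stmt`, `data_flow`), and the simplified control-flow and data-flow rules are all assumptions made for this example, using only Python's standard `ast` module.

```python
import ast
from collections import defaultdict

# Marker classes (Load, Store, Add, ...) have no children or position and may
# be shared objects, so this sketch leaves them out of the graph.
_MARKERS = (ast.expr_context, ast.operator, ast.unaryop, ast.boolop, ast.cmpop)


def build_hetero_graph(source):
    """Return (nodes, edges) for a toy heterogeneous graph of `source`.

    nodes: list of (node_type, label); the list index is the node id.
    edges: dict mapping an edge type to a list of (src_id, dst_id) pairs.
    """
    tree = ast.parse(source)
    kept = [n for n in ast.walk(tree) if not isinstance(n, _MARKERS)]
    ids = {id(n): i for i, n in enumerate(kept)}
    nodes = []
    for n in kept:
        if isinstance(n, ast.stmt):
            kind = "stmt"
        elif isinstance(n, ast.expr):
            kind = "expr"
        else:
            kind = "other"
        nodes.append((kind, type(n).__name__))

    edges = defaultdict(list)
    for n in kept:
        # Syntax dimension: parent -> child edges of the AST.
        for child in ast.iter_child_nodes(n):
            if id(child) in ids:
                edges["ast_child"].append((ids[id(n)], ids[id(child)]))
        # Control-flow dimension (simplified): consecutive statements of a body.
        body = getattr(n, "body", None)
        if isinstance(body, list):
            for a, b in zip(body, body[1:]):
                edges["next_stmt"].append((ids[id(a)], ids[id(b)]))

    # Data-flow dimension (simplified): most recent write of a name -> read.
    # Reads on a line are processed before writes on that line, and loop
    # back edges are ignored.
    def is_write(n):
        return isinstance(n, ast.arg) or isinstance(n.ctx, ast.Store)

    def name_of(n):
        return n.arg if isinstance(n, ast.arg) else n.id

    last_write = {}
    names = [n for n in kept if isinstance(n, (ast.Name, ast.arg))]
    for n in sorted(names, key=lambda n: (n.lineno, is_write(n), n.col_offset)):
        if is_write(n):
            last_write[name_of(n)] = ids[id(n)]
        elif name_of(n) in last_write:
            edges["data_flow"].append((last_write[name_of(n)], ids[id(n)]))
    return nodes, dict(edges)


# Hypothetical input used only to exercise the sketch.
EXAMPLE = '''
def total(xs):
    s = 0
    for x in xs:
        s = s + x
    return s
'''

nodes, edges = build_hetero_graph(EXAMPLE)
for edge_type, pairs in edges.items():
    print(edge_type, len(pairs))
```

In a full model, typed node and edge lists of this kind would typically be passed to a graph neural network that treats each edge type separately; real control flow and data flow graphs also need branch, loop, and scope handling that this sketch omits.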

CLC number: TP18

RONG Keyao (戎珂瑶), XIONG Yun (熊贇). Automatic Code Annotation Generation Based on Multi-dimensional Heterogeneous Graph Structure[J]. Computer Engineering, 2023, 49(4): 240-248.

https://www.ecice06.com/CN/Y2023/V49/I4/240
