Computer Engineering ›› 2023, Vol. 49 ›› Issue (2): 61-69. doi: 10.19678/j.issn.1000-3428.0063592

• Artificial Intelligence and Pattern Recognition •

Method for Generating Code Comments Based on Structure-aware Hybrid Encoding Model

CAI Ruichu, ZHANG Shengqiang, XU Boyan

  1. School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2021-12-21  Revised: 2022-02-18  Published: 2022-03-22
  • About the authors: CAI Ruichu (b. 1983), male, professor, Ph.D., doctoral supervisor; his main research interests are causality and machine learning. ZHANG Shengqiang, master's degree candidate. XU Boyan (corresponding author), Ph.D. candidate.
  • Funding: National Natural Science Foundation of China (61876043); National Science Fund for Excellent Young Scholars (62122022); Science and Technology Program of Guangzhou (201902010058).


Abstract: Code comments improve the readability of program code, thereby enhancing software development efficiency and reducing costs. Existing code comment generation methods feed either the sequence form or the Abstract Syntax Tree (AST) form of the program code into encoder networks of different structures, and therefore cannot fuse the structural characteristics of the code's different abstract forms, which results in comments with poor readability. This study proposes a Structure-aware Hybrid Encoding (SHE) model that considers both the sequence form and the structure form of the program code: a sequence encoding layer and a graph encoding layer capture the context information and the grammatical structure information of the code, respectively, and an aggregation encoding process fuses these two types of information into the decoder. This study further proposes a Structure-aware Graph Attention (SGAT) network that embeds the hierarchy and type information of the code's grammatical structure into the learning parameters of the graph attention network, effectively improving the SHE model's ability to learn the complex grammatical structure of program code. Experimental results show that, compared with the Structure-induced Transformer (SiT) baseline, the SHE model improves the Bi-Lingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation-Longest common subsequence (ROUGE-L), and Metric for Evaluation of Translation with Explicit Ordering (METEOR) scores by 2.68%, 1.47%, and 3.82% on the Python dataset and by 2.51%, 2.24%, and 3.55% on the Java dataset, respectively, demonstrating that the SHE model generates more accurate code comments than the baseline models.
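To make the mechanism concrete, below is a minimal PyTorch sketch of the two ideas the abstract describes: a graph attention layer whose keys carry AST hierarchy and node-type embeddings, and an aggregation step that fuses sequence-encoder and graph-encoder outputs before decoding. All class and function names, dimensions, the use of depth as the hierarchy signal, and the gated fusion operator are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' released code) of structure-aware
# graph attention and sequence/graph fusion, per the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureAwareGraphAttention(nn.Module):
    """One graph-attention layer conditioned on AST structure."""
    def __init__(self, d_model: int, num_types: int, max_depth: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Hierarchy (depth) and grammar-type embeddings enter the keys,
        # so attention scores depend on syntactic structure.
        self.depth_emb = nn.Embedding(max_depth, d_model)
        self.type_emb = nn.Embedding(num_types, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x, depth, node_type, adj):
        # x: (N, d) node states; depth, node_type: (N,) integer ids;
        # adj: (N, N) bool AST adjacency, assumed to include self-loops
        # so every softmax row has at least one valid entry.
        q, v = self.q(x), self.v(x)
        k = self.k(x) + self.depth_emb(depth) + self.type_emb(node_type)
        scores = (q @ k.t()) * self.scale
        # Restrict attention to syntactic neighbours.
        scores = scores.masked_fill(~adj, float('-inf'))
        return F.softmax(scores, dim=-1) @ v

def aggregate(seq_out: torch.Tensor, graph_out: torch.Tensor,
              gate: nn.Linear) -> torch.Tensor:
    """Gated fusion of sequence- and graph-encoder outputs; one of many
    plausible aggregation operators (the abstract does not specify it)."""
    g = torch.sigmoid(gate(torch.cat([seq_out, graph_out], dim=-1)))
    return g * seq_out + (1 - g) * graph_out
```

In a full model, the decoder would cross-attend to the fused memory returned by `aggregate`; the paper's exact aggregation operator and hyper-parameters are not given in the abstract.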

Key words: code comment generation, hybrid encoding model, graph attention network, deep self-attention network, natural language processing
