A Grammar-Aided Copy Mechanism for Automatic Code Comment Generation

doi:10.19678/j.issn.1000-3428.0057290

Abstract

Copy mechanisms in existing code comment generation methods do not account for the complex and varied grammatical structure of source code, which leads to low copy accuracy and poor robustness. This paper modifies the pointer network so that it accepts structured input, and proposes a grammar-aided copy mechanism for automatic code comment generation. The mechanism has two parts: a node filtering strategy and a de-redundant generation strategy. The node filtering strategy uses grammatical information to introduce masking variables that filter out invalid nodes, which reduces the cost for the pointer network of learning complex grammar. The de-redundant generation strategy dynamically adjusts node probabilities based on a time window, which addresses the problem of key information missing from automatically generated comments. Experimental results show that, compared with the baseline methods, the proposed mechanism improves BLEU, ROUGE-2, and ROUGE-L by 14.5%, 10.3%, and 5.5% respectively on the WikiSQL dataset, and by 2.8%, 6.6%, and 2.5% respectively on the ATIS dataset. These results verify the effectiveness of the mechanism and the necessity of introducing grammatical information.

Key words: code comment generation, pointer network, natural language generation, structured information, copy mechanism
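
The two strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's equations, so everything here is an assumption made for illustration: the hard 0/1 mask derived from node types, the multiplicative `penalty` applied to nodes copied within the last `window` steps, the greedy decoding loop, and all function and variable names. It shows one plausible way a masked pointer distribution and a time-window adjustment could interact, not the authors' implementation.

```python
import math
from collections import deque


def masked_copy_distribution(scores, valid_mask):
    """Softmax over pointer scores, restricted to nodes whose mask entry is True."""
    if not any(valid_mask):
        raise ValueError("at least one node must be valid")
    # Subtract the largest valid score for numerical stability.
    top = max(s for s, ok in zip(scores, valid_mask) if ok)
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, valid_mask)]
    total = sum(exps)
    return [e / total for e in exps]


def penalize_recent(probs, recent, penalty):
    """Scale down nodes copied inside the time window, then renormalize."""
    scaled = [p * penalty if i in recent else p for i, p in enumerate(probs)]
    total = sum(scaled)
    if total == 0.0:  # every candidate was suppressed; fall back to the input
        return list(probs)
    return [p / total for p in scaled]


def greedy_copy(score_steps, node_types, copyable_types, window=2, penalty=0.1):
    """Pick one node index per decoding step using both strategies (assumed form)."""
    # Node filtering: the mask depends only on grammatical node types.
    valid_mask = [t in copyable_types for t in node_types]
    recent = deque(maxlen=window)  # indices copied in the last `window` steps
    copied = []
    for scores in score_steps:
        probs = masked_copy_distribution(scores, valid_mask)
        probs = penalize_recent(probs, recent, penalty)
        choice = max(range(len(probs)), key=probs.__getitem__)
        copied.append(choice)
        recent.append(choice)
    return copied


# Hypothetical syntax-tree nodes for "SELECT name WHERE year = 2020".
node_tokens = ["SELECT", "name", "WHERE", "2020"]
node_types = ["keyword", "column", "keyword", "value"]
# Made-up pointer scores for two decoding steps.
score_steps = [[3.0, 2.0, 0.5, 1.0], [0.2, 2.0, 2.5, 1.5]]

with_window = greedy_copy(score_steps, node_types, {"column", "value"})
without_window = greedy_copy(score_steps, node_types, {"column", "value"}, penalty=1.0)
print([node_tokens[i] for i in with_window])     # ['name', '2020']
print([node_tokens[i] for i in without_window])  # ['name', 'name']
```

In this toy run the mask keeps the highest-scoring keyword node from being copied at the first step, and the window penalty stops the second step from copying `name` again, so the value `2020` is not lost. Setting `penalty=1.0` disables the adjustment and reproduces the repeated copy.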

CLC number: TP312

XU Boyan, CAI Ruichu, LIANG Zhihao. A Grammar-Aided Copy Mechanism for Automatic Code Comment Generation[J]. Computer Engineering, 2021, 47(4): 92-99. (in Chinese)

https://www.ecice06.com/CN/Y2021/V47/I4/92


