
Computer Engineering

   

API usage constraint knowledge construction based on large language model

  

Published: 2025-04-11


Abstract: API usage constraints are the conditions or restrictions that developers must observe when invoking an API to ensure correct usage and avoid misuse. API documentation is an important source from which these constraints can be extracted. Existing extraction methods based on natural language processing (NLP) typically rely on syntactic patterns; they handle complex coordinated sentences poorly and place strict requirements on syntactic structure. To address these issues, this paper proposes AUCK, a method for extracting API usage constraint knowledge based on large language models. AUCK first preprocesses Java API documentation and extracts the sentences that contain API usage constraints. It then summarizes the syntactic patterns of coordinated sentences and designs corresponding cases that guide the large language model in decomposing coordinated sentences into simple sentences. Finally, it summarizes triplet syntactic patterns for the resulting simple sentences and designs cases that guide the large language model in extracting API usage constraint triplets. Experiments on Java API documentation show that AUCK achieves an accuracy of 92.23% and a recall of 93.14%, significantly outperforming the existing method DRONE (accuracy 80.61%, recall 86.81%), the mainstream triplet extraction tool OpenIE (accuracy 76.92%, recall 52.63%), and the large language model ChatGPT-3.5 (accuracy 82.23%, recall 67.71%). Experiments applying AUCK to Android and Python API documentation further indicate that it transfers well.

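As an illustration of the two case-guided steps described in the abstract (decomposing coordinated sentences, then extracting triplets), the following is a minimal sketch and not the authors' implementation. The abstract does not give AUCK's actual syntactic patterns, cases, or prompt wording, so the example cases, the prompt text, the `subject | relation | object` reply format, and all function names here are assumptions made for illustration. The language model is passed in as a plain callable so that the sketch does not depend on any particular model API; the preprocessing step that selects constraint sentences is omitted.

```python
from typing import Callable, List, Tuple

Triplet = Tuple[str, str, str]

# Hypothetical few-shot cases. The paper's real patterns and cases are not
# given in the abstract; these only illustrate the idea of case-guided prompts.
DECOMPOSE_CASES = [
    (
        "The index must be non-negative and less than the size of the list.",
        [
            "The index must be non-negative.",
            "The index must be less than the size of the list.",
        ],
    ),
]
EXTRACT_CASES = [
    ("The index must be non-negative.", ("index", "must be", "non-negative")),
]


def build_decompose_prompt(sentence: str) -> str:
    """Build a few-shot prompt asking for simple sentences, one per line."""
    lines = ["Split each coordinated sentence into simple sentences, one per line."]
    for source, parts in DECOMPOSE_CASES:
        lines.append(f"Sentence: {source}")
        lines.append("Simple sentences:")
        lines.extend(f"- {part}" for part in parts)
    lines.append(f"Sentence: {sentence}")
    lines.append("Simple sentences:")
    return "\n".join(lines)


def build_extract_prompt(sentence: str) -> str:
    """Build a few-shot prompt asking for one 'subject | relation | object' line."""
    lines = ["Extract one (subject | relation | object) triplet from each sentence."]
    for source, (subj, rel, obj) in EXTRACT_CASES:
        lines.append(f"Sentence: {source}")
        lines.append(f"Triplet: {subj} | {rel} | {obj}")
    lines.append(f"Sentence: {sentence}")
    lines.append("Triplet:")
    return "\n".join(lines)


def parse_simple_sentences(reply: str) -> List[str]:
    """Read one simple sentence per non-empty line, dropping a leading '- '."""
    sentences = []
    for line in reply.splitlines():
        text = line.strip()
        if text.startswith("- "):
            text = text[2:].strip()
        if text:
            sentences.append(text)
    return sentences


def parse_triplet(reply: str) -> Triplet:
    """Parse a 'subject | relation | object' reply into a 3-tuple."""
    parts = [part.strip() for part in reply.strip().split("|")]
    if len(parts) != 3:
        raise ValueError(f"expected three '|'-separated fields, got: {reply!r}")
    return (parts[0], parts[1], parts[2])


def extract_constraints(sentence: str, llm: Callable[[str], str]) -> List[Triplet]:
    """Decompose a constraint sentence, then extract one triplet per simple sentence."""
    simple_sentences = parse_simple_sentences(llm(build_decompose_prompt(sentence)))
    return [parse_triplet(llm(build_extract_prompt(s))) for s in simple_sentences]
```

Here `llm` stands for any function that takes a prompt string and returns the model's text reply. Splitting the work into two prompts mirrors the order of steps in the abstract: the extraction prompt only ever sees simple sentences, which is the stated reason for decomposing coordinated sentences first.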