Computer Engineering ›› 2025, Vol. 51 ›› Issue (8): 74-85. doi: 10.19678/j.issn.1000-3428.0070623

• Research Hotspots and Reviews •

API Usage Constraint Knowledge Construction Based on Large Language Model

LIU Genhao1, ZHANG Neng2,*, ZHENG Zibin1

  1. School of Software Engineering, Sun Yat-sen University, Zhuhai 519082, Guangdong, China
  2. School of Computer Science, Central China Normal University, Wuhan 430079, Hubei, China
  • Received: 2024-11-18 Revised: 2025-02-24 Online: 2025-08-15 Published: 2025-04-11
  • Contact: ZHANG Neng

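  • Supported by:
    National Natural Science Foundation of China (62302536); National Natural Science Foundation of China (62032025); Guangdong Basic and Applied Basic Research Foundation (2023A1515012292)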

Abstract:

Application Programming Interface (API) usage constraints are the conditions or restrictions that developers must follow when invoking APIs to ensure correct usage and prevent misuse. API documentation is an important source from which these constraints can be extracted. Existing Natural Language Processing (NLP)-based methods for extracting API usage constraints typically rely on syntactic patterns, but they handle complex coordinated sentences poorly and impose strict requirements on syntactic structure. To address these issues, this paper proposes AUCK, an API usage constraint knowledge extraction method based on a Large Language Model (LLM). AUCK first preprocesses Java API documentation and extracts the sentences that contain API usage constraints. It then summarizes the syntactic patterns of coordinated sentences and designs corresponding cases to guide an LLM in decomposing coordinated sentences into simple sentences. Finally, it summarizes the syntactic patterns of triplets in simple sentences and designs cases to guide the LLM in extracting API usage constraint triplets. Experimental results on Java API documentation show that AUCK achieves an accuracy of 92.23% and a recall of 93.14%, significantly outperforming existing methods, including DRONE (accuracy: 80.61%, recall: 86.81%), the mainstream triplet extraction tool OpenIE (accuracy: 76.92%, recall: 52.63%), and the large language model ChatGPT-3.5 (accuracy: 82.23%, recall: 67.71%). In addition, applying AUCK to Android and Python API documentation verifies its good transferability.

Key words: Java API documentation, API usage constraint, Large Language Model (LLM), parallel sentence decomposition, triplet extraction, knowledge extraction
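
The three stages described in the abstract (sentence selection, coordinated-sentence decomposition, triplet extraction) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue words, the few-shot cases, the prompt wording, the `subject | relation | object` output format, and every function name below are assumptions made for illustration, and `llm` stands for any caller-supplied function that maps a prompt string to a completion string.

```python
import re
from typing import Callable, List, Sequence, Tuple

Triplet = Tuple[str, str, str]
LLM = Callable[[str], str]  # hypothetical: any function mapping a prompt to a completion

# Assumed cue words; the paper's actual sentence-selection rules are not given in the abstract.
CONSTRAINT_CUES = ("must", "should", "cannot", "may not", "only if", "throws")

# Assumed few-shot cases, one list per stage (the paper designs its own cases).
DECOMPOSE_CASES = [
    ("The index must be non-negative and less than size.",
     "The index must be non-negative.\nThe index must be less than size."),
]
TRIPLET_CASES = [
    ("The index must be non-negative.", "index | must be | non-negative"),
]


def select_constraint_sentences(doc: str) -> List[str]:
    """Stage 1: split documentation text and keep sentences that contain a cue word."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    return [s for s in sentences if any(cue in s.lower() for cue in CONSTRAINT_CUES)]


def build_prompt(instruction: str, cases: Sequence[Tuple[str, str]], sentence: str) -> str:
    """Assemble a few-shot prompt: instruction, worked cases, then the query sentence."""
    shots = "\n\n".join(f"Input: {src}\nOutput:\n{out}" for src, out in cases)
    return f"{instruction}\n\n{shots}\n\nInput: {sentence}\nOutput:\n"


def decompose(sentence: str, llm: LLM) -> List[str]:
    """Stage 2: ask the LLM to split a coordinated sentence into simple sentences."""
    prompt = build_prompt(
        "Split the coordinated sentence into simple sentences, one per line.",
        DECOMPOSE_CASES, sentence)
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]


def extract_triplets(sentence: str, llm: LLM) -> List[Triplet]:
    """Stage 3: ask the LLM for triplets and keep only well-formed lines."""
    prompt = build_prompt(
        "Extract constraint triplets as 'subject | relation | object', one per line.",
        TRIPLET_CASES, sentence)
    triplets: List[Triplet] = []
    for line in llm(prompt).splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):  # discard malformed model output
            triplets.append((parts[0], parts[1], parts[2]))
    return triplets


def run_pipeline(doc: str, llm: LLM) -> List[Triplet]:
    """Chain the three stages over one documentation string."""
    result: List[Triplet] = []
    for sentence in select_constraint_sentences(doc):
        for simple in decompose(sentence, llm):
            result.extend(extract_triplets(simple, llm))
    return result
```

Decomposing before extracting means each triplet prompt sees only one simple sentence, which mirrors the ordering the abstract gives. Passing the model in as a plain callable keeps the sketch independent of any particular LLM client library.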
