| 1 |
|
| 2 |
ZHANG B, HADDOW B, BIRCH A. Prompting large language model for machine translation: a case study[C]//Proceedings of the 40th International Conference on Machine Learning. Washington D.C., USA: IEEE Press, 2023: 41092-41110.
|
| 3 |
MALLEN A, ASAI A, ZHONG V, et al. When not to trust language models: investigating effectiveness of parametric and non-parametric memories[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: ACL Press, 2023: 9802-9822.
|
| 4 |
ZHANG Y, LI Y F, CUI L Y, et al. Siren's song in the AI ocean: a survey on hallucination in large language models[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2309.01219.
|
| 5 |
HUANG L, YU W J, MA W T, et al. A survey on hallucination in large language models: principles, taxonomy, challenges, and open questions[J]. ACM Transactions on Information Systems, 2025, 43(2): 1-55.
|
| 6 |
|
| 7 |
IZACARD G, LEWIS P, LOMELI M, et al. Atlas: few-shot learning with retrieval augmented language models[J]. Journal of Machine Learning Research, 2023, 24(1): 11912-11954.
|
| 8 |
KOMEILI M, SHUSTER K, WESTON J. Internet-augmented dialogue generation[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: ACL Press, 2022: 8460-8478.
|
| 9 |
BOIKO D A, MACKNIGHT R, KLINE B, et al. Autonomous chemical research with large language models[J]. Nature, 2023, 624(7992): 570-578. doi: 10.1038/s41586-023-06792-0.
|
| 10 |
LI Y S, LIAO N Y, ZHANG G H, et al. Unmanned vehicle formation control based on large language model[C]//Proceedings of the 12th China Conference on Command and Control. Singapore: Springer, 2024: 329-339.
|
| 11 |
DU Y, WEI F Y, ZHANG H Y. AnyTool: self-reflective, hierarchical agents for large-scale API calls[C]//Proceedings of the 41st International Conference on Machine Learning. Washington D.C., USA: IEEE Press, 2024: 11812-11829.
|
| 12 |
SCHICK T, DWIVEDI-YU J, DESSÌ R, et al. Toolformer: language models can teach themselves to use tools[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. New Orleans, USA: [s. n.], 2023: 68539-68551.
|
| 13 |
WANG Z R, CHENG Z J, ZHU H, et al. What are tools anyway? A survey from the language model perspective[C]//Proceedings of the 1st Conference on Language Modeling. Philadelphia, USA: Academic Press, 2024: 1-15.
|
| 14 |
|
| 15 |
QIN Y J, HU S D, LIN Y K, et al. Tool learning with foundation models[J]. ACM Computing Surveys, 2025, 57(4): 1-40.
|
| 16 |
QU C L, DAI S H, WEI X C, et al. Tool learning with large language models: a survey[J]. Frontiers of Computer Science, 2025, 19(8): 198343. doi: 10.1007/s11704-024-40678-2.
|
| 17 |
|
| 18 |
|
| 19 |
SHEN Y L, SONG K T, TAN X, et al. HuggingGPT: solving AI tasks with ChatGPT and its friends in hugging face[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. Washington D.C., USA: IEEE Press, 2023: 38154-38180.
|
| 20 |
SINGHAL K, AZIZI S, TU T, et al. Large language models encode clinical knowledge[J]. Nature, 2023, 620(7972): 172-180. doi: 10.1038/s41586-023-06291-2.
|
| 21 |
|
| 22 |
YUAN S Y, SONG K T, CHEN J J, et al. EASYTOOL: enhancing LLM-based agents with concise tool instruction[C]// Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). Stroudsburg, USA: ACL Press, 2025: 951-972.
|
| 23 |
|
| 24 |
KONG Y L, RUAN J Q, CHEN Y H, et al. TPTU-v2: boosting task planning and tool usage of large language model-based agents in real-world industry systems[C]//Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track. Stroudsburg, USA: ACL Press, 2024: 371-385.
|
| 25 |
GAO T Y, YAO X C, CHEN D Q. SimCSE: simple contrastive learning of sentence embeddings[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: ACL Press, 2021: 6894-6910.
|
| 26 |
XU Y, FENG Y L, MU H L, et al. Concise and precise context compression for tool-using language models[C]//Proceedings of the Findings of the Association for Computational Linguistics ACL 2024. Stroudsburg, USA: ACL Press, 2024: 16430-16441.
|
| 27 |
WEI J, WANG X Z, SCHUURMANS D, et al. Chain-of-thought prompting elicits reasoning in large language models[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2022: 24824-24837.
|
| 28 |
GLAESE A, MCALEESE N, TREBACZ M, et al. Improving alignment of dialogue agents via targeted human judgements[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2209.14375.
|
| 29 |
ZHANG Y E, CAI H, SONG X R, et al. Reverse chain: a generic-rule for LLMs to master multi-API planning[C]//Proceedings of the Findings of the Association for Computational Linguistics. Stroudsburg, USA: ACL Press, 2024: 302-325.
|
| 30 |
SHIN T, RAZEGHI Y, LOGAN R L, et al. AutoPrompt: eliciting knowledge from language models with automatically generated prompts[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, USA: ACL Press, 2020: 4222-4235.
|
| 31 |
REYNOLDS L, MCDONELL K. Prompt programming for large language models: beyond the few-shot paradigm[C]//Proceedings of the Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. New York, USA: ACM Press, 2021: 1-7.
|
| 32 |
|
| 33 |
|
| 34 |
LEWIS P, PEREZ E, PIKTUS A, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks[J]. Advances in Neural Information Processing Systems, 2020, 33: 9459-9474.
|
| 35 |
SUN H, LIU X, GONG Y Y, et al. Allies: prompting large language model with beam search[C]//Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg, USA: ACL Press, 2023: 3794-3805.
|
| 36 |
TRIVEDI H, BALASUBRAMANIAN N, KHOT T, et al. Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: ACL Press, 2023: 10014-10037.
|
| 37 |
FAN W Q, DING Y J, NING L B, et al. A survey on RAG meeting LLMs: towards retrieval-augmented large language models[C]//Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York, USA: ACM Press, 2024: 6491-6501.
|
| 38 |
VU T, IYYER M, WANG X Z, et al. FreshLLMs: refreshing large language models with search engine augmentation[C]//Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024. Stroudsburg, USA: ACL Press, 2024: 13697-13720.
|
| 39 |
LIU X, LAI H Y, YU H, et al. WebGLM: towards an efficient Web-enhanced question answering system with human preferences[C]//Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York, USA: ACM Press, 2023: 4549-4560.
|
| 40 |
PRESS O, ZHANG M R, MIN S, et al. Measuring and narrowing the compositionality gap in language models[C]//Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg, USA: ACL Press, 2023: 5687-5711.
|
| 41 |
BORGEAUD S, MENSCH A, HOFFMANN J, et al. Improving language models by retrieving from trillions of tokens[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2112.04426.
|
| 42 |
ZHOU X H, SUN Z Y, LI G L. DB-GPT: large language model meets database[J]. Data Science and Engineering, 2024, 9(1): 102-111. doi: 10.1007/s41019-023-00235-6.
|
| 43 |
GU Y, SHU Y H, YU H, et al. Middleware for LLMs: tools are instrumental for language agents in complex environments[C]//Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: ACL Press, 2024: 7646-7663.
|
| 44 |
GAO J W, ZHAO S T, HUANG N B. Research and application of group intelligence emergency decision making method based on large language model[J/OL]. Computer Engineering: 1-16[2025-09-07]. https://doi.org/10.19678/j.issn.1000-3428.0252386. (in Chinese)
|
| 45 |
AHN J, VERMA R, LOU R Z, et al. Large language models for mathematical reasoning: progresses and challenges[C]//Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop. Stroudsburg, USA: ACL Press, 2024: 225-237.
|
| 46 |
CHEN W, WANG Q S, LONG Z F, et al. DISC-FinLLM: a Chinese financial large language model based on multiple experts fine-tuning[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2310.15205.
|
| 47 |
XUE S Q, ZHOU F, XU Y, et al. WeaverBird: empowering financial decision-making with large language model, knowledge base, and search engine[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2308.05361.
|
| 48 |
FENG G P, CHEN Z J, LIN Z Y, et al. Term recognition in the electric power domain based on dynamic domain graphs and collaborative large and small models[J/OL]. Computer Engineering: 1-18[2025-09-07]. https://doi.org/10.19678/j.issn.1000-3428.0252291. (in Chinese)
|
| 49 |
HE-YUEYA J, POESIA G, WANG R, et al. Solving math word problems by combining language models with symbolic solvers[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2304.09102.
|
| 50 |
LÜ Q, HAVALDAR S, STEIN A, et al. Faithful chain-of-thought reasoning[C]//Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: ACL Press, 2023: 305-329.
|
| 51 |
WU Y R, JIA F R, ZHANG S K, et al. MathChat: converse to tackle challenging math problems with LLM agents[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2306.01337.
|
| 52 |
|
| 53 |
QIN Y J, CAI Z H, JIN D, et al. WebCPM: interactive Web search for Chinese long-form question answering[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: ACL Press, 2023: 8968-8988.
|
| 54 |
|
| 55 |
|
| 56 |
HASSIJA V, CHAMOLA V, MAHAPATRA A, et al. Interpreting black-box models: a review on explainable artificial intelligence[J]. Cognitive Computation, 2024, 16(1): 45-74. doi: 10.1007/s12559-023-10179-8.
|
| 57 |
YANG D K, WEI J J, XIAO D L, et al. PediatricsGPT: large language models as Chinese medical assistants for pediatric applications[J]. Advances in Neural Information Processing Systems, 2024, 37: 138632-138662.
|
| 58 |
|
| 59 |
AHMADI A, SHARIF S, BANAD Y M. Towards transparent artificial intelligence: exploring modern approaches and future directions[C]//Proceedings of the Conference on AI, Science, Engineering, and Technology (AIxSET). Washington D.C., USA: IEEE Press, 2024: 248-251.
|
| 60 |
SURÍS D, MENON S, VONDRICK C. ViperGPT: visual inference via Python execution for reasoning[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 11854-11864.
|
| 61 |
GAO J T, CHEN B, ZHAO X Y, et al. LLM4Rerank: LLM-based auto-reranking framework for recommendations[C]//Proceedings of the ACM on Web Conference 2025. New York, USA: ACM Press, 2025: 228-239.
|
| 62 |
FAN S D, CONG X, FU Y P, et al. WorkflowLLM: enhancing workflow orchestration capability of large language models[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2411.05451.
|
| 63 |
|
| 64 |
WU Q Y, BANSAL G, ZHANG J Y, et al. AutoGen: enabling next-gen LLM applications via multi-agent conversation[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2308.08155.
|
| 65 |
|
| 66 |
HSIEH C Y, CHEN S A, LI C L, et al. Tool documentation enables zero-shot tool-usage with large language models[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2308.00675.
|
| 67 |
JIN Q, YANG Y F, CHEN Q Y, et al. GeneGPT: augmenting large language models with domain tools for improved access to biomedical information[J]. Bioinformatics, 2024, 40(2): 075.
|
| 68 |
LIANG Y B, WU C F, SONG T, et al. TaskMatrix.AI: completing tasks by connecting foundation models with millions of APIs[J]. Intelligent Computing, 2024, 3: 63. doi: 10.34133/icomputing.0063.
|
| 69 |
|
| 70 |
|
| 71 |
GAO L Y, MADAAN A, ZHOU S Y, et al. PAL: program-aided language models[C]//Proceedings of the 40th International Conference on Machine Learning. Washington D.C., USA: IEEE Press, 2023: 10764-10799.
|
| 72 |
LIN Q Q, WEN M N, PENG Q Y, et al. Robust function-calling for on-device language model via function masking[C]//Proceedings of the 13th International Conference on Learning Representations. Washington D.C., USA: IEEE Press, 2025: 1-9.
|
| 73 |
CHEN Z P, ZHOU K, ZHANG B C, et al. ChatCoT: tool-augmented chain-of-thought reasoning on chat-based large language models[C]//Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg, USA: ACL Press, 2023: 14777-14790.
|
| 74 |
CHEN W H, MA X G, WANG X Y, et al. Program of thoughts prompting: disentangling computation from reasoning for numerical reasoning tasks[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2211.12588.
|
| 75 |
INABA T, KIYOMARU H, CHENG F, et al. MultiTool-CoT: GPT-3 can use multiple external tools with chain of thought prompting[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Stroudsburg, USA: ACL Press, 2023: 1522-1532.
|
| 76 |
LU J R, HOLLEIS T, ZHANG Y Z, et al. ToolSandbox: a stateful, conversational, interactive evaluation benchmark for LLM tool use capabilities[C]//Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025. Stroudsburg, USA: ACL Press, 2025: 1160-1183.
|
| 77 |
YE J J, WU Y L, LI S X, et al. TL-Training: a task-feature-based framework for training large language models in tool use[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2412.15495.
|
| 78 |
HAO B G, XU Z Z, WANG M L, et al. FunReason: enhancing large language models' function calling via self-refinement multiscale loss and automated data refinement[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2505.20192.
|
| 79 |
LI M H, ZHAO Y X, YU B W, et al. API-Bank: a comprehensive benchmark for tool-augmented LLMs[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: ACL Press, 2023: 3102-3116.
|
| 80 |
WANG Z Z, ZENG X S, LIU W W, et al. ToolFlow: boosting LLM tool-calling through natural and coherent dialogue synthesis[C]//Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). Stroudsburg, USA: ACL Press, 2025: 4246-4263.
|
| 81 |
TANG Q Y, DENG Z L, LIN H Y, et al. ToolAlpaca: generalized tool learning for language models with 3000 simulated cases[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2306.05301.
|
| 82 |
SHINN N, CASSANO F, GOPINATH A, et al. Reflexion: language agents with verbal reinforcement learning[J]. Advances in Neural Information Processing Systems, 2023, 36: 8634-8652.
|
| 83 |
SONG Y F, XIONG W M, ZHU D W, et al. RestGPT: connecting large language models with real-world restful APIs[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2306.06624.
|
| 84 |
GAO D F, JI L, ZHOU L W, et al. AssistGPT: a general multi-modal assistant that can plan, execute, inspect, and learn[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2306.08640.
|
| 85 |
GAO S, SHI Z L, ZHU M H, et al. Confucius: iterative tool learning from introspection feedback by easy-to-difficult curriculum[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 18030-18038.
|
| 86 |
GOU Z B, SHAO Z H, GONG Y Y, et al. CRITIC: large language models can self-correct with tool-interactive critiquing[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2305.11738.
|
| 87 |
SONG X S, WU Y N, WANG W X, et al. ProgCo: program helps self-correction of large language models[C]//Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Stroudsburg, USA: ACL Press, 2025: 944-959.
|
| 88 |
QIAO S F, GUI H H, LÜ C F, et al. Making language models better tool learners with execution feedback[C]//Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). Stroudsburg, USA: ACL Press, 2024: 3550-3568.
|
| 89 |
MA R T, WANG P S, LIU C, et al. S2R: teaching LLMs to self-verify and self-correct via reinforcement learning[C]//Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: ACL Press, 2025: 22632-22654.
|
| 90 |
LU Y F, YE F H, LI J, et al. CodeTool: enhancing programmatic tool invocation of LLMs via process supervision[C]//Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: ACL Press, 2025: 18287-18304.
|
| 91 |
HAO S B, LIU T Y, WANG Z, et al. ToolkenGPT: augmenting frozen language models with massive tools via tool embeddings[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. Stroudsburg, USA: ACL Press, 2023: 45870-45894.
|
| 92 |
|
| 93 |
DANG H, LIU T Y, WU Z F, et al. Improving large language models function calling and interpretability via guided-structured templates[C]//Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: ACL Press, 2025: 24437-24453.
|
| 94 |
HOU X, ZHAO Y, WANG S, et al. Model Context Protocol (MCP): landscape, security threats, and future research directions[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2503.23278.
|
| 95 |
SHEN Z Y, WANG D Y, MISHRA S S, et al. SLOT: structuring the output of large language models[C]//Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track. Stroudsburg, USA: ACL Press, 2025: 472-491.
|
| 96 |
HUANG T H, JUNG D, KUMAR V, et al. Planning and editing what you retrieve for enhanced tool learning[C]//Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024. Stroudsburg, USA: ACL Press, 2024: 975-988.
|
| 97 |
LIU Z Y, LAI Z Q, GAO Z W, et al. ControlLLM: augment language models with tools by searching on graphs[C]//Proceedings of the 18th European Conference on Computer Vision. Berlin, Germany: Springer, 2025: 89-105.
|
| 98 |
CHEN H Q, ZHANG K X, LI L, et al. ToolDec: syntax error-free and generalizable tool use for LLMs via finite-state decoding[C]//Proceedings of the 3rd Workshop on Mathematical Reasoning and AI at NeurIPS'23. Berlin, Germany: Springer, 2023: 1-10.
|
| 99 |
LU Y N, YU H P, KHASHABI D. GEAR: augmenting language models with generalizable and efficient tool resolution[C]//Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: ACL Press, 2024: 112-138.
|
| 100 |
QIAN C, HAN C, FUNG Y, et al. CREATOR: tool creation for disentangling abstract and concrete reasoning of large language models[C]//Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg, USA: ACL Press, 2023: 6922-6939.
|
| 101 |
YUAN L F, CHEN Y Y, WANG X Y, et al. CRAFT: customizing LLMs by creating and retrieving from specialized toolsets[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2309.17428.
|
| 102 |
|
| 103 |
WANG Z Z, NEUBIG G, FRIED D. TROVE: inducing verifiable and efficient toolboxes for solving programmatic tasks[C]//Proceedings of the 41st International Conference on Machine Learning. Washington D.C., USA: IEEE Press, 2024: 51177-51191.
|
| 104 |
QIN Y J, LIANG S H, YE Y N, et al. ToolLLM: facilitating large language models to master 16000+ real-world APIs[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2307.16789.
|
| 105 |
HUANG Y, SHI J W, LI Y, et al. MetaTool benchmark for large language models: deciding whether to use tools and which to use[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2310.03128.
|
| 106 |
YE J J, LI G Y, GAO S Y, et al. ToolEyes: fine-grained evaluation for tool learning capabilities of large language models in real-world scenarios[C]//Proceedings of the 31st International Conference on Computational Linguistics. Washington D.C., USA: IEEE Press, 2025: 156-187.
|
| 107 |
YE J J, DU Z Y, YAO X S, et al. ToolHop: a query-driven benchmark for evaluating large language models in multi-hop tool use[C]//Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: ACL Press, 2025: 2995-3021.
|
| 108 |
GONZALEZ J, PATIL S, WANG X, et al. Gorilla: large language model connected with massive APIs[C]// Proceedings of the 38th International Conference on Neural Information Processing Systems. Vancouver, Canada: Neural Information Processing Systems Foundation, Inc., 2024: 126544-126565.
|
| 109 |
WANG P, WU Y N, WANG Z K, et al. MTU-Bench: a multi-granularity tool-use benchmark for large language models[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2410.11710.
|
| 110 |
GAO X Q, XIE S Y, ZHAI J, et al. MCP-RADAR: a multi-dimensional benchmark for evaluating tool use capabilities in large language models[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2505.16700.
|
| 111 |
ZHUANG Y C, YU Y, WANG K. ToolQA: a dataset for LLM question answering with external tools[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. Stroudsburg, USA: ACL Press, 2023: 50117-50143.
|
| 112 |
CHEN Z H, DU W H, ZHANG W W, et al. T-Eval: evaluating the tool utilization capability of large language models step by step[C]//Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: ACL Press, 2024: 9510-9529.
|
| 113 |
ZHOU P, PUJARA J, REN X, et al. Self-Discover: large language models self-compose reasoning structures[J]. Advances in Neural Information Processing Systems, 2024, 37: 126032-126058.
|
| 114 |
|
| 115 |
SONG C H, SADLER B M, WU J M, et al. LLM-Planner: few-shot grounded planning for embodied agents with large language models[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2024: 2986-2997.
|
| 116 |
|
| 117 |
GUPTA T, KEMBHAVI A. Visual programming: compositional visual reasoning without training[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 14953-14962.
|
| 118 |
QIN Y M, WEI B M, GE J X, et al. Chain-of-Visual-Thought: teaching VLMs to see and think better with continuous visual tokens[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2511.19418.
|
| 119 |
BHARTI S, FEIZI S, KATTAKINDA P, et al. LLM-Check: investigating detection of hallucinations in large language models[J]. Advances in Neural Information Processing Systems, 2024, 37: 34188-34216.
|
| 120 |
LIU Z C, ZHAO C S, IANDOLA F, et al. MobileLLM: optimizing sub-billion parameter language models for on-device use cases[EB/OL]. [2025-09-07]. https://arxiv.org/abs/2402.14905.
|
| 121 |
|