[1]NAVEED H, KHAN A U, QIU S, et al. A comprehensive overview of large language models [J]. arXiv preprint arXiv:2307.06435, 2023.
[2]车万翔, 窦志成, 冯岩松, 等. 大模型时代的自然语言处理: 挑战、机遇与发展[J]. 中国科学: 信息科学, 2023, 53: 1645-1687.
Che W X, Dou Z C, Feng Y S, et al. Towards a comprehensive understanding of the impact of large language models on natural language processing: challenges, opportunities and future directions (in Chinese). Sci Sin Inform, 2023, 53: 1645-1687.
[3]JIANG J Y, WANG F, SHEN J S, et al. A survey on large language models for code generation [J]. arXiv preprint arXiv:2406.00515, 2024.
[4]韩雪雯, 车尚锟, 杨梦晴, 等. 多模态数据驱动的 AI 智能体模式设计[J]. 图书情报工作, 2024, 68(24): 27-37.
Han X W, Che S K, Yang M Q, et al. Design of AI agent models driven by multimodal data[J]. Library and Information Service, 2024, 68(24): 27-37.
[5]YUE M R. A survey of large language model agents for question answering [J]. arXiv preprint arXiv:2503.19213, 2025.
[6]SMIT A P, DUCKWORTH P, GRINSZTAJN N, et al. Are we going MAD? Benchmarking multi-agent debate between language models for medical Q&A [J]. arXiv preprint arXiv:2311.17371, 2023.
[7]王文晟, 谭宁, 黄凯, 等. 基于大模型的具身智能系统综述[J]. 自动化学报, 2025, 51(1): 1-19.
Wang W S, Tan N, Huang K, et al. Embodied intelligence systems based on large models: A survey[J]. Acta Automatica Sinica, 2025, 51(1): 1-19.
[8]DRIESS D, XIA F, SAJJADI M S M, et al. PaLM-E: An embodied multimodal language model [C]//Proceedings of the 40th International Conference on Machine Learning. 2023: 8469-8488.
[9]JIN H L, HUANG L H, CAI H P, et al. From LLMs to LLM-based agents for software engineering: A survey of current, challenges and future [J]. arXiv preprint arXiv:2408.02479, 2024.
[10]LEMIEUX C, INALA J P, LAHIRI S K, et al. CodaMOSA: Escaping coverage plateaus in test generation with pre-trained large language models [C]//Proceedings of the 45th International Conference on Software Engineering. 2023: 919-931.
[11]LI Z L, XU S Y, MEI K, et al. AutoFlow: Automated workflow generation for large language model agents [J]. arXiv preprint arXiv:2407.12821, 2024.
[12]ZENG Z, WATSON W, CHO N, et al. FlowMind: Automatic workflow generation with LLMs [J]. arXiv preprint arXiv:2404.13050, 2024.
[13]FAN S D, CONG X, FU Y P, et al. WorkflowLLM: Enhancing workflow orchestration capability of large language models [C]//Proceedings of the 13th International Conference on Learning Representations. 2025.
[14]YE Y N, CONG X, TIAN S Z, et al. ProAgent: From robotic process automation to agentic process automation [J]. arXiv preprint arXiv:2311.10751, 2023.
[15]HONG S R, ZHUGE M C, CHEN J, et al. MetaGPT: Meta programming for a multi-agent collaborative framework [C]//Proceedings of the 12th International Conference on Learning Representations. 2024.
[16]ZHANG J Y, XIANG J Y, YU Z Y, et al. AFlow: Automating agentic workflow generation [C]//Proceedings of the 13th International Conference on Learning Representations. 2025.
[17]MADAAN A, TANDON N, GUPTA P, et al. Self-refine: Iterative refinement with self-feedback [C]//Proceedings of the 37th Conference on Neural Information Processing Systems. 2023.
[18]ZHANG D, HUANG X S, ZHOU D Z, et al. Accessing GPT-4 level mathematical olympiad solutions via Monte Carlo tree self-refine with LLaMa-3 8B [J]. arXiv preprint arXiv:2406.07394, 2024.
[19]COBBE K, KOSARAJU V, BAVARIAN M, et al. Training verifiers to solve math word problems [J]. arXiv preprint arXiv:2110.14168, 2021.
[20]HENDRYCKS D, BURNS C, KADAVATH S, et al. Measuring mathematical problem solving with the MATH dataset [C]//Proceedings of the 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2021.
[21]YANG Z L, QI P, ZHANG S Z, et al. HotpotQA: A dataset for diverse, explainable multi-hop question answering [C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 2369-2380.
[22]DUA D, WANG Y Z, DASIGI P, et al. DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs [C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019: 2368-2378.
[23]CHEN M, TWOREK J, JUN H W, et al. Evaluating large language models trained on code [J]. arXiv preprint arXiv:2107.03374, 2021.
[24]AUSTIN J, ODENA A, NYE M I, et al. Program synthesis with large language models [J]. arXiv preprint arXiv:2108.07732, 2021.
[25]WEI J, WANG X Z, SCHUURMANS D, et al. Chain-of-thought prompting elicits reasoning in large language models [C]//Proceedings of the 36th Conference on Neural Information Processing Systems. 2022: 24824-24837.
[26]WANG X Z, WEI J, SCHUURMANS D, et al. Self-consistency improves chain of thought reasoning in language models [C]//Proceedings of the 11th International Conference on Learning Representations. 2023.
[27]XU X H, TAO C Y, SHEN T, et al. Re-reading improves reasoning in large language models [C]//Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024: 15549-15575.
[28]NORI H, LEE Y T, ZHANG S, et al. Can generalist foundation models outcompete special-purpose tuning? Case study in medicine [J]. arXiv preprint arXiv:2311.16452, 2023.