Integrating Large Language Models with Hybrid Strategies for Geometry Problem Understanding

doi:10.19678/j.issn.1000-3428.0253426

Abstract

Problem understanding is a critical prerequisite for automated geometric theorem proving. Existing approaches, however, rely heavily on feature engineering and generalize poorly, so they cannot adequately support automated problem solving. To address this, this paper proposes a large language model-based method for geometry problem understanding that fine-tunes the Qwen2.5 base model and combines chain-of-thought reasoning with k-nearest neighbor (KNN) retrieval-augmented generation. To further improve the accuracy of semantic translation, we introduce an agent-based hallucination detection and correction mechanism that mitigates hallucinations during problem understanding. On the intent understanding task of a self-constructed dataset, the proposed method achieves an accuracy of 88.85% and a recall of 89.12%, significantly outperforming multiple baseline models. On the public Geometry3K benchmark, it attains an accuracy of 94.86% and a recall of 94.18%, outperforming Inter-GPS and other existing methods. Systematic ablation studies and comparisons under multiple parameter configurations further confirm the performance and adaptability of the multi-strategy hybrid approach.
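As an illustration of the KNN retrieval-augmented prompting idea named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a small store of (problem text, formal description) exemplars, a toy bag-of-words similarity in place of a learned sentence embedding, and an invented predicate syntax. The names `EXEMPLARS`, `embed`, `knn_retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical exemplar store: (problem text, formal description) pairs.
# The predicate syntax is invented for illustration only.
EXEMPLARS = [
    ("In triangle ABC, AB equals AC.",
     "Triangle(A,B,C); Equals(LengthOf(AB), LengthOf(AC))"),
    ("Line AB is parallel to line CD.",
     "Parallel(Line(A,B), Line(C,D))"),
    ("Point M is the midpoint of segment AB.",
     "IsMidpointOf(M, Line(A,B))"),
]


def embed(text):
    """Toy bag-of-words vector (assumption); a real system would use a sentence encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def knn_retrieve(query, k=2):
    """Return the k exemplars whose problem text is most similar to the query."""
    q = embed(query)
    ranked = sorted(EXEMPLARS, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, k=2):
    """Assemble a few-shot prompt with a step-by-step (chain-of-thought) instruction."""
    parts = ["Translate each geometry problem into formal statements. "
             "Think step by step."]
    for text, formal in knn_retrieve(query, k):
        parts.append(f"Problem: {text}\nFormal: {formal}")
    parts.append(f"Problem: {query}\nFormal:")
    return "\n\n".join(parts)


query = "In triangle DEF, point M is the midpoint of segment DE."
prompt = build_prompt(query)
print(prompt)
```

The resulting prompt would then be passed to the fine-tuned model. Retrieving exemplars close to the query is one plausible way to give the model relevant translation patterns. The value of `k` and the similarity measure are tunable choices that this sketch does not take from the paper.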


WANG Shengming, YANG Weiwei, MA Yan, CHEN Mao. Integrating Large Language Models with Hybrid Strategies for Geometry Problem Understanding[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0253426.

王胜明, 杨威威, 马燕, 陈矛. 基于大语言模型和混合策略的几何题意理解[J]. 计算机工程, doi: 10.19678/j.issn.1000-3428.0253426.


URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0253426

