[1]
[2]
[3] MIAO S Y, LIANG C C, SU K Y. A diverse corpus for evaluating and developing English math word problem solvers[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, USA: Association for Computational Linguistics, 2020: 975-984.
[4]
[5] HOSSEINI M J, HAJISHIRZI H, ETZIONI O, et al. Learning to solve arithmetic word problems with verb categorization[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, USA: Association for Computational Linguistics, 2014: 523-533.
[6] WEI T W, LUAN J, LIU W, et al. CMATH: can your language model pass Chinese elementary school math test?[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2306.16636.
[7] ZHONG W J, CUI R X, GUO Y D, et al. AGIEval: a human-centric benchmark for evaluating foundation models[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2304.06364.
[8] RAIYAN S R, FAIYAZ M N, KABIR S M J, et al. Math word problem solving by generating linguistic variants of problem statements[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop). Stroudsburg, USA: Association for Computational Linguistics, 2023: 362-378.
[9] KONCEL-KEDZIORSKI R, ROY S, AMINI A, et al. MAWPS: a math word problem repository[C]//Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, USA: Association for Computational Linguistics, 2016: 1152-1157.
[10] AMINI A, GABRIEL S, LIN S, et al. MathQA: towards interpretable math word problem solving with operation-based formalisms[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, USA: Association for Computational Linguistics, 2019: 2357-2367.
[11] QIN J H, LIANG X D, HONG Y N, et al. Neural-symbolic solver for math word problems with auxiliary tasks[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Stroudsburg, USA: Association for Computational Linguistics, 2021: 5870-5881.
[12]
[13]
[14] HENDRYCKS D, BURNS C, KADAVATH S, et al. Measuring mathematical problem solving with the MATH dataset[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2103.03874.
[15] ZHANG B C, ZHOU K, WEI X L, et al. Evaluating and improving tool-augmented computation-intensive math reasoning[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2306.02408.
[16]
[17]
[18]
[19] CHEN W H, YIN M, KU M, et al. TheoremQA: a theorem-driven question answering dataset[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: Association for Computational Linguistics, 2023: 7889-7901.
[20] MISHRA S, FINLAYSON M, LU P, et al. LILA: a unified benchmark for mathematical reasoning[C]//Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: Association for Computational Linguistics, 2022: 5807-5832.
[21] LU P, QIU L, CHANG K W, et al. Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2209.14610.
[22] ROBINSON J A. A machine-oriented logic based on the resolution principle[J]. Journal of the ACM, 1965, 12(1): 23-41. doi: 10.1145/321250.321253.
[23] KNUTH D E, BENDIX P B. Simple word problems in universal algebras[C]//Proceedings of a Conference Held at Oxford Under the Auspices of the Science Research Council Atlas Computer Laboratory. [S. l.]: Elsevier, 2014: 263.
[24] MEGILL N, WHEELER D A. Metamath: a computer language for mathematical proofs[M]. Anaheim, USA: Lulu Press, 2019.
[25] DE MOURA L, KONG S, AVIGAD J, et al. The Lean theorem prover (system description)[C]//Proceedings of the 25th International Conference on Automated Deduction. Berlin, Germany: Springer International Publishing, 2015: 378-388.
[26] PAULSON L C. Isabelle: a generic theorem prover[M]. Berlin, Germany: Springer, 1994.
[27]
[28]
[29]
[30] BANSAL K, LOOS S M, RABE M N, et al. HOList: an environment for machine learning of higher-order theorem proving[EB/OL]. [2024-02-11]. http://arxiv.org/abs/1904.03241.
[31]
[32] GELERNTER H, HANSEN J R, LOVELAND D W. Empirical explorations of the geometry theorem machine[C]//Proceedings of IRE-AIEE-ACM'60. New York, USA: ACM Press, 1960: 143-149.
[33] TRINH T H, WU Y H, LE Q V, et al. Solving Olympiad geometry without human demonstrations[J]. Nature, 2024, 625(7995): 476-482. doi: 10.1038/s41586-023-06747-5.
[34] SEO M, HAJISHIRZI H, FARHADI A, et al. Solving geometry problems: combining text and diagram interpretation[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: Association for Computational Linguistics, 2015: 1466-1476.
[35] ALVIN C, GULWANI S, MAJUMDAR R, et al. Synthesis of problems for shaded area geometry reasoning[M]. Berlin, Germany: Springer International Publishing, 2017.
[36] SACHAN M, DUBEY K, XING E. From textbooks to knowledge: a case study in harvesting axiomatic knowledge from textbooks to solve geometry problems[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: Association for Computational Linguistics, 2017: 773-784.
[37] SACHAN M, XING E. Learning to solve geometry problems from natural language demonstrations in textbooks[C]//Proceedings of the 6th Joint Conference on Lexical and Computational Semantics. Stroudsburg, USA: Association for Computational Linguistics, 2017: 251-261.
[38] LU P, GONG R, JIANG S B, et al. Inter-GPS: interpretable geometry problem solving with formal language and symbolic reasoning[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Stroudsburg, USA: Association for Computational Linguistics, 2021: 6774-6786.
[39] CHEN J Q, TANG J H, QIN J H, et al. GeoQA: a geometric question answering benchmark towards multimodal numerical reasoning[C]//Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Stroudsburg, USA: Association for Computational Linguistics, 2021: 513-523.
[40] CHEN J Q, LI T, QIN J H, et al. UniGeo: unifying geometry logical reasoning via reformulating mathematical expression[C]//Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: Association for Computational Linguistics, 2022: 3313-3323.
[41] LU P, BANSAL H, XIA T, et al. MathVista: evaluating mathematical reasoning of foundation models in visual contexts[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2310.02255.
[42] MASRY A, LONG D, TAN J Q, et al. ChartQA: a benchmark for question answering about charts with visual and logical reasoning[C]//Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022. Stroudsburg, USA: Association for Computational Linguistics, 2022: 2263-2279.
[43] KAFLE K, PRICE B, COHEN S, et al. DVQA: understanding data visualizations via question answering[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 5648-5656.
[44] CHAUDHRY R, SHEKHAR S, GUPTA U, et al. LEAF-QA: locate, encode & attend for figure question answering[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV). Washington D. C., USA: IEEE Press, 2020: 3501-3510.
[45]
[46] WANG C X, LIANG S L, ZHANG Y, et al. Does it make sense? And why? A pilot study for sense making and explanation[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, USA: Association for Computational Linguistics, 2019: 4020-4026.
[47] TALMOR A, HERZIG J, LOURIE N, et al. CommonsenseQA: a question answering challenge targeting commonsense knowledge[EB/OL]. [2024-02-11]. http://arxiv.org/abs/1811.00937.
[48] ZELLERS R, HOLTZMAN A, BISK Y, et al. HellaSwag: can a machine really finish your sentence?[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, USA: Association for Computational Linguistics, 2019: 4791-4800.
[49] ZHOU B, KHASHABI D, NING Q, et al. "Going on a vacation" takes longer than "Going for a walk": a study of temporal commonsense understanding[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, USA: Association for Computational Linguistics, 2019: 3363-3369.
[50] GEVA M, KHASHABI D, SEGAL E, et al. Did Aristotle use a laptop? A question answering benchmark with implicit reasoning strategies[J]. Transactions of the Association for Computational Linguistics, 2021, 9: 346-361. doi: 10.1162/tacl_a_00370.
[51]
[52] WEI J, WANG X Z, SCHUURMANS D, et al. Chain-of-thought prompting elicits reasoning in large language models[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2201.11903.
[53] SRIVASTAVA A, RASTOGI A, RAO A, et al. Beyond the imitation game: quantifying and extrapolating the capabilities of language models[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2206.04615.
[54] SUZGUN M, SCALES N, SCHÄRLI N, et al. Challenging BIG-bench tasks and whether chain-of-thought can solve them[C]//Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023. Stroudsburg, USA: Association for Computational Linguistics, 2023: 13003-13051.
[55] LIU J, CUI L Y, LIU H M, et al. LogiQA: a challenge dataset for machine reading comprehension with logical reasoning[C]//Proceedings of the 29th International Joint Conference on Artificial Intelligence. [S. l.]: International Joint Conferences on Artificial Intelligence Organization, 2020: 3622-3628.
[56] YU W H, JIANG Z H, DONG Y F, et al. ReClor: a reading comprehension dataset requiring logical reasoning[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2002.04326.
[57] ZHONG W J, WANG S Y, TANG D Y, et al. Analytical reasoning of text[C]//Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022. Stroudsburg, USA: Association for Computational Linguistics, 2022: 2306-2319.
[58] KASNECI E, SESSLER K, KÜCHEMANN S, et al. ChatGPT for good? On opportunities and challenges of large language models for education[J]. Learning and Individual Differences, 2023, 103: 102274. doi: 10.1016/j.lindif.2023.102274.
[59]
[60]
[61]
[62]
[63]
[64]
[65] CHUNG H W, HOU L, LONGPRE S, et al. Scaling instruction-finetuned language models[J]. Journal of Machine Learning Research, 2024, 25(70): 1-53.
[66] PENEDO G, MALARTIC Q, HESSLOW D, et al. The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with Web data, and Web data only[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2306.01116.
[67] DU Z X, QIAN Y J, LIU X, et al. GLM: general language model pretraining with autoregressive blank infilling[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: Association for Computational Linguistics, 2022: 320-335.
[68]
[69]
[70]
[71]
[72] SUN Y, WANG S H, FENG S K, et al. ERNIE 3.0: large-scale knowledge enhanced pre-training for language understanding and generation[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2107.02137.
[73] ZENG W, REN X, SU T, et al. PanGu-α: large-scale autoregressive pretrained Chinese language models with auto-parallel computation[EB/OL]. [2024-02-11]. https://arxiv.org/abs/2104.12369.
[74] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[R]. San Francisco, USA: OpenAI, 2019.
[75]
[76] ZONG M, KRISHNAMACHARI B. Solving math word problems concerning systems of equations with GPT-3[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2023: 15972-15979.
[77] WU T Y, HE S Z, LIU J P, et al. A brief overview of ChatGPT: the history, status quo and potential future development[J]. IEEE/CAA Journal of Automatica Sinica, 2023, 10(5): 1122-1136. doi: 10.1109/JAS.2023.123618.
[78] SHAKARIAN P, KOYYALAMUDI A, NGU N, et al. An independent evaluation of ChatGPT on Mathematical Word Problems (MWP)[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2302.13814.
[79] CHENG V, ZHANG Y. Analyzing ChatGPT's mathematical deficiencies: insights and contributions[C]//Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023). New York, USA: ACM Press, 2023: 188-193.
[80]
[81] LEWKOWYCZ A, ANDREASSEN A J, DOHAN D, et al. Solving quantitative reasoning problems with language models[EB/OL]. [2024-02-11]. https://arxiv.org/abs/2206.14858.
[82]
[83]
[84] ZHANG M X, WANG Z C, YANG Z C, et al. Interpretable math word problem solution generation via step-by-step planning[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: Association for Computational Linguistics, 2023: 6858-6877.
[85] YUE X, QU X W, ZHANG G, et al. MAmmoTH: building math generalist models through hybrid instruction tuning[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2309.05653.
[86] WANG Y, LIU X J, SHI S M. Deep neural solver for math word problems[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: Association for Computational Linguistics, 2017: 845-854.
[87] LING W, YOGATAMA D, DYER C, et al. Program induction by rationale generation: learning to solve and explain algebraic word problems[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: Association for Computational Linguistics, 2017: 158-167.
[88] HE-YUEYA J, POESIA G, WANG R E, et al. Solving math word problems by combining language models with symbolic solvers[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2304.09102.
[89] CHEN W H, MA X G, WANG X Y, et al. Program of thoughts prompting: disentangling computation from reasoning for numerical reasoning tasks[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2211.12588.
[90]
[91]
[92] ZHU X Y, WANG J J, ZHANG L, et al. Solving math word problems via cooperative reasoning induced language models[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: Association for Computational Linguistics, 2023: 4471-4485.
[93] WU Y R, JIA F R, ZHANG S K, et al. MathChat: converse to tackle challenging math problems with LLM agents[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2306.01337.
[94] PENG R, YANG C Z, HUANG L T, et al. A numeracy-enhanced decoding for solving math word problem[C]//Proceedings of CCF International Conference on Natural Language Processing and Chinese Computing. Berlin, Germany: Springer, 2023: 111-122.
[95] YAO J, ZHOU Z H, WANG Q F. Solving math word problem with problem type classification[C]//Proceedings of CCF International Conference on Natural Language Processing and Chinese Computing. Berlin, Germany: Springer, 2023: 123-134.
[96]
[97] HUANG W Q, XIAO J. A new encoder using character and word feature fusion for Chinese math word problem solving[C]//Proceedings of CCF International Conference on Natural Language Processing and Chinese Computing. Berlin, Germany: Springer, 2023: 313-324.
[98] IMANI S, DU L, SHRIVASTAVA H. MathPrompter: mathematical reasoning using large language models[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track). Stroudsburg, USA: Association for Computational Linguistics, 2023: 37-42.
[99]
[100] ROY S, ROTH D. Solving general arithmetic word problems[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: Association for Computational Linguistics, 2015: 1743-1752.
[101]
[102]
[103] WAN X C, SUN R X, DAI H J, et al. Better zero-shot reasoning with self-adaptive prompting[C]//Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023. Stroudsburg, USA: Association for Computational Linguistics, 2023: 3493-3514.
[104] WANG L, XU W Y, LAN Y H, et al. Plan-and-solve prompting: improving zero-shot chain-of-thought reasoning by large language models[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, USA: Association for Computational Linguistics, 2023: 2609-2634.
[105] LIU P F, YUAN W Z, FU J L, et al. Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing[J]. ACM Computing Surveys, 2023, 55(9): 1-35.
[106] SHAO Z, GONG Y, SHEN Y, et al. Synthetic prompting: generating chain-of-thought demonstrations for large language models[C]//Proceedings of the 40th International Conference on Machine Learning. New York, USA: ACM Press, 2023: 30706-30775.
[107] SHUM K, DIAO S Z, ZHANG T. Automatic prompt augmentation and selection with chain-of-thought from labeled data[C]//Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg, USA: Association for Computational Linguistics, 2023: 12113-12139.
[108] WANG X Z, WEI J, SCHUURMANS D, et al. Self-consistency improves chain of thought reasoning in language models[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2203.11171.
[109] HU H X, LU H Y, ZHANG H J, et al. Chain-of-symbol prompting elicits planning in large language models[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2305.10276.
[110] YAO S Y, YU D, ZHAO J, et al. Tree of thoughts: deliberate problem solving with large language models[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2305.10601.
[111] MO S T, XIN M. Tree of uncertain thoughts reasoning for large language models[C]//Proceedings of 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D. C., USA: IEEE Press, 2024: 12742-12746.
[112] NING X F, LIN Z N, ZHOU Z X, et al. Skeleton-of-thought: large language models can do parallel decoding[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2307.15337.
[113] BESTA M, BLACH N, KUBICEK A, et al. Graph of thoughts: solving elaborate problems with large language models[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 17682-17690.
[114] LEI B, LIN P H, LIAO C H, et al. Boosting logical reasoning in large language models through a new framework: the graph of thought[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2308.08614.
[115] LIU M, WU Z M, LIAO J, et al. Educational applications of large language models: principles, status and challenges—from light-weighted BERT to conversational ChatGPT[J]. Modern Educational Technology, 2023, 33(8): 19-28. doi: 10.3969/j.issn.1009-8097.2023.08.003. (in Chinese)
[116] BAIDOO-ANU D, OWUSU ANSAH L. Education in the era of generative Artificial Intelligence (AI): understanding the potential benefits of ChatGPT in promoting teaching and learning[J]. Journal of AI, 2023, 7(1): 52-62. doi: 10.61969/jai.1337500.
[117] LIU B C, GOU M H. The impact and countermeasures of new generation artificial intelligence tools such as ChatGPT on educational research[J]. Journal of Soochow University (Educational Science Edition), 2023, 11(3): 54-62. (in Chinese)
[118] WANG C, LIU S X, AWADALLAH A H. Cost-effective hyperparameter optimization for large language model generation inference[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2303.04673.
[119] CAI T L, LI Y H, GENG Z Y, et al. Medusa: simple LLM inference acceleration framework with multiple decoding heads[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2401.10774.
[120]
[121] ZHAO H Y, CHEN H J, YANG F, et al. Explainability for large language models: a survey[J]. ACM Transactions on Intelligent Systems and Technology, 2024, 15(2): 1-38.
[122] STAAB R, VERO M, BALUNOVIĆ M, et al. Beyond memorization: violating privacy via inference with large language models[EB/OL]. [2024-02-11]. http://arxiv.org/abs/2310.07298.