[1] GARTNER. Generative AI, Machine Customers and AR/VR are Expected to Transform Sales in the Next Five Years[EB/OL]. (2022-10-10)[2022-11-05]. https://www.gartner.com/en/newsroom/press-releases/2022-10-10-gartner-identifies-seven-technology-disruptions-that-will-impact-sales-through-2027.
[2] SHUMAILOV I, SHUMAYLOV Z, ZHAO Y, et al. The curse of recursion: Training on generated data makes models forget[J]. arXiv preprint arXiv:2305.17493, 2023.
[3] LI X, HU Y, WANG M, et al. A Review of AI-generated Content Research: Applications, Risks, and Governance[J]. Library and Information Service, 2024, 68(17): 136-149. (in Chinese)
[4] DOHMATOB E, FENG Y, YANG P, et al. A tale of tails: Model collapse as a change of scaling laws[J]. arXiv preprint arXiv:2402.07043, 2024.
[5] GERSTGRASSER M, SCHAEFFER R, DEY A, et al. Is model collapse inevitable? Breaking the curse of recursion by accumulating real and synthetic data[J]. arXiv preprint arXiv:2404.01413, 2024.
[6] LEE D H. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks[C]//Workshop on challenges in representation learning, ICML. 2013, 3(2): 896.
[7] XIE Q, DAI Z, HOVY E, et al. Unsupervised data augmentation for consistency training[J]. Advances in Neural Information Processing Systems, 2020, 33: 6256-6268.
[8] ZHAO W X, ZHOU K, LI J, et al. A survey of large language models[J]. arXiv preprint arXiv:2303.18223, 2023.
[9] LIU R, WEI J, LIU F, et al. Best practices and lessons learned on synthetic data[J]. arXiv preprint arXiv:2404.07503, 2024.
[10] YE Y, LI C. Application Risks of Synthetic Data in AI Model Training and Its Governance Pathways[J/OL]. Information Studies: Theory & Application, 1-11[2025-03-24]. http://kns.cnki.net/kcms/detail/11.1762.G3.20250208.1355.004.html. (in Chinese)
[11] FU S, ZHANG S, WANG Y, et al. Towards theoretical understandings of self-consuming generative models[J]. arXiv preprint arXiv:2402.11778, 2024.
[12] SEDDIK M E A, CHEN S W, HAYOU S, et al. How bad is training on synthetic data? A statistical analysis of language model collapse[J]. arXiv preprint arXiv:2404.05090, 2024.
[13] BERTRAND Q, BOSE A J, DUPLESSIS A, et al. On the stability of iterative retraining of generative models on their own data[J]. arXiv preprint arXiv:2310.00429, 2023.
[14] WANG L, SHI X, LI G, et al. Why language models collapse when trained on recursively generated text[J]. arXiv preprint arXiv:2412.14872, 2024.
[15] HATAYA R, BAO H, ARAI H. Will large-scale generative models corrupt future datasets?[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 20555-20565.
[16] JAIN A, MONTANARI A, SASOGLU E. Scaling laws for learning with real and surrogate data[J]. arXiv preprint arXiv:2402.04376, 2024.
[17] BENDER E M, KOLLER A. Climbing towards NLU: On meaning, form, and understanding in the age of data[C]//Proceedings of the 58th annual meeting of the association for computational linguistics. 2020: 5185-5198.
[18] GUO Y, SHANG G, VAZIRGIANNIS M, et al. The curious decline of linguistic diversity: Training language models on synthetic text[J]. arXiv preprint arXiv:2311.09807, 2023.
[19] FERBACH D, BERTRAND Q, BOSE A J, et al. Self-consuming generative models with curated data provably optimize human preferences[J]. arXiv preprint arXiv:2407.09499, 2024.
[20] JELINEK F. Statistical methods for speech recognition[M]. Cambridge, MA: MIT Press, 1998.
[21] SHANNON C E. A mathematical theory of communication[J]. The Bell System Technical Journal, 1948, 27(3): 379-423.
[22] SU Z, WU X, ZHOU W, et al. HC3 Plus: A semantic-invariant human ChatGPT comparison corpus[J]. arXiv preprint arXiv:2309.02731, 2023.
[23] WU J, YANG S, ZHAN R, et al. A survey on LLM-generated text detection: Necessity, methods, and future directions[J]. Computational Linguistics, 2025: 1-66.
[24] LIU Y, ZHOU J, SANG G, et al. The journey of language models in understanding natural language[C]//JIN C, YANG S, SHANG X, WANG H, ZHANG Y. Web information systems and applications (International Conference on Web Information Systems and Applications), Singapore, 2024. Singapore: Springer Nature Singapore, 2024: 331-363.