1
甘陈敏, 唐宏, 杨浩澜, 等. 融合卷积收缩门控的生成式文本摘要方法. 计算机工程, 2024, 50(2): 98-104.
doi: 10.19678/j.issn.1000-3428.0066847
GAN C M, TANG H, YANG H L, et al. Abstractive text summarization method incorporating convolutional shrinkage gating. Computer Engineering, 2024, 50(2): 98-104.
doi: 10.19678/j.issn.1000-3428.0066847
2
HUANG D D, CUI L Y, YANG S, et al. What have we achieved on text summarization?[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: Association for Computational Linguistics, 2020: 446-469.
3
SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]//Proceedings of NIPS'14. Cambridge, USA: MIT Press, 2014: 3104-3112.
4
VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Washington D. C., USA: IEEE Press, 2017: 6000-6010.
5
LEWIS M, LIU Y H, GOYAL N, et al. BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, USA: Association for Computational Linguistics, 2020: 7871-7880.
6
DAI Z, YANG Z, YANG Y, et al. Transformer-XL: attentive language models beyond a fixed-length context[EB/OL]. [2023-06-20]. http://arxiv.org/abs/1901.02860.
7
RAE J W, POTAPENKO A, JAYAKUMAR S M, et al. Compressive transformers for long-range sequence modelling[C]//Proceedings of the International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2019: 256-268.
8
9
HOCHREITER S, SCHMIDHUBER J. Long short-term memory. Neural Computation, 1997, 9(8): 1735-1780.
doi: 10.1162/neco.1997.9.8.1735
10
CHUNG J, GÜLÇEHRE Ç, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[EB/OL]. [2023-06-20]. http://arxiv.org/abs/1412.3555.
11
黄东瑞, 毛克彪, 郭中华, 等. 几种神经网络经典模型综述. 高技术通讯, 2023, 33(8): 860-871.
doi: 10.3772/j.issn.1002-0470.2023.08.008
HUANG D R, MAO K B, GUO Z H, et al. A review of classical models of neural networks. Chinese High Technology Letters, 2023, 33(8): 860-871.
doi: 10.3772/j.issn.1002-0470.2023.08.008
12
BENGIO Y, SIMARD P, FRASCONI P. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 1994, 5(2): 157-166.
doi: 10.1109/72.279181
13
14
15
DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. [2023-06-20]. http://arxiv.org/abs/1810.04805v2.
16
HUANG Y P, CHENG Y L, BAPNA A, et al. GPipe: efficient training of giant neural networks using pipeline parallelism[EB/OL]. [2023-06-20]. http://arxiv.org/abs/1811.06965v5.
17
18
KITAEV N, KAISER L, LEVSKAYA A. Reformer: the efficient transformer[C]//Proceedings of the International Conference on Learning Representations. Washington D. C., USA: IEEE Press, 2019: 4235-4246.
19
20
ZAHEER M, GURUGANESH G, DUBEY A, et al. Big bird: transformers for longer sequences[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. Washington D. C., USA: IEEE Press, 2020: 17283-17297.
21
QIU J Z, MA H, LEVY O, et al. Blockwise self-attention for long document understanding[C]//Proceedings of Empirical Methods in Natural Language Processing. Stroudsburg, USA: Association for Computational Linguistics, 2020: 2555-2565.
22
ZHOU H Y, ZHANG S H, PENG J Q, et al. Informer: beyond efficient transformer for long sequence time-series forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(12): 11106-11115.
23
24
ZHOU Y J, DOU Z C, YUAN H Y, et al. Socialformer: social network inspired long document modeling for document ranking[C]//Proceedings of the ACM Web Conference. New York, USA: ACM Press, 2022: 339-347.
25
LIU T T, WANG C Y, CHEN C, et al. Understanding long programming languages with structure-aware sparse attention[C]//Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, USA: ACM Press, 2022: 2093-2098.
26
27
28
MIKOLOV T, ZWEIG G. Context dependent recurrent neural network language model[C]//Proceedings of IEEE Spoken Language Technology Workshop. Miami, USA: IEEE Press, 2012: 234-239.
29
BAI S J, KOLTER J Z, KOLTUN V, et al. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling[EB/OL]. [2023-06-20]. http://arxiv.org/abs/1803.01271v2.
30
MANAKUL P, GALES M. Long-span summarization via local attention and content selection[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg, USA: Association for Computational Linguistics, 2021: 6026-6041.