Study on Complex Sentence Identification and Its Relation Recognition Based on Neural Network

Computer Engineering, 2021, Vol. 47, Issue (11): 54-61. doi: 10.19678/j.issn.1000-3428.0059269

JIA Xunan¹, WEI Tingxin²,³, QU Weiguang¹,³, GU Yanhui¹, ZHOU Junsheng¹

1. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023, China;
2. International College for Chinese Studies, Nanjing Normal University, Nanjing 210097, China;
3. School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China

Received: 2020-08-17 Revised: 2020-10-12 Published: 2021-11-09
About the authors: JIA Xunan (born 1994), female, master's student, main research interest: natural language processing; WEI Tingxin, lecturer and doctoral student; QU Weiguang (corresponding author), professor and doctoral supervisor; GU Yanhui, associate professor; ZHOU Junsheng, professor.
Funding: National Natural Science Foundation of China, "Research on Key Technologies of Chinese Abstract Meaning Representation" (61772278); Jiangsu Provincial Universities Philosophy and Social Science Fund, "Research on the Construction of a Chinese Complex Sentence Corpus for Machine Learning" (2019JSA0220).

Abstract

Abstract: The complex sentence is one of the basic units of natural language. Identifying complex sentences and recognizing their semantic relations are important for syntactic parsing and discourse understanding. This study uses neural network models to identify complex sentences in natural corpora and to determine their relation types, and builds a joint model for complex sentence identification and relation recognition to minimize error propagation. For complex sentence identification, a Bi-LSTM captures contextual semantic information, an attention mechanism captures long-distance collocation information within a sentence, and a Convolutional Neural Network (CNN) captures local information. For relation recognition, BERT is used to enhance the semantic representation of sentences, and a Tree-LSTM models syntactic structure and constituent tags. Experiments on the CAMR Chinese corpus show that the attention-based identification model reaches an F1 value of 91.7%, and the Tree-LSTM-based relation recognition model reaches an F1 value of 69.15%. In the joint model, the F1 values for the two tasks are 92.15% and 66.25% respectively, indicating that joint learning lets the tasks obtain more features and thereby improves model performance.

Keywords: complex sentence identification, neural network, complex sentence relation recognition, joint model, semantic modeling
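
The abstract says an attention mechanism is used to capture collocations between distant positions in a sentence. The exact formulation is not given on this page, so the following is only a minimal sketch, assuming a scaled dot-product self-attention with queries, keys and values all equal to the token vectors and no learned projections. The function names and the toy vectors (standing in for per-token Bi-LSTM outputs) are hypothetical, not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden_states):
    """Scaled dot-product self-attention with queries = keys = values.

    hidden_states: list of n token vectors (each a list of d floats).
    Returns (contexts, weights), where weights[i][j] is how strongly
    token i attends to token j. Assumption: no learned projections.
    """
    dim = len(hidden_states[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in hidden_states:
        # Score this token against every token, regardless of distance.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in hidden_states]
        row = softmax(scores)
        weights.append(row)
        # Context vector: attention-weighted average of all token vectors.
        contexts.append([sum(w * vec[j] for w, vec in zip(row, hidden_states))
                         for j in range(dim)])
    return contexts, weights


# Hypothetical 4-token sentence: tokens 0 and 3 have similar vectors, as the
# two halves of a paired connective might after encoding.
toy_states = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.5], [1.0, 0.1]]
contexts, weights = self_attention(toy_states)
```

In this toy input, token 0 attends more strongly to the distant token 3 than to its neighbours 1 and 2, which is the kind of within-sentence long-range pairing the abstract refers to.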

CLC number: TP18

JIA Xunan, WEI Tingxin, QU Weiguang, GU Yanhui, ZHOU Junsheng. Study on Complex Sentence Identification and Its Relation Recognition Based on Neural Network[J]. Computer Engineering, 2021, 47(11): 54-61.

https://www.ecice06.com/CN/Y2021/V47/I11/54

Figures and tables: 16
