
Computer Engineering ›› 2019, Vol. 45 ›› Issue (5): 116-121. doi: 10.19678/j.issn.1000-3428.0050574

• Artificial Intelligence and Recognition Technology •

Text Feature Alignment Algorithm for Transfer Learning Based on Semantic Structure

LU Chenyang, KANG Yan, YANG Chengrong, PU Bin

  1. School of Software, Yunnan University, Kunming 650500, China
  • Received: 2018-03-02 Online: 2019-05-15 Published: 2019-05-15
  • About the authors: LU Chenyang (born 1994), male, master's student, main research interest: natural language processing; KANG Yan, associate professor; YANG Chengrong and PU Bin, master's students.
  • Supported by:

    National Natural Science Foundation of China (61762092); Open Fund of the Yunnan Provincial Key Laboratory of Software Engineering (2017SE204).

Abstract:

Feature alignment can cause negative transfer when the source-domain and target-domain feature spaces are inconsistent. To address this, a text feature alignment algorithm for transfer learning based on the GloVe and WordNet models is proposed. Features are first filtered for the classification task according to the part of speech and category of the data samples. Words shared by the source and target domains are selected as pivot words, and the GloVe model is used to align the most similar non-pivot features of the two domains. On this basis, working from the features that the two domains do not share, strong semantic alignment of the domain-independent features is carried out through the WordNet model, and the aligned features are represented by alignment triples that contain pivot features. Experimental results show that the algorithm effectively reduces the feature dimension, expands the feature space, and improves the accuracy of cross-domain text classification.

Key words: transfer learning, feature alignment, word vector, WordNet, text mining
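
The pivot-selection and embedding-alignment steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the similarity threshold, the toy vocabularies, and the three-dimensional vectors (standing in for pretrained GloVe embeddings) are all assumptions made for illustration. The part-of-speech filtering, the WordNet step, and the alignment triples are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_pivots(source_vocab, target_vocab):
    """Pivot words: words that occur in both domains."""
    return set(source_vocab) & set(target_vocab)


def align_non_pivots(source_vocab, target_vocab, vectors, threshold=0.8):
    """Map each source-only word to its most similar target-only word.

    `vectors` is a word -> vector dict (toy stand-in for GloVe).
    `threshold` is a hypothetical cut-off; pairs at or below it stay unaligned.
    """
    pivots = select_pivots(source_vocab, target_vocab)
    src_only = sorted(w for w in set(source_vocab) - pivots if w in vectors)
    tgt_only = sorted(w for w in set(target_vocab) - pivots if w in vectors)

    aligned = {}
    for s in src_only:
        best_word, best_sim = None, threshold
        for t in tgt_only:
            sim = cosine(vectors[s], vectors[t])
            if sim > best_sim:
                best_word, best_sim = t, sim
        if best_word is not None:
            aligned[s] = best_word
    return aligned


# Toy data (assumed): movie reviews as source domain, book reviews as target.
source_vocab = ["good", "bad", "thrilling", "blurry"]
target_vocab = ["good", "bad", "gripping", "paperback"]
toy_vectors = {
    "thrilling": [0.9, 0.1, 0.0],
    "gripping": [0.8, 0.2, 0.1],
    "blurry": [0.0, 0.9, 0.1],
    "paperback": [0.1, 0.0, 0.9],
}

pivots = select_pivots(source_vocab, target_vocab)
alignment = align_non_pivots(source_vocab, target_vocab, toy_vectors)
print(sorted(pivots))
print(alignment)
```

With these toy vectors, "thrilling" is paired with "gripping", while "blurry" has no sufficiently similar counterpart and stays unaligned. In the described method, such leftover features would presumably be candidates for the WordNet-based step.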

CLC Number: