
Computer Engineering

• Artificial Intelligence and Recognition Technology •

Transfer Learning Oriented Text Feature Alignment Algorithm

WEI Xiaocong 1,2, LIN Hongfei 1

  1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
  2. School of Software, Dalian University of Foreign Languages, Dalian, Liaoning 116044, China
  • Received: 2016-01-15 Online: 2017-02-15 Published: 2017-02-15
  • About the authors: WEI Xiaocong (born 1982), female, lecturer and Ph.D. candidate; main research interests: natural language processing, affective computing, and machine learning. LIN Hongfei, professor and Ph.D. supervisor.
  • Supported by:
    National Natural Science Foundation of China (61572102, 61562080); Scientific Research Fund of Dalian University of Foreign Languages (2014XJQN14).

Abstract: Inconsistency between the feature spaces of the source domain and the target domain lowers the accuracy of transfer learning. To address this problem, this paper proposes a Word2Vec-based algorithm for aligning features across domains. Only adjectives, adverbs, nouns, and verbs are selected as features. For each part of speech, pivot features of the source and target domains are selected; then, separately in the source domain and in the target domain, the non-pivot feature with the highest semantic similarity to each pivot feature is computed and taken as a similar pivot feature, so that every pivot feature yields one similar pivot feature pair. Each similar pivot feature that appears in these domains is replaced according to its pair, which aligns semantically similar features from different domains. Machine learning is then performed on the source and target domain data after feature replacement. Experimental results show that the proposed algorithm achieves an average classification accuracy of 88.2%, higher than that of the Baseline algorithm.

Key words: transfer learning, feature alignment, sentiment analysis, source domain, target domain
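The alignment procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: pivot features are taken to be words that occur at least `min_count` times in both domains, the hand-written 2-dimensional vectors stand in for trained Word2Vec embeddings, the input is assumed to be already filtered to a single part of speech, and both members of a similar pivot feature pair are replaced by a merged token of the form `source|target`. All function names are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_pivots(src_docs, tgt_docs, min_count=2):
    """ASSUMPTION: a pivot is a word seen >= min_count times in each domain."""
    src = Counter(w for doc in src_docs for w in doc)
    tgt = Counter(w for doc in tgt_docs for w in doc)
    return {w for w in src if src[w] >= min_count and tgt[w] >= min_count}


def nearest_non_pivot(pivot, vocab, pivots, vectors):
    """Return the non-pivot word in vocab most similar to the pivot, or None."""
    candidates = sorted(w for w in vocab if w not in pivots and w in vectors)
    if pivot not in vectors or not candidates:
        return None
    return max(candidates, key=lambda w: cosine(vectors[pivot], vectors[w]))


def build_mapping(src_docs, tgt_docs, vectors, min_count=2):
    """Build one similar pivot feature pair per pivot and map both to one token."""
    pivots = select_pivots(src_docs, tgt_docs, min_count)
    src_vocab = {w for doc in src_docs for w in doc}
    tgt_vocab = {w for doc in tgt_docs for w in doc}
    mapping = {}
    for pivot in sorted(pivots):
        s = nearest_non_pivot(pivot, src_vocab, pivots, vectors)
        t = nearest_non_pivot(pivot, tgt_vocab, pivots, vectors)
        if s is None or t is None or s == t:
            continue
        shared = s + "|" + t  # ASSUMPTION: merged token as the replacement
        mapping.setdefault(s, shared)
        mapping.setdefault(t, shared)
    return mapping


def replace_features(docs, mapping):
    """Replace every similar pivot feature by the shared token of its pair."""
    return [[mapping.get(w, w) for w in doc] for doc in docs]


# Toy data: adjectives from book reviews (source) and kitchen reviews (target).
src_docs = [["good", "gripping"], ["good", "gripping"],
            ["bad", "boring"], ["bad", "boring"]]
tgt_docs = [["good", "durable"], ["good", "durable"],
            ["bad", "flimsy"], ["bad", "flimsy"]]
# Hand-written stand-ins for Word2Vec embeddings (illustrative only).
vectors = {
    "good": [1.0, 0.0], "gripping": [0.9, 0.1], "durable": [0.8, 0.2],
    "bad": [-1.0, 0.0], "boring": [-0.9, 0.1], "flimsy": [-0.8, 0.2],
}

mapping = build_mapping(src_docs, tgt_docs, vectors)
aligned_src = replace_features(src_docs, mapping)
aligned_tgt = replace_features(tgt_docs, mapping)
print(mapping)
```

On this toy data, "gripping" and "durable" are mapped to the same token, as are "boring" and "flimsy", so a classifier trained on the aligned source documents sees the same features in the aligned target documents. With real data, the vectors would come from a trained Word2Vec model and the procedure would be repeated for each of the four parts of speech.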
