Feature Extension Algorithm Fusing Statistical Information and Semantic Similarity

doi:10.3969/j.issn.1000-3428.2017.06.028

Abstract

Short texts are high-dimensional and sparse. To address this, this paper proposes a feature extension algorithm for short text that fuses statistical information between feature words with semantic similarity. First, it screens the candidate feature set by the contribution degree of each word to obtain the initial feature extension set. Next, it calculates the statistical correlation between feature words and constructs a set of binary correlated word pairs. Finally, using the semantic relations in the external knowledge base HowNet, it obtains the synsets of the correlated word pairs, calculates their semantic similarity, and adds the synset entries that meet the conditions to the feature words of the short text, yielding the extended feature set. Experimental results show that extending short-text features with the proposed algorithm significantly improves the classification results of classifiers.
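The three steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract gives no formulas, so document frequency stands in for the contribution degree, a Jaccard co-occurrence ratio stands in for the statistical correlation, and the hypothetical `synsets` and `sim` dictionaries stand in for HowNet lookups and its semantic similarity. All function names, parameters, and thresholds are assumptions made for this sketch.

```python
from collections import Counter
from itertools import combinations


def select_features(docs, top_k):
    """Step 1 (assumed measure): keep the top_k words by document frequency."""
    df = Counter(w for doc in docs for w in set(doc))
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(df, key=lambda w: (-df[w], w))
    return set(ranked[:top_k])


def correlated_pairs(docs, features, min_corr):
    """Step 2 (assumed measure): pairs whose Jaccard co-occurrence is >= min_corr."""
    df = Counter()
    co = Counter()
    for doc in docs:
        present = sorted(set(doc) & features)
        df.update(present)
        co.update(combinations(present, 2))
    pairs = {}
    for (a, b), n in co.items():
        corr = n / (df[a] + df[b] - n)
        if corr >= min_corr:
            pairs[(a, b)] = corr
    return pairs


def extend_doc(doc, pairs, synsets, sim, min_sim):
    """Step 3 (toy stand-in for HowNet): for each correlated pair with one word
    in the document, add the partner's synset entries that are similar enough
    to the word already present."""
    words = set(doc)
    extended = list(doc)
    for a, b in pairs:
        for src, partner in ((a, b), (b, a)):
            if src not in words:
                continue
            for cand in synsets.get(partner, []):
                score = sim.get(frozenset((src, cand)), 0.0)
                if score >= min_sim and cand not in words:
                    extended.append(cand)
                    words.add(cand)
    return extended
```

In this sketch a short text is a list of already-segmented words, and `extend_doc` returns the original words followed by any added ones. Replacing the toy dictionaries with real HowNet queries would change only the two lookups inside `extend_doc`.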

Key words: short text, statistical correlation, semantic similarity, HowNet, feature extension

CLC Number:

TP18

LI Xiaohong,CAO Lin,SU Yun,MA Huifang. Feature Extension Algorithm Fusing Statistical Information and Semantic Similarity[J]. Computer Engineering.

李晓红,曹林,宿云,马慧芳. 融合统计信息与语义相似度的特征扩展算法[J]. 计算机工程.

URL:

https://www.ecice06.com/EN/Y2017/V43/I6/177
