
Computer Engineering


Analysis of Sino-Vietnamese Bilingual News Topics Mixing Elements and Themes

XIA Qing, YAN Xin, YU Zhengtao, WANG Jiancheng, GAO Shengxiang, HONG Xudong

  1. (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
  • Received: 2015-07-13  Online: 2016-09-15  Published: 2016-09-15

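  • About the authors: XIA Qing (b. 1990), male, master's student, main research interest: natural language processing; YAN Xin, associate professor, M.S.; YU Zhengtao, professor, Ph.D.; WANG Jiancheng, master's student; GAO Shengxiang and HONG Xudong, Ph.D. students.
  • Supported by:
    National Natural Science Foundation of China (61462055, 61472168, 61262041); Key Project of the Natural Science Foundation of Yunnan Province (2013FA130).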

Abstract: Bilingual topic analysis and discovery is an active research area, but few studies target specific types of text. For Sino-Vietnamese bilingual news texts, this paper builds on a Chinese-Vietnamese text similarity calculation method based on bilingual topic distribution words and proposes fusing news element features such as titles, keywords, and entities. This news feature information is integrated into the text similarity calculation to construct a bilingual text similarity matrix, and an adaptive K-means algorithm is then used to cluster the Sino-Vietnamese bilingual news texts and analyze their topics. Experimental results show that the accuracy, recall, and F-measure of the proposed method are higher than those of the calculation method that considers only news text similarity and the K-means clustering method.

Key words: bilingual news topic analysis, Sino-Vietnamese bilingual, text similarity, topic, adaptive clustering
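
The pipeline summarized in the abstract (fuse element similarities with text similarity, build a similarity matrix, then cluster with an adaptive K-means) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the weighted-sum fusion and its weights, the use of each row of the similarity matrix as a document's feature vector, and the silhouette-based choice of K are all assumptions made here, since the abstract does not give those details.

```python
import numpy as np


def fuse_similarity(text_sim, title_sim, keyword_sim, entity_sim,
                    weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted average of four pairwise similarity matrices.

    The weights are hypothetical; the abstract does not state how the
    element features are combined with the text similarity.
    """
    parts = (text_sim, title_sim, keyword_sim, entity_sim)
    fused = sum(w * np.asarray(p, dtype=float) for w, p in zip(weights, parts))
    return fused / sum(weights)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd-style K-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def silhouette(X, labels):
    """Mean silhouette coefficient; -1.0 if fewer than two clusters remain."""
    clusters = set(labels.tolist())
    if len(clusters) < 2:
        return -1.0
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():          # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = dists[i, same].mean()
        b = min(dists[i, labels == c].mean()
                for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


def adaptive_kmeans(sim_matrix, k_max=10):
    """Assumed 'adaptive' step: try K = 2..k_max and keep the best silhouette."""
    X = np.asarray(sim_matrix, dtype=float)  # row i = features of document i
    best_k, best_labels, best_score = None, None, -np.inf
    for k in range(2, min(k_max, len(X) - 1) + 1):
        labels = kmeans(X, k)
        score = silhouette(X, labels)
        if score > best_score:
            best_k, best_labels, best_score = k, labels, score
    return best_k, best_labels
```

In this sketch, each input matrix would hold pairwise similarities over the same set of Chinese and Vietnamese news documents, and `adaptive_kmeans(fuse_similarity(...))` would return a chosen number of topics together with a topic label per document.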

