
Computer Engineering

• Artificial Intelligence and Recognition Technology •

Research on Construction of Uyghur Network Query Expansion Words

NIAN Mei 1,2, ZHANG Lanfang 2

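  (1. Postdoctoral Research Station of Computer Science and Technology, Xinjiang University, Urumqi 830054, China; 2. Network Information Security and Public Opinion Analysis Laboratory, Xinjiang Normal University, Urumqi 830054, China)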
  • Received: 2014-04-21 Published: 2015-04-15 Online: 2015-04-15
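  • About the authors: NIAN Mei (1970-), female, professor, main research interests: natural language processing and network security; ZHANG Lanfang, master's student.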
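  • Funding:
    National Natural Science Foundation of China (61163064); Open Bidding Project Fund of the Network Information Security and Public Opinion Analysis Laboratory, Xinjiang Normal University (WLYQ2012110); Graduate Science and Technology Innovation Fund of Xinjiang Normal University (20131204).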

Abstract: To improve the performance of query expansion for Uyghur web content, this paper presents an expansion-word construction algorithm that combines Uyghur synonym resources with Internet resources. A basic candidate word set is first built from Uyghur synonym, near-synonym and antonym dictionaries. Treating the Internet as a very large-scale corpus and a search engine as the tool, the algorithm scores the similarity between each keyword and every word in the basic candidate set with an improved pointwise mutual information (PMI) measure, ranks the words by this score, and selects the top N as candidate expansion word set 1. In parallel, Internet text containing the keywords is analyzed with local co-occurrence analysis and PMI to build candidate expansion word set 2. The two candidate sets are combined by weighted summation, and the highest-ranked words are selected as the final expansion words. Expanded queries were validated through a search engine, and compared with plain keyword queries and synonym-based query expansion, the proposed algorithm clearly improves query accuracy.

Key words: query expansion, local co-occurrence analysis, pointwise mutual information algorithm, expansion word, large-scale corpus
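
The abstract outlines a three-stage pipeline (dictionary candidates scored by web PMI, local co-occurrence candidates scored by corpus PMI, weighted fusion) but does not give the improved PMI formula, the window size, N, or the weights. The Python sketch below only illustrates the general shape of such a pipeline under explicit assumptions: it uses standard PMI rather than the paper's improved variant, replaces live search-engine queries with a hypothetical in-memory table of hit counts (`hits`, `pair_hits`, `total_pages`), and fuses the two sets with min-max normalization and a hypothetical weight `alpha`. All function names, the window size, `alpha`, and the toy data are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only. Assumptions: standard PMI (not the paper's improved
# variant), hit counts supplied as dicts instead of live search-engine queries,
# min-max normalization plus a weight alpha for fusing the two candidate sets.
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def pmi_from_hits(hits_x: int, hits_y: int, hits_xy: int, total_pages: int) -> float:
    """Standard PMI from hit counts: log2(N * hits(x AND y) / (hits(x) * hits(y)))."""
    return math.log2(total_pages * hits_xy / (hits_x * hits_y))


def candidate_set_1(keyword: str, dict_words: Sequence[str], hits: Dict[str, int],
                    pair_hits: Dict[Tuple[str, str], int], total_pages: int,
                    top_n: int) -> Dict[str, float]:
    """Score dictionary candidates (synonyms, near-synonyms, antonyms) by web PMI; keep top N."""
    scores = {}
    for w in dict_words:
        joint = pair_hits.get((keyword, w), 0)
        if joint > 0 and hits.get(w, 0) > 0:  # PMI is undefined when a count is zero
            scores[w] = pmi_from_hits(hits[keyword], hits[w], joint, total_pages)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_n])


def candidate_set_2(keyword: str, docs: Sequence[List[str]], window: int,
                    top_n: int) -> Dict[str, float]:
    """Local co-occurrence: count words within +/-window tokens of each keyword occurrence,
    then score each co-occurring word by PMI over the keyword-bearing corpus."""
    freq = Counter(tok for doc in docs for tok in doc)
    total = sum(freq.values())
    cooc: Counter = Counter()
    for doc in docs:
        for i, tok in enumerate(doc):
            if tok != keyword:
                continue
            for j in range(max(0, i - window), min(len(doc), i + window + 1)):
                if doc[j] != keyword:  # also skips position i itself
                    cooc[doc[j]] += 1
    scores = {w: math.log2(c * total / (freq[keyword] * freq[w])) for w, c in cooc.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_n])


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two candidate sets are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {w: 1.0 for w in scores}
    return {w: (s - lo) / (hi - lo) for w, s in scores.items()}


def fuse(set1: Dict[str, float], set2: Dict[str, float], alpha: float, k: int) -> List[str]:
    """Weighted sum alpha*set1 + (1 - alpha)*set2 (absent words score 0); return top k words."""
    n1, n2 = min_max(set1), min_max(set2)
    combined = {w: alpha * n1.get(w, 0.0) + (1 - alpha) * n2.get(w, 0.0)
                for w in set(n1) | set(n2)}
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken by word
    return [w for w, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy, made-up counts standing in for real search-engine hit counts.
    hits = {"kw": 1000, "syn_a": 500, "syn_b": 2000, "ant_c": 800}
    pair_hits = {("kw", "syn_a"): 200, ("kw", "syn_b"): 100}
    s1 = candidate_set_1("kw", ["syn_a", "syn_b", "ant_c"], hits, pair_hits, 1_000_000, top_n=2)
    docs = [["kw", "syn_a", "news", "noise"], ["ctx", "kw", "news"]]
    s2 = candidate_set_2("kw", docs, window=1, top_n=3)
    print(fuse(s1, s2, alpha=0.5, k=3))  # ['syn_a', 'ctx', 'news']
```

Min-max normalization is just one simple way to put the web-PMI scores and the corpus-PMI scores on a common scale before weighting; the paper may normalize differently, and `alpha` would in practice be tuned on held-out queries.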
