
Computer Engineering, 2011, Vol. 37, Issue (3): 58-60. doi: 10.3969/j.issn.1000-3428.2011.03.021

• Software Technology and Database •

Research on Text Representation with Combination of Syntactic in Vector Space Model

YANG Yu-zhen 1,2, LIU Pei-yu 1,2, JIANG Pei-pei 1,2

  1. School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
  2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan 250014, China
  • Online: 2011-02-05 Published: 2011-01-28
  • About the authors: YANG Yu-zhen (born 1978), female, lecturer and master's student, main research interests: network information security and information filtering; LIU Pei-yu, professor and doctoral supervisor; JIANG Pei-pei, master's student
  • Funding:
    National Natural Science Foundation of China (60873247); Shandong Province High-Tech Independent Innovation Special Project Fund (2008ZZ28)

Abstract: To improve the semantic descriptiveness of terms in the Vector Space Model (VSM) and to overcome the drawback that semantic units in VSM are treated as mutually independent, this paper proposes a phrase-based feature granularity for text representation. Starting from how text is represented and how feature items are organized, the method recognizes basic phrases with syntactic rules, builds a relationship tree linking feature items to the head verb, and uses basic phrases in place of the words in the bag-of-words (BOW) representation. Experimental results indicate that representing text with basic phrases improves classification performance, strengthens the links between terms, overcomes the mutual independence of feature items, and maintains good classification results even when the number of feature items is small.

Key words: feature item, phrase, syntactic rule, relationship tree, text representation

CLC Number:
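The idea of replacing BOW words with basic phrases can be illustrated with a minimal sketch. It is not the authors' implementation: the single chunking rule (`ADJ* NOUN+` forms one basic phrase), the tag names, and the functions `chunk_basic_phrases` and `phrase_vector` are all assumptions made for illustration, and the relationship tree around the head verb is not modeled here. The input is assumed to be already segmented and part-of-speech tagged.

```python
from collections import Counter


def chunk_basic_phrases(tagged):
    """Group (word, tag) pairs into features using one toy syntactic rule.

    Hypothetical rule: zero or more ADJ followed by one or more NOUN
    is merged into a single basic phrase. Any other word is kept as a
    single-word feature.
    """
    features = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        # Skip over any leading adjectives.
        while j < n and tagged[j][1] == "ADJ":
            j += 1
        k = j
        # Consume the run of nouns that follows.
        while k < n and tagged[k][1] == "NOUN":
            k += 1
        if k > j:
            # Rule matched: emit the whole span as one phrase feature.
            features.append("_".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            # No noun phrase starts here: keep the single word.
            features.append(tagged[i][0])
            i += 1
    return features


def phrase_vector(tagged):
    """Term-frequency vector whose terms are basic phrases, not words."""
    return Counter(chunk_basic_phrases(tagged))


sentence = [
    ("vector", "NOUN"), ("space", "NOUN"), ("model", "NOUN"),
    ("ignores", "VERB"),
    ("syntactic", "ADJ"), ("structure", "NOUN"),
]
print(chunk_basic_phrases(sentence))
print(phrase_vector(sentence))
```

Under this toy rule the six words collapse into three features, which is one way to see how a phrase-level representation can keep related words together while using fewer feature items than a word-level BOW.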