
Computer Engineering ›› 2008, Vol. 34 ›› Issue (15): 83-84,8. doi: 10.3969/j.issn.1000-3428.2008.15.029

• Software Technology and Database •

Web Information Collection and Statistics Services Based on LSA

LI Xiao-ting1, ZHANG Lei2, SHEN Jian-jing2

  1. Department of Communication Equipment Management, Xi’an Communication College, Xi’an 710106
  2. Department of Electrical Information Engineering, PLA Information Engineering University, Zhengzhou 450001
  • Online: 2008-08-05 Published: 2008-08-05

Abstract: In the era of networked information, traditional statistical forecasting methods are no longer fully applicable, while the demand for collecting and compiling statistics on information in specific domains keeps growing. Effective, targeted collection and statistics of domain-specific information, together with the corresponding predictions, has therefore become an increasingly important research direction. This paper applies Chinese word segmentation, Latent Semantic Analysis (LSA), and semantic matching to construct a user interest model, and uses a Service-Oriented Architecture (SOA) to design the Web information collection and statistics service. Experiments verify the effect of controlling information collection and statistics through analysis of Web information structure and prediction of the relevance of unseen information.

Key words: information collection, Latent Semantic Analysis (LSA), Service-Oriented Architecture (SOA), Web service
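
The abstract names LSA and semantic matching against a user interest model but gives no implementation details. The following is a minimal illustrative sketch of that general idea, not the authors' method. It assumes pages have already been segmented into tokens (the Chinese word segmentation step is taken as done upstream), uses raw term counts, and approximates the truncated SVD with power iteration and deflation so that it needs only the standard library. All function names, the sample documents, and the choice of k = 2 are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else v

def term_vector(tokens, vocab):
    """Raw term-frequency vector of a token list over a fixed vocabulary."""
    return [tokens.count(t) for t in vocab]

def top_left_singular_vectors(A, k, iters=300):
    """Approximate the top-k left singular vectors of A (terms x docs)
    by power iteration with deflation on the symmetric matrix A * A^T."""
    m = len(A)
    M = [[dot(A[i], A[j]) for j in range(m)] for i in range(m)]
    basis = []
    for _ in range(k):
        u = normalize([1.0 + i for i in range(m)])  # deterministic start vector
        for _ in range(iters):
            u = normalize([dot(row, u) for row in M])
        lam = dot(u, [dot(row, u) for row in M])    # Rayleigh quotient (sigma^2)
        basis.append(u)
        # Deflate: remove the component just found before the next pass.
        M = [[M[i][j] - lam * u[i] * u[j] for j in range(m)] for i in range(m)]
    return basis

def project(vec, basis):
    """Coordinates of a term vector in the k-dimensional latent space."""
    return [dot(u, vec) for u in basis]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0

def lsa_relevance(docs, interest, k=2):
    """Score each tokenized document against a tokenized interest profile."""
    vocab = sorted({t for d in docs for t in d})
    A = [[d.count(t) for d in docs] for t in vocab]  # term-document matrix
    basis = top_left_singular_vectors(A, k)
    q = project(term_vector(interest, vocab), basis)
    return [cosine(q, project(term_vector(d, vocab), basis)) for d in docs]

# Hypothetical, pre-segmented sample pages: two on finance, two on sport.
docs = [
    ["stock", "market", "price"],
    ["stock", "market", "market"],   # never mentions "price"
    ["football", "match", "goal"],
    ["football", "goal"],
]
scores = lsa_relevance(docs, ["price"])
print([round(s, 3) for s in scores])
```

In this toy corpus the second page never contains the word "price", yet it scores close to the first page because "price" co-occurs with "stock" and "market" in the latent space. That is the kind of relevance prediction for unseen pages that a focused collector could use to decide which links to follow. A real system would more likely use a library SVD and a weighting scheme such as TF-IDF.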
