摘要: 在网络信息时代,传统的统计预测方法已经不完全适用,而对特定领域的信息采集和统计的需求日趋明显,使有效定向采集和统计特定领域信息并得到其相应的预测结果成为一个日益重要的研究方向。该文通过运用汉语分词、潜在语义分析和语义匹配等技术,构造了用户兴趣模型,并同时使用了面向服务的体系结构来设计该Web信息采集统计服务,通过具体的实验验证了对Web信息结构分析和未知信息相关性预测来控制信息采集统计的效果。
关键词: 信息采集, 潜在语义分析, 面向服务的架构, Web服务
Abstract: In the network information age, traditional statistical prediction methods are no longer fully applicable to Web information collection and statistics, while the demand for collecting and compiling statistics on domain-specific information keeps growing. Effective targeted collection and statistics of domain-specific information, together with the corresponding predictions, has therefore become an increasingly important research direction. This paper applies Chinese word segmentation, Latent Semantic Analysis (LSA), and semantic matching to construct a user interest model, and uses a Service-Oriented Architecture (SOA) to design the Web information collection and statistics service. Experiments validate the effect of controlling information collection and statistics through analysis of Web information structure and prediction of the relevance of unknown information.
Key words: information collection, Latent Semantic Analysis (LSA), Service-Oriented Architecture (SOA), Web service
中图分类号:
李晓婷;张 磊;沈建京. 基于LSA的Web信息采集和统计服务[J]. 计算机工程, 2008, 34(15): 83-84,8.
LI Xiao-ting; ZHANG Lei; SHEN Jian-jing. Web Information Collection and Statistics Services Based on LSA[J]. Computer Engineering, 2008, 34(15): 83-84,8.