
Computer Engineering ›› 2007, Vol. 33 ›› Issue (18): 265-267. doi: 10.3969/j.issn.1000-3428.2007.18.093

• Development Research and Design Technology •

Design and Implementation of Intelligent Information Collection and Processing System Based on Web

ZHANG Fan, LI Lin-na, YANG Bing-ru

  1. (School of Information Engineering, University of Science and Technology Beijing, Beijing 100083)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-09-20 Published: 2007-09-20

Abstract: As the volume of information on the Internet keeps growing, how to collect and exploit Web information is attracting increasing attention. This paper designs and implements a Web-based intelligent information collection and processing system. On the collection side, efficient URL deduplication and a template-based downloading mechanism improve the performance of gathering Web resources. On the processing side, natural language processing techniques are applied to classify and summarize the collected information, and publication emphasizes personalized information services. Experimental results show that, compared with similar systems, the proposed system has clear advantages in both intelligence and practicality.

Key words: Web collection, URL deduplication, intelligent information processing, personalized publishing
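
The abstract names URL deduplication as one source of the collection speed-up but does not say how it is done. The following is a minimal sketch of the general idea only, not the authors' method: it assumes that URLs are first normalized (lowercase scheme and host, default port and fragment removed) and that a digest of each normalized URL is kept in an in-memory set. The names `normalize_url` and `SeenUrls` are hypothetical and chosen for illustration.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these two schemes have default ports worth stripping.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # note: any user:password@ part is dropped
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment is never sent to the server, so it is discarded.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


class SeenUrls:
    """Hypothetical visited-URL set that stores fixed-size digests."""

    def __init__(self) -> None:
        self._digests = set()

    def add(self, url: str) -> bool:
        """Return True if the URL is new, False if it was already seen."""
        digest = hashlib.sha256(normalize_url(url).encode("utf-8")).digest()
        if digest in self._digests:
            return False
        self._digests.add(digest)
        return True


if __name__ == "__main__":
    seen = SeenUrls()
    for candidate in ("HTTP://Example.com:80/a#top", "http://example.com/a"):
        print(candidate, "new" if seen.add(candidate) else "duplicate")
```

Storing digests rather than full strings keeps the memory cost per URL constant; a crawler handling far more URLs than fit in memory would typically swap the set for a Bloom filter or an on-disk index, at the cost of extra complexity.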
