Deep Web Data Source Classification Based on Query Interface Context

doi:10.3969/j.issn.1000-3428.2010.12.023

Computer Engineering ›› 2010, Vol. 36 ›› Issue (12): 66-68. doi: 10.3969/j.issn.1000-3428.2010.12.023

• Networks and Communications •

HUA Hui, FU Yu-chen, ZHOU Xiao-ke

(School of Computer Science & Technology, Soochow University, Suzhou 215006)

Online: 2010-06-20 Published: 2010-06-20

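About the authors: HUA Hui (b. 1984), male, master's student, main research interests: data mining and pattern recognition; FU Yu-chen, associate professor; ZHOU Xiao-ke, lecturer
Funding:
National Natural Science Foundation of China (60673092); 2007 Special Research Fund for Public Welfare Projects in Quality Inspection (10-60); Natural Science Foundation of Jiangsu Higher Education Institutions (07KJD520187); Open Fund of the Jiangsu Provincial Engineering Technology R&D Center for Modern Enterprise Informatization Application Support Software (SX200902)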

Abstract

Abstract: To cope with the rapid growth in the number of Deep Web data sources, a Deep Web data source classification algorithm based on the text of query interfaces is presented. The similarity between two query interfaces is obtained by combining two methods. The first is based on the vector space model and uses classical TF-IDF statistics. The second computes the semantic similarity of the two interfaces using HowNet. Drawing on the K-NN algorithm, a WDB classification algorithm is then proposed to classify Deep Web data sources. Experimental results show that the algorithm achieves high quality under both the entropy and F-measure criteria and has practical value.

Key words: Deep Web, data source classification, HowNet, K-NN algorithm, semantic classification
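
The abstract gives only an outline of the method, so the following is a minimal sketch of the general idea rather than the authors' WDB algorithm: a TF-IDF cosine similarity and a word-level semantic similarity are blended, and a similarity-weighted K-NN vote assigns the class. Everything concrete here is an assumption made for illustration: the smoothed IDF variant, the weight `alpha`, the value of `k`, the toy `WORD_SIM` table that stands in for HowNet, and the sample interfaces.

```python
import math
from collections import Counter

# Toy stand-in for HowNet word similarity (hypothetical values, not real HowNet output).
WORD_SIM = {("author", "writer"): 0.9}

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (smoothed IDF variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        size = max(len(doc), 1)
        vectors.append({t: (c / size) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def word_sim(a, b):
    """Similarity of two words: 1.0 if identical, else a table lookup, else 0.0."""
    if a == b:
        return 1.0
    return WORD_SIM.get((a, b), WORD_SIM.get((b, a), 0.0))

def semantic_sim(doc_a, doc_b):
    """Symmetric average of each word's best match in the other document."""
    if not doc_a or not doc_b:
        return 0.0
    a_to_b = sum(max(word_sim(x, y) for y in doc_b) for x in doc_a) / len(doc_a)
    b_to_a = sum(max(word_sim(y, x) for x in doc_a) for y in doc_b) / len(doc_b)
    return (a_to_b + b_to_a) / 2.0

def knn_classify(query, labeled, k=3, alpha=0.5):
    """Label `query` by a similarity-weighted vote among its k nearest interfaces."""
    docs = [tokens for tokens, _ in labeled] + [query]
    vectors = tfidf_vectors(docs)
    query_vec = vectors[-1]
    scored = []
    for (tokens, label), vec in zip(labeled, vectors):
        sim = alpha * cosine(query_vec, vec) + (1 - alpha) * semantic_sim(query, tokens)
        scored.append((sim, label))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]

# Hypothetical query-interface texts, already tokenized into field labels.
labeled = [
    (["title", "author", "isbn", "publisher"], "books"),
    (["author", "title", "keyword", "publisher"], "books"),
    (["job", "title", "company", "location"], "jobs"),
    (["keyword", "company", "salary", "location"], "jobs"),
]
query = ["writer", "isbn", "title"]
predicted = knn_classify(query, labeled, k=3, alpha=0.5)
print(predicted)
```

In this sketch the semantic term lets "writer" match "author" even though the two share no TF-IDF dimension, which is the kind of gap a purely vector-space similarity would leave open. How the two scores are actually weighted in the paper is not stated in the abstract.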

CLC Number: TP311.52

HUA Hui, FU Yu-Chen, ZHOU Xiao-Ke. Deep Web Data Source Classification Based on Query Interface Context[J]. Computer Engineering, 2010, 36(12): 66-68.

URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2010.12.023

http://www.ecice06.com/EN/Y2010/V36/I12/66
