Duplication Deletion Method for Structural Information

doi:10.3969/j.issn.1000-3428.2009.03.009

Computer Engineering, 2009, Vol. 35, Issue (3): 23-25, 2.

Software Technology and Database


LI Lin, LIU Gui-feng, ZHAO Peng-peng, CUI Zhi-ming

(Institute of Intelligent Information Processing and Application, Soochow University, Suzhou 215006)

Online: 2009-02-05  Published: 2009-02-05


Abstract

This paper proposes a learning-based duplicate-deletion method for Web pages that carry structured information. A classifier is first defined from a set of samples prepared in advance. Using this classifier, the different attribute fields of the structured information in a page are classified and their distances are computed, which yields the distance between the whole information object and the already-classified sample information. Whether the information object is a duplicate is then judged by comparing these distances with a threshold.
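The abstract does not say which classifier, which per-field distance measures, or which aggregation rule the authors use, so the following is only a minimal sketch of the pipeline it outlines, not the paper's method. It assumes a nearest-centroid classifier over two character features to decide whether a field is "text" or "numeric", a difflib edit-similarity distance for text fields, a relative difference for numeric fields, an unweighted mean over shared attributes, and a threshold of 0.15. The names `TRAINING`, `features`, `train_centroids`, `classify`, `field_distance`, `record_distance` and `is_duplicate`, the sample values, and the threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from math import dist

# Hypothetical labelled samples "prepared in advance": (field value, attribute type).
TRAINING = [
    ("Lenovo ThinkPad X200", "text"),
    ("Apple MacBook Air", "text"),
    ("4999.00", "numeric"),
    ("12,800", "numeric"),
]


def features(value):
    """Assumed feature vector: (share of digit characters, share of letters)."""
    n = max(len(value), 1)
    digits = sum(c.isdigit() for c in value) / n
    letters = sum(c.isalpha() for c in value) / n
    return (digits, letters)


def train_centroids(samples):
    """Nearest-centroid classifier: mean feature vector per attribute type."""
    sums, counts = {}, {}
    for value, label in samples:
        dig, let = features(value)
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += dig
        s[1] += let
        counts[label] = counts.get(label, 0) + 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}


def classify(value, centroids):
    """Assign the attribute type whose centroid is closest (Euclidean distance)."""
    f = features(value)
    return min(centroids, key=lambda lab: dist(f, centroids[lab]))


def field_distance(a, b, kind):
    """Type-specific field distance in [0, 1] (assumed measures)."""
    if kind == "numeric":
        try:
            x, y = float(a.replace(",", "")), float(b.replace(",", ""))
            return min(1.0, abs(x - y) / max(abs(x), abs(y), 1e-9))
        except ValueError:
            pass  # not parseable as a number: fall back to string distance
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_distance(record, sample, centroids):
    """Unweighted mean of field distances over the attributes both records share."""
    keys = record.keys() & sample.keys()
    if not keys:
        return 1.0
    total = sum(
        field_distance(record[k], sample[k], classify(record[k], centroids))
        for k in keys
    )
    return total / len(keys)


def is_duplicate(record, known, centroids, threshold=0.15):
    """A record is a duplicate if it lies within `threshold` of any known record."""
    return any(record_distance(record, s, centroids) < threshold for s in known)
```

Normalizing every field distance to [0, 1] keeps text and numeric attributes on a common scale, so a single threshold can be applied to the whole record; a weighted mean would be a natural variant if some attributes are more discriminative than others.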


Keywords: similarity measure, duplicate deletion, clustering

CLC Number: TP391

LI Lin; LIU Gui-feng; ZHAO Peng-peng; CUI Zhi-ming. Duplication Deletion Method for Structural Information[J]. Computer Engineering, 2009, 35(3): 23-25,2.


URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2009.03.009

http://www.ecice06.com/EN/Y2009/V35/I3/23
