Efficient Cleaning Approach for XML Data

Computer Engineering, 2008, Vol. 34, Issue 15: 47-50. doi: 10.3969/j.issn.1000-3428.2008.15.017

• Software Technology and Database •

HAN Jing-yu1,2, CHENG Yu2, DONG Yi-sheng2

(1. School of Computer, Nanjing University of Posts & Telecommunications, Nanjing 210003; 2. Department of Computer Science and Engineering, Southeast University, Nanjing 210096)

Online: 2008-08-05 Published: 2008-08-05

一种有效的XML数据清洗方法 (An Effective Cleaning Method for XML Data)

韩京宇 (HAN Jing-yu), 成瑜 (CHENG Yu), 董逸生 (DONG Yi-sheng)

Abstract

Abstract: By studying the characteristics of duplicate XML data, this paper proposes an active machine learning method that, for a specific application, learns transformation rules and matching rules in order to identify duplicate XML elements accurately. Transformation rules eliminate the structural diversity among elements, and matching rules capture the relationships between parent and child nodes. For the detection phase, an efficient hash filter algorithm is proposed to reduce computational complexity. Theoretical analysis and experiments show that the method detects duplicate XML elements efficiently and effectively.

Key words: active learning, matching rules, hash
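
The abstract does not give the concrete form of the learned rules, so the following Python fragment is only a minimal sketch of the idea, not the authors' implementation. It assumes, hypothetically, that a transformation rule maps variant tag names onto one canonical field name and that a matching rule is a set of per-field weights with a decision threshold. The names `TRANSFORM_RULES`, `MATCH_WEIGHTS`, `MATCH_THRESHOLD` and all their values are made up for illustration; in the paper these rules are learned, not hand-written.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

# Hypothetical transformation rules: variant tag name -> canonical field name.
TRANSFORM_RULES = {
    "title": "title", "name": "title",
    "author": "author", "writer": "author",
    "year": "year", "yr": "year",
}

# Hypothetical matching rule: one weight per canonical field plus a threshold.
MATCH_WEIGHTS = {"title": 0.6, "author": 0.3, "year": 0.1}
MATCH_THRESHOLD = 0.8


def normalize(element):
    """Flatten an element into {canonical_field: text}, ignoring nesting depth."""
    record = {}
    for node in element.iter():
        field = TRANSFORM_RULES.get(node.tag)
        text = (node.text or "").strip().lower()
        if field and text:
            record.setdefault(field, text)  # keep the first value per field
    return record


def similarity(a, b):
    """Weighted sum of per-field string similarities over shared fields."""
    score = 0.0
    for field, weight in MATCH_WEIGHTS.items():
        if field in a and field in b:
            score += weight * SequenceMatcher(None, a[field], b[field]).ratio()
    return score


def is_duplicate(x, y):
    """Apply the transformation rules, then the weighted matching rule."""
    return similarity(normalize(x), normalize(y)) >= MATCH_THRESHOLD


a = ET.fromstring(
    "<book><title>XML Data Cleaning</title><author>Han</author>"
    "<year>2008</year></book>")
b = ET.fromstring(
    "<item><name>XML data cleaning</name>"
    "<info><writer>Han</writer><yr>2008</yr></info></item>")
c = ET.fromstring(
    "<book><title>Graph Theory</title><author>Lee</author>"
    "<year>1999</year></book>")

print(is_duplicate(a, b))  # same content, different structure
print(is_duplicate(a, c))  # different content
```

Elements `a` and `b` differ in tag names and nesting but normalize to the same record, which is the structural diversity the transformation rules are meant to remove.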

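Abstract (translated from the Chinese version): This paper studies the characteristics of duplicate data elements in XML and proposes, for a specific application domain, actively learning the rules for identifying duplicate XML elements within a concrete context. Through structural transformation, XML data with differing structures are mapped onto structurally consistent data, and matching rules are obtained by learning the weights of the dependencies between data elements at different levels. Based on the learned transformation and matching rules, hash filtering is used to improve the efficiency of detecting duplicate XML elements. The method effectively addresses the structural diversity that XML duplicate detection faces; theoretical analysis and experiments show that it achieves high precision and efficiency.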
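
The abstract names hash filtering as the efficiency step but does not describe the hash function. The sketch below illustrates the general technique in Python under stated assumptions: each element has already been normalized into a flat record, and a hypothetical key (`filter_key`, here the first three alphanumeric characters of the title plus the year) decides which records share a bucket. Only records in the same bucket are passed on to the more expensive pairwise matching.

```python
from collections import defaultdict
from itertools import combinations


def filter_key(record):
    """Hypothetical hash key: first 3 alphanumeric title characters + year."""
    title = "".join(ch for ch in record.get("title", "").lower() if ch.isalnum())
    return (title[:3], record.get("year", ""))


def candidate_pairs(records):
    """Return index pairs of records that fall into the same hash bucket."""
    buckets = defaultdict(list)
    for idx, record in enumerate(records):
        buckets[filter_key(record)].append(idx)
    pairs = []
    for members in buckets.values():
        pairs.extend(combinations(members, 2))
    return sorted(pairs)


records = [
    {"title": "XML Data Cleaning", "year": "2008"},
    {"title": "XML data cleaning approach", "year": "2008"},
    {"title": "Graph Theory", "year": "1999"},
    {"title": "XML Schema", "year": "2004"},
]

# Without filtering there are 6 possible pairs; the filter keeps only one.
print(candidate_pairs(records))
```

A filter of this kind trades recall for speed: duplicates whose key fields differ land in different buckets and are never compared, so the choice of key matters. How the paper's algorithm handles that trade-off is not stated in the abstract.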

CLC Number: TP311

HAN Jing-yu; CHENG Yu; DONG Yi-sheng. Efficient Cleaning Approach for XML Data[J]. Computer Engineering, 2008, 34(15): 47-50.

HAN Jing-yu; CHENG Yu; DONG Yi-sheng. 一种有效的XML数据清洗方法[J]. 计算机工程 (Computer Engineering), 2008, 34(15): 47-50.

URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2008.15.017

http://www.ecice06.com/EN/Y2008/V34/I15/47
