
Computer Engineering ›› 2010, Vol. 36 ›› Issue (21): 54-56.

• Software Technology and Database •

Schema-Based Similarity Algorithm for XML Documents

SUN Xia, CHENG Hong-bin

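  1. (School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500)
  • Publication date: 2010-11-05 Release date: 2010-11-03
  • About the authors: SUN Xia (born 1978), female, lecturer, M.S.; main research interests: database technology, XML technology. CHENG Hong-bin, lecturer, M.S.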

Similarity Algorithm Based on Schema of XML Document

SUN Xia, CHENG Hong-bin   

  1. (School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China)
  • Online: 2010-11-05 Published: 2010-11-03

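Summary: This paper proposes a document similarity algorithm based on XML schemas. Similarity between XML schemas is an important basis for clustering XML documents. Elements form the main body of an XML schema, so schema similarity is composed of element similarities. The algorithm considers both the structural and the semantic information of the elements in a schema, which further improves the precision of the similarity calculation. In addition, because it computes similarity between XML schemas, the algorithm can reduce computational complexity, improve clustering accuracy, and make it easy to extract a common XML schema for each cluster.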

Keywords: schema, similarity, structure, semantics, Extensible Markup Language (XML)

Abstract: A similarity algorithm based on XML schemas is proposed. The similarity between XML schemas is an important foundation for XML document clustering. Elements are the main body of an XML schema, and the similarity among elements is the major component of schema similarity. The algorithm takes full account of both the structure and the semantics of elements, so it calculates similarity more accurately. Meanwhile, it reduces computational complexity and improves the accuracy of clustering. In addition, calculating the similarity among XML schemas makes it easy to extract the common XML schema of a cluster.
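
The abstract gives no formulas, so the following Python sketch is only an illustration of the general idea (schema similarity built from element similarities that mix semantic and structural information), not the authors' algorithm. Everything in it is an assumption: the `Element` class, the string-ratio stand-in for semantic similarity, the Jaccard measure over child names for structural similarity, the weight `w_sem`, and the best-match averaging in `schema_similarity`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Element:
    """Hypothetical schema element: a name plus child elements."""
    name: str
    children: list[Element] = field(default_factory=list)


def name_similarity(a: Element, b: Element) -> float:
    # Assumption: a plain string ratio stands in for semantic similarity;
    # a real system might consult a thesaurus or ontology instead.
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()


def structure_similarity(a: Element, b: Element) -> float:
    # Assumption: structure is compared as Jaccard overlap of child names.
    ca = {c.name.lower() for c in a.children}
    cb = {c.name.lower() for c in b.children}
    if not ca and not cb:
        return 1.0  # two leaves have identical (empty) structure
    return len(ca & cb) / len(ca | cb)


def element_similarity(a: Element, b: Element, w_sem: float = 0.5) -> float:
    # Weighted mix of the semantic and structural scores (weight is assumed).
    return w_sem * name_similarity(a, b) + (1 - w_sem) * structure_similarity(a, b)


def flatten(root: Element) -> list[Element]:
    # Collect the root and all of its descendants in pre-order.
    out = [root]
    for child in root.children:
        out.extend(flatten(child))
    return out


def schema_similarity(s1: Element, s2: Element, w_sem: float = 0.5) -> float:
    # Average, over all elements of both schemas, of each element's
    # best match in the other schema.
    e1, e2 = flatten(s1), flatten(s2)
    best1 = sum(max(element_similarity(a, b, w_sem) for b in e2) for a in e1)
    best2 = sum(max(element_similarity(b, a, w_sem) for a in e1) for b in e2)
    return (best1 + best2) / (len(e1) + len(e2))


# Three toy schemas: two book-like schemas and one unrelated order schema.
book = Element("book", [Element("title"), Element("author")])
book2 = Element("book", [Element("title"), Element("writer")])
order = Element("order", [Element("item"), Element("price")])

print(schema_similarity(book, book2))
print(schema_similarity(book, order))
```

Under these assumptions, the two book-like schemas score higher than the book and order pair, which is the kind of signal a clustering step could use to group documents by schema.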

Key words: schema, similarity, structure, semantics, XML

CLC Number: