
Computer Engineering (计算机工程), 2021, Vol. 47, Issue (6): 88-97. doi: 10.19678/j.issn.1000-3428.0059693

• Artificial Intelligence and Pattern Recognition •

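Two-Phase Information Granulation Combined with Interval Type-2 FRCM and Mixed Metrics

SHAO Lijie, MA Fumin

  1. College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China
  • Received: 2020-10-12 Revised: 2020-12-08 Published: 2020-12-15
  • About the authors: SHAO Lijie (born 1995), female, master's student, main research interest: data mining; MA Fumin (corresponding author), professor, Ph.D.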

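  • Funding: National Natural Science Foundation of China (61973151); Natural Science Foundation of Jiangsu Province (BK20191406); Major Project of Natural Science Research of Jiangsu Higher Education Institutions (17KJA120001). Contact e-mail: fmmatj@26.com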

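Abstract: For complex data whose clusters overlap and whose distribution is unbalanced, this paper proposes, on the basis of the credible granularity criterion, a two-phase information granulation algorithm that combines Interval Type-2 Fuzzy Rough C-Means (IT2FRCM) clustering with mixed metrics. In the first phase, the IT2FRCM algorithm clusters the raw data to obtain the initial information granules. In the second phase, taking into account the spatial distribution of the data, the sample size, and the properties of the granules, a mixed-metric method is used to design a granulation function that balances the justifiability of the evidence against semantic uniqueness; a composite function of coverage and uniqueness is then optimized under the credible granularity criterion to obtain the optimal granule boundaries. Experimental results on artificial data sets and UCI data sets show that the algorithm effectively improves the quality of information granulation and the representativeness of granules on unbalanced data, and performs well on indicators such as the number of correctly classified samples and granule characteristics.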
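The second phase optimizes a composite of coverage and uniqueness (often called specificity in the granular-computing literature) to place granule boundaries. The paper's own mixed-metric granulation function is not given on this page, so the following is only a minimal sketch of the textbook coverage × specificity trade-off for a single 1-D feature, not the authors' method. Assumptions: the median stands in for the prototype (in the two-phase scheme it would come from a phase-one cluster centre), specificity decays linearly with distance from the prototype, and `justifiable_interval`, `prototype` and `alpha` are hypothetical names.

```python
import numpy as np


def justifiable_interval(x, prototype=None, alpha=1.0):
    """Sketch: build an interval granule [a, b] around a prototype.

    Each bound is chosen independently to maximise coverage * specificity.
    `prototype` defaults to the median; `alpha` is a hypothetical decay
    rate for the (assumed) linear specificity function.
    """
    x = np.asarray(x, dtype=float)
    r = float(np.median(x)) if prototype is None else float(prototype)

    def best_bound(side_values, span):
        # Candidate bounds are the data points on this side of the prototype.
        best_value, best_score = r, -np.inf
        for cand in side_values:
            dist = abs(cand - r)
            # Coverage: fraction of all samples enclosed on this side.
            coverage = np.count_nonzero(np.abs(side_values - r) <= dist) / len(x)
            # Specificity: 1 at the prototype, falling linearly to 0 at the
            # farthest point on this side (clamped at 0 when alpha > 1).
            specificity = max(0.0, 1.0 - alpha * dist / span) if span > 0 else 1.0
            score = coverage * specificity
            if score > best_score:
                best_value, best_score = float(cand), score
        return best_value

    upper_side = x[x >= r]
    lower_side = x[x <= r]
    b = best_bound(upper_side, upper_side.max() - r) if upper_side.size else r
    a = best_bound(lower_side, r - lower_side.min()) if lower_side.size else r
    return a, b


if __name__ == "__main__":
    print(justifiable_interval([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

For example, on `[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100]` the sketch returns `(3.0, 9.0)`: extending the upper bound to the outlier at 100 would add coverage but drive specificity to zero, so the outlier stays outside the granule.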

Key words: information granulation, credible granularity criterion, clustering, density, mixed metrics


