Author Login Chief Editor Login Reviewer Login Editor Login Remote Office

Computer Engineering ›› 2026, Vol. 52 ›› Issue (3): 346-354. doi: 10.19678/j.issn.1000-3428.0070081

• High-Performance Computing and Big Data • Previous Articles     Next Articles

Distributed Algorithm for Tolerance Relation Rough Sets Based on Graph Optimization

WU Zhengjiang*(), WU Xingchen, LIAN Tao, WANG Mengsong   

  1. School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454003, Henan, China
  • Received:2024-07-05 Revised:2024-09-01 Online:2026-03-15 Published:2026-03-10
  • Contact: WU Zhengjiang

基于图优化的容差关系粗糙集分布式算法

吴正江*(), 武星晨, 连涛, 王梦松   

  1. 河南理工大学计算机科学与技术学院, 河南 焦作 454003
  • 通讯作者: 吴正江
  • 作者简介:

    吴正江(CCF会员), 男, 教授、博士, 主研方向为粗糙集、数据挖掘、并行计算

    武星晨,硕士研究生

    连涛, 硕士研究生

    王梦松, 硕士研究生

  • 基金资助:
    国家自然科学基金(61972134); 国家自然科学基金(62372156)

Abstract:

Researchers have introduced the theory of tolerance relation rough sets to address the data filtering problem in distributed Incomplete Information Systems (IIS). With the continuous growth in data volume, it is necessary to achieve scalable parallel computing through distributed computing. Consequently, distributed tolerance relation rough sets have emerged, among which the Block Set is the core method for computing approximate sets. However, the Block Set only uses set operations during computation, with no structure between the data, and the process involves a large number of repetitive calculations, resulting in low computational efficiency. This study proposes a Tolerance Relation rough sets Distributed (TRDG) algorithm, which is based on graph optimization, to address this issue. Using the existing concepts of reliable and disputed elements, a hierarchical directed acyclic graph is constructed with data in the IIS as nodes and asymmetric tolerance relationships as edges. The graph structure is used to organize the data. To improve the computational efficiency of the Block Set in distributed environments, the study proposes a strategy that uses the nearest tolerance relationship instead of the general asymmetric tolerance relationship to remove redundant edges, simplify the graph structure, and obtain a Block Set based on the path from reliable elements to zero-degree disputed elements. Distributed graph optimization and path search algorithms are then implemented on the Spark platform, which ultimately completes the design of the TRDG algorithm. Experimental results show that the TRDG algorithm exhibits good parallel acceleration performance. Compared to traditional general tolerance rough approximation set solving algorithms, TRDG can save computing resources, increase the average computing speed by approximately 40 times, and increase the amount of data that can be processed by more than 50 times.

Key words: directed acyclic graph, Block, tolerance relation rough sets, nearest tolerance relation, distributed computing for approximation sets

摘要:

为了处理分布式的不完备信息系统(IIS)中的数据筛选问题, 研究人员引入了容差关系粗糙集理论。随着数据量的不断增长, 需要通过分布式计算来实现可扩展的并行化计算, 因此分布式容差关系粗糙集被提出, 其中Block Set是计算近似集的核心方法。然而, Block Set在计算时仅使用集合运算, 数据之间没有结构, 过程涉及大量重复计算, 导致计算效率不高。针对这一问题, 提出一种基于图优化的容差关系粗糙集分布式(TRDG)算法。引用已有的可靠元和争议元的概念, 以IIS中的数据为结点, 以非对称容差关系为边, 构建具有层次关系的有向无环图, 使用图结构来组织数据。为了提高Block Set在分布式环境中的计算效率, 提出使用最近容差关系代替一般非对称容差关系的策略, 用于删除冗余边, 简化图结构, 并基于可靠元到零出度争议元的路径来得到Block Set。然后, 在Spark平台上实现分布式的图优化算法和路径搜索算法, 最终完成TRDG算法的设计。实验结果表明, TRDG算法具有良好的并行加速性能, 和传统的容差关系粗糙近似集求解算法相比, TRDG能够节省计算资源, 计算速度平均提高了40倍, 可处理的数据量也增加了50倍以上。

关键词: 有向无环图, Block, 容差关系粗糙集, 最近容差关系, 近似集分布式计算