作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程

• •    

大规模二分图中bi-triangle的分图枚举优化研究

  • 发布日期:2025-05-19

Optimization Research on Graph Partitioning Enumeration of Bi-Triangles in Large Bipartite networks

  • Published:2025-05-19

摘要: 二分图中bi-triangle(6-环)的枚举是局部聚类系数计算等图分析任务的核心操作。随着实际二分图数据规模不断扩大,其数据量已超出单机处理能力,亟需依托分布式算法实现高效枚举。然而,现有分布式分图枚举算法(GP算法)存在子图组合数据量庞大,消息过载及重复枚举等问题。对此,基于bi-triangle拓扑特性定制分图策略,提出两种优化算法:方法1将bi-triangle视为由三个wedge结构组成,以wedge组为基本单位生成子图,并提出基于A型和V型wedge组拼接的子图组合构造机制,显著减少子图组合的数量和数据规模,最终以wedge三元组枚举bi-triangle。此外,为解决消息过载和重复枚举,方法1提出基于分布式存储系统的子图读取机制和顶点有序性的去重机制。方法2将bi-triangle视为由两个zedge结构组成,先以wedge组为基本单位执行第一次分图,再通过“压缩zedge组”的构造与还原机制完成第二次分图,最终以zedge二元组枚举bi-triangle,实现比方法1更低阶的计算复杂度。实验表明,与GP算法相比,方法1在子图组合数据量上平均减少205倍,枚举时间至少降低45倍;方法2则分别平均减少30倍,至少降低101倍。

Abstract: Bi-triangle (6-cycle) enumeration in bipartite graphs is essential for graph analysis tasks like local clustering coefficient computation. As real-world bipartite graph data scales beyond single-machine capacity, efficient distributed algorithms are needed. However, the existing distributed graph partitioning (GP) enumeration algorithm struggles with large subgraph combinations, message overload, and redundant enumeration. In this regard, two optimized algorithms are proposed based on the topological characteristics of bi-triangles: Method 1 views the bi-triangle as three wedge structures, generating subgraphs using wedge groups as the basic unit. A subgraph combination mechanism via A-type and V-type wedge group concatenation is introduced, greatly reducing the number and scale of subgraph combinations, ultimately enumerating bi-triangles through wedge triplet. To prevent message overload and redundancy, a subgraph reading mechanism via a distributed storage system and a deduplication mechanism based on vertex ordering are proposed. Method 2 decomposes the bi-triangle into two zedge structures. It first partitions the graph using wedge groups and then applies a “compressed zedge” construction and restoration mechanism for a second partition, ultimately enumerating bi-triangles through zedge pairs with lower computational complexity than Method 1. Experiments show that, compared to GP, Method 1 reduces subgraph data by 205x on average and enumeration time by at least 45x, while Method 2 achieves average reductions of 30x and at least 101x, respectively.