Optimization Research on Graph Partitioning Enumeration of Bi-Triangles in Large Bipartite Networks

doi:10.19678/j.issn.1000-3428.0252080

Abstract

Enumerating bi-triangles (6-cycles) in bipartite graphs is a core operation in graph analysis tasks such as computing local clustering coefficients. As real-world bipartite graphs keep growing, their data volume exceeds what a single machine can process, so efficient enumeration requires distributed algorithms. However, the existing distributed graph-partitioning enumeration algorithm (the GP algorithm) suffers from very large subgraph-combination data, message overload, and duplicate enumeration. To address these problems, this paper tailors the partitioning strategy to the topology of bi-triangles and proposes two optimized algorithms. Method 1 treats a bi-triangle as a composition of three wedges. It generates subgraphs with wedge groups as the basic unit and builds subgraph combinations by concatenating A-type and V-type wedge groups, which significantly reduces both the number and the data size of subgraph combinations; bi-triangles are then enumerated as wedge triples. To resolve message overload and duplicate enumeration, Method 1 also introduces a subgraph reading mechanism based on a distributed storage system and a deduplication mechanism based on vertex ordering. Method 2 treats a bi-triangle as a composition of two zedges. It performs a first partition with wedge groups as the basic unit, performs a second partition through the construction and restoration of “compressed zedge groups”, and finally enumerates bi-triangles as zedge pairs, achieving a lower-order computational complexity than Method 1. Experiments show that, compared with the GP algorithm, Method 1 reduces the subgraph-combination data volume by a factor of 205 on average and the enumeration time by a factor of at least 45; for Method 2 the corresponding figures are a factor of 30 on average and a factor of at least 101.
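The wedge-triple view and the vertex-ordering deduplication can be illustrated with a minimal single-machine sketch. It is not the paper's distributed algorithm: it assumes that a wedge is a 2-path whose two endpoints lie in the same layer, that a "wedge group" collects all wedges sharing the same endpoint pair, and that vertices of the endpoint layer are comparable. The function name `enumerate_bitriangles` and the tuple layout of the output are hypothetical choices made for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def enumerate_bitriangles(edges):
    """Enumerate 6-cycles of a bipartite graph as wedge triples.

    edges: iterable of (u, v) pairs, with u from one layer and v from the other.
    Returns tuples (a, x, b, y, c, z) describing the cycle a-x-b-y-c-z-a,
    where a < b < c are endpoint-layer vertices and x, y, z are wedge middles.
    """
    # Neighbours of every middle-layer vertex.
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[v].add(u)

    # Wedge groups (assumed definition): endpoint pair (a, b), a < b,
    # mapped to the set of middle vertices adjacent to both endpoints.
    groups = defaultdict(set)
    for v, us in nbrs.items():
        for a, b in combinations(sorted(us), 2):
            groups[(a, b)].add(v)

    # For each endpoint, the larger endpoints it shares a wedge group with.
    higher = defaultdict(set)
    for a, b in groups:
        higher[a].add(b)

    result = []
    for (a, b), mids_ab in groups.items():
        for c in higher[b]:               # enforces a < b < c
            if (a, c) not in groups:      # the third wedge group must exist
                continue
            for x in mids_ab:
                for y in groups[(b, c)]:
                    if y == x:
                        continue
                    for z in groups[(a, c)]:
                        if z != x and z != y:
                            result.append((a, x, b, y, c, z))
    return result
```

Because the three endpoint vertices are only joined in increasing order and each wedge is tied to a specific endpoint pair, every 6-cycle is produced exactly once, so no post-hoc duplicate filtering is needed. For example, the complete bipartite graph with three vertices per layer contains six distinct 6-cycles, and the sketch returns six tuples for it.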

ZHU XingPo, WANG Xiaoyang. Optimization Research on Graph Partitioning Enumeration of Bi-Triangles in Large Bipartite Networks[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252080.

