Research on Distributed Matrix Computing Based on LT Code

doi:10.19678/j.issn.1000-3428.0067865

Abstract

As big data and machine learning applications continue to expand, distributed computing systems have become necessary tools for processing massive data. The performance of a computing cluster of any appreciable scale is inevitably affected by system noise, so coding techniques should be considered to make distributed computing systems more robust. Most existing coding schemes for distributed matrix computing are fixed-rate codes, which cannot adapt to a dynamically changing number of nodes. Moreover, because some tasks have deadlines, the average overhead should be reduced as far as possible, while still ensuring that tasks complete, in order to lower latency. To address these issues, this paper applies the Luby Transform (LT) code to distributed matrix computing in fog computing scenarios and designs the Remo2 algorithm. Relying on the rateless property of LT codes, the scheme adapts to changes in channel state; through a suitable degree distribution function design together with bidirectional cutting and degree factorization, it reduces latency and enhances the robustness of the distributed computing system. Let $ {k}_{1} $ be the row value of the submatrices obtained by partitioning matrix A, and $ {k}_{2} $ the column value of the submatrices obtained by partitioning matrix B. Experimental results show that, with $ {k}_{1} $ fixed, compared with the Factored LT (FLT) code and the Block-diagonal Coding-LT (BDC-LT) algorithm, Remo2 consistently lowers the average overhead by 33.3% relative to the former and reduces redundancy by 7.7% relative to the latter. In addition, when the code length $ {k}_{1}{k}_{2} $ is fixed, the less dispersed $ {k}_{1} $ and $ {k}_{2} $ are, that is, as $ \left|{k}_{1}-{k}_{2}\right|\to 0 $, the smaller the average overhead.

Key words: Luby Transform (LT) code, distributed matrix computing, bidirectional cutting, factorization, average overhead
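For readers unfamiliar with LT degree distributions, the sketch below computes Luby's robust soliton distribution in Python. It assumes that the parameters $ c $ and $ \delta $ varied in the figures are the usual robust soliton parameters; it is not the Remo2 degree distribution, whose exact form is not given on this page. The function name `robust_soliton` and the default values are illustrative only.

```python
import math

def robust_soliton(k, c=0.1, delta=0.5):
    """Return [mu(1), ..., mu(k)], Luby's robust soliton distribution.

    k is the number of source symbols; c and delta are the usual tuning
    parameters (defaults here are arbitrary illustrative values).
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    # Expected ripple size R = c * ln(k / delta) * sqrt(k).
    R = c * math.log(k / delta) * math.sqrt(k)
    if not 1.0 <= R <= k:
        raise ValueError("choose c and delta so that 1 <= R <= k")
    spike = max(1, min(k, int(round(k / R))))  # spike position, about k / R

    # Ideal soliton rho(d): 1/k for d = 1, 1/(d(d-1)) for d >= 2.
    rho = [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]

    # Correction term tau(d): R/(d k) below the spike, R ln(R/delta)/k at it.
    tau = [0.0] * k
    for d in range(1, spike):
        tau[d - 1] = R / (d * k)
    tau[spike - 1] = R * math.log(R / delta) / k

    beta = sum(rho) + sum(tau)  # normalization constant
    return [(r + t) / beta for r, t in zip(rho, tau)]

mu = robust_soliton(400)  # 400 = k1 * k2 for the settings shown in Figs. 5 to 7
```

An encoder would then draw the degree of each coded symbol from `mu`, for example with `random.choices(range(1, 401), weights=mu)`.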

Yi LIU, Lei ZHANG. Research on Distributed Matrix Computing Based on LT Code[J]. Computer Engineering, 2024, 50(8): 328-335.

https://www.ecice06.com/CN/Y2024/V50/I8/328

Figures/Tables 8

Fig.1 Computing model of distributed matrix multiplication

Fig.2 Splitting of matrix A

Fig.3 Splitting of matrix B

Fig.4 Impact of different encoding and decoding algorithms on $ \varepsilon $

Fig.5 Impact of different $ c $ and $ \delta $ values on $ \varepsilon $ when $ {k}_{1}={k}_{2}=20 $

Fig.6 Impact of different $ c $ and $ \delta $ values on $ \varepsilon $ when $ {k}_{1}=10 $ and $ {k}_{2}=40 $

Fig.7 Impact of different $ c $ and $ \delta $ values on $ \varepsilon $ when $ {k}_{1}=8 $ and $ {k}_{2}=50 $
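The three settings in Figs. 5 to 7 share the same code length, $ {k}_{1}{k}_{2}=400 $. As a rough illustration of where that product comes from, the NumPy sketch below assumes that bidirectional cutting means splitting A by rows into $ {k}_{1} $ blocks and B by columns into $ {k}_{2} $ blocks, so that AB decomposes into $ {k}_{1}{k}_{2} $ block products that would act as the LT source symbols. That reading, the helper name `block_products`, and the matrix sizes are assumptions made for illustration; this page does not spell out the Remo2 procedure itself.

```python
import numpy as np

def block_products(A, B, k1, k2):
    """Split A into k1 row blocks and B into k2 column blocks,
    then return the k1 x k2 grid of block products A_i @ B_j."""
    A_blocks = np.array_split(A, k1, axis=0)  # cut A along its rows
    B_blocks = np.array_split(B, k2, axis=1)  # cut B along its columns
    return [[Ai @ Bj for Bj in B_blocks] for Ai in A_blocks]

# Hypothetical matrix sizes, chosen only so that every setting can be split.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 30))
B = rng.standard_normal((30, 80))

for k1, k2 in [(20, 20), (10, 40), (8, 50)]:
    C_blocks = block_products(A, B, k1, k2)
    n_symbols = sum(len(row) for row in C_blocks)
    # Each setting yields 400 source symbols, and stitching the block
    # products back together recovers the full product A @ B.
    assert n_symbols == k1 * k2 == 400
    assert np.allclose(np.block(C_blocks), A @ B)
```

In a coded scheme the workers would return LT-encoded combinations of these block products rather than the products themselves; the sketch only checks the uncoded decomposition.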


