IntervalSketch: A Multi-Dimensional Sketch for Heavy Flow Detection Integrating Packet Intervals

doi:10.19678/j.issn.1000-3428.0252087

Abstract

In network communication and traffic management, identifying heavy flows quickly and accurately is important for tasks such as congestion control and malicious traffic monitoring. However, the extremely high transmission rates of real-world networks make heavy flow detection challenging. Most existing methods rely on a single statistical dimension, typically estimated flow size alone, and neglect other informative dimensions such as the distribution of packet intervals, which may play a key role in identifying heavy flows accurately. To address this limitation, this paper proposes IntervalSketch, a heavy flow detection algorithm that uses two traffic features: estimated flow size and the packet interval distribution. Using both dimensions, IntervalSketch improves the protection of heavy flows and the replacement of small flows. Specifically, the packet interval distribution helps IntervalSketch distinguish heavy flows from small flows, which significantly improves detection performance under low-memory conditions. IntervalSketch is evaluated on two real-world network traffic datasets, CAIDA and MAWI. The results show significant advantages across a range of memory configurations and traffic distributions. Compared with existing methods, IntervalSketch maintains high detection accuracy in memory-constrained environments and reaches up to 2.4 times the F1 score of current state-of-the-art designs.
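The abstract does not specify the data structure, so the following is only an illustrative sketch of how flow size and packet timing could jointly drive replacement. It is not the authors' algorithm. The names `Bucket`, `ToyIntervalTable`, and `gap_threshold` are hypothetical, and the rule used here (a colliding packet may decrement the stored flow's counter only if that flow has been idle longer than `gap_threshold`) is an assumption chosen to show the idea of protecting densely arriving flows.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bucket:
    key: Optional[str] = None   # flow currently stored in this bucket
    count: int = 0              # estimated size of the stored flow
    last_seen: float = 0.0      # arrival time of the stored flow's latest packet


class ToyIntervalTable:
    """Toy hash table that combines a size counter with packet timing.

    Assumed policy (not taken from the paper): a colliding packet may only
    decrement the stored flow's counter when that flow has been idle for
    longer than gap_threshold. A flow whose packets arrive densely is
    therefore protected from eviction.
    """

    def __init__(self, num_buckets: int, gap_threshold: float) -> None:
        self.buckets: List[Bucket] = [Bucket() for _ in range(num_buckets)]
        self.gap_threshold = gap_threshold

    def _bucket(self, key: str) -> Bucket:
        # crc32 gives a deterministic hash across runs, unlike built-in hash().
        return self.buckets[zlib.crc32(key.encode()) % len(self.buckets)]

    def update(self, key: str, ts: float) -> None:
        b = self._bucket(key)
        if b.key is None:
            # Empty bucket: the arriving flow claims it.
            b.key, b.count, b.last_seen = key, 1, ts
            return
        if b.key == key:
            # Same flow: grow the size estimate and refresh its timestamp.
            b.count += 1
            b.last_seen = ts
            return
        if ts - b.last_seen <= self.gap_threshold:
            # Stored flow was active recently: protect it, drop the update.
            return
        # Stored flow has gone quiet: decay it, and replace it once drained.
        b.count -= 1
        if b.count == 0:
            b.key, b.count, b.last_seen = key, 1, ts
```

With a single bucket, every flow collides, which makes the behaviour easy to follow: a flow that keeps sending packets within the threshold keeps its bucket, and only after it falls silent can other flows decay and eventually replace it. A practical design would need several such arrays and a richer summary of the interval distribution than the single last-arrival timestamp used here.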

LIN Shu, HUANG Jiawei, SHAO Jing, LI Sitan, LIANG Qi, WANG Qile, ZHAO Yilin. IntervalSketch: A Multi-Dimensional Sketch for Heavy Flow Detection Integrating Packet Intervals[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252087.

林澍, 黄家玮, 邵婧, 李思覃, 梁琦, 王启乐, 赵艺琳. 间隔值草图：融合数据包间隔的多维度大流检测草图算法[J]. 计算机工程, doi: 10.19678/j.issn.1000-3428.0252087.


URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0252087

