
Computer Engineering, 2023, Vol. 49, Issue (7): 259-268. doi: 10.19678/j.issn.1000-3428.0064882

• Development Research and Engineering Application •

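Traffic Classification Algorithm for IoT Device Based on Sliding Time Window

Changhong YU, Ya LU, Haixin WANG, Ming GAO

  1. School of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou 310000, China
  • Received: 2022-06-01 Issue published: 2023-07-15 Posted online: 2022-09-29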
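  • About the authors:

    Changhong YU (1978-), male, associate professor, Ph.D.; main research interests: industrial Internet, artificial intelligence, signal processing

    Ya LU, master's student

    Haixin WANG, master's student

    Ming GAO, associate professor, Ph.D.

  • Funding:
    National Natural Science Foundation of China (61871468); Key Research and Development Program of Zhejiang Province (2017C01G2050953)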

Abstract:

Existing traffic classification schemes for Internet of Things (IoT) devices mostly rely on either the complete flow or the first few packets of a flow. Relying on the complete flow increases the amount of data to process, and with it the computational complexity and storage consumption, yet IoT devices have very limited storage space and CPU performance. Relying on the first few packets of a flow degrades classification performance whenever some of those packets are lost. To address these problems, this paper proposes a random forest traffic classification algorithm for IoT devices based on a sliding time window, which uses IoT traffic information to characterize the attributes of different devices. First, exploiting the time dependence of IoT device traffic, a sliding time window divides each flow into multiple sub-flows of period T. Second, because IoT device traffic is encrypted, flow-level information and the packet information at the head of the flow are extracted from each sub-flow to build a feature vector. Finally, a classification model is built on the random sampling and random feature selection of random forest, which enhances the generalization ability of the model and further improves classification performance. Experimental results on the public UNSW dataset show that the algorithm achieves an accuracy of 96.23%, a precision of 94.8%, a recall of 91.47%, and an F1 score of 93%, indicating good classification performance.

Key words: Internet of Things (IoT), traffic classification, network security, random forest, device management, Quality of Service (QoS)
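The first two steps in the abstract (cutting a flow into sub-flows of period T with a sliding time window, then building a feature vector per sub-flow) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the window step, the packet schema, or the exact features, so the `Packet` record, the `step` parameter, the helper names `split_into_subflows` and `subflow_features`, and the chosen statistics (packet count, total bytes, mean size, duration, sizes of the first `k` packets of each sub-flow) are all hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record: capture time (seconds) and size (bytes)."""
    ts: float
    size: int


def split_into_subflows(
    packets: Sequence[Packet], T: float, step: Optional[float] = None
) -> List[List[Packet]]:
    """Cut one flow into sub-flows using a time window of length T.

    The window starts at the first packet and advances by `step`
    (hypothetical parameter; defaults to T, i.e. non-overlapping windows).
    Windows that contain no packets are skipped.
    """
    if T <= 0:
        raise ValueError("T must be positive")
    step = T if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    pkts = sorted(packets, key=lambda p: p.ts)
    if not pkts:
        return []
    times = [p.ts for p in pkts]
    t0, t_last = times[0], times[-1]
    subflows: List[List[Packet]] = []
    i = 0
    while t0 + i * step <= t_last:
        start = t0 + i * step
        # Half-open window [start, start + T), located by binary search.
        lo = bisect_left(times, start)
        hi = bisect_left(times, start + T)
        if hi > lo:
            subflows.append(pkts[lo:hi])
        i += 1
    return subflows


def subflow_features(sub: Sequence[Packet], k: int = 3) -> List[float]:
    """Hypothetical feature vector for one non-empty, time-ordered sub-flow.

    Flow-level statistics (packet count, total bytes, mean size, duration)
    followed by the sizes of the first k packets, zero-padded, as a stand-in
    for the "packet information at the head of the flow".
    """
    sizes = [p.size for p in sub]
    duration = sub[-1].ts - sub[0].ts
    head = sizes[:k] + [0] * max(0, k - len(sizes))
    stats = [len(sizes), sum(sizes), sum(sizes) / len(sizes), duration]
    return [float(x) for x in stats + head]


if __name__ == "__main__":
    flow = [Packet(0.0, 60), Packet(0.4, 1500), Packet(1.2, 80),
            Packet(2.5, 60), Packet(2.6, 300)]
    for sub in split_into_subflows(flow, T=1.0):
        print(subflow_features(sub))
```

With `step` equal to `T` the windows do not overlap; a smaller `step` yields overlapping sub-flows, which is one possible reading of "sliding" that the abstract leaves open. Likewise, "head of the flow" could mean the first packets of each sub-flow (as assumed here) or of the whole flow. In the described pipeline, each vector would then be labeled with its device and fed to a random forest classifier (for example scikit-learn's `RandomForestClassifier`); that step is omitted here.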