Computer Engineering ›› 2019, Vol. 45 ›› Issue (7): 126-133. doi: 10.19678/j.issn.1000-3428.0050589

• Mobile Internet and Communication Technology •

State Machine Inference for Binary Protocol Based on State-related Field

YAN Xiaoyong, LI Qing, MO Youquan

  1. School of Information System and Engineering, Information Engineering University, Zhengzhou 450000, China
  • Received: 2018-03-05 Revised: 2018-05-21 Online: 2019-07-15 Published: 2019-07-23
  • About the authors: YAN Xiaoyong (born 1993), male, master's student; main research interests: protocol reverse engineering and data mining. LI Qing and MO Youquan, associate professors.
  • Supported by:
    National Natural Science Foundation of China, "Research on Physical Layer Security Transmission Technology in Multi-Antenna Simultaneous Wireless Information and Power Transfer Systems" (61601516).

Abstract: In communication protocol specifications, there is no one-to-one mapping between message format types and message state types, so clustering has difficulty separating messages that share a format type but differ in state type. To address this, a state machine inference method for binary private protocols based on state-related fields is proposed. State-related fields are identified according to the Longest Common Subsequence Distance (LCSD) in order to capture the similarity of behavioral logic across protocol sessions. An initial state machine based on an adjacency list is then constructed, and abnormal sessions are removed and similar states are merged to reduce the size of the protocol state machine. Test results on TCP and SMB protocol datasets show that the proposed method can effectively infer the state machine of a binary private protocol, achieving high accuracy and recall.

Key words: protocol state machine, binary protocol, Longest Common Subsequence Distance (LCSD), adjacency list, abnormal session removal, similar state merging
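
As a rough illustration of two building blocks named in the abstract, the sketch below computes a longest-common-subsequence distance between two sessions and builds an adjacency-list state machine from observed sessions. It is not the authors' implementation. The normalisation `1 - LCS / max(len)`, the function names, and the representation of a session as a list of state-related field values are all assumptions made for this example; the abstract does not give the paper's exact definitions. Abnormal session removal and similar state merging are not shown.

```python
from collections import defaultdict


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a, b):
    """Assumed normalisation: 1 - LCS / max length (0.0 means identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - lcs_length(a, b) / longest


def build_state_machine(sessions):
    """Adjacency list mapping state -> {next state: transition count}.

    Each session is assumed to be a list of state-related field values,
    one per message, in the order the messages were observed.
    """
    adjacency = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            adjacency[src][dst] += 1
    return {src: dict(dsts) for src, dsts in adjacency.items()}


if __name__ == "__main__":
    # Hypothetical sessions labelled with TCP-like flag values.
    sessions = [
        ["SYN", "SYN-ACK", "ACK"],
        ["SYN", "SYN-ACK", "ACK", "FIN"],
    ]
    print(lcs_distance(sessions[0], sessions[1]))
    print(build_state_machine(sessions))
```

Keeping transition counts in the adjacency list is a design choice of this sketch: rarely observed transitions could then be candidates for pruning, although the criterion the paper actually uses for abnormal sessions is not stated in the abstract.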
