
Computer Engineering

• Development Research and Engineering Application •

Feature Discovering Method of Byte Frequency Statistics Based on Protocol Header

HE Sheng, LUO Junyong, LIU Yan

  1. (State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002, China)
  • Received: 2014-03-13 Online: 2015-02-15 Published: 2015-02-13
  • About the authors: HE Sheng (born 1989), male, master's student, main research interest: information security; LUO Junyong, professor; LIU Yan, associate professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61309007); National "863" Program of China (2012AA012902).

Abstract: Application protocol identification is widely used in network security, and discovering protocol features is its core problem. This paper therefore proposes an efficient and accurate method for automatically discovering protocol features. The method exploits the protocol's own format characteristics to tokenize messages and classifies the messages by their token sequences. The curve of the number of classes is then used to roughly estimate the protocol header length, which fixes the range over which byte frequencies are counted. For every packet in a data flow, the byte frequencies of the message header are counted and normalized to give a byte-frequency feature vector. Protocols are classified and identified by computing the cosine similarity between the protocol under test and the sample protocols. Experimental results show that identification using features extracted by this method achieves an accuracy above 93.5%.

Key words: protocol identification, tokenization, byte frequency, feature vector, cosine similarity
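
The last two steps of the abstract, header byte-frequency vectors and cosine-similarity matching, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`byte_frequency_vector`, `cosine_similarity`, `identify`) are hypothetical, and `header_len` is assumed to be already known, since the abstract does not give the details of the tokenization and class-count curve used to estimate it.

```python
import math
from collections import Counter


def byte_frequency_vector(packets, header_len):
    """Normalized frequency of each byte value (0-255), counted over the
    first header_len bytes of every packet payload in one flow.

    header_len is an assumed input here; the paper estimates it from the
    class-count curve, which this sketch does not reproduce.
    """
    counts = Counter()
    for payload in packets:
        # Iterating over a bytes slice yields integers in 0..255.
        counts.update(payload[:header_len])
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 256
    return [counts[b] / total for b in range(256)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def identify(flow, samples, header_len):
    """Return (label, score) for the sample protocol vector most similar to the flow.

    samples maps a protocol label to a reference byte-frequency vector.
    """
    vec = byte_frequency_vector(flow, header_len)
    return max(
        ((label, cosine_similarity(vec, ref)) for label, ref in samples.items()),
        key=lambda pair: pair[1],
    )
```

In use, one reference vector per known protocol would be built from labeled sample flows with `byte_frequency_vector`, and an unknown flow passed to `identify`. Whether to accept the best match only above some similarity threshold is a design choice the abstract does not specify.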
