
Computer Engineering (计算机工程)

Topic: Machine Learning

• Artificial Intelligence and Recognition Technology

基于机器学习的域名数据监控方法

刘明星,金 键,李晓东   

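  1. (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190)
  • Received: 2013-09-16 Publication date: 2014-09-15 Release date: 2014-09-12
  • About the authors: LIU Ming-xing (1985- ), male, master's degree, main research interests: network security and next-generation Internet technology; JIN Jian, senior engineer, master's degree; LI Xiao-dong, research fellow, Ph.D., doctoral supervisor.
  • Funding:
    National Natural Science Foundation of China (61005029); research project fund of the Open Laboratory of Internet Fundamental Technology.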

Monitoring Method of Domain Name Data Based on Machine Learning

LIU Ming-xing, JIN Jian, LI Xiao-dong

  1. (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China)
  • Received: 2013-09-16 Online: 2014-09-15 Published: 2014-09-12

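Abstract: Tampering with domain name resource records seriously endangers applications that depend on domain names. Because the problem is well concealed, a fast and effective way to detect dangerous changes to domain names is urgently needed. To this end, a domain name data monitoring method based on machine learning algorithms is proposed. From a given number of domain names, those whose resource records have changed are selected, and their related information is analyzed to generate a tuple made up of attributes such as the literal features of the domain name and the forward-inverse match degree. Class labels are assigned manually according to whether the change is dangerous, and each tuple together with its class label forms one instance in the training set. A decision tree algorithm and a support vector machine algorithm analyze the training set to build classifiers that detect dangerous changes in Domain Name System data. Validating the two classifiers with 10-fold cross-validation shows that they are strongly capable of judging dangerous domain name changes, with weighted average accuracies of 73.8% and 82.4%, respectively.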

Keywords: Domain Name System, security, machine learning, Domain Name System monitoring, decision tree, support vector machine

Abstract: Tampering with Domain Name System (DNS) data by attackers endangers DNS applications. Because this threat is hard to notice, a quick and effective method for finding dangerous changes in DNS data is urgently needed. To address the problem, this paper proposes a machine-learning-based method for monitoring DNS data. Domain names whose data have changed are chosen from a pool of domain names, and the relevant information for each is analyzed to produce a tuple, represented as a multi-dimensional attribute vector that contains literal characteristics, forward-inverse match, and other attributes. Each tuple is then given a class label according to whether the change is dangerous, so that the tuple and its class label form one instance, and the instances together form the training set. Two classification algorithms, decision tree and Support Vector Machine (SVM), analyze the training set to build classifiers that detect whether changes in DNS data are dangerous. The two classifiers are validated with 10-fold cross-validation. The results show that both perform well at finding dangerous changes in DNS data, with weighted average accuracies of 73.8% and 82.4%, respectively.

Key words: Domain Name System(DNS), security, machine learning, DNS monitoring, decision tree, Support Vector Machine(SVM)
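The evaluation protocol named in the abstract, 10-fold cross-validation scored by a weighted average over classes, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the single numeric feature, the threshold "stump" used in place of the decision tree and SVM, and the reading of "weighted average" as per-class precision weighted by class frequency are all hypothetical choices made here so that the example stays self-contained.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def weighted_precision(y_true, y_pred):
    """Per-class precision, averaged with each class's share of y_true as weight.

    Assumption: this is one plausible reading of the abstract's
    'weighted average'; the paper's exact metric is not given here.
    """
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, count in support.items():
        predicted = sum(1 for p in y_pred if p == cls)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == cls)
        precision = correct / predicted if predicted else 0.0
        score += precision * count / total
    return score


def train_stump(rows, labels, positive="dangerous", negative="benign"):
    """Hypothetical stand-in classifier: one threshold on feature 0.

    Predicts `positive` when the feature is below the threshold that
    makes the fewest errors on the training rows.
    """
    best_t, best_err = None, None
    for t in sorted({r[0] for r in rows}) + [float("inf")]:
        err = sum(
            1 for r, y in zip(rows, labels)
            if (positive if r[0] < t else negative) != y
        )
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda r: positive if r[0] < best_t else negative


def cross_validate(rows, labels, train, k=10, seed=0):
    """Train on k-1 folds, predict the held-out fold, score all predictions."""
    y_true, y_pred = [], []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in held]
        model = train([rows[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        for i in held_out:
            y_true.append(labels[i])
            y_pred.append(model(rows[i]))
    return weighted_precision(y_true, y_pred)


if __name__ == "__main__":
    # Synthetic data: a low value of the made-up feature marks a dangerous change.
    rows = [(i / 100,) for i in range(100)]
    labels = ["dangerous" if i < 30 else "benign" for i in range(100)]
    print(cross_validate(rows, labels, train_stump))
```

Any trainer that takes rows and labels and returns a prediction function can be passed to `cross_validate` in place of `train_stump`, which is how two different classifiers could be compared on the same folds.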
