Computer Engineering

Special topic: Cloud Computing

Virtualization Fault Detection Recovery System Based on Event-driven Mechanism

CUI Jingsong 1a,1b, LU Haoyu 1a, GUO Chi 2, HE Song 1a

  (1a. School of Computer Science; 1b. Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan University, Wuhan 430072, China; 2. Global Navigation Satellite System Research Center, Wuhan University, Wuhan 430079, China)
  • Received: 2014-03-21 Online: 2015-02-15 Published: 2015-02-13
  • About the authors: CUI Jingsong (b. 1975), male, associate professor, Ph.D.; research interests: information security and cloud security. LU Haoyu, master's student. GUO Chi (corresponding author), lecturer, Ph.D. HE Song, master's student.
  • Supported by:
    National High Technology Research and Development Program of China ("863" Program) (2013AA12A206); National Natural Science Foundation of China (41104010, 91120002, 61170026); Fundamental Research Funds for the Central Universities (2042014kf0237).

Abstract: To address the slow troubleshooting of faults on virtualized cloud platforms and to keep cloud services running without interruption, this paper designs and implements a virtualization fault detection and recovery system on the open-source cloud platform OpenStack. The system consists of a GUI layer, a scheduling layer, a logic layer, and a functional layer. Its core is an event-driven mechanism: information passed within the system is treated as events and processed in chronological order. The main components are a perception module, a policy module, and an execution module, which call the OpenStack API and the Libvirt API to interact with the virtual machine management layer. The resulting fault detection and recovery framework covers information acquisition, analysis and processing, and fault recovery: it monitors the cloud platform's runtime environment in real time to obtain state parameters, analyzes those parameters against policies to decide on countermeasures, and recovers from faults automatically. Experimental results show that the system can monitor a cloud platform in real time and recover from faults automatically without agents, which strengthens the security of the cloud environment and improves the availability of the cloud platform.

Key words: OpenStack cloud platform, load balancing, event-driven mechanism, high availability, virtualization, cloud computing
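
The perception, policy, and execution flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Event`, `EventLoop`, `build_system`, `perceive`, and `CPU_LIMIT`, the threshold value, and the "restart"/"migrate" actions are all hypothetical, and the calls that a real system would make to the OpenStack API and the Libvirt API are replaced by plain Python stand-ins. The sketch only shows the idea of treating passed information as events and handling them in chronological order.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(order=True)
class Event:
    """A message passed through the system, ordered by (timestamp, seq)."""
    timestamp: float
    seq: int                                  # tie-breaker for equal timestamps
    kind: str = field(compare=False)
    payload: dict = field(compare=False)


class EventLoop:
    """Dispatches queued events to registered handlers in timestamp order."""

    def __init__(self) -> None:
        self._queue: List[Event] = []
        self._counter = itertools.count()
        self._handlers: Dict[str, Callable[[Event], None]] = {}

    def register(self, kind: str, handler: Callable[[Event], None]) -> None:
        self._handlers[kind] = handler

    def emit(self, timestamp: float, kind: str, payload: dict) -> None:
        heapq.heappush(
            self._queue, Event(timestamp, next(self._counter), kind, payload)
        )

    def run(self) -> None:
        while self._queue:
            event = heapq.heappop(self._queue)
            self._handlers[event.kind](event)


CPU_LIMIT = 0.9  # hypothetical policy threshold, not taken from the paper


def build_system(log: List[Tuple[float, str, str]]) -> EventLoop:
    """Wire a policy handler and an execution handler onto one event loop."""
    loop = EventLoop()

    def policy(event: Event) -> None:
        # Policy module (assumed rules): map a state sample to a countermeasure.
        vm = event.payload
        if vm["status"] == "crashed":
            loop.emit(event.timestamp, "action", {"vm": vm["name"], "do": "restart"})
        elif vm["cpu"] > CPU_LIMIT:
            loop.emit(event.timestamp, "action", {"vm": vm["name"], "do": "migrate"})

    def execute(event: Event) -> None:
        # Execution module stand-in: a real system would call the
        # virtualization management layer here; this sketch only records it.
        log.append((event.timestamp, event.payload["vm"], event.payload["do"]))

    loop.register("state", policy)
    loop.register("action", execute)
    return loop


def perceive(loop: EventLoop, timestamp: float, name: str,
             status: str, cpu: float) -> None:
    """Perception module stand-in: turn one state sample into a 'state' event."""
    loop.emit(timestamp, "state", {"name": name, "status": status, "cpu": cpu})


if __name__ == "__main__":
    actions: List[Tuple[float, str, str]] = []
    system = build_system(actions)
    perceive(system, 2.0, "vm-b", "running", 0.95)
    perceive(system, 1.0, "vm-a", "crashed", 0.0)
    perceive(system, 3.0, "vm-c", "running", 0.20)
    system.run()
    print(actions)
```

State samples may arrive out of order; the priority queue ensures that the sample with the earliest timestamp is evaluated first, and that the countermeasure it triggers is executed before later samples are examined.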

CLC number: