
Computer Engineering ›› 2006, Vol. 32 ›› Issue (1): 229-231.

• Engineering Application Technology and Implementation •

Research and Implementation of Hardware Monitoring in Cluster System

JIN Zhengcao (金正操) 1,2, HOU Zifeng (侯紫峰) 1,2, DU Xiaoli (杜晓黎) 1,2

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; 2. Graduate School of the Chinese Academy of Sciences, Beijing 100039
  • Online: 2006-01-05 Published: 2006-01-05

Abstract: This paper presents a method for monitoring hardware in cluster systems, together with its implementation. Hardware health information is collected directly in hardware: a microprocessor reads the status of the monitored components and holds it until it is requested. Because the characteristics of a cluster constrain how this information can be transferred, the paper proposes and designs a layered serial communication network, along with a communication protocol that combines hardware and software. The design scales in the same way as the cluster itself and meets the performance requirements of information collection in large-scale clusters. Finally, the paper describes the implementation of this hardware monitoring system, which has been adopted by the Lenovo DeepComp cluster.

Key words: High performance computing; Cluster system; Cluster monitoring; Serial communication; Layered serial network
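
The request-driven, layered collection described in the abstract can be illustrated with a minimal sketch. This is not the authors' protocol, which the abstract does not specify. The class names `NodeMonitor` and `Concentrator`, the sensor names, and the two-level layout are hypothetical, and the serial links are modelled as plain method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

Readings = Dict[str, Dict[str, float]]


@dataclass
class NodeMonitor:
    """Leaf of the tree: stands in for a per-node microprocessor (hypothetical)."""
    node_id: str
    readings: Dict[str, float] = field(default_factory=dict)

    def sample(self, sensor: str, value: float) -> None:
        # Cache the latest value; nothing is sent until a request arrives.
        self.readings[sensor] = value

    def handle_request(self) -> Readings:
        # Reply with a copy of the cached readings, keyed by node id.
        return {self.node_id: dict(self.readings)}


@dataclass
class Concentrator:
    """Inner layer: polls its children one at a time and merges the replies."""
    children: List[Union["Concentrator", NodeMonitor]] = field(default_factory=list)

    def handle_request(self) -> Readings:
        merged: Readings = {}
        for child in self.children:
            merged.update(child.handle_request())
        return merged


# Assumed layout: one root, two concentrators, three monitored nodes.
node_a = NodeMonitor("node-a")
node_b = NodeMonitor("node-b")
node_c = NodeMonitor("node-c")
node_a.sample("cpu_temp_c", 48.0)
node_b.sample("cpu_temp_c", 51.5)
node_c.sample("fan_rpm", 4200.0)

root = Concentrator([Concentrator([node_a, node_b]), Concentrator([node_c])])
snapshot = root.handle_request()
```

In this sketch, adding nodes only means attaching another leaf or another concentrator layer, which is one way to read the claim that the network scales along with the cluster.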