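Abstract: A method for monitoring the hardware in a cluster system is proposed. Monitoring information is collected by dedicated hardware that reads the status of the monitored hardware directly. For transmission, given the characteristics of the cluster itself, a hierarchical serial-port network is proposed, together with a communication protocol that combines hardware and software. The design scales in the same way as the cluster and meets the performance requirements of information collection in large-scale clusters.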
Key words:
High performance computing; Cluster system; Cluster monitoring; Serial communication; Layered serial network
Abstract: This paper presents a method for monitoring hardware in cluster systems, together with its implementation. A microprocessor reads the health-status information of the monitored hardware directly and holds it until it is requested. The characteristics of a cluster, however, constrain how this information can be transferred. The paper therefore proposes and designs a layered serial communication system that scales in the same way as the cluster itself and continues to meet the performance requirements of information collection as the cluster grows. Finally, the paper describes the implementation of this hardware monitoring system as adopted by the Lenovo DeepComp Cluster.
Key words:
High performance computing; Cluster system; Cluster monitoring; Serial communication; Layered serial network
金正操,侯紫峰,杜晓黎. 机群系统中对硬件监控方法的研究与实现[J]. 计算机工程, 2006, 32(1): 229-231.
JIN Zhengcao, HOU Zifeng, DU Xiaoli. Research and Implementation of Hardware Monitoring in Cluster System[J]. Computer Engineering, 2006, 32(1): 229-231.