LIU Xu, ZHANG Xihuang, LIU Zhao, LÜ Xiaojing, ZHU Guanghui
Cosmological simulations are essential tools for studying the formation of non-linear structures and for testing hypotheses about dark matter, dark energy, and related phenomena. High-precision cosmological simulations involve hundreds of billions or even trillions of particles and therefore demand massive computing power, which makes supercomputers an ideal platform for them. To implement cosmological N-body simulation on Sunway TaihuLight, a supercomputer developed in China, this paper analyzes the Particle Mesh (PM) and Fast Multipole Method (FMM) algorithms in PHoToNs. Combining the results of this analysis with the architecture of the many-core processor, the paper proposes several performance optimization techniques: a multi-level decomposition and load-balancing scheme, a pipelining strategy that overlaps tree traversal with gravity calculation, and a vectorized gravity calculation algorithm. Using these techniques, the paper implements SwPHoToNs, an N-body simulation code that fully exploits the architectural strengths of Sunway TaihuLight. Experimental results show that, for cosmological simulations with up to 640 billion particles on 5,200,000 cores of Sunway TaihuLight, SwPHoToNs achieves a sustained performance of 29.44 PFLOPS, with a parallel efficiency of 84.6% and a computational efficiency of 48.3%.