Determinant Quantum Monte Carlo (DQMC) simulation is one of the few numerical methods that can explore the microscopic properties of fermions, which has many technically important applications in chemistry and materials science. Conventionally, its parallelization relies on the parallel Monte Carlo method, whose speedup is limited by the thermalization process and the underlying matrix computations. To achieve better performance, fine-grained parallelization of its numerical kernels is essential to exploit massively parallel processing units, namely multicore CPUs and/or GPUs interconnected by a high-performance network. In this paper, we address the parallelization of one of the matrix kernels in DQMC simulations: the multiplication of matrix exponentials. The matrix is derived from the kinetic Hamiltonian and is highly sparse. We approximate its exponential by the checkerboard method, which decomposes the matrix exponential into a product of a sequence of block sparse matrices. We analyze the block sparse matrices of two commonly used lattice geometries, the 2D torus and the 3D cubic lattice, and parallelize the computational kernel that multiplies them with a general matrix. The parallel algorithm is designed for multicore CPUs and GPUs. Experimental results show that on a quad-core processor a speedup of 3 times is observed on average, and on a GPU a speedup of 145 times is achievable.
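To make the checkerboard idea concrete, the following is a minimal sequential sketch in plain Python. It is not the paper's parallel kernel. It assumes a 2D torus with even side lengths of at least 4, a nearest-neighbour hopping matrix with unit entries, and a scalar `tau` that folds together the hopping amplitude, the time step, and the sign. The function names `bond_groups_2d_torus` and `apply_checkerboard` are hypothetical. The bonds are split into four groups in which no site appears twice, so each factor is block diagonal with 2x2 blocks [[cosh tau, sinh tau], [sinh tau, cosh tau]], and the row-pair updates inside one group are independent of each other.

```python
import math


def bond_groups_2d_torus(nx, ny):
    """Split the nearest-neighbour bonds of an nx-by-ny periodic lattice into
    four groups (x-even, x-odd, y-even, y-odd) whose bonds share no site."""
    assert nx % 2 == 0 and ny % 2 == 0 and nx >= 4 and ny >= 4

    def site(x, y):
        # Row-major site index with periodic wrap-around.
        return (x % nx) + nx * (y % ny)

    groups = []
    for parity in (0, 1):
        groups.append([(site(x, y), site(x + 1, y))
                       for y in range(ny) for x in range(parity, nx, 2)])
    for parity in (0, 1):
        groups.append([(site(x, y), site(x, y + 1))
                       for x in range(nx) for y in range(parity, ny, 2)])
    return groups


def apply_checkerboard(groups, tau, a):
    """Overwrite a (a list of rows) with B_1 B_2 ... B_m a, where
    B_g = exp(tau * K_g) and K_g has unit entries on the bonds of group g."""
    c, s = math.cosh(tau), math.sinh(tau)
    for bonds in reversed(groups):  # the rightmost factor acts first
        for i, j in bonds:  # disjoint row pairs: candidates for parallel execution
            ri, rj = a[i], a[j]
            a[i] = [c * u + s * v for u, v in zip(ri, rj)]
            a[j] = [s * u + c * v for u, v in zip(ri, rj)]
    return a
```

Because every update touches only two rows of the general matrix, the work within one group could be distributed over threads without synchronization; only the boundary between consecutive groups needs a barrier. How the paper maps this onto CPU cores or GPU threads is not shown here.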