We present an efficient implementation of parallel finite element operator application for hexahedral elements. The implementation is tailored to the data structures of adaptively refined meshes and exploits the parallelism of modern computer systems. Local shape functions and their gradients are evaluated by sum factorization, which exploits the tensor-product form of the shape functions. For shared-memory parallelization, we propose a novel two-level partitioning/coloring approach that avoids race conditions when writing into the result vector. We present evidence of the good performance of our implementation. We apply the optimized operator implementation to a problem in quantum dynamics described by the time-dependent Schrödinger equation. For a moderate polynomial order of four in three dimensions, we obtain a speedup of more than a factor of four over conventional solvers based on sparse matrices.
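The idea behind sum factorization can be illustrated with a minimal sketch. It is not the implementation described above, only a generic instance of the technique under simplifying assumptions: a single hexahedral element, the same 1D matrix `S` in every direction, and as many evaluation points as basis functions per direction (so `S` is square, n x n). The names `contract_axis`, `sum_factorized`, and `naive` are hypothetical. Evaluating all n^3 values directly costs on the order of n^6 operations, whereas three successive 1D contractions cost on the order of 3 n^4.

```python
from itertools import product


def contract_axis(S, u, n, axis):
    """Apply the n x n matrix S along one axis of a 3D array.

    u is a flat list of n**3 values with layout u[(i*n + j)*n + k].
    """
    stride = (n * n, n, 1)[axis]
    out = [0.0] * (n ** 3)
    for idx in range(n ** 3):
        q = (idx // stride) % n      # index along the contracted axis
        base = idx - q * stride      # same entry with that index set to 0
        out[idx] = sum(S[q][a] * u[base + a * stride] for a in range(n))
    return out


def sum_factorized(S, u, n):
    """Tensor-product evaluation as three 1D sweeps (about 3*n**4 work)."""
    for axis in range(3):
        u = contract_axis(S, u, n, axis)
    return u


def naive(S, u, n):
    """Reference evaluation without factorization (about n**6 work)."""
    out = [0.0] * (n ** 3)
    for q0, q1, q2 in product(range(n), repeat=3):
        acc = 0.0
        for a0, a1, a2 in product(range(n), repeat=3):
            acc += S[q0][a0] * S[q1][a1] * S[q2][a2] * u[(a0 * n + a1) * n + a2]
        out[(q0 * n + q1) * n + q2] = acc
    return out


if __name__ == "__main__":
    n = 3
    # Arbitrary illustrative data, not taken from the paper.
    S = [[1.0 / (1 + i + j) for j in range(n)] for i in range(n)]
    u = [float(i % 5) - 2.0 for i in range(n ** 3)]
    fast = sum_factorized(S, u, n)
    slow = naive(S, u, n)
    print(max(abs(x - y) for x, y in zip(fast, slow)))
```

Both routines compute the same values; the factorized version merely reorders the sums so that each 1D matrix is applied once per direction.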
Published in: 2011 IEEE 7th International Conference on E-Science (e-Science)
Date of Conference: 5-8 Dec. 2011