Analysis and Optimization of Block LU Decomposition for Execution on Tightly Coupled Processor Arrays | IEEE Conference Publication | IEEE Xplore