This article presents the design, implementation and performance evaluation of a hardware accelerator for matrix multiplication. The accelerator is loosely coupled to the host computer via a common system bus. It is composed of a linear processor array (LPA), distributed memory and a dedicated address generator unit (AGU). A mathematical procedure for LPA synthesis is given. For an array of n processing elements (PEs), the proposed accelerator achieves a matrix-multiplication speedup of about n/2, i.e. O(n), which corresponds to an efficiency of 1/2. Using the hardware AGU instead of a software implementation of address calculation speeds up data transfer by a factor of approximately 2.5, at a hardware overhead of less than 1%.
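The abstract does not describe how the AGU works internally. As a rough illustration of the kind of address arithmetic a dedicated AGU can take over from host software, the following Python sketch generates the word addresses needed to stream one row of A and one column of B for a single dot product. It assumes row-major storage with one matrix element per memory word. The function names `row_addresses` and `column_addresses` and this memory layout are hypothetical, not the design presented in the article.

```python
from typing import Iterator


def row_addresses(base: int, n_cols: int, row: int) -> Iterator[int]:
    """Yield addresses of A[row][0..n_cols-1] for a row-major matrix at `base`.

    Hypothetical layout: one element per word, rows stored contiguously.
    """
    start = base + row * n_cols  # one multiply per row, done once
    for k in range(n_cols):
        yield start + k  # consecutive words: unit stride


def column_addresses(base: int, n_rows: int, n_cols: int, col: int) -> Iterator[int]:
    """Yield addresses of B[0..n_rows-1][col] for a row-major matrix at `base`.

    Walking a column means a constant stride of n_cols words, so each new
    address needs only one addition, the kind of update a simple hardware
    counter/adder can produce every cycle.
    """
    addr = base + col
    for _ in range(n_rows):
        yield addr
        addr += n_cols  # stride update, no multiplication per element


# Example: row 2 of a 4-column matrix at word 256, column 1 of a 3x4 matrix at word 512.
print(list(row_addresses(256, 4, 2)))
print(list(column_addresses(512, 3, 4, 1)))
```

In software, the host would perform this arithmetic for every transferred element. A hardware AGU can instead produce one address per cycle from a few loaded parameters (base, stride, count), which is one plausible reason why offloading address calculation speeds up data transfer.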