As the number of transistors integrated onto a silicon die continues to increase, compute power is becoming a commodity. This has enabled a whole host of new applications that rely on high-throughput computation. Recently, the need for faster and more cost-effective applications in form-factor-constrained environments has driven interest in on-chip acceleration of algorithms based on Monte Carlo simulation. Field Programmable Gate Arrays (FPGAs), with hundreds of on-chip arithmetic units, show significant promise for accelerating these embarrassingly parallel simulations, but sharing access to simulation data among many concurrent experiments remains a challenge. This paper presents a compute architecture for accelerating Monte Carlo simulations that uses the Network-on-Chip (NoC) paradigm for on-chip communication. Through a complete implementation of a Monte Carlo-based image reconstruction algorithm for Single-Photon Emission Computed Tomography (SPECT) imaging, we demonstrate that even a modestly sized FPGA can accelerate this complex problem by two orders of magnitude over a 2 GHz Intel Core 2 Duo processor. The architecture and methodology we present are modular and hence scalable to problem instances of different sizes, and they apply to other domains that rely on Monte Carlo simulation.