I. Introduction
Recently, there has been growing interest in performing fast and energy-efficient linear transformations between vectors or matrices, as they constitute a powerful tool for a wide range of applications in classical photonics, microwave communications, and neuromorphic and quantum computing. In this context, researchers have directed their efforts toward the development of highly parallelized hardware capable of undertaking and accelerating these operations, with several graphics processing units (GPUs) [1], field-programmable gate arrays (FPGAs) [2] and application-specific integrated circuits (ASICs) [3] having been demonstrated and fabricated within the last decade. Until now, the vast majority of the developed prototypes have relied on electronic CMOS transistors that, due to their fundamental bandwidth and energy-efficiency limitations dictated by Moore's law [4], are approaching a computational plateau. On top of that, according to Amdahl's law [5], the speed-up attainable through parallel computation saturates. In response, linear optics has been brought back to the foreground as a candidate to become the preferred computing hardware solution, since it offers THz bandwidth and ultra-low power consumption [6].