I. Introduction
Data are ubiquitous in today's technological era. This is both a blessing and a curse: we are swimming in sensors but drowning in data. To cope with these data, many systems employ data/information fusion. For example, you are at this very moment combining multiple sources of data, e.g., taste, smell, touch, vision, hearing, and memories. In remote sensing, it is common practice to combine lidar, hyperspectral, visible, radar, and/or other sensors of varying spectral, spatial, and temporal resolution to detect objects, perform Earth observation, and so forth. The same story holds for computer vision, smart cars, Big Data, and numerous other thrusts. While the last decade has seen great strides in topics like deep learning, the reality is that our understanding of fusion in the context of neural networks (NNs), and therefore deep learning, has not witnessed similar growth. Most approaches to fusion in NNs are ad hoc (specialized for a particular application), and/or they are neither well understood nor explainable; that is, it is not clear how the data are being combined nor why we should trust the system's outputs.