I. Introduction
Convolutional Neural Networks (CNNs) are a widely used and powerful tool for processing and interpreting image-like data in domains such as automotive, robotics, or medicine. Compared to layer-wise fully connected networks, CNNs benefit from weight sharing: instead of learning a large set of weights connecting all input pixels to each element of the next layer, they learn a few weights of a small set of convolutional kernels that are applied across the whole image. The goal is to achieve shift equivariance: a trained pattern (e.g., for detecting a car in a camera image) should produce a strong response at the particular location of the car in the image, regardless of whether that location is, e.g., in the left or the right part of the image. However, this goal is missed at locations close to the image borders, where the receptive field of the convolution exceeds the input. Typically, these out-of-image regions are filled with zero-padding. Since zero-padding is applied at every layer of a CNN, the effect of distorted filter responses grows from the image borders towards the interior.

While this seems inevitable for imagery from pinhole-model cameras, it is not the case for panoramic data. Panoramic data has at least one dimension with a wrap-around structure and no inherent image border along it. However, feeding panoramic data as 2D images to a standard CNN artificially introduces such borders. Fig. 1 shows an example of object recognition using a car-mounted Velodyne LiDAR; its 360° depth and reflectance images are practically important examples of panoramic data. Ignoring the wrap-around connections by using a standard CNN creates blind spots near the image borders, where we are unable to interpret the environment.
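The contrast between zero-padding and wrap-around handling can be illustrated with a minimal sketch. The snippet below (an illustration, not the paper's implementation) convolves one row of a panoramic image, once with zero-padding and once with circular padding along the azimuth axis, and checks that only the circular variant remains equivariant under a cyclic shift of the input; the specific values and the helper `conv1d` are hypothetical.

```python
import numpy as np

def conv1d(row, kernel, pad_mode):
    """Valid 1D correlation after padding (len(kernel) // 2) pixels per side."""
    k = len(kernel) // 2
    if pad_mode == "zero":
        # Standard CNN border handling: out-of-image region filled with zeros.
        padded = np.pad(row, k, mode="constant")
    else:
        # Circular padding: borrow pixels from the opposite border,
        # matching the wrap-around structure of a 360° azimuth axis.
        padded = np.pad(row, k, mode="wrap")
    return np.array([padded[i:i + len(kernel)] @ kernel
                     for i in range(len(row))])

# Toy panoramic row: an "object" split across the wrap-around seam.
row = np.array([5.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0])
kernel = np.array([1.0, 2.0, 1.0])  # simple smoothing filter

zero_out = conv1d(row, kernel, "zero")
wrap_out = conv1d(row, kernel, "circular")

# A cyclic shift of the input cyclically shifts the circular response,
# while the zero-padded response is distorted at the former border.
shifted = np.roll(row, 2)
assert np.allclose(conv1d(shifted, kernel, "circular"), np.roll(wrap_out, 2))
assert not np.allclose(conv1d(shifted, kernel, "zero"), np.roll(zero_out, 2))
```

In deep-learning frameworks the same idea is typically available as a padding option, e.g. `padding_mode='circular'` in PyTorch's `nn.Conv2d`, applied only to the wrap-around dimension.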