1. Introduction
Most existing deep learning architectures capable of reasoning about 3D point-cloud data are trained on offline/frozen datasets of dense and clean point clouds [1]–[10]. A recent study [11] showed that State-Of-The-Art (SOTA) point-cloud classifiers perform significantly worse on data containing noise, missing parts, and sparse points. In contrast, many industrial applications require online/active acquisition of point clouds, which often results in sparse and noisy data. Typical applications include environments where cameras are inoperable (e.g., dusty environments, poor lighting conditions), requiring more robust data acquisition pipelines, such as iterative 3D point acquisition with a laser or tactile sensor mounted on a polyarticulated robot. To maintain a decent pace in an industrial context, only a limited number of points can be sampled for each object to process.
To enable accurate 3D object classification from a limited number of actively sampled points, recent works [12], [13] proposed to simultaneously learn the active sampling strategy, namely the exploration strategy, and the classifier. The insight behind this approach is that the training of the exploration strategy should be guided by the classification performance, so that each point acquisition provides the most information for the 3D recognition task. Both models leverage an online RL algorithm rewarded by the classification performance to learn the exploration strategy, coupled with a classification loss to train the classifier, as sketched below. This strategy compensates for the sparsity of the data by maximizing the efficiency of the exploration. However, it does not explicitly penalize (1) missing parts on the explored object, e.g., due to an overly narrow exploration, nor (2) noisy points, due to exploration trials that missed the object. In other words, this approach does not dissociate misclassifications caused by a poor exploration strategy from those caused by classifier mistakes, which leads to potentially unstable and sample-inefficient training.
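To make the joint training scheme of [12], [13] concrete, the following minimal sketch (not the authors' implementation) illustrates the general idea under simplifying assumptions: a policy network proposes the next probe location, a REINFORCE-style update (one simple choice of online RL algorithm) is rewarded by the classifier's confidence in the true label, and the classifier itself is trained with a cross-entropy loss on the accumulated sparse point cloud. The module sizes, the toy `acquire_point` stand-in for the sensor, and the exact reward definition are all illustrative assumptions.

```python
# Illustrative sketch of jointly training an exploration policy (via RL) and a
# point-cloud classifier; all architectural details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, NUM_STEPS, ACTION_DIM = 10, 8, 3  # assumed toy dimensions


class Policy(nn.Module):
    """Maps the points acquired so far to a distribution over the next probe location."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM * 2))

    def forward(self, points):                       # points: (N, 3)
        h = self.net(points).mean(dim=0)             # simple permutation-invariant pooling
        mean, log_std = h[:ACTION_DIM], h[ACTION_DIM:]
        return torch.distributions.Normal(mean, log_std.exp())


class Classifier(nn.Module):
    """Classifier over the (sparse) accumulated point cloud."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, NUM_CLASSES))

    def forward(self, points):
        return self.net(points).mean(dim=0)          # logits of shape (NUM_CLASSES,)


def acquire_point(probe):
    """Stand-in for the real sensor/simulator: returns a 3D point near the probe location."""
    return probe.detach() + 0.01 * torch.randn(3)


policy, classifier = Policy(), Classifier()
opt = torch.optim.Adam(list(policy.parameters()) + list(classifier.parameters()), lr=1e-3)

label = torch.randint(NUM_CLASSES, (1,))             # toy ground-truth class
points = torch.randn(1, 3)                           # initial acquired point
log_probs = []
for _ in range(NUM_STEPS):                           # iterative, active acquisition
    dist = policy(points)
    probe = dist.sample()
    log_probs.append(dist.log_prob(probe).sum())
    points = torch.cat([points, acquire_point(probe)[None]], dim=0)

opt.zero_grad()
logits = classifier(points)
cls_loss = F.cross_entropy(logits[None], label)      # trains the classifier
reward = F.log_softmax(logits, dim=0)[label].detach()    # classification-driven reward
rl_loss = -(reward * torch.stack(log_probs)).sum()   # REINFORCE update for the policy
(cls_loss + rl_loss).backward()
opt.step()
```

In such a loop the only training signal for the exploration policy is the final classification reward, which is what the shortcomings discussed above refer to: the reward does not distinguish whether a misclassification stems from a poorly explored object or from a classifier error.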
Our RL-based approach.