MPCT: Multiscale Point Cloud Transformer With a Residual Network


Abstract:

The self-attention (SA) network revisits the essence of data and has achieved remarkable results in text processing and image analysis. SA can be conceptualized as a set operator that is insensitive to the order and number of its inputs, making it well suited to point sets embedded in 3D space. However, applying it to point clouds still poses challenges. To tackle the quadratic growth in complexity and the singularity induced by the original SA network without position encoding, we modify the attention mechanism by incorporating position encoding and making it linear, thus reducing its computational cost and memory usage and making it more feasible for point clouds. This article presents a new framework called the multiscale point cloud transformer (MPCT), which improves upon prior methods in cross-domain applications. The use of multiple embeddings enables our proposed attention mechanism to fully capture both long-range and local contextual relationships within point clouds. Additionally, we use a residual network to fuse multiscale features, allowing MPCT to better comprehend the point cloud representations at each attention stage. Experiments on several datasets demonstrate that MPCT outperforms existing methods, achieving classification accuracies of 94.2% on ModelNet40 and 84.9% on ScanObjectNN.
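The abstract's central efficiency claim is that attention with position encoding can be made linear in the number of points N, avoiding the N x N attention matrix. The sketch below is a generic kernelized (linear) attention over a point set, not the paper's exact formulation: the ReLU feature map, the toy position encoding, and all weight shapes are illustrative assumptions.

```python
import numpy as np

def linear_attention(x, wq, wk, wv, pos_enc):
    """Linear-complexity self-attention over a point set.

    x:       (N, d) point features
    pos_enc: (N, d) position encoding added to the features
    Instead of forming the (N, N) attention matrix, a (d, d)
    key-value summary is computed first, so the cost is O(N d^2)
    rather than O(N^2 d).
    """
    h = x + pos_enc                        # inject positional information
    q = np.maximum(h @ wq, 0.0)            # non-negative feature maps
    k = np.maximum(h @ wk, 0.0)            # (ReLU kernel: an assumption)
    v = h @ wv
    kv = k.T @ v                           # (d, d) summary, no N x N matrix
    z = q @ k.sum(axis=0)                  # per-query normalizer
    return (q @ kv) / (z[:, None] + 1e-9)

rng = np.random.default_rng(0)
N, d = 1024, 64
pts = rng.normal(size=(N, 3))                   # raw 3D coordinates
feat = rng.normal(size=(N, d))                  # per-point features
pos = np.tanh(pts @ rng.normal(size=(3, d)))    # toy position encoding
w = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
out = linear_attention(feat, *w, pos)
print(out.shape)   # (1024, 64)
```

Because the (d, d) summary replaces the (N, N) score matrix, doubling the number of points only doubles the work, which is what makes this family of mechanisms practical for dense point clouds.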
Published in: IEEE Transactions on Multimedia ( Volume: 26)
Page(s): 3505 - 3516
Date of Publication: 12 September 2023

I. Introduction

With the rapid development of 3D sensing technology, 3D point cloud data are appearing in many application areas, such as autonomous driving, virtual and augmented reality, and robotics. Driven by deep neural networks, recent 3D works [1], [2], [3], [4], [5], [6], [7], [8] have focused on processing point clouds with learning-based methods. However, unlike images arranged on a regular pixel grid, point clouds are unordered sets of points embedded in three-dimensional space. This makes them structurally different from images and representationally different from more complex 3D data (e.g., grid and voxel data): point clouds have the simplest format of the three, yet deep networks designed for standard computer vision tasks cannot be applied to them directly.
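Because a point cloud is an unordered set, any operator applied to it should produce the same output under any permutation of its points. A minimal illustration, using a shared per-point MLP followed by symmetric max pooling (a PointNet-style construction used here purely as an example, with made-up weights):

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.normal(size=(128, 3))     # 128 points in 3D, arbitrary order
w1 = rng.normal(size=(3, 32))          # toy shared-MLP weights
w2 = rng.normal(size=(32, 64))

def set_feature(p):
    h = np.maximum(p @ w1, 0.0) @ w2   # same MLP applied to every point
    return h.max(axis=0)               # symmetric max pooling over points

perm = rng.permutation(len(points))
a = set_feature(points)                # original order
b = set_feature(points[perm])          # shuffled order
print(np.allclose(a, b))               # True: output is order-insensitive
```

A convolution over an image grid has no such guarantee, which is why grid-based architectures cannot be transplanted to raw point sets without modification.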
