2025, 39(2):49-59.
Abstract: 3D point clouds fully describe the geometric information of target objects and are widely used in fields such as autonomous driving, medical imaging, and robotics. However, existing methods do not differentiate between features in different channels, and they apply a single, uniform encoding strategy to both low-level spatial coordinates and high-level semantic features, which leads to incomplete point cloud feature extraction. This paper therefore proposes a Transformer-based channel-wise point cloud analysis network. First, to overcome the difficulty traditional graph convolution has in distinguishing useful information within mixed channels, a depthwise separable edge convolution is designed, which significantly improves inter-channel discrimination while preserving local geometric information during channel-wise feature extraction. Second, to address the insufficient information extraction caused by the Transformer's uniform encoding of low-level spatial coordinates and high-level semantic features, two feature encoding strategies are proposed: adaptive positional encoding and spatial context encoding, which explore the implicit geometric structure of the low-level space and the complex contextual relationships of the high-level space, respectively. Finally, an effective fusion strategy is proposed that yields a more discriminative feature representation. To demonstrate the effectiveness of the proposed model, point cloud classification experiments are conducted on the public datasets ModelNet40 and ScanObjectNN, where the overall classification accuracy reaches 93.7% and 83.2%, respectively; on the public dataset ShapeNet Part, the mean intersection over union (mIoU) for part segmentation reaches 86.0%. The proposed method thus achieves competitive performance on both classification and segmentation tasks.
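The depthwise separable edge convolution described above can be illustrated with a minimal numpy sketch: edge features `[x_i, x_j - x_i]` are gathered over a kNN graph (as in DGCNN-style EdgeConv), transformed per channel (depthwise step), then mixed across channels (pointwise step) and max-pooled over the neighborhood. The function names, the fixed weight arrays, and the exact depthwise/pointwise split are assumptions for illustration only; the paper's actual layer presumably uses learned parameters and deeper per-channel transforms.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of each point (self excluded)."""
    d = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)  # (N, N) squared distances
    return np.argsort(d, axis=1)[:, 1:k + 1]                           # (N, k)

def depthwise_separable_edge_conv(feats, points, k, w_depth, w_point):
    """Hypothetical sketch of a depthwise separable edge convolution.

    feats  : (N, C)      per-point features
    points : (N, 3)      coordinates used to build the kNN graph
    w_depth: (2C,)       one weight per edge-feature channel (depthwise step)
    w_point: (2C, C_out) pointwise (1x1) mixing across channels
    """
    idx = knn_indices(points, k)                      # (N, k) neighbor indices
    neigh = feats[idx]                                # (N, k, C) neighbor features
    center = np.repeat(feats[:, None, :], k, axis=1)  # (N, k, C) center features
    # EdgeConv-style edge feature: [x_i, x_j - x_i]
    edge = np.concatenate([center, neigh - center], axis=-1)  # (N, k, 2C)
    edge = np.maximum(edge * w_depth, 0.0)            # depthwise: per-channel scale + ReLU
    edge = edge @ w_point                             # pointwise: mix information across channels
    return edge.max(axis=1)                           # (N, C_out) max-pool over neighbors

# Usage: 8 points with coordinates doubling as initial features (C = 3)
rng = np.random.RandomState(0)
pts = rng.randn(8, 3)
out = depthwise_separable_edge_conv(pts, pts, k=4,
                                    w_depth=np.ones(6), w_point=np.ones((6, 5)))
print(out.shape)  # (8, 5)
```

The depthwise step keeps channels separate, which is the point of the channel-wise design: each channel is weighted independently before any cross-channel mixing, so informative channels are not drowned out by a shared transform.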