Journal of Electronic Measurement and Instrumentation

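Feng Kaihao, Tao Zhiyong, Li Heng, Li Minglang, Lin Sen. Transformer-based channel-by-channel point cloud analysis network[J]. Journal of Electronic Measurement and Instrumentation, 2025, 39(2): 49-59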

Transformer-based channel-by-channel point cloud analysis network

DOI:

Keywords: point cloud classification and segmentation; depthwise separable convolution; Transformer; fusion algorithm; ModelNet40

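Funding: Supported by the Applied Basic Research Project of the Liaoning Provincial Department of Science and Technology (2022JH2/101300274) and the Basic Scientific Research Project of Higher Education Institutions of Liaoning Province (LJKMZ20220679)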

Author	Institution
Feng Kaihao	School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China
Tao Zhiyong	School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China
Li Heng	School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China
Li Minglang	School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China
Lin Sen	School of Automation and Electrical Engineering, Shenyang Ligong University, Shenyang 110159, China


Abstract:

3D point clouds can fully describe the geometric information of target objects and have broad application prospects in fields such as autonomous driving, medical imaging, and robotics. However, existing methods do not differentiate between features in different channels, and they apply a single encoding strategy to both low-level spatial coordinates and high-level semantic features, which leads to incomplete point cloud feature extraction. This paper therefore proposes a Transformer-based channel-by-channel point cloud analysis network. First, because traditional graph convolution has difficulty distinguishing useful information in mixed channels, a depthwise separable edge convolution is designed; it preserves local geometric information during channel-by-channel feature extraction while significantly improving discrimination between channels. Second, because Transformers encode low-level spatial coordinates and high-level semantic features in the same way, which leads to insufficient information extraction, two feature encoding strategies are proposed: adaptive positional encoding, which explores implicit geometric structure in the low-level space, and spatial context encoding, which explores complex contextual relationships in the high-level space. Finally, an effective fusion strategy is proposed to form a more discriminative feature representation. To demonstrate the effectiveness of the proposed model, point cloud classification experiments are conducted on the public datasets ModelNet40 and ScanObjectNN, where the overall classification accuracy reaches 93.7% and 83.2%, respectively; on the public dataset ShapeNet Part, the mean intersection over union (mIoU) for overall part segmentation reaches 86.0%. The proposed method thus achieves advanced performance on both classification and segmentation tasks.
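The abstract does not give the layer definition, so the following is only a minimal sketch of one plausible reading of a depthwise separable edge convolution, not the authors' implementation. It assumes a k-nearest-neighbour graph, an edge feature built from the centre value and the neighbour difference, one weight pair per channel (so channels are never mixed in the edge step), max aggregation over neighbours, and a separate pointwise step that mixes channels afterwards. All function names and weights here (`knn_indices`, `depthwise_edge_conv`, `pointwise_mix`, `w_center`, `w_edge`, `mix_weight`) are hypothetical.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbours (self excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in dists[:k]])
    return result


def depthwise_edge_conv(features, neighbours, w_center, w_edge):
    """Channel-by-channel edge convolution (assumed form).

    features:   N x C nested list of per-point features
    neighbours: N x k nested list of neighbour indices
    w_center, w_edge: length-C lists, one weight pair per channel

    Output channel c depends only on input channel c:
        out[i][c] = max over j of  w_center[c] * f[i][c] + w_edge[c] * (f[j][c] - f[i][c])
    """
    out = []
    for i, f_i in enumerate(features):
        row = []
        for c, x in enumerate(f_i):
            row.append(max(
                w_center[c] * x + w_edge[c] * (features[j][c] - x)
                for j in neighbours[i]
            ))
        out.append(row)
    return out


def pointwise_mix(features, weight):
    """Pointwise (1x1) step: mixes channels per point. weight is C_out x C_in."""
    return [[sum(w * x for w, x in zip(w_row, f)) for w_row in weight] for f in features]


# Toy example: four points, with the xyz coordinates used as the input features.
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]]
neighbours = knn_indices(points, k=2)

w_center = [1.0, 1.0, 1.0]   # hypothetical per-channel weights
w_edge = [1.0, 1.0, 1.0]
edge_out = depthwise_edge_conv(points, neighbours, w_center, w_edge)

mix_weight = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]   # hypothetical 2 x 3 mixing matrix
mixed = pointwise_mix(edge_out, mix_weight)
```

Splitting the layer this way keeps the neighbourhood aggregation separate for each channel and defers all cross-channel interaction to the pointwise step, which is the usual motivation for depthwise separable designs. A real network would learn the weights and add normalisation and non-linearities.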
