Abstract: Light field imaging captures both the spatial and angular information of light in a scene simultaneously and is widely used in computer vision tasks. A two-stage 6D pose estimation method based on light field decoupled feature fusion is proposed. The method aims to overcome the limitations of RGB-based pose estimation when predicting pose in complex scenes with severe occlusion and truncation, illumination changes, and similarity between objects and the background. Separate feature extractors are utilised to decouple the light field macro-pixel image and map it into the feature space. An attention mechanism is then introduced to fuse the spatial, angular and EPI information, providing effective and reliable features for the downstream pose estimation network. In addition, back-projection is applied in the keypoint prediction network to minimise information loss during feature transfer. Experiments on the LF-6Dpose light field pose estimation dataset demonstrate that the method achieves 91.37% on the average closest point 3D distance for symmetric objects (ADD-S) metric and 70.12% on the 2D Projection metric. This represents a 12.5% improvement over existing state-of-the-art methods on the 3D distance metric and more effectively addresses 6D object pose estimation in complex scenes.
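To make the fusion step concrete, the following is a minimal, hypothetical PyTorch sketch of one way attention-weighted fusion of the decoupled spatial, angular and EPI (epipolar plane image) features could be realised. The module name, channel sizes and SE-style per-branch gating are illustrative assumptions and are not taken from the paper's implementation.

```python
# Illustrative sketch (not the authors' code): attention-weighted fusion of
# decoupled light field features. Branch gating design and channel sizes are
# assumptions for demonstration only.
import torch
import torch.nn as nn


class DecoupledFeatureFusion(nn.Module):
    """Fuses spatial, angular and EPI feature maps with learned attention weights."""

    def __init__(self, channels: int = 128):
        super().__init__()
        # One lightweight gate per branch: global pooling -> 1x1 convs -> per-channel logits.
        self.gates = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // 4, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, channels, kernel_size=1),
            )
            for _ in range(3)
        )
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, spatial, angular, epi):
        branches = [spatial, angular, epi]
        # Stack per-branch gate logits and normalise across branches with softmax,
        # so the three feature sources compete for each channel.
        logits = torch.stack([g(x) for g, x in zip(self.gates, branches)], dim=0)
        weights = torch.softmax(logits, dim=0)  # shape: (3, B, C, 1, 1)
        attended = [w * x for w, x in zip(weights, branches)]
        return self.fuse(torch.cat(attended, dim=1))


if __name__ == "__main__":
    fusion = DecoupledFeatureFusion(channels=128)
    # Hypothetical feature maps from the spatial, angular and EPI branches.
    feats = [torch.randn(2, 128, 60, 80) for _ in range(3)]
    print(fusion(*feats).shape)  # torch.Size([2, 128, 60, 80])
```

The fused feature map would then be passed to the downstream keypoint prediction and pose estimation stages described above.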