Multimodal remote sensing data, including spectral imagery and LiDAR or photogrammetric data, are crucial for achieving satisfactory land-use/land-cover classification results in urban scenes. So far, most studies have been conducted in a 2-D context. When 3-D information is available in a dataset, it is typically integrated with the 2-D data by rasterizing it into 2-D formats. Although this approach yields satisfactory classification results, it falls short of fully exploiting the potential of 3-D data, as it restricts the model's ability to learn 3-D spatial features directly from raw point clouds. It also limits the generation of 3-D predictions, since the dimensionality of the input data has been reduced. In this study, we propose a fully 3-D method that fuses all modalities within the 3-D point cloud and employs a dedicated dual-branch Transformer model to learn geometric and spectral features simultaneously. To enhance the fusion process, we introduce a cross-attention-based mechanism that operates entirely on 3-D points, effectively integrating features from the different modalities across multiple scales. Cross-attention allows one modality to assess the importance of the other by weighting its relevant features. We evaluated our method against both 3-D and 2-D methods on the 2018 IEEE GRSS Data Fusion Contest (DFC2018) dataset. Our findings indicate that 3-D fusion delivers results competitive with 2-D methods and offers more flexibility by providing 3-D predictions; these predictions can be projected onto 2-D maps, whereas the reverse is not possible. We also evaluated our method on additional datasets, namely the ISPRS Vaihingen 3-D and the IEEE DFC2019.
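To make the cross-attention idea above concrete, the following is a minimal sketch of a cross-modal fusion block in which per-point features from one branch attend to the other branch (and vice versa), applied at a single scale. It is an illustration of the general mechanism only, not the architecture proposed in the paper: the module name, feature dimensions, head count, and the residual/concatenation choices are all hypothetical.

```python
# Illustrative sketch only: a generic cross-attention fusion block for two
# per-point feature streams (geometric and spectral), written with PyTorch.
# Dimensions, layer choices, and names are hypothetical, not taken from the
# paper; they merely show how one modality can weight the other's features.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        # Geometric features attend to spectral features, and vice versa.
        self.geo_to_spec = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.spec_to_geo = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_geo = nn.LayerNorm(dim)
        self.norm_spec = nn.LayerNorm(dim)
        # Simple projection to merge the two attended streams.
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, geo: torch.Tensor, spec: torch.Tensor) -> torch.Tensor:
        # geo, spec: (batch, num_points, dim) per-point features from the
        # geometric and spectral branches at one scale.
        geo_att, _ = self.geo_to_spec(query=geo, key=spec, value=spec)
        spec_att, _ = self.spec_to_geo(query=spec, key=geo, value=geo)
        geo_out = self.norm_geo(geo + geo_att)      # residual + normalization
        spec_out = self.norm_spec(spec + spec_att)
        return self.fuse(torch.cat([geo_out, spec_out], dim=-1))


if __name__ == "__main__":
    fusion = CrossModalFusion(dim=128, num_heads=4)
    geo_feats = torch.randn(2, 1024, 128)   # toy geometric point features
    spec_feats = torch.randn(2, 1024, 128)  # toy spectral point features
    fused = fusion(geo_feats, spec_feats)
    print(fused.shape)  # torch.Size([2, 1024, 128])
```

In a multi-scale setting, a block of this kind would typically be instantiated once per resolution level so that fused features are produced at every scale of the two branches.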