dc.contributor.author
Rizaldy, Aldino
dc.contributor.author
Gloaguen, Richard
dc.contributor.author
Fassnacht, Fabian Ewald
dc.contributor.author
Ghamisi, Pedram
dc.date.accessioned
2025-10-14T08:51:04Z
dc.date.available
2025-10-14T08:51:04Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/49807
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-49532
dc.description.abstract
Multimodal remote sensing data, including spectral imagery and LiDAR or photogrammetric point clouds, are crucial for achieving satisfactory land-use/land-cover classification results in urban scenes. So far, most studies have been conducted in a 2-D context. When 3-D information is available in a dataset, it is typically integrated with the 2-D data by rasterizing the 3-D data into 2-D formats. Although this method yields satisfactory classification results, it falls short of fully exploiting the potential of 3-D data, as it restricts the model's ability to learn 3-D spatial features directly from raw point clouds. In addition, it prevents the generation of 3-D predictions, since the dimensionality of the input data has been reduced. In this study, we propose a fully 3-D method that fuses all modalities within the 3-D point cloud and employs a dedicated dual-branch Transformer model to learn geometric and spectral features simultaneously. To enhance the fusion process, we introduce a cross-attention-based mechanism that operates entirely on 3-D points, effectively integrating features from the different modalities across multiple scales. The purpose of cross-attention is to allow one modality to assess the importance of another by weighting its relevant features. We evaluated our method against both 3-D and 2-D methods on the 2018 IEEE GRSS Data Fusion Contest (DFC2018) dataset. Our findings indicate that 3-D fusion delivers results competitive with 2-D methods and offers more flexibility by providing 3-D predictions, which can be projected onto 2-D maps, whereas the reverse is not possible. In addition, we evaluated our method on further datasets, specifically the ISPRS Vaihingen 3-D and IEEE DFC2019.
en
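To illustrate the cross-attention fusion described in the abstract, the following is a minimal sketch (not the authors' released code) of bidirectional cross-attention between a geometric and a spectral per-point feature branch. It assumes PyTorch; the class name, feature dimensions, and layer layout are illustrative assumptions, not taken from the paper.

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative fusion block: each branch attends to the other, so one
    modality can weight the relevance of the other's per-point features."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        # Queries from one branch, keys/values from the other (and vice versa).
        self.geo_from_spec = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spec_from_geo = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_geo = nn.LayerNorm(dim)
        self.norm_spec = nn.LayerNorm(dim)

    def forward(self, geo: torch.Tensor, spec: torch.Tensor):
        # geo, spec: (batch, n_points, dim) features from the two branches.
        g, _ = self.geo_from_spec(query=geo, key=spec, value=spec)
        s, _ = self.spec_from_geo(query=spec, key=geo, value=geo)
        # Residual connection plus normalization, returning both fused branches.
        return self.norm_geo(geo + g), self.norm_spec(spec + s)

if __name__ == "__main__":
    fuse = CrossAttentionFusion(dim=64, heads=4)
    geo = torch.randn(2, 1024, 64)   # geometric (xyz-derived) point features
    spec = torch.randn(2, 1024, 64)  # spectral (e.g., hyperspectral) point features
    fused_geo, fused_spec = fuse(geo, spec)
    print(fused_geo.shape, fused_spec.shape)  # torch.Size([2, 1024, 64]) each

In the paper's multiscale setting, a block of this kind would be applied at each feature scale of the dual-branch network; the single-scale version above is only meant to show how one modality's queries attend over the other's keys and values.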
dc.format.extent
21 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
Classification
en
dc.subject
deep learning
en
dc.subject
hyperspectral
en
dc.subject
segmentation
en
dc.subject.ddc
500 Natural sciences and mathematics::520 Astronomy::520 Astronomy and allied sciences
dc.title
HyperPointFormer: Multimodal Fusion in 3-D Space With Dual-Branch Cross-Attention Transformers
dc.type
Scientific article
dcterms.bibliographicCitation.doi
10.1109/JSTARS.2025.3595648
dcterms.bibliographicCitation.journaltitle
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
dcterms.bibliographicCitation.pagestart
21254
dcterms.bibliographicCitation.pageend
21274
dcterms.bibliographicCitation.volume
18
dcterms.bibliographicCitation.url
https://doi.org/10.1109/JSTARS.2025.3595648
refubium.affiliation
Geowissenschaften
refubium.affiliation.other
Institut für Geologische Wissenschaften / Fachrichtung Fernerkundung und Geoinformatik
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
2151-1535
refubium.resourceType.provider
WoS-Alert