dc.contributor.author
Rizaldy, Aldino
dc.contributor.author
Gloaguen, Richard
dc.contributor.author
Fassnacht, Fabian Ewald
dc.contributor.author
Ghamisi, Pedram
dc.date.accessioned
2025-10-14T08:51:04Z
dc.date.available
2025-10-14T08:51:04Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/49807
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-49532
dc.description.abstract
Multimodal remote sensing data, including spectral imagery and LiDAR or photogrammetric point clouds, are crucial for achieving satisfactory land-use/land-cover classification results in urban scenes. So far, most studies have been conducted in a 2-D context. When 3-D information is available in a dataset, it is typically integrated with the 2-D data by rasterizing the 3-D data into 2-D formats. Although this method yields satisfactory classification results, it falls short of fully exploiting the potential of 3-D data, as it restricts the model's ability to learn 3-D spatial features directly from raw point clouds. In addition, it prevents the generation of 3-D predictions, since the dimensionality of the input data has been reduced. In this study, we propose a fully 3-D method that fuses all modalities within the 3-D point cloud and employs a dedicated dual-branch Transformer model to learn geometric and spectral features simultaneously. To enhance the fusion process, we introduce a cross-attention-based mechanism that operates entirely on 3-D points, effectively integrating features from the different modalities across multiple scales. The purpose of cross-attention is to allow one modality to assess the importance of another by weighting its relevant features. We evaluated our method against both 3-D and 2-D methods on the 2018 IEEE GRSS Data Fusion Contest (DFC2018) dataset. Our findings indicate that 3-D fusion delivers results competitive with 2-D methods and offers more flexibility by providing 3-D predictions, which can be projected onto 2-D maps, whereas the reverse is not possible. In addition, we evaluated our method on further datasets, specifically the ISPRS Vaihingen 3-D and IEEE DFC2019.
en
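To illustrate the cross-attention fusion described in the abstract, the following is a minimal sketch (not the authors' released code) of bidirectional cross-attention between a geometric and a spectral per-point feature branch. It assumes PyTorch; the class name, feature dimensions, and layer layout are illustrative assumptions, not taken from the paper.

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative fusion block: each branch attends to the other, so one
    modality can weight the relevance of the other's per-point features."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        # Queries from one branch, keys/values from the other (and vice versa).
        self.geo_from_spec = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spec_from_geo = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_geo = nn.LayerNorm(dim)
        self.norm_spec = nn.LayerNorm(dim)

    def forward(self, geo: torch.Tensor, spec: torch.Tensor):
        # geo, spec: (batch, n_points, dim) features from the two branches.
        g, _ = self.geo_from_spec(query=geo, key=spec, value=spec)
        s, _ = self.spec_from_geo(query=spec, key=geo, value=geo)
        # Residual connection plus normalization, returning both fused branches.
        return self.norm_geo(geo + g), self.norm_spec(spec + s)

if __name__ == "__main__":
    fuse = CrossAttentionFusion(dim=64, heads=4)
    geo = torch.randn(2, 1024, 64)   # geometric (xyz-derived) point features
    spec = torch.randn(2, 1024, 64)  # spectral (e.g., hyperspectral) point features
    fused_geo, fused_spec = fuse(geo, spec)
    print(fused_geo.shape, fused_spec.shape)  # torch.Size([2, 1024, 64]) each

In the paper's multiscale setting, a block of this kind would be applied at each feature scale of the dual-branch network; the single-scale version above is only meant to show how one modality's queries attend over the other's keys and values.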
dc.format.extent
21 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
Classification
en
dc.subject
deep learning
en
dc.subject
hyperspectral
en
dc.subject
segmentation
en
dc.subject.ddc
500 Natural sciences and mathematics::520 Astronomy::520 Astronomy and allied sciences
dc.title
HyperPointFormer: Multimodal Fusion in 3-D Space With Dual-Branch Cross-Attention Transformers
dc.type
Scientific article
dcterms.bibliographicCitation.doi
10.1109/JSTARS.2025.3595648
dcterms.bibliographicCitation.journaltitle
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
dcterms.bibliographicCitation.pagestart
21254
dcterms.bibliographicCitation.pageend
21274
dcterms.bibliographicCitation.volume
18
dcterms.bibliographicCitation.url
https://doi.org/10.1109/JSTARS.2025.3595648
refubium.affiliation
Geowissenschaften
refubium.affiliation.other
Institut für Geologische Wissenschaften / Fachrichtung Fernerkundung und Geoinformatik
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
2151-1535
refubium.resourceType.provider
WoS-Alert