dc.contributor.author
Wijaya, Andre Jatmiko
dc.contributor.author
Anžel, Aleksandar
dc.contributor.author
Richard, Hugues
dc.contributor.author
Hattab, Georges
dc.date.accessioned
2026-01-22T09:00:46Z
dc.date.available
2026-01-22T09:00:46Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/51239
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-50966
dc.description.abstract
Horizontal gene transfer (HGT) accelerates the spread of antimicrobial resistance (AMR) via mobile genetic elements, allowing pathogens to acquire resistance genes across species. This process drives the evolution of multidrug-resistant “superbugs” in clinical settings. Detection of HGT is critical to mitigating AMR, but traditional methods based on sequence assembly or comparative genomics lack resolution for complex transfer events. While machine learning (ML) promises improved detection, several studies in other domains have demonstrated that the choice of data representation strongly influences ML performance. There is, however, no clear recommendation on the best data representation for HGT detection. Here, we evaluated 44 genomic data representations using five ML models across four data sets. We demonstrate that ML performance is highly dependent on the genomic data representation. The RCKmer-based representation (k = 7) paired with a support vector machine is found to be optimal (F1: 0.959; MCC: 0.908), outperforming other approaches. Moreover, models trained on multi-species data sets are shown to generalize better. Our findings suggest that genomic surveillance benefits from task-specific genome data representations. This work provides state-of-the-art, fine-tuned models for identifying and annotating genomic islands that will enable proper detection of the transfer of AMR-related genes between species.
en
dc.format.extent
11 pages
dc.rights.uri
https://creativecommons.org/licenses/by-nc/4.0/
dc.subject
horizontal gene transfer detection
en
dc.subject
antimicrobial resistance
en
dc.subject
genomic surveillance
en
dc.subject.ddc
500 Natural sciences and mathematics::570 Life sciences; biology::570 Life sciences; biology
dc.title
Genomic data representations for horizontal gene transfer detection
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
lqaf165
dcterms.bibliographicCitation.doi
10.1093/nargab/lqaf165
dcterms.bibliographicCitation.journaltitle
NAR Genomics and Bioinformatics
dcterms.bibliographicCitation.number
4
dcterms.bibliographicCitation.volume
7
dcterms.bibliographicCitation.url
https://doi.org/10.1093/nargab/lqaf165
refubium.affiliation
Mathematik und Informatik
refubium.affiliation.other
Institut für Mathematik

refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
2631-9268
refubium.resourceType.provider
WoS-Alert