dc.contributor.author
Bartoszewicz, Jakub Maciej
dc.contributor.author
Seidel, Anja
dc.contributor.author
Renard, Bernhard Y.
dc.date.accessioned
2021-11-25T13:32:09Z
dc.date.available
2021-11-25T13:32:09Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/32858
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-32584
dc.description.abstract
Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard biosecurity and biosafety alike. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard, homology-based algorithms, cutting the error rates in half and generalizing to taxonomic units distant from those presented during training. Further, we develop a suite of interpretability tools and show that it can also be applied to models beyond the host prediction task. We propose a new approach for convolutional filter visualization that disentangles the information content of each nucleotide from its contribution to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect regions of interest in novel agents, for example, the SARS-CoV-2 coronavirus, which was unknown before it caused the COVID-19 pandemic in 2020. All methods presented here are implemented as easy-to-install packages that not only enable analysis of NGS datasets without requiring any deep learning skills, but also allow advanced users to easily train and explain new models for genomics.
en
dc.format.extent
14 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
viral host prediction
en
dc.subject
deep neural architectures
en
dc.subject
interpretability tools
en
dc.subject.ddc
500 Natural sciences and mathematics::570 Life sciences; biology::570 Life sciences; biology
dc.title
Interpretable detection of novel human viruses from genome sequencing data
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
lqab004
dcterms.bibliographicCitation.doi
10.1093/nargab/lqab004
dcterms.bibliographicCitation.journaltitle
NAR Genomics and Bioinformatics
dcterms.bibliographicCitation.number
1
dcterms.bibliographicCitation.volume
3
dcterms.bibliographicCitation.url
https://doi.org/10.1093/nargab/lqab004
refubium.affiliation
Mathematics and Computer Science
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
2631-9268
refubium.resourceType.provider
WoS-Alert