dc.contributor.author
Liu-Wei, Wang
dc.contributor.author
Toorn, Wiep van der
dc.contributor.author
Bohn, Patrick
dc.contributor.author
Hölzer, Martin
dc.contributor.author
Smyth, Redmond P.
dc.contributor.author
Kleist, Max von
dc.date.accessioned
2024-09-09T12:18:46Z
dc.date.available
2024-09-09T12:18:46Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/44848
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-44558
dc.description.abstract
Background
Direct RNA sequencing (dRNA-seq) on the Oxford Nanopore Technologies (ONT) platforms can produce reads covering up to full-length gene transcripts, while containing decipherable information about RNA base modifications and poly-A tail lengths. Although many published studies have been expanding the potential of dRNA-seq, its sequencing accuracy and error patterns remain understudied.
Results
We present the first comprehensive evaluation of sequencing accuracy and characterisation of systematic errors in dRNA-seq data from diverse organisms and synthetic in vitro transcribed RNAs. We found that for sequencing kits SQK-RNA001 and SQK-RNA002, the median read accuracy ranged from 87% to 92% across species, and deletions significantly outnumbered mismatches and insertions. Due to their high abundance in the transcriptome, heteropolymers and short homopolymers were the major contributors to the overall sequencing errors. We also observed systematic biases across all species at the levels of single nucleotides and motifs. In general, cytosine/uracil-rich regions were more likely to be erroneous than guanines and adenines. By examining raw signal data, we identified the underlying signal-level features potentially associated with the error patterns and their dependency on sequence contexts. While read quality scores can be used to approximate error rates at base and read levels, failure to detect DNA adapters may be a source of errors and data loss. By comparing distinct basecallers, we reason that some sequencing errors are attributable to signal insufficiency rather than algorithmic (basecalling) artefacts. Lastly, we generated dRNA-seq data using the latest SQK-RNA004 sequencing kit released at the end of 2023 and found that although the overall read accuracy increased, the systematic errors remain largely identical compared to the previous kits.
Conclusions
As the first systematic investigation of dRNA-seq errors, this study offers a comprehensive overview of reproducible error patterns across diverse datasets, identifies potential signal-level insufficiency, and lays the foundation for error correction methods.
en
dc.format.extent
15 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
Nanopore sequencing
en
dc.subject
Direct RNA sequencing
en
dc.subject
Transcriptomics
en
dc.subject
Epitranscriptomics
en
dc.subject
Sequencing errors
en
dc.subject
Sequencing accuracy
en
dc.subject.ddc
500 Natural sciences and mathematics::570 Life sciences; biology::570 Life sciences; biology
dc.title
Sequencing accuracy and systematic errors of nanopore direct RNA sequencing
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
528
dcterms.bibliographicCitation.doi
10.1186/s12864-024-10440-w
dcterms.bibliographicCitation.journaltitle
BMC Genomics
dcterms.bibliographicCitation.number
1
dcterms.bibliographicCitation.volume
25
dcterms.bibliographicCitation.url
https://doi.org/10.1186/s12864-024-10440-w
refubium.affiliation
Mathematik und Informatik
refubium.affiliation.other
Institut für Mathematik

refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
1471-2164
refubium.resourceType.provider
WoS-Alert