dc.contributor.author
Scott, Suzanne
dc.contributor.author
Grigson, Susanna
dc.contributor.author
Hartkopf, Felix
dc.contributor.author
Hallwirth, Claus V.
dc.contributor.author
Alexander, Ian E.
dc.contributor.author
Bauer, Denis C.
dc.contributor.author
Wilson, Laurence O. W.
dc.date.accessioned
2022-06-23T11:57:27Z
dc.date.available
2022-06-23T11:57:27Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/35399
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-35115
dc.description.abstract
Viral integration is a complex biological process. A reference integration dataset with known properties is therefore useful, both as a baseline against which to compare experimental data and as a benchmark for computational tools that detect integration. To generate such data, we developed a pipeline that simulates integrations of a viral or vector genome into a host genome. Our method reproduces more complex characteristics of vector and viral integration, including integration of sub-genomic fragments, structural variation of the integrated genomes, and deletions from the host genome at the integration site. Our method [1] takes the form of a Snakemake [2] pipeline built around a Python [3] script that uses the Biopython [4] module to simulate integrations of a viral reference into a host reference. This produces a reference containing integrations, from which sequencing reads are simulated using ART [5]. A second Python script then annotates the IDs of the reads crossing each integration junction. The final output consists of the simulated reads and a table listing the location of each integration and the reads crossing each of its junctions. To illustrate our method, we provide simulated reads and integration locations, together with the code required to simulate integrations using any virus and host reference. We used this simulation method in our research to investigate the performance of tools that detect viral integration [6].
en
dc.format.extent
7 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
Gene therapy
en
dc.subject
Vector Integration
en
dc.subject.ddc
000 Computer science, information & general works::000 Computer science, knowledge & systems::004 Data processing & computer science
dc.title
A bioinformatic pipeline for simulating viral integration data
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
108161
dcterms.bibliographicCitation.doi
10.1016/j.dib.2022.108161
dcterms.bibliographicCitation.journaltitle
Data in Brief
dcterms.bibliographicCitation.volume
42
dcterms.bibliographicCitation.url
https://doi.org/10.1016/j.dib.2022.108161
refubium.affiliation
Mathematics and Computer Science
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
2352-3409
refubium.resourceType.provider
WoS-Alert