SVModeler: Simulation of human haplotypes containing synthetic structural variants
Publication date
2024-09-19
Abstract
Motivation: Long-read sequencing overcomes the limits of short reads by providing longer sequences that can span
repetitive regions, improving the detection of structural variants (SVs) and accurately resolving their sequences. Nonetheless,
detection and annotation of structural variants remain a computational challenge, requiring active development and
benchmarking of available algorithms. The current availability of detailed sequence information for large collections of
SVs identified using long-read sequencing presents a valuable opportunity for developing and training realistic novel
simulation frameworks, which can be used for the evaluation of SV callers.
Results: SVModeler is a newly developed computational tool to simulate synthetic human haplotypes containing embedded
SVs. A unique feature of SVModeler is its capability to leverage SV catalogs to model the genome-wide distribution,
frequency and sequence features of various SV classes, including tandem duplications, mobile elements and variable
number tandem repeats. As a proof of principle, SVModeler has been trained with a large catalog of polymorphic SVs
identified in a dataset comprising 1,019 samples from the 1000 Genomes Project, which represents the largest collection
of diverse humans sequenced with long reads to date.
Code availability: https://github.com/ismaelveramu/SVModeler
Contact: ismael.vera@uvic.cat
Document Type
Master's final project
Language
English
Keywords
Genomics
Bioinformatics
Mutation (Biology)
Pages
10 p.
Note
Academic year 2023-2024
Rights
This document is subject to a Creative Commons license
Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/deed.ca