
Dataset Information


TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data.


ABSTRACT: BACKGROUND: Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies exhibit substantially higher sequencing error rates, which complicates repeat resolution. RESULTS: We developed TRiCoLOR, a freely available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in sequencing data without prior knowledge of their motifs or locations and resolve repeat multiplicity and period size in a haplotype-specific manner. The tool includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees. CONCLUSIONS: TRiCoLOR demonstrates excellent performance and improved sensitivity and specificity compared with alternative tools on synthetic data. For real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting its suitability to identify tandem repeat variation in personal genomes.

SUBMITTER: Bolognini D 

PROVIDER: S-EPMC7539535 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature


Publications

TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data.

Bolognini D, Magi A, Benes V, Korbel JO, Rausch T

GigaScience, 2020 Oct 1, issue 10



Similar Datasets

| S-EPMC8361843 | biostudies-literature
| S-EPMC6853708 | biostudies-literature
| S-EPMC5629557 | biostudies-literature
| S-EPMC6966772 | biostudies-literature
| S-EPMC7545597 | biostudies-literature
| S-EPMC8764290 | biostudies-literature
| S-EPMC4957933 | biostudies-literature
| S-EPMC4365909 | biostudies-literature