Dataset Information

Tools and best practices for retrotransposon analysis using high-throughput sequencing data.


ABSTRACT: Background: Sequencing technologies give access to a precise picture of the molecular mechanisms acting upon genome regulation. One of the biggest technical challenges with sequencing data is to map millions of reads to a reference genome. This problem is exacerbated when dealing with repetitive sequences such as transposable elements, which occupy half of the mammalian genome mass. Sequenced reads coming from these regions introduce ambiguities in the mapping step. Therefore, dedicated parameters and algorithms must be considered when transposable element regulation is investigated with sequencing datasets. Results: Here, we used simulated reads on the mouse and human genomes to define the best parameters for aligning transposable element-derived reads on a reference genome. The efficiency of the most commonly used aligners was compared, and we further evaluated how transposable element representation should be estimated using available methods. The mappability of the different transposon families in the mouse and human genomes was calculated, giving an overview of their evolution. Conclusions: Based on simulated data, we provided recommendations on the alignment and quantification steps to be performed when transposon expression or regulation is studied, and identified the limits in detecting specific young transposon families of the mouse and human genomes. These principles may help the community to adopt standard procedures and raise awareness of the difficulties encountered in the study of transposable elements.

SUBMITTER: Teissandier A 

PROVIDER: S-EPMC6935493 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Tools and best practices for retrotransposon analysis using high-throughput sequencing data.

Teissandier Aurélie, Servant Nicolas, Barillot Emmanuel, Bourc'his Deborah

Mobile DNA, 2019-12-29


Similar Datasets

| S-EPMC4574606 | biostudies-literature
| S-EPMC3627586 | biostudies-literature
2018-06-08 | GSE107768 | GEO
| S-EPMC4048240 | biostudies-literature
| S-EPMC5685169 | biostudies-literature
| S-EPMC4312887 | biostudies-literature
2018-06-08 | GSE107767 | GEO
2018-06-08 | GSE107766 | GEO
| S-EPMC8480091 | biostudies-literature
| S-EPMC3965039 | biostudies-literature