
Dataset Information


Discovery and characterization of Alu repeat sequences via precise local read assembly.


ABSTRACT: Alu insertions have contributed to >11% of the human genome and ∼30-35 Alu subfamilies remain actively mobile, yet the characterization of polymorphic Alu insertions from short-read data remains a challenge. We build on existing computational methods to combine Alu detection and de novo assembly of WGS data as a means to reconstruct the full sequence of insertion events from Illumina paired end reads. Comparison with published calls obtained using PacBio long-reads indicates a false discovery rate below 5%, at the cost of reduced sensitivity due to the colocation of reference and non-reference repeats. We generate a highly accurate call set of 1614 completely assembled Alu variants from 53 samples from the Human Genome Diversity Project (HGDP) panel. We utilize the reconstructed alternative insertion haplotypes to genotype 1010 fully assembled insertions, obtaining >99% agreement with genotypes obtained by PCR. In our assembled sequences, we find evidence of premature insertion mechanisms and observe 5' truncation in 16% of AluYa5 and AluYb8 insertions. The sites of truncation coincide with stem-loop structures and SRP9/14 binding sites in the Alu RNA, implicating L1 ORF2p pausing in the generation of 5' truncations. Additionally, we identified variable AluJ and AluS elements that likely arose due to non-retrotransposition mechanisms.
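The abstract states that the reconstructed alternative insertion haplotypes are used to genotype insertions, but it does not give the statistical model. As a hedged illustration only, the sketch below shows a common binomial read-count model for a biallelic site. It assumes each read has already been aligned to both the reference haplotype and the assembled insertion haplotype and assigned to the better match; the function names and the 1% assignment error rate are hypothetical and are not taken from the authors' pipeline.

```python
import math


def genotype_likelihoods(ref_reads: int, alt_reads: int, error: float = 0.01) -> dict:
    """Log10 likelihoods of the three diploid genotypes at one insertion site.

    Hypothetical model: reads are pre-assigned to the reference (ref) or the
    assembled insertion (alt) haplotype, with a symmetric assignment error.
    The binomial coefficient is omitted because it is identical for all genotypes.
    """
    if not 0.0 < error < 0.5:
        raise ValueError("error must be in (0, 0.5)")
    # Probability that one read is assigned to the insertion haplotype,
    # given 0, 1 or 2 copies of the insertion.
    p_alt = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    return {
        gt: alt_reads * math.log10(p) + ref_reads * math.log10(1.0 - p)
        for gt, p in p_alt.items()
    }


def call_genotype(ref_reads: int, alt_reads: int, error: float = 0.01) -> str:
    """Return the genotype with the highest likelihood (no prior applied)."""
    lls = genotype_likelihoods(ref_reads, alt_reads, error)
    return max(lls, key=lls.get)


if __name__ == "__main__":
    print(call_genotype(12, 0))  # only reference-supporting reads
    print(call_genotype(6, 5))   # roughly balanced support
    print(call_genotype(0, 10))  # only insertion-supporting reads
```

With these counts the calls are `0/0`, `0/1` and `1/1` respectively. A production genotyper would also add genotype priors, handle reads that match both haplotypes equally, and report genotype qualities; those steps are left out to keep the sketch short.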

SUBMITTER: Wildschutte JH 

PROVIDER: S-EPMC4666360 | biostudies-literature | 2015 Dec

REPOSITORIES: biostudies-literature


Publications

Discovery and characterization of Alu repeat sequences via precise local read assembly.

Wildschutte Julia H, Baron Alayna, Diroff Nicolette M, Kidd Jeffrey M

Nucleic Acids Research, 2015 Oct 25; issue 21



Similar Datasets

| S-EPMC3106317 | biostudies-literature
| S-EPMC5411769 | biostudies-literature
| S-EPMC5411767 | biostudies-literature
| S-EPMC10699202 | biostudies-literature
| S-EPMC7274563 | biostudies-literature
| S-EPMC8068675 | biostudies-literature
| S-EPMC8361843 | biostudies-literature
| S-EPMC8743549 | biostudies-literature
| S-EPMC6465186 | biostudies-literature
| S-EPMC8706387 | biostudies-literature