Dataset Information

Detection and characterization of novel sequence insertions using paired-end next-generation sequencing.


ABSTRACT: In the past few years, human genome structural variation discovery has enjoyed increased attention from the genomics research community. Many studies were published to characterize short insertions, deletions, duplications and inversions, and to associate copy number variants (CNVs) with disease. Detection of new sequence insertions requires sequence data; however, the 'detectable' sequence length with read-pair analysis is limited by the insert size. Thus, longer sequence insertions that contribute to our genetic makeup are not extensively researched. We present NovelSeq: a computational framework to discover the content and location of long novel sequence insertions using paired-end sequencing data generated by next-generation sequencing platforms. Our framework can be built as part of a general sequence analysis pipeline to discover multiple types of genetic variation (SNPs, structural variation, etc.); thus, it requires significantly fewer computational resources than de novo sequence assembly. We apply our methods to detect novel sequence insertions in the genome of an anonymous donor and validate our results by comparing them with the insertions discovered in the same genome using various sources of sequence data. The implementation of the NovelSeq pipeline is available at http://compbio.cs.sfu.ca/strvar.htm. Contact: eee@gs.washington.edu; cenk@cs.sfu.ca

SUBMITTER: Hajirasouliha I 

PROVIDER: S-EPMC2865866 | biostudies-literature | 2010 May

REPOSITORIES: biostudies-literature

Publications

Detection and characterization of novel sequence insertions using paired-end next-generation sequencing.

Hajirasouliha I, Hormozdiari F, Alkan C, Kidd JM, Birol I, Eichler EE, Sahinalp SC

Bioinformatics (Oxford, England), 2010-04-12, issue 10


Similar Datasets

| S-EPMC2987311 | biostudies-literature
| S-EPMC5217656 | biostudies-literature
| S-EPMC4074385 | biostudies-literature
| S-EPMC6070977 | biostudies-literature
| S-EPMC8457494 | biostudies-literature
| S-EPMC7118314 | biostudies-literature
| S-EPMC4009769 | biostudies-literature
| S-EPMC3039614 | biostudies-literature
| S-EPMC4498532 | biostudies-literature
| S-EPMC2784330 | biostudies-literature