Project description:Balanced chromosome rearrangements (BCRs) can cause genetic diseases by disrupting or inactivating specific genes, and the characterization of breakpoints in disease-associated BCRs has been instrumental in the molecular elucidation of a wide variety of genetic disorders. However, mapping chromosome breakpoints using traditional methods, such as in situ hybridization with fluorescent dye-labeled bacterial artificial chromosome clones (BAC-FISH), is rather laborious and time-consuming. In addition, the resolution of BAC-FISH is often insufficient to unequivocally identify the disrupted gene. To overcome these limitations, we have performed shotgun sequencing of flow-sorted derivative chromosomes using "next-generation" (Illumina/Solexa) multiplex sequencing-by-synthesis technology. As shown here for three different disease-associated BCRs, the coverage attained by this platform is sufficient to bridge the breakpoints by PCR amplification, and this procedure allows the determination of their exact nucleotide positions within a few weeks. Its implementation will greatly facilitate large-scale breakpoint mapping and gene finding in patients with disease-associated balanced translocations.
Project description:NGPS is a method for de-novo, full-length protein sequencing in high throughput. The method is based on cleavage of the protein at semi-random sites by microwave-assisted acid hydrolysis (MAAH), enrichment of LC-MS/MS amenable peptides from the hydrolysate by solid-phase-extraction, LC-MS/MS analysis, de-novo long peptide tag sequencing of resulting peptides and assembly of peptide tags into consensus contigs.
Project description:High-throughput next-generation sequencing technologies pose increasing demands on the efficiency, accuracy and usability of data analysis software. In this article, we present ZOOM Lite, a software package for efficient read mapping and result visualization. With a kernel capable of mapping tens of millions of Illumina or AB SOLiD sequencing reads efficiently and accurately, and an intuitive graphical user interface, ZOOM Lite integrates read mapping and result visualization into an easy-to-use pipeline on a desktop PC. The software handles both single-end and paired-end reads, and can output either the unique mapping result or the top N mapping results for each read. Additionally, the software takes a variety of input file formats and outputs to several commonly used result formats. The software is freely available at http://bioinfor.com/zoom/lite/.
Project description:Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost-effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies are associated with substantial statistical uncertainty because of varying coverage and high error rates. We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data. Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
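The idea of integrating over genotype uncertainty, rather than calling genotypes first, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single biallelic site, Hardy-Weinberg genotype priors, and a simple EM update, and the function name and input layout are hypothetical.

```python
def estimate_allele_frequency(genotype_likelihoods, start=0.2, max_iter=200, tol=1e-10):
    """Estimate the frequency of one allele at a biallelic site by EM.

    genotype_likelihoods: one (l0, l1, l2) tuple per individual, where lg is
    P(reads | individual carries g copies of the allele). Assumption for this
    sketch: genotypes follow Hardy-Weinberg proportions.
    """
    freq = start
    n = len(genotype_likelihoods)
    for _ in range(max_iter):
        expected_copies = 0.0
        for l0, l1, l2 in genotype_likelihoods:
            # Unnormalised posterior of each genotype: HWE prior times likelihood.
            p0 = (1.0 - freq) ** 2 * l0
            p1 = 2.0 * freq * (1.0 - freq) * l1
            p2 = freq ** 2 * l2
            total = p0 + p1 + p2
            if total == 0.0:
                raise ValueError("likelihoods incompatible with current frequency")
            # Expected allele copies for this individual, summed over all genotypes.
            expected_copies += (p1 + 2.0 * p2) / total
        new_freq = expected_copies / (2.0 * n)
        converged = abs(new_freq - freq) < tol
        freq = new_freq
        if converged:
            break
    return freq
```

When every individual's likelihoods point to a single genotype, the estimate reduces to plain allele counting; with low coverage, each individual instead contributes a fractional expected count.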
Project description:Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information for various aspects of cellular activities and biological functions. A key step in NGS applications (e.g., RNA-Seq) is to map short reads to correct genomic locations within the source genome. While most reads are mapped to a unique location, a significant proportion of reads align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping the multireads may lead to bias in downstream analyses. Currently, most practitioners discard the multireads in their analysis, resulting in a loss of valuable information, especially for the genes with similar sequences. To refine the read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. The probabilities are used for downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA-Seq analysis of real-life data that the Bayesian method yields better mapping than the current leading methods. We provide a C++ program for download, which is being packaged into user-friendly software.
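A posterior over competing locations can be sketched with Bayes' rule. The abstract does not specify the model, so the following is only an illustration under stated assumptions: the prior weight of each location is supplied by the caller (for example, the number of uniquely mapped reads nearby), and the likelihood depends only on the mismatch count under an independent per-base error rate. The function name, default error rate and default read length are hypothetical.

```python
def multiread_posteriors(mismatches, prior_weights, error_rate=0.01, read_length=36):
    """Posterior probability that a multiread originates from each candidate location.

    mismatches:    mismatch count of the read at each candidate location
    prior_weights: non-negative prior weight per location (assumed input)
    """
    if len(mismatches) != len(prior_weights):
        raise ValueError("one prior weight is required per candidate location")
    # Likelihood of the read given a location: independent per-base errors.
    likelihoods = [
        error_rate ** m * (1.0 - error_rate) ** (read_length - m) for m in mismatches
    ]
    unnormalised = [w * l for w, l in zip(prior_weights, likelihoods)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("all candidate locations have zero posterior mass")
    return [u / total for u in unnormalised]
```

For example, `multiread_posteriors([1, 1], [3.0, 1.0])` splits the read three to one in favour of the first location, because the mismatch counts are equal and only the prior differs. Downstream quantification could then add these fractional weights to each gene instead of discarding the read.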
Project description:We present a genome-wide approach to map DNA double-strand breaks (DSBs) at nucleotide resolution by a method we termed BLESS (direct in situ breaks labeling, enrichment on streptavidin and next-generation sequencing). We validated and tested BLESS using human and mouse cells and different DSBs-inducing agents and sequencing platforms. BLESS was able to detect telomere ends, Sce endonuclease-induced DSBs and complex genome-wide DSB landscapes. As a proof of principle, we characterized the genomic landscape of sensitivity to replication stress in human cells, and we identified >2,000 nonuniformly distributed aphidicolin-sensitive regions (ASRs) overrepresented in genes and enriched in satellite repeats. ASRs were also enriched in regions rearranged in human cancers, with many cancer-associated genes exhibiting high sensitivity to replication stress. Our method is suitable for genome-wide mapping of DSBs in various cells and experimental conditions, with a specificity and resolution unachievable by current techniques.
Project description:Balanced chromosome rearrangements (BCRs) can cause genetic diseases by disrupting or inactivating specific genes, and the characterisation of breakpoints in disease-associated BCRs has been instrumental in the molecular elucidation of a wide variety of genetic disorders. However, mapping chromosome breakpoints using traditional methods, such as in situ hybridization with fluorescent dye-labeled bacterial artificial chromosome clones (BAC-FISH), is rather laborious and time-consuming. In addition, the resolution of BAC-FISH is often insufficient to unequivocally identify the disrupted gene. To overcome these limitations, we have performed shotgun sequencing of flow-sorted derivative chromosomes using "next generation" (Solexa/Illumina) multiplex sequencing-by-synthesis technology. As shown here for three different disease-associated BCRs, the coverage attained by this platform is sufficient to bridge the breakpoints by PCR amplification, and this procedure allows their exact nucleotide positions to be determined within a few weeks. Its implementation will greatly facilitate large-scale breakpoint mapping and gene finding in patients with disease-associated balanced translocations. Array CGH was performed in three carriers of balanced translocations to exclude DNA copy number changes.
Project description:Next-generation DNA sequencing (NGS) produces vast amounts of DNA sequence data, but it is not specifically designed to generate data suitable for genetic mapping. Recently developed DNA library preparation methods for NGS have helped solve this problem, however, by combining the use of reduced representation libraries with DNA sample barcoding to generate genome-wide genotype data from a common set of genetic markers across a large number of samples. Here we use such a method, called genotyping-by-sequencing (GBS), to produce a data set for genetic mapping in an F1 population of apples (Malus × domestica) segregating for skin color. We show that GBS produces a relatively large, but extremely sparse, genotype matrix: over 270,000 SNPs were discovered but most SNPs have too much missing data across samples to be useful for genetic mapping. After filtering for genotype quality and missing data, only 6% of the 85 million DNA sequence reads contributed to useful genotype calls. Despite this limitation, using existing software and a set of simple heuristics, we generated a final genotype matrix containing 3967 SNPs from 89 DNA samples from a single lane of Illumina HiSeq and used it to create a saturated genetic linkage map and to identify a known QTL underlying apple skin color. We therefore demonstrate that GBS is a cost-effective method for generating genome-wide SNP data suitable for genetic mapping in a highly diverse and heterozygous agricultural species. We anticipate future improvements to the GBS analysis pipeline presented here that will enhance the utility of next-generation DNA sequence data for the purposes of genetic mapping across diverse species.
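The missing-data filtering step described above can be sketched as follows. This is an illustration only: the abstract does not state the thresholds or heuristics used, so the 20% cutoff, the function name and the dictionary layout are hypothetical.

```python
def filter_snps_by_missingness(genotypes, max_missing_fraction=0.2):
    """Keep SNPs whose fraction of missing calls is at most the threshold.

    genotypes: dict mapping SNP id -> list of per-sample calls, where None
    marks a missing genotype. The threshold is an assumed value.
    """
    kept = {}
    for snp_id, calls in genotypes.items():
        if not calls:
            continue  # no samples recorded for this SNP
        missing = sum(1 for call in calls if call is None)
        if missing / len(calls) <= max_missing_fraction:
            kept[snp_id] = calls
    return kept
```

In a sparse GBS matrix, most SNPs would fail such a filter, which is consistent with only a small share of reads contributing to the final genotype calls.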
Project description:BACKGROUND: Compared to classical genotyping, targeted next-generation sequencing (tNGS) can be custom-designed to interrogate entire genomic regions of interest, in order to detect novel as well as known variants. To bring down the per-sample cost, one approach is to pool barcoded NGS libraries before sample enrichment. Still, we lack a complete understanding of how this multiplexed tNGS approach and the varying performance of the ever-evolving analytical tools can affect the quality of variant discovery. Therefore, we evaluated the impact of different software tools and analytical approaches on the discovery of single nucleotide polymorphisms (SNPs) in multiplexed tNGS data. To generate our own test model, we combined a sequence capture method with NGS in three experimental stages of increasing complexity (E. coli genes, multiplexed E. coli, and multiplexed HapMap BRCA1/2 regions). RESULTS: We successfully enriched barcoded NGS libraries instead of genomic DNA, achieving reproducible coverage profiles (Pearson correlation coefficients of up to 0.99) across multiplexed samples, with <10% strand bias. However, the SNP calling quality was substantially affected by the choice of tools and mapping strategy. With the aim of reducing computational requirements, we compared conventional whole-genome mapping and SNP-calling with a new faster approach: target-region mapping with subsequent 'read-backmapping' to the whole genome to reduce the false detection rate. Consequently, we developed a combined mapping pipeline, which includes standard tools (BWA, SAMtools, etc.), and tested it on public HiSeq2000 exome data from the 1000 Genomes Project. Our pipeline saved 12 hours of run time per HiSeq2000 exome sample and detected ~5% more SNPs than the conventional whole-genome approach. This suggests that more potential novel SNPs may be discovered using both approaches than with just the conventional approach.
CONCLUSIONS: We recommend applying our general 'two-step' mapping approach for more efficient SNP discovery in tNGS. Our study has also shown the benefit of computing inter-sample SNP-concordances and inspecting read alignments in order to attain more confident results.
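Inter-sample SNP concordance can be computed in several ways, and the abstract does not define the measure used. The sketch below assumes one simple definition: among sites called in both samples, the fraction where the called alleles agree. The function name, data layout and example coordinates are hypothetical.

```python
def snp_concordance(calls_a, calls_b):
    """Fraction of shared SNP sites at which two samples have the same call.

    calls_a, calls_b: dict mapping (chromosome, position) -> called allele.
    Returns None when the samples share no called site.
    """
    shared_sites = set(calls_a) & set(calls_b)
    if not shared_sites:
        return None
    agreeing = sum(1 for site in shared_sites if calls_a[site] == calls_b[site])
    return agreeing / len(shared_sites)
```

A low concordance between samples that are expected to agree would be a cue to inspect the underlying read alignments at the discordant sites.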
Project description:Motivation: The advent of next-generation sequencing technologies has increased the accuracy and quantity of sequence data, opening the door to greater opportunities in genomic research. Results: In this article, we present GNUMAP (Genomic Next-generation Universal MAPper), a program capable of overcoming two major obstacles in the mapping of reads from next-generation sequencing runs. First, we have created an algorithm that probabilistically maps reads to repeat regions in the genome on a quantitative basis. Second, we have developed a probabilistic Needleman-Wunsch algorithm which utilizes _prb.txt and _int.txt files produced in the Solexa/Illumina pipeline to improve the mapping accuracy for lower quality reads and increase the amount of usable data produced in a given experiment. Availability: The source code for the software can be downloaded from http://dna.cs.byu.edu/gnumap.
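One way to make Needleman-Wunsch alignment probabilistic is to score each read position by its expected match score under per-base call probabilities instead of a single called base. The sketch below illustrates that idea only; it is not GNUMAP's algorithm, it does not parse _prb.txt or _int.txt files, and the scoring parameters and function name are hypothetical.

```python
def probabilistic_nw_score(read_probs, reference, match=1.0, mismatch=-1.0, gap=-2.0):
    """Global alignment score of a probabilistic read against a reference string.

    read_probs: one dict per read position mapping base -> call probability,
                e.g. {"A": 0.9, "C": 0.1} (assumed input format).
    """
    n, m = len(read_probs), len(reference)
    # dp[i][j]: best score aligning the first i read positions to the first j reference bases.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Probability that the true read base equals the reference base.
            p_match = read_probs[i - 1].get(reference[j - 1], 0.0)
            expected = p_match * match + (1.0 - p_match) * mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + expected,  # align read position to reference base
                dp[i - 1][j] + gap,           # gap in the reference
                dp[i][j - 1] + gap,           # gap in the read
            )
    return dp[n][m]
```

With this scoring, a low-quality position that splits its probability between two bases is penalised less than a confident mismatch, so lower-quality reads can still contribute to the alignment.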