Project description:We have developed a solution-based method for targeted DNA capture-sequencing that is directed to the complete human exome. This approach allows the discovery of greater than 95% of all expected heterozygous single base variants, requires as little as 3 Gbp of raw sequence data and constitutes an effective tool for identifying rare coding alleles in large scale genomic studies.
Project description:BACKGROUND:Targeted capture of genomic regions reduces sequencing cost while generating higher coverage by allowing biomedical researchers to focus on specific loci of interest, such as exons. Targeted capture also has the potential to facilitate the generation of genomic data from DNA collected via saliva or buccal cells. DNA samples derived from these cell types tend to have a lower human DNA yield, may be degraded from age and/or have contamination from bacteria or other ambient oral microbiota. However, thousands of samples have been previously collected from these cell types, and saliva collection has the advantage that it is non-invasive and appropriate for a wide variety of research. RESULTS:We demonstrate successful enrichment and sequencing of 15 South African KhoeSan exomes and 2 full genomes with samples initially derived from saliva. The expanded exome dataset enables us to characterize genetic diversity free from ascertainment bias for multiple KhoeSan populations, including new exome data from six HGDP Namibian San, revealing substantial population structure across the Kalahari Desert region. Additionally, we discover and independently verify thirty-one previously unknown KIR alleles using methods we developed to accurately map and call the highly polymorphic HLA and KIR loci from exome capture data. Finally, we show that exome capture of saliva-derived DNA yields sufficient non-human sequences to characterize oral microbial communities, including detection of bacteria linked to oral disease (e.g. Prevotella melaninogenica). For comparison, two samples were sequenced using standard full genome library preparation without exome capture and we found no systematic bias of metagenomic information between exome-captured and non-captured data.
CONCLUSIONS:DNA from human saliva samples, collected and extracted using standard procedures, can be used to successfully sequence high quality human exomes, and metagenomic data can be derived from non-human reads. We find that individuals from the Kalahari carry a higher oral pathogenic microbial load than samples surveyed in the Human Microbiome Project. Additionally, rare variants present in the exomes suggest strong population structure across different KhoeSan populations.
Project description:Mapping-by-sequencing data for HvRAW1. Exome sequencing was done for two phenotypic bulks, each comprising 180 rough- or smooth-awned recombinants of the Morex x Barke RIL F8 population.
Project description:We propose a targeted re-sequencing simulator Wessim that generates synthetic exome sequencing reads from a given sample genome. Wessim emulates conventional exome capture technologies, including Agilent's SureSelect and NimbleGen's SeqCap, to generate DNA fragments from genomic target regions. The target regions can be either specified by genomic coordinates or inferred from in silico probe hybridization. Coupled with existing next-generation sequencing simulators, Wessim generates realistic artificial exome sequencing data, which is essential for developing and evaluating exome-targeted variant callers. Source code and the packaged version of Wessim with manuals are available at http://sak042.github.com/Wessim/. Supplementary data are available at Bioinformatics online.
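The coordinate-based mode described above can be illustrated with a minimal Python sketch. It is not Wessim's code or API: the function name `sample_fragments`, the fixed fragment length, the length-weighted choice of target, and the 0-based half-open coordinates are all assumptions made for illustration, and probe hybridization is not modelled.

```python
import random

def sample_fragments(genome, targets, n_fragments, frag_len=200, seed=0):
    """Draw fragments overlapping target regions (hypothetical helper, not Wessim's API).

    genome  : dict mapping chromosome name -> sequence string
    targets : list of (chrom, start, end) tuples, 0-based, end exclusive
    """
    rng = random.Random(seed)
    # Assumption: longer targets attract proportionally more fragments.
    weights = [end - start for _, start, end in targets]
    fragments = []
    for _ in range(n_fragments):
        chrom, start, end = rng.choices(targets, weights=weights, k=1)[0]
        seq = genome[chrom]
        # Pick a base inside the target, then place the fragment so it covers that base.
        pos = rng.randrange(start, end)
        offset = rng.randrange(frag_len)
        # Clamp the fragment start so the fragment stays inside the chromosome.
        frag_start = max(0, min(pos - offset, len(seq) - frag_len))
        fragments.append((chrom, frag_start, seq[frag_start:frag_start + frag_len]))
    return fragments

if __name__ == "__main__":
    toy_genome = {"chr1": "ACGT" * 250}
    toy_targets = [("chr1", 100, 220), ("chr1", 600, 700)]
    for chrom, start, seq in sample_fragments(toy_genome, toy_targets, 3, frag_len=50):
        print(chrom, start, seq)
```

Fragments produced this way could then be passed to a read simulator to attach sequencing errors and base qualities, which is the coupling the description refers to.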
Project description:Previous studies in bulk tissue suggest that there are abundant expression quantitative trait loci (eQTLs) in human brain. This sample series is of cerebellar Purkinje cells isolated using laser capture microdissection from human cases without neurological disease but of known genotypes. These data may be helpful in confirming eQTLs in bulk tissue or in mapping other gene expression traits in an enriched neuronal population. Authorized Access data: Mapping of GEO sample accessions to dbGaP subject/sample IDs is available through dbGaP Authorized Access, see http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000249
Project description:We present an open-source algorithm, Scalpel (http://scalpel.sourceforge.net/), which combines mapping and assembly for sensitive and specific discovery of insertions and deletions (indels) in exome-capture data. A detailed repeat analysis coupled with a self-tuning k-mer strategy allows Scalpel to outperform other state-of-the-art approaches for indel discovery, particularly in regions containing near-perfect repeats. We analyzed 593 families from the Simons Simplex Collection and demonstrated Scalpel's power to detect long (≥30 bp) transmitted events and enrichment for de novo likely gene-disrupting indels in autistic children.
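One plausible reading of a self-tuning k-mer strategy is to increase k until a region contains no repeated k-mer. The Python sketch below illustrates only that idea. It is not taken from Scalpel's source: the function name `smallest_repeat_free_k` and its parameters are hypothetical, and it checks exact repeats only, whereas the description above refers to near-perfect repeats.

```python
def smallest_repeat_free_k(sequence, k_min=5, k_max=None):
    """Return the smallest k in [k_min, k_max] such that no k-mer occurs
    more than once in `sequence`, or None if no such k exists.

    Hypothetical helper: exact repeats only, no mismatches allowed.
    """
    if k_max is None:
        k_max = len(sequence)
    for k in range(k_min, k_max + 1):
        seen = set()
        repeated = False
        # Slide a window of width k over the sequence and look for duplicates.
        for i in range(len(sequence) - k + 1):
            kmer = sequence[i:i + k]
            if kmer in seen:
                repeated = True
                break
            seen.add(kmer)
        if not repeated:
            return k
    return None

if __name__ == "__main__":
    # "ACGT" occurs twice, so k = 3 and k = 4 both contain a repeated k-mer.
    print(smallest_repeat_free_k("ACGTACGTTT", k_min=3))
```

A larger k resolves more repeats but needs longer error-free overlaps between reads, so starting small and increasing k only when a repeat is found is a natural trade-off.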
Project description:BACKGROUND: Human exome resequencing using commercial target capture kits has been and is being used for sequencing large numbers of individuals to search for variants associated with various human diseases. We rigorously evaluated the capabilities of two solution exome capture kits. These analyses help clarify the strengths and limitations of those data as well as systematically identify variables that should be considered in the use of those data. RESULTS: Each exome kit performed well at capturing the targets it was designed to capture, which mainly correspond to the consensus coding sequences (CCDS) annotations of the human genome. In addition, based on their respective targets, each capture kit coupled with high coverage Illumina sequencing produced highly accurate nucleotide calls. However, other databases, such as the Reference Sequence collection (RefSeq), define the exome more broadly, and so, not surprisingly, the exome kits did not capture these additional regions. CONCLUSIONS: Commercial exome capture kits provide a very efficient way to sequence select areas of the genome at very high accuracy. Here we provide the data to help guide critical analyses of sequencing data derived from these products.
Project description:DNA was isolated from Apcmin/+;KrasLSL-G12D/+;VillinCre;Lgr5DTReGFP (AKVL), Apcmin/+;KrasLSL-G12D/+;VillinCre;Lgr5DTReGFP;p53KO (AKVPL) and Apcmin/+;KrasLSL-G12D/+;VillinCre;Lgr5DTReGFP;p53KO,Smad4KO (AKVPSL) organoids as well as the spleen of the AKVL donor animal. The "SAMPLE_ID" sample characteristic is a sample identifier internal to Genentech.
Project description:Previous studies in bulk tissue suggest that there are abundant expression quantitative trait loci (eQTLs) in human brain. This sample series is of cerebellar Purkinje cells isolated using laser capture microdissection from human cases without neurological disease but of known genotypes. These data may be helpful in confirming eQTLs in bulk tissue or in mapping other gene expression traits in an enriched neuronal population. Authorized Access data: Mapping of GEO sample accessions to dbGaP subject/sample IDs is available through dbGaP Authorized Access, see http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000249 The aim of this study was to examine gene expression in isolated Purkinje cells from the human cerebellum. We obtained frozen brain tissue from the cerebellum. We stained sections with cresyl violet and separated Purkinje cells based on morphology and location within the cerebellum using laser capture microdissection. Expression analyses were then performed.