Project description:Multiple myeloma is an incurable malignancy of plasma cells, and its pathogenesis is poorly understood. Here we report the massively parallel sequencing of 38 tumour genomes and their comparison to matched normal DNAs. Several new and unexpected oncogenic mechanisms were suggested by the pattern of somatic mutation across the data set. These include the mutation of genes involved in protein translation (seen in nearly half of the patients), genes involved in histone methylation, and genes involved in blood coagulation. In addition, a broader than anticipated role of NF-?B signalling was indicated by mutations in 11 members of the NF-?B pathway. Of potential immediate clinical relevance, activating mutations of the kinase BRAF were observed in 4% of patients, suggesting the evaluation of BRAF inhibitors in multiple myeloma clinical trials. These results indicate that cancer genome sequencing of large collections of samples will yield new insights into cancer not anticipated by existing knowledge.
Project description:Following the publication of this article [1], it was noted that due to a typesetting error the figure legends were paired incorrectly.
Project description:BACKGROUND: Next (second) generation sequencing is an increasingly important tool for many areas of molecular biology, however, care must be taken when interpreting its output. Even a low error rate can cause a large number of errors due to the high number of nucleotides being sequenced. Identifying sequencing errors from true biological variants is a challenging task. For organisms without a reference genome this difficulty is even more challenging. RESULTS: We have developed a method for the correction of sequencing errors in data from the Illumina Solexa sequencing platforms. It does not require a reference genome and is of relevance for microRNA studies, unsequenced genomes, variant detection in ultra-deep sequencing and even for RNA-Seq studies of organisms with sequenced genomes where RNA editing is being considered. CONCLUSIONS: The derived error model is novel in that it allows different error probabilities for each position along the read, in conjunction with different error rates depending on the particular nucleotides involved in the substitution, and does not force these effects to behave in a multiplicative manner. The model provides error rates which capture the complex effects and interactions of the three main known causes of sequencing error associated with the Illumina platforms.
Project description:Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.Using sequence data from a branch of the European ancestral tree as yet unsequenced, we identify variants that may be specific to this population. Through comparisons with HapMap and previous genetic association studies, we identified novel disease-associated variants, including a novel nonsense variant putatively associated with inflammatory bowel disease. We describe a novel method for improving SNP calling accuracy at low genome coverage using haplotype information. This analysis has implications for future re-sequencing studies and validates the imputation of Irish haplotypes using data from the current Human Genome Diversity Cell Line Panel (HGDP-CEPH). Finally, we identify gene duplication events as constituting significant targets of recent positive selection in the human lineage.Our findings show that there remains utility in generating whole genome sequences to illustrate both general principles and reveal specific instances of human biology. With increasing access to low cost sequencing we would predict that even armed with the resources of a small research group a number of similar initiatives geared towards answering specific biological questions will emerge.
Project description:Most human diseases have underlying genetic causes. To better understand the impact of genes on disease and its implications for medicine and public health, researchers have pursued methods for determining the sequences of individual genes, then all genes, and now complete human genomes. Massively parallel high-throughput sequencing technology, where DNA is sheared into smaller pieces, sequenced, and then computationally reordered and analyzed, enables fast and affordable sequencing of full human genomes. As the price of sequencing continues to decline, more and more individuals are having their genomes sequenced. This may facilitate better population-level disease subtyping and characterization, as well as individual-level diagnosis and personalized treatment and prevention plans. In this review, we describe several massively parallel high-throughput DNA sequencing technologies and their associated strengths, limitations, and error modes, with a focus on applications in epidemiologic research and precision medicine. We detail the methods used to computationally process and interpret sequence data to inform medical or preventative action.
Project description:The CP 96-1252 cultivar of sugarcane is a complex hybrid of commercial importance. DNA was extracted from lab-grown leaf tissue and sequenced. The raw Illumina DNA sequencing results provide 101 Gbp of genome sequence reads. The dataset is available from https://www.ncbi.nlm.nih.gov/bioproject/PRJNA345486/.
Project description:The COVID-19 pandemic has accounted for millions of infections and hundreds of thousand deaths worldwide in a short-time period. The patients demonstrate a great diversity in clinical and laboratory manifestations and disease severity. Nonetheless, little is known about the host genetic contribution to the observed interindividual phenotypic variability. Here, we report the first host genetic study in the Chinese population by deeply sequencing and analyzing 332 COVID-19 patients categorized by varying levels of severity from the Shenzhen Third People's Hospital. Upon a total of 22.2 million genetic variants, we conducted both single-variant and gene-based association tests among five severity groups including asymptomatic, mild, moderate, severe, and critical ill patients after the correction of potential confounding factors. Pedigree analysis suggested a potential monogenic effect of loss of function variants in GOLGA3 and DPP7 for critically ill and asymptomatic disease demonstration. Genome-wide association study suggests the most significant gene locus associated with severity were located in TMEM189-UBE2V1 that involved in the IL-1 signaling pathway. The p.Val197Met missense variant that affects the stability of the TMPRSS2 protein displays a decreasing allele frequency among the severe patients compared to the mild and the general population. We identified that the HLA-A*11:01, B*51:01, and C*14:02 alleles significantly predispose the worst outcome of the patients. This initial genomic study of Chinese patients provides genetic insights into the phenotypic difference among the COVID-19 patient groups and highlighted genes and variants that may help guide targeted efforts in containing the outbreak. Limitations and advantages of the study were also reviewed to guide future international efforts on elucidating the genetic architecture of host-pathogen interaction for COVID-19 and other infectious and complex diseases.
Project description:With the recent completion of a high-quality sequence of the human genome, the challenge is now to understand the functional elements that it encodes. Comparative genomic analysis offers a powerful approach for finding such elements by identifying sequences that have been highly conserved during evolution. Here, we propose an initial strategy for detecting such regions by generating low-redundancy sequence from a collection of 16 eutherian mammals, beyond the 7 for which genome sequence data are already available. We show that such sequence can be accurately aligned to the human genome and used to identify most of the highly conserved regions. Although not a long-term substitute for generating high-quality genomic sequences from many mammalian species, this strategy represents a practical initial approach for rapidly annotating the most evolutionarily conserved sequences in the human genome, providing a key resource for the systematic study of human genome function.
Project description:PurposeThe Australian Genomics Cardiovascular Disorders Flagship was a national multidisciplinary collaboration. It aimed to investigate the feasibility of genome sequencing (GS) and functional genomics to resolve variants of uncertain significance (VUS) in the clinical management of patients and families with cardiomyopathies, primary arrhythmias, and congenital heart disease (CHD).MethodsBetween April 2019 and December 2021, 600 probands meeting cardiovascular disorder criteria from 17 cardiology and genetics clinics across Australia were enrolled in the Flagship and underwent GS. The Flagship adopted a tiered approach to GS analysis. Tier 1 analysis assessed genes with established clinical validity for each cardiovascular condition. Tier 2 analysis assessed lesser-evidenced research-based genes. Tier 3 analysis assessed the functional impact of VUS that remained after tier 1 and tier 2 analysis.ResultsOverall, a pathogenic or likely pathogenic variant was identified in 41% of participants with a cardiomyopathy, 40% with an arrhythmia syndrome, and 15% with a familial CHD/CHD+Extra Cardiac Anomalies. A VUS outcome ranged from 13% for arrhythmias to 34% for CHD/CHD+Extra Cardiac Anomalies participants. Tier 2 research analysis identified a likely pathogenic/pathogenic variant for a further 15 participants and a VUS for an additional 15 participants.ConclusionThe Flagship successfully facilitated a model of care that harnesses clinical GS and functional genomics for the resolution of VUS in the clinical setting. This valuable data set can be used to inform clinical practice and facilitate research into the future.
Project description:Circulating metabolite levels may reflect the state of the human organism in health and disease, however, the genetic architecture of metabolites is not fully understood. We have performed a whole-genome sequencing association analysis of both common and rare variants in up to 11,840 multi-ethnic participants from five studies with up to 1666 circulating metabolites. We have discovered 1985 novel variant-metabolite associations, and validated 761 locus-metabolite associations reported previously. Seventy-nine novel variant-metabolite associations have been replicated, including three genetic loci located on the X chromosome that have demonstrated its involvement in metabolic regulation. Gene-based analysis have provided further support for seven metabolite-replicated loci pairs and their biologically plausible genes. Among those novel replicated variant-metabolite pairs, follow-up analyses have revealed that 26 metabolites have colocalized with 21 tissues, seven metabolite-disease outcome associations have been putatively causal, and 7 metabolites might be regulated by plasma protein levels. Our results have depicted the genetic contribution to circulating metabolite levels, providing additional insights into understanding human disease.