Project description:Multiple myeloma is an incurable malignancy of plasma cells, and its pathogenesis is poorly understood. Here we report the massively parallel sequencing of 38 tumour genomes and their comparison to matched normal DNAs. Several new and unexpected oncogenic mechanisms were suggested by the pattern of somatic mutation across the data set. These include the mutation of genes involved in protein translation (seen in nearly half of the patients), genes involved in histone methylation, and genes involved in blood coagulation. In addition, a broader than anticipated role of NF-κB signalling was indicated by mutations in 11 members of the NF-κB pathway. Of potential immediate clinical relevance, activating mutations of the kinase BRAF were observed in 4% of patients, suggesting the evaluation of BRAF inhibitors in multiple myeloma clinical trials. These results indicate that cancer genome sequencing of large collections of samples will yield new insights into cancer not anticipated by existing knowledge.
Project description:Following the publication of this article [1], it was noted that due to a typesetting error the figure legends were paired incorrectly.
Project description:BACKGROUND: Next (second) generation sequencing is an increasingly important tool for many areas of molecular biology; however, care must be taken when interpreting its output. Even a low error rate can cause a large number of errors due to the high number of nucleotides being sequenced. Distinguishing sequencing errors from true biological variants is a challenging task, and for organisms without a reference genome it is even more difficult. RESULTS: We have developed a method for the correction of sequencing errors in data from the Illumina Solexa sequencing platforms. It does not require a reference genome and is of relevance for microRNA studies, unsequenced genomes, variant detection in ultra-deep sequencing and even for RNA-Seq studies of organisms with sequenced genomes where RNA editing is being considered. CONCLUSIONS: The derived error model is novel in that it allows different error probabilities for each position along the read, in conjunction with different error rates depending on the particular nucleotides involved in the substitution, and does not force these effects to behave in a multiplicative manner. The model provides error rates which capture the complex effects and interactions of the three main known causes of sequencing error associated with the Illumina platforms.
Project description:Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence. Using sequence data from a branch of the European ancestral tree as yet unsequenced, we identify variants that may be specific to this population. Through comparisons with HapMap and previous genetic association studies, we identified novel disease-associated variants, including a novel nonsense variant putatively associated with inflammatory bowel disease. We describe a novel method for improving SNP calling accuracy at low genome coverage using haplotype information. This analysis has implications for future re-sequencing studies and validates the imputation of Irish haplotypes using data from the current Human Genome Diversity Cell Line Panel (HGDP-CEPH). Finally, we identify gene duplication events as constituting significant targets of recent positive selection in the human lineage. Our findings show that there remains utility in generating whole genome sequences to illustrate both general principles and reveal specific instances of human biology. With increasing access to low cost sequencing we would predict that even armed with the resources of a small research group a number of similar initiatives geared towards answering specific biological questions will emerge.
Project description:Most human diseases have underlying genetic causes. To better understand the impact of genes on disease and its implications for medicine and public health, researchers have pursued methods for determining the sequences of individual genes, then all genes, and now complete human genomes. Massively parallel high-throughput sequencing technology, where DNA is sheared into smaller pieces, sequenced, and then computationally reordered and analyzed, enables fast and affordable sequencing of full human genomes. As the price of sequencing continues to decline, more and more individuals are having their genomes sequenced. This may facilitate better population-level disease subtyping and characterization, as well as individual-level diagnosis and personalized treatment and prevention plans. In this review, we describe several massively parallel high-throughput DNA sequencing technologies and their associated strengths, limitations, and error modes, with a focus on applications in epidemiologic research and precision medicine. We detail the methods used to computationally process and interpret sequence data to inform medical or preventative action.
Project description:The COVID-19 pandemic has accounted for millions of infections and hundreds of thousands of deaths worldwide in a short time period. The patients demonstrate a great diversity in clinical and laboratory manifestations and disease severity. Nonetheless, little is known about the host genetic contribution to the observed interindividual phenotypic variability. Here, we report the first host genetic study in the Chinese population by deeply sequencing and analyzing 332 COVID-19 patients categorized by varying levels of severity from the Shenzhen Third People's Hospital. Using a total of 22.2 million genetic variants, we conducted both single-variant and gene-based association tests among five severity groups including asymptomatic, mild, moderate, severe, and critically ill patients, after correcting for potential confounding factors. Pedigree analysis suggested a potential monogenic effect of loss of function variants in GOLGA3 and DPP7 for critically ill and asymptomatic disease demonstration. The genome-wide association study suggests that the most significant gene locus associated with severity was located in TMEM189-UBE2V1, which is involved in the IL-1 signaling pathway. The p.Val197Met missense variant that affects the stability of the TMPRSS2 protein displays a decreasing allele frequency among the severe patients compared to the mild patients and the general population. We identified that the HLA-A*11:01, B*51:01, and C*14:02 alleles significantly predispose patients to the worst outcome. This initial genomic study of Chinese patients provides genetic insights into the phenotypic difference among the COVID-19 patient groups and highlights genes and variants that may help guide targeted efforts in containing the outbreak. Limitations and advantages of the study were also reviewed to guide future international efforts on elucidating the genetic architecture of host-pathogen interaction for COVID-19 and other infectious and complex diseases.
Project description:The CP 96-1252 cultivar of sugarcane is a complex hybrid of commercial importance. DNA was extracted from lab-grown leaf tissue and sequenced. The raw Illumina DNA sequencing results provide 101 Gbp of genome sequence reads. The dataset is available from https://www.ncbi.nlm.nih.gov/bioproject/PRJNA345486/.
Project description:With the recent completion of a high-quality sequence of the human genome, the challenge is now to understand the functional elements that it encodes. Comparative genomic analysis offers a powerful approach for finding such elements by identifying sequences that have been highly conserved during evolution. Here, we propose an initial strategy for detecting such regions by generating low-redundancy sequence from a collection of 16 eutherian mammals, beyond the 7 for which genome sequence data are already available. We show that such sequence can be accurately aligned to the human genome and used to identify most of the highly conserved regions. Although not a long-term substitute for generating high-quality genomic sequences from many mammalian species, this strategy represents a practical initial approach for rapidly annotating the most evolutionarily conserved sequences in the human genome, providing a key resource for the systematic study of human genome function.
Project description:While whole genome sequencing (WGS) of cell-free DNA (cfDNA) holds enormous promise for detection of molecular residual disease (MRD), its performance is limited by WGS error rate. Here we introduce AccuScan, an efficient cfDNA WGS technology that enables genome-wide error correction at single read-level, achieving an error rate of 4.2 × 10⁻⁷, which is about two orders of magnitude lower than a read-centric de-noising method. The application of AccuScan to MRD demonstrated analytical sensitivity down to 10⁻⁶ circulating variant allele frequency at 99% sample-level specificity. AccuScan showed 90% landmark sensitivity (within 6 weeks after surgery) and 100% specificity for predicting relapse in colorectal cancer. It also showed 67% sensitivity and 100% specificity in esophageal cancer using samples collected within one week after surgery. When AccuScan was applied to monitor immunotherapy in melanoma patients, the circulating tumor DNA (ctDNA) levels and dynamic profiles were consistent with clinical outcomes. Overall, AccuScan provides a highly accurate WGS solution for MRD detection, empowering ctDNA detection at parts per million range without requiring high sample input or personalized reagents.
Project description:Understanding the virulence mechanisms of human pathogens from the genus Fusobacterium has been hindered by a lack of properly assembled and annotated genomes. Here we report the first complete genomes for seven Fusobacterium strains, as well as resequencing of the reference strain Fusobacterium nucleatum subsp. nucleatum ATCC 25586 (total of seven species; total of eight genomes). A highly efficient and cost-effective sequencing pipeline was achieved using sample multiplexing for short-read Illumina (150 bp) and long-read Oxford Nanopore MinION (>80 kbp) platforms, coupled with genome assembly using the open-source software Unicycler. Compared to currently available draft assemblies (previously 24 to 67 contigs), these genomes are highly accurate and consist of only one complete chromosome. We present the complete genome sequence of F. nucleatum subsp. nucleatum ATCC 23726, a genetically tractable and biomedically important strain and, in addition, reveal that the previous F. nucleatum subsp. nucleatum ATCC 25586 genome assembly contains a 452-kb genomic inversion that has been corrected using our sequencing and assembly pipeline. To enable genomic analyses by the scientific community, we concurrently used these genomes to launch FusoPortal, a repository of interactive and downloadable genomic data, genome maps, gene annotations, and protein functional analyses and classifications. In summary, this report provides detailed methods for accurately sequencing, assembling, and annotating Fusobacterium genomes, while focusing on using open-source software to foster the availability of reproducible and open data. This resource will enhance efforts to properly identify virulence proteins that may contribute to a repertoire of diseases that includes periodontitis, preterm birth, and colorectal cancer. IMPORTANCE: Fusobacterium spp. are Gram-negative, oral bacteria that are increasingly associated with human pathologies as diverse as periodontitis, preterm birth, and colorectal cancer. While a recent surge in F. nucleatum research has increased our understanding of this human pathogen, a lack of complete genomes has hindered the identification and characterization of associated host-pathogen virulence factors. Here we report the first eight complete Fusobacterium genomes sequenced using an Oxford Nanopore MinION and Illumina sequencing pipeline and assembled using the open-source program Unicycler. These genomes are highly accurate, and seven of the genomes represent the first complete sequences for each strain. In summary, the FusoPortal resource provides a publicly available resource that will guide future genetic, bioinformatic, and biochemical experiments to characterize this genus of emerging human pathogens.