Project description:Single cell RNA sequencing has enabled unprecedented insights into the molecular cues and cellular heterogeneity underlying human disease. However, the high costs and complexity of single cell methods remain a major obstacle for generating large scale human cohorts. Here we compare current state-of-the-art single cell multiplexing technologies, and provide a new widely applicable demultiplexing method, SNP-Fishing, that enables simple, robust high-throughput multiplexing leveraging genetic variability of patients.
Project description:Here, we introduce demuxlet, an in silico algorithm that harnesses naturally occurring genetic variation in a pool of cells from unrelated individuals to determine the sample identity of each cell and to identify droplets containing cells from two different individuals (doublets). These two capabilities enable a simple multiplexing design that increases the throughput of single cell library construction: cells from genetically diverse samples are multiplexed and captured at 2-10x over standard workflows. We further demonstrate the utility of sample multiplexing by characterizing the interindividual variability in cell type-specific responses of ~15k PBMCs to interferon-beta, a potent cytokine. Our computational tool enables sample multiplexing of droplet-based single cell RNA-seq for large-scale studies of population variation and could be extended to other single cell datasets that incorporate natural or synthetic DNA barcodes.
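The idea of assigning a droplet to a donor from its SNP reads can be illustrated with a toy likelihood model. This is a minimal sketch under stated assumptions, not demuxlet's actual implementation: the function names, the fixed sequencing error rate, the 50/50 doublet mixture, and the example genotypes and read counts are all hypothetical and chosen for illustration only.

```python
import math
from itertools import combinations

def alt_fraction(dosage, error=0.01):
    """Probability of reading the ALT allele given ALT dosage 0, 1 or 2 (assumed error rate)."""
    p = dosage / 2.0
    return p * (1 - error) + (1 - p) * error

def log_likelihood(counts, alt_probs):
    """Binomial log-likelihood (constant term dropped) of per-SNP (ref, alt) read counts."""
    return sum(ref * math.log(1 - p) + alt * math.log(p)
               for (ref, alt), p in zip(counts, alt_probs))

def assign_droplet(counts, genotypes, error=0.01):
    """Return the best-scoring label: (donor,) for a singlet, (donor, donor) for a doublet."""
    names = sorted(genotypes)
    scores = {}
    for name in names:
        probs = [alt_fraction(d, error) for d in genotypes[name]]
        scores[(name,)] = log_likelihood(counts, probs)
    for a, b in combinations(names, 2):
        # Toy doublet model: both donors contribute reads in equal proportion.
        probs = [0.5 * (alt_fraction(da, error) + alt_fraction(db, error))
                 for da, db in zip(genotypes[a], genotypes[b])]
        scores[(a, b)] = log_likelihood(counts, probs)
    return max(scores, key=scores.get)

# Hypothetical ALT-allele dosages at four SNPs for two unrelated donors.
genotypes = {"donor_A": [0, 2, 1, 0], "donor_B": [2, 0, 1, 2]}

# Hypothetical (ref, alt) read counts per SNP for two droplets.
singlet_counts = [(5, 0), (0, 4), (2, 2), (6, 0)]
doublet_counts = [(3, 3), (2, 3), (2, 2), (3, 3)]

print(assign_droplet(singlet_counts, genotypes))  # ('donor_A',)
print(assign_droplet(doublet_counts, genotypes))  # ('donor_A', 'donor_B')
```

In this sketch, reads that match one donor's genotype at every SNP favor a singlet call, while balanced REF/ALT reads at SNPs where the donors are opposite homozygotes favor the doublet label.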
Project description:A comprehensive annotation of transcript isoforms in domesticated species is lacking. Especially considering that transcriptome complexity and splicing patterns are not well-conserved between species, this presents a substantial obstacle to genomic selection programs that seek to improve production, disease resistance, and reproduction. Recent advances in long-read sequencing technology have made it possible to directly determine the structure of full-length transcripts without the need for transcript reconstruction. In this study, we demonstrate the power of long-read sequencing for transcriptome annotation by coupling Oxford Nanopore Technologies (ONT) sequencing with large-scale multiplexing of 93 samples, comprising 32 tissues collected from adult male and female Hereford cattle. More than 30 million uniquely mapping full-length reads were obtained from a single ONT flow cell, and used to identify and characterize the expression dynamics of 99,044 transcript isoforms at 31,824 loci. Of these predicted transcripts, 21% exactly matched a reference transcript, and 61% were novel isoforms of reference genes, substantially increasing the ratio of transcript variants per gene, and suggesting that the complexity of the bovine transcriptome is comparable to that in humans. Over 7,000 transcript isoforms were extremely tissue-specific, and 61% of these were attributed to testis, which exhibited the most complex transcriptome of all interrogated tissues. Despite profiling over 30 tissues, transcription was only detected at about 60% of reference loci. Consequently, additional studies will be necessary to continue characterizing the bovine transcriptome in additional cell types, developmental stages, and physiological conditions. However, this demonstration of the power of ONT sequencing coupled with large-scale multiplexing makes the task of exhaustively annotating the bovine transcriptome - or any mammalian transcriptome - appear significantly more feasible.
Project description:In late 2019, a novel coronavirus began spreading in Wuhan, China, causing a potentially lethal respiratory viral infection. By early 2020, the novel coronavirus, called SARS-CoV-2, had spread globally, causing the COVID-19 pandemic. The infection and mutation rates of SARS-CoV-2 make it amenable to tracking introduction, spread and evolution by viral genome sequencing. Efforts to develop effective public health policies, therapeutics, or vaccines to treat or prevent COVID-19 are also expected to benefit from tracking mutations of the SARS-CoV-2 virus. Here we describe a set of comprehensive working protocols, from viral RNA extraction to analysis using established visualization tools, for high throughput sequencing of SARS-CoV-2 viral genomes using a MinION instrument. This set of protocols should serve as a reliable "how-to" reference for generating quality SARS-CoV-2 genome sequences with ARTIC primer sets and long-read nanopore sequencing technology. In addition, many of the preparation, quality control, and analysis steps will be generally applicable to other sequencing platforms.
Project description:Multitasking is a pivotal feature of next-generation chemo- or bioanalyses. However, simultaneous analyses rarely exceed three different tasks, which is ascribed to the limited space available to accommodate analyzing units and to the signal-to-noise (S/N) level, which is compromised as the number of tasks increases. Here, by leveraging the superior S/N of single-molecule techniques, we analyzed five microRNA (miRNA) biomarkers by spatially encoding miRNA recognition units with nanometer resolution in a DNA template, while decoding analyte binding temporally within seconds. The hairpin stem is interspersed with internal loops that encode the recognition units for miRNA. By mechanical unfolding of the hairpin, individual internal loops are sequentially interrogated for the binding of each miRNA. Using this so-called topochemical spatiotemporal analysis, we were able to achieve subpicomolar detection limits for miRNAs. We anticipate that this new single-molecule topochemical analysis can be applied to the massive analysis of single-molecule targets.
Project description:The COVID-19 pandemic has spread rapidly throughout the world. In the UK, the initial peak was in April 2020; in the county of Norfolk (UK) and surrounding areas, which has a stable, low-density population, over 3200 cases were reported between March and August 2020. As part of the activities of the national COVID-19 Genomics Consortium (COG-UK) we undertook whole genome sequencing of the SARS-CoV-2 genomes present in positive clinical samples from the Norfolk region. These samples were collected by four major hospitals, multiple minor hospitals, care facilities and community organizations within Norfolk and surrounding areas. We combined clinical metadata with the sequencing data from regional SARS-CoV-2 genomes to understand the origins, genetic variation, transmission and expansion (spread) of the virus within the region and provide context nationally. Data were fed back into the national effort for pandemic management, whilst simultaneously being used to assist local outbreak analyses. Overall, 1565 positive samples (172 per 100 000 population) from 1376 cases were evaluated; for 140 cases between two and six samples were available providing longitudinal data. This represented 42.6 % of all positive samples identified by hospital testing in the region and encompassed those with clinical need, and health and care workers and their families. In total, 1035 cases had genome sequences of sufficient quality to provide phylogenetic lineages. These genomes belonged to 26 distinct global lineages, indicating that there were multiple separate introductions into the region. Furthermore, 100 genetically distinct UK lineages were detected demonstrating local evolution, at a rate of ~2 SNPs per month, and multiple co-occurring lineages as the pandemic progressed. 
Our analysis: identified a discrete sublineage associated with six care facilities; found no evidence of reinfection in longitudinal samples; ruled out a nosocomial outbreak; identified 16 lineages in key workers which were not in patients, indicating infection control measures were effective; found that the D614G spike protein mutation, which is linked to increased transmissibility, dominates the samples; and rapidly confirmed relatedness of cases in an outbreak at a food processing facility. The large-scale genome sequencing of SARS-CoV-2-positive samples has provided valuable additional data for public health epidemiology in the Norfolk region, and will continue to help identify and untangle hidden transmission chains as the pandemic evolves.
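The headline figures of the Norfolk study can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the abstract. The variable names are illustrative, and the implied population is a back-of-envelope value derived from the rounded rate of 172 per 100 000, not a figure reported by the authors.

```python
# Figures quoted in the abstract.
positive_samples = 1565      # positive samples evaluated
samples_per_100k = 172       # samples per 100 000 population
cases = 1376                 # distinct cases
cases_with_lineage = 1035    # cases with genomes good enough for lineage assignment

# Population implied by the quoted rate (approximate, since the rate is rounded).
implied_population = positive_samples / samples_per_100k * 100_000

# Share of cases that yielded a phylogenetic lineage.
lineage_yield = cases_with_lineage / cases

print(f"Implied population: about {implied_population:,.0f}")
print(f"Cases with an assigned lineage: {lineage_yield:.1%}")
```

Under these assumptions, the implied catchment population is roughly 910 000, and about three quarters of cases produced a genome of sufficient quality for lineage assignment.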