Project description:We present MultiEditR, the first algorithm specifically designed to detect and quantify RNA editing from Sanger sequencing (z.umn.edu/multieditr). Although RNA editing is routinely evaluated by measuring peak heights in Sanger sequencing traces, the accuracy and precision of this approach have yet to be evaluated against gold-standard next-generation sequencing methods. Through a comprehensive comparison to RNA-seq and amplicon-based deep sequencing, we show that MultiEditR is accurate, precise, and reliable for detecting endogenous and programmable RNA editing.
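The peak-height approach mentioned in the description above can be illustrated with a minimal sketch. This is not MultiEditR's algorithm; it only shows the simple ratio commonly used when reading editing levels off a trace, assuming the four peak heights at one position are already available. The function name and the input dictionary are hypothetical.

```python
def editing_fraction(peak_heights, ref_base, edited_base):
    """Estimate the edited fraction at one trace position.

    peak_heights: dict mapping "A", "C", "G", "T" to peak height (hypothetical input).
    Returns edited / (reference + edited), ignoring the other two channels.
    """
    ref = peak_heights[ref_base]
    edited = peak_heights[edited_base]
    total = ref + edited
    if total == 0:
        raise ValueError("no signal in the reference or edited channel")
    return edited / total


# A-to-I editing reads out as A-to-G in the sequenced cDNA.
example = {"A": 300, "C": 0, "G": 100, "T": 0}
fraction = editing_fraction(example, ref_base="A", edited_base="G")
```

The ratio ignores background signal in the other two channels, which is one reason a raw peak-height estimate can be noisy.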
Project description:The clinical translation of next-generation sequencing has created a paradigm shift in the diagnostic assessment of individuals with suspected rare genetic diseases. Whole-exome sequencing (WES) simultaneously examines the majority of the coding portion of the genome and is rapidly becoming accepted as an efficient alternative to clinical Sanger sequencing for diagnosing genetically heterogeneous disorders. Among reports of the clinical and diagnostic utility of WES, few studies to date have directly compared its concordance to Sanger sequencing, which is considered the clinical "gold standard". We performed a direct comparison of 391 coding and noncoding polymorphisms and variants of unknown significance identified by clinical Sanger sequencing to the WES results of 26 patients. Of the 150 well-covered coding variants identified by Sanger sequencing, 146 (97.3%) were also reported by WES. Nine genes were excluded from the comparison due to consistently low coverage in WES, which might be attributed to the use of older exome capture kits. We performed confirmatory Sanger sequencing of discordant variants, including five variants with discordant bases and four with discordant zygosity. Confirmatory Sanger sequencing supported the original Sanger report for three of the five discordant bases, one was shown to be a false positive supporting the WES data, and one result differed from both the Sanger and WES data. Two of the discordant zygosity results supported Sanger and the other two supported WES data. We report high concordance for well-covered coding variants, supporting the use of WES as a screening tool for heterogeneous disorders, and recommend the use of supplementary Sanger sequencing for poorly covered genes when the clinical suspicion is high. Importantly, despite remaining difficulties with achieving complete coverage of the whole exome, 10 (38.5%) of the 26 compared patients were diagnosed through WES.
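The two percentages quoted in the description above follow directly from the reported counts. A small sketch of that arithmetic, using only numbers stated in the text (the helper name is hypothetical):

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts taken from the description above.
concordance = percent(146, 150)     # well-covered coding variants also reported by WES
diagnostic_yield = percent(10, 26)  # patients diagnosed through WES
```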
Project description:Total RNA was extracted from zebrafish embryos from the SAT (Sanger AB Tübingen) strain. The RNA was DNase treated. The 3' ends of fragmented RNA were pulled down using polyT oligos attached to magnetic beads, reverse transcribed, made into Illumina libraries and sequenced using Illumina HiSeq paired-end sequencing. Protocol: Total RNA was extracted and DNase treated. Fragmented RNA was enriched for the 3' ends by pull-down using a polyT oligo attached to magnetic beads. An RNA oligo comprising part of the Illumina adapter 2 was ligated to the 5' end of the captured RNA and the RNA was eluted from the beads. Reverse transcription was primed with an anchored polyT oligo with part of Illumina adapter 1 at the 5' end followed by 12 random bases, then an 8 base indexing tag, then CG and 14 T bases. An Illumina library with full adapter sequence was produced by PCR. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/
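The primer layout described in the protocol above (12 random bases, an 8 base indexing tag, CG, then 14 T bases) can be sketched as a read parser. This is an illustration only, under the assumptions that the read begins immediately after the adapter 1 portion and contains no sequencing errors in the CG and T stretch; the function name and returned field names are hypothetical and do not come from the project's own pipeline.

```python
import re

# Layout after the adapter, per the protocol text:
# 12 random bases, 8-base indexing tag, "CG", 14 T bases, then the captured insert.
PRIMER_LAYOUT = re.compile(r"^([ACGTN]{12})([ACGTN]{8})CG(T{14})(.*)$")


def split_primer_prefix(read):
    """Split a read into random bases, indexing tag, and the remaining sequence.

    Returns None when the read does not start with the expected layout.
    """
    match = PRIMER_LAYOUT.match(read)
    if match is None:
        return None
    random_bases, index_tag, _poly_t, remainder = match.groups()
    return {
        "random_bases": random_bases,
        "index_tag": index_tag,
        "remainder": remainder,
    }


# Hypothetical read built to match the layout exactly.
read = "ACGTACGTACGT" + "AACCGGTT" + "CG" + "T" * 14 + "GATTACA"
parts = split_primer_prefix(read)
```

A real read would usually need some tolerance for mismatches, and the remainder may start with additional T bases that originate from the poly(A) tail.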
Project description:Total RNA was extracted from zebrafish embryos from the SAT (Sanger AB Tübingen) strain. The RNA was DNase treated. Stranded RNAseq libraries were constructed using the Illumina TruSeq Stranded RNA protocol after treatment with Ribo-Zero. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/
Project description:The microbiome sequencing model is a Named Entity Recognition (NER) model that identifies and annotates microbiome nucleic acid sequencing methods and platforms in text. This is the final model version used to annotate metagenomics publications in Europe PMC and enrich metagenomics studies in MGnify with sequencing metadata from the literature. For more information, please refer to the following blog posts: http://blog.europepmc.org/2020/11/europe-pmc-publications-metagenomics-annotations.html https://www.ebi.ac.uk/about/news/service-news/enriched-metadata-fields-mgnify-based-text-mining-associated-publications
Project description:Although high-throughput sequencers (HTS) have largely displaced their Sanger counterparts, the short read lengths and high error rates of most platforms constrain their utility for amplicon sequencing. The present study tests the capacity of single-molecule, real-time (SMRT) sequencing implemented on the SEQUEL platform to overcome these limitations, employing 658 bp amplicons of the mitochondrial cytochrome c oxidase I gene as a model system. By examining templates from more than 5000 species and 20,000 specimens, the performance of SMRT sequencing was tested with amplicons showing wide variation in GC composition and varied sequence attributes. SMRT and Sanger sequences were very similar, but SMRT sequencing provided more complete coverage, especially for amplicons with homopolymer tracts. Because it can characterize amplicon pools from 10,000 DNA extracts in a single run, the SEQUEL can greatly reduce sequencing costs in comparison to first-generation (Sanger) and second-generation (Illumina, Ion) platforms. SMRT analysis generates high-fidelity sequences from amplicons with varying GC content and is resilient to homopolymer tracts. Analytical costs are low, substantially less than those for first- or second-generation sequencers. When implemented on the SEQUEL platform, SMRT analysis enables massive amplicon characterization because each instrument can recover sequences from more than 5 million DNA extracts a year.
Project description:Sanger sequencing platforms, such as Applied Biosystems instruments, generate chromatogram files. Generally, one region of a sequence is sequenced with both forward and reverse primers, which yields two sequences that must be aligned and merged into a consensus before mutation detection studies. This work is cumbersome and takes time, especially if the gene is large with many exons. Hence, we devised a rapid automated command-line system to filter, build, and align consensus sequences and also optionally extract exonic regions, translate them in all frames, and perform an amino acid alignment starting from raw sequence data within a very short time. At full capability, the Automated Mutation Analysis Pipeline (ASAP) can read "*.ab1" chromatogram files through a command-line interface, convert them to the FASTQ format, trim the low-quality regions, reverse-complement the reverse sequence, create a consensus sequence, extract the exonic regions using a reference exonic sequence, translate the sequence in all frames, and align the nucleic acid and amino acid sequences to reference nucleic acid and amino acid sequences, respectively. All files are created and can be used for further analysis. ASAP is available as a Python 3.x executable at https://github.com/aditya-88/ASAP. The version described in this paper is 0.28.
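Three of the steps listed above (reverse-complementing the reverse read, building a consensus, and translating in all frames) can be sketched in plain Python. This is a simplified illustration, not ASAP's code: it assumes the two reads are already aligned base for base and equal in length, resolves disagreements by the higher Phred quality, and translates only the three forward frames with the standard genetic code. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus(fwd, fwd_qual, rev, rev_qual):
    """Merge two pre-aligned, equal-length reads, keeping the higher-quality base."""
    if not (len(fwd) == len(rev) == len(fwd_qual) == len(rev_qual)):
        raise ValueError("reads and quality lists must have equal length")
    return "".join(
        f if fq >= rq else r
        for f, fq, r, rq in zip(fwd, fwd_qual, rev, rev_qual)
    )


def translate_frames(seq):
    """Translate the three forward reading frames; unknown codons become 'X'."""
    frames = []
    for offset in range(3):
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        frames.append("".join(CODON_TABLE.get(codon, "X") for codon in codons))
    return frames


# Hypothetical reads: the reverse read is flipped before merging.
reverse_read = reverse_complement("AAGT")  # "ACTT"
merged = consensus("ACGT", [30, 30, 10, 30], reverse_read, [20, 20, 40, 20])
proteins = translate_frames("ATGGCCTAA")
```

A production pipeline would align the reads first and would typically read the ".ab1" files with a dedicated parser; the sketch skips both steps.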