Project description: Shotgun protein sequencing with meta-contig assembly.
Full-length de novo sequencing of unknown proteins, such as antibodies or proteins from organisms with unsequenced genomes, from tandem mass (MS/MS) spectra remains a challenging open problem. Conventional algorithms that sequence each MS/MS spectrum individually are limited by incomplete peptide fragmentation or low signal-to-noise ratios, and they tend to produce short de novo sequences with low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). However, although SPS reconstructs de novo sequences much more accurately, and longer, than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group shorter contig sequences into full-length protein sequences, which limits its effectiveness on poorly annotated proteins. Using low- and high-resolution CID and high-resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm that overlaps and further assembles SPS contigs into Meta-SPS de novo contig sequences up to 100 amino acids long at over 97% accuracy, without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS on distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
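The overlap-and-extend idea can be illustrated at the level of plain sequence strings. The sketch below is a simplified greedy suffix-prefix merge that assumes exact amino acid matches and a hypothetical minimum overlap of 4 residues. It is not the Meta-SPS algorithm itself, which, per the description above, assembles SPS contigs built from spectra; the function names `overlap_len` and `greedy_merge` are hypothetical.

```python
def overlap_len(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if no such overlap is at least `min_overlap` residues long.
    """
    max_k = min(len(a), len(b))
    # Try the longest candidate overlap first and shrink down to min_overlap.
    for k in range(max_k, min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_merge(contigs: list[str], min_overlap: int = 4) -> list[str]:
    """Hypothetical greedy assembler over exact string overlaps.

    Repeatedly joins the ordered pair (a, b) with the largest suffix-prefix
    overlap into a + b[k:], until no pair overlaps by >= min_overlap.
    """
    seqs = list(contigs)
    while True:
        best = (0, -1, -1)  # (overlap length, index of left seq, index of right seq)
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap_len(a, b, min_overlap)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return seqs  # no remaining overlaps: these are the final "meta-contigs"
        merged = seqs[i] + seqs[j][k:]
        # Drop the two merged inputs and append their union.
        seqs = [s for idx, s in enumerate(seqs) if idx not in (i, j)] + [merged]
```

For example, `greedy_merge(["PEPTIDEK", "TIDEKLMNQ", "LMNQRSTV"])` first joins the pair sharing "TIDEK" and then the pair sharing "LMNQ". A spectrum-level method would also have to tolerate mass gaps and sequencing errors, which this exact-match string sketch ignores.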
Project description: Purpose: The goal of this study was to identify the differentially expressed genes (DEGs) in the fluoride-susceptible indica rice cultivar IR-64 in response to prolonged fluoride stress. Genes showing highly significant changes in relative expression were further analyzed by RT-PCR. Results: De novo transcriptome assembly with Trinity v2.8.3 identified 158,411 transcripts. The GC content was 49.67%, the contig N50 was 1,327 bp, the median contig length was 422 bp, the average contig length was 768.66 bp, and the total number of assembled bases was 121,764,099. After refinement and open reading frame detection with TransDecoder, 70,578 transcripts were retained; of these, 68,009 had at least one hit in UniRef100, UniProt or Pfam. Transcripts with an absolute log2 fold change of 2 or more and a p-value < 0.05 were considered significantly differentially expressed. Differential expression analysis identified 1,303 overexpressed and 93 downregulated genes, and these 1,396 differentially expressed transcripts were considered for further analysis. Next, PCR analysis with gene-specific primers was performed on selected significant DEGs associated with transport, cytoskeletal organization and signaling to identify the genes/transcripts that are involved in stress
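The contig N50 statistic and the significance filter reported above can be made concrete with a short sketch. It assumes contig lengths are available as a list of integers and that per-transcript log2 fold changes and p-values have already been computed by some differential expression tool. The `DETranscript` record, its fields, and the function names are hypothetical; this is not the study's actual pipeline.

```python
from dataclasses import dataclass


def n50(lengths: list[int]) -> int:
    """Contig N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


@dataclass
class DETranscript:
    transcript_id: str        # hypothetical identifier field
    log2_fold_change: float   # stressed vs. control, as reported by a DE tool
    p_value: float


def significant_degs(rows: list[DETranscript],
                     min_abs_lfc: float = 2.0,
                     alpha: float = 0.05) -> list[DETranscript]:
    """Keep transcripts with |log2 fold change| >= min_abs_lfc and p-value < alpha."""
    return [r for r in rows
            if abs(r.log2_fold_change) >= min_abs_lfc and r.p_value < alpha]
```

The default thresholds match the criteria stated in the description (absolute log2 fold change of 2 or more, p < 0.05). Splitting the retained records by the sign of `log2_fold_change` then gives the overexpressed and downregulated counts.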
Project description: De novo assembled transcriptomics-assisted label-free quantitative proteomics analysis reveals sex-specific proteins in the intestinal tissue of Haemaphysalis qinghaiensis
Project description: These data correspond to RNA-Seq assays of body wall samples from farmed Isostichopus badionotus juveniles. The raw reads were used to evaluate differential gene expression between wild and farmed I. badionotus specimens, using as the reference a de novo transcriptome assembled from wild specimens. Further information about the de novo assembled transcriptome is available in BioProject PRJNA639785.
2020-09-01 | GSE157183 | GEO
Project description: De novo assembled transcriptome of Helianthemum marifolium
Project description: We report the de novo assembled transcriptome of Y-organs from two intermolt and two premolt blue crabs. Data were obtained by RNA-seq and assembled using Trinity, and differential expression was determined using DESeq2 in R.
Project description: We report the first use of next-generation massively parallel sequencing technologies and de novo transcriptome assembly to gain insight into the broad transcriptome of Hevea brasiliensis. More than 12 million sequence reads with an average length of 90 nt were generated. In total, 48,768 unigenes (mean size 488 bp) were assembled de novo, more than three times the number of Hevea brasiliensis sequences deposited in GenBank. Assembled sequences were annotated with gene descriptions, Gene Ontology (GO) terms and Clusters of Orthologous Groups (COG) terms. In total, 37,373 unigenes were annotated, and more than 10% of unigenes aligned to known proteins of Euphorbiaceae. The unigenes contain a nearly complete collection of known rubber-synthesis-related genes. Our data provide the most comprehensive sequence resource available for studying the rubber tree and demonstrate the feasibility of Illumina sequencing and de novo transcriptome assembly in a species lacking genome information.
The transcriptome of latex and leaf in Hevea brasiliensis
Project description: While using deep sequencing and de novo assembly to characterize the mRNA transcript profile of a cell line derived from the microbat Myotis velifer incautus, we serendipitously identified mRNAs encoding proteins with a high level of identity to herpesvirus proteins. Next-generation sequencing and de novo assembly of the viral genome from Vero cell supernatants yielded a single contig of approximately 130 kilobases with at least 80 ORFs, predicted microRNAs, and a gammaherpesvirus genomic organization. Phylogenetic analysis of envelope glycoprotein B (gB) and DNA polymerase (POLD1) revealed similarity to multiple gammaherpesviruses, including sequences from as-yet-uncultured viruses of the genus Rhadinovirus obtained by deep sequencing of bat tissues. Collectively, this study reports the first isolation and characterization of a replication-competent bat gammaherpesvirus.
2016-02-01 | GSE76756 | GEO
Project description: De novo assembled genomes of Belliella spp. (Cyclobacteriaceae) strains
Project description: The genomic DNAs of strains JPCM5 and 263 of L. infantum, strains LV39 and Friedlin of L. major, and strains Parrot-TarII and S125 of L. tarentolae were used in comparative genomic hybridizations to assess intra-species and inter-species gene content and to validate the L. tarentolae Parrot-TarII genome sequencing results. Leishmania (Sauroleishmania) tarentolae was first isolated from the lizard Tarentola mauritanica. This species is not known to be pathogenic to humans but is often used as a model organism for molecular analyses or protein overproduction. The genome sequence of the Leishmania tarentolae Parrot-TarII strain was resolved using high-throughput sequencing technologies. The L. tarentolae genome was first assembled de novo, with 23-fold coverage of the genome, and then aligned against the reference L. major Friedlin genome to facilitate contig positioning and annotation. This is the first described genome of a kinetoplastid protozoan that is not pathogenic to humans, and it provides an opportunity for comparison with the completed genomes of the pathogenic Leishmania species. The de novo assembled contigs showed high synteny with all sequenced Leishmania species. A few limited chromosomal regions diverged between L. tarentolae and L. infantum while remaining syntenic with L. major. In total, over 90% of the L. tarentolae gene content was shared with the other Leishmania species. Interestingly, the 250 L. major genes absent from L. tarentolae are primarily expressed in the intracellular amastigote stage of the pathogenic parasites, suggesting that L. tarentolae may have an impaired ability to survive as an intracellular parasite. In contrast to other Leishmania genomes, two gene families were expanded in L. tarentolae: leishmanolysin (GP63) and a gene related to the promastigote surface antigen (PSA31C). Overall, the gene content of L. tarentolae appears more adapted to the insect stage than to the mammalian one. This may partly explain its inability to replicate within mammalian macrophages and its suspected preferred lifestyle as a promastigote in lizards.