Project description: De novo peptide sequencing is a fundamental research area in mass spectrometry (MS)-based proteomics. However, de novo sequencing methods have often been evaluated using only a few simple metrics that do not fully reflect their overall performance. Moreover, no established method exists for estimating the false discovery rate (FDR) and the significance of de novo peptide-spectrum matches (PSMs). Here we propose NovoBoard, a comprehensive framework for evaluating the performance of de novo peptide sequencing methods. The framework consists of diverse benchmark datasets (tryptic, nontryptic, immunopeptidomics, and data from different species) and a standard set of accuracy metrics for evaluating the fragment ions, amino acids, and peptides in de novo results. More importantly, we designed a new approach to evaluate de novo peptide sequencing methods on target-decoy spectra and to estimate their FDRs. Our results reveal the strengths and weaknesses of different de novo peptide sequencing methods, and show how their performance depends on the specific application and the type of data. Our FDR estimation also shows that some tools may distinguish de novo PSMs from random matches better than others, and that the estimation can be used to assess the significance of de novo PSMs.
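The description does not say how NovoBoard builds its decoy spectra or computes its FDR. As an illustration only, the sketch below applies the generic target-decoy estimate, FDR = #decoys / #targets at or above a score threshold, to de novo PSMs. It assumes each PSM carries a confidence score (higher is better) and a flag marking whether its spectrum came from the decoy set. The names `DeNovoPsm`, `target_decoy_q_values`, and `accepted_targets` are hypothetical and do not belong to NovoBoard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DeNovoPsm:
    """Hypothetical record for one de novo PSM: a confidence score (higher is
    better) and whether the spectrum came from the decoy set."""
    score: float
    is_decoy: bool


def target_decoy_q_values(psms: List[DeNovoPsm]) -> List[Tuple[DeNovoPsm, float]]:
    """Pair each PSM with a q-value, sorted by descending score.

    The FDR at each score threshold is estimated as #decoys / #targets among
    PSMs scoring at or above it; ties in score are not treated specially.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs: List[float] = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value = smallest FDR at which this PSM is still accepted, i.e. a
    # running minimum taken from the lowest-scoring PSM upward.
    for i in range(len(fdrs) - 2, -1, -1):
        fdrs[i] = min(fdrs[i], fdrs[i + 1])
    return list(zip(ranked, fdrs))


def accepted_targets(psms: List[DeNovoPsm], fdr_threshold: float = 0.01) -> List[DeNovoPsm]:
    """Target PSMs whose q-value does not exceed the FDR threshold."""
    return [p for p, q in target_decoy_q_values(psms)
            if not p.is_decoy and q <= fdr_threshold]


if __name__ == "__main__":
    psms = [DeNovoPsm(0.95, False), DeNovoPsm(0.90, False), DeNovoPsm(0.85, True),
            DeNovoPsm(0.80, False), DeNovoPsm(0.70, True)]
    print(len(accepted_targets(psms, 0.01)))  # two targets score above the first decoy
```

The running minimum makes q-values monotone in score. Without it, a target that scores just below a decoy could receive a lower FDR than the PSMs ranked above it.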
Project description: We report the de novo assembled transcriptome of Y-organs from two intermolt and two premolt blue crabs. Data were obtained by RNA-seq and assembled using Trinity, and differential expression was determined using DESeq2 in R.
2020-07-17 | GSE154560 | GEO
Project description: Diprion similis museum specimen whole genomes
Project description: Because they depend on concise, predefined protein sequence databases, traditional search algorithms perform poorly when analyzing mass spectra derived from wholly uncharacterized protein products. Conversely, de novo peptide sequencing algorithms can interpret mass spectra without relying on reference databases. However, such algorithms have been difficult to apply to complex protein mixtures, in part because of a lack of methods for automatically validating de novo sequencing results. Here, we present novel metrics for benchmarking de novo sequencing algorithm performance on large-scale proteomics datasets, as well as a method for accurately calibrating false discovery rates on de novo results. We also present a novel algorithm (LADS) that leverages experimentally disambiguated fragmentation spectra to boost sequencing accuracy and sensitivity. Relative to other algorithms, LADS improves sequencing accuracy on longer peptides and improves discrimination between correct and incorrect sequences. Using these advancements, we demonstrate accurate de novo identification of peptide sequences that cannot be identified by database search-based approaches.
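The benchmarking metrics are not defined in this description. For illustration only, the sketch below computes amino-acid recall and precision and peptide recall for (reference, predicted) sequence pairs. It uses an assumed mass-based matching rule: a predicted residue counts as correct when its prefix mass agrees with the reference within `prefix_tol` and its residue mass agrees within `residue_tol`. Both tolerances and all function names are hypothetical, and only unmodified residues are covered.

```python
from itertools import accumulate
from typing import List, Tuple

# Monoisotopic residue masses in Da (unmodified residues only; this sketch
# ignores PTMs and fixed modifications such as carbamidomethyl-Cys).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def matched_residues(reference: str, predicted: str,
                     prefix_tol: float = 0.5, residue_tol: float = 0.1) -> int:
    """Count predicted residues that agree with the reference peptide.

    A residue counts as matched when the prefix masses ending at it agree
    within prefix_tol and the residue masses agree within residue_tol, so
    isobaric residues such as I and L are treated as equal.
    """
    ref_mass = [RESIDUE_MASS[aa] for aa in reference]
    pred_mass = [RESIDUE_MASS[aa] for aa in predicted]
    ref_cum = list(accumulate(ref_mass))
    pred_cum = list(accumulate(pred_mass))
    i = j = matches = 0
    while i < len(ref_mass) and j < len(pred_mass):
        if abs(ref_cum[i] - pred_cum[j]) < prefix_tol:
            if abs(ref_mass[i] - pred_mass[j]) < residue_tol:
                matches += 1
            i += 1
            j += 1
        elif ref_cum[i] < pred_cum[j]:
            i += 1  # reference prefix is lighter: advance along the reference
        else:
            j += 1  # predicted prefix is lighter: advance along the prediction
    return matches


def benchmark(pairs: List[Tuple[str, str]]) -> dict:
    """Amino-acid recall/precision and peptide recall over (reference, predicted) pairs."""
    total_match = total_ref = total_pred = exact = 0
    for reference, predicted in pairs:
        m = matched_residues(reference, predicted)
        total_match += m
        total_ref += len(reference)
        total_pred += len(predicted)
        # A peptide counts as correct only if every residue matches and lengths agree.
        if m == len(reference) == len(predicted):
            exact += 1
    return {
        "aa_recall": total_match / total_ref,
        "aa_precision": total_match / total_pred,
        "peptide_recall": exact / len(pairs),
    }


if __name__ == "__main__":
    pairs = [("PEPTIDE", "PEPTLDE"), ("PEPTIDE", "PEPTDIE"), ("SAMPLER", "SAMPLER")]
    print(benchmark(pairs))
```

Matching by mass rather than by position keeps a single inserted or swapped residue from invalidating every residue after it. In the second example pair, the swapped I/D is penalized, but the trailing E still matches.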
Project description: Precision de novo peptide sequencing using mirror proteases of Ac-LysargiNase and trypsin for large-scale proteomics
Project description: De novo centromeres occasionally originate from non-centromeric regions of chromosomes, providing an excellent model system for studying centromeric chromatin. The maize mini-chromosome Derivative 3-3 contains a de novo centromere derived from a euchromatic site on the short arm of chromosome 9 that lacks traditional centromeric repeat sequences. Our previous study found that the CENH3-binding domain of this de novo centromere spans only 288 kb and has a high gene density and a low transposon density. Here we applied next-generation sequencing to analyze gene transcription and DNA methylation in this region. Our RNA-seq data revealed that active chromatin is not a barrier to de novo centromere formation. Bisulfite-ChIP-seq results indicate that the DNA methylation level increased slightly after de novo centromere formation, reaching the level of a native centromere. These results provide insight into the mechanism of de novo centromere formation and its consequences. RNA-seq was performed on seedlings and young leaves of control and Derivative 3-3 plants. Bisulfite-ChIP-seq was performed with anti-CENH3 antibodies on young leaves of Derivative 3-3.
Project description: Formalin induces inter- and intramolecular crosslinks within exposed cells. This crosslinking can be exploited to characterise chromatin state, as in the FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) and MNase (micrococcal nuclease) assays. Here, we optimised the FAIRE and MNase assays for heavily fixed tissues, as are typically found in historical formalin-preserved museum specimens. We demonstrate these assays in formalin-fixed mouse specimens and compare the resulting chromatin signatures with those of specimen-matched fresh tissues. We found that heavy formalin fixation modulates, rather than eliminates, signatures of differential chromatin accessibility, and that these chromatin profiles are reproducible, tissue-specific and sex-specific in vertebrate specimens.
Project description: One tooth and one piece of trunk skin from a lamprey were lysed and analysed for their protein content. The samples were generously provided by the Museum of Natural History Vienna. The samples were stored in ethanol, and the origin of the specimen is unknown.
Project description: For this manuscript, the Prochlorococcus MED4 strain shotgun proteome dataset was used to benchmark a de novo-directed sequencing approach. In de novo peptide sequencing, the amino acid sequence is determined directly from mass spectra rather than by comparison (peptide-spectrum matching) against a selected database. We performed a benchmarking experiment using Prochlorococcus culture data, demonstrating that de novo peptides are sufficiently accurate and taxonomically specific to be useful in environmental studies. The MED4 dataset herein represents the output of peptide-spectrum matching using Comet within the Trans-Proteomic Pipeline (TPP). Additional MED4 data not used in this manuscript are included for both trypsin and Glu-C protease digestions, as well as TPP output for post-translational modification searches. De novo output data from PEAKS Studio are available through the published manuscript.