Dataset Information

De novo sequencing of DIA data

ABSTRACT: Testing datasets and pre-trained model for DeepNovo, a deep learning-based tool for de novo sequencing of DIA data.

INSTRUMENT(S): Q Exactive

ORGANISM(S): Homo sapiens (ncbitaxon:9606)

SUBMITTER: Ming Li

PROVIDER: MSV000082368 | MassIVE | Wed May 16 14:18:00 BST 2018

REPOSITORIES: MassIVE

Similar Datasets

Multienzyme deep learning models improve peptide de novo sequencing by mass spectrometry proteomics

Project description:Generating and analyzing overlapping peptides through multienzymatic digestion is an efficient procedure for de novo protein sequencing from bottom-up mass spectrometry (MS). Despite improved instrumentation and software, de novo MS data analysis remains challenging. In recent years, deep learning models have represented a performance breakthrough. Incorporating that technology into de novo protein sequencing workflows requires machine-learning models capable of handling highly diverse MS data. In this study, we analyzed the requirements for assembling such generalizable deep learning models by systematically varying the composition and size of the training set. We assessed the generated models' performances using two test sets composed of peptides originating from the multienzyme digestion of samples from various species. The peptide recall values on the test sets showed that the deep learning models generated from a collection of highly N- and C-termini diverse peptides generalized 76% more than the termini-restricted ones. Moreover, expanding the training set's size by adding peptides from the multienzymatic digestion with five proteases of samples from several species led to a 2-3-fold generalizability gain. Furthermore, we tested the applicability of these multienzyme deep learning (MEM) models by fully de novo sequencing the heavy and light monomeric chains of five commercial antibodies (mAbs). MEMs extracted over 10,000 matching and overlapping peptides across the mAb samples from six different proteases, achieving 100% sequence coverage for 8 of the 10 polypeptide chains. We anticipate that the MEMs' proven improvements to de novo analysis will positively impact several applications, such as analyzing samples of high complexity or unknown nature, or peptidomics.

2023-01-16 | PXD037803 | Pride

DeNovo Peptide Identification Deep Learning Test Set

Project description:A set of bottom-up proteomics data for testing the deep learning network trained with data in PXD010000

2022-09-25 | PXD010613 | Pride

DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers

Project description:Enhancer sequences control gene expression and comprise binding sites (motifs) for different transcription factors (TFs). Despite extensive genetic and computational studies, the relationship between DNA sequence and regulatory activity is poorly understood and enhancer de novo design is considered impossible. Here we built a deep learning model, DeepSTARR, to quantitatively predict the activities of thousands of developmental and housekeeping enhancers directly from DNA sequence in Drosophila melanogaster S2 cells. The model learned relevant TF motifs and higher-order syntax rules, including functionally non-equivalent instances of the same TF motif that are determined by motif-flanking sequence and inter-motif distances. We validated these rules experimentally and demonstrated their conservation in human by testing more than 40,000 wildtype and mutant Drosophila and human enhancers. Finally, we designed and functionally validated synthetic enhancers with desired activities de novo. This SuperSeries is composed of the SubSeries listed below.

2022-02-24 | GSE183939 | GEO

DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers [Human oligo UMI-STARR-seq]

2022-02-24 | GSE183938 | GEO

DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers [Drosophila oligo UMI-STARR-seq]

2022-02-24 | GSE183937 | GEO

DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers [Drosophila genome-wide UMI-STARR-seq]

2022-02-24 | GSE183936 | GEO

A transformer model for de novo sequencing of data-independent acquisition mass spectrometry data

Project description:A core computational challenge in the analysis of mass spectrometry data is the de novo sequencing problem, in which the generating amino acid sequence is inferred directly from an observed fragmentation spectrum without the use of a sequence database. Recently, deep learning models have made significant advances in de novo sequencing by learning from massive datasets of high confidence labeled mass spectra. However, these methods are primarily designed for data-dependent acquisition (DDA) experiments. Over the past decade, the field of mass spectrometry has been moving toward using data-independent acquisition (DIA) protocols for the analysis of complex proteomic samples due to their superior specificity and reproducibility. Hence, we present a new de novo sequencing model called Cascadia, which uses a transformer architecture to handle the more complex data generated by DIA protocols. In comparisons with existing approaches for de novo sequencing of DIA data, Cascadia achieves improved performance across a range of instruments and experimental protocols. Additionally, we demonstrate Cascadia’s ability to accurately discover de novo coding variants and peptides from the variable region of antibodies.

2024-06-21 | PXD053291 | panorama

De novo peptide sequencing by deep learning

Project description:Including all training and testing datasets, pretrained models, and source code of DeepNovo.

2017-07-25 | MSV000081382 | MassIVE

De novo assembly of siRNA immunity in wild plants

Project description:We describe an application of deep sequencing and de novo assembly of short RNA reads to investigate small interfering RNA (siRNA)-mediated immunity in leaf samples from eight tree taxa naturally occurring in Wytham Woods, Oxfordshire, UK. A BLAST search for homologues of the contigs in GenBank identified siRNA populations against a number of RNA viruses and Ty1-copia retrotransposons in these tree species. Small RNA sequencing and de novo assembly.

2012-06-01 | E-GEOD-22079 | biostudies-arrayexpress

Application of de novo sequencing to large-scale complex proteomics datasets

Project description:Dependent on concise, pre-defined protein sequence databases, traditional search algorithms perform poorly when analyzing mass spectra derived from wholly uncharacterized protein products. Conversely, de novo peptide sequencing algorithms can interpret mass spectra without relying on reference databases. However, such algorithms have been difficult to apply to complex protein mixtures, in part due to a lack of methods for automatically validating de novo sequencing results. Here, we present novel metrics for benchmarking de novo sequencing algorithm performance on large scale proteomics datasets, and present a method for accurately calibrating false discovery rates on de novo results. We also present a novel algorithm (LADS) which leverages experimentally disambiguated fragmentation spectra to boost sequencing accuracy and sensitivity. LADS improves sequencing accuracy on longer peptides relative to other algorithms and improves discriminability of correct and incorrect sequences. Using these advancements, we demonstrate accurate de novo identification of peptide sequences not identifiable using database search-based approaches.

2016-01-12 | PXD003317 | Pride
