Proteomics

Dataset Information

GLEAMS clustering dark proteome


ABSTRACT: GLEAMS is a deep neural network that embeds spectra into a low-dimensional space in which spectra generated by the same peptide lie close to one another. We used GLEAMS as the basis for large-scale spectrum clustering to detect groups of unidentified, proximal spectra that represent the same peptide. GLEAMS embedded 669 million spectra from the MassIVE-KB dataset, and the embeddings were then clustered by hierarchical clustering with average linkage. Medoid spectra were extracted from clusters consisting only of unidentified spectra, yielding 45 million medoid spectra that represent 257 million clustered spectra. The medoid spectra were split into two groups by cluster size (size two and size greater than two) and exported to two MGF files. ANN-SoLo was then used for open modification searching, identifying 5.3 million peptide-spectrum matches. Here we present the originally unidentified cluster medoid spectra and the ANN-SoLo identification results as a community resource. The dataset supports further exploration of the dark proteome through spectra that are observed repeatedly across many experiments but consistently remain unidentified.
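
The clustering and medoid-selection steps described in the abstract can be illustrated with a minimal sketch. This is not the GLEAMS code: the toy 2-D "embeddings", the distance threshold, and the names `average_linkage`, `medoid`, and `split_medoids` are all hypothetical, and the naive quadratic-time loop would not scale to millions of spectra. Skipping singleton clusters is also an assumption, based on the abstract listing only sizes two and greater than two.

```python
from itertools import combinations
from math import dist


def average_linkage(points, threshold):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest mean pairwise distance, stopping once that distance
    exceeds `threshold` (a hypothetical cutoff)."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        if avg_dist(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]
    return clusters


def medoid(points, members):
    """Member with the smallest summed distance to all other members."""
    return min(
        sorted(members),
        key=lambda i: sum(dist(points[i], points[j]) for j in members),
    )


def split_medoids(points, identified, threshold):
    """Return medoids of fully unidentified clusters, split by cluster size.
    Singleton clusters are skipped (an assumption, see the text above)."""
    size_two, larger = [], []
    for members in average_linkage(points, threshold):
        if len(members) < 2 or any(identified[i] for i in members):
            continue
        target = size_two if len(members) == 2 else larger
        target.append(medoid(points, members))
    return size_two, larger


# Toy data: eight "spectra" as 2-D points; one of them is identified.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0),
              (5.1, 5.0), (9.0, 0.0), (9.0, 0.1), (20.0, 20.0)]
identified = [False, False, False, False, False, True, False, False]

size_two, larger = split_medoids(embeddings, identified, threshold=0.5)
print("medoids of size-two clusters:", size_two)
print("medoids of larger clusters:", larger)
```

In this toy example the cluster that contains an identified spectrum is dropped entirely, which mirrors the restriction to clusters consisting only of unidentified spectra.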

INSTRUMENT(S): various instruments

ORGANISM(S): Homo sapiens (ncbitaxon:9606)

SUBMITTER: William Stafford Noble  

PROVIDER: MSV000088598 | MassIVE | Tue Dec 21 13:18:00 GMT 2021

REPOSITORIES: MassIVE

Similar Datasets

2021-12-21 | MSV000088598 | GNPS
2021-12-21 | MSV000088599 | MassIVE
2022-03-16 | PXD022124 | Pride
2017-04-24 | PXD004896 | Pride
2016-11-25 | GSE88790 | GEO
| MSV000084314 | GNPS
2020-06-22 | PXD019880 | Pride
2018-10-04 | GSE112623 | GEO
| PRJNA607566 | ENA
2023-09-27 | GSE217930 | GEO