Dataset Information

iProX in 2021: connecting proteomics data sharing with big data.

ABSTRACT: The rapid development of proteomics studies has resulted in large volumes of experimental data. The emergence of big data platforms provides an opportunity to handle these large amounts of data. The integrated proteome resource, iProX (https://www.iprox.cn), which was initiated in 2017, has been greatly improved with an up-to-date big data platform implemented in 2021. Here, we describe the main iProX developments since its first publication in Nucleic Acids Research in 2019. First, a hyper-converged architecture with high scalability supports the submission process. A Hadoop cluster stores large proteomics datasets, and a distributed, RESTful Elasticsearch engine can query millions of records within one second. In addition, several new features have been added to iProX for better open data sharing: the Universal Spectrum Identifier (USI) mechanism proposed by ProteomeXchange, a RESTful Web Service API, and a high-efficiency reanalysis pipeline. By the end of August 2021, 1526 datasets had been submitted to iProX, reaching a total data volume of 92.42 TB. With the big data platform in place, iProX can support petabyte-scale data storage, hundreds of billions of spectrum records, and service latency on the order of seconds, meeting the requirements of the fast-growing field of proteomics.
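
The abstract mentions the Universal Spectrum Identifier (USI) without showing what one looks like. As a rough illustration only, the sketch below splits a USI string of the general form mzspec:<collection>:<run>:<index type>:<index>[:<interpretation>] into its parts. The parse_usi helper and the USI dataclass are hypothetical names introduced here, not part of iProX or any ProteomeXchange library, and the example identifier is illustrative rather than taken from this record. The parsing is deliberately simplified and does not cover every rule of the USI specification.

```python
from dataclasses import dataclass
from typing import Optional

# Index types assumed for this sketch.
INDEX_TYPES = ("scan", "index", "nativeId")


@dataclass
class USI:
    collection: str                 # dataset identifier, e.g. a PXD accession
    ms_run: str                     # name of the MS run (file) in the dataset
    index_type: str                 # how the spectrum is addressed
    index: str                      # the spectrum index value
    interpretation: Optional[str]   # optional peptide/charge interpretation


def parse_usi(usi: str) -> USI:
    """Split a USI string into its components (simplified, hypothetical helper)."""
    parts = usi.split(":")
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    # The run name may itself contain colons, so look for the first
    # index-type token after the collection and at least one run field.
    for i in range(3, len(parts) - 1):
        if parts[i] in INDEX_TYPES:
            interpretation = ":".join(parts[i + 2:]) or None
            return USI(
                collection=parts[1],
                ms_run=":".join(parts[2:i]),
                index_type=parts[i],
                index=parts[i + 1],
                interpretation=interpretation,
            )
    raise ValueError(f"no index type found in USI: {usi!r}")


if __name__ == "__main__":
    example = "mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2"
    print(parse_usi(example))
```

A repository that resolves USIs would map the collection and run name to a stored file and then look up the spectrum by its index; how iProX does this internally is not described in this record.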

SUBMITTER: Chen T

PROVIDER: S-EPMC8728291 | biostudies-literature | 2022 Jan

REPOSITORIES: biostudies-literature

Publications

iProX in 2021: connecting proteomics data sharing with big data.

Chen T, Ma J, Liu Y, Chen Z, Xiao N, Lu Y, Fu Y, Yang C, Li M, Wu S, Wang X, Li D, He F, Hermjakob H, Zhu Y

Nucleic Acids Research, 2022-01-01, issue D1

PMID: 34871441

Similar Datasets

Big data from small data: data-sharing in the 'long tail' of neuroscience. | S-EPMC4728080 | biostudies-literature

P2P proteomics -- data sharing for enhanced protein identification. | S-EPMC3298698 | biostudies-literature

Responsible data sharing in a big data-driven translational research platform: lessons learned. | S-EPMC6936121 | biostudies-literature

On the privacy risks of sharing clinical proteomics data. | S-EPMC5009298 | biostudies-literature

xiSPEC: web-based visualization, analysis and sharing of proteomics data. | S-EPMC6030980 | biostudies-literature

MatSwarm: trusted swarm transfer learning driven materials computation for secure big data sharing. | S-EPMC11519480 | biostudies-literature

CloudProteoAnalyzer: scalable processing of big data from proteomics using cloud computing. | S-EPMC10942798 | biostudies-literature

The ProteomeXchange consortium in 2020: enabling 'big data' approaches in proteomics. | S-EPMC7145525 | biostudies-literature

When big data initiatives meet: Data sharing between THANADOS and IsoArcH for early medieval cemeteries in Austria. | S-EPMC10293964 | biostudies-literature

A proteomics sample metadata representation for multiomics integration and big data analysis. | S-EPMC8494749 | biostudies-literature