Dataset Information

CloudProteoAnalyzer: scalable processing of big data from proteomics using cloud computing.


ABSTRACT:

Summary

Shotgun proteomics is widely used in systems biology studies to determine the global protein expression profiles of tissues, cultures, and microbiomes. Many non-distributed computer algorithms have been developed for users to process proteomics data on their local computers. However, the amount of data acquired in a typical proteomics study has grown rapidly in recent years, owing to the increasing throughput of mass spectrometry and the expanding scale of study designs. This presents a big data challenge for researchers who need to process proteomics data in a timely manner. To overcome this challenge, we developed a cloud-based parallel computing application that offers end-to-end proteomics data analysis as software as a service (SaaS). A web interface lets users upload mass spectrometry-based proteomics data, configure parameters, submit jobs, and monitor job status. The data processing is distributed across multiple nodes in a supercomputer to achieve scalability for large datasets. Our study demonstrates that SaaS for proteomics is a viable way for the community to scale up data processing using cloud computing.

Availability and implementation

This application is freely available online at https://sipros.oscer.ou.edu/ or https://sipros.unt.edu. The source code is available at https://github.com/Biocomputing-Research-Group/CloudProteoAnalyzer under the GPL version 3.0 license.

SUBMITTER: Li J 

PROVIDER: S-EPMC10942798 | biostudies-literature | 2024

REPOSITORIES: biostudies-literature

Publications

CloudProteoAnalyzer: scalable processing of big data from proteomics using cloud computing.

Li Jiancheng, Xiong Yi, Feng Shichao, Pan Chongle, Guo Xuan

Bioinformatics Advances, 2024-02-23, 1


Similar Datasets

S-EPMC8323418 | biostudies-literature
S-EPMC3247927 | biostudies-other
S-EPMC7319573 | biostudies-literature
S-EPMC8991469 | biostudies-literature
S-EPMC9718390 | biostudies-literature
S-EPMC10962079 | biostudies-literature
S-EPMC4350034 | biostudies-literature
S-EPMC7897334 | biostudies-literature
S-EPMC5041595 | biostudies-literature
S-EPMC6745024 | biostudies-literature