ABSTRACT: Summary
A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours when processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster using spectral datasets from environmental microbial communities as inputs.
Availability
The source code and user documentation are available at http://compbio.eecs.wsu.edu/MR-MSPolygraph.
Contact
ananth@eecs.wsu.edu; william.cannon@pnnl.gov.
Supplementary information
Supplementary data are available at Bioinformatics online.
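Illustrative sketch
The summary does not spell out how the work is divided across the cluster. The Python sketch below only illustrates the general map/reduce pattern for spectrum matching, assuming each experimental spectrum can be scored independently of the others. The function names, the toy peak lists, and the shared-peak score are all hypothetical stand-ins; they are not the MSPolygraph scoring model or the MR-MSPolygraph code.

```python
from collections import defaultdict


def score(spectrum_peaks, candidate_peaks):
    """Toy similarity: number of shared peaks (an assumption, not MSPolygraph's model)."""
    return len(set(spectrum_peaks) & set(candidate_peaks))


def map_spectrum(spectrum_id, spectrum_peaks, candidates):
    """Map step: score one experimental spectrum against every candidate peptide."""
    for name, peaks in candidates.items():
        yield spectrum_id, (score(spectrum_peaks, peaks), name)


def reduce_hits(spectrum_id, scored, top_k=1):
    """Reduce step: keep the top_k highest-scoring candidates for one spectrum."""
    return spectrum_id, sorted(scored, reverse=True)[:top_k]


def run(spectra, candidates, top_k=1):
    """Serial driver that mimics the map -> group-by-key -> reduce flow."""
    grouped = defaultdict(list)
    for spectrum_id, peaks in spectra.items():
        for key, value in map_spectrum(spectrum_id, peaks, candidates):
            grouped[key].append(value)
    return dict(reduce_hits(sid, values, top_k) for sid, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical inputs: peak lists rounded to integer m/z values.
    spectra = {"s1": [100, 200, 300], "s2": [150, 250]}
    candidates = {"PEPTIDEA": [100, 200, 999], "PEPTIDEB": [150, 250, 300]}
    print(run(spectra, candidates))
```

Because no spectrum depends on another in this sketch, the map calls could be distributed across workers without coordination, which is the property that makes this kind of workload a natural fit for a Hadoop cluster.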
SUBMITTER: Kalyanaraman A
PROVIDER: S-EPMC3198583 | biostudies-literature | 2011 Nov
REPOSITORIES: biostudies-literature
Ananth Kalyanaraman, William R. Cannon, Benjamin Latt, Douglas J. Baxter
Bioinformatics (Oxford, England), 2011-09-16, issue 21