Dataset Information

In-depth method assessments of differentially expressed protein detection for shotgun proteomics data with missing values.

ABSTRACT: As one of the major goals in quantitative proteomics, detection of differentially expressed proteins (DEPs) plays an important role in biomarker selection and clinical diagnostics. Many algorithms and tools for DEP detection have been developed in proteomics research. However, because these methods differ in application scope and experimental designs vary, the best choice for large-scale proteomics data analysis is not apparent. Moreover, because proteomics data usually contain a high percentage of missing values (MVs) but few replicates, a systematic evaluation of DEP detection methods combined with MV imputation methods is essential and urgent. Here, we analyzed a total of four representative imputation methods and five DEP methods on different experimental and simulated datasets. The results showed that (i) MV imputation did not always improve the performance of DEP detection methods, and the effect of imputation varied with the percentage of missing values; (ii) the DEP detection methods differed in statistical power, which was affected by the percentage of MVs. Two statistical methods (i.e., the empirical Bayesian random censoring threshold model and the significance analysis of microarrays) performed better than the other evaluated methods in terms of accuracy and sensitivity.

SUBMITTER: Wang J

PROVIDER: S-EPMC5469784 | biostudies-literature | 2017 Jun

REPOSITORIES: biostudies-literature

Publications

In-depth method assessments of differentially expressed protein detection for shotgun proteomics data with missing values.

Wang Jinxia, Li Liwei, Chen Tao, Ma Jie, Zhu Yunping, Zhuang Jujuan, Chang Cheng

Scientific Reports, 2017 Jun 13, issue 1

PMID: 28611393

Similar Datasets

Project description: Background: The cornea is a specialized transparent connective tissue responsible for the majority of light refraction and image focus for the retina. There are three main layers of the cornea: the epithelium, which is exposed and acts as a protective barrier for the eye; the central stroma, consisting of parallel collagen fibrils that refract light; and the endothelium, which is responsible for hydration of the cornea from the aqueous humor. Normal cornea is an immunologically privileged tissue devoid of blood vessels, but injury can disrupt these conditions, allowing invasion of other processes that degrade its homeostatic properties and decrease the amount of light refracted onto the retina. Determining a measure and drift of phenotypic cornea state from normal to an injured or diseased state requires knowledge of the existing protein signature within the tissue. In the study of corneal proteins, proteomics procedures have typically involved the pulverization of the entire cornea prior to analysis. Separating the epithelium and endothelium from the core stroma and performing separate shotgun proteomics using liquid chromatography/mass spectrometry identifies many more proteins than previously employed methods using complete pulverized cornea.
Results: Rabbit corneas were purchased, the epithelium and endothelium regions were removed, and proteins were processed and separately analyzed using liquid chromatography/mass spectrometry. Proteins identified from separate layers were compared against results from complete corneal samples. Protein digests were separated using a six-hour liquid chromatographic gradient, and ion-trap mass spectrometry was used for detection of eluted peptide fractions. The SEQUEST database search results were filtered to allow only proteins with match probabilities equal to or better than 10⁻³ and peptides with a probability of 10⁻² or less, with at least two unique peptides isolated within the run, along with default Xcorr values. These parameters resulted in the identification of over 350 proteins, including over 225 new proteins not previously detected in the cornea by mass spectrometry. In addition, corneal layer separation resulted in identification of nearly every protein that was identified in the complete cornea assay. The epithelium and endothelium each revealed many proteins specific to that layer. In the endothelium, the protein olfactomedin-like 3 was identified for the first time in the cornea by this analysis. Olfactomedin-3 is a neuronally expressed protein, also known as optimedin, that stimulates formation of cell adherent and cell-cell tight junctions; its expression modulates cytoskeleton organization and cell migration. However, the function of this protein in rabbit corneal endothelium is currently unknown.
Conclusion: This manuscript presents a more comprehensive proteomic profile for mammalian cornea compared to past methods. With simple dissection of the tissue and the application of long chromatographic gradients, many more proteins can be identified.

Project description:Label-free quantification of shotgun LC-MS/MS data is the prevailing approach in quantitative proteomics but remains computationally nontrivial. The central data analysis step is the detection of peptide-specific signal patterns, called features. Peptide quantification is facilitated by associating signal intensities in features with peptide sequences derived from MS2 spectra; however, missing values due to imperfect feature detection are a common problem. A feature detection approach that directly targets identified peptides (minimizing missing values) but also offers robustness against false-positive features (by assigning meaningful confidence scores) would thus be highly desirable. We developed a new feature detection algorithm within the OpenMS software framework, leveraging ideas and algorithms from the OpenSWATH toolset for DIA/SRM data analysis. Our software, FeatureFinderIdentification ("FFId"), implements a targeted approach to feature detection based on information from identified peptides. This information is encoded in an MS1 assay library, based on which ion chromatogram extraction and detection of feature candidates are carried out. Significantly, when analyzing data from experiments comprising multiple samples, our approach distinguishes between "internal" and "external" (inferred) peptide identifications (IDs) for each sample. On the basis of internal IDs, two sets of positive (true) and negative (decoy) feature candidates are defined. A support vector machine (SVM) classifier is then trained to discriminate between the sets and is subsequently applied to the "uncertain" feature candidates from external IDs, facilitating selection and confidence scoring of the best feature candidate for each peptide. This approach also enables our algorithm to estimate the false discovery rate (FDR) of the feature selection step. 
We validated FFId based on a public benchmark data set, comprising a yeast cell lysate spiked with protein standards that provide a known ground-truth. The algorithm reached almost complete (>99%) quantification coverage for the full set of peptides identified at 1% FDR (PSM level). Compared with other software solutions for label-free quantification, this is an outstanding result, which was achieved at competitive quantification accuracy and reproducibility across replicates. The FDR for the feature selection was estimated at a low 1.5% on average per sample (3% for features inferred from external peptide IDs). The FFId software is open-source and freely available as part of OpenMS (www.openms.org).
