Dataset Information

A cost-sensitive online learning method for peptide identification.

ABSTRACT:

Background

Post-database search is a key procedure in peptide identification with tandem mass spectrometry (MS/MS): it refines the peptide-spectrum matches (PSMs) generated by database search engines. Although many statistical and machine learning-based methods have been developed to improve the accuracy of peptide identification, large-scale datasets and datasets with an unbalanced distribution of PSMs remain challenging. A more efficient learning strategy is required to improve identification accuracy on such datasets. Complex learning models have greater classification power, but they may overfit and are computationally expensive on large-scale datasets. Kernel methods map data from the sample space to a high-dimensional space where relationships in the data can be simpler to model.

Results

To tackle the computational challenge of applying a kernel-based learning model to practical peptide identification problems, we present an online learning algorithm, OLCS-Ranker, which feeds only one training sample into the learning model at each round; as a result, the memory required for computation is significantly reduced. We also propose a cost-sensitive learning model for OLCS-Ranker that assigns a larger loss to decoy PSMs than to target PSMs in the loss function.
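The two ideas in this paragraph, one-sample-per-round updates and a larger loss for decoys, can be illustrated with a generic kernelized online learner. The sketch below is not the OLCS-Ranker algorithm, whose details the abstract does not give. It assumes a Gaussian (RBF) kernel, a class-weighted hinge loss, and a plain functional gradient step. The class name, the cost values, the learning rate, and the synthetic two-feature PSM stream are all hypothetical.

```python
import math
import random


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class OnlineCostSensitiveKernelClassifier:
    """Kernelized online learner with class-dependent hinge-loss costs.

    Hypothetical illustration only; not the OLCS-Ranker implementation.
    """

    def __init__(self, cost_decoy=2.0, cost_target=1.0, lr=0.5, gamma=0.5):
        # Assumed costs: a margin violation on a decoy (-1) is penalized
        # more heavily than one on a target (+1).
        self.cost = {-1: cost_decoy, +1: cost_target}
        self.lr = lr
        self.gamma = gamma
        self.support = []  # (vector, coefficient) pairs retained so far

    def score(self, x):
        return sum(c * rbf_kernel(v, x, self.gamma) for v, c in self.support)

    def partial_fit(self, x, y):
        """Consume a single labeled PSM (y = +1 target, y = -1 decoy)."""
        if y * self.score(x) < 1.0:  # weighted hinge loss is active
            # Gradient step on cost[y] * max(0, 1 - y * f(x)).
            self.support.append((x, self.lr * self.cost[y] * y))


def make_stream(n=200, seed=0):
    """Synthetic stream: targets cluster near (2, 2), decoys near (0, 0)."""
    rng = random.Random(seed)
    for _ in range(n):
        y = rng.choice([+1, -1])
        center = 2.0 if y == +1 else 0.0
        yield (rng.gauss(center, 0.3), rng.gauss(center, 0.3)), y


model = OnlineCostSensitiveKernelClassifier()
for x, y in make_stream():
    model.partial_fit(x, y)  # one training sample per round

print(model.score((2.0, 2.0)), model.score((0.0, 0.0)))
```

Only samples that violate the margin are stored, so the model never needs the full training set in memory. A practical implementation would also need regularization or a cap on the number of stored samples, which this sketch omits.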

Conclusions

The new model can reduce the false discovery rate on datasets with an unbalanced distribution of PSMs. Experimental studies show that OLCS-Ranker outperforms other methods in accuracy and stability, especially on such datasets. Furthermore, OLCS-Ranker is 15 to 85 times faster than CRanker.
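For readers unfamiliar with the false discovery rate in a target-decoy setting, here is a minimal sketch of one common estimate: the number of decoy PSMs accepted at a score threshold divided by the number of target PSMs accepted. The abstract does not state which estimator the authors use, so the formula, the function name, and the example scores are assumptions.

```python
def estimate_fdr(psms, threshold):
    """Estimate FDR at a score threshold from a list of (score, is_decoy) pairs.

    Assumed estimator: accepted decoys / accepted targets.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical scored PSMs: (score, is_decoy)
psms = [(0.9, False), (0.8, False), (0.7, True),
        (0.6, False), (0.4, True), (0.3, False)]

print(estimate_fdr(psms, 0.5))   # 1 decoy and 3 targets accepted
print(estimate_fdr(psms, 0.75))  # no decoys accepted
```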

SUBMITTER: Liang X

PROVIDER: S-EPMC7183122 | biostudies-literature | 2020 Apr

REPOSITORIES: biostudies-literature

Publications

A cost-sensitive online learning method for peptide identification.

Liang X, Xia Z, Jian L, Wang Y, Niu X, Link AJ

BMC Genomics, 2020-04-25, issue 1

PMID: 32334531

Similar Datasets

Project description:Deep learning-based models have been employed for the detection and classification of skin diseases through medical imaging. However, these models are not effective for detecting and classifying rare skin diseases, mainly because rare skin diseases have very few data samples. The dataset is therefore highly imbalanced, and the resulting bias in learning skews the performance that most models report. Deep learning models are also not effective at detecting the tiny affected portions of skin within the overall image. This paper presents an attention-cost-sensitive deep learning-based feature fusion ensemble meta-classifier approach for skin cancer detection and classification. Cost weights are included in the deep learning models to handle the data imbalance during training. To learn the optimal features from the tiny affected portions of skin image samples, attention is integrated into the deep learning models. Features are extracted from the fine-tuned models, and their dimensionality is further reduced using kernel principal component analysis (KPCA). The reduced features of the fine-tuned models are fused and passed into ensemble meta-classifiers for skin disease detection and classification. The ensemble meta-classifier is a two-stage model: the first stage performs the prediction of skin disease, and the second stage performs the classification using the first-stage predictions as features. A detailed analysis of the proposed approach is given for both skin disease detection and skin disease classification. The proposed approach achieved an accuracy of 99% on skin disease detection and 99% on skin disease classification. In all experimental settings, it outperformed the existing methods, improving accuracy by 4% for skin disease detection and by 9% for skin disease classification. The proposed approach can be used as a computer-aided diagnosis (CAD) tool for the early detection and classification of skin cancer in healthcare and medical environments. The tool can accurately detect skin diseases and classify each into its disease family.
