Dataset Information

Predicting the multi-label protein subcellular localization through multi-information fusion and MLSI dimensionality reduction based on MLFE classifier.


ABSTRACT: Multi-label protein subcellular localization (SCL) is an indispensable way to study protein function. It can locate a given protein (such as the human transmembrane protein that promotes the invasion of SARS-CoV-2) or expression product at a specific location in a cell, which can provide a reference for the clinical treatment of diseases such as COVID-19. The paper proposes a novel method named ML-locMLFE. First, six feature extraction methods are adopted to obtain effective protein information: pseudo amino acid composition (PseAAC), encoding based on grouped weight (EBGW), gene ontology (GO), multi-scale continuous and discontinuous (MCD), residue probing transformation (RPT) and evolutionary distance transformation (EDT). Next, the multi-label information latent semantic index (MLSI) method is used for dimensionality reduction, to avoid interference from redundant information. Finally, multi-label learning with feature induced labeling information enrichment (MLFE) is adopted to predict the multi-label protein SCL. The Gram-positive bacteria dataset is chosen as the training set, while the Gram-negative bacteria, virus, newPlant and SARS-CoV-2 datasets serve as the test sets. Under leave-one-out cross validation (LOOCV), the overall actual accuracy (OAA) on the first four datasets is 99.23%, 93.82%, 93.24% and 96.72%, respectively. On the SARS-CoV-2 dataset, the predictor achieves an OAA of 72.73%. The results indicate that the ML-locMLFE method has clear advantages in predicting multi-label protein SCL and offers new ideas for further research on the topic. The source codes and data are publicly available at https://github.com/QUST-AIBBDRC/ML-locMLFE/. Supplementary data are available at Bioinformatics online.
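
The abstract reports results as overall actual accuracy (OAA) but does not define it. The sketch below assumes the definition commonly used in the multi-label SCL literature: a protein counts as correct only if its predicted set of locations exactly matches the true set, with no over- or under-prediction. The function name, the variable names and the location labels are hypothetical illustrations and are not taken from the ML-locMLFE source code.

```python
def overall_actual_accuracy(true_sets, pred_sets):
    """Fraction of proteins whose predicted location set equals the true set.

    Assumption: OAA is the exact-match (subset) accuracy over label sets.
    """
    if len(true_sets) != len(pred_sets):
        raise ValueError("true_sets and pred_sets must have the same length")
    if not true_sets:
        raise ValueError("at least one protein is required")
    # A protein is counted only when every location is predicted and no
    # extra location is predicted.
    exact = sum(1 for t, p in zip(true_sets, pred_sets) if set(t) == set(p))
    return exact / len(true_sets)


# Hypothetical toy data: four proteins, some with more than one location.
true_sets = [
    {"cytoplasm"},
    {"cytoplasm", "cell membrane"},
    {"extracellular"},
    {"cell membrane", "cell wall"},
]
pred_sets = [
    {"cytoplasm"},
    {"cytoplasm"},                 # under-prediction: counted as wrong
    {"extracellular"},
    {"cell wall", "cell membrane"},
]

print(overall_actual_accuracy(true_sets, pred_sets))  # 0.75
```

Under this assumed definition, partially correct predictions earn no credit, which makes OAA a stricter measure than per-label accuracy.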

SUBMITTER: Liu Y 

PROVIDER: S-EPMC8690230 | biostudies-literature | 2021 Dec

REPOSITORIES: biostudies-literature

Publications

Predicting the multi-label protein subcellular localization through multi-information fusion and MLSI dimensionality reduction based on MLFE classifier.

Liu Yushuang, Jin Shuping, Gao Hongli, Wang Xue, Wang Congjing, Zhou Weifeng, Yu Bin

Bioinformatics (Oxford, England), 2022-02-01, issue 5


Motivation: Multi-label (ML) protein subcellular localization (SCL) is an indispensable way to study protein function. It can locate a certain protein (such as the human transmembrane protein that promotes the invasion of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)) or expression product at a specific location in a cell, which can provide a reference for clinical treatment of diseases such as coronavirus disease 2019 (COVID-19).

Results: The article proposes a n ...

Similar Datasets

| S-EPMC3068162 | biostudies-literature
| S-EPMC4914962 | biostudies-literature
| S-EPMC3117797 | biostudies-literature
| S-EPMC4765148 | biostudies-literature
| S-EPMC3374840 | biostudies-literature
| S-EPMC10877048 | biostudies-literature
| S-EPMC9252801 | biostudies-literature
| S-EPMC5001230 | biostudies-literature
| S-EPMC10829269 | biostudies-literature
| S-EPMC4860209 | biostudies-literature