
Dataset Information


Photosynthetic protein classification using genome neighborhood-based machine learning feature.


ABSTRACT: Identifying novel photosynthetic proteins is important for understanding and improving photosynthetic efficiency. Genome neighborhood can provide additional information that helps identify photosynthetic proteins. We therefore expected that a computational approach, particularly machine learning (ML) with genome neighborhood-based features, would facilitate photosynthetic function assignment. Our results revealed a functional relationship between photosynthetic genes and their conserved neighboring genes, as measured by the 'Phylo score', indicating that their functions could be inferred from the genome neighborhood profile. We therefore created a new method for extracting patterns from the genome neighborhood network (GNN) and applied these patterns to photosynthetic protein classification using ML algorithms. A random forest (RF) classifier using genome neighborhood-based features achieved the highest accuracy, up to 87%, in the classification of photosynthetic proteins and also outperformed (Matthews correlation coefficient = 0.718) other available tools, including a sequence similarity search (0.447) and an ML-based method (0.361). Furthermore, we demonstrated the ability of our model to identify novel photosynthetic proteins compared with the other methods. Our classifier is available at http://bicep2.kmutt.ac.th/photomod_standalone, https://bit.ly/2S0I2Ox and DockerHub: https://hub.docker.com/r/asangphukieo/photomod.
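
The abstract compares tools by Matthews correlation coefficient (MCC) alongside accuracy. As a reading aid, the following minimal Python sketch shows how both metrics are computed from a binary confusion matrix. The function name and the counts are hypothetical and purely illustrative; they are not taken from the study and do not reproduce its results.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal total is zero,
    # which would otherwise cause a division by zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts (NOT from the paper): photosynthetic = positive class.
tp, tn, fp, fn = 40, 45, 5, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = matthews_corrcoef_from_counts(tp, tn, fp, fn)

print(f"accuracy = {accuracy:.3f}")
print(f"MCC      = {mcc:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix and ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it more informative when the two classes are imbalanced.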

SUBMITTER: Sangphukieo A 

PROVIDER: S-EPMC7189237 | biostudies-literature | 2020 Apr

REPOSITORIES: biostudies-literature


Publications

Photosynthetic protein classification using genome neighborhood-based machine learning feature.

Apiwat Sangphukieo, Teeraphan Laomettachit, Marasri Ruengjitchatchawalya

Scientific Reports, 2020-04-28, issue 1



Similar Datasets

| S-EPMC7180962 | biostudies-literature
2016-11-23 | GSE85539 | GEO
| S-EPMC7459797 | biostudies-literature
| S-EPMC10115869 | biostudies-literature
| S-EPMC7927045 | biostudies-literature
| S-EPMC7815773 | biostudies-literature
| S-EPMC5088188 | biostudies-literature
| S-EPMC9098537 | biostudies-literature
| S-EPMC6875180 | biostudies-literature
| S-EPMC5736185 | biostudies-other