Dataset Information

Within- and cross-species predictions of plant specialized metabolism genes using transfer learning.


ABSTRACT: Plant specialized metabolites mediate interactions between plants and the environment and have significant agronomical/pharmaceutical value. Most genes involved in specialized metabolism (SM) are unknown because of the large number of metabolites and the challenge in differentiating SM genes from general metabolism (GM) genes. Plant models like Arabidopsis thaliana have extensive, experimentally derived annotations, whereas many non-model species do not. Here we employed a machine learning strategy, transfer learning, where knowledge from A. thaliana is transferred to predict gene functions in cultivated tomato with fewer experimentally annotated genes. The first tomato SM/GM prediction model using only tomato data performs well (F-measure = 0.74, compared with 0.5 for random and 1.0 for perfect predictions), but from manually curating 88 SM/GM genes, we found many mis-predicted entries were likely mis-annotated. When the SM/GM prediction models built with A. thaliana data were used to filter out genes where the A. thaliana-based model predictions disagreed with tomato annotations, the new tomato model trained with filtered data improved significantly (F-measure = 0.92). Our study demonstrates that SM/GM genes can be better predicted by leveraging cross-species information. Additionally, our findings provide an example for transfer learning in genomics where knowledge can be transferred from an information-rich species to an information-poor one.
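The filtering step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the gene identifiers, the label strings "SM" and "GM", and the helper names `filter_by_agreement` and `f_measure` are hypothetical, and the F-measure is computed here as the standard harmonic mean of precision and recall for a single positive class, which may differ from the exact averaging used in the study.

```python
from typing import Dict, List


def filter_by_agreement(target_annotations: Dict[str, str],
                        source_model_calls: Dict[str, str]) -> Dict[str, str]:
    """Keep only genes whose target-species annotation matches the call
    made by a model trained on the source species.

    Genes with no source-model call are dropped as well (assumption).
    """
    return {
        gene: label
        for gene, label in target_annotations.items()
        if source_model_calls.get(gene) == label
    }


def f_measure(true_labels: List[str], predicted: List[str],
              positive: str = "SM") -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): tomato annotations and calls from an
# A. thaliana-trained model for the same genes.
tomato_annotations = {"geneA": "SM", "geneB": "GM", "geneC": "SM", "geneD": "GM"}
arabidopsis_model_calls = {"geneA": "SM", "geneB": "GM", "geneC": "GM", "geneD": "GM"}

# geneC is removed because the two sources disagree on its class.
filtered = filter_by_agreement(tomato_annotations, arabidopsis_model_calls)
print(sorted(filtered))
```

In this sketch the filtered dictionary would then serve as the training set for a new target-species model, whose predictions could be scored with `f_measure`.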

SUBMITTER: Moore BM 

PROVIDER: S-EPMC7731531 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature

Publications

Within- and cross-species predictions of plant specialized metabolism genes using transfer learning.

Moore, Bethany M; Wang, Peipei; Fan, Pengxiang; Lee, Aaron; Leong, Bryan; Lou, Yann-Ru; Schenck, Craig A; Sugimoto, Koichi; Last, Robert; Lehti-Shiu, Melissa D; Barry, Cornelius S; Shiu, Shin-Han

In silico Plants, 2020-07-30, issue 1


Similar Datasets

| S-EPMC6369796 | biostudies-literature
| S-EPMC7286674 | biostudies-literature
| S-EPMC8045754 | biostudies-literature
| S-EPMC6886449 | biostudies-literature
| S-EPMC9590541 | biostudies-literature
| S-EPMC10949956 | biostudies-literature
| S-EPMC5723210 | biostudies-literature
2020-04-22 | GSE142238 | GEO
| S-EPMC10217997 | biostudies-literature
| S-EPMC6603238 | biostudies-literature