Dataset Information

Discovery of novel carbohydrate-active enzymes through the rational exploration of the protein sequences space.


ABSTRACT: Over the last two decades, the number of gene/protein sequences gleaned from sequencing projects of individual genomes and environmental DNA has grown exponentially. Only a tiny fraction of these predicted proteins has been experimentally characterized, and the function of most proteins remains hypothetical or only predicted based on sequence similarity. Despite the development of postgenomic methods, such as transcriptomics, proteomics, and metabolomics, the assignment of function to protein sequences remains one of the main challenges in modern biology. As in all classes of proteins, the growing number of predicted carbohydrate-active enzymes (CAZymes) has not been accompanied by a systematic and accurate attribution of function. Taking advantage of the CAZy database, which groups CAZymes into families and subfamilies based on amino acid similarities, we recombinantly produced 564 proteins selected from subfamilies without any biochemically characterized representatives, from distant relatives of characterized enzymes and from nonclassified proteins that show little similarity with known CAZymes. Screening these proteins for activity on a wide collection of carbohydrate substrates led to the discovery of 13 CAZyme families (two of which were also discovered by others during the course of our work), revealed three previously unknown substrate specificities, and assigned a function to 25 subfamilies.

SUBMITTER: Helbert W 

PROVIDER: S-EPMC6442616 | biostudies-literature | 2019 Mar

REPOSITORIES: biostudies-literature

Publications

Discovery of novel carbohydrate-active enzymes through the rational exploration of the protein sequences space.

William Helbert, Laurent Poulet, Sophie Drouillard, Sophie Mathieu, Mélanie Loiodice, Marie Couturier, Vincent Lombard, Nicolas Terrapon, Jeremy Turchetto, Renaud Vincentelli, Bernard Henrissat

Proceedings of the National Academy of Sciences of the United States of America, 2019-03-08, issue 13


Similar Datasets

| S-EPMC7348039 | biostudies-literature
| S-EPMC4978508 | biostudies-literature
| S-EPMC3965031 | biostudies-literature
| S-EPMC3261701 | biostudies-literature
| S-EPMC2696019 | biostudies-literature
| S-EPMC4636310 | biostudies-literature
| S-EPMC8555327 | biostudies-literature
| S-EPMC8159259 | biostudies-literature
| S-EPMC4125193 | biostudies-literature
| S-EPMC3178493 | biostudies-literature