Dataset Information

Predicting the capsid architecture of phages from metagenomic data.


ABSTRACT: Tailed phages are viruses that infect bacteria and are the most abundant biological entities on Earth. Their ecological, evolutionary, and biogeochemical roles on the planet stem from their genomic diversity. Known tailed phage genomes range from 10 to 735 kilobase pairs thanks to the size variability of the protective protein capsids that store them. However, the role of tailed phage capsid diversity in ecosystems is unclear. A fundamental gap is the difficulty of associating genomic information with viral capsids in the environment. To address this problem, we introduce a computational approach to predict the capsid architecture (T-number) of tailed phages using the sequence of a single gene, the major capsid protein. This approach relies on an allometric model that relates the genome length and capsid architecture of tailed phages. This allometric model was applied to isolated phage genomes to generate a library that associated major capsid proteins with putative capsid architectures. This library was used to train machine learning methods, and the most computationally scalable model investigated (random forest) was applied to human gut metagenomes. Compared to isolated phages, the analysis of gut data reveals a large abundance of mid-sized (T = 7) capsids, as expected, followed by a relatively large frequency of jumbo-like tailed phage capsids (T ≥ 25) and small capsids (T = 4) that have been under-sampled. We discuss how to increase the method's accuracy and how to extend the approach to other viruses. The computational pipeline introduced here opens the door to monitoring the ongoing evolution and selection of viral capsids across ecosystems.
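As a rough illustration of the labeling step described in the abstract, the sketch below enumerates icosahedral T-numbers with the standard Caspar-Klug rule T = h^2 + hk + k^2 and snaps a continuous estimate to the nearest allowed value. It is not the authors' pipeline. The power-law form `genome_kbp = a * T**b` is an assumption about what "allometric" means here, the coefficients `a` and `b` are not given in this record and must be supplied by the caller, and all function names are hypothetical.

```python
import math


def caspar_klug_t_numbers(t_max):
    """Return the sorted T-numbers T = h^2 + h*k + k^2 that do not exceed t_max."""
    limit = math.isqrt(t_max)  # h^2 <= T, so h cannot exceed floor(sqrt(t_max))
    values = set()
    for h in range(1, limit + 1):
        for k in range(0, h + 1):  # k <= h is enough: the formula is symmetric
            t = h * h + h * k + k * k
            if t <= t_max:
                values.add(t)
    return sorted(values)


def raw_t_from_genome_length(genome_kbp, a, b):
    """Invert an ASSUMED power law genome_kbp = a * T**b.

    a and b are placeholders for fitted coefficients; this record does not
    state them, so no defaults are provided.
    """
    return (genome_kbp / a) ** (1.0 / b)


def nearest_t_number(raw_t, candidates):
    """Snap a continuous estimate to the closest allowed T-number."""
    return min(candidates, key=lambda t: abs(t - raw_t))
```

In a workflow like the one the abstract outlines, labels produced this way for isolated phage genomes would be paired with major capsid protein sequences to train a classifier; the feature encoding and model settings are not described in this record, so they are left out of the sketch.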

SUBMITTER: Lee DY 

PROVIDER: S-EPMC8814770 | biostudies-literature | 2022

REPOSITORIES: biostudies-literature

Publications

Predicting the capsid architecture of phages from metagenomic data.

Lee DY, Bartels C, McNair K, Edwards RA, Swairjo MA, Luque A

Computational and Structural Biotechnology Journal, 2022-01-05



Similar Datasets

| S-EPMC7692497 | biostudies-literature
| S-EPMC4547618 | biostudies-literature
| S-EPMC5887522 | biostudies-literature
| S-EPMC7762592 | biostudies-literature
| S-EPMC11411106 | biostudies-literature
| S-EPMC4879823 | biostudies-literature
| S-EPMC3940491 | biostudies-literature
| S-EPMC9778933 | biostudies-literature
| S-EPMC5758519 | biostudies-literature
| S-EPMC11500453 | biostudies-literature