Dataset Information

MetaGeneHunt for protein domain annotation in short-read metagenomes.


ABSTRACT: The annotation of short-read metagenomes is an essential step in understanding the functional potential of sequenced microbial communities. Annotation techniques based solely on the identification of local matches tend to confound local sequence similarity with overall protein homology and thus do not mirror the complex multidomain architecture and the shuffling of functional domains in many protein families. Here, we present MetaGeneHunt, which identifies specific protein domains and normalizes the hit counts based on the domain length. We used MetaGeneHunt to investigate the potential for carbohydrate processing in the mouse gastrointestinal tract. We sampled, sequenced, and analyzed the microbial communities associated with the bolus in the stomach, intestine, cecum, and colon of five captive mice. Focusing on Glycoside Hydrolases (GHs), we found that, across samples, 58.3% of the 4,726,023 short-read sequences matching a GH domain-containing protein were located outside the domain of interest. Next, before comparing the samples, the counts of localized hits matching the domains of interest were normalized to account for the corresponding domain length. Microbial communities in the intestine and cecum displayed characteristic GH profiles matching distinct microbial assemblages. Conversely, the stomach and colon were associated with structurally and functionally more diverse and variable microbial communities. Across samples, despite fluctuations, changes in the functional potential for carbohydrate processing correlated with changes in community composition. Overall, MetaGeneHunt is a new way to quickly and precisely identify discrete protein domains in sequenced metagenomes processed with MG-RAST. In addition, by using the sister program "GeneHunt" to create a custom Reference Annotation Table, MetaGeneHunt provides an unprecedented way to (re)investigate the precise distribution of any protein domain in short-read metagenomes.
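The two steps described in the abstract, keeping only hits that fall within a domain of interest and normalizing the resulting counts by domain length, can be illustrated with a minimal Python sketch. This is not the MetaGeneHunt implementation: the function names, the input layout (hits as protein coordinates, domains as named coordinate ranges), the one-residue overlap threshold, and the per-hit 1/length weighting are all assumptions made for illustration, and the example data are invented.

```python
from collections import defaultdict


def overlap_length(hit_start, hit_end, dom_start, dom_end):
    """Number of positions shared by two inclusive coordinate ranges (0 if none)."""
    return max(0, min(hit_end, dom_end) - max(hit_start, dom_start) + 1)


def count_domain_hits(hits, domains, min_overlap=1):
    """Count localized hits per domain, raw and length-normalized.

    hits:    iterable of (protein_id, start, end) for each read match
    domains: dict mapping protein_id -> list of (domain_name, start, end)
    Both layouts are hypothetical, chosen only for this sketch.
    Returns (raw_counts, normalized_counts, n_outside).
    """
    raw = defaultdict(int)
    normalized = defaultdict(float)
    n_outside = 0
    for protein_id, hit_start, hit_end in hits:
        localized = False
        for name, dom_start, dom_end in domains.get(protein_id, []):
            if overlap_length(hit_start, hit_end, dom_start, dom_end) >= min_overlap:
                dom_len = dom_end - dom_start + 1
                raw[name] += 1
                # Assumed normalization: each hit is weighted by 1 / domain length.
                normalized[name] += 1.0 / dom_len
                localized = True
        if not localized:
            # The read matched the protein but not any domain of interest.
            n_outside += 1
    return dict(raw), dict(normalized), n_outside


# Invented example: two proteins, each carrying one GH domain.
example_domains = {
    "protA": [("GH5", 50, 349)],   # domain length 300
    "protB": [("GH13", 10, 109)],  # domain length 100
}
example_hits = [
    ("protA", 40, 70),    # overlaps GH5
    ("protA", 400, 430),  # outside GH5
    ("protA", 1, 30),     # outside GH5
    ("protB", 20, 50),    # inside GH13
    ("protB", 60, 90),    # inside GH13
]
raw_counts, norm_counts, outside = count_domain_hits(example_hits, example_domains)
```

In this toy input, two of the five hits match a GH-containing protein outside its GH domain, and the shorter GH13 domain ends up with a higher normalized value per hit than the longer GH5 domain, which is the effect the length normalization is meant to capture.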

SUBMITTER: Berlemont R 

PROVIDER: S-EPMC7205989 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC9561172 | biostudies-literature
| S-EPMC9262707 | biostudies-literature
2023-09-01 | GSE225380 | GEO
2008-11-04 | E-GEOD-13441 | biostudies-arrayexpress
| S-EPMC5425171 | biostudies-other
| S-EPMC6612896 | biostudies-literature
| S-EPMC3464235 | biostudies-literature
| S-EPMC9628182 | biostudies-literature
| S-EPMC3821552 | biostudies-literature
| PRJNA935371 | ENA