
Dataset Information


Domain-centric database to uncover structure of minimally characterized viral genomes.


ABSTRACT: Protein domain-based approaches to analyzing sequence data are valuable tools for examining and exploring genomic architecture across genomes of different organisms. Here, we present a complete dataset of domains from the publicly available sequence data of 9,051 reference viral genomes. The data provided contain information such as sequence position and neighboring domains from 30,947 pHMM-identified domains from each reference viral genome. Domains were identified from viral whole-genome sequence using automated profile Hidden Markov Models (pHMM). This study also describes the framework for constructing "domain neighborhoods", as well as the dataset representing it. These data can be used to examine shared and differing domain architectures across viral genomes, to elucidate potential functional properties of genes, and potentially to classify viruses.
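The "domain neighborhoods" idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline or the dataset's actual schema: the `DomainHit` record, its field names, the example domain names and coordinates, and the window size `k` are all assumptions made for illustration. The sketch assumes only that each domain hit has a genome identifier and a sequence position, and that a hit's neighborhood is the set of domains closest to it in genome order.

```python
from collections import defaultdict
from typing import NamedTuple


class DomainHit(NamedTuple):
    # Hypothetical record layout; the real dataset's columns may differ.
    genome: str   # genome identifier
    domain: str   # name of the domain matched by a pHMM
    start: int    # start coordinate of the hit in the genome
    end: int      # end coordinate of the hit in the genome


def domain_neighborhoods(hits, k=1):
    """Map each hit to (upstream, downstream) lists of up to k neighboring domain names."""
    by_genome = defaultdict(list)
    for hit in hits:
        by_genome[hit.genome].append(hit)

    neighborhoods = {}
    for genome_hits in by_genome.values():
        # Order hits along the genome so that neighbors are adjacent in the list.
        ordered = sorted(genome_hits, key=lambda h: (h.start, h.end))
        for i, hit in enumerate(ordered):
            upstream = [h.domain for h in ordered[max(0, i - k):i]]
            downstream = [h.domain for h in ordered[i + 1:i + 1 + k]]
            neighborhoods[hit] = (upstream, downstream)
    return neighborhoods


# Made-up example hits, not taken from the dataset.
hits = [
    DomainHit("genome_A", "capsid", 100, 400),
    DomainHit("genome_A", "helicase", 900, 1500),
    DomainHit("genome_A", "protease", 450, 800),
    DomainHit("genome_B", "helicase", 50, 600),
]
result = domain_neighborhoods(hits, k=1)
```

Comparing the resulting neighbor lists for the same domain in different genomes is one simple way to look for shared or differing domain architectures, as the abstract suggests.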

SUBMITTER: Bramley JC 

PROVIDER: S-EPMC7316859 | biostudies-literature | 2020 Jun

REPOSITORIES: biostudies-literature


Publications

Domain-centric database to uncover structure of minimally characterized viral genomes.

John C Bramley, Alex L Yenkin, Mark A Zaydman, Aaron DiAntonio, Jeffrey D Milbrandt, William J Buchser

Scientific Data, 2020 Jun 25, issue 1



Similar Datasets

| S-EPMC155287 | biostudies-literature
| S-EPMC3531119 | biostudies-literature
| S-EPMC3013797 | biostudies-literature
| S-EPMC5054492 | biostudies-literature
| S-EPMC7614987 | biostudies-literature
| S-EPMC4885607 | biostudies-literature
| S-EPMC4699468 | biostudies-literature
| S-EPMC5753281 | biostudies-literature
| S-EPMC3965063 | biostudies-literature
| S-EPMC7809626 | biostudies-literature