Dataset Information

Applying Shannon's information theory to bacterial and phage genomes and metagenomes.


ABSTRACT: All sequence data contain inherent information that can be measured by Shannon's uncertainty theory. Such measurement is valuable in evaluating large data sets, such as metagenomic libraries, to prioritize their analysis and annotation, thus saving computational resources. Here, Shannon's index of complete phage and bacterial genomes was examined. The information content of a genome was found to be highly dependent on the genome length, GC content, and sequence word size. In metagenomic sequences, the amount of information correlated with the number of matches found by comparison to sequence databases. A sequence with more information (higher uncertainty) has a higher probability of being significantly similar to other sequences in the database. Measuring uncertainty may be used for rapidly screening for sequences with matches in available databases, prioritizing computational resources, and indicating which sequences with no known similarities are likely to be important for more detailed analysis.
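
As a rough illustration of the measurement the abstract describes, the sketch below computes the textbook Shannon uncertainty, H = -sum(p * log2(p)), over the frequencies of overlapping words of a chosen size in a sequence. The function name `shannon_uncertainty`, the use of overlapping words, and the absence of any length or GC normalization are assumptions made for this example; the record does not state the exact formulation used in the study.

```python
import math
from collections import Counter


def shannon_uncertainty(sequence: str, word_size: int = 3) -> float:
    """Return the Shannon entropy (in bits) of overlapping word frequencies.

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    seq = sequence.upper()
    n_words = len(seq) - word_size + 1
    if word_size < 1 or n_words < 1:
        raise ValueError("word_size must be >= 1 and no longer than the sequence")

    # Count every overlapping word of length word_size.
    counts = Counter(seq[i:i + word_size] for i in range(n_words))

    # H = -sum(p * log2(p)) over the observed word frequencies.
    return -sum((c / n_words) * math.log2(c / n_words) for c in counts.values())


if __name__ == "__main__":
    # Toy sequences, not real genomic data.
    for toy in ("AAAAAAAAAAAA", "ACGTACGTACGT", "ACGGTCATTGCA"):
        print(toy, shannon_uncertainty(toy, word_size=2))
```

With this definition, a sequence made of a single repeated word scores zero, and the score grows as word usage becomes more even, which is why the chosen word size and the sequence length both affect the result.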

SUBMITTER: Akhter S 

PROVIDER: S-EPMC3539204 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature

Publications

Applying Shannon's information theory to bacterial and phage genomes and metagenomes.

Akhter S, Bailey BA, Salamon P, Aziz RK, Edwards RA

Scientific Reports, 2013-01-08



Similar Datasets

| S-EPMC3653736 | biostudies-literature
| S-EPMC4828583 | biostudies-literature
| S-EPMC7498425 | biostudies-literature
| S-EPMC4498404 | biostudies-literature
| S-EPMC5712468 | biostudies-literature
| S-EPMC7111523 | biostudies-literature
| S-EPMC4270193 | biostudies-literature
| S-EPMC1800777 | biostudies-literature
| S-EPMC8269204 | biostudies-literature
| S-EPMC4740571 | biostudies-literature