Identification of discriminative characteristics for clusters from biologic data with InforBIO software.
Ontology highlight
ABSTRACT: BACKGROUND: There are a number of different methods for generation of trees and algorithms for phylogenetic analysis in the study of bacterial taxonomy. Genotypic information, such as SSU rRNA gene sequences, now plays a more prominent role in microbial systematics than does phenotypic information. However, the integration of genotypic and phenotypic information for polyphasic studies is necessary for the classification and identification of microbes. Thus, we devised an algorithm that objectively identifies discriminative characteristics for focused clusters on generated trees from a dataset composed of coded data, such as phenotypic information. Moreover, this algorithm has been integrated into the polyphasic analysis software, InforBIO. RESULTS: We developed a differential-character-finding algorithm based on information measures and used this algorithm to identify the characteristic that best discriminates operational taxonomic unit clusters. For all characteristics in a dataset, the algorithm estimates commonality in focused clusters and diversity among clusters by scoring based on Shannon's and relative entropies. All the characteristics selected for scoring are equally weighted. Thresholds for the scores are defined to identify discriminative characteristics for clusters efficiently from a database. The unique feature of the algorithm, which is implemented in the InforBIO software, is that it can identify the phenotypic characteristics that discriminate and are associated with the clusters of a phylogenetic tree. We successfully applied this algorithm to the study of phylogenetic clusters of Pseudomonas species. CONCLUSION: The algorithm in the InforBIO software is a novel and useful approach for microbial polyphasic studies. The algorithm can also be applied to diverse cluster analyses. The InforBIO software is available from the download site http://wdcm.nig.ac.jp/inforbio/. This software is free for personal but not commercial use.
SUBMITTER: Tanaka N
PROVIDER: S-EPMC1973088 | biostudies-other | 2007
REPOSITORIES: biostudies-other
ACCESS DATA