Dataset Information

HIPPI: highly accurate protein family classification with ensembles of HMMs.

ABSTRACT:

Background

Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics.

Results

We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy.
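The ensemble idea can be illustrated with a minimal sketch. It assumes each family's ensemble has already been scored against a query (for example with HMMER's hmmsearch) and that the query is assigned to the family containing the single best-scoring HMM. The abstract does not spell out this decision rule, so treat it as an assumption. The function name `assign_family`, the `min_bit_score` parameter, and the toy scores are hypothetical.

```python
from typing import Dict, List, Optional


def assign_family(
    ensemble_scores: Dict[str, List[float]],
    min_bit_score: float = 0.0,
) -> Optional[str]:
    """Return the family whose ensemble holds the best-scoring HMM.

    ensemble_scores maps a family name to the bit scores that each
    profile HMM in that family's ensemble gave the query (hypothetical
    input; real scores would come from a tool such as HMMER).
    Returns None when no score exceeds min_bit_score.
    """
    best_family: Optional[str] = None
    best_score = min_bit_score
    for family, scores in ensemble_scores.items():
        if not scores:
            continue  # no HMM in this ensemble reported a hit
        family_best = max(scores)
        if family_best > best_score:
            best_family, best_score = family, family_best
    return best_family


# Made-up bit scores for one query against three family ensembles.
toy_scores = {
    "family_A": [12.5, 48.1, 30.0],
    "family_B": [35.2, 20.4],
    "family_C": [],
}
print(assign_family(toy_scores))
```

With a single HMM per family, each list would hold one score. The ensemble gives a fragmentary or divergent query several chances to match a model built from a closely related subset of the alignment.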

Conclusion

HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile hidden Markov models can better represent multiple sequence alignments than a single profile hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp.

SUBMITTER: Nguyen NP

PROVIDER: S-EPMC5123343 | biostudies-literature | 2016 Nov

REPOSITORIES: biostudies-literature

Publications

HIPPI: highly accurate protein family classification with ensembles of HMMs.

Nguyen NP, Nute M, Mirarab S, Warnow T

BMC Genomics, 2016 Nov 11, Suppl 10

PMID: 28185571

Similar Datasets

ClassyFlu: classification of influenza A viruses with Discriminatively trained profile-HMMs.
| S-EPMC3880301 | biostudies-literature

Highly accurate protein structure prediction with AlphaFold.
| S-EPMC8371605 | biostudies-literature

Predicting conserved protein motifs with Sub-HMMs.
| S-EPMC2879284 | biostudies-literature

More accurate recombination prediction in HIV-1 using a robust decoding algorithm for HMMs.
| S-EPMC3123234 | biostudies-literature

Ensemblator v3: Robust atom-level comparative analyses and classification of protein structure ensembles.
| S-EPMC5734391 | biostudies-literature

Trajectory-based training enables protein simulations with accurate folding and Boltzmann ensembles in cpu-hours.
| S-EPMC6307714 | biostudies-literature

taxMaps: comprehensive and highly accurate taxonomic classification of short-read data in reasonable time.
| S-EPMC5932614 | biostudies-literature

Beyond classification: gene-family phylogenies from shotgun metagenomic reads enable accurate community analysis.
| S-EPMC3701559 | biostudies-literature

Finding RNA-Protein Interaction Sites Using HMMs.
| S-EPMC5568642 | biostudies-other

Highly accurate protein structure prediction for the human proteome.
| S-EPMC8387240 | biostudies-literature