Dataset Information

HitKeeper, a generic software package for hit list management.


ABSTRACT: BACKGROUND: The automated annotation of biological sequences (protein, DNA) relies on computing hits (predicted features) on the sequences using various algorithms. Public databases of biological sequences provide a wealth of biological "knowledge", for example manually validated annotations (features) located on the sequences, but mining these annotations, and especially the predicted and curated features, requires dedicated tools. Because biological information is heterogeneous and diverse, it is difficult to handle redundancy, frequent updates, taxonomic information and "private" data together with computational algorithms in a common workflow. RESULTS: We present HitKeeper, a software package that fully automates the handling of multiple biological databases and of hit list calculations on a large scale. The software implements an asynchronous update system that introduces updates and computes hits as soon as new data become available. A query interface lets the user search sequences by specifying constraints, such as retrieving sequences that contain specific motifs or a defined arrangement of motifs ("metamotifs"), or filtering on the taxonomic classification of a sequence. CONCLUSION: The software provides a generic and modular framework to handle the redundancy and incremental updates of biological databases, together with an original query language. It is published under the terms and conditions of version 2 of the GNU General Public License and is available at http://hitkeeper.sourceforge.net.
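
The kind of constrained search the abstract describes (an ordered arrangement of motifs combined with a taxonomic filter) can be illustrated with a minimal Python sketch. This is not HitKeeper's query language or API, neither of which is shown in this record; the `Hit` record, the `has_ordered_motifs` and `search` functions, the gap threshold and the sample data are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical hit record: one predicted feature on one sequence."""
    seq_id: str
    motif: str
    start: int
    end: int


def has_ordered_motifs(hits, first, second, max_gap):
    """True if some `first` hit ends at most `max_gap` positions before a `second` hit starts."""
    firsts = [h for h in hits if h.motif == first]
    seconds = [h for h in hits if h.motif == second]
    return any(0 <= s.start - f.end <= max_gap for f in firsts for s in seconds)


def search(hits, lineages, first, second, max_gap, taxon):
    """Return sorted IDs of sequences in `taxon` carrying `first` followed by `second`."""
    by_seq = {}
    for h in hits:
        by_seq.setdefault(h.seq_id, []).append(h)
    return sorted(
        seq_id
        for seq_id, seq_hits in by_seq.items()
        if taxon in lineages.get(seq_id, ())
        and has_ordered_motifs(seq_hits, first, second, max_gap)
    )


# Invented sample data (assumption): three sequences with two motif types.
hits = [
    Hit("P1", "kinase", 10, 50), Hit("P1", "SH2", 60, 120),
    Hit("P2", "SH2", 5, 40), Hit("P2", "kinase", 50, 90),
    Hit("P3", "kinase", 10, 50), Hit("P3", "SH2", 70, 130),
]
lineages = {
    "P1": ("Eukaryota", "Metazoa"),
    "P2": ("Eukaryota", "Metazoa"),
    "P3": ("Bacteria",),
}

result = search(hits, lineages, "kinase", "SH2", max_gap=30, taxon="Eukaryota")
print(result)
```

In this toy data only P1 satisfies both constraints: P2 has the motifs in the opposite order, and P3 falls outside the requested taxon.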

SUBMITTER: Hau J 

PROVIDER: S-EPMC1852800 | biostudies-literature | 2007 Mar

REPOSITORIES: biostudies-literature

Publications

HitKeeper, a generic software package for hit list management.

Jörg Hau, Michael Muller, Marco Pagni

Source Code for Biology and Medicine, 2007-03-28


Similar Datasets

| S-EPMC8425072 | biostudies-literature
2020-12-05 | GSE162690 | GEO
| S-EPMC5029491 | biostudies-literature
| S-EPMC7059772 | biostudies-literature
| S-EPMC6391495 | biostudies-literature
| S-EPMC3840585 | biostudies-literature
| S-EPMC3005000 | biostudies-literature
| S-EPMC4996692 | biostudies-literature
| S-EPMC7113135 | biostudies-literature
| S-EPMC4608353 | biostudies-other