Dataset Information

Numerical classification of coding sequences.


ABSTRACT: DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
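The two descriptors named in the abstract can be illustrated with a minimal Python sketch. It assumes an in-frame coding sequence over the alphabet A, C, G, T and lists codons in alphabetical order (AAA, AAC, AAG, ... TTT), as in the abstract's example. The function names and the `table_distance` measure (a sum of absolute count differences) are hypothetical choices made for illustration; the abstract does not state how the authors compare descriptors.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 64 codons in alphabetical order: AAA, AAC, AAG, AAT, ..., TTT
CODONS = ["".join(p) for p in product(BASES, repeat=3)]


def base_count_label(seq):
    """Return a base-count designation such as 'A76C158G121T74'."""
    counts = Counter(seq.upper())
    return "".join(f"{base}{counts[base]}" for base in BASES)


def codon_table(seq):
    """Return the 64 codon counts of an in-frame sequence, in CODONS order."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq), 3))
    return [counts[codon] for codon in CODONS]


def codon_table_label(seq):
    """Return a codon-table designation such as '(AAA)0(AAC)1...(TTT)0'."""
    return "".join(f"({c}){n}" for c, n in zip(CODONS, codon_table(seq)))


def table_distance(table_a, table_b):
    """Hypothetical comparison: sum of absolute codon-count differences."""
    return sum(abs(a - b) for a, b in zip(table_a, table_b))


if __name__ == "__main__":
    toy = "ATGAAAGGCTAA"  # made-up 4-codon reading frame, not real data
    print(base_count_label(toy))
    print(codon_table_label(toy)[:24] + "...")
    print(table_distance(codon_table(toy), codon_table(toy)))
```

Under this sketch, two database entries with identical designations would be candidates for redundancy, and a small distance between codon tables would flag candidate homologs for closer inspection.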

SUBMITTER: Collins DW 

PROVIDER: S-EPMC312190 | biostudies-other | 1992 Mar

REPOSITORIES: biostudies-other

Publications

Numerical classification of coding sequences.

Collins DW, Liu CC, Jukes TH

Nucleic Acids Research, 1992 Mar 1, issue 6


Similar Datasets

| S-EPMC8517333 | biostudies-literature
| S-EPMC110283 | biostudies-literature
| S-EPMC8294561 | biostudies-literature
| S-EPMC6553707 | biostudies-literature
| S-EPMC8691036 | biostudies-literature
| S-EPMC2887045 | biostudies-literature
| S-EPMC10454140 | biostudies-literature
| S-EPMC310970 | biostudies-literature
| S-EPMC6859337 | biostudies-literature
| S-EPMC10025431 | biostudies-literature