
Dataset Information


Multiple Alignment of Promoter Sequences from the Arabidopsis thaliana L. Genome.


ABSTRACT: In this study, we developed a new mathematical method for multiple alignment of highly divergent sequences (MAHDS), i.e., sequences that have on average more than 2.5 substitutions per position (x). We generated sets of artificial DNA sequences with x ranging from 0 to 4.4 and applied MAHDS, as well as widely used multiple sequence alignment programs (ClustalW, MAFFT, T-Coffee, Kalign, and Muscle), to these sets. The results indicated that most of the existing methods could produce statistically significant alignments only for sets with x < 2.5, whereas MAHDS could still align sets with x = 4.4. We also used MAHDS to analyze a set of promoter sequences from the Arabidopsis thaliana genome and discovered many conserved regions upstream of the transcription initiation site (from -499 to +1 bp); part of the downstream region (from +1 to +70 bp) also contributed significantly to the obtained alignments. The possibility of applying the newly developed method to identify promoter sequences in any genome is discussed. A server for multiple alignment of nucleotide sequences has been created.
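To make the quantity x concrete, here is a minimal sketch, under our own assumptions, of how a test set with a given mean number of substitutions per position might be generated. It is not the authors' published procedure: a random ancestral sequence of length L is mutated by round(x·L) substitutions at uniformly chosen positions (so one site can be hit more than once), and insertions and deletions are omitted for simplicity. The function names (make_test_set, mutate, identity, expected_identity) are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def random_sequence(length, rng):
    """Uniform random DNA sequence (assumed equal base frequencies)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def mutate(seq, x, rng):
    """Apply round(x * len(seq)) substitutions at uniformly chosen positions.

    A position may be hit repeatedly, so x is the mean number of substitution
    events per position, not the fraction of positions that differ.
    Hypothetical model: substitutions only, no indels.
    """
    s = list(seq)
    for _ in range(round(x * len(s))):
        i = rng.randrange(len(s))
        # Replace the current base with one of the three other bases.
        s[i] = rng.choice([b for b in ALPHABET if b != s[i]])
    return "".join(s)


def make_test_set(n_seqs, length, x, seed=0):
    """Return an ancestor and n_seqs independently mutated descendants."""
    rng = random.Random(seed)
    ancestor = random_sequence(length, rng)
    return ancestor, [mutate(ancestor, x, rng) for _ in range(n_seqs)]


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(p == q for p, q in zip(a, b)) / len(a)


def expected_identity(x):
    """Approximate expected identity to the ancestor under this model.

    Treating hits per site as Poisson(x), with each hit choosing one of the
    three other bases uniformly: 1/4 + 3/4 * exp(-4x/3).
    """
    return 0.25 + 0.75 * math.exp(-4.0 * x / 3.0)


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.5, 4.4):
        anc, seqs = make_test_set(5, 2000, x, seed=42)
        mean_id = sum(identity(anc, s) for s in seqs) / len(seqs)
        print(f"x={x}: observed {mean_id:.3f}, expected {expected_identity(x):.3f}")
```

Under this model the expected identity to the ancestor is about 0.45 at x = 1, 0.28 at x = 2.5, and 0.25 at x = 4.4, which is essentially the background level for random DNA. This illustrates why pairwise similarity alone carries almost no signal in the x > 2.5 range the abstract refers to.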

SUBMITTER: Korotkov EV 

PROVIDER: S-EPMC7909805 | biostudies-literature | 2021 Jan

REPOSITORIES: biostudies-literature


Publications

Multiple Alignment of Promoter Sequences from the Arabidopsis thaliana L. Genome.

Korotkov Eugene V, Suvorova Yulia M, Kostenko Dmitrii O, Korotkova Maria A

Genes, 2021 Jan 21, issue 2



Similar Datasets

| S-EPMC1206989 | biostudies-other
| S-EPMC555476 | biostudies-literature
| S-EPMC546147 | biostudies-literature
| S-EPMC8233505 | biostudies-literature
| S-EPMC6048449 | biostudies-literature
| S-EPMC4890748 | biostudies-literature
| S-EPMC4179140 | biostudies-literature
| S-EPMC1539025 | biostudies-literature
| S-EPMC6330006 | biostudies-literature
| S-EPMC4856438 | biostudies-literature