Dataset Information

Aligning multiple genomic sequences with the threaded blockset aligner.


ABSTRACT: We define a "threaded blockset," which is a novel generalization of the classic notion of a multiple alignment. A new computer program called TBA (for "threaded blockset aligner") builds a threaded blockset under the assumption that all matching segments occur in the same order and orientation in the given sequences; inversions and duplications are not addressed. TBA is designed to be appropriate for aligning many, but by no means all, megabase-sized regions of multiple mammalian genomes. The output of TBA can be projected onto any genome chosen as a reference, thus guaranteeing that different projections present consistent predictions of which genomic positions are orthologous. This capability is illustrated using a new visualization tool to view TBA-generated alignments of vertebrate Hox clusters from both the mammalian and fish perspectives. Experimental evaluation of alignment quality, using a program that simulates evolutionary change in genomic sequences, indicates that TBA is more accurate than earlier programs. To perform the dynamic-programming alignment step, TBA runs a stand-alone program called MULTIZ, which can be used to align highly rearranged or incompletely sequenced genomes. We describe our use of MULTIZ to produce the whole-genome multiple alignments at the Santa Cruz Genome Browser.
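The projection idea in the abstract can be illustrated with a minimal sketch. It is not TBA's implementation or its file format: the `Row` type, the `project` and `aligned_pairs` functions, and the toy sequences are all hypothetical, and the sketch assumes each block holds at most one row per species, with every row in the same orientation. Projecting onto a reference keeps the blocks that contain it and orders them by reference position. Because every projection is derived from the same blocks, the aligned position pairs for any two species do not depend on which reference was chosen.

```python
from typing import NamedTuple


class Row(NamedTuple):
    species: str  # sequence name (hypothetical labels below)
    start: int    # 0-based start of the aligned segment in that sequence
    text: str     # aligned text; '-' marks a gap


def project(blocks, ref):
    """Keep blocks containing `ref`, list the ref row first, sort by ref start."""
    projected = []
    for block in blocks:
        ref_rows = [r for r in block if r.species == ref]
        if not ref_rows:
            continue  # this block says nothing about the reference
        others = [r for r in block if r.species != ref]
        projected.append(ref_rows + others)
    return sorted(projected, key=lambda b: b[0].start)


def aligned_pairs(blocks, a, b):
    """Return the set of (position in a, position in b) sharing a column."""
    pairs = set()
    for block in blocks:
        rows = {r.species: r for r in block}
        if a not in rows or b not in rows:
            continue
        pos_a, pos_b = rows[a].start, rows[b].start
        for ch_a, ch_b in zip(rows[a].text, rows[b].text):
            if ch_a != "-" and ch_b != "-":
                pairs.add((pos_a, pos_b))
            pos_a += ch_a != "-"  # advance only past real bases
            pos_b += ch_b != "-"
    return pairs


# Toy blockset (made-up data, for illustration only).
blocks = [
    [Row("human", 4, "ACGT"), Row("fugu", 0, "AC-T")],
    [Row("human", 0, "ACGT"), Row("mouse", 0, "A-GT")],
    [Row("fugu", 3, "GG"), Row("mouse", 3, "GG")],
]

human_view = project(blocks, "human")
fugu_view = project(blocks, "fugu")

# Both views agree on which human and fugu positions are aligned.
assert aligned_pairs(human_view, "human", "fugu") == \
    aligned_pairs(fugu_view, "human", "fugu")
```

In this toy example the human view lists the two human-containing blocks in human order, while the fugu view lists the two fugu-containing blocks in fugu order; the human/fugu correspondences are identical in both.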

SUBMITTER: Blanchette M 

PROVIDER: S-EPMC383317 | biostudies-literature | 2004 Apr

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC310868 | biostudies-literature
| S-EPMC10457662 | biostudies-literature
| S-EPMC2377433 | biostudies-literature
| S-EPMC5374649 | biostudies-literature
| S-EPMC8248648 | biostudies-literature
| S-EPMC4384290 | biostudies-literature
| S-EPMC546147 | biostudies-literature
| S-EPMC6488671 | biostudies-other
| S-EPMC4120144 | biostudies-literature
| S-EPMC6334396 | biostudies-literature