Dataset Information

Annotating large genomes with exact word matches.


ABSTRACT: We have developed a tool for rapidly determining the number of exact matches of any word within large, internally repetitive genomes or sets of genomes. Thus we can readily annotate any sequence, including the entire human genome, with the counts of its constituent words. We create a Burrows-Wheeler transform of the genome, which, together with auxiliary data structures that facilitate counting, can reside in about one gigabyte of RAM. Our original interest was motivated by oligonucleotide probe design, and we describe a general protocol for defining unique hybridization probes. But our method also has applications for the analysis of genome structure and assembly. We demonstrate the identification of chromosome-specific repeats, and outline a general procedure for finding undiscovered repeats. We also illustrate the changing contents of the human genome assemblies by comparing the annotations built from different genome freezes.
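The counting idea in the abstract (a Burrows-Wheeler transform plus auxiliary tables that let word counts be read off directly) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single-strand string over a small alphabet, builds the BWT naively by sorting rotations, and counts words with the standard FM-index backward search. The class `FMIndex` and the helper `annotate` are hypothetical names introduced only for this illustration.

```python
from collections import Counter


class FMIndex:
    """Toy FM-index over a DNA string (hypothetical sketch, not the paper's code).

    Appends a '$' sentinel, builds the Burrows-Wheeler transform (BWT) by
    naively sorting all rotations, and precomputes full occurrence tables.
    Memory is O(n * alphabet size), so this only suits tiny inputs.
    """

    def __init__(self, genome: str) -> None:
        assert "$" not in genome, "'$' is reserved as the end sentinel"
        text = genome + "$"
        # BWT = last column of the lexicographically sorted rotation matrix.
        rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
        self.bwt = "".join(rot[-1] for rot in rotations)

        # C[c] = number of characters in the text that sort before c.
        counts = Counter(self.bwt)
        self.C = {}
        total = 0
        for c in sorted(counts):
            self.C[c] = total
            total += counts[c]

        # occ[c][i] = number of occurrences of c in bwt[:i].
        n = len(self.bwt)
        self.occ = {c: [0] * (n + 1) for c in counts}
        for i, ch in enumerate(self.bwt):
            for c, table in self.occ.items():
                table[i + 1] = table[i] + (ch == c)

    def count(self, word: str) -> int:
        """Number of (possibly overlapping) exact matches of word, via backward search."""
        lo, hi = 0, len(self.bwt)  # half-open range of sorted-rotation rows
        for ch in reversed(word):
            if ch not in self.C:
                return 0
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo


def annotate(fm: FMIndex, seq: str, k: int) -> list[int]:
    """Hypothetical helper: match count of the length-k word starting at each position of seq."""
    return [fm.count(seq[i:i + k]) for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    fm = FMIndex("ACGTACGTTACG")
    print(fm.count("ACG"))           # 3
    print(annotate(fm, "ACGTT", 3))  # [3, 2, 1]
```

Each count takes one pair of table lookups per character of the word, independent of genome length, which is why annotating every word of a long sequence stays cheap once the index is built. Genome-scale indexes typically derive the BWT from a suffix array and store only sampled occurrence checkpoints to keep memory down; this sketch also ignores the reverse-complement strand, which a caller would have to count separately.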

SUBMITTER: Healy J 

PROVIDER: S-EPMC403711 | biostudies-literature | 2003 Oct

REPOSITORIES: biostudies-literature

Publications

Annotating large genomes with exact word matches.

John Healy, Elizabeth Thomas, Jacob T. Schwartz, Michael Wigler

Genome Research, 15 September 2003, issue 10

Similar Datasets

| S-EPMC1764478 | biostudies-literature
| S-EPMC8902461 | biostudies-literature
| S-EPMC2732316 | biostudies-literature
| S-EPMC6528274 | biostudies-literature
| S-EPMC8892979 | biostudies-literature
| S-EPMC2722993 | biostudies-literature
| S-EPMC3670165 | biostudies-literature
| S-EPMC534664 | biostudies-literature
| S-EPMC5409309 | biostudies-literature
| S-EPMC2646279 | biostudies-literature