Dataset Information

Consensus higher order repeats and frequency of string distributions in human genome.

ABSTRACT: The key string algorithm (KSA) can be viewed as a robust computational generalization of the restriction enzyme method. KSA enables robust and effective identification and structural analysis of any given genomic sequence, such as the NCBI assembly of the human genome. We have developed a method that uses the total frequency distribution of all r-bp key strings as a function of fragment length l to determine the exact size of all repeats within a given genomic sequence, both monomeric and HOR type. Subsequently, for each fragment length equal to one of these repeat sizes, we compute the partial frequency distribution of r-bp key strings; the key string with the highest frequency is the dominant key string, which is optimal for segmenting the given genomic sequence into repeat units. We illustrate how a wide class of 3-bp key strings leads to a key-string-dependent periodic cell, which enables simple identification and consensus length determination of HORs, or of any other highly convergent repeat of monomeric or HOR type, whether tandem or dispersed. We illustrate the application of KSA to HORs in the human genome and determine consensus HORs in the Build 35.1 assembly. In the next step we compute the suprachromosomal family classification and the CENP-B box / pJalpha distributions for HORs. For less convergent repeats, such as monomeric alpha satellite (20-40% divergence), we searched for an optimal compact key string using the frequency method and developed the concept of a composite key string (GAAAC--CTTTG) or flexible relaxation (28-bp key string), which provides segmentation both of monomeric alpha satellites and of alpha monomers within the internal HOR structure. The method is also convenient for studying alternations of R-strand (direct) and S-strand (reverse complement) alpha monomers. Using KSA we identified 16 alternating regions of R-strand and S-strand monomers in one contig in chromosome 7.
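As a rough illustration of the segmentation idea described above, the following Python sketch cuts a sequence at every exact occurrence of a key string (the restriction-enzyme analogy). It then tallies fragment lengths summed over all r-bp key strings (the total distribution) and picks the key string that yields the most fragments of a chosen length l (the partial distribution). It is a toy sketch under simplifying assumptions: exact matches on one strand only, and no composite or relaxed key strings. The function names `segment`, `total_distribution` and `dominant_key_string` are hypothetical; this is not the authors' implementation.

```python
# Hypothetical toy sketch of key-string segmentation; not the published KSA code.
from collections import Counter
from itertools import product


def segment(seq: str, key: str) -> list[int]:
    """Lengths of fragments between consecutive occurrences of `key`.

    Like a restriction digest, the sequence is cut at the start of every exact
    occurrence of the key string; the flanks before the first and after the
    last occurrence are not counted.
    """
    positions = []
    start = seq.find(key)
    while start != -1:
        positions.append(start)
        start = seq.find(key, start + 1)  # step by 1 so overlapping hits are found
    return [b - a for a, b in zip(positions, positions[1:])]


def all_key_strings(r: int):
    """Yield all 4**r key strings of length r over A, C, G, T."""
    for letters in product("ACGT", repeat=r):
        yield "".join(letters)


def total_distribution(seq: str, r: int) -> Counter:
    """Frequency of each fragment length l, summed over all r-bp key strings."""
    total = Counter()
    for key in all_key_strings(r):
        total.update(segment(seq, key))
    return total


def dominant_key_string(seq: str, r: int, l: int) -> tuple[str, int]:
    """Key string giving the most fragments of length exactly l.

    Ties are resolved by enumeration order (an arbitrary choice here).
    """
    best_key, best_count = "", -1
    for key in all_key_strings(r):
        count = segment(seq, key).count(l)
        if count > best_count:
            best_key, best_count = key, count
    return best_key, best_count


if __name__ == "__main__":
    unit = "GATTACAGGCTT"  # hypothetical 12-bp repeat unit
    seq = unit * 10        # perfect tandem array of 10 copies
    print(total_distribution(seq, 3).most_common(1))  # [(12, 106)]
    print(dominant_key_string(seq, 3, 12))            # ('ACA', 9)
```

On a perfect tandem of this toy 12-bp unit, every 3-mer of the unit recurs exactly once per period. The total distribution therefore peaks only at the repeat size, and many key strings tie as candidates for the dominant key string. Real genomic data would need a scoring or tie-breaking rule suited to the repeat being studied.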
Using the CENP-B box and/or the pJalpha motif as a key string is suitable both for identifying HORs and monomeric patterns and for studying CENP-B box / pJalpha distributions. As an example of applying KSA to sequences outside HOR regions, we present our finding of a tandem repeat of a highly convergent 3434-bp long monomer in chromosome 5 (divergence less than 0.3%).
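To make a divergence figure like the one quoted for the chromosome 5 tandem concrete, the sketch below assumes the tandem copies have already been cut to exactly the repeat period, with no insertions or deletions. It builds a column-wise majority consensus and reports each copy's percentage of substituted positions relative to that consensus. The helper names (`split_tandem`, `consensus`, `percent_divergence`) and the substitution-only definition of divergence are assumptions for illustration. The paper may measure divergence differently.

```python
# Hypothetical sketch: substitution-only divergence of tandem copies from a consensus.
from collections import Counter


def split_tandem(seq: str, period: int) -> list[str]:
    """Cut a tandem array into consecutive copies of length `period` (partial tail dropped)."""
    return [seq[i:i + period] for i in range(0, len(seq) - period + 1, period)]


def consensus(copies: list[str]) -> str:
    """Column-wise majority consensus of equal-length, gap-free copies."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


def percent_divergence(copy: str, ref: str) -> float:
    """Percentage of positions where `copy` differs from `ref` (substitutions only)."""
    if len(copy) != len(ref):
        raise ValueError("copies must have equal length")
    return 100.0 * sum(a != b for a, b in zip(copy, ref)) / len(ref)


if __name__ == "__main__":
    unit = "ACGT" * 25       # hypothetical 100-bp unit
    mutant = "T" + unit[1:]  # same unit with one substitution at position 0
    copies = split_tandem(unit * 3 + mutant, 100)
    ref = consensus(copies)
    print([percent_divergence(c, ref) for c in copies])  # [0.0, 0.0, 0.0, 1.0]
```

With real data the copies would first have to be aligned, because indels shift every downstream position and would inflate a naive position-by-position count.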

SUBMITTER: Paar V

PROVIDER: S-EPMC2435359 | biostudies-literature | 2007 Apr

REPOSITORIES: biostudies-literature

Publications

Consensus higher order repeats and frequency of string distributions in human genome.

Vladimir Paar, Ivan Basar, Marija Rosandić, Matko Gluncić

Current Genomics, 2007-04-01, issue 2

PMID: 18660848

Similar Datasets

Classification and monomer-by-monomer annotation dataset of suprachromosomal family 1 alpha satellite higher-order repeats in hg38 human genome assembly.
| S-EPMC6447721 | biostudies-literature

Multisensory perceptual interactions between higher-order temporal frequency signals.
| S-EPMC6472995 | biostudies-literature

Higher order organization of human placental aromatase.
| S-EPMC3217041 | biostudies-literature

Deletion of DXZ4 on the human inactive X chromosome alters higher-order genome architecture.
| S-EPMC4978254 | biostudies-literature

DNA methylation maintains integrity of higher order genome architecture
2021-09-21 | GSE158011 | GEO

Gene frequency distributions reject a neutral model of genome evolution.
| S-EPMC3595032 | biostudies-literature

DNA methylation maintains integrity of higher order genome architecture (Nanopore)
2021-09-21 | GSE159663 | GEO

DNA methylation maintains integrity of higher order genome architecture (WGBS)
2021-09-21 | GSE158010 | GEO

Disruption of Higher Order DNA Structures in Friedreich's Ataxia (GAA)n Repeats by PNA or LNA Targeting.
| S-EPMC5112992 | biostudies-literature

Examination of the effect of the annealing cation on higher order structures containing guanine or isoguanine repeats.
| S-EPMC2888532 | biostudies-literature