
Dataset Information


Comparison of genomic data via statistical distribution.


ABSTRACT: Sequence comparison has become an essential tool in bioinformatics, because highly homologous sequences usually imply significant functional or structural similarity. Traditional sequence analysis techniques are based on preprocessing and alignment, which facilitate the measurement and quantitative characterization of genetic differences, variability and complexity. However, recent developments in next-generation and whole-genome sequencing technologies give rise to new challenges related to measuring similarity and capturing rearrangements of large segments of the genome. This work illustrates several recently introduced methods for quantifying sequence distances and variability. Most alignment-free methods rely on counting words, which are small contiguous fragments of the genome. Our approach instead considers the locations of nucleotides in the sequences and relies more on appropriate statistical distributions. The results of this technique, which compares sequences by extracting information and comparing matching fidelity and location regularization information, are very encouraging, particularly for classifying mutation sequences.
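
EXAMPLE (illustrative, not from the publication): The abstract contrasts word-counting methods with an approach based on nucleotide locations. The Python sketch below shows one possible reading of each idea. The function names, the choice of Euclidean distance on relative word frequencies, and the two-sample Kolmogorov-Smirnov-style statistic on normalized positions are all assumptions made for illustration; they are not the authors' method or any published implementation.

```python
from bisect import bisect_right
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word (k-mer) of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Euclidean distance between relative k-mer frequencies (assumed metric)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    na, nb = sum(ca.values()), sum(cb.values())
    if na == 0 or nb == 0:
        raise ValueError("both sequences must contain at least k characters")
    words = set(ca) | set(cb)
    return sqrt(sum((ca[w] / na - cb[w] / nb) ** 2 for w in words))


def positions(seq: str, base: str) -> list:
    """Positions of `base` in seq, rescaled to the interval [0, 1]."""
    n = len(seq)
    if n < 2:
        return []
    return [i / (n - 1) for i, c in enumerate(seq.upper()) if c == base]


def ks_statistic(x: list, y: list) -> float:
    """Largest gap between the empirical CDFs of two samples.

    Hypothetical stand-in for a location-based comparison: it asks whether
    a nucleotide is spread along two sequences in a similar way.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    points = sorted(set(xs) | set(ys))
    return max(
        abs(bisect_right(xs, p) / len(xs) - bisect_right(ys, p) / len(ys))
        for p in points
    )


if __name__ == "__main__":
    s1 = "ACGTACGTACGTAAAA"
    s2 = "AAAAACGTACGTACGT"
    # Word-based view: compares which 3-mers occur and how often.
    print(kmer_distance(s1, s2, k=3))
    # Location-based view: compares where the base A falls in each sequence.
    print(ks_statistic(positions(s1, "A"), positions(s2, "A")))
```

The two example sequences are rearrangements of the same segments, so their word frequencies are similar while the locations of A differ. This is the kind of difference a location-based comparison is meant to pick up.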

SUBMITTER: Amiri S 

PROVIDER: S-EPMC5361063 | biostudies-literature | 2016 Oct

REPOSITORIES: biostudies-literature


Publications

Comparison of genomic data via statistical distribution.

Amiri Saeid, Dinov Ivo D

Journal of Theoretical Biology, 2016-07-25



Similar Datasets

| S-EPMC3740633 | biostudies-literature
| S-EPMC8042666 | biostudies-literature
| S-EPMC5570337 | biostudies-literature
2011-08-10 | GSE26736 | GEO
| S-EPMC3230912 | biostudies-literature
2011-08-10 | GSE26732 | GEO
2011-08-10 | GSE26735 | GEO
| S-EPMC3592408 | biostudies-literature
| S-EPMC6833991 | biostudies-literature
| S-EPMC1891338 | biostudies-literature