
Dataset Information

biomvRhsmm: genomic segmentation with hidden semi-Markov model.


ABSTRACT: High-throughput technologies like tiling arrays and next-generation sequencing (NGS) generate continuous homogeneous segments or signal peaks in the genome that represent transcripts and transcript variants (transcript mapping and quantification), regions of deletion and amplification (copy number variation), or regions characterized by particular common features like chromatin state or DNA methylation ratio (epigenetic modifications). However, the volume and output of data produced by these technologies present challenges in analysis. Here, a hidden semi-Markov model (HSMM) is implemented and tailored to handle multiple genomic profiles, to better facilitate genome annotation by assisting in the detection of transcripts, regulatory regions, and copy number variation by holistic microarray or NGS. With support for various data distributions, instead of limiting itself to one specific application, the proposed hidden semi-Markov model is designed to allow modeling options to accommodate different types of genomic data and to serve as a general segmentation engine. By incorporating genomic positions into the sojourn distribution of the HSMM, with optional prior learning using annotation or previous studies, the modeling output is more biologically sensible. The proposed model has been compared with several other state-of-the-art segmentation models through simulation benchmarking, which shows that our efficient implementation achieves comparable or better sensitivity and specificity in genomic segmentation.
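
ILLUSTRATION: The abstract's core idea, a hidden semi-Markov model whose states carry an explicit sojourn (duration) distribution, can be sketched with a small explicit-duration Viterbi decoder. This is not the biomvRhsmm implementation (which is an R/Bioconductor package); it is a minimal Python sketch under stated assumptions: Gaussian emissions, a truncated shifted-Poisson sojourn distribution, durations counted in probe indices rather than genomic positions, and at least two states. All function and variable names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution (assumed emission model)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def shifted_poisson_log_dur(mean_len, max_d):
    """Log sojourn probabilities for durations 1..max_d (assumed form).

    Duration d has d - 1 ~ Poisson(mean_len - 1), truncated at max_d
    and renormalized.
    """
    lam = mean_len - 1.0
    raw = [(d - 1) * math.log(lam) - lam - math.lgamma(d) for d in range(1, max_d + 1)]
    norm = math.log(sum(math.exp(v) for v in raw))
    return [v - norm for v in raw]


def hsmm_viterbi(obs, means, sds, log_dur, log_trans, log_init):
    """Most likely segmentation under an explicit-duration HSMM.

    Returns a list of (start, end, state) tuples with inclusive indices.
    Requires at least two states; self-transitions are not allowed.
    """
    n, k = len(obs), len(means)
    max_d = len(log_dur[0])
    # Prefix sums of emission log-likelihoods, one row per state.
    cum = [[0.0] * (n + 1) for _ in range(k)]
    for j in range(k):
        for t, x in enumerate(obs):
            cum[j][t + 1] = cum[j][t] + gaussian_logpdf(x, means[j], sds[j])

    # delta[t][j]: best log score with a segment of state j ending at t.
    delta = [[float("-inf")] * k for _ in range(n)]
    back = [[None] * k for _ in range(n)]
    for t in range(n):
        for j in range(k):
            for d in range(1, min(max_d, t + 1) + 1):
                start = t - d + 1
                seg = log_dur[j][d - 1] + cum[j][t + 1] - cum[j][start]
                if start == 0:
                    prev, score = None, log_init[j] + seg
                else:
                    # Best predecessor state, excluding j itself.
                    prev = max(
                        (i for i in range(k) if i != j),
                        key=lambda i: delta[start - 1][i] + log_trans[i][j],
                    )
                    score = delta[start - 1][prev] + log_trans[prev][j] + seg
                if score > delta[t][j]:
                    delta[t][j] = score
                    back[t][j] = (d, prev)

    # Backtrack from the best final state.
    state = max(range(k), key=lambda j: delta[n - 1][j])
    t, segments = n - 1, []
    while t >= 0:
        d, prev = back[t][state]
        segments.append((t - d + 1, t, state))
        t, state = t - d, prev
    segments.reverse()
    return segments


# Toy profile: baseline (state 0), an elevated region (state 1), baseline again.
obs = [0.1, -0.2, 0.0, 0.3, 2.1, 1.9, 2.2, 2.0, 1.8, 0.1, -0.1, 0.2]
log_dur = [shifted_poisson_log_dur(4.0, len(obs)) for _ in range(2)]
log_trans = [[float("-inf"), 0.0], [0.0, float("-inf")]]
log_init = [math.log(0.5), math.log(0.5)]
segments = hsmm_viterbi(obs, [0.0, 2.0], [0.5, 0.5], log_dur, log_trans, log_init)
print(segments)  # [(0, 3, 0), (4, 8, 1), (9, 11, 0)]
```

Unlike a plain hidden Markov model, where segment lengths are implicitly geometric, the sojourn term `log_dur` lets the expected segment length be modeled directly; replacing index-based durations with distances between genomic positions would be the step the abstract describes.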

SUBMITTER: Du Y 

PROVIDER: S-EPMC4065698 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature


Publications

biomvRhsmm: genomic segmentation with hidden semi-Markov model.

Du Y, Murani E, Ponsuksili S, Wimmers K

BioMed Research International, 2014-06-03



Similar Datasets

| S-EPMC7455056 | biostudies-literature
| S-EPMC2912474 | biostudies-literature
| S-EPMC8173987 | biostudies-literature
| S-EPMC4797261 | biostudies-other
2012-10-18 | GSE34490 | GEO
| S-EPMC6373422 | biostudies-literature
| S-EPMC2735038 | biostudies-literature
| S-EPMC3114652 | biostudies-literature
| S-EPMC5609058 | biostudies-literature
| S-EPMC1479840 | biostudies-literature