Dataset Information

Benchmarking of 4C-seq pipelines based on real and simulated data.


ABSTRACT:

Motivation

With its capacity for high-resolution data output in one region of interest, chromosome conformation capture combined with high-throughput sequencing (4C-seq) is a state-of-the-art next-generation sequencing technique that provides epigenetic insights and regularly advances current medical research. However, 4C-seq data are complex and prone to biases, and while specialized programs exist, an unbiased, extensive benchmark is still lacking. Furthermore, neither substantial datasets with fully characterized ground truth nor simulation programs for realistic 4C-seq data have been published.

Results

We conducted a benchmarking study on 66 4C-seq samples from 20 datasets, and developed a novel 4C-seq simulation software, Basic4CSim, to allow for detailed comparisons of 4C-seq algorithms on 50 simulated datasets with 10 to 120 samples each. Simulations and benchmarking were adapted to address different characteristics of 4C-seq data. Simulated data were compared with published samples to validate simulation settings. We identified differences between 4C-seq algorithms in terms of precision, recall, interaction structure, and run time, and observed general trends. Novel differential pipeline versions of single-sample based 4C-seq algorithms were included in the benchmarking. While no single tool was optimally suited for both near-cis and far-cis analyses, and for both single-sample and differential analyses, choosing a high-performing algorithm variant did improve results considerably. For near-cis scenarios, r3Cseq, peakC and FourCSeq offered high precision, while fourSig demonstrated high overall F1 scores in far-cis analyses. Finally, 4C-seq simulations may aid in the development of improved analysis algorithms.
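
The precision, recall and F1 metrics named above can be illustrated with a minimal sketch. It assumes that each tool reports a set of interacting fragment identifiers and that the simulation supplies the set of true interactions; the function name `score_calls` and the fragment IDs are hypothetical and are not taken from Basic4CSim or the study.

```python
from typing import Hashable, Iterable, NamedTuple


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_calls(called: Iterable[Hashable], truth: Iterable[Hashable]) -> Scores:
    """Compare called interacting fragments against known true interactions."""
    called_set, truth_set = set(called), set(truth)
    true_pos = len(called_set & truth_set)

    # Guard against empty sets so a tool with no calls scores 0 rather than failing.
    precision = true_pos / len(called_set) if called_set else 0.0
    recall = true_pos / len(truth_set) if truth_set else 0.0

    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


# Hypothetical fragment IDs, for illustration only.
truth = {"frag_10", "frag_11", "frag_12", "frag_40"}
called = {"frag_10", "frag_11", "frag_25"}
scores = score_calls(called, truth)
print(scores)
```

Here two of the three called fragments are true interactions and two of the four true interactions are recovered, so a tool with these calls would show higher precision than recall.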

Availability and implementation

Basic4CSim is available at https://github.com/walter-ca/Basic4CSim.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Walter C 

PROVIDER: S-EPMC6901067 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

Benchmarking of 4C-seq pipelines based on real and simulated data.

Carolin Walter, Daniel Schuetzmann, Frank Rosenbauer, Martin Dugas

Bioinformatics (Oxford, England), 2019-12-01, issue 23


Similar Datasets

2019-05-29 | GSE123131 | GEO
| PRJNA507614 | ENA
| S-EPMC8251607 | biostudies-literature
| S-EPMC7648640 | biostudies-literature
| S-EPMC3679908 | biostudies-literature
| S-EPMC8569188 | biostudies-literature
| S-EPMC5079477 | biostudies-literature
| S-EPMC4985025 | biostudies-literature
| S-EPMC4005674 | biostudies-literature
| S-EPMC5792058 | biostudies-literature