
Dataset Information


NGS-Integrator: An efficient tool for combining multiple NGS data tracks using minimum Bayes' factors.


ABSTRACT:

Background

Next-generation sequencing (NGS) is widely used for genome-wide identification and quantification of DNA elements involved in the regulation of gene transcription. Studies that generate multiple high-throughput NGS datasets require data integration methods for two general tasks: 1) generation of genome-wide data tracks representing an aggregate of multiple replicates of the same experiment; and 2) combination of tracks from different experimental types that provide complementary information regarding the location of genomic features such as enhancers.

Results

NGS-Integrator is a Java-based command-line application that facilitates efficient integration of multiple genome-wide NGS datasets. NGS-Integrator first transforms all input data tracks using the complement of the minimum Bayes' factor, so that all values are expressed in the range [0, 1] and represent the probability of a true signal given the background noise. It then calculates the joint probability at every genomic position to create an integrated track. We provide examples using real NGS data generated in our laboratory and from the mouse ENCODE database.
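The two steps described in the Results can be sketched in Python. This is a minimal illustration, not NGS-Integrator's Java code, and it rests on several assumptions: it uses Goodman's Gaussian minimum Bayes' factor, MBF = exp(-z^2/2), where z standardizes a track value against a background mean and standard deviation supplied by the caller; it maps values at or below background to 0; and it treats tracks as independent, so the joint probability at a position is the product of the per-track values. The function names `complement_mbf` and `integrate` are hypothetical.

```python
import math
from typing import List, Sequence


def complement_mbf(values: Sequence[float], bg_mean: float, bg_sd: float) -> List[float]:
    """Map raw track values to [0, 1] via 1 - MBF.

    Assumption: Goodman's Gaussian minimum Bayes' factor, MBF = exp(-z**2 / 2),
    with z standardized against a caller-supplied background mean and SD.
    Values at or below background are mapped to 0 (also an assumption).
    """
    if bg_sd <= 0:
        raise ValueError("background SD must be positive")
    out = []
    for x in values:
        z = (x - bg_mean) / bg_sd
        # Only signal above background counts as evidence in this sketch.
        out.append(1.0 - math.exp(-z * z / 2.0) if z > 0 else 0.0)
    return out


def integrate(tracks: Sequence[Sequence[float]]) -> List[float]:
    """Joint probability per position, assuming the tracks are independent."""
    if not tracks:
        raise ValueError("need at least one track")
    n = len(tracks[0])
    if any(len(t) != n for t in tracks):
        raise ValueError("tracks must cover the same positions")
    joint = [1.0] * n
    for t in tracks:
        for i, p in enumerate(t):
            joint[i] *= p  # product of per-track probabilities
    return joint


if __name__ == "__main__":
    # Two hypothetical replicates over the same three positions.
    rep1 = complement_mbf([0.0, 2.0, 10.0], bg_mean=1.0, bg_sd=1.0)
    rep2 = complement_mbf([1.0, 3.0, 8.0], bg_mean=1.0, bg_sd=1.0)
    print(integrate([rep1, rep2]))
```

Because every transformed value lies in [0, 1], the product also stays in [0, 1], and under this sketch a position scores high in the integrated track only if it is well above background in every input.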

Conclusions

Our results show that NGS-Integrator is both time- and memory-efficient. Our examples show that NGS-Integrator can integrate information to facilitate downstream analyses that identify functional regulatory domains along the genome.
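The following sketch does not reproduce NGS-Integrator's Java implementation. It only illustrates, under stated assumptions, why a per-position joint probability can be computed with bounded memory: if the transformed tracks are binned identically and read lazily (for example, by generators that read files line by line), each bin can be combined and emitted before the next one is read. The `Record` layout (chrom, start, end, probability) and the name `stream_joint` are hypothetical, and independence across tracks is assumed, as in a plain product of probabilities.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

# Hypothetical per-bin record: (chrom, start, end, probability in [0, 1]).
Record = Tuple[str, int, int, float]
_MISSING = object()  # sentinel marking an exhausted track


def stream_joint(*tracks: Iterable[Record]) -> Iterator[Record]:
    """Yield the joint probability of each bin, reading all tracks in lockstep.

    Assumes the tracks were already transformed to [0, 1] and binned
    identically. When the inputs are lazy iterators, only one record per
    track is held at a time, so memory use does not grow with genome size.
    """
    if not tracks:
        raise ValueError("need at least one track")
    for recs in zip_longest(*tracks, fillvalue=_MISSING):
        if any(r is _MISSING for r in recs):
            raise ValueError("tracks have different numbers of bins")
        chrom, start, end, _ = recs[0]
        joint = 1.0
        for c, s, e, p in recs:
            if (c, s, e) != (chrom, start, end):
                raise ValueError(f"misaligned bins at {chrom}:{start}-{end}")
            joint *= p  # independence assumption: multiply per-track values
        yield (chrom, start, end, joint)
```

A generator is used so that a caller can write each combined bin to disk as soon as it is produced, rather than collecting the whole integrated track in memory.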

SUBMITTER: Wen B 

PROVIDER: S-EPMC7678096 | biostudies-literature | 2020 Nov

REPOSITORIES: biostudies-literature


Publications

NGS-Integrator: An efficient tool for combining multiple NGS data tracks using minimum Bayes' factors.

Bronte Wen, Hyun Jun Jung, Lihe Chen, Fahad Saeed, Mark A. Knepper

BMC Genomics, 19 Nov 2020, issue 1



Similar Datasets

| S-EPMC5374454 | biostudies-literature
| S-EPMC10085951 | biostudies-literature
| S-EPMC2987358 | biostudies-literature
| S-EPMC6933617 | biostudies-literature
| S-EPMC4198698 | biostudies-literature
| S-EPMC3522561 | biostudies-literature
| S-EPMC6366007 | biostudies-literature
| S-EPMC3524941 | biostudies-literature
| S-EPMC6404320 | biostudies-literature
| S-EPMC7610016 | biostudies-literature