Dataset Information

Detecting epigenetic motifs in low coverage and metagenomics settings.


ABSTRACT:

Background

It has recently become possible to rapidly and accurately detect epigenetic signatures in bacterial genomes using third-generation sequencing data. Monitoring the speed at which a single polymerase incorporates each base into the read strand makes it possible to infer whether a modification is present at the corresponding site on the template strand. These modified sites can be challenging to detect without high coverage and a reliable reference genome.

Methods

Here we provide a new method for detecting epigenetic motifs in bacteria in datasets with low coverage, incomplete references, or mixed samples (i.e., metagenomic data). Our approach treats motif inference as a kmer comparison problem. First, genomes (or contigs) are deconstructed into kmers. Then, the native genome-wide distribution of interpulse durations (IPDs) for each kmer is compared with the corresponding whole genome amplified (WGA, modification-free) IPD distribution using log likelihood ratios. Finally, kmers are ranked and greedily selected, iteratively correcting for sequences within each selected kmer's neighborhood.
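The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes log-IPD values are roughly Gaussian per kmer, it assigns each kmer the IPD of its first base, and it stands in for the neighborhood correction with a simple Hamming-distance filter. All function names and parameters (`kmers_with_ipd`, `llr_scores`, `greedy_select`, `radius`, `min_obs`) are hypothetical.

```python
import math
from statistics import fmean, pstdev


def kmers_with_ipd(seq, ipds, k):
    """Yield (kmer, IPD at the kmer's first base) for every window of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k], ipds[i]


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution (assumed model for log-IPDs)."""
    return -0.5 * math.log(2 * math.pi * sigma * sigma) - (x - mu) ** 2 / (2 * sigma * sigma)


def fit(values, min_sigma=1e-3):
    """Fit mean and standard deviation, flooring sigma to avoid division by zero."""
    return fmean(values), max(pstdev(values), min_sigma)


def llr_scores(native, wga, min_obs=2):
    """Score each kmer by the log likelihood ratio of native vs. WGA fits.

    native, wga: dict mapping kmer -> list of log-IPD observations.
    """
    scores = {}
    for kmer, nat in native.items():
        ctl = wga.get(kmer, [])
        if len(nat) < min_obs or len(ctl) < min_obs:
            continue  # too few observations to fit either distribution
        mu_n, sd_n = fit(nat)
        mu_c, sd_c = fit(ctl)
        scores[kmer] = sum(
            gaussian_logpdf(x, mu_n, sd_n) - gaussian_logpdf(x, mu_c, sd_c)
            for x in nat
        )
    return scores


def hamming(a, b):
    """Number of mismatching positions between two equal-length kmers."""
    return sum(x != y for x, y in zip(a, b))


def greedy_select(scores, threshold, radius=1):
    """Take kmers in descending score order, skipping neighbors of earlier picks.

    Assumption: a kmer within `radius` mismatches of a selected kmer is treated
    as explained by it. This is a simplification of neighborhood correction.
    """
    selected = []
    for kmer, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if score < threshold:
            break
        if all(hamming(kmer, chosen) > radius for chosen in selected):
            selected.append(kmer)
    return selected
```

In this sketch a kmer whose native log-IPDs are shifted relative to WGA receives a large score, while a one-mismatch neighbor with a weaker shift is suppressed once the stronger kmer has been selected.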

Conclusions

Our method can detect multiple types of modifications, even at very low coverage and in the presence of mixed genomes. Additionally, we are able to predict modified motifs when genomes with "neighbor" modified motifs exist within the sample. Lastly, we show that these motifs provide an alternative source of information for clustering metagenomic contigs, and that iterative refinement on the clustered contigs can further improve both the sensitivity and the specificity of motif detection.
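As an illustration of how detected motifs might be used to group contigs, the following sketch clusters contigs by the Jaccard similarity of their motif sets. It is a hypothetical, simplified stand-in: the abstract does not specify the clustering procedure, and the names `jaccard`, `cluster_contigs`, and `min_similarity` are assumptions introduced here. The iterative refinement step is not shown.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_contigs(motifs_by_contig, min_similarity=0.5):
    """Greedily assign each contig to the most similar existing cluster.

    motifs_by_contig: dict mapping contig name -> set of detected motifs.
    A contig starts a new cluster when no cluster reaches min_similarity.
    """
    clusters = []  # each entry: {"motifs": set, "contigs": list}
    for contig, motifs in motifs_by_contig.items():
        best, best_sim = None, min_similarity
        for cluster in clusters:
            sim = jaccard(motifs, cluster["motifs"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"motifs": set(motifs), "contigs": [contig]})
        else:
            best["contigs"].append(contig)
            best["motifs"] |= motifs  # the cluster profile is the union of its members
    return [cluster["contigs"] for cluster in clusters]
```

Pooling reads from the contigs in each cluster and re-running motif detection on the pooled data would be one way to approximate the refinement described above, though that is a design assumption rather than something the abstract states.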

Availability

https://github.com/alibashir/EMMCKmer.

SUBMITTER: Beckmann ND 

PROVIDER: S-EPMC4168715 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Detecting epigenetic motifs in low coverage and metagenomics settings.

Beckmann, Noam D; Karri, Sashank; Fang, Gang; Bashir, Ali

BMC Bioinformatics, 2014-09-10


Similar Datasets

| S-EPMC8480091 | biostudies-literature
| PRJEB31654 | ENA
| S-EPMC10913427 | biostudies-literature
2018-03-01 | GSE108841 | GEO
| S-EPMC9277797 | biostudies-literature
| S-EPMC7355282 | biostudies-literature
| S-EPMC9743772 | biostudies-literature
| PRJEB78702 | ENA
| PRJEB22896 | ENA
2005-06-03 | GSE2347 | GEO