Dataset Information

Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq.


ABSTRACT: Genetic variants and de novo mutations in regulatory regions of the genome are typically discovered by whole-genome sequencing (WGS); however, WGS is expensive and most WGS reads come from non-regulatory regions. The Assay for Transposase-Accessible Chromatin (ATAC-seq) generates reads from regulatory sequences and could be used as a low-cost 'capture' method for regulatory variant discovery, but its use for this purpose has not been systematically evaluated. Here we apply seven variant callers to bulk and single-cell ATAC-seq data and evaluate their ability to identify single nucleotide variants (SNVs) and insertions/deletions (indels). In addition, we develop an ensemble classifier, VarCA, which combines features from individual variant callers to predict variants. The Genome Analysis Toolkit (GATK) is the best-performing individual caller, with precision/recall on a bulk ATAC test dataset of 0.92/0.97 for SNVs and 0.87/0.82 for indels within ATAC-seq peak regions with at least 10 reads. On bulk ATAC-seq reads, VarCA achieves superior performance, with precision/recall of 0.99/0.95 for SNVs and 0.93/0.80 for indels. On single-cell ATAC-seq reads, VarCA attains precision/recall of 0.98/0.94 for SNVs and 0.82/0.82 for indels. In summary, ATAC-seq reads can be used to accurately discover non-coding regulatory variants in the absence of whole-genome sequencing data, and our ensemble method, VarCA, has the best overall performance.
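The abstract reports precision and recall separately, which makes the callers hard to compare at a glance. As a rough illustration only, the sketch below combines each reported pair into an F1 score (the harmonic mean of precision and recall). F1 is not a metric quoted in the abstract; it is an assumption introduced here, and the names `f1_score`, `REPORTED`, and `f1_by_setting` are hypothetical. The precision/recall values are copied from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (caller, data type, variant type) -> (precision, recall), as given in the abstract.
REPORTED = {
    ("GATK", "bulk", "SNV"): (0.92, 0.97),
    ("GATK", "bulk", "indel"): (0.87, 0.82),
    ("VarCA", "bulk", "SNV"): (0.99, 0.95),
    ("VarCA", "bulk", "indel"): (0.93, 0.80),
    ("VarCA", "single-cell", "SNV"): (0.98, 0.94),
    ("VarCA", "single-cell", "indel"): (0.82, 0.82),
}

# F1 is an illustrative summary, not a result reported by the authors.
f1_by_setting = {key: f1_score(p, r) for key, (p, r) in REPORTED.items()}

for (caller, data, variant), f1 in sorted(f1_by_setting.items()):
    print(f"{caller:5s} {data:11s} {variant:5s} F1={f1:.3f}")
```

Under this summary, VarCA scores higher than GATK on the bulk data for both SNVs and indels, which is consistent with the abstract's claim of better overall performance, although the abstract itself does not state which metric underlies that claim.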

SUBMITTER: Massarat AR 

PROVIDER: S-EPMC8373110 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC8289380 | biostudies-literature
| S-EPMC8699717 | biostudies-literature
| S-EPMC4546161 | biostudies-literature
| S-EPMC7934446 | biostudies-literature
| S-EPMC7523641 | biostudies-literature
| S-EPMC10776385 | biostudies-literature
| S-EPMC10070275 | biostudies-literature
| S-EPMC7333383 | biostudies-literature
| S-EPMC7910485 | biostudies-literature
| S-EPMC10043503 | biostudies-literature