Dataset Information

Edge effects in calling variants from targeted amplicon sequencing.


ABSTRACT:

Background

Analysis of targeted amplicon sequencing data presents some unique challenges in comparison to the analysis of random fragment sequencing data. Whereas reads from randomly fragmented DNA have arbitrary start positions, the reads from amplicon sequencing have fixed start positions that coincide with the amplicon boundaries. As a result, any variants near the amplicon boundaries can cause misalignments of multiple reads that can ultimately lead to false-positive or false-negative variant calls.
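The edge effect described above can be illustrated with a toy scoring model. This is only a sketch under stated assumptions: the match, mismatch, and clipping scores below are illustrative values rather than the settings of any particular aligner or of the authors' pipeline, and the function names are hypothetical. It compares the score of aligning a read end through a single-base mismatch with the score of soft-clipping that end.

```python
def clipping_is_favoured(variant_offset, match=1, mismatch=4, clip_penalty=0):
    """Return True if soft-clipping the read end outscores aligning through an SNV.

    variant_offset: number of matching bases between the read end and the
    mismatching base. All scoring values are illustrative assumptions.
    """
    # Score contributed by the read end when aligned through the mismatch.
    aligned_through = variant_offset * match - mismatch
    # Score contributed by the read end when it is soft-clipped instead.
    clipped = -clip_penalty
    return clipped > aligned_through


def blind_spot_width(match=1, mismatch=4, clip_penalty=0):
    """Count the offsets from the read end at which clipping is favoured."""
    width = 0
    while clipping_is_favoured(width, match, mismatch, clip_penalty):
        width += 1
    return width
```

Under these assumed scores, a mismatch within the first few bases of a read is clipped rather than reported. Because every read from an amplicon starts at the same position, all reads lose the variant together. If, say, 20 primer bases precede the insert, the same variant sits at least 20 bases from the read end and the model aligns through it.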

Results

We show that amplicon boundaries are variant calling blind spots where the variant calls are highly inaccurate. We propose that an effective strategy to avoid these blind spots is to incorporate the primer bases in obtaining read alignments and post-processing of the alignments, thereby effectively moving these blind spots into the primer binding regions (which are not used for variant calling). Targeted sequencing data analysis pipelines can provide better variant calling accuracy when primer bases are retained and sequenced.

Conclusions

Read bases beyond the variant site are necessary for analysis of amplicon sequencing data. Enzymatic primer digestion, if used in the target enrichment process, should leave at least a few primer bases to ensure that these bases are available during data analysis. The primer bases should only be removed immediately before the variant calling step to ensure that the variants can be called irrespective of where they occur within the amplicon insert region.
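As a minimal sketch of removing primer bases only after alignment, the function below soft-clips the aligned bases that lie to the left of the insert region by rewriting a CIGAR string. It is an illustration rather than the authors' implementation: the name `soft_clip_left`, the 0-based `insert_start` coordinate, and the choice to handle only the forward-primer side are assumptions, and the reverse-primer side would need a mirrored routine.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_left(pos, cigar, insert_start):
    """Soft-clip aligned read bases left of insert_start.

    pos and insert_start are 0-based reference coordinates (assumed convention).
    Returns (new_pos, new_cigar).
    """
    rest = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Keep a leading hard clip in place; it carries no read bases.
    lead_hard = [rest.pop(0)] if rest and rest[0][1] == "H" else []
    clipped = 0  # read bases converted to soft clip
    ref = pos    # reference coordinate of the next aligned base
    while rest:
        n, op = rest[0]
        if op in "M=X":
            if ref >= insert_start:
                break  # first aligned base inside the insert region
            take = min(n, insert_start - ref)
            clipped += take
            ref += take
            if take == n:
                rest.pop(0)
            else:
                rest[0] = (n - take, op)
        elif op in "IS":
            clipped += n  # read-only bases before the insert are clipped too
            rest.pop(0)
        elif op in "DN":
            ref += n      # an alignment cannot start with a deletion
            rest.pop(0)
        elif op == "P":
            rest.pop(0)   # padding consumes neither read nor reference
        else:
            break         # trailing hard clip
    new_ops = lead_hard + ([(clipped, "S")] if clipped else []) + rest
    return ref, "".join(f"{n}{op}" for n, op in new_ops)
```

For example, a read aligned at position 100 with CIGAR `50M` and an insert starting at 120 becomes position 120 with CIGAR `20S30M`: the primer bases helped anchor the alignment but no longer contribute to variant calling.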

SUBMITTER: Satya RV 

PROVIDER: S-EPMC4302139 | biostudies-literature | 2014 Dec

REPOSITORIES: biostudies-literature

Publications

Edge effects in calling variants from targeted amplicon sequencing.

Satya RV, DiCarlo J

BMC Genomics, 2014 Dec 5


Similar Datasets

| S-EPMC4444292 | biostudies-literature
| S-EPMC10371994 | biostudies-literature
| S-EPMC10258123 | biostudies-literature
| S-EPMC3827295 | biostudies-literature
| S-EPMC8665737 | biostudies-literature
| S-EPMC5876661 | biostudies-literature
| S-EPMC10501585 | biostudies-literature
| S-EPMC3563481 | biostudies-literature
| S-EPMC9284968 | biostudies-literature
| S-EPMC5531809 | biostudies-literature