Dataset Information

New targets acquired: Improving locus recovery from the Angiosperms353 probe set.


ABSTRACT: Universal target enrichment kits maximize utility across wide evolutionary breadth while minimizing the number of baits required to create a cost-efficient kit. The Angiosperms353 kit has been successfully used to capture loci throughout the angiosperms, but the default target reference file includes sequence information from only 6-18 taxa per locus. Consequently, reads sequenced from on-target DNA molecules may fail to map to references, resulting in fewer on-target reads for assembly, and reducing locus recovery. We expanded the Angiosperms353 target file, incorporating sequences from 566 transcriptomes to produce a 'mega353' target file, with each locus represented by 17-373 taxa. This mega353 file is a drop-in replacement for the original Angiosperms353 file in HybPiper analyses. We provide tools to subsample the file based on user-selected taxon groups, and to incorporate other transcriptome or protein-coding gene data sets. Compared to the default Angiosperms353 file, the mega353 file increased the percentage of on-target reads by an average of 32%, increased locus recovery at 75% length by 49%, and increased the total length of the concatenated loci by 29%. Increasing the phylogenetic density of the target reference file results in improved recovery of target capture loci. The mega353 file and associated scripts are available at: https://github.com/chrisjackson-pellicle/NewTargets.
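The taxon-based subsampling described in the abstract can be illustrated with a minimal sketch. This is not the NewTargets code; it assumes a FASTA target file whose headers follow the `>Taxon-Locus` naming convention used for HybPiper target files, and the function names, taxon names, and locus names below are all hypothetical.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subsample_targets(text, keep_taxa):
    """Return (header, sequence) pairs whose taxon is in keep_taxa.

    Assumption: each header has the form "Taxon-Locus", so the taxon is
    everything before the last hyphen. Headers without a hyphen are skipped.
    """
    keep = set(keep_taxa)
    kept = []
    for header, seq in parse_fasta(text):
        taxon, _, _locus = header.rpartition("-")
        if taxon in keep:
            kept.append((header, seq))
    return kept


# Hypothetical three-record target file (made-up taxa, loci, and sequences).
EXAMPLE = (
    ">TaxonA-locus1\nATGC\nGGTA\n"
    ">TaxonB-locus1\nATGA\n"
    ">TaxonA-locus2\nTTGA\n"
)

if __name__ == "__main__":
    for header, seq in subsample_targets(EXAMPLE, ["TaxonA"]):
        print(f">{header}\n{seq}")
```

Splitting on the last hyphen keeps taxon names that themselves contain hyphens intact, at the cost of assuming locus names do not contain one.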

SUBMITTER: McLay TGB 

PROVIDER: S-EPMC8312740 | biostudies-literature | 2021 Jul

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC8361741 | biostudies-literature
| S-EPMC8362113 | biostudies-literature
| S-EPMC3224148 | biostudies-literature
| S-EPMC11621025 | biostudies-literature
| S-EPMC3944414 | biostudies-literature
| S-EPMC9402582 | biostudies-literature
| S-EPMC4693086 | biostudies-literature
| S-EPMC8362060 | biostudies-literature
| S-EPMC8312745 | biostudies-literature