Dataset Information

VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening.

ABSTRACT:

Motivation

Nucleic acid sequences in public databases should not contain vector contamination, but many sequences in GenBank do (or did) contain vectors. The National Center for Biotechnology Information uses the program VecScreen to screen submitted sequences for contamination. Additional tools are needed to distinguish true-positive (contamination) from false-positive (not contamination) VecScreen matches.

Results

A principal reason for false-positive VecScreen matches is that the sequence and the matching vector subsequence originate from closely related or identical organisms (for example, both originate in Escherichia coli). We collected information on the taxonomy of sources of vector segments in the UniVec database used by VecScreen. We used that information in two overlapping software pipelines for retrospective analysis of contamination in GenBank and for prospective analysis of contamination in new sequence submissions. Using the retrospective pipeline, we identified and corrected over 8000 contaminated sequences in the nonredundant nucleotide database. The prospective analysis pipeline has been in production use since April 2017 to evaluate some new GenBank submissions.
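
The taxonomy idea in the Results paragraph can be illustrated with a minimal Python sketch. It is not the VecScreen_plus_taxonomy implementation. The record layout (`VecScreenMatch`), the function `classify_match`, the placeholder identifiers, and the choice to compare the two sources at a single rank (genus by default) are all hypothetical and chosen only for illustration. The published pipelines may apply different or additional criteria. The sketch shows the core test: a VecScreen match is treated as a likely false positive when the screened sequence and the matched UniVec segment come from the same taxon.

```python
from dataclasses import dataclass, field


@dataclass
class VecScreenMatch:
    """One VecScreen hit annotated with taxonomy (hypothetical record layout)."""
    query_accession: str  # placeholder ID of the screened sequence
    univec_id: str        # placeholder ID of the matched UniVec entry
    # Taxonomy as {rank: taxon name}; an empty dict means the source is unknown.
    query_taxonomy: dict[str, str] = field(default_factory=dict)
    segment_taxonomy: dict[str, str] = field(default_factory=dict)


def classify_match(match: VecScreenMatch, rank: str = "genus") -> str:
    """Return 'likely_false_positive' when both sources share the same taxon at
    `rank`; otherwise return 'possible_contamination' so the hit is reviewed."""
    query_taxon = match.query_taxonomy.get(rank)
    segment_taxon = match.segment_taxonomy.get(rank)
    # A missing rank on either side never dismisses the match.
    if query_taxon is not None and query_taxon == segment_taxon:
        return "likely_false_positive"
    return "possible_contamination"


# Illustrative taxonomies (abbreviated to two ranks).
ecoli = {"genus": "Escherichia", "species": "Escherichia coli"}
human = {"genus": "Homo", "species": "Homo sapiens"}

# E. coli sequence matching a vector segment that also came from E. coli.
print(classify_match(VecScreenMatch("SEQ1", "UV_1", ecoli, ecoli)))  # likely_false_positive
# Human sequence matching the same E. coli-derived segment.
print(classify_match(VecScreenMatch("SEQ2", "UV_1", human, ecoli)))  # possible_contamination
```

Comparing at one configurable rank keeps the example short. A more careful filter might walk the full lineage or treat synthetic vector parts, which have no natural source organism, separately.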

Availability and implementation

Data on the sources of UniVec entries were included in release 10.0 (ftp://ftp.ncbi.nih.gov/pub/UniVec/). The main software is freely available at https://github.com/aaschaffer/vecscreen_plus_taxonomy.

Contact

aschaffe@helix.nih.gov.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Schaffer AA

PROVIDER: S-EPMC6030928 | biostudies-literature | 2018 Mar

REPOSITORIES: biostudies-literature

Publications

VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening.

Schäffer AA, Nawrocki EP, Choi Y, Kitts PA, Karsch-Mizrachi I, McVeigh R

Bioinformatics (Oxford, England), 2018 Mar 1, issue 5

PMID: 29069347

Similar Datasets

The effect of imposing a higher, uniform tobacco tax in Vietnam.
| S-EPMC1557504 | biostudies-literature

Tobacco price and use following California Proposition 56 tobacco tax increase.
| S-EPMC8513910 | biostudies-literature

The impact of a 25-cent-per-drink alcohol tax increase.
| S-EPMC3794433 | biostudies-literature

Legumes can increase cadmium contamination in neighboring crops.
| S-EPMC3419222 | biostudies-literature

Using search query surveillance to monitor tax avoidance and smoking cessation following the United States' 2009 "SCHIP" cigarette tax increase.
| S-EPMC3059206 | biostudies-literature

Impact of cigarette tax increase on health and financing outcomes in four Indian states.
| S-EPMC7548764 | biostudies-literature

High-Throughput Screening Identifies Kinase Inhibitors That Increase Dual Adeno-Associated Viral Vector Transduction In Vitro and in Mouse Retina.
| S-EPMC6098407 | biostudies-literature

State-Level Tax Policy, Cancer Screening, and Mortality Rates in the US.
| S-EPMC12048849 | biostudies-literature

Changes in retail cigarette price after tax increase: Findings from the 2018-2020 ITC Vietnam surveys.
| S-EPMC11650784 | biostudies-literature

Longer Contact Times Increase Cross-Contamination of Enterobacter aerogenes from Surfaces to Food.
| S-EPMC5066366 | biostudies-literature