Dataset Information

Consistent RNA sequencing contamination in GTEx and other data sets.


ABSTRACT: A challenge of next generation sequencing is read contamination. We use Genotype-Tissue Expression (GTEx) datasets and technical metadata along with RNA-seq datasets from other studies to understand factors that contribute to contamination. Here we report, of 48 analyzed tissues in GTEx, 26 have variant co-expression clusters of four highly expressed and pancreas-enriched genes (PRSS1, PNLIP, CLPS, and/or CELA3A). Fourteen additional highly expressed genes from other tissues also indicate contamination. Sample contamination is strongly associated with a sample being sequenced on the same day as a tissue that natively expresses those genes. Discrepant SNPs across four contaminating genes validate the contamination. Low-level contamination affects ~40% of samples and leads to numerous eQTL assignments in inappropriate tissues among these 18 genes. This type of contamination occurs widely, impacting bulk and single cell (scRNA-seq) data set analysis. In conclusion, highly expressed, tissue-enriched genes basally contaminate GTEx and other datasets impacting analyses.

SUBMITTER: Nieuwenhuis TO 

PROVIDER: S-EPMC7176728 | biostudies-literature | 2020 Apr

REPOSITORIES: biostudies-literature

Publications

Consistent RNA sequencing contamination in GTEx and other data sets.

Tim O Nieuwenhuis, Stephanie Y Yang, Rohan X Verma, Vamsee Pillalamarri, Dan E Arking, Avi Z Rosenberg, Matthew N McCall, Marc K Halushka

Nature Communications, 2020 Apr 22 (issue 1)


Similar Datasets

| S-EPMC7763177 | biostudies-literature
| S-EPMC7677776 | biostudies-literature
| S-EPMC6940275 | biostudies-literature
| S-EPMC3458526 | biostudies-other
| S-EPMC6986043 | biostudies-literature
2017-07-12 | GSE86354 | GEO
| S-EPMC5656403 | biostudies-literature
2013-04-09 | GSE45878 | GEO
| S-EPMC3018392 | biostudies-literature
| S-EPMC6020721 | biostudies-literature