
Dataset Information


Detecting sample swaps in diverse NGS data types using linkage disequilibrium.


ABSTRACT: As the number of genomics datasets grows rapidly, sample mislabeling has become a high-stakes issue. We present CrosscheckFingerprints (Crosscheck), a tool for quantifying sample relatedness and detecting incorrectly paired sequencing datasets from different donors. Crosscheck outperforms similar methods and is effective even when data are sparse or from different assays. Application of Crosscheck to 8851 ENCODE ChIP-, RNA-, and DNase-seq datasets enabled us to identify and correct dozens of mislabeled samples and ambiguous metadata annotations, representing ~1% of ENCODE datasets.

SUBMITTER: Javed N 

PROVIDER: S-EPMC7391710 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature


Publications

Detecting sample swaps in diverse NGS data types using linkage disequilibrium.

Nauman Javed, Yossi Farjoun, Tim J Fennell, Charles B Epstein, Bradley E Bernstein, Noam Shoresh

Nature Communications, 2020-07-29, issue 1



Similar Datasets

| S-EPMC7038669 | biostudies-literature
| S-EPMC5322361 | biostudies-literature
| S-EPMC7059597 | biostudies-literature
| S-EPMC6472186 | biostudies-literature
| S-EPMC6505142 | biostudies-other
| S-EPMC5972415 | biostudies-other