
Dataset Information


NGSCheckMate: software for validating sample identity in next-generation sequencing studies within and across data types.


ABSTRACT: In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. AVAILABILITY: https://github.com/parklab/NGSCheckMate.
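
ILLUSTRATION: The idea of comparing allele read fractions at known SNPs can be sketched in a few lines of Python. This is not NGSCheckMate's code or interface. It assumes Pearson correlation as the similarity metric (the abstract does not name the metric), and the function names, the input layout (SNP id mapped to reference and alternate read counts) and the threshold argument are all hypothetical. The abstract notes that the behavior of similarity metrics depends on sequencing depth, so a real cutoff would have to be chosen per depth; here the caller simply supplies one.

```python
from math import sqrt


def allele_fractions(counts):
    """Return {snp_id: alt-allele read fraction}, skipping sites with no reads.

    counts: dict of snp_id -> (ref_reads, alt_reads). Hypothetical layout.
    """
    return {
        snp: alt / (ref + alt)
        for snp, (ref, alt) in counts.items()
        if ref + alt > 0
    }


def vaf_correlation(counts_a, counts_b):
    """Pearson correlation of allele fractions over SNPs covered in both samples."""
    vaf_a = allele_fractions(counts_a)
    vaf_b = allele_fractions(counts_b)
    shared = sorted(vaf_a.keys() & vaf_b.keys())
    if len(shared) < 2:
        raise ValueError("need at least two SNPs covered in both samples")

    xs = [vaf_a[s] for s in shared]
    ys = [vaf_b[s] for s in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("allele fractions are constant in one sample")
    return cov / sqrt(var_x * var_y)


def same_individual(counts_a, counts_b, threshold):
    """Call two samples matched if their correlation reaches the given cutoff.

    The threshold is an assumption supplied by the caller; it is not a value
    taken from NGSCheckMate.
    """
    return vaf_correlation(counts_a, counts_b) >= threshold
```

Two samples from the same subject should give a correlation close to 1 at sufficiently covered sites, while unrelated samples should score much lower; at low depth the fractions become noisy, which is why a fixed cutoff is only a simplification.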

SUBMITTER: Lee S 

PROVIDER: S-EPMC5499645 | biostudies-literature | 2017 Jun

REPOSITORIES: biostudies-literature


Publications

NGSCheckMate: software for validating sample identity in next-generation sequencing studies within and across data types.

Sejoon Lee, Soohyun Lee, Scott Ouellette, Woong-Yang Park, Eunjung A Lee, Peter J Park

Nucleic Acids Research, 2017-06-01, issue 11



Similar Datasets

| S-EPMC6918881 | biostudies-literature
| S-EPMC7031678 | biostudies-literature
| S-EPMC2896157 | biostudies-literature
2017-04-03 | PXD003804 | Pride
| S-EPMC9671411 | biostudies-literature
| S-EPMC4064128 | biostudies-literature
| S-EPMC6429328 | biostudies-literature
| S-EPMC3148210 | biostudies-literature
| S-EPMC3370281 | biostudies-literature
| S-EPMC5961346 | biostudies-literature