Dataset Information

Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2.

ABSTRACT: Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two, 92 genes where the majority of samples have a copy number other than two, and describe rare copy-number variation affecting multiple genes at the APOBEC3 locus.
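
The idea of tabulating unique k-mers to estimate copy number without read mapping can be illustrated with a toy sketch. This is not QuicK-mer2's implementation; the function names, the use of a median, the normalization against control k-mers assumed to be diploid, and the example reads are all assumptions made for illustration. Reverse complements, sequencing errors, and GC correction are ignored.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_read_kmers(reads, k, targets):
    """Count how often each target k-mer occurs across all reads.

    `targets` stands in for a precomputed set of k-mers that are
    unique in the reference (an assumption of this sketch).
    """
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in targets:
                counts[km] += 1
    return counts


def estimate_copy_number(counts, region_kmers, control_kmers, control_copies=2):
    """Scale the median depth of a region by the median depth of controls.

    Control k-mers are assumed to come from sequence with a known
    copy number (two by default).
    """
    control_depth = median(counts[km] for km in control_kmers)
    region_depth = median(counts[km] for km in region_kmers)
    return control_copies * region_depth / control_depth


# Hypothetical toy data: the control k-mer is seen 4 times, the region k-mer 6 times.
reads = ["ACGT"] * 4 + ["GGCA"] * 6
control = {"ACGT"}
region = {"GGCA"}
counts = count_read_kmers(reads, 4, control | region)
print(estimate_copy_number(counts, region, control))  # 3.0
```

Because depth is read only at k-mers that are unique to one paralog, the estimate for each paralog is not diluted by reads from its duplicates, which is the property the abstract refers to as paralog-specific.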

SUBMITTER: Shen F

PROVIDER: S-EPMC7073954 | biostudies-literature | 2020 Jan

REPOSITORIES: biostudies-literature

Publications

Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2.

Feichen Shen, Jeffrey M. Kidd

Genes. 2020 Jan 29; issue 2

PMID: 32013076

Similar Datasets

Project description: Fast, definitive diagnosis of Creutzfeldt-Jakob disease (CJD) is important in assessing patient care options and transmission risks. Real-time quaking-induced conversion (RT-QuIC) assays of cerebrospinal fluid (CSF) and nasal-brushing specimens are valuable in distinguishing CJD from non-CJD conditions but have required 2.5 to 5 days. Here, an improved RT-QuIC assay is described which identified positive CSF samples within 4 to 14 h with better analytical sensitivity. Moreover, analysis of 11 CJD patients demonstrated that while 7 were RT-QuIC positive using the previous conditions, 10 were positive using the new assay. In these and further analyses, a total of 46 of 48 CSF samples from sporadic CJD patients were positive, while all 39 non-CJD patients were negative, giving 95.8% diagnostic sensitivity and 100% specificity. This second-generation RT-QuIC assay markedly improved the speed and sensitivity of detecting prion seeds in CSF specimens from CJD patients. This should enhance prospects for rapid and accurate ante mortem CJD diagnosis. A long-standing problem in dealing with various neurodegenerative protein misfolding diseases is early and accurate diagnosis. This issue is particularly important with human prion diseases, such as CJD, because prions are deadly, transmissible, and unusually resistant to decontamination. The recently developed RT-QuIC test allows for highly sensitive and specific detection of CJD in human cerebrospinal fluid and is being broadly implemented as a key diagnostic tool. However, as currently applied, RT-QuIC takes 2.5 to 5 days and misses 11 to 23% of CJD cases. Now, we have markedly improved RT-QuIC analysis of human CSF such that CJD and non-CJD patients can be discriminated in a matter of hours rather than days with enhanced sensitivity. These improvements should allow for much faster, more accurate, and practical testing for CJD.
In broader terms, our study provides a prototype for tests for misfolded protein aggregates that cause many important amyloid diseases, such as Alzheimer's, Parkinson's, and tauopathies.
