Dataset Information

GenePy - a score for estimating gene pathogenicity in individuals using next-generation sequencing data.


ABSTRACT:

Background

Next-generation sequencing (NGS) is revolutionising diagnosis and treatment of rare diseases; however, its application to understanding common disease aetiology is limited. Rare disease applications attribute genetic change(s) at a single locus to a specific phenotype in a binary fashion. In common diseases, where multiple genetic variants within and across genes contribute to disease, binary modelling cannot capture the burden of pathogenicity harboured by an individual across a given gene or pathway. We present GenePy, a novel gene-level scoring system for integration and analysis of NGS data on a per-individual basis that moves NGS data interpretation from the variant level to the gene level. This simple and flexible scoring system is intuitive and amenable to integration with machine learning, network and topological approaches, facilitating the investigation of complex phenotypes.

Results

Whole-exome sequencing data from 508 individuals were used to generate GenePy scores. For each variant, a score is calculated incorporating: i) population allele frequency estimates; ii) individual zygosity, determined through standard variant calling pipelines; and iii) any user-defined deleteriousness metric to inform on functional impact. GenePy then combines the scores generated for all variants observed into a single gene score for each individual. We generated a matrix of ~14,000 GenePy scores for all individuals for each of sixteen popular deleteriousness metrics. All per-gene scores are corrected for gene length. The majority of genes generate GenePy scores [...] -4) compared to the most commonly applied association tool that combines common and rare variation (p = 0.003).
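The scoring steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the formula, so the sketch assumes each variant contributes its deleteriousness value multiplied by the negative log10 of the product of the population frequencies of the two alleles the individual carries, and that length correction is a simple division. The names `Variant`, `variant_score`, `gene_scores`, the `min_freq` floor, the gene label `GENE_A` and its length are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str               # gene the variant falls in
    alt_freq: float         # population frequency of the alternate allele
    alt_copies: int         # zygosity: 1 = heterozygous, 2 = homozygous alternate
    deleteriousness: float  # any user-chosen metric, assumed scaled to [0, 1]


def variant_score(v, min_freq=1e-5):
    """Assumed per-variant score: rarer and more deleterious scores higher."""
    # Floor the frequencies so that log10 never receives zero.
    alt = max(v.alt_freq, min_freq)
    ref = max(1.0 - v.alt_freq, min_freq)
    # Frequencies of the two alleles actually carried by the individual.
    f1 = alt
    f2 = alt if v.alt_copies == 2 else ref
    return -v.deleteriousness * math.log10(f1 * f2)


def gene_scores(variants, gene_lengths):
    """Sum variant scores per gene, then divide by gene length (assumed correction)."""
    totals = {}
    for v in variants:
        totals[v.gene] = totals.get(v.gene, 0.0) + variant_score(v)
    return {gene: total / gene_lengths[gene] for gene, total in totals.items()}


# Made-up variants for one individual; values are illustrative only.
individual = [
    Variant("GENE_A", alt_freq=0.01, alt_copies=1, deleteriousness=0.9),
    Variant("GENE_A", alt_freq=0.20, alt_copies=2, deleteriousness=0.5),
]
print(gene_scores(individual, {"GENE_A": 1000}))
```

Repeating `gene_scores` for every individual and stacking the resulting dictionaries would give an individuals-by-genes matrix of the kind described in the Results, one matrix per deleteriousness metric.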

Conclusions

Per-gene per-individual GenePy scores are intuitive when assessing genetic variation in individual patients or comparing scores between groups. GenePy outperforms the currently accepted best practice tools for combining common and rare variation. GenePy scores are suitable for downstream data integration with transcriptomic and proteomic data that also report at the gene level.
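As an illustration of comparing per-gene scores between groups, the sketch below runs a one-sided permutation test on the difference in mean score between two groups. The abstract does not say which statistical test the authors used, so the permutation test is a swapped-in, generic technique; the function name `permutation_p_value` and the example scores are hypothetical and made up for demonstration.

```python
import random
from statistics import mean


def permutation_p_value(cases, controls, n_perm=2000, seed=0):
    """One-sided permutation test: is the mean score higher in cases than in controls?"""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    observed = mean(cases) - mean(controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign group labels by shuffling the pooled scores.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_cases]) - mean(pooled[n_cases:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Made-up scores for a single gene in two small groups.
case_scores = [0.90, 1.10, 1.00, 1.20, 0.95]
control_scores = [0.10, 0.20, 0.15, 0.05, 0.12]
print(permutation_p_value(case_scores, control_scores))
```

A permutation test makes no distributional assumption about the scores, which is convenient when many genes have scores close to zero; a rank-based test would be a reasonable alternative.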

SUBMITTER: Mossotto E 

PROVIDER: S-EPMC6524327 | biostudies-literature | 2019 May

REPOSITORIES: biostudies-literature

Publications

GenePy - a score for estimating gene pathogenicity in individuals using next-generation sequencing data.

Mossotto E, Ashton JJ, O'Gorman L, Pengelly RJ, Beattie RM, MacArthur BD, Ennis S

BMC Bioinformatics, 2019-05-16, issue 1


Similar Datasets

| S-EPMC3813857 | biostudies-other
| S-EPMC7282053 | biostudies-literature
| S-EPMC8496226 | biostudies-literature
| S-EPMC5521745 | biostudies-other
| S-EPMC4757956 | biostudies-literature
| S-EPMC3712213 | biostudies-literature
| S-EPMC3167057 | biostudies-literature
| S-EPMC7954749 | biostudies-literature
| S-EPMC4673978 | biostudies-literature
| S-EPMC6258556 | biostudies-literature