Dataset Information

Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning.


ABSTRACT: Elucidating functionality in non-coding regions is a key challenge in human genomics. It has been shown that intolerance to variation of coding and proximal non-coding sequence is a strong predictor of human disease relevance. Here, we integrate intolerance to variation, functional genomic annotations and primary genomic sequence to build JARVIS: a comprehensive deep learning model to prioritize non-coding regions, outperforming other human lineage-specific scores. Despite being agnostic to evolutionary conservation, JARVIS performs comparably to or better than conservation-based scores in classifying pathogenic single-nucleotide and structural variants. In constructing JARVIS, we introduce the genome-wide residual variation intolerance score (gwRVIS), applying a sliding-window approach to whole genome sequencing data from 62,784 individuals. gwRVIS distinguishes Mendelian disease genes from more tolerant CCDS regions and highlights ultra-conserved non-coding elements as the most intolerant regions in the human genome. Both JARVIS and gwRVIS capture previously inaccessible human-lineage constraint information and will enhance our understanding of the non-coding genome.

SUBMITTER: Vitsios D

PROVIDER: S-EPMC7940646 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

S-EPMC9890318 | biostudies-literature
S-EPMC7682815 | biostudies-literature
S-EPMC5860117 | biostudies-other
S-EPMC7535126 | biostudies-literature
S-EPMC8171027 | biostudies-literature
S-EPMC5112596 | biostudies-literature
S-EPMC7098089 | biostudies-literature
S-EPMC3526296 | biostudies-literature
GSE221870 | GEO | 2023-07-10
S-EPMC8749460 | biostudies-literature