Dataset Information

Quantile regression for challenging cases of eQTL mapping.


ABSTRACT: Mapping of expression quantitative trait loci (eQTLs) facilitates interpretation of the regulatory path from genetic variants to their associated diseases or traits. High-throughput sequencing of RNA (RNA-seq) has expedited the exploration of these regulatory variants. However, eQTL mapping is usually confronted with analysis challenges caused by overdispersion and excessive dropouts in RNA-seq data. The heavy-tailed distribution of gene expression violates the assumption of Gaussian-distributed errors in linear regression for eQTL detection, which results in increased Type I or Type II errors. Applying a rank-based inverse normal transformation (INT) can make the expression values more normally distributed. However, INT causes information loss and leads to uninterpretable effect size estimates. After a comprehensive examination of the impact of overdispersion and excessive dropouts, we propose to apply a robust model, quantile regression, to map eQTLs for genes with a high degree of overdispersion or a large number of dropouts. Simulation studies show that quantile regression has the desired robustness to outliers and dropouts, and it significantly improves eQTL mapping. In a real data analysis, the most significant eQTL discoveries differ between quantile regression and the conventional linear model. This discrepancy becomes more prominent when the dropout effect or the overdispersion effect is large. Together, the results suggest that quantile regression provides more reliable and accurate eQTL mapping than conventional linear models, and that it deserves more attention in large-scale eQTL mapping.
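
To make the contrast in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It compares an ordinary least-squares fit with a median (tau = 0.5) quantile regression on a small, deterministic toy data set. The data, the function names (`toy_eqtl_data`, `ols_fit`, `quantile_fit`), the true slope of 0.5, and the way outliers and dropouts are injected are all assumptions made for illustration. The quantile fit uses brute-force enumeration of lines through pairs of observations, which is exact for one predictor but only practical for tiny samples; a real analysis would use a dedicated quantile regression library.

```python
def check_loss(residual, tau):
    """Quantile (pinball) loss: tau * r if r >= 0, (tau - 1) * r if r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(x, y, a, b, tau):
    """Sum of check losses for the line y = a + b * x."""
    return sum(check_loss(yi - (a + b * xi), tau) for xi, yi in zip(x, y))


def quantile_fit(x, y, tau=0.5):
    """Exact one-predictor quantile regression by enumeration.

    Some minimizer of the check loss passes through two observations with
    distinct x values, so trying every such pair finds an optimum.
    Cost is O(n^3); this is for illustration only.
    """
    best = None
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == x[j]:
                continue  # a vertical line is not a valid fit
            b = (y[j] - y[i]) / (x[j] - x[i])
            a = y[i] - b * x[i]
            loss = total_loss(x, y, a, b, tau)
            if best is None or loss < best[0]:
                best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def ols_fit(x, y):
    """Ordinary least squares for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def toy_eqtl_data():
    """Hypothetical toy data: 10 samples per genotype (0, 1, 2 alternate
    alleles), true slope 0.5, with three inflated values and two dropouts
    (zeros) among the genotype-2 samples."""
    x, y = [], []
    for g in (0, 1, 2):
        for k in range(10):
            value = 1.0 + 0.5 * g + (k - 4.5) * 0.02  # small symmetric noise
            if g == 2 and k >= 7:
                value += 20.0  # overdispersion-like outliers (assumed)
            if g == 2 and k <= 1:
                value = 0.0  # dropouts (assumed)
            x.append(g)
            y.append(value)
    return x, y


x, y = toy_eqtl_data()
a_ols, b_ols = ols_fit(x, y)
a_qr, b_qr = quantile_fit(x, y, tau=0.5)
print(f"OLS slope:               {b_ols:.3f}")
print(f"Median-regression slope: {b_qr:.3f}")
```

In this toy setting the least-squares slope is pulled far from 0.5 by the three inflated values, while the median-regression slope stays close to 0.5 because the contaminated samples do not move the genotype-2 median. The slope also remains on the original expression scale, which is the interpretability point the abstract raises against INT.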

SUBMITTER: Sun B 

PROVIDER: S-EPMC7673343 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature

Publications

Quantile regression for challenging cases of eQTL mapping.

Sun Bo, Chen Liang

Briefings in Bioinformatics, 2020-09-01, issue 5


Similar Datasets

| S-EPMC5870877 | biostudies-literature
| S-EPMC8636089 | biostudies-literature
| S-EPMC5462897 | biostudies-literature
| S-EPMC3050018 | biostudies-literature
| S-EPMC8725653 | biostudies-literature
| S-EPMC6193274 | biostudies-literature
| S-EPMC3624800 | biostudies-literature
| S-EPMC3312995 | biostudies-literature
| S-EPMC5662245 | biostudies-literature
| S-EPMC4123128 | biostudies-literature