Unknown,Transcriptomics,Genomics,Proteomics

Dataset Information

0

EQTL analysis of many thousands of expressed genes while simultaneously controlling for hidden factors


ABSTRACT: Motivation: Identification of eQTL, the genetic loci that contribute to heritable variation in gene expression, can be obstructed by factors that produce variation in expression profiles if these factors are unmeasured or hidden from direct analysis. Methods: We have developed a method for Hidden Expression Factor analysis (HEFT) that identifies individual and pleiotropic effects of eQTL in the presence of hidden factors. The HEFT model simultaneously accounts for the effects of genotypes while learning hidden factors, where we make use of the complete likelihood of a unified multivariate regression and factor analysis model to derive a ridge estimator for combined factor learning and detection of eQTL. HEFT requires no pre-estimation of hidden factor effects, no iterative model selection, it provides p-values, and is extremely fast, requiring just a few hours to complete an eQTL analysis of thousands of expression variables when analyzing hundreds of thousands of SNPs on a standard 8 core 2.6G desktop. Results: By analyzing simulated data, we demonstrate that HEFT can correct for an unknown number of hidden factors and outperforms related hidden factor methods for eQTL analysis, where the improved performance is particularly evident in the detection of eQTL with multivariate effects. To demonstrate a real-world application, we applied HEFT to identify eQTL affecting gene expression in human lung tissue for a study that included presumptive hidden factors. The analysis identified a number of eQTL with direct relevance to lung disease that could not be found without a hidden factor analysis, including cis-eQTL for GTF2H1 and MTRR, genes that have been independently associated with lung cancer. We have developed HEFT, a fast multivariate method that detect eQTLs by analyzing thousands of traits simultaneously in the presence of hidden factors. HEFT employs a combined regression factor analysis approach to analyze the gene expression and genotype data sets, looking for univariate or multivariate eQTLs that regulate gene expression, while simultaneously controlling for both orthogonal or non-orthogonal hidden factors. We show by extensive simulation that HEFT outperforms competing methods, and by applying HEFT to a study that included presumptive hidden factors, we identified a number of eQTL with direct relevance to lung disease that could not be found without a hidden factor analysis. HEFT analysis results file (includes all results, not just top hits) linked below as supplementary file.

ORGANISM(S): Homo sapiens

SUBMITTER: Yael Strulovici-Barel 

PROVIDER: E-GEOD-40364 | biostudies-arrayexpress |

REPOSITORIES: biostudies-arrayexpress

Similar Datasets

2014-01-07 | GSE40364 | GEO
| PRJNA173725 | ENA
2012-07-02 | E-GEOD-39036 | biostudies-arrayexpress
2015-01-02 | E-GEOD-53790 | biostudies-arrayexpress
2011-06-22 | GSE29964 | GEO
2011-11-29 | E-GEOD-30777 | biostudies-arrayexpress
| PRJDB3536 | ENA
2020-07-21 | PXD019169 | Pride
2013-03-02 | GSE42408 | GEO
2009-01-31 | E-GEOD-14656 | biostudies-arrayexpress