
Dataset Information


Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions.


ABSTRACT: Data subject to heavy-tailed errors are commonly encountered in various scientific fields. To address this problem, procedures based on quantile regression and Least Absolute Deviation (LAD) regression have been developed in recent years. These methods essentially estimate the conditional median (or quantile) function, which can be very different from the conditional mean function, especially when distributions are asymmetric and heteroscedastic. How can we efficiently estimate the mean regression function in the ultra-high dimensional setting when only the second moment exists? To solve this problem, we propose a penalized Huber loss with a diverging parameter to reduce the bias created by the traditional Huber loss. Such a penalized robust approximate quadratic (RA-quadratic) loss will be called RA-Lasso. In the ultra-high dimensional setting, where the dimensionality can grow exponentially with the sample size, our results reveal that the RA-Lasso estimator is consistent at the same rate as the optimal rate under the light-tail situation. We further study the computational convergence of RA-Lasso and show that the composite gradient descent algorithm indeed produces a solution that admits the same optimal rate after sufficient iterations. As a byproduct, we also establish a concentration inequality for estimating the population mean when only the second moment exists. We compare RA-Lasso with other regularized robust estimators based on quantile regression and LAD regression. Extensive simulation studies demonstrate the satisfactory finite-sample performance of RA-Lasso.
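The abstract describes an L1-penalized Huber loss minimized by composite gradient descent. The following is a minimal sketch of that idea, not the authors' implementation: it assumes the common parameterization of the Huber loss by a threshold `tau` (quadratic for residuals within `tau`, linear beyond it), a fixed step size, and plain proximal gradient steps with soft-thresholding. The names `ra_lasso`, `huber_grad`, `soft_threshold`, `tau`, `lam`, and `step` are hypothetical and chosen only for illustration.

```python
def huber_grad(r, tau):
    """Derivative of the Huber loss in the residual r.

    Assumed loss: r**2 / 2 if |r| <= tau, else tau * |r| - tau**2 / 2,
    so the derivative is r clipped to the interval [-tau, tau].
    """
    return max(-tau, min(tau, r))


def soft_threshold(z, t):
    """Proximal operator of t * |z| (the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ra_lasso(X, y, lam, tau, step, n_iter=500):
    """Sketch of L1-penalized Huber regression via proximal gradient descent.

    X is a list of n rows with p entries each, y a list of n responses.
    lam is the L1 penalty weight, tau the Huber threshold (assumed to be
    chosen large, growing with n, so the loss stays close to quadratic),
    and step a fixed step size (assumed small enough for convergence).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Clipped residuals: large residuals are capped at +/- tau.
        psi = [
            huber_grad(y[i] - sum(X[i][j] * beta[j] for j in range(p)), tau)
            for i in range(n)
        ]
        # Gradient of the average Huber loss with respect to beta.
        grad = [-sum(X[i][j] * psi[i] for i in range(n)) / n for j in range(p)]
        # Gradient step on the smooth part, then soft-threshold for the penalty.
        beta = [
            soft_threshold(beta[j] - step * grad[j], step * lam) for j in range(p)
        ]
    return beta
```

Under these assumptions, a very large `tau` makes the procedure behave like the ordinary Lasso, while a small `tau` moves it toward an LAD-type fit; the abstract's point is that letting the Huber parameter diverge at a suitable rate keeps robustness while reducing the bias relative to the conditional mean.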

SUBMITTER: Fan J 

PROVIDER: S-EPMC5412601 | biostudies-literature | 2017 Jan

REPOSITORIES: biostudies-literature


Publications

Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions.

Fan Jianqing, Li Quefeng, Wang Yuyan

Journal of the Royal Statistical Society. Series B, Statistical Methodology, 2016-04-14, issue 1



Similar Datasets

S-EPMC3767535 | biostudies-literature
S-EPMC4373540 | biostudies-literature
S-EPMC4143773 | biostudies-literature
S-EPMC6193274 | biostudies-literature
S-EPMC3198579 | biostudies-literature
S-EPMC10788450 | biostudies-literature
S-EPMC4627720 | biostudies-literature
S-EPMC4734767 | biostudies-literature
S-EPMC7313320 | biostudies-literature
S-EPMC3767561 | biostudies-literature