
Dataset Information


A PARTIALLY LINEAR FRAMEWORK FOR MASSIVE HETEROGENEOUS DATA.


ABSTRACT: We consider a partially linear framework for modelling massive heterogeneous data. The major goal is to extract common features across all sub-populations while exploring the heterogeneity of each sub-population. In particular, we propose an aggregation-type estimator for the commonality parameter that possesses the (non-asymptotic) minimax optimal bound and the asymptotic distribution it would have if there were no heterogeneity. This oracle result holds when the number of sub-populations does not grow too fast. A plug-in estimator for the heterogeneity parameter is further constructed and shown to possess the asymptotic distribution it would have if the commonality information were available. We also test for heterogeneity among a large number of sub-populations. All the above results require regularizing each sub-estimation as though it had the entire sample size. Our general theory applies to the divide-and-conquer approach that is often used to deal with massive homogeneous data. A technical by-product of this paper is statistical inference for general kernel ridge regression. Thorough numerical results are also provided to back up our theory.
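The abstract describes three steps: fit each sub-population, average the sub-estimates of the common part, then plug the average back in to estimate each sub-population's own parameter. The record does not state the model, so the sketch below assumes one common form: in sub-population j, Y = X'beta_j + f(Z) + noise, with a heterogeneous linear part beta_j and a shared nonparametric function f fitted by kernel ridge regression. The Gaussian kernel, its bandwidth, the rate lam = N ** (-0.8), the helper names (`fit_sub_population`, `aggregate_f`, `plug_in_beta`) and the simulated data are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=0.2):
    """Gaussian kernel matrix between 1-D arrays a and b (hypothetical kernel choice)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def fit_sub_population(x, z, y, lam):
    """Partially linear kernel ridge regression on one sub-population.

    Minimises (1/n)||y - x beta - K alpha||^2 + lam * alpha' K alpha.
    Profiling out alpha leaves generalised least squares for beta with
    weight W = (K + n*lam*I)^{-1}; then alpha = W (y - x beta).
    """
    n = len(y)
    W = np.linalg.inv(gaussian_kernel(z, z) + n * lam * np.eye(n))
    beta = np.linalg.solve(x.T @ W @ x, x.T @ W @ y)
    alpha = W @ (y - x @ beta)
    return beta, z, alpha


def aggregate_f(fits, z_new):
    """Aggregation step: average the s sub-population estimates of the shared f."""
    return np.mean([gaussian_kernel(z_new, zj) @ aj for _, zj, aj in fits], axis=0)


def plug_in_beta(x, z, y, fits):
    """Plug-in step: re-estimate beta_j by least squares after subtracting the aggregated f."""
    resid = y - aggregate_f(fits, z)
    return np.linalg.lstsq(x, resid, rcond=None)[0]


# Simulated heterogeneous data (all values hypothetical).
rng = np.random.default_rng(0)
s, n, p = 5, 200, 2
N = s * n
lam = N ** (-0.8)  # set with the TOTAL sample size N, not the sub-sample size n
f_true = lambda z: np.sin(2 * np.pi * z)
betas = [np.array([1.0, -1.0]) + 0.5 * j for j in range(s)]
data = []
for j in range(s):
    x = rng.normal(size=(n, p))
    z = rng.uniform(size=n)
    y = x @ betas[j] + f_true(z) + 0.3 * rng.normal(size=n)
    data.append((x, z, y))

fits = [fit_sub_population(x, z, y, lam) for x, z, y in data]
grid = np.linspace(0.05, 0.95, 50)
f_bar = aggregate_f(fits, grid)
print("max |f_bar - f| on grid:", np.max(np.abs(f_bar - f_true(grid))))
for j, (x, z, y) in enumerate(data):
    print(j, plug_in_beta(x, z, y, fits).round(3), betas[j])
```

The choice of lam follows the abstract's remark that each sub-estimation is regularized as though it had the entire sample size: with lam tied to N rather than n, each sub-fit is deliberately under-smoothed, and averaging across sub-populations then reduces the extra variance, whereas smoothing bias would not average away.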

SUBMITTER: Zhao T 

PROVIDER: S-EPMC5394596 | biostudies-literature | 2016 Aug

REPOSITORIES: biostudies-literature


Publications

A PARTIALLY LINEAR FRAMEWORK FOR MASSIVE HETEROGENEOUS DATA.

Tianqi Zhao, Guang Cheng, Han Liu

Annals of Statistics, 2016 Jul 7, issue 4



Similar Datasets

| S-EPMC6364750 | biostudies-literature
| S-EPMC5525063 | biostudies-literature
| S-EPMC2681270 | biostudies-literature
| S-EPMC8648855 | biostudies-literature
| S-EPMC3222957 | biostudies-literature
| S-EPMC4126456 | biostudies-literature
| S-EPMC3715115 | biostudies-literature
| S-EPMC4551847 | biostudies-literature
| S-EPMC2929143 | biostudies-literature
2015-07-10 | E-GEOD-67237 | biostudies-arrayexpress