Dataset Information

A Massive Data Framework for M-Estimators with Cubic-Rate.

ABSTRACT: The divide and conquer method is a common strategy for handling massive data. In this article, we study the divide and conquer method for cubic-rate estimators under the massive data framework. We develop a general theory for establishing the asymptotic distribution of the aggregated M-estimators using a weighted average with weights depending on the subgroup sample sizes. Under certain conditions on the growth rate of the number of subgroups, the resulting aggregated estimators are shown to have a faster convergence rate and an asymptotic normal distribution, which makes them more tractable in both computation and inference than the original M-estimators based on the pooled data. Our theory applies to a wide class of M-estimators with cube root convergence rate, including the location estimator, the maximum score estimator and the value search estimator. Simulations and a real data application also validate our theoretical findings.
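The divide and conquer scheme in the abstract (estimate on each subgroup, then take a sample-size-weighted average) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the "location estimator" is the standard cube-root example that returns the center of a fixed-width window containing the most observations, and it assumes weights proportional to n_j^(2/3), i.e. the inverse of the n_j^(-2/3) variance scale of a cube-root estimator; the paper's exact weights may differ. The helper names `location_estimate`, `aggregate` and `split` are hypothetical.

```python
import bisect
import random


def location_estimate(xs, half_width=1.0):
    """Cube-root location estimator (assumed form): return a center theta that
    maximizes #{x : |x - theta| <= half_width}. The argmax is not unique; we
    take the midpoint of the leftmost best window of sorted points."""
    s = sorted(xs)
    best_count, best_theta = 0, s[0]
    for i, left in enumerate(s):
        # j is one past the last point inside [left, left + 2 * half_width]
        j = bisect.bisect_right(s, left + 2 * half_width)
        if j - i > best_count:
            best_count = j - i
            # every point in s[i:j] lies within half_width of this midpoint
            best_theta = (left + s[j - 1]) / 2.0
    return best_theta


def aggregate(subgroups, half_width=1.0):
    """Divide and conquer: estimate on each subgroup, then combine with
    weights proportional to n_j ** (2/3) (an assumed weighting)."""
    estimates = [location_estimate(g, half_width) for g in subgroups]
    weights = [len(g) ** (2.0 / 3.0) for g in subgroups]
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, estimates)) / total


def split(data, k):
    """Hypothetical helper: split data into k contiguous, nearly equal subgroups."""
    n = len(data)
    bounds = [round(n * j / k) for j in range(k + 1)]
    return [data[bounds[j]:bounds[j + 1]] for j in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    pooled = location_estimate(data)
    combined = aggregate(split(data, 50))
    print(f"pooled: {pooled:.4f}  divide-and-conquer: {combined:.4f}")
```

Each subgroup estimate costs O(n_j log n_j) and the subgroups are independent, so they can be computed in parallel; only the K estimates and sample sizes need to be gathered for the final weighted average.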

SUBMITTER: Shi C

PROVIDER: S-EPMC6364750 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature

Publications

A Massive Data Framework for M-Estimators with Cubic-Rate.

Chengchun Shi, Wenbin Lu, Rui Song

Journal of the American Statistical Association, 2018-06-19, 524

PMID: 30739966

Similar Datasets

Project description:
Background: The analysis of molecular variation within and between populations is crucial to establish strategies for conservation as well as to detect the footprint of spatially heterogeneous selection. The traditional estimator of genetic differentiation (F(ST)) has been shown to be misleading if genetic diversity is high. Alternative estimators of F(ST) have been proposed, but their robustness to variation in mutation rate is not clearly established. We first investigated the effect of mutation and migration rate using computer simulations and examined their joint influence on Q(ST), a measure of genetic differentiation for quantitative traits. We further used experimental data in natural populations of Arabidopsis thaliana to characterize the effect of mutation rate on various estimates of population differentiation. Since natural species exhibit various degrees of self-fertilisation, we also investigated the effect of mating system on the different estimators.
Results: If mutation rate is high and migration rate low, classical measures of genetic differentiation are misleading. Only Phi(ST), an estimator that takes the mutational distances between alleles into account, is independent of mutation rate, for all migration rates. However, the performance of Phi(ST) depends on the underlying mutation model, and departures from this model cause its performance to degrade. We further show that Q(ST) has the same bias. We provide evidence that, in A. thaliana, microsatellite variation correlates with mutation rate. We thereby demonstrate that our results on estimators of genetic differentiation have important implications, even for species that are well established models in population genetics and molecular biology.
Conclusions: We find that alternative measures of differentiation like F'(ST) and D are not suitable for estimating effective migration rate and should not be used in studies of local adaptation. Genetic differentiation should instead be measured using an estimator that takes mutation rate into account, such as Phi(ST). Furthermore, in systems where migration between populations is low, such as A. thaliana, Q(ST) < F(ST) cannot be taken as evidence for homogenising selection as has been traditionally thought.