Project description: Background: Effective coverage research is increasing rapidly in global health and development, as researchers use a range of measures and combine data sources to adjust coverage for the quality of services received. However, most estimates of effective coverage that combine data sources are reported only as point estimates, which may be due to the challenge of calculating the variance of a composite measure. In this paper, we evaluate three methods for quantifying the uncertainty in estimates of effective coverage. Methods: We conducted a simulation study to evaluate the performance of the exact, delta, and parametric bootstrap methods for constructing confidence intervals around point estimates calculated from combined data on coverage and quality. We assessed performance by computing the proportion of nominally 95% confidence intervals that contained the truth across a range of coverage and quality values and data-source sample sizes. To illustrate these approaches, we applied the delta and exact methods to estimates of adjusted coverage of antenatal care (ANC) in Senegal, using household survey data for coverage and health facility assessments for readiness to provide services. Results: With small sample sizes, when the true effective coverage value was close to the boundaries of 0 or 1, the exact and parametric bootstrap methods resulted in substantial over- or undercoverage and, for the exact method, a high proportion of invalid confidence intervals, while the delta method yielded modest overcoverage. For all three methods, the proportion of confidence intervals containing the truth approached the intended 95% with larger sample sizes and as the true effective coverage value moved away from the 0 or 1 boundary. Confidence intervals for adjusted ANC coverage in Senegal largely overlapped between the delta and exact methods, although at the sub-national level the exact method produced invalid confidence intervals for estimates near 0 or 1. We provide code to implement these methods. Conclusions: The uncertainty around an effective coverage estimate can be characterized; this should become standard practice if effective coverage estimates are to become part of national and global health monitoring. The delta method outperformed the other methods in this study; we recommend its use for appropriate inference from effective coverage estimates that combine data sources, particularly when either sample size is small. When used for estimates built from facility-type or regional strata, these methods require independence assumptions that must be considered in each application.
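For readers who want to see the basic calculation, the sketch below shows a first-order delta-method confidence interval for effective coverage treated as the product of two independent binomial proportions (crude coverage times quality/readiness). It is a minimal illustration under simplifying assumptions: the function name and the input counts are hypothetical, the paper's estimators additionally account for complex survey design, and truncating the interval to [0, 1] is a pragmatic choice rather than part of the method.

```python
import numpy as np
from scipy import stats

def effective_coverage_ci(x_cov, n_cov, x_qual, n_qual, alpha=0.05):
    """Delta-method CI for effective coverage = coverage * quality,
    treating the two proportions as independent binomial estimates."""
    p = x_cov / n_cov          # crude coverage (e.g., ANC contact from a household survey)
    q = x_qual / n_qual        # quality/readiness (e.g., facility readiness)
    est = p * q                # point estimate of effective coverage
    var_p = p * (1 - p) / n_cov
    var_q = q * (1 - q) / n_qual
    # First-order delta method for a product of independent estimates
    se = np.sqrt(q**2 * var_p + p**2 * var_q)
    z = stats.norm.ppf(1 - alpha / 2)
    lo, hi = est - z * se, est + z * se
    # Truncate to the [0, 1] range (a practical, not theoretical, choice)
    return est, max(lo, 0.0), min(hi, 1.0)

# Hypothetical counts for illustration only
print(effective_coverage_ci(480, 600, 70, 100))
```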
Project description: The efficiency of feed utilization plays an important role in animal breeding. However, measuring feed intake (FI) on an individual basis is costly under practical conditions, so using group measurements to model FI could be practically feasible and cost-effective. The objectives of this study were to develop a random regression model based on repeated group measurements that accounts for missing phenotypes caused by drop-out animals, with a focus on variance component (VC) estimation and genetic evaluation, and to investigate the effect of group composition on VC estimation and genetic evaluation using simulated datasets. Data were simulated based on individual FI in a pig population. Each individual had an FI measurement at 6 time points, reflecting 6 different weeks during the test period. The simulated phenotypes consisted of additive genetic, permanent environmental, and random residual effects; additive genetic and permanent environmental effects were both simulated and modeled with first-order Legendre polynomials. Three grouping scenarios based on genetic relationships among group members were investigated: (1) medium genetic relationships within and across pens; (2) high relationships within groups; (3) low relationships within groups. To investigate the effect of animals dropping out during the test period, a proportion (15%) of animals with individual phenotypes was designated as drop-outs, and two drop-out scenarios were assessed within each grouping scenario: (1) animals were dropped out at random; (2) animals with lower phenotypes were dropped out based on their ranking at each time point. The results show that using group measurements yielded VC estimates similar to those from individual measurements, but with larger standard deviations. Compared to scenarios without drop-out, similar VC estimates were observed when animals were dropped out at random, whereas reduced VC estimates were observed when animals were dropped out by phenotype ranking. Different grouping scenarios produced similar VC estimates. Compared to scenarios without drop-out, there was no loss of accuracy in genetic evaluation for the drop-out scenarios. However, dropping out animals by phenotype ranking produced larger bias in estimated breeding values than either the scenario without drop-out or the scenario with random drop-out. In conclusion, with an optimized group structure, the developed model can properly handle group measurements with drop-out animals and can achieve comparable accuracy of genetic evaluation for traits measured at the group level.
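As a rough illustration of the simulation design described above, the sketch below generates individual feed intake curves from first-order Legendre polynomials (intercept and slope) for additive genetic and permanent environmental effects and then sums them into pen-level group measurements. All parameter values, pen sizes, and the use of unrelated animals (no pedigree or genomic relationships, no drop-out) are simplifying assumptions, not the study's actual settings.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: 10 pens of 10 pigs, 6 weekly time points
n_pens, pen_size, n_times = 10, 10, 6
n = n_pens * pen_size
t = np.linspace(-1, 1, n_times)                 # standardized test week
phi = np.column_stack([np.ones_like(t), t])     # first-order Legendre basis (P0, P1), unscaled

# Illustrative (co)variance matrices for the random-regression coefficients
G = np.array([[1.0, 0.1], [0.1, 0.3]])          # additive genetic
P = np.array([[0.5, 0.0], [0.0, 0.2]])          # permanent environment
sigma_e = 1.0                                   # residual SD
mu = 20.0                                       # overall mean weekly FI

a = rng.multivariate_normal(np.zeros(2), G, size=n)   # genetic coefficients (unrelated animals here)
pe = rng.multivariate_normal(np.zeros(2), P, size=n)  # permanent-environment coefficients

# Individual phenotypes: mean + genetic and PE curves + residual
y = mu + (a + pe) @ phi.T + rng.normal(0, sigma_e, size=(n, n_times))

# Group (pen) measurements: sum of pen members' FI at each time point
pen = np.repeat(np.arange(n_pens), pen_size)
group_y = np.vstack([y[pen == k].sum(axis=0) for k in range(n_pens)])
print(group_y.shape)   # (10, 6): one FI record per pen per week
```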
Project description: Error variance estimation plays an important role in statistical inference for high-dimensional regression models. This paper concerns error variance estimation in the high-dimensional sparse additive model. We study the asymptotic behavior of the traditional mean squared error, the naive estimate of the error variance, and show that it may significantly underestimate the error variance owing to spurious correlations, which are even stronger in nonparametric models than in linear models. We further propose an accurate estimate of the error variance in the ultrahigh-dimensional sparse additive model by effectively integrating sure independence screening and refitted cross-validation techniques (Fan, Guo and Hao, 2012). The root-n consistency and asymptotic normality of the resulting estimate are established. We conduct a Monte Carlo simulation study to examine the finite-sample performance of the newly proposed estimate, and a real data example is used to illustrate the proposed methodology.
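The refitted cross-validation idea referenced above (Fan, Guo and Hao, 2012) can be sketched as follows: select variables on one half of the data, refit and estimate the residual variance on the other half, and average the two directions. For brevity the sketch screens variables by marginal correlation and refits an ordinary linear model; the paper's estimator applies the same splitting idea with sure independence screening and nonparametric (additive) fits, so this is a simplified illustration rather than the proposed estimator, and the function name and defaults are assumptions.

```python
import numpy as np

def rcv_error_variance(X, y, n_keep=5, seed=None):
    """Refitted cross-validation (RCV) estimate of the error variance:
    select variables on one half of the data, refit and estimate the
    residual variance on the other half, then average both directions."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)
    halves = (idx[: n // 2], idx[n // 2:])
    sig2 = []
    for sel_half, fit_half in (halves, halves[::-1]):
        Xs, ys = X[sel_half], y[sel_half]
        # Sure-independence-style screening: keep the variables with the
        # largest absolute marginal correlation with the response
        corr = np.abs((Xs - Xs.mean(0)).T @ (ys - ys.mean()))
        keep = np.argsort(corr)[-n_keep:]
        # Refit by least squares on the *other* half using the selected set
        Xf = np.column_stack([np.ones(len(fit_half)), X[fit_half][:, keep]])
        beta, *_ = np.linalg.lstsq(Xf, y[fit_half], rcond=None)
        resid = y[fit_half] - Xf @ beta
        sig2.append(resid @ resid / (len(fit_half) - Xf.shape[1]))
    return float(np.mean(sig2))
```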
Project description: Microarrays are one of the most widely used high-throughput technologies. One of the main problems in the area is that conventional estimates of the variances required in the t-statistic and other statistics are unreliable owing to the small number of replications. Various methods have been proposed in the literature to overcome this lack-of-degrees-of-freedom problem. In this context, it is commonly observed that the variance increases proportionally with the intensity level, which has led many researchers to assume that the variance is a function of the mean. Here we concentrate on estimation of the variance as a function of an unknown mean in two models: the constant coefficient of variation model and the quadratic variance-mean model. Because the means are unknown and are estimated with few degrees of freedom, naive methods that use the sample mean in place of the true mean are generally biased because of the errors-in-variables phenomenon. We propose three methods for overcoming this bias. The first two are variations on the theme of the so-called heteroscedastic simulation-extrapolation estimator, modified to estimate the variance function consistently. The third class of estimators is entirely different, being based on semiparametric information calculations. Simulations show the power of our methods and their lack of bias compared with the naive method that ignores the measurement error. The methodology is illustrated using microarray data from leukaemia patients.
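To make the errors-in-variables issue concrete, the sketch below simulates the constant coefficient of variation model (Var(y) = theta * mu^2) with a handful of replicates per gene and contrasts a naive moment estimate of theta, which is attenuated because the squared sample mean overestimates mu^2, with a simple moment correction. All numbers are hypothetical, and the correction shown is a basic method-of-moments fix for this one model, not the simulation-extrapolation or semiparametric estimators proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5,000 "genes", m = 3 replicates each,
# constant coefficient of variation model: Var(y) = theta * mu^2
n_genes, m, theta = 5000, 3, 0.09
mu = rng.uniform(5.0, 50.0, size=n_genes)
y = rng.normal(mu[:, None], np.sqrt(theta) * mu[:, None], size=(n_genes, m))

ybar = y.mean(axis=1)
s2 = y.var(axis=1, ddof=1)

# Naive ratio estimate: relate sample variances to squared sample means.
# Because E[ybar^2] = mu^2 * (1 + theta/m), this underestimates theta
# (errors-in-variables attenuation).
theta_naive = s2.sum() / (ybar**2).sum()

# Simple moment correction inverting E[ybar^2] = mu^2 * (1 + theta/m)
theta_corrected = theta_naive / (1.0 - theta_naive / m)

print(round(theta_naive, 4), round(theta_corrected, 4))  # corrected is close to 0.09
```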
Project description:Linear mixed models are widely used in ecological and biological applications, especially in genetic studies. Reliable estimation of variance components is crucial for using linear mixed models. However, standard methods, such as the restricted maximum likelihood (REML), are computationally inefficient in large samples and may be unstable with small samples. Other commonly used methods, such as the Haseman-Elston (HE) regression, may yield negative estimates of variances. Utilizing regularized estimation strategies, we propose the restricted Haseman-Elston (REHE) regression and REHE with resampling (reREHE) estimators, along with an inference framework for REHE, as fast and robust alternatives that provide nonnegative estimates with comparable accuracy to REML. The merits of REHE are illustrated using real data and benchmark simulation studies.
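A minimal sketch of the REHE idea, under simplifying assumptions: Haseman-Elston regression relates products of (mean-centered) phenotype pairs to the corresponding kinship entries, and restricting the solution to nonnegative variance components can be done with nonnegative least squares. The function and the toy sibling-pair kinship structure below are illustrative only, and they omit fixed-effect covariates, the resampling (reREHE) variant, and the inference framework.

```python
import numpy as np
from scipy.optimize import nnls

def rehe(y, K):
    """REHE-style estimate: Haseman-Elston regression of phenotype
    cross-products on kinship, with nonnegative variance components."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                          # intercept-only fixed effect
    n = len(y)
    iu = np.triu_indices(n)                   # each pair once, diagonal included
    S = np.outer(y, y)[iu]                    # response: y_i * y_j
    diag = (iu[0] == iu[1]).astype(float)     # indicator of diagonal elements
    A = np.column_stack([K[iu], diag])        # E[y_i y_j] = sg2*K_ij + se2*1{i=j}
    (sg2, se2), _ = nnls(A, S)                # nonnegative least squares
    return sg2, se2

# Toy usage: 250 sibling pairs with kinship 0.5 within pairs
rng = np.random.default_rng(0)
pairs, n = 250, 500
block = np.array([[1.0, 0.5], [0.5, 1.0]])
K = np.kron(np.eye(pairs), block)
g = rng.multivariate_normal(np.zeros(n), 0.4 * K)   # genetic values
y = g + rng.normal(0.0, np.sqrt(0.6), n)            # phenotypes
print(rehe(y, K))                                    # roughly (0.4, 0.6)
```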
Project description: Background: Growth Mixture Modeling (GMM) is commonly used to group individuals based on their development over time, but convergence issues and impossible values are common and can result in unreliable model estimates. Constraining variance parameters across classes or over time can solve these issues, but can also seriously bias estimates if the variances differ. We aimed to determine which variance parameters can best be constrained in Growth Mixture Modeling. Methods: To identify the variance constraints that lead to the best performance for different sample sizes, we conducted a simulation study and then verified our results with the TRacking Adolescent Individuals' Lives Survey (TRAILS) cohort. Results: If variance parameters differed across classes and over time, fitting a model without constraints led to the best results. No constrained model consistently performed well; in particular, the model that constrained the random effect variances and residual variances across classes consistently performed very poorly. For a small sample size (N = 100), all models showed issues. In TRAILS, this same model produced substantially different results from the other models and performed poorly in terms of model fit. Conclusions: If possible, a Growth Mixture Model should be fit without any constraints on the variance parameters. If that is not possible, we recommend trying different variance specifications and not relying solely on the default model, which constrains random effect variances and residual variances across classes. The variance structure must always be reported, and researchers should carefully follow the GRoLTS-Checklist when analyzing and reporting trajectory analyses.
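The sketch below is a schematic of the variance structures being compared, not a fitted mixture model: for a linear growth model with T time points, each latent class k implies a covariance Cov_k(y) = Z Psi_k Z' + sigma2_k I, and the commonly used default constrains Psi_k and sigma2_k to be equal across classes. All numeric values are hypothetical.

```python
import numpy as np

# Schematic of class-specific variance structures in a linear growth mixture
# model with T time points: Cov_k(y) = Z @ Psi_k @ Z.T + sigma2_k * I
T = 4
Z = np.column_stack([np.ones(T), np.arange(T)])   # intercept and slope loadings

# Unconstrained specification: each class has its own random-effect
# covariance (Psi_k) and residual variance (sigma2_k)
Psi = {1: np.array([[1.0, 0.2], [0.2, 0.5]]),
       2: np.array([[0.3, 0.0], [0.0, 0.1]])}
sigma2 = {1: 0.8, 2: 0.4}

def implied_cov(Psi_k, sigma2_k):
    return Z @ Psi_k @ Z.T + sigma2_k * np.eye(T)

for k in (1, 2):
    print(f"class {k} implied covariance:\n", implied_cov(Psi[k], sigma2[k]))

# The default model discussed above would instead set Psi_1 = Psi_2 and
# sigma2_1 = sigma2_2, which the simulation study found can perform very
# poorly when the true variances differ across classes.
```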