Dataset Information

Variance estimation, design effects, and sample size calculations for respondent-driven sampling.

ABSTRACT: Hidden populations, such as injection drug users and sex workers, are central to a number of public health problems. However, because of the nature of these groups, it is difficult to collect accurate information about them, and this difficulty complicates disease prevention efforts. A recently developed statistical approach called respondent-driven sampling improves our ability to study hidden populations by allowing researchers to make unbiased estimates of the prevalence of certain traits in these populations. Yet, not enough is known about the sample-to-sample variability of these prevalence estimates. In this paper, we present a bootstrap method for constructing confidence intervals around respondent-driven sampling estimates and demonstrate in simulations that it outperforms the naive method currently in use. We also use simulations and real data to estimate the design effects for respondent-driven sampling in a number of situations. We conclude with practical advice about the power calculations that are needed to determine the appropriate sample size for a study using respondent-driven sampling. In general, we recommend a sample size twice as large as would be needed under simple random sampling.
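The closing recommendation can be illustrated with a rough calculation. The sketch below assumes the standard normal-approximation formula for estimating a proportion under simple random sampling and then inflates it by a design effect of 2, as the abstract recommends. The function names, the target prevalence of 0.3, and the margin of 0.05 are illustrative assumptions, not values taken from the paper.

```python
import math


def srs_sample_size(p, margin, z=1.96):
    """Approximate n under simple random sampling so that a confidence
    interval for a proportion p has half-width `margin`.
    z=1.96 corresponds to a 95% interval (normal approximation)."""
    return z ** 2 * p * (1 - p) / margin ** 2


def rds_sample_size(p, margin, design_effect=2.0, z=1.96):
    """Inflate the simple-random-sampling size by a design effect.
    design_effect=2.0 mirrors the abstract's rule of thumb; the true
    design effect in a given study may differ."""
    return math.ceil(design_effect * srs_sample_size(p, margin, z))


# Hypothetical planning values: expected prevalence 30%, margin of +/- 5 points.
n_srs = math.ceil(srs_sample_size(0.3, 0.05))
n_rds = rds_sample_size(0.3, 0.05)
print(n_srs, n_rds)
```

Under these assumed inputs the respondent-driven sampling target is roughly double the simple-random-sampling target, which is all the rule of thumb claims; a study with stronger homophily or fewer recruitment waves might need a larger design effect.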

SUBMITTER: Salganik MJ

PROVIDER: S-EPMC1705515 | biostudies-other | 2006 Nov

REPOSITORIES: biostudies-other


Similar Datasets

Project description: Respondent-driven sampling is a novel variant of link-tracing sampling for estimating the characteristics of hard-to-reach groups, such as HIV prevalence in sex workers. Despite its use by leading health organizations, the performance of this method in realistic situations is still largely unknown. We evaluated respondent-driven sampling by comparing estimates from a respondent-driven sampling survey with total population data. Total population data on age, tribe, religion, socioeconomic status, sexual activity, and HIV status were available on a population of 2402 male household heads from an open cohort in rural Uganda. A respondent-driven sampling (RDS) survey was carried out in this population, using current methods of sampling (RDS sample) and statistical inference (RDS estimates). Analyses were carried out for the full RDS sample and then repeated for the first 250 recruits (small sample). We recruited 927 household heads. Full and small RDS samples were largely representative of the total population, but both samples underrepresented men who were younger, of higher socioeconomic status, and with unknown sexual activity and HIV status. Respondent-driven sampling statistical inference methods failed to reduce these biases. Only 31%-37% (depending on method and sample size) of RDS estimates were closer to the true population proportions than the RDS sample proportions. Only 50%-74% of respondent-driven sampling bootstrap 95% confidence intervals included the population proportion. Respondent-driven sampling produced a generally representative sample of this well-connected nonhidden population. However, current respondent-driven sampling inference methods failed to reduce bias when it occurred. Whether the data required to remove bias and measure precision can be collected in a respondent-driven sampling survey is unresolved. Respondent-driven sampling should be regarded as a (potentially superior) form of convenience sampling method, and caution is required when interpreting findings based on the sampling method.

Project description: Introduction: Respondent-driven sampling (RDS) is a variant of a link-tracing design intended for generating unbiased estimates of the composition of hidden populations that typically involves giving participants several coupons to recruit their peers into the study. RDS may generate biased estimates if coupons are distributed non-randomly or if potential recruits present for interview non-randomly. We explore if biases detected in an RDS study were due to either of these mechanisms, and propose and apply weights to reduce bias due to non-random presentation for interview. Methods: Using data from the total population, and the population to whom recruiters offered their coupons, we explored how age and socioeconomic status were associated with being offered a coupon, and, if offered a coupon, with presenting for interview. Population proportions were estimated by weighting by the assumed inverse probabilities of being offered a coupon (as in existing RDS methods), and also of presentation for interview if offered a coupon by age and socioeconomic status group. Results: Younger men were under-recruited primarily because they were less likely to be offered coupons. The under-recruitment of higher socioeconomic status men was due in part to them being less likely to present for interview. Consistent with these findings, weighting for non-random presentation for interview by age and socioeconomic status group greatly improved the estimate of the proportion of men in the lowest socioeconomic group, reducing the root-mean-squared error of RDS estimates of socioeconomic status by 38%, but had little effect on estimates for age. The weighting also improved estimates for tribe and religion (reducing root-mean-squared-errors by 19-29%), but had little effect for sexual activity or HIV status. Conclusions: Data collected from recruiters on the characteristics of men to whom they offered coupons may be used to reduce bias in RDS studies. Further evaluation of this new method is required.
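The weighting idea described in this abstract can be sketched generically. The example below is a plain inverse-probability-weighted (Hajek-style) proportion, in which each respondent's weight is the reciprocal of the product of an assumed probability of being offered a coupon and an assumed probability of presenting for interview. It is not the authors' exact estimator, and the function name and all probabilities are made-up illustrative values.

```python
def weighted_proportion(has_trait, p_offered, p_presented):
    """Inverse-probability-weighted proportion of respondents with a trait.

    has_trait   : list of bools, one per respondent
    p_offered   : assumed probability each respondent was offered a coupon
    p_presented : assumed probability each respondent presented for
                  interview, given that a coupon was offered
    """
    # Weight = 1 / (joint probability of being offered AND presenting).
    weights = [1.0 / (po * pp) for po, pp in zip(p_offered, p_presented)]
    total = sum(weights)
    with_trait = sum(w for w, t in zip(weights, has_trait) if t)
    return with_trait / total


# Hypothetical four-person sample; none of these numbers come from the study.
has_trait = [True, False, False, True]
p_offered = [0.5, 0.5, 0.25, 0.25]
p_presented = [1.0, 0.5, 1.0, 0.5]

estimate = weighted_proportion(has_trait, p_offered, p_presented)
unweighted = sum(has_trait) / len(has_trait)
print(unweighted, estimate)
```

Respondents who were unlikely both to be offered a coupon and to present for interview receive the largest weights, so the weighted estimate moves toward the groups the raw sample underrepresents.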

Project description: Background: Estimates of the sizes of hidden populations, including female sex workers (FSW), men who have sex with men (MSM), and people who inject drugs (PWID), are essential for understanding the magnitude of vulnerabilities, health care needs, risk behaviors, and HIV and other infections. Objective: This article advances the successive sampling-population size estimation (SS-PSE) method by examining the performance of a modification allowing visibility to be jointly modeled with population size in the context of 15 datasets. Datasets are from respondent-driven sampling (RDS) surveys of FSW, MSM, and PWID from three cities in Armenia. We compare and evaluate the accuracy of our imputed visibility population size estimates to those found for the same populations through other unpublished methods. We then suggest questions that are useful for eliciting information needed to compute SS-PSE and provide guidelines and caveats to improve the implementation of SS-PSE for real data. Methods: SS-PSE approximates the RDS sampling mechanism via the successive sampling model and uses the order of selection of the sample to provide information on the distribution of network sizes over the population members. We incorporate visibility imputation, a measure of a person's propensity to participate in the study, given that inclusion probabilities for RDS are unknown and social network sizes, often used as a proxy for inclusion probability, are subject to measurement errors from self-reported study data. Results: FSW in Yerevan (2012, 2016) and Vanadzor (2016) as well as PWID in Yerevan (2014), Gyumri (2016), and Vanadzor (2016) had great fits with prior estimations. The MSM populations in all three cities had inconsistencies with expert prior values. The maximum low prior value was larger than the minimum high prior value, making a great fit impossible. One possible explanation is the inclusion of transgender individuals in the MSM populations during these studies. There could be differences between what experts perceive as the size of the population, based on who is an eligible member of that population, and what members of the population perceive. There could also be inconsistencies among different study participants, as some may include transgender individuals in their accounting of personal network size, while others may not. Because of these difficulties, the transgender population was split apart from the MSM population for the 2018 study. Conclusions: Prior estimations from expert opinions may not always be accurate. RDS surveys should be assessed to ensure that they have met all of the assumptions, that variables have reached convergence, and that the network structure of the population does not have bottlenecks. We recommend that SS-PSE be used in conjunction with other population size estimations commonly used in RDS, as well as results of other years of SS-PSE, to ensure generation of the most accurate size estimation.

