Dataset Information

Grouped circular data in biology: advice for effectively implementing statistical procedures.

ABSTRACT:

The most common statistical procedure with a sample of circular data is to test the null hypothesis that points are spread uniformly around the circle without a preferred direction. An array of tests for this has been developed. However, these tests were designed for continuously distributed data, whereas often (e.g. due to limited precision of measurement techniques) collected data are aggregated into a set of discrete values (e.g. rounded to the nearest degree). This disparity can cause an uncontrolled increase in type I error rate, an effect that is particularly problematic for tests that are based on the distribution of arc lengths between adjacent points (such as the Rao spacing test). Here, we demonstrate that an easy-to-apply modification can correct this problem, and we recommend this modification when using any test, other than the Rayleigh test, of circular uniformity on aggregated data. We provide R functions for this modification for several commonly used tests. In addition, we tested the power of a recently proposed test, the Gini test. However, we concluded that it lacks sufficient increase in power to replace any of the tests already in common use. In conclusion, using any of the standard circular tests (except the Rayleigh test) without modifications on rounded/aggregated data, especially with larger sample sizes, will increase the proportion of false-positive results, but we demonstrate that a simple and general modification avoids this problem.
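The inflation of type I error described above can be illustrated with a small simulation. The sketch below is not the authors' R code. It is a minimal Python illustration that assumes the usual form of the Rao spacing statistic, U = 0.5 * sum(|T_i - 360/n|), where T_i are the arcs between adjacent points, and it takes its critical value from a Monte Carlo run on continuous uniform data instead of published tables. The function names, sample size, rounding step, and seed are all hypothetical choices made for illustration.

```python
import random


def rao_spacing_statistic(angles_deg):
    """Rao spacing statistic U = 0.5 * sum(|T_i - 360/n|), angles in degrees."""
    pts = sorted(a % 360.0 for a in angles_deg)
    n = len(pts)
    # Arc lengths between adjacent points, plus the wrap-around arc.
    gaps = [pts[i + 1] - pts[i] for i in range(n - 1)]
    gaps.append(360.0 - pts[-1] + pts[0])
    expected = 360.0 / n
    return 0.5 * sum(abs(g - expected) for g in gaps)


def uniform_sample(n, step_deg, rng):
    """Draw n uniform angles; round to multiples of step_deg if step_deg > 0."""
    sample = [rng.uniform(0.0, 360.0) for _ in range(n)]
    if step_deg > 0:
        sample = [round(a / step_deg) * step_deg for a in sample]
    return sample


def monte_carlo_critical_value(n, alpha, n_sim, rng):
    """Approximate upper-alpha critical value of U under continuous uniformity."""
    stats = sorted(
        rao_spacing_statistic(uniform_sample(n, 0, rng)) for _ in range(n_sim)
    )
    return stats[min(int((1 - alpha) * n_sim), n_sim - 1)]


def rejection_rate(n, step_deg, critical_value, n_sim, rng):
    """Fraction of null (uniform) samples whose U exceeds the critical value."""
    hits = sum(
        rao_spacing_statistic(uniform_sample(n, step_deg, rng)) > critical_value
        for _ in range(n_sim)
    )
    return hits / n_sim


if __name__ == "__main__":
    rng = random.Random(1)  # hypothetical seed, for reproducibility only
    crit = monte_carlo_critical_value(n=100, alpha=0.05, n_sim=2000, rng=rng)
    print("continuous:", rejection_rate(100, 0, crit, 1000, rng))
    print("rounded to 10 degrees:", rejection_rate(100, 10, crit, 1000, rng))
```

Both samples are drawn under the null hypothesis, so any rejection is a false positive. Rounding creates ties, which appear as zero-length arcs and push U upward, so the rounded samples should be rejected far more often than the nominal 5%.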

Significance statement

Circular data are widespread across biological disciplines, e.g. in orientation studies or circadian rhythms. Often these data are rounded to the nearest 1-10 degrees. We have shown previously that this leads to an inflation of false-positive results when testing whether the data is significantly different from a random distribution using the Rao test. Here we present a modification that avoids this increase in false-positives for rounded data while retaining statistical power for a variety of tests. In sum, we provide comprehensive guidance on how best to test for departure from uniformity in non-continuous data.
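This record does not spell out what the modification is. One plausible way to break the ties introduced by rounding, assumed here purely for illustration and not confirmed by the text above, is to add to each rounded angle a random offset drawn uniformly from within its rounding interval before applying a uniformity test. The function name `jitter_rounded_angles` and its parameters are hypothetical.

```python
import random


def jitter_rounded_angles(angles_deg, step_deg, rng):
    """Spread each rounded angle uniformly over its rounding interval.

    ASSUMPTION: this jitter approach is an illustrative guess at a
    tie-breaking modification, not the authors' published R functions.
    """
    half = step_deg / 2.0
    return [(a + rng.uniform(-half, half)) % 360.0 for a in angles_deg]


if __name__ == "__main__":
    rng = random.Random(7)  # hypothetical seed
    rounded = [0.0, 10.0, 10.0, 10.0, 350.0, 350.0]  # ties from 10-degree rounding
    print(jitter_rounded_angles(rounded, 10.0, rng))
```

Under the null hypothesis, a uniform angle rounded to a multiple of the step and then jittered uniformly over the same step is again continuously uniform on the circle, which is why a procedure of this kind would be expected to restore the nominal type I error rate.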

SUBMITTER: Landler L

PROVIDER: S-EPMC7373216 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature

Publications

Grouped circular data in biology: advice for effectively implementing statistical procedures.

Landler L, Ruxton GD, Malkemper EP

Behavioral Ecology and Sociobiology. 2020-07-20; 8

PMID: 32728310

Similar Datasets

Project description:

Introduction: Surveys are common research tools, and questionnaire revisions are a common occurrence in longitudinal studies. Revisions can, at times, introduce systematic shifts in measures of interest. We model questionnaire revision as a stochastic process with transition matrices. Revision shifts can thus be reduced by first estimating these transition matrices, which can then be used in estimating the measures of interest.

Materials and methods: An ideal survey response model is defined by a mapping from the true value of a participant's response to an interval on the grouped-data scale. A population completed surveys multiple times, as modeled with multiple stochastic processes, including processes related to true values and to intervals. While multiple factors contribute to changes in survey responses, here we explored a method that can mitigate the effects of questionnaire revision. We proposed the Version Alignment Method (VAM), a data preprocessing tool that separates the revision-related transitions from all transitions by solving an optimization problem and uses those revision-related transitions to remove the revision effect. To verify VAM, we used simulation data to study the estimation error, and a real-life MJ dataset containing large amounts of long-term questionnaire responses with several questionnaire revisions to study its feasibility.

Results: We compared the difference in the annual average between consecutive years. Without adjustment, the difference was 0.593 when the revision occurred, while VAM brought it down to 0.115; the difference between years without revision ranged from 0.005 to 0.125. Furthermore, our method mapped the responses to the same set of intervals, making it possible to compare the relative frequency of items before and after revisions. The average estimation error in the L-infinity norm was 0.0044, which fell within the 95% CI constructed by bootstrap analysis.

Conclusion: Questionnaire revisions can induce different response biases and information loss, thus causing inconsistencies in the estimated measures. Conventional methods can only partly remedy this issue. Our proposal, VAM, can estimate the aggregate difference of all revision-related systematic errors and reduce it, thus reducing inconsistencies in the final estimates of longitudinal studies.
