Dataset Information

Comparison of large-scale citizen science data and long-term study data for phenology modeling.

ABSTRACT: Large-scale observational data from citizen science efforts are becoming increasingly common in ecology, and researchers often choose between these and data from intensive local-scale studies for their analyses. This choice has potential trade-offs related to spatial scale, observer variance, and interannual variability. Here we explored this issue with phenology by comparing models built using data from the large-scale, citizen science USA National Phenology Network (USA-NPN) effort with models built using data from more intensive studies at Long Term Ecological Research (LTER) sites. We built statistical and process-based phenology models for species common to each data set. From these models, we compared parameter estimates, estimates of phenological events, and out-of-sample errors between models derived from both USA-NPN and LTER data. We found that model parameter estimates for the same species were most similar between the two data sets when using simple models, but parameter estimates varied widely as model complexity increased. Despite this, estimates for the date of phenological events and out-of-sample errors were similar, regardless of the model chosen. Predictions for USA-NPN data had the lowest error when using models built from the USA-NPN data, while LTER predictions were best made using LTER-derived models, confirming that models perform best when applied at the same scale at which they were built. This difference in the cross-scale model comparison is likely due to variation in phenological requirements within species. Models using the USA-NPN data set can integrate parameters over a large spatial scale while those using an LTER data set can only estimate parameters for a single location. Accordingly, the choice of data set depends on the research question.
Inferences about species-specific phenological requirements are best made with LTER data, and if USA-NPN or similar data are all that is available, then analyses should be limited to simple models. Large-scale predictive modeling is best done with the larger-scale USA-NPN data, which has high spatial representation and a large regional species pool. LTER data sets, on the other hand, have high site fidelity and thus characterize inter-annual variability extremely well. Future research aimed at forecasting phenology events for particular species over larger scales should develop models that integrate the strengths of both data sets.

SUBMITTER: Taylor SD

PROVIDER: S-EPMC7378950 | biostudies-literature | 2019 Feb

REPOSITORIES: biostudies-literature


Publications

Comparison of large-scale citizen science data and long-term study data for phenology modeling.

Shawn D. Taylor, Joan M. Meiners, Kristina Riemer, Michael C. Orr, Ethan P. White

Ecology, 2018-12-24, issue 2


PMID: 30499218

Similar Datasets

Project description: Citizen science platforms are quickly accumulating hundreds of millions of biodiversity observations around the world annually. Quantifying and correcting for the biases in citizen science datasets remains an important first step before these data are used to address ecological questions and monitor biodiversity. One source of potential bias among datasets is the difference between those citizen science programs that have unstructured protocols and those that have semi-structured or structured protocols for submitting observations. To quantify biases in an unstructured citizen science platform, we contrasted bird observations from the unstructured iNaturalist platform with those from a semi-structured citizen science platform (eBird) for the continental United States. We tested whether four traits of species (body size, commonness, flock size, and color) predicted if a species was under- or over-represented in the unstructured dataset compared with the semi-structured dataset. We found strong evidence that large-bodied birds were over-represented in the unstructured citizen science dataset; moderate evidence that common species were over-represented in the unstructured dataset; strong evidence that species in large groups were over-represented; and no evidence that colorful species were over-represented in unstructured citizen science data. Our results suggest that biases exist in unstructured citizen science data when compared with semi-structured data, likely as a result of the detectability of a species and the inherent recording process. Importantly, in programs like iNaturalist the detectability process is two-fold: first, an individual organism needs to be detected, and second, it needs to be photographed, which is likely easier for many large-bodied species.
Our results indicate that caution is warranted when using unstructured citizen science data in ecological modelling, and highlight body size as a fundamental trait that can be used as a covariate for modelling opportunistic species occurrence records, representing the detectability or identifiability in unstructured citizen science datasets. Future research in this space should continue to focus on quantifying and documenting biases in citizen science data, and expand our research by including structured citizen science data to understand how biases differ among unstructured, semi-structured, and structured citizen science platforms.

Project description: An increasing number of citizen science water monitoring programs is continuously collecting water quality data on streams throughout the United States. Operating under quality assurance protocols, this type of monitoring data can be extremely valuable for scientists and professional agencies, but in some cases has been of limited use due to concerns about the accuracy of data collected by volunteers. Although a growing body of studies attempts to address accuracy concerns by comparing volunteer data to professional data, rarely has this been conducted with large-scale datasets generated by citizen scientists. This study assesses the relative accuracy of volunteer water quality data collected by the Texas Stream Team (TST) citizen science program from 1992-2016 across the State of Texas by comparing it to professional data from corresponding stations during the same time period. Use of existing data meant that sampling times and protocols were not controlled for, thus professional and volunteer comparisons were refined to samples collected at stations within 60 meters of one another and during the same year. Results from the statewide TST dataset include 82 separate station/year ANOVAs and demonstrate that large-scale, existing volunteer and professional data with unpaired samples can show agreement of ~80% for all analyzed parameters (DO = 77%, pH = 79%, conductivity = 85%). In addition, to assess whether limiting variation within the source datasets increased the level of agreement between volunteers and professionals, data were analyzed at a local scale. Data from a single partner city, with increased controls on sampling times and locations and correction of a systematic bias in DO, confirmed this by showing an even greater agreement of 91% overall from 2009-2017 (DO = 91%, pH = 83%, conductivity = 100%).
An experimental sampling dataset was analyzed and yielded similar results, indicating that existing datasets can be as accurate as experimental datasets designed with researcher supervision. Our findings underscore the reliability of large-scale citizen science monitoring datasets already in existence, and their potential value to scientific research and water management programs.

Project description: Information on species' distributions, abundances, and how they change over time is central to the study of the ecology and conservation of animal populations. This information is challenging to obtain at landscape scales across range-wide extents for two main reasons. First, landscape-scale processes that affect populations vary throughout the year and across species' ranges, requiring high-resolution, year-round data across broad, sometimes hemispheric, spatial extents. Second, while citizen science projects can collect data at these resolutions and extents, using these data requires appropriate analysis to address known sources of bias. Here, we present an analytical framework to address these challenges and generate year-round, range-wide distributional information using citizen science data. To illustrate this approach, we apply the framework to Wood Thrush (Hylocichla mustelina), a long-distance Neotropical migrant and species of conservation concern, using data from the citizen science project eBird. We estimate occurrence and abundance across a range of spatial scales throughout the annual cycle. Additionally, we generate intra-annual estimates of the range, intra-annual estimates of the associations between species and characteristics of the landscape, and interannual trends in abundance for breeding and non-breeding seasons. The range-wide population trajectories for Wood Thrush show a close correspondence between breeding and non-breeding seasons with steep declines between 2010 and 2013 followed by shallower rates of decline from 2013 to 2016. The breeding season range-wide population trajectory based on the independently collected and analyzed North American Breeding Bird Survey data also shows this pattern. The information provided here fills important knowledge gaps for Wood Thrush, especially during the less studied migration and non-breeding periods.
More generally, the modeling framework presented here can be used to accurately capture landscape scale intra- and interannual distributional dynamics for broadly distributed, highly mobile species.
