Dataset Information

Displaying bias in sampling effort of data accessed from biodiversity databases using ignorance maps.

ABSTRACT: BACKGROUND: Open-access biodiversity databases, which consist mainly of citizen science data, make temporally and spatially extensive species observation data available to a wide range of users. Such data have limitations, however, including sampling bias in favour of recorder distribution, lack of survey effort assessment, and incomplete coverage of the distribution of all organisms. These limitations are not always recorded, yet any technical assessment or scientific research based on such data should evaluate the uncertainty of its source data, and researchers should acknowledge this information in their analysis. The maps of ignorance proposed here are a critical and easy-to-implement tool for visually exploring the quality of the data and for filtering out unreliable results.

NEW INFORMATION: I present simple algorithms to display ignorance maps as a tool to report the spatial distribution of the bias and lack of sampling effort across a study region. Ignorance scores are computed solely from raw data so that they rely on the fewest assumptions possible; no prediction or estimation is involved. The rationale is that species groups can be used as a surrogate for sampling effort, because an entire group of species observed by similar methods is likely to share similar bias. Simple algorithms then transform the raw data into ignorance scores scaled 0-1 that are easily comparable and scalable. Because the calculations must run over big datasets, simplicity is crucial for web-based implementations on infrastructures for biodiversity information. With these algorithms, any infrastructure for biodiversity information can offer a quality report of the observations accessed through it. Users can specify a reference taxonomic group and a time frame according to the research question. The potential of this tool lies in the simplicity of its algorithms and in the lack of assumptions made about the bias distribution, giving the user the freedom to tailor analyses to their specific needs.
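The abstract does not spell out the algorithms, so the following is only a minimal sketch of how raw observation counts per grid cell for a reference taxonomic group might be turned into ignorance scores scaled 0-1. The function name `ignorance_scores`, the parameter `half_point`, and the three transformations shown (linear normalisation, log normalisation, and a half-saturation curve) are illustrative assumptions, not necessarily the formulas used in the paper.

```python
import math


def ignorance_scores(counts, method="normalized", half_point=1.0):
    """Map raw observation counts per grid cell to ignorance scores in [0, 1].

    1.0 means no observations of the reference group (full ignorance);
    lower values mean more observations. All three methods are
    illustrative assumptions, not the paper's confirmed formulas.
    """
    if method not in ("normalized", "log", "half"):
        raise ValueError(f"unknown method: {method!r}")
    if any(n < 0 for n in counts):
        raise ValueError("counts must be non-negative")

    if method == "half":
        # Score is 0.5 when a cell holds exactly `half_point` observations.
        if half_point <= 0:
            raise ValueError("half_point must be positive")
        return [half_point / (n + half_point) for n in counts]

    n_max = max(counts, default=0)
    if n_max == 0:
        # No observations anywhere: every cell is fully unknown.
        return [1.0 for _ in counts]

    if method == "normalized":
        # Linear scaling against the best-sampled cell.
        return [1.0 - n / n_max for n in counts]

    # method == "log": damp the influence of a few heavily sampled cells.
    return [1.0 - math.log(n + 1) / math.log(n_max + 1) for n in counts]


# Hypothetical counts of reference-group observations in three grid cells.
cell_counts = [0, 5, 10]
linear = ignorance_scores(cell_counts)
halved = ignorance_scores(cell_counts, method="half", half_point=5.0)
```

The two normalised variants are relative to the best-sampled cell in the study region, whereas the half-saturation variant depends only on each cell's own count, which keeps scores comparable across regions at the cost of choosing `half_point`.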

SUBMITTER: Ruete A

PROVIDER: S-EPMC4549634 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature

Publications

Displaying bias in sampling effort of data accessed from biodiversity databases using ignorance maps.

Ruete Alejandro A

Biodiversity Data Journal, 2015-07-28, vol. 3

PMID: 26312050

Similar Datasets

Project description: Species distribution models (SDM) are tools used to determine environmental features that influence the geographic distribution of species' abundance and have been used to analyze presence-only records. Analysis of presence-only records may require correction for nondetection sampling bias to yield reliable conclusions. In addition, individuals of some species of animals may be highly aggregated and standard SDMs ignore environmental features that may influence aggregation behavior.

We contend that nondetection sampling bias can be treated as missing data. Statistical theory and corrective methods are well developed for missing data, but have been ignored in the literature on SDMs. We developed a marked inhomogeneous Poisson point process model that accounted for nondetection and aggregation behavior in animals and tested our methods on simulated data.

Correcting for nondetection sampling bias requires estimates of the probability of detection which must be obtained from auxiliary data, as presence-only data do not contain information about the detection mechanism. Weighted likelihood methods can be used to correct for nondetection if estimates of the probability of detection are available. We used an inhomogeneous Poisson point process model to model group abundance, a zero-truncated generalized linear model to model group size, and combined these two models to describe the distribution of abundance. Our methods performed well on simulated data when nondetection was accounted for and poorly when detection was ignored.

We recommend researchers consider the effects of nondetection sampling bias when modeling species distributions using presence-only data. If information about the detection process is available, we recommend researchers explore the effects of nondetection and, when warranted, correct the bias using our methods. We developed our methods to analyze opportunistic presence-only records of whooping cranes (Grus americana), but expect that our methods will be useful to ecologists analyzing opportunistic presence-only records of other species of animals.
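To illustrate why ignoring nondetection biases abundance estimates, the sketch below uses a much simpler stand-in than the marked inhomogeneous Poisson point process described above: a homogeneous Poisson model in which the count in each surveyed cell is thinned by a known detection probability. The function name `intensity_estimates` and the example numbers are hypothetical, and the detection probabilities are assumed to come from auxiliary data.

```python
def intensity_estimates(counts, areas, detection_probs):
    """Return (naive, corrected) estimates of a constant Poisson intensity.

    Assumes counts[i] ~ Poisson(lam * detection_probs[i] * areas[i]).
    Maximising that likelihood gives lam = sum(n_i) / sum(p_i * a_i);
    the naive estimate drops the p_i and so underestimates lam.
    This is a simplified stand-in, not the authors' model.
    """
    if not (len(counts) == len(areas) == len(detection_probs)):
        raise ValueError("inputs must have the same length")
    if any(not 0.0 < p <= 1.0 for p in detection_probs):
        raise ValueError("detection probabilities must lie in (0, 1]")
    if any(a <= 0 for a in areas):
        raise ValueError("areas must be positive")

    total = sum(counts)
    naive = total / sum(areas)
    corrected = total / sum(p * a for p, a in zip(detection_probs, areas))
    return naive, corrected


# Hypothetical survey: two unit-area cells with very different detectability.
naive, corrected = intensity_estimates(
    counts=[8, 2], areas=[1.0, 1.0], detection_probs=[0.8, 0.2]
)
```

In this example both cells are consistent with the same underlying intensity, yet the raw counts differ fourfold purely because of detectability, which is the kind of pattern an uncorrected model would misread as an environmental effect.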
