Project description:Background: Rapid data sharing can maximize the utility of data. In epidemics and pandemics such as Zika, Ebola, and COVID-19, the case for such practices seems especially urgent and warranted. Yet rapid, wide data sharing has previously generated significant concerns related to equity. The continued lack of understanding and guidance on equitable data sharing raises the following questions: Should data sharing in epidemics and pandemics primarily advance utility, or should it advance equity as well? If so, what norms comprise equitable data sharing in epidemics and pandemics? Do these norms address the equity-related concerns raised by researchers, data providers, and other stakeholders? What tensions must be balanced between equity and other values? Methods: To explore these questions, we undertook a systematic scoping review of the literature on data sharing in epidemics and pandemics and thematically analyzed the identified literature for its discussion of ethical values, norms, concerns, and tensions, with a particular (but not exclusive) emphasis on equity. We wanted both to understand how equity in data sharing is being conceptualized and to draw out other important values and norms for data sharing in epidemics and pandemics. Results: We found that the values of utility, equity, solidarity, and reciprocity were described, and we report their associated norms, including researcher recognition; rapid, real-time sharing; capacity development; and fair benefits to data generators, data providers, and source countries. The value of utility and its associated norms were discussed substantially more than the others. Tensions between utility norms (e.g., rapid, real-time sharing) and equity norms (e.g., researcher recognition, equitable access) were raised. Conclusions: This study found support for equity being advanced by data sharing in epidemics and pandemics. However, norms for equitable data sharing in epidemics and pandemics require further development, particularly in relation to power sharing and participatory approaches that prioritize inclusion. Addressing structural inequities in the wider global health landscape is also needed to achieve equitable data sharing in epidemics and pandemics.
Project description:Spindle event detection is a key component of human sleep analysis. However, detection of these oscillatory patterns by experts is time consuming and costly. Automated detection algorithms are cost efficient and reproducible but require robust datasets for training and validation. Using the MODA (Massive Online Data Annotation) platform, we used crowdsourcing to produce a large open-source dataset of high-quality, human-scored sleep spindles (5,342 spindles from 180 subjects). We evaluated the performance of three scorer subtypes (experts, researchers, and non-experts), as well as seven previously published spindle detection algorithms. Our findings show that only two algorithms had performance scores similar to those of human experts. Furthermore, the human scorers agreed on the average spindle characteristics (density, duration, and amplitude), but there were significant age and sex differences (also observed in the set of detected spindles). This study demonstrates how the MODA platform can be used to generate a highly valid, standardized open-source dataset for researchers to train, validate, and compare automated detectors of biological signals such as the EEG.
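Comparing a detector's output against human-scored spindles is typically done at the event level. Below is a minimal sketch of such a comparison, matching detected intervals to gold intervals by temporal overlap; the 20% overlap threshold and the interval values are illustrative assumptions, not the MODA study's actual matching criteria.

```python
def overlap(a, b):
    """Length in seconds of the temporal overlap between intervals a=(start, end), b=(start, end)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def event_f1(detected, gold, min_frac=0.2):
    """Event-level F1: a detection matches an unmatched gold spindle if it
    overlaps at least min_frac of the gold spindle's duration."""
    matched_gold = set()
    tp = 0
    for d in detected:
        for i, g in enumerate(gold):
            if i in matched_gold:
                continue
            if overlap(d, g) >= min_frac * (g[1] - g[0]):
                matched_gold.add(i)
                tp += 1
                break
    fp = len(detected) - tp
    fn = len(gold) - tp
    precision = tp / (tp + fp) if detected else 0.0
    recall = tp / (tp + fn) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Toy intervals (seconds), invented for illustration:
gold = [(1.0, 1.8), (5.2, 6.0), (9.0, 9.7)]
detected = [(1.1, 1.9), (5.3, 5.9), (12.0, 12.5)]
print(round(event_f1(detected, gold), 2))  # → 0.67
```

The same score can be computed per scorer subtype to compare experts, researchers, and non-experts against a consensus reference.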
Project description:Proponents of big data claim it will fuel a social research revolution, but skeptics challenge its reliability and decontextualization. The largest subset of big data is not designed for social research. Data augmentation (the systematic assessment of measurement against known quantities and the expansion of extant data with new information) is an important tool for maximizing such data's validity and research value. Using trained research assistants or specialized algorithms is a common approach to augmentation but may not scale to big data or appease skeptics. We consider a third alternative: data augmentation with online crowdsourcing. Three empirical cases illustrate the strengths and limitations of crowdsourcing, using Amazon Mechanical Turk to verify automated coding, link online databases, and gather data on online resources. From these, we develop best-practice guidelines and a reporting template to enhance reproducibility. Carefully designed, correctly applied, and rigorously documented crowdsourcing helps address concerns about big data's usefulness for social research.
Project description:Given globalization and other social phenomena, controlling the spread of infectious diseases has become an imperative public health priority. A plethora of interventions that could in theory mitigate the spread of pathogens have been proposed and applied. Evaluating the effectiveness of such interventions is costly and in many circumstances unrealistic. Most importantly, the community effect (i.e., the ability of the intervention to minimize the spread of the pathogen from people who received the intervention to other community members) can rarely be evaluated. Here we propose a study design that can build and evaluate evidence in support of the community effect of an intervention. The approach exploits the molecular evolutionary dynamics of pathogens to track new infections as having arisen from either a control or an intervention group. It enables us to evaluate whether an intervention reduces the number and length of new transmission chains in comparison with a control condition, and thus lets us estimate the relative decrease in new infections in the community due to the intervention. As an example, we present one working scenario of how the approach can be applied, together with a simulation study and associated power calculations.
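The core comparison (fewer and shorter transmission chains under the intervention) can be sketched with a toy branching-process simulation. Everything below is assumed for illustration: the reproduction numbers (0.8 vs. 0.4), the Poisson offspring model, and the chain-size cap are invented, not the study's parameters.

```python
import math
import random

def poisson(lam, rng):
    # Knuth's algorithm for Poisson-distributed offspring counts.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def chain_size(r, rng, cap=1000):
    """Total infections in one transmission chain seeded by a single case,
    each case infecting Poisson(r) others (subcritical when r < 1)."""
    size = active = 1
    while active and size < cap:
        new = sum(poisson(r, rng) for _ in range(active))
        size += new
        active = new
    return size

rng = random.Random(42)
control = [chain_size(0.8, rng) for _ in range(500)]        # no intervention
intervention = [chain_size(0.4, rng) for _ in range(500)]   # assumed halved r

mean_control = sum(control) / len(control)
mean_intervention = sum(intervention) / len(intervention)
print(mean_control, mean_intervention)
```

Under these assumptions, subcritical theory gives a mean chain size of 1/(1 - r), so roughly 5 versus 1.67; a power calculation would repeat such simulations across candidate effect sizes and sample sizes.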
Project description:Understanding how animals move within their environment is a burgeoning field of research. Despite this, relatively basic data, such as the locomotor speeds that animals choose to walk at in the wild, are sparse. If animals choose to walk with dynamic similarity, they will move at equal dimensionless speeds, represented by Froude number (Fr). Fr may be interpreted from simple limb kinematics obtained from video data. Here, using Internet videos, limb kinematics were measured in 112 bird and mammal species weighing between 0.61 and 5,400 kg. This novel method of data collection enabled the determination of kinematics for animals walking at their self-selected speeds without the need for exhaustive fieldwork. At larger sizes, both birds and mammals prefer to walk at slower relative speeds and relative stride frequencies, as preferred Fr decreased in larger species, indicating that Fr may not be a good predictor of preferred locomotor speeds. This may result from the observation that the minimum cost of transport is approached at lower Fr in larger species. Birds walk with higher duty factors, lower stride frequencies and longer stance times compared to mammals at self-selected speeds. The trend towards lower preferred Fr is also apparent in extinct vertebrate species.
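Dynamic similarity can be made concrete with the Froude number, Fr = v² / (g·h), where v is walking speed and h is hip (limb) height. A minimal sketch follows; the speed is estimated from video as stride length times stride frequency, and all numeric values are invented for illustration, not measurements from the study.

```python
G = 9.81  # gravitational acceleration, m/s^2

def froude(speed_ms, hip_height_m):
    """Froude number Fr = v^2 / (g * h): equal Fr implies dynamically
    similar walking, regardless of body size."""
    return speed_ms ** 2 / (G * hip_height_m)

# Speed can be recovered from video as stride length x stride frequency.
# Values below are illustrative assumptions, not measured data.
stride_length_m = 0.9
stride_freq_hz = 1.5
v = stride_length_m * stride_freq_hz  # 1.35 m/s

print(round(froude(v, 0.6), 3))  # → 0.31
```

Typical preferred walking in many species falls well below Fr = 1; the study's finding is that this preferred value drifts lower still as body size increases.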
Project description:Accurate, high-resolution tracking of influenza epidemics at the regional level helps public health agencies make informed and proactive decisions, especially in the face of outbreaks. Internet users' online searches offer great potential for the regional tracking of influenza. However, due to the complex data structure and reduced quality of Internet data at the regional level, few established methods provide satisfactory performance. In this article, we propose a novel method named ARGO2 (2-step Augmented Regression with GOogle data) that efficiently combines publicly available Google search data at different resolutions (national and regional) with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) for accurate, real-time regional tracking of influenza. ARGO2 gives very competitive performance across all US regions compared with available Internet-data-based regional influenza tracking methods, and it has achieved 30% error reduction over the best alternative method that we numerically tested for the period of March 2009 to March 2018. ARGO2 is reliable and robust, with the flexibility to incorporate additional information from other sources and resolutions, making it a powerful tool for regional influenza tracking, and potentially for tracking other social, economic, or public health events at the regional or local level.
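As a rough illustration of the augmented-regression idea (a simplification, not the actual two-step ARGO2 estimator), one can regress current %ILI on lagged surveillance plus search-volume features. All series below are synthetic stand-ins; the ridge penalty and noise levels are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic weekly regional %ILI and two search-query series that
# noisily track it (invented for illustration).
T = 120
ili = 2 + np.sin(np.linspace(0, 8 * np.pi, T)) + 0.1 * rng.standard_normal(T)
queries = np.column_stack([ili + 0.3 * rng.standard_normal(T),
                           ili + 0.5 * rng.standard_normal(T)])

# Design matrix: intercept, lag-1 surveillance, and current query volumes.
X = np.column_stack([np.ones(T - 1), ili[:-1], queries[1:]])
y = ili[1:]

lam = 0.1  # ridge penalty, arbitrary for this sketch
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
pred = X @ beta
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(round(rmse, 3))
```

ARGO2 additionally feeds a first-step national estimate into the regional regression and refits weekly on a sliding window, which is what lets it absorb revisions in both CDC and Google data.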
Project description:We present a new database of Dutch word recognition times for a total of 54,000 words, called the Dutch Crowdsourcing Project. The data were collected with an internet vocabulary test. The database is limited to native Dutch speakers. Participants were asked to indicate which words they knew. Their response times were registered, even though the participants were not asked to respond as fast as possible. Still, the response times correlate at around .7 with those of the Dutch Lexicon Projects for shared words. Results of virtual experiments also indicate that the new response times are a valid addition to the Dutch Lexicon Projects. This means not only that we have useful response times for some 20,000 extra words, but also that we now have data on differences in response latencies as a function of education and age. The new data correspond better to word use in the Netherlands.
Project description:Aggression in group-housed laboratory mice is a serious animal welfare concern. A better understanding of the causes of mouse aggression could have a significant impact on a large number of laboratory animals. The NC3Rs led a crowdsourcing project to collect data on the prevalence and potential triggers of aggression in laboratory mice. The crowdsourcing approach collected data from multiple institutions; this is the first time such an approach has been applied to a laboratory animal welfare problem. Technicians observed group-housed male mice during daily routine cage checks and recorded all incidents of aggression-related injuries. In total, 44 facilities participated in the study, and data were collected by 143 animal technicians. A total of 788 incidents of aggression-related injuries were reported across a sample population of 137,580 mice. The mean facility-level prevalence of aggression-related incidents was equivalent to 15 in 1,000 mice. Key factors influencing the prevalence of aggression included strain; the number of mice per cage; how mice were selected into a cage; cage-cleaning protocols; and the transfer of nesting material. Practical recommendations for minimizing aggressive behaviour in group-housed male mice are provided, based on the results of the study and the current published literature.
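Note that a mean facility-level rate can differ from the pooled rate (788 / 137,580 ≈ 5.7 per 1,000) because a facility-level mean weights each facility equally regardless of size. A toy illustration, using invented per-facility splits rather than the study's actual facility data:

```python
# Invented (incidents, mice) splits per facility -- illustrative only,
# not the study's facility-level data.
facilities = [(2, 100), (10, 5000), (1, 50)]

incidents = sum(i for i, _ in facilities)
mice = sum(m for _, m in facilities)
pooled_per_1000 = incidents / mice * 1000
mean_facility_per_1000 = sum(i / m for i, m in facilities) / len(facilities) * 1000

print(round(pooled_per_1000, 1), round(mean_facility_per_1000, 1))  # → 2.5 14.0
```

Small facilities with even a single incident can pull the facility-level mean well above the pooled rate.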
Project description:The Internet has enabled the emergence of collective problem solving, also known as crowdsourcing, as a viable option for solving complex tasks. However, the openness of crowdsourcing presents a challenge, because solutions obtained through it can be sabotaged, stolen, and manipulated at a low cost to the attacker. To address this challenge, we extend a previously proposed crowdsourcing dilemma game to an iterated game. We enumerate pure evolutionarily stable strategies within the class of so-called reactive strategies, i.e., those depending on the last action of the opponent. Among the 4,096 possible reactive strategies, we find 16 strategies, each of which is stable in some parameter region. Repeated encounters between the players can improve social welfare when the damage inflicted by an attack and the cost of attack are both small. Under the current framework, however, repeated interactions do not ameliorate the crowdsourcing dilemma in the majority of the parameter space.
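To see what a reactive strategy is, here is a toy iterated two-action game (cooperate vs. attack) with an invented payoff matrix; the real study's action set and payoffs differ, and its reactive-strategy space has 4,096 members rather than the 8 enumerated here.

```python
from itertools import product

# A reactive strategy = (opening move, reply to C, reply to A).
ACTIONS = "CA"  # C = cooperate, A = attack
strategies = list(product(ACTIONS, repeat=3))  # 8 toy strategies

# Row player's payoff for (own action, opponent action) -- values are
# an invented Prisoner's-Dilemma-like matrix, not the study's payoffs.
PAYOFF = {("C", "C"): 3, ("C", "A"): 0, ("A", "C"): 4, ("A", "A"): 1}

def play(s1, s2, rounds=50):
    """Average per-round payoffs when two reactive strategies meet."""
    a1, a2 = s1[0], s2[0]
    total1 = total2 = 0
    for _ in range(rounds):
        total1 += PAYOFF[(a1, a2)]
        total2 += PAYOFF[(a2, a1)]
        # Each player reacts to the opponent's previous action.
        a1, a2 = s1[1 + ACTIONS.index(a2)], s2[1 + ACTIONS.index(a1)]
    return total1 / rounds, total2 / rounds

# Average payoff of each strategy against the whole strategy population:
scores = {s: sum(play(s, t)[0] for t in strategies) / len(strategies)
          for s in strategies}
best = max(scores, key=scores.get)
print("".join(best), round(scores[best], 2))
```

An evolutionary-stability analysis then asks which strategies, once fixed in the population, cannot be invaded by any mutant strategy; the study performs that enumeration over its full 4,096-strategy space.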
Project description:The accuracy of machine learning tasks critically depends on high-quality ground truth data. Producing good ground truth data therefore typically involves trained professionals; however, this can be costly in time, effort, and money. Here we explore the use of crowdsourcing to generate large volumes of training data of good quality. We examine an image analysis task involving the segmentation of corn tassels from images taken in a field setting. We investigate the accuracy, speed, and other quality metrics when this task is performed by students for academic credit, Amazon MTurk workers, and Master Amazon MTurk workers. We find that the Amazon MTurk and Master MTurk workers perform significantly better than the for-credit students, with no significant difference between the two MTurk worker types. Furthermore, the quality of the segmentation produced by Amazon MTurk workers rivals that of an expert worker. We provide best practices for assessing the quality of ground truth data and for comparing data quality produced by different sources. We conclude that properly managed crowdsourcing can be used to establish large volumes of viable ground truth data at low cost and high quality, especially in the context of high-throughput plant phenotyping. We also provide several metrics for assessing the quality of the generated datasets.
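One common way to score crowd segmentations against an expert reference is intersection-over-union (IoU) on the pixel masks. The tiny boolean grids below are invented for illustration and are not the study's evaluation pipeline.

```python
def iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks given as nested 0/1 lists."""
    inter = sum(a and b for row_a, row_b in zip(mask_a, mask_b)
                for a, b in zip(row_a, row_b))
    union = sum(a or b for row_a, row_b in zip(mask_a, mask_b)
                for a, b in zip(row_a, row_b))
    return inter / union if union else 1.0

# Toy 3x3 masks: expert reference vs. one crowd worker's segmentation.
expert = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]
worker = [[0, 1, 1],
          [0, 1, 0],
          [0, 0, 0]]

print(round(iou(worker, expert), 2))  # → 0.75 (3 shared pixels / 4 in union)
```

Averaging such per-image scores within each group (students, MTurk workers, Master MTurk workers) gives a simple basis for the kind of between-group comparison the study reports.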