Project description:Background: Rapid data sharing can maximize the utility of data. In epidemics and pandemics like Zika, Ebola, and COVID-19, the case for such practices seems especially urgent and warranted. Yet rapidly sharing data widely has previously generated significant concerns related to equity. The continued lack of understanding and guidance on equitable data sharing raises the following questions: Should data sharing in epidemics and pandemics primarily advance utility, or should it advance equity as well? If so, what norms comprise equitable data sharing in epidemics and pandemics? Do these norms address the equity-related concerns raised by researchers, data providers, and other stakeholders? What tensions must be balanced between equity and other values? Methods: To explore these questions, we undertook a systematic scoping review of the literature on data sharing in epidemics and pandemics and thematically analyzed identified literature for its discussion of ethical values, norms, concerns, and tensions, with a particular (but not exclusive) emphasis on equity. We wanted to both understand how equity in data sharing is being conceptualized and draw out other important values and norms for data sharing in epidemics and pandemics. Results: We found that values of utility, equity, solidarity, and reciprocity were described, and we report their associated norms, including researcher recognition; rapid, real-time sharing; capacity development; and fair benefits to data generators, data providers, and source countries. The value of utility and its associated norms were discussed substantially more than others. Tensions between utility norms (e.g., rapid, real-time sharing) and equity norms (e.g., researcher recognition, equitable access) were raised. Conclusions: This study found support for equity being advanced by data sharing in epidemics and pandemics. However, norms for equitable data sharing in epidemics and pandemics require further development, particularly in relation to power sharing and participatory approaches prioritizing inclusion. Addressing structural inequities in the wider global health landscape is also needed to achieve equitable data sharing in epidemics and pandemics.
Project description:Spindle event detection is a key component in analyzing human sleep. However, detection of these oscillatory patterns by experts is time consuming and costly. Automated detection algorithms are cost-efficient and reproducible but require robust datasets to be trained and validated. Using the MODA (Massive Online Data Annotation) platform, we crowdsourced a large open-source dataset of high-quality, human-scored sleep spindles (5342 spindles from 180 subjects). We evaluated the performance of three subtypes of scorers (experts, researchers, and non-experts), as well as seven previously published spindle detection algorithms. Our findings show that only two algorithms had performance scores similar to human experts. Furthermore, the human scorers agreed on the average spindle characteristics (density, duration, and amplitude), but there were significant age and sex differences (also observed in the set of detected spindles). This study demonstrates how the MODA platform can be used to generate a highly valid, standardized open-source dataset for researchers to train, validate, and compare automated detectors of biological signals such as the EEG.
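As a rough illustration of how detector output can be compared with human-scored spindles, the sketch below computes event-level precision, recall, and F1 using a simple overlap criterion between detected and gold-standard intervals. The 20% overlap threshold, the (start, end) interval format in seconds, and the toy data are assumptions for illustration only, not the MODA scoring protocol.

# Sketch: event-level agreement between detected and gold-standard spindle
# intervals, using an assumed overlap criterion (not the MODA protocol).

def overlaps(a, b, min_frac=0.2):
    """True if intervals a and b overlap by at least min_frac of the shorter one."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    shorter = min(a[1] - a[0], b[1] - b[0])
    return shorter > 0 and inter / shorter >= min_frac

def event_f1(detected, gold):
    """Precision, recall, and F1 for detected vs. gold spindle intervals."""
    tp = sum(any(overlaps(d, g) for g in gold) for d in detected)
    precision = tp / len(detected) if detected else 0.0
    recall = (sum(any(overlaps(g, d) for d in detected) for g in gold) / len(gold)
              if gold else 0.0)
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Toy example: two detected spindles scored against three expert-annotated ones.
print(event_f1(detected=[(10.2, 11.0), (42.5, 43.1)],
               gold=[(10.3, 10.9), (25.0, 25.8), (42.6, 43.0)]))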
Project description:Proponents of big data claim it will fuel a social research revolution, but skeptics challenge its reliability and decontextualization. The largest subset of big data is not designed for social research. Data augmentation, the systematic assessment of measurement against known quantities and the expansion of extant data with new information, is an important tool for maximizing such data's validity and research value. Trained research assistants and specialized algorithms are common approaches to augmentation but may not scale to big data or appease skeptics. We consider a third alternative: data augmentation with online crowdsourcing. Three empirical cases illustrate the strengths and limitations of crowdsourcing, using Amazon Mechanical Turk to verify automated coding, link online databases, and gather data on online resources. Using these, we develop best-practice guidelines and a reporting template to enhance reproducibility. Carefully designed, correctly applied, and rigorously documented crowdsourcing helps address concerns about big data's usefulness for social research.
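One way to make the "verify automated coding" step concrete is sketched below: multiple crowd labels per item are aggregated by majority vote, and chance-corrected agreement with the automated coder is measured with Cohen's kappa. The label categories, the three-workers-per-item setup, and the data are illustrative assumptions, not the workflow used in the article.

# Sketch: verifying automated coding against crowdsourced labels, assuming
# each item received labels from several workers. All data are toy values.

from collections import Counter

def majority_vote(labels):
    """Most frequent label among a worker panel's responses for one item."""
    return Counter(labels).most_common(1)[0][0]

def cohens_kappa(a, b):
    """Chance-corrected agreement between two label sequences."""
    n = len(a)
    categories = set(a) | set(b)
    po = sum(x == y for x, y in zip(a, b)) / n
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (po - pe) / (1 - pe) if pe < 1 else 1.0

auto = ["news", "blog", "news", "forum"]                        # automated coding
crowd = [["news", "news", "blog"], ["blog", "blog", "blog"],
         ["news", "forum", "news"], ["forum", "news", "news"]]  # 3 workers per item
crowd_vote = [majority_vote(w) for w in crowd]
print("kappa:", round(cohens_kappa(auto, crowd_vote), 2))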
Project description:Given globalization and other social phenomena, controlling the spread of infectious diseases has become a pressing public health priority. A plethora of interventions that can, in theory, mitigate the spread of pathogens have been proposed and applied. Evaluating the effectiveness of such interventions is costly and in many circumstances unrealistic. Most importantly, the community effect (i.e., the ability of the intervention to minimize the spread of the pathogen from people who received the intervention to other community members) can rarely be evaluated. Here we propose a study design that can build and evaluate evidence in support of the community effect of an intervention. The approach exploits the molecular evolutionary dynamics of pathogens to track new infections as having arisen from either a control or an intervention group. It enables us to evaluate whether an intervention reduces the number and length of new transmission chains in comparison with a control condition, and thus lets us estimate the relative decrease in new infections in the community due to the intervention. As an example, we provide one working scenario in which the approach is applied, together with a simulation study and associated power calculations.
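The sketch below shows, under assumed parameters rather than those of the study, how such a power calculation might be set up: index cases in a control arm and an intervention arm seed Poisson branching transmission chains with different reproduction numbers, and power is estimated as the fraction of simulated trials in which a rank-sum test on chain sizes reaches significance.

# Minimal simulation sketch with assumed parameters (not from the paper).
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

def chain_size(R, max_gen=5):
    """Total infections in one transmission chain (truncated Poisson branching)."""
    total, current = 1, 1
    for _ in range(max_gen):
        current = rng.poisson(R * current)
        total += current
        if current == 0:
            break
    return total

def power(n_seeds=50, R_control=1.2, R_interv=0.8, n_sim=500):
    """Fraction of simulated trials where control chains are significantly larger."""
    hits = 0
    for _ in range(n_sim):
        control = [chain_size(R_control) for _ in range(n_seeds)]
        interv = [chain_size(R_interv) for _ in range(n_seeds)]
        if mannwhitneyu(control, interv, alternative="greater").pvalue < 0.05:
            hits += 1
    return hits / n_sim

print("estimated power:", power())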
Project description:The emergence of severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) reawakened the need to rapidly understand the molecular etiologies, pandemic potential, and prospective treatments of infectious agents. The lack of existing data on SARS-CoV-2 hampered early attempts to treat severe forms of coronavirus disease 2019 (COVID-19) during the pandemic. This study coupled existing transcriptomic data from animal studies of severe acute respiratory syndrome-related coronavirus 1 (SARS-CoV-1) lung infection with crowdsourced statistical approaches to derive temporal meta-signatures of host responses during early viral accumulation and subsequent clearance stages. Unsupervised and supervised machine learning approaches identified top dysregulated genes and potential biomarkers (e.g., CXCL10, BEX2, and ADM). Temporal meta-signatures revealed distinct gene expression programs with biological implications for a series of host responses underlying sustained Cxcl10 expression and Stat signaling. Cell cycle gene expression switched from G1/G0-phase genes early in infection to a G2/M signature late in infection, which correlated with enrichment of DNA damage response and repair genes. The SARS-CoV-1 meta-signatures were shown to closely emulate human SARS-CoV-2 host responses seen in emerging RNA-seq, single-cell, and proteomics data, with early monocyte-macrophage activation followed by lymphocyte proliferation. The circulatory hormone adrenomedullin was maximally elevated in elderly patients who died from COVID-19. Stage-specific correlations to compounds with potential to treat COVID-19 and future coronavirus infections were in part validated by a subset of twenty-four compounds in clinical trials for COVID-19. This study represents a roadmap for leveraging existing data in the public domain to derive novel molecular and biological insights and potential treatments for emerging human pathogens.
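A minimal sketch of one aggregation step such a meta-signature could rest on appears below: each study contributes a per-gene differential-expression score at a given time point, and genes are ordered by their mean rank across studies. The gene names come from the abstract above, but the scores and the averaging-of-ranks choice are illustrative assumptions, not the statistical pipeline used in the study.

# Sketch: rank aggregation of per-gene scores across studies (toy data).
import numpy as np

def meta_rank(study_scores):
    """study_scores: {study_name: {gene: score}}; returns genes ordered by mean rank
    (higher score = more dysregulated = better rank)."""
    genes = sorted(set.intersection(*(set(s) for s in study_scores.values())))
    ranks = []
    for scores in study_scores.values():
        order = sorted(genes, key=lambda g: -scores[g])
        ranks.append({g: i + 1 for i, g in enumerate(order)})
    mean_rank = {g: np.mean([r[g] for r in ranks]) for g in genes}
    return sorted(genes, key=lambda g: mean_rank[g])

studies = {
    "studyA": {"CXCL10": 4.1, "ADM": 2.0, "BEX2": 3.2},
    "studyB": {"CXCL10": 3.8, "ADM": 2.5, "BEX2": 1.9},
}
print(meta_rank(studies))  # CXCL10 ranks first in this toy example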
Project description:This article presents 14 quick tips for building a team to crowdsource data for public health advocacy. The tips cover team building and logistics, infrastructure setup, media and industry outreach, and project wrap-up and archiving for posterity.
Project description:Accurate, high-resolution tracking of influenza epidemics at the regional level helps public health agencies make informed and proactive decisions, especially in the face of outbreaks. Internet users' online searches offer great potential for the regional tracking of influenza. However, due to the complex data structure and reduced quality of Internet data at the regional level, few established methods provide satisfactory performance. In this article, we propose a novel method named ARGO2 (2-step Augmented Regression with GOogle data) that efficiently combines publicly available Google search data at different resolutions (national and regional) with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) for accurate, real-time regional tracking of influenza. ARGO2 gives very competitive performance across all US regions compared with available Internet-data-based regional influenza tracking methods, and it has achieved 30% error reduction over the best alternative method that we numerically tested for the period of March 2009 to March 2018. ARGO2 is reliable and robust, with the flexibility to incorporate additional information from other sources and resolutions, making it a powerful tool for regional influenza tracking, and potentially for tracking other social, economic, or public health events at the regional or local level.
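The core "augmented regression" idea, stripped of the two-step, multi-resolution structure of ARGO2, can be sketched as follows: regress regional %ILI on its own recent lags plus same-week Google search frequencies, with an L1 penalty fit over a rolling training window. The search-term columns, window length, and synthetic data below are assumptions for illustration, not the actual ARGO2 inputs.

# Sketch of a lagged, search-augmented regression nowcast (not the full ARGO2 method).
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV

def argo_style_nowcast(ili, searches, n_lags=3, window=104):
    """ili: weekly %ILI Series; searches: DataFrame of search-term series.
    Returns a nowcast of %ILI for the latest week."""
    # Design matrix: autoregressive lags of %ILI plus same-week search frequencies.
    X = pd.concat([ili.shift(k).rename(f"ili_lag{k}") for k in range(1, n_lags + 1)]
                  + [searches], axis=1)
    train = X.iloc[-window - 1:-1].dropna()
    model = LassoCV(cv=5).fit(train, ili.loc[train.index])
    return float(model.predict(X.iloc[[-1]].fillna(0.0))[0])

# Toy weekly data standing in for CDC %ILI and Google search series.
rng = np.random.default_rng(1)
weeks = pd.date_range("2015-01-04", periods=200, freq="W")
ili = pd.Series(2 + np.sin(np.arange(200) / 8) + 0.1 * rng.standard_normal(200), index=weeks)
searches = pd.DataFrame({"flu_symptoms": ili + 0.2 * rng.standard_normal(200),
                         "fever": 0.5 * ili + 0.2 * rng.standard_normal(200)}, index=weeks)
print("nowcast for latest week:", round(argo_style_nowcast(ili, searches), 2))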
Project description:Optimizing the economic impact of control strategies aimed at containing the spread of COVID-19 is a critical challenge. We use daily new case counts of COVID-19 patients reported by local health administrations from different Metropolitan Statistical Areas (MSAs) within the US to parametrize a model that well describes the propagation of the disease in each area. We then introduce a time-varying control input that represents the level of social distancing imposed on the population of a given area and solve an optimal control problem with the goal of minimizing the impact of social distancing on the economy in the presence of relevant constraints, such as a desired level of epidemic suppression at a terminal time. We find that, with the exception of the initial and final times, the optimal control input is well approximated by a constant, specific to each area, which contrasts with the implemented system of reopening 'in phases'. For all the areas considered, this optimal level corresponds to stricter social distancing than the level estimated from data. Properly selecting the time period over which the control action is applied is important: depending on the particular MSA, this period should be short, intermediate, or long. We also consider the case in which transmissibility increases over time (e.g., due to increasingly cold weather), for which we find that the optimal control solution yields progressively stricter measures of social distancing. Finally, we compute the optimal control solution for a model modified to incorporate the effects of vaccination and find that, depending on a number of factors, social distancing measures can optimally be relaxed during the period over which vaccines are administered to the population.
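The finding that the optimal control input is close to a constant can be illustrated with a toy SIR model in which a constant social-distancing level u in [0, 1] scales down transmission, and a grid search picks the smallest constant u meeting a terminal suppression target. All parameter values below are illustrative assumptions, not the calibrated MSA-level model of the study.

# Toy SIR model with a constant social-distancing control (assumed parameters).
import numpy as np

def simulate_sir(u, beta=0.3, gamma=0.1, days=90, i0=1e-4):
    """Infectious fraction at the end of the horizon when transmission is scaled by (1 - u)."""
    s, i = 1.0 - i0, i0
    for _ in range(days):
        new_inf = (1 - u) * beta * s * i
        s, i = s - new_inf, i + new_inf - gamma * i
    return i

target = 1e-4  # desired suppression level at the terminal time (assumed)
for u in np.linspace(0, 1, 101):
    if simulate_sir(u) <= target:
        print(f"smallest constant control meeting the target: u = {u:.2f}")
        break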
Project description:Understanding how animals move within their environment is a burgeoning field of research. Despite this, relatively basic data, such as the speeds at which animals choose to walk in the wild, are sparse. If animals choose to walk with dynamic similarity, they will move at equal dimensionless speeds, represented by the Froude number (Fr). Fr may be estimated from simple limb kinematics obtained from video data. Here, using Internet videos, limb kinematics were measured in 112 bird and mammal species weighing between 0.61 and 5400 kg. This novel method of data collection enabled the determination of kinematics for animals walking at their self-selected speeds without the need for exhaustive fieldwork. At larger sizes, both birds and mammals prefer to walk at slower relative speeds and relative stride frequencies, as preferred Fr decreases in larger species, indicating that Fr may not be a good predictor of preferred locomotor speeds. This may result from the observation that the minimum cost of transport is approached at lower Fr in larger species. Birds walk with higher duty factors, lower stride frequencies, and longer stance times than mammals at self-selected speeds. The trend towards lower preferred Fr is also apparent in extinct vertebrate species.
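A minimal sketch of the underlying calculation: with stride length, stride frequency, and hip (limb) height read off a video, forward speed is stride length times stride frequency and Fr = v^2 / (g h). The numbers below are hypothetical, not measurements from the study.

# Sketch: Froude number from simple video-derived kinematics (toy values).
G = 9.81  # gravitational acceleration, m/s^2

def froude(stride_length_m, stride_freq_hz, hip_height_m):
    speed = stride_length_m * stride_freq_hz   # forward speed, m/s
    return speed ** 2 / (G * hip_height_m)     # Fr = v^2 / (g * h)

# A hypothetical medium-sized mammal walking at a self-selected speed.
print(round(froude(stride_length_m=1.1, stride_freq_hz=1.3, hip_height_m=0.8), 2))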
Project description:We present a new database of Dutch word recognition times for a total of 54 thousand words, called the Dutch Crowdsourcing Project. The data were collected with an internet vocabulary test and are limited to native Dutch speakers. Participants were asked to indicate which words they knew. Their response times were recorded, even though the participants were not asked to respond as fast as possible. Still, the response times correlate at around .70 with those of the Dutch Lexicon Projects for shared words. Results of virtual experiments also indicate that the new response times are a valid addition to the Dutch Lexicon Projects. This means not only that we have useful response times for some 20 thousand extra words, but also that we now have data on differences in response latencies as a function of education and age. The new data correspond better to word use in the Netherlands.
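The validation step can be sketched as a simple merge-and-correlate on shared words: restrict both datasets to the words they have in common and compute the Pearson correlation of the response times. The column names and toy values below are illustrative assumptions, not the published data.

# Sketch: correlating crowdsourced response times with lexicon-project times on shared words.
import pandas as pd

crowd = pd.DataFrame({"word": ["huis", "fiets", "gracht", "stroopwafel"],
                      "rt":   [612, 648, 701, 755]})
lexicon = pd.DataFrame({"word": ["huis", "fiets", "gracht", "molen"],
                        "rt":   [598, 655, 690, 720]})

shared = crowd.merge(lexicon, on="word", suffixes=("_crowd", "_lexicon"))
print(round(shared["rt_crowd"].corr(shared["rt_lexicon"]), 2))  # Pearson r on shared words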