Project description:As the COVID-19 pandemic spreads across the world, it is important to understand its features and responses to public health interventions in real time. The field of infectious disease epidemiology has highly advanced modeling strategies that yield relevant estimates, including the doubling time of the epidemic and various other representations of the numbers of cases identified over time. Crude estimates of these quantities suffer from dependence on the underlying testing strategies within communities. We clarify the functional relationship between testing and the epidemic parameters, and thereby derive sensitivity analyses that explore the range of possible truths under various testing dynamics. We derive the required adjustment to the estimates of interest for New York City, and demonstrate that crude estimates assuming stable or complete testing can be biased.
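As a minimal illustration of why testing dynamics matter, the sketch below contrasts a crude doubling-time estimate with one that rescales case counts by testing volume. The data are hypothetical, and the per-test rescaling is only one simple adjustment used for exposition, not the paper's derived correction.

```python
import numpy as np

# Hypothetical daily counts: confirmed cases and tests performed
cases = np.array([120, 160, 230, 310, 450, 640, 900, 1250])
tests = np.array([1000, 1200, 1600, 2000, 2800, 3800, 5200, 7000])
days = np.arange(len(cases))

def doubling_time(series, t):
    """Doubling time from a log-linear fit, assuming exponential growth."""
    slope = np.polyfit(t, np.log(series), 1)[0]
    return np.log(2) / slope

# Crude estimate: implicitly assumes testing effort is stable over time
crude = doubling_time(cases, days)

# Testing-adjusted estimate: rescales cases by testing volume
adjusted = doubling_time(cases / tests, days)

print(f"crude doubling time:    {crude:.1f} days")
print(f"adjusted doubling time: {adjusted:.1f} days")
```

When testing itself grows, the crude fit attributes some of that growth to the epidemic and understates the doubling time; the rescaled series removes that component under the (strong) assumption that detected cases scale proportionally with tests.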
Project description:Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We develop scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis by detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week.
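To make the subsampling problem concrete, here is a small sketch of uniform node subsampling together with a standard rescaling: under node sampling with probability p, an observed degree is a binomial thinning of the true degree, so E[k_obs] = p·E[k_true]. This illustrates the flavor of such scaling arguments, not the authors' specific estimators.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
G = nx.barabasi_albert_graph(10_000, 5, seed=0)  # stand-in "true" network
p = 0.3                                          # fraction of nodes observed

# Node subsampling: keep each node with probability p, take the induced subgraph
kept = [v for v in G if rng.random() < p]
H = G.subgraph(kept)

# Each neighbor of a kept node survives independently with probability p,
# so the observed mean degree should be roughly p times the true mean degree
mean_true = np.mean([d for _, d in G.degree()])
mean_obs = np.mean([d for _, d in H.degree()])
print(f"true mean degree:     {mean_true:.2f}")
print(f"observed mean degree: {mean_obs:.2f}")
print(f"rescaled estimate:    {mean_obs / p:.2f}")
```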
Project description:BACKGROUND:Pedigree reconstruction using genetic analysis provides a useful means to estimate fundamental population biology parameters relating to population demography, trait heritability and individual fitness when combined with other sources of data. However, there remain limitations to pedigree reconstruction in wild populations, particularly in systems where parent-offspring relationships cannot be directly observed, there is incomplete sampling of individuals, or molecular parentage inference relies on low quality DNA from archived material. While much can still be inferred from incomplete or sparse pedigrees, it is crucial to evaluate the quality and power of available genetic information a priori to testing specific biological hypotheses. Here, we used microsatellite markers to reconstruct a multi-generation pedigree of wild Atlantic salmon (Salmo salar L.) using archived scale samples collected with a total trapping system within a river over a 10-year period. Using a simulation-based approach, we determined the optimal microsatellite marker number for accurate parentage assignment, and evaluated the power of the resulting partial pedigree to investigate important evolutionary and quantitative genetic characteristics of salmon in the system. RESULTS:We show that at least 20 microsatellites (ave. 12 alleles/locus) are required to maximise parentage assignment and to improve the power to estimate reproductive success and heritability in this study system. We also show that 1.5-fold differences can be detected between groups simulated to have differing reproductive success, and that it is possible to detect moderate heritability values for continuous traits (h² ≈ 0.40) with more than 80% power when using 28 moderately to highly polymorphic markers. CONCLUSION:The methodologies and workflow described provide a robust approach for evaluating archived samples for pedigree-based research, even where only a proportion of the total population is sampled. The results demonstrate the feasibility of pedigree-based studies to address challenging ecological and evolutionary questions in free-living populations, where genealogies can be traced only using molecular tools, and that significant increases in pedigree assignment power can be achieved by using higher numbers of markers.
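The marker-number question can be explored with a toy Mendelian simulation like the one below: genotype candidate parents at n unlinked loci, generate offspring, and ask how often a true parent achieves the highest allele sharing. Everything here (uniform allele frequencies, no genotyping error, exclusion-style scoring) is a deliberate simplification of a real parentage analysis, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def assignment_accuracy(n_loci, n_alleles=12, n_parents=50, n_offspring=200):
    """Fraction of offspring whose top allele-sharing candidate is a true parent."""
    # Parent genotypes: two alleles per locus, uniform allele frequencies
    parents = rng.integers(0, n_alleles, size=(n_parents, n_loci, 2))
    correct = 0
    for _ in range(n_offspring):
        dam, sire = rng.choice(n_parents, 2, replace=False)
        # Offspring inherits one random allele from each true parent per locus
        off = np.stack(
            [parents[dam, np.arange(n_loci), rng.integers(0, 2, n_loci)],
             parents[sire, np.arange(n_loci), rng.integers(0, 2, n_loci)]],
            axis=1)
        # Score each candidate by the number of loci sharing >= 1 allele
        shares = np.array([
            sum(len(set(parents[c, l]) & set(off[l])) > 0 for l in range(n_loci))
            for c in range(n_parents)])
        if np.argmax(shares) in (dam, sire):
            correct += 1
    return correct / n_offspring

for n_loci in (5, 10, 20, 28):
    print(f"{n_loci:>2} loci: {assignment_accuracy(n_loci):.2f}")
```

Accuracy saturates as loci are added, which is the qualitative pattern behind a minimum-marker recommendation; a real analysis would use likelihood-based assignment and empirical allele frequencies.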
Project description:Quantile estimation has attracted significant research interests in recent years. However, there has been only a limited literature on quantile estimation in the presence of incomplete data. In this paper, we propose a general framework to address this problem. Our framework combines the two widely adopted approaches for missing data analysis, the imputation approach and the inverse probability weighting approach, via the empirical likelihood method. The proposed method is capable of dealing with many different missingness settings. We mainly study three of them: (i) estimating the marginal quantile of a response that is subject to missingness while there are fully observed covariates; (ii) estimating the conditional quantile of a fully observed response while the covariates are partially available; and (iii) estimating the conditional quantile of a response that is subject to missingness with fully observed covariates and extra auxiliary variables. The proposed method allows multiple models for both the missingness probability and the data distribution. The resulting estimators are multiply robust in the sense that they are consistent if any one of these models is correctly specified. The asymptotic distributions are established using the empirical process theory.
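Setting (i), the marginal quantile of a response that is missing at random given fully observed covariates, is the easiest to sketch. The snippet below shows only the inverse probability weighting ingredient with a single logistic missingness model; the proposed estimator additionally combines imputation and multiple candidate models through empirical likelihood, which is what yields multiple robustness.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Simulated data: Y is missing at random given a fully observed covariate X
n = 5000
X = rng.normal(size=n)
Y = 1 + 2 * X + rng.normal(size=n)
p_obs = 1 / (1 + np.exp(-(0.5 + X)))   # missingness depends on X only
R = rng.random(n) < p_obs              # R = 1: Y observed

# Model the missingness probability pi(X) with logistic regression
pi_hat = (LogisticRegression()
          .fit(X.reshape(-1, 1), R)
          .predict_proba(X.reshape(-1, 1))[:, 1])

def ipw_quantile(tau):
    """Inverse-probability-weighted marginal tau-quantile of Y."""
    w = 1.0 / pi_hat[R]                    # weights for observed cases only
    order = np.argsort(Y[R])
    cdf = np.cumsum(w[order]) / w.sum()    # weighted empirical CDF
    return Y[R][order][np.searchsorted(cdf, tau)]

print("complete-case median:", np.median(Y[R]))   # biased under MAR
print("IPW median:          ", ipw_quantile(0.5))
print("full-data median:    ", np.median(Y))
```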
Project description:Social interactions shape the patterns of spreading processes in a population. Techniques such as diaries or proximity sensors make it possible to collect data on encounters and to build networks of contacts between individuals. The contact networks obtained from these different techniques are, however, quantitatively different. Here, we first show how these discrepancies affect the prediction of the epidemic risk when these data are fed to numerical models of epidemic spread: low participation rate, under-reporting of contacts, and overestimation of contact durations in contact diaries with respect to sensor data lead to important differences in the outcomes of the corresponding simulations, including, for instance, an enhanced sensitivity to initial conditions. Most importantly, we investigate whether and how information gathered from contact diaries can be used in such simulations to yield an accurate description of the epidemic risk, assuming that data from sensors represent the ground truth. The contact networks built from contact sensors and diaries do in fact present several structural similarities: this suggests the possibility of constructing, using only the contact diary network information, a surrogate contact network such that simulations using this surrogate network give the same estimate of the epidemic risk as simulations using the contact sensor network. We present and compare several methods to build such surrogate data, and show that it is indeed possible to obtain good agreement between the outcomes of simulations using surrogate and sensor data, as long as the contact diary information is complemented by publicly available data describing the heterogeneity of the durations of human contacts.
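A minimal sketch of the surrogate-network idea: keep the diary network's topology but replace its reported durations with draws from an external, heavy-tailed duration distribution. The log-normal below merely stands in for the publicly available duration data mentioned in the text, and the duration-dependent transmission probability is one common modeling choice rather than the exact procedure studied here.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(3)

# Hypothetical diary network: who reported contact with whom. Reported
# durations are coarse and over-reported, so only the topology is kept.
diary = nx.erdos_renyi_graph(200, 0.05, seed=3)

# Heavy-tailed stand-in for an empirical contact-duration distribution
def sample_duration_seconds():
    return rng.lognormal(mean=3.0, sigma=1.2)

# Surrogate network: diary topology + resampled edge durations
surrogate = diary.copy()
for u, v in surrogate.edges():
    surrogate[u][v]["duration"] = sample_duration_seconds()

# In a spreading simulation, the per-contact transmission probability can
# then be tied to duration, e.g. p = 1 - exp(-beta * duration)
beta = 1e-4
for u, v, d in surrogate.edges(data="duration"):
    surrogate[u][v]["p_transmit"] = 1 - np.exp(-beta * d)
```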
Project description:Signal correlation (r_s) is commonly defined as the correlation between the tuning curves of two neurons and is widely used as a metric of tuning similarity. It is fundamental to how populations of neurons represent stimuli and has been central to many studies of neural coding. Yet the classic estimate, Pearson's correlation coefficient r̂_s between the average responses of two neurons to a set of stimuli, suffers from confounding biases. The estimate r̂_s can be downwardly biased by trial-to-trial variability and also upwardly biased by trial-to-trial correlation between neurons, and these biases can hide important aspects of neural coding. Here we provide analytic results on the source of these biases and explore them for ranges of parameters that are relevant for electrophysiological experiments. We then provide corrections for these biases that we validate in simulation. Furthermore, we apply these corrected estimators to make the following novel experimental observation in cortical area MT: pairs of nearby neurons that are strongly tuned for motion direction tend to have high signal correlation, and pairs that are weakly tuned tend to have low signal correlation. We dismiss a trivial explanation for this and find that an analogous trend holds for orientation tuning in the primary visual cortex. We also consider the potential consequences for encoding whereby the association of signal correlation and tuning strength naturally regularizes the dimensionality of downstream computations. SIGNIFICANCE STATEMENT: Fundamental to how cortical neurons encode information about the environment is their functional similarity, that is, the redundancy in what they encode and their shared noise. These properties have been extensively studied theoretically and experimentally throughout the nervous system, but here we show that a common estimator of functional similarity has confounding biases. We characterize these biases and provide estimators that do not suffer from them. Using our improved estimators, we demonstrate a novel result, that is, there is a positive relationship between tuning curve similarity and amplitude for nearby neurons in the visual cortical motion area MT. We provide a simple stochastic model explaining this relationship and discuss how it would naturally regularize the dimensionality of neural encoding.
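The direction of both biases follows from a moment decomposition: trial-to-trial noise inflates the variance of each trial-averaged tuning curve (attenuating the correlation toward zero), while noise covariance between simultaneously recorded neurons leaks into the covariance of the averages (inflating it). The sketch below applies the plug-in moment correction implied by that decomposition; it illustrates the idea but is not the specific estimator developed in the paper.

```python
import numpy as np

def corrected_signal_correlation(A, B):
    """Moment-corrected signal correlation between two neurons.

    A, B: simultaneously recorded responses, shape (n_stimuli, n_trials).
    """
    n_stim, n_trials = A.shape
    mA, mB = A.mean(axis=1), B.mean(axis=1)            # tuning curves

    # Noise contributions to the (co)variance of the trial averages
    noise_var_A = A.var(axis=1, ddof=1).mean() / n_trials
    noise_var_B = B.var(axis=1, ddof=1).mean() / n_trials
    noise_cov = np.mean(
        [np.cov(A[s], B[s])[0, 1] for s in range(n_stim)]) / n_trials

    # Subtract the noise terms from the across-stimulus moments. With noisy
    # data the corrected variances can go negative; real estimators must
    # handle that case explicitly.
    cov = np.cov(mA, mB)[0, 1] - noise_cov
    var_A = mA.var(ddof=1) - noise_var_A
    var_B = mB.var(ddof=1) - noise_var_B
    return cov / np.sqrt(var_A * var_B)
```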
Project description:Modeling infectious disease dynamics has been critical throughout the COVID-19 pandemic. Of particular interest are the incidence, prevalence, and effective reproductive number (Rt). Estimating these quantities is challenging due to under-ascertainment, unreliable reporting, and time lags between infection, onset, and testing. We propose a Multilevel Epidemic Regression Model to Account for Incomplete Data (MERMAID) to jointly estimate Rt, ascertainment rates, incidence, and prevalence over time in one or multiple regions. Specifically, MERMAID allows for a flexible regression model of Rt that can incorporate geographic and time-varying covariates. To account for under-ascertainment, we (a) model the ascertainment probability over time as a function of testing metrics and (b) jointly model data on confirmed infections and population-based serological surveys. To account for delays between infection, onset, and reporting, we model stochastic lag times as missing data, and develop an EM algorithm to estimate the model parameters. We evaluate the performance of MERMAID in simulation studies, and assess its robustness by conducting sensitivity analyses in a range of scenarios of model misspecifications. We apply the proposed method to analyze COVID-19 daily confirmed infection counts, PCR testing data, and serological survey data across the United States. Based on our model, we estimate an overall COVID-19 prevalence of 12.5% (ranging from 2.4% in Maine to 20.2% in New York) and an overall ascertainment rate of 45.5% (ranging from 22.5% in New York to 81.3% in Rhode Island) in the United States from March to December 2020. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
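The generative side of such a model can be sketched in a few lines: infections follow a renewal process driven by Rt and a generation-interval distribution, and confirmed cases are a time-varying binomial thinning of infections whose ascertainment probability depends on testing effort. All numbers below are hypothetical, the linear link between ascertainment and testing is assumed, and the sketch omits reporting lags and the serological likelihood; the EM estimation that inverts this process is not shown.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical discretized generation-interval weights w_s
w = np.array([0.10, 0.25, 0.30, 0.20, 0.10, 0.05])

T = 60
R_t = np.where(np.arange(T) < 30, 1.8, 0.8)  # step change in transmission
I = np.zeros(T)
I[0] = 100.0

# Renewal process: new infections depend on recent infections and R_t
for t in range(1, T):
    recent = I[max(0, t - len(w)):t][::-1]   # most recent infections first
    I[t] = R_t[t] * np.sum(recent * w[:len(recent)])

# Under-ascertainment: confirmed cases are a thinning of infections, with
# the ascertainment rate tied (here, linearly) to testing volume
tests = np.linspace(1_000, 10_000, T)
alpha_t = 0.15 + 0.5 * tests / tests.max()
cases = rng.binomial(np.round(I).astype(int), np.clip(alpha_t, 0, 1))
```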
Project description:Many of the recent epidemic outbreaks in the world have had a strong migratory component as a trigger, as has been evident in the recent COVID-19 pandemic. In this work we address the problem of migration of human populations and its effect on pathogen reinfections in the case of Dengue, using a Markov-chain susceptible-infected-susceptible (SIS) metapopulation model over a network. Our model postulates a general contact rate that represents a local measure of several factors: the population size of infected hosts that arrive at a given location as a function of total population size, the current incidence at neighboring locations, and the connectivity of the network over which the disease spreads. This parameter can be interpreted as an indicator of outbreak risk at a given location, and is tied to the fraction of individuals that move across boundaries (migration). To illustrate our model's capabilities, we estimate, from epidemic Dengue data in Mexico, the dynamics of migration at a regional scale, incorporating climate variability represented by an index based on precipitation data.
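A deterministic mean-field simplification of such a model is sketched below: each location's force of infection mixes local prevalence with the prevalence in neighboring locations, weighted by the migrating fraction m and the network adjacency. The network, populations, and rates are all hypothetical, and the actual model is a stochastic Markov chain rather than this difference-equation approximation.

```python
import numpy as np

# Hypothetical metapopulation network of 4 locations
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
N = np.array([10_000, 5_000, 8_000, 3_000], dtype=float)  # population sizes
m = 0.05                     # fraction of individuals moving across boundaries
beta, gamma = 0.3, 0.1       # transmission and recovery rates

I = np.array([50.0, 0.0, 0.0, 0.0])   # initial infected per location
for _ in range(200):
    local = I / N                      # local prevalence
    neighbor = (A @ I) / (A @ N)       # prevalence among connected locations
    force = beta * ((1 - m) * local + m * neighbor)
    I = I + force * (N - I) - gamma * I    # SIS update: infection - recovery
    I = np.clip(I, 0.0, N)
```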
Project description:Based on international guidelines, the elaboration of national carbon (C) budgets in many countries has tended to set aside the capacity of grazing lands to sequester C as soil organic carbon (SOC). A widely applied simple method assumes a steady state for SOC stocks in grasslands and a long-term equilibrium between annual C gains and losses. This article presents a theoretical method based on the annual conversion of belowground biomass into SOC to include the capacity of grazing-land soils to sequester C in greenhouse gas (GHG) calculations. Average figures from both methods can be combined with land-use/land-cover data to reassess the net C sequestration of a country's rural sector. The results of this method were validated against empirical values from peer-reviewed literature providing annual data on SOC sequestration. This methodology offers important differences over pre-existing GHG landscape-approach calculation methods: • it improves the estimation of the capacity of grazing-land soils to sequester C by assuming these lands are not in a steady state, and • it counts C gains by considering that grazing lands are managed at low livestock densities.
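As a worked example of the belowground-biomass conversion logic, the arithmetic below computes an annual SOC gain for a single grazing-land stratum. Every coefficient here is assumed purely for illustration; the article derives its own figures and validates them against published annual SOC data.

```python
# All values are illustrative assumptions, not the article's coefficients
area_ha = 1_000_000        # grazing-land area (ha)
bgb_t_ha = 4.0             # annual belowground biomass production (t DM/ha)
humification = 0.15        # fraction of belowground biomass converted to SOC
c_content = 0.45           # carbon fraction of the humified material

soc_gain_tC = area_ha * bgb_t_ha * humification * c_content
co2e = soc_gain_tC * 44 / 12          # convert t C to t CO2-equivalent
print(f"annual SOC sequestration: {soc_gain_tC:,.0f} t C ({co2e:,.0f} t CO2e)")
```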
Project description:The temporal dynamics of species diversity are shaped by variations in the rates of speciation and extinction, and there is a long history of inferring these rates using first and last appearances of taxa in the fossil record. Understanding diversity dynamics critically depends on unbiased estimates of the unobserved times of speciation and extinction for all lineages, but the inference of these parameters is challenging due to the complex nature of the available data. Here, we present a new probabilistic framework to jointly estimate species-specific times of speciation and extinction and the rates of the underlying birth-death process based on the fossil record. The rates are allowed to vary through time independently of each other, and the probability of preservation and sampling is explicitly incorporated in the model to estimate the true lifespan of each lineage. We implement a Bayesian algorithm to assess the presence of rate shifts by exploring alternative diversification models. Tests on a range of simulated data sets reveal the accuracy and robustness of our approach against violations of the underlying assumptions and various degrees of data incompleteness. Finally, we demonstrate the application of our method with the diversification of the mammal family Rhinocerotidae and reveal a complex history of repeated and independent temporal shifts of both speciation and extinction rates, leading to the expansion and subsequent decline of the group. The estimated parameters of the birth-death process implemented here are directly comparable with those obtained from dated molecular phylogenies. Thus, our model represents a step towards integrating phylogenetic and fossil information to infer macroevolutionary processes.
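Two core likelihood ingredients of such a framework can be written compactly: a Poisson preservation model linking a lineage's fossil occurrences to its unobserved lifespan, and a birth-death likelihood for the speciation and extinction times themselves. The constant-rate forms below are simplifications for illustration; the model described here allows both rates to shift independently through time and explores those shifts in a Bayesian framework.

```python
import numpy as np

def log_lik_preservation(occ_times, ts, te, q):
    """Log-likelihood of one lineage's fossil occurrence times under a
    homogeneous Poisson preservation process with rate q, given unobserved
    speciation time ts and extinction time te (ages before present, ts > te).
    Analyses restricted to sampled lineages would further condition on
    observing at least one occurrence."""
    occ = np.asarray(occ_times)
    if np.any(occ > ts) or np.any(occ < te):
        return -np.inf              # occurrences must fall within the lifespan
    return len(occ) * np.log(q) - q * (ts - te)

def log_lik_birth_death(ts_all, te_all, lam, mu):
    """Log-likelihood of speciation/extinction times under a constant-rate
    birth-death process (te = 0 marks an extant lineage; lam, mu > 0)."""
    ts_all, te_all = np.asarray(ts_all), np.asarray(te_all)
    n_extinct = np.sum(te_all > 0)
    total_lifespan = np.sum(ts_all - te_all)    # total time at risk
    return ((len(ts_all) - 1) * np.log(lam)     # n-1 speciations yield n lineages
            + n_extinct * np.log(mu)
            - (lam + mu) * total_lifespan)
```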