Project description:Objective: A common form of validation study compares alternative methods for collecting data. The Bland-Altman plot pairs observations across methods and plots their mean values vs. their difference. This method provides only limited information, however, when the range of observed values is small relative to the number of observations. This brief report shows how adding a simple bar chart to a Bland-Altman plot adds essential additional information. Study design and setting: The methodological approach is illustrated using data from a randomized controlled clinical trial of patients in a U.S. county health system. Results: When the number of unique values is small, a Bland-Altman plot alone may provide inadequate information. Adding a bar chart yields new and essential information about agreement, bias, and heteroscedasticity. Conclusion: Studies validating one data-collection method against another can be performed successfully even when the number of unique values is small.
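As a minimal sketch of the quantities behind such a figure, the Python below computes the per-pair means and differences used in a Bland-Altman plot, plus the count of each distinct difference, which could serve as the bar heights of a companion bar chart. The function name, the sample data, and the conventional 1.96 SD limits of agreement are assumptions for illustration, not details taken from the report.

```python
from collections import Counter
from statistics import mean, stdev


def bland_altman_summary(method_a, method_b):
    """Summarise paired measurements from two data-collection methods.

    Hypothetical helper: returns the points of a Bland-Altman plot
    (pair means vs. pair differences), the bias, the conventional
    limits of agreement, and the frequency of each distinct difference.
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired observations must have equal length")

    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    diffs = [a - b for a, b in zip(method_a, method_b)]

    bias = mean(diffs)   # average difference between the methods
    sd = stdev(diffs)    # sample standard deviation of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)

    # With few unique values many points overlap in the scatter plot,
    # so the counts per difference carry information the plot hides.
    diff_counts = dict(sorted(Counter(diffs).items()))

    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "limits": limits,
        "diff_counts": diff_counts,
    }


# Made-up ordinal scores from two methods (illustrative only).
scores_a = [1, 2, 2, 3, 3, 3, 4, 5]
scores_b = [1, 2, 3, 3, 2, 3, 4, 4]
summary = bland_altman_summary(scores_a, scores_b)
print(summary["bias"], summary["diff_counts"])
```

The `diff_counts` dictionary can be passed to any plotting library as bar positions and heights; the scatter of `means` against `diffs` gives the Bland-Altman panel itself.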
Project description:Genetic clustering algorithms, implemented in programs such as STRUCTURE and ADMIXTURE, have been used extensively in the characterisation of individuals and populations based on genetic data. A successful example is the reconstruction of the genetic history of African Americans as a product of recent admixture between highly differentiated populations. Histories can also be reconstructed using the same procedure for groups that do not have admixture in their recent history, where recent genetic drift is strong or that deviate in other ways from the underlying inference model. Unfortunately, such histories can be misleading. We have implemented an approach, badMIXTURE, to assess the goodness of fit of the model using the ancestry "palettes" estimated by CHROMOPAINTER and apply it to both simulated data and real case studies. Combining these complementary analyses with additional methods that are designed to test specific hypotheses allows a richer and more robust analysis of recent demographic history.
Project description:Recent work has raised awareness about the need to replace bar graphs of continuous data with informative graphs showing the data distribution. The impact of these efforts is not known. The present observational meta-research study examined how often scientists in different fields use various graph types, and assessed whether visualization practices have changed between 2010 and 2020. We developed and validated an automated screening tool, designed to identify bar graphs of counts or proportions, bar graphs of continuous data, bar graphs with dot plots, dot plots, box plots, violin plots, histograms, pie charts, and flow charts. Papers from 23 fields (approximately 1000 papers/field per year) were randomly selected from PubMed Central and screened (n=227998). F1 scores for different graphs ranged between 0.83 and 0.95 in the internal validation set. While the tool also performed well in external validation sets, F1 scores were lower for uncommon graphs. Bar graphs are more often used incorrectly to display continuous data than they are used correctly to display counts or proportions. The proportion of papers that use bar graphs of continuous data varies markedly across fields (range in 2020: 4-58%), with high rates in biochemistry and cell biology, complementary and alternative medicine, physiology, genetics, oncology and carcinogenesis, pharmacology, microbiology and immunology. Visualization practices have improved in some fields in recent years. Fewer than 25% of papers use flow charts, which provide information about attrition and the risk of bias. The present study highlights the need for continued interventions to improve visualization and identifies fields that would benefit most.
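For readers unfamiliar with the F1 score reported above, the following is a small sketch of how it is computed from a classifier's confusion counts. The function name and the example counts are hypothetical and do not come from the study.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, from raw counts."""
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        # No positives predicted or present: define the score as 0.
        return 0.0
    return 2 * true_pos / denominator


# Made-up counts for one graph type in a validation set.
example_f1 = f1_score(true_pos=90, false_pos=10, false_neg=20)
print(round(example_f1, 3))
```

Because F1 ignores true negatives, it can drop sharply for uncommon graph types, where a few misclassifications are large relative to the number of true positives.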
Project description:Historical GIS involves applying GIS to historical research. Using a unique method, I recovered historical tree survey information stored in bar chart figures of a 1956 publication. I converted PDF files to TIF files, a format that can be loaded as a GIS layer. I then employed GIS tools to measure the length of each bar in the TIF file and used a regression (R² = 97%) to convert bar lengths to numerical values of tree composition. I joined this information to a spatial GIS layer of Indiana, USA. To validate results, I compared predictions against an independent dataset and written summaries. I determined that historically (circa 1799 to 1846) in Indiana, oaks were 27% of all trees, beech was 25%, hickories and sugar maple were 7% each, and ash was 4.5%. Beech forests dominated (i.e., >24% of all trees) 44% of 8.9 million ha (i.e., where data were available in Indiana), oak forests dominated 29%, beech and oak forests dominated 4.5%, and oak savannas were in 6% of Indiana, resulting in beech and/or oak dominance in 84% of the state. This method may be valuable to reclaim information available in published figures, when associated raw data are not available.
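The conversion from measured bar lengths to numerical values can be sketched as an ordinary least-squares line fitted to calibration bars whose values are known. The code below is an illustration under that assumption; the function names and calibration numbers are hypothetical, and the abstract does not specify the exact regression form used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x, with R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Coefficient of determination: share of variance explained by the line.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_resid / ss_total
    return slope, intercept, r_squared


def predict(length, slope, intercept):
    """Convert a measured bar length to an estimated value."""
    return intercept + slope * length


# Made-up calibration: bar lengths (map units) and known percentages.
lengths = [10.0, 20.0, 30.0, 40.0]
percents = [5.0, 10.0, 15.0, 20.0]
slope, intercept, r_squared = fit_line(lengths, percents)
estimate = predict(25.0, slope, intercept)
print(slope, intercept, r_squared, estimate)
```

In practice the calibration pairs would come from bars whose values are printed or otherwise known, and the fitted line would then be applied to every remaining bar.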
Project description:In this paper we introduce the network histogram, a statistical summary of network interactions to be used as a tool for exploratory data analysis. A network histogram is obtained by fitting a stochastic blockmodel to a single observation of a network dataset. Blocks of edges play the role of histogram bins and community sizes that of histogram bandwidths or bin sizes. Just as standard histograms allow for varying bandwidths, different blockmodel estimates can all be considered valid representations of an underlying probability model, subject to bandwidth constraints. Here we provide methods for automatic bandwidth selection, by which the network histogram approximates the generating mechanism that gives rise to exchangeable random graphs. This makes the blockmodel a universal network representation for unlabeled graphs. With this insight, we discuss the interpretation of network communities in light of the fact that many different community assignments can all give an equally valid representation of such a network. To demonstrate the fidelity-versus-interpretability tradeoff inherent in considering different numbers and sizes of communities, we analyze two publicly available networks--political weblogs and student friendships--and discuss how to interpret the network histogram when additional information related to node and edge labeling is present.
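To make the bin analogy concrete, the sketch below computes the edge density of each block for a given assignment of nodes to communities; these densities play the role of histogram bar heights. It assumes the community labels are already known and omits the blockmodel fitting and bandwidth selection that the paper addresses. The function name and toy network are hypothetical.

```python
def block_densities(adjacency, labels):
    """Edge density for every pair of blocks in an undirected simple graph.

    adjacency: symmetric 0/1 matrix as a list of lists, no self-loops.
    labels: block index (0..k-1) for each node.
    """
    k = max(labels) + 1
    edges = [[0] * k for _ in range(k)]
    pairs = [[0] * k for _ in range(k)]
    n = len(adjacency)

    # Visit each unordered node pair once and tally it under its block pair.
    for i in range(n):
        for j in range(i + 1, n):
            a, b = sorted((labels[i], labels[j]))
            pairs[a][b] += 1
            edges[a][b] += adjacency[i][j]

    densities = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a, k):
            if pairs[a][b]:
                value = edges[a][b] / pairs[a][b]
                densities[a][b] = densities[b][a] = value
    return densities


# Toy network: path 0-1-2-3 split into two communities of two nodes.
toy_adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
toy_labels = [0, 0, 1, 1]
toy_densities = block_densities(toy_adjacency, toy_labels)
print(toy_densities)
```

Changing the number or size of the communities in `toy_labels` changes the resolution of the resulting density matrix, much as changing bin width changes an ordinary histogram.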
Project description:The sorting nexins (SNX) constitute a diverse family of molecules that play varied roles in membrane trafficking, cell signaling, membrane remodeling, organelle motility and autophagy. In particular, the SNX-BAR proteins, a SNX subfamily characterized by a C-terminal dimeric Bin/Amphiphysin/Rvs (BAR) lipid curvature domain and a conserved Phox-homology domain, are of great interest. In budding yeast, many SNX-BAR proteins have well-characterized endo-vacuolar trafficking roles. Phylogenetic analyses allowed us to identify an additional SNX-BAR protein, Vps501, with a novel endo-vacuolar role. We report that Vps501 uniquely localizes to the vacuolar membrane and has physical and genetic interactions with the SEA complex to regulate TORC1 inactivation. We found that cells displayed a severe deficiency in starvation-induced/nonselective autophagy only when SEA complex subunits were ablated in combination with Vps501, indicating that Vps501 cooperates with the SEA complex in TORC1 signaling during autophagy induction. Additionally, we found that the SEACIT complex becomes destabilized in vps501Δsea1Δ cells, which resulted in aberrant endosomal TORC1 activity and subsequent Atg13 hyperphosphorylation. We have also discovered that the vacuolar localization of Vps501 is dependent upon a direct interaction with Sea1 and a unique lipid binding specificity that is also required for its function.
Project description:The properties of all electrolyte solutions, whether the solvent is aqueous or nonaqueous, are strongly dependent on the nature of the ions in solution. The consequences of these specific-ion effects are significant and manifest from biochemistry to battery technology. The "law of matching water affinities" (LMWA) has proven to be a powerful concept for understanding and predicting specific-ion effects in a wide range of systems, including the stability of proteins and colloids, solubility, the behavior of lipids, surfactants, and polyelectrolytes, and catalysis in water and ionic liquids. It provides a framework for considering how the ions of an electrolyte interact in manifestations of ion specificity and therefore represents a considerable conceptual advance on the Hofmeister or lyotropic series in understanding specific-ion effects. Underpinning the development of the law of matching water affinities were efforts to interpret the so-called "volcano plots". Volcano plots exhibit a stark inverted "V" shape trend for a range of electrolyte-dependent solution properties when plotted against the difference in solvation energies of the ions that constitute the electrolyte. Here we test the hypothesis that volcano plots are also manifest in nonaqueous solvents in order to investigate whether the LMWA can be extended to nonaqueous solvents. First we examine the standard solvation energies of electrolytes in nonaqueous solvents for evidence of volcano trends and then extend this to include the solubility and the activity/osmotic coefficients of electrolytes, in order to explore real electrolyte concentrations. We find that volcano trends are universal with respect to the solvent, which brings into question the role of solvent affinity in the manifestation of specific-ion effects.
We also show that the volcano trends are maintained when the ionic radii are used in place of the absolute solvation energies as the abscissa, thus showing that ion sizes, rather than the solvent affinities, fundamentally determine the manifestation of ion specificity. This leads us to propose that specific-ion effects across all solvents including water can be understood by considering the relative sizes of the anion and cation, provided the ions are spherical or tetrahedral. This is an extension of the LMWA to all solvents in which the "water affinity" is replaced with the relative size of the anion and cation.
Project description:Background: It is often desirable to observe how a disease progresses over time in individual patients, rather than graphing group averages; and since multiple outcomes are typically recorded on each patient, it would be advantageous to visualise disease progression on multiple variables simultaneously. Methods: A variety of vector plots and a path plot have been developed for this purpose, and data from a longitudinal Huntington's disease study are used to illustrate the utility of these graphical methods for exploratory data analysis. Results: Initial and final values for three outcome variables can be easily visualised per patient, along with the change in these variables over time. In addition to the disease trajectory, the path individual patients take from initial to final observation can be traced. Categorical variables can be coded with different types of vectors or paths (e.g. different colours, line types, line thickness) and separate panels can be used to include further categorical or continuous variables, allowing clear visualisation of further information for each individual. In addition, summary statistics such as mean vectors, bivariate interquartile ranges and convex polygons can be included to assist in interpreting trajectories, comparing groups, and detecting multivariate outliers. Conclusion: Vector and path plots are useful graphical methods for exploratory data analysis when individual-level information on multiple variables over time is desired, and they have several advantages over plotting each variable separately.