Project description: Antibody testing in the coronavirus era is frequently promoted, but the statistics underlying test validation have come under more scrutiny in recent weeks. We provide calculations, interpretations, and plots of positive and negative predictive values under a variety of scenarios. Prevalence, sensitivity, and specificity are estimated within ranges of values reported by researchers and antibody manufacturers. Illustrative examples are highlighted, and interactive plots are provided in the Supplementary Information. Implications are discussed for society overall and across diverse locations with different levels of disease burden. Specifically, the proportion of positive serology tests that are false can range drastically, from 3% to 88%, across places with different proportions of infected people in the population, while the false negative rate is typically under 10%.
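A minimal sketch of the underlying calculation: positive and negative predictive values follow from prevalence, sensitivity, and specificity via Bayes' theorem. The sensitivity and specificity figures below are hypothetical placeholders, not values from any particular manufacturer.

```python
# Sketch: positive/negative predictive values from prevalence, sensitivity, specificity.
# The numbers below are hypothetical placeholders, not manufacturer-reported values.

def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) via Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

for prevalence in (0.001, 0.01, 0.05, 0.20):
    ppv, npv = predictive_values(prevalence, sensitivity=0.94, specificity=0.96)
    print(f"prevalence={prevalence:.3f}  "
          f"false positives among positive tests={1 - ppv:.1%}  "
          f"false negatives among negative tests={1 - npv:.1%}")
```

At low prevalence most positive results are false even with high specificity, while the negative predictive value stays high, which is the pattern described in the abstract.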
Project description: Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so, and yet these misinterpretations dominate much of the scientific literature. In light of this problem, we provide definitions and a discussion of basic statistics that are more general and critical than typically found in traditional introductory expositions. Our goal is to provide a resource for instructors, researchers, and consumers of statistics whose knowledge of statistical theory and technique may be limited but who wish to avoid and spot misinterpretations. We emphasize how violation of often unstated analysis protocols (such as selecting analyses for presentation based on the P values they produce) can lead to small P values even if the declared test hypothesis is correct, and can lead to large P values even if that hypothesis is incorrect. We then provide an explanatory list of 25 misinterpretations of P values, confidence intervals, and power. We conclude with guidelines for improving statistical interpretation and reporting.
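One of the unstated-protocol violations mentioned above, choosing which analysis to present based on its P value, is easy to demonstrate by simulation. The sketch below is a generic illustration under an assumed true null, not an example from the paper; the number of candidate analyses and the sample sizes are arbitrary.

```python
# Sketch: selecting which analysis to report based on its P value inflates the
# false positive rate even when the null hypothesis is true. Data are simulated
# under a true null; the number of candidate analyses is an arbitrary choice.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sim, n_analyses, n = 5000, 5, 30

false_positive = 0
for _ in range(n_sim):
    # Five alternative "analyses" of the same null effect (e.g. different outcomes).
    pvals = [stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
             for _ in range(n_analyses)]
    if min(pvals) < 0.05:        # report only the most "significant" analysis
        false_positive += 1

print(f"Proportion of simulations with a reported p < 0.05: {false_positive / n_sim:.2f}")
# With five independent looks this is roughly 1 - 0.95**5, about 0.23, not 0.05.
```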
Project description: Background: As a result of reporting bias or fraud, false or misunderstood findings may represent the majority of published research claims. This article provides simple methods that might help to appraise the quality of the reporting of randomized, controlled trials (RCTs). Methods: The evaluation roadmap proposed herein relies on four steps: evaluation of the distribution of the reported variables; evaluation of the distribution of the reported p values; data simulation using a parametric bootstrap; and explicit computation of the p values. The approach is illustrated using published data from a retracted RCT comparing hydroxyethyl starch-based versus albumin-based priming for cardiopulmonary bypass. Results: Despite obviously nonnormal distributions, several variables are presented as if they were normally distributed. The set of 16 p values testing for differences in baseline characteristics across randomized groups did not follow a Uniform distribution on [0,1] (p = 0.045). The p values obtained by explicit computation differed from the results reported by the authors for the following two variables: urine output at 5 hours (calculated p value < 10^-6, reported p ≥ 0.05) and packed red blood cells (PRBC) transfused during surgery (calculated p value = 0.08; reported p < 0.05). Finally, the parametric bootstrap found a p value > 0.05 in only 5 of the 10,000 simulated datasets for urine output 5 hours after surgery. For PRBC transfused during surgery, the parametric bootstrap showed that the corresponding p value had less than a 50% chance of being below 0.05 (3,920/10,000 simulated datasets with p < 0.05). Conclusions: Such simple evaluation methods might offer warning signals. However, it should be emphasized that they do not allow one to conclude that error or fraud is present; rather, they should be used to justify requesting access to the raw data.
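A rough sketch of two of these steps, using invented summary numbers rather than the retracted trial's data: a check of whether a set of reported baseline p values is compatible with Uniform[0,1], and a parametric bootstrap of a two-group comparison simulated from reported means, standard deviations, and group sizes (normality of the underlying variable is an assumption here).

```python
# Sketch of two of the four steps, with invented numbers rather than the trial's data:
# (1) check whether a set of reported baseline p values is compatible with Uniform[0,1];
# (2) parametric bootstrap of a two-group comparison simulated from reported summaries
#     (normality of the underlying variable is an assumption here).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# (1) Hypothetical set of 16 reported baseline p values.
reported_p = [0.03, 0.90, 0.40, 0.22, 0.81, 0.05, 0.60, 0.33,
              0.71, 0.12, 0.04, 0.50, 0.90, 0.02, 0.45, 0.66]
print(stats.kstest(reported_p, "uniform"))  # a small p value suggests non-uniformity

# (2) Simulate datasets from hypothetical reported means, SDs and group sizes,
#     then count how often the recomputed p value falls below 0.05.
m1, sd1, n1 = 110.0, 20.0, 30
m2, sd2, n2 = 125.0, 25.0, 30
below_005 = 0
for _ in range(10_000):
    x = rng.normal(m1, sd1, n1)
    y = rng.normal(m2, sd2, n2)
    below_005 += stats.ttest_ind(x, y).pvalue < 0.05
print(f"Simulated datasets with p < 0.05: {below_005}/10000")
```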
Project description: A common application of differential expression analysis is finding genes that are differentially expressed upon treatment in only one out of several groups of samples. One approach is to test for a significant difference in expression between treatment and control separately in the two groups, and then to select genes that show statistical significance in one group only. This approach is often combined with a gene set enrichment analysis to find pathways and gene sets regulated by the treatment in that group only. Here we show that this procedure is statistically incorrect and that the interaction between treatment and group should be tested instead. Moreover, we show that gene set enrichment analysis applied to such incorrectly defined group-specific genes may produce misleading artifacts. Owing to the presence of false negatives, genes significant in one group but not the other are enriched in gene sets that correspond to the overall effect of the treatment. Thus, the results appear related to the problem at hand but do not reflect the group-specific effect of the treatment. A literature search revealed that more than a quarter of papers that used a Venn diagram to illustrate the results of separate differential analyses also applied this incorrect reasoning.
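A minimal sketch of the distinction, for a single simulated gene with the same true treatment effect in both groups: the "significant in group A but not in group B" shortcut is contrasted with a direct test of the treatment-by-group interaction. Sample sizes and effect sizes are arbitrary, and an ordinary linear model stands in for a dedicated differential expression pipeline.

```python
# Sketch: for one simulated gene with the same true treatment effect in both groups,
# contrast the "significant in group A but not in group B" shortcut with a direct
# test of the treatment-by-group interaction. All numbers are illustration values.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(2)
n = 20  # samples per treatment arm within each group

frames = []
for group in ("A", "B"):
    for treated in (0, 1):
        expr = rng.normal(loc=10 + 0.6 * treated, scale=1.0, size=n)
        frames.append(pd.DataFrame({"expr": expr, "group": group, "treated": treated}))
df = pd.concat(frames, ignore_index=True)

# Shortcut: separate per-group tests, then ask "significant in A only?"
for g in ("A", "B"):
    sub = df[df.group == g]
    p = stats.ttest_ind(sub[sub.treated == 1].expr, sub[sub.treated == 0].expr).pvalue
    print(f"group {g}: treatment p = {p:.3f}")

# Correct question: does the treatment effect differ between the groups?
fit = smf.ols("expr ~ treated * group", data=df).fit()
print("interaction p =", round(fit.pvalues["treated:group[T.B]"], 3))
```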
Project description: Various exact tests for statistical inference provide powerful and accurate decision rules, provided that the corresponding critical values are tabulated or evaluated via Monte Carlo methods. This article introduces a novel hybrid method for computing p-values of exact tests by combining Monte Carlo simulations with statistical tables generated a priori. To use the data from Monte Carlo generations and tabulated critical values jointly, we employ kernel density estimation within Bayesian-type procedures. The p-values are linked to the posterior means of quantiles. In this framework, we represent the relevant information from the Monte Carlo experiments via likelihood-type functions, whereas tabulated critical values are used to reflect prior distributions. The local maximum likelihood technique is employed to compute functional forms of prior distributions from statistical tables. Empirical likelihood functions are proposed to replace parametric likelihood functions within the structure of the posterior mean calculations, yielding a Bayesian-type procedure with a distribution-free set of assumptions. We derive the asymptotic properties of the proposed nonparametric posterior means of quantiles process. Using these theoretical propositions, we calculate the minimum number of Monte Carlo resamples needed for a desired level of accuracy, on the basis of distances between actual data characteristics (e.g. sample sizes) and the characteristics of the data used to present the corresponding critical values in a table. The proposed approach makes practical applications of exact tests simple and rapid. Implementations of the proposed technique are easily carried out via the recently developed STATA and R statistical packages.
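For orientation, here is a sketch of the plain Monte Carlo ingredient only: estimating an exact-test p value by resampling the null distribution of a test statistic. The hybrid procedure described above additionally combines such simulations with tabulated critical values through kernel density estimation and posterior means of quantiles; that part is not reproduced in this toy example.

```python
# Sketch of the plain Monte Carlo ingredient only: estimating an exact-test p value
# by resampling the null distribution of a test statistic (here, a permutation test
# for a difference in means). The Bayesian combination with tabulated critical
# values described in the article is not reproduced here.
import numpy as np

rng = np.random.default_rng(3)

def mc_pvalue(x, y, n_resamples=10_000):
    """Monte Carlo permutation p value for a difference in means."""
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(pooled[: len(x)].mean() - pooled[len(x):].mean())
        count += diff >= observed
    return (count + 1) / (n_resamples + 1)

x = rng.normal(0.0, 1.0, 15)
y = rng.normal(0.8, 1.0, 15)
print("Monte Carlo p value:", mc_pvalue(x, y))
```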
Project description: Background: The quality of reporting for randomized clinical trials (RCTs) in oncology has been analyzed in several systematic reviews, but in this setting there is a paucity of data on outcome definitions and on the consistency of reporting of statistical tests in RCTs and observational studies (OBS). The objective of this review was to describe these two reporting aspects for OBS and RCTs in oncology. Methods: From a list of 19 medical journals, three were retained for analysis after random selection: British Medical Journal (BMJ), Annals of Oncology (AoO) and British Journal of Cancer (BJC). All original articles published between March 2009 and March 2014 were screened. Only studies whose main outcome was accompanied by a corresponding statistical test were included in the analysis; studies based on censored data were excluded. The primary outcome was the quality of reporting of the description of the primary outcome measure in RCTs and of the variables of interest in OBS. A logistic regression was performed to identify study covariates potentially associated with concordance of tests between the Methods and Results sections. Results: 826 studies were included in the review, of which 698 were OBS. Variables were described in the Methods section for all OBS, and the primary endpoint was clearly detailed in the Methods section for 109 RCTs (85.2%). 295 OBS (42.2%) and 43 RCTs (33.6%) had perfect agreement between the statistical tests reported in the Methods and Results sections. In multivariable analysis, the variable "number of included patients in study" was associated with test consistency: the adjusted odds ratio (aOR) for the third group compared with the first group was aOR Grp3 = 0.52 [0.31-0.89] (P = 0.009). Conclusion: Variables in OBS and the primary endpoint in RCTs are reported and described with high frequency. However, consistency of statistical tests between the Methods and Results sections of OBS is not always observed. We therefore encourage authors and peer reviewers to verify the consistency of statistical tests in oncology studies.
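As a minimal sketch of how an adjusted odds ratio of the kind reported above can be obtained, the following logistic regression uses an invented data frame and invented variable names; it is not the review's actual model or data.

```python
# Sketch: adjusted odds ratios with 95% CIs from a logistic regression, as in the
# consistency analysis described above. Data and variable names are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 600
df = pd.DataFrame({
    "consistent": rng.integers(0, 2, n),                  # tests agree between sections
    "size_group": rng.choice(["small", "medium", "large"], n),
    "design": rng.choice(["OBS", "RCT"], n),
})

fit = smf.logit("consistent ~ C(size_group, Treatment('small')) + design",
                data=df).fit(disp=0)
odds_ratios = np.exp(fit.params)
ci = np.exp(fit.conf_int())
print(pd.concat([odds_ratios.rename("aOR"),
                 ci.rename(columns={0: "2.5%", 1: "97.5%"})], axis=1))
```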
Project description: We calculate the statistics of the diffusion-limited arrival-time distribution by a Monte Carlo method to suggest a simple statistical resolution of an enduring puzzle of nanobiosensors: a persistent gap between reports of analyte detection at approximately femtomolar concentrations and theory suggesting the impossibility of detection at approximately subpicomolar concentrations at the corresponding incubation time. The incubation time used in the theory is actually the mean incubation time, while experimental conditions suggest that device stability limited the minimum incubation time. The difference between these incubation times, both described by characteristic power laws, provides an intuitive explanation of the different detection limits anticipated by theory and by experiments.
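The sketch below is a deliberately simplified toy Monte Carlo, not the model from this work: a few analyte molecules per simulated device perform a 1D random walk toward an absorbing "sensor" site, and the first-arrival time is recorded per device. It only illustrates how the mean of a broad arrival-time distribution can sit well above its earliest percentiles; lattice size, molecule count, and device count are arbitrary.

```python
# Toy Monte Carlo, not the model used in this work: a few analyte molecules per
# simulated device random-walk on a 1D lattice toward an absorbing "sensor" at
# site 0, and the arrival time of the first molecule is recorded per device.
# Lattice size, molecule count, and device count are arbitrary illustration values.
import numpy as np

rng = np.random.default_rng(5)
n_devices, n_molecules, lattice, max_steps = 500, 5, 40, 200_000

first_arrivals = []
for _ in range(n_devices):
    pos = rng.integers(1, lattice + 1, size=n_molecules)  # random starting positions
    for t in range(1, max_steps + 1):
        pos = np.clip(pos + rng.choice([-1, 1], size=pos.size), 0, lattice)
        if (pos == 0).any():           # first molecule reaches the sensor
            first_arrivals.append(t)
            break

first_arrivals = np.array(first_arrivals)
print("mean first-arrival time:   ", first_arrivals.mean())
print("5th percentile of arrivals:", np.percentile(first_arrivals, 5))
# The gap between the mean and the early percentiles illustrates how a "typical"
# (mean) incubation time and the earliest detections can differ substantially.
```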
Project description: Previous studies have suggested that patient care may be compromised by poor communication between clinicians and laboratory professionals when molecular genetic test results are reported. To better understand the factors contributing to such compromised care, we investigated both pre- and postanalytical processes, using cystic fibrosis mutation analysis as our model. We found that although the majority of test requisition forms requested patient/family information necessary for the proper interpretation of test results, in many cases these data were not provided by the individuals filling out the forms. We found instances in which result reports for simulated diagnostic testing described individuals as carriers when only a single mutation was found, with no comment pertaining to a diagnosis of cystic fibrosis. Similarly, reports based on simulated scenarios for carrier testing were problematic when no mutations were identified and the patient's race/ethnicity and family history were not discussed in reference to the residual risk of disease. Remarkably, a pilot survey of obstetrician-gynecologists revealed that office staff, including secretaries, often helped order genetic tests and reported test results to patients, raising questions about what efforts are undertaken to ensure personnel competency. These findings are reviewed in light of the efforts that should be taken to improve the quality of test-ordering and result-reporting practices.
Project description: Background: Despite much discussion in the epidemiologic literature surrounding the use of null hypothesis significance testing (NHST) for inference, the reporting practices of veterinary researchers have not been examined. We conducted a survey of articles published in Preventive Veterinary Medicine, a leading veterinary epidemiology journal, aimed at (a) estimating the frequency of reporting of p values, confidence intervals and statistical significance between 1997 and 2017, (b) determining whether this varies by article section and (c) determining whether this varies over time. Methods: We used systematic cluster sampling to select 985 original research articles from issues published in March, June, September and December of each year of the study period. Using the survey data analysis menu in Stata, we estimated overall and yearly proportions of article sections (abstracts, results-texts, results-tables and discussions) reporting p values, confidence intervals and statistical significance. Additionally, we estimated the proportion of p values less than 0.05 reported in each section, the proportion of article sections in which p values were reported as inequalities, and the proportion of article sections in which confidence intervals were interpreted as if they were significance tests. Finally, we used Generalised Estimating Equations to estimate prevalence odds ratios and 95% confidence intervals comparing the occurrence of each of the above-mentioned reporting elements in one article section relative to another. Results: Over the 20-year period, for every 100 published manuscripts, 31 abstracts (95% CI [28-35]), 65 results-texts (95% CI [61-68]), 23 sets of results-tables (95% CI [20-27]) and 59 discussion sections (95% CI [56-63]) reported statistical significance at least once. Only in the case of results-tables were the numbers reporting p values (48; 95% CI [44-51]) and confidence intervals (44; 95% CI [41-48]) higher than those reporting statistical significance. We also found that a substantial proportion of p values were reported as inequalities and that most were less than 0.05. The odds of a p value being less than 0.05 (OR = 4.5; 95% CI [2.3-9.0]) or being reported as an inequality (OR = 3.2; 95% CI [1.3-7.6]) were higher in abstracts than in results-texts. Additionally, when confidence intervals were interpreted, on most occasions they were used as surrogates for significance tests. Overall, no time trends in reporting were observed for any of the three reporting elements over the study period. Conclusions: Despite the availability of superior approaches to statistical inference and abundant criticism of its use in the epidemiologic literature, NHST remains by far the most common means of inference in articles published in Preventive Veterinary Medicine, and this pattern did not change substantially between 1997 and 2017.
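A minimal sketch of the kind of clustered analysis described above: prevalence odds ratios for a reporting element across article sections, with articles as clusters in a GEE. The data and variable names are invented for illustration, and the Stata survey-estimation step is not reproduced.

```python
# Sketch: prevalence odds ratios for reporting a p value across article sections,
# with articles as clusters in a GEE (binomial family, exchangeable correlation).
# All data and variable names here are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
section_effect = {"abstract": -0.5, "results_text": 0.8,
                  "results_table": 0.2, "discussion": 0.5}

rows = []
for article in range(300):
    propensity = rng.normal()              # article-level tendency to report p values
    for section, effect in section_effect.items():
        prob = 1 / (1 + np.exp(-(propensity + effect)))
        rows.append({"article": article, "section": section,
                     "reports_p": int(rng.random() < prob)})
df = pd.DataFrame(rows)

fit = smf.gee("reports_p ~ C(section, Treatment('abstract'))", groups="article",
              data=df, family=sm.families.Binomial(),
              cov_struct=sm.cov_struct.Exchangeable()).fit()
print(np.exp(fit.params))       # prevalence odds ratios relative to abstracts
print(np.exp(fit.conf_int()))   # corresponding 95% confidence intervals
```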