Project description:MIXED MODELS: Models allowing for continuous heterogeneity by assuming that the values of one or more parameters follow a specified distribution have become increasingly popular. This is known as 'mixing' parameters, and it is standard practice by researchers--and the default option in many statistical programs--to base test statistics for mixed models on simulations using asymmetric draws (e.g. Halton draws). PROBLEM 1: INCONSISTENT LR TESTS DUE TO ASYMMETRIC DRAWS: This paper shows that when the estimated likelihood functions depend on the standard deviations of mixed parameters, this practice is very likely to cause misleading test results for the numbers of draws usually used today. The paper illustrates that increasing the number of draws is a very inefficient solution strategy, requiring very large numbers of draws to guard against misleading test statistics. The main conclusion of this paper is that the problem can be solved completely by using fully antithetic draws, and that one-dimensionally antithetic draws are not enough to solve the problem. PROBLEM 2: MAINTAINING THE CORRECT DIMENSIONS WHEN REDUCING THE MIXING DISTRIBUTION: A second point of the paper is that, even when fully antithetic draws are used, models reducing the dimension of the mixing distribution must replicate the relevant dimensions of the quasi-random draws in the simulation of the restricted likelihood. Again, this is not standard in research or in statistical programs. The paper therefore recommends using fully antithetic draws that replicate the relevant dimensions of the quasi-random draws in the simulation of the restricted likelihood, and that this become the default option in statistical programs. JEL classification: C15; C25.
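The mechanics at issue are easy to illustrate. The sketch below is our own minimal construction (using SciPy's qmc.Halton as the quasi-random generator; it is not the paper's code): fully antithetic draws reflect each d-dimensional point across every subset of coordinates, giving 2^d symmetric points per draw, whereas the common one-dimensional scheme adds only the single componentwise reflection 1-u.

```python
# Minimal sketch, not the paper's code: fully antithetic vs. one-dimensionally
# antithetic vs. plain Halton draws. Assumes SciPy's qmc.Halton generator.
import itertools
import numpy as np
from scipy.stats import qmc

def fully_antithetic(u):
    """Expand each point u in (0,1)^d to its 2^d reflections: every
    combination of keeping u_i or replacing it with 1 - u_i."""
    out = []
    for flips in itertools.product([False, True], repeat=u.shape[1]):
        v = u.copy()
        for i, flip in enumerate(flips):
            if flip:
                v[:, i] = 1.0 - v[:, i]
        out.append(v)
    return np.vstack(out)

u = qmc.Halton(d=2, scramble=True, seed=0).random(64)  # plain Halton draws
u_1dim = np.vstack([u, 1.0 - u])                       # single reflection only
u_full = fully_antithetic(u)                           # all 2**2 reflections

# The one-dimensional reflection leaves cross-moments untouched (the product
# (u1-.5)(u2-.5) is invariant under u -> 1-u), while full antithetics cancel
# them exactly -- one reason the single reflection is not enough.
cross = lambda a: ((a[:, 0] - 0.5) * (a[:, 1] - 0.5)).mean()
print(cross(u), cross(u_1dim), cross(u_full))          # third value is ~0
```

The trade-off is explicit in the construction: fully antithetic draws multiply the number of likelihood evaluations per base draw by 2^d, which is weighed against the far larger numbers of plain draws otherwise needed.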
Project description:In the 1970s, Professor Robbins and his coauthors extended the Ville and Wald inequality in order to derive fundamental theoretical results regarding likelihood ratio based sequential tests with power one. The law of the iterated logarithm confirms an optimality property of the power one tests. In parallel with Robbins's decision-making procedures, we propose and examine sequential empirical likelihood ratio (ELR) tests with power one. In this setting, we develop nonparametric one- and two-sided ELR tests. It turns out that the proposed sequential ELR tests significantly outperform the classical nonparametric t-statistic-based counterparts in many scenarios based on different underlying data distributions.
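The power-one idea can be conveyed by a toy far simpler than the ELR construction. The sketch below is our own illustration of the Ville-Wald mechanism for N(theta, 1) data with H0: theta = 0, not the paper's test: the likelihood ratio process against a fixed alternative is a nonnegative martingale under H0, so stopping when it exceeds 1/alpha controls the type I error by Ville's inequality, while for theta > lambda/2 the log process has positive drift and rejects with probability one.

```python
# Toy power-one sequential test (our illustration, not the paper's ELR test):
# M_n = exp(lam*S_n - n*lam^2/2) is the LR of N(lam,1) vs N(0,1), a
# nonnegative martingale under H0: theta = 0, so by Ville's inequality
# P(sup_n M_n >= 1/alpha) <= alpha. For theta > lam/2, log M_n has positive
# drift and the boundary is crossed with probability one.
import numpy as np

def sequential_test(xs, alpha=0.05, lam=0.5):
    log_m = 0.0
    for n, x in enumerate(xs, start=1):
        log_m += lam * x - 0.5 * lam ** 2
        if log_m >= np.log(1.0 / alpha):
            return n                # reject H0 at observation n
    return None                     # not yet rejected; sampling would continue

rng = np.random.default_rng(0)
print(sequential_test(rng.normal(0.4, 1.0, 10_000)))   # rejects early
print(sequential_test(rng.normal(0.0, 1.0, 10_000)))   # usually None
```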
Project description:Regression analyses are commonly performed with doubly limited continuous dependent variables; for instance, when modeling the behavior of rates, proportions and income concentration indices. Several models are available in the literature for use with such variables, one of them being the unit gamma regression model. In all such models, parameter estimation is typically performed using the maximum likelihood method, and testing inferences on the model's parameters are usually based on the likelihood ratio test. Such a test can, however, deliver quite imprecise inferences when the sample size is small. In this paper, we propose two modified likelihood ratio test statistics for use with unit gamma regressions that deliver much more accurate inferences when the number of data points is small. Numerical (i.e. simulation) evidence is presented for both fixed dispersion and varying dispersion models, and also for tests that involve nonnested models. We also present and discuss two empirical applications.
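The small-sample repair such modifications aim at can be sketched generically. The toy below is our own construction and deliberately avoids the unit gamma machinery (it tests a gamma shape of 1, i.e. exponentiality, and uses a bootstrap Bartlett-type rescaling rather than the paper's analytical adjustments): the LR statistic is divided by a null-bootstrap estimate of its mean so that its expectation matches the chi-square reference.

```python
# Our own toy, not the paper's modified statistics: bootstrap Bartlett-type
# rescaling of an LR statistic in a small sample. H0: gamma shape = 1
# (exponential) vs. an unrestricted gamma with location fixed at 0.
import numpy as np
from scipy import stats

def lr_stat(x):
    a, _, scale = stats.gamma.fit(x, floc=0)             # unrestricted MLE
    ll1 = stats.gamma.logpdf(x, a, scale=scale).sum()
    ll0 = stats.expon.logpdf(x, scale=x.mean()).sum()    # null MLE
    return 2.0 * (ll1 - ll0)

rng = np.random.default_rng(1)
x = rng.exponential(2.0, size=20)                        # small sample, H0 true
w = lr_stat(x)

# Parametric bootstrap under the fitted null to estimate E[w], then rescale
# so the corrected statistic has mean q = 1, matching its chi2(1) reference.
boot = [lr_stat(rng.exponential(x.mean(), size=x.size)) for _ in range(500)]
w_star = w / np.mean(boot)                               # q = 1 here
print(stats.chi2.sf(w, df=1), stats.chi2.sf(w_star, df=1))
```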
Project description:One of the most important issues in the critical assessment of spatio-temporal stochastic models for epidemics is the selection of the transmission kernel used to represent the relationship between infectious challenge and the spatial separation of infected and susceptible hosts. As the design of control strategies is often based on an assessment of the distance over which transmission can realistically occur, and estimation of this distance is very sensitive to the choice of kernel function, it is important that models used to inform control strategies can be scrutinised in the light of observation in order to elicit possible evidence against the selected kernel function. While a range of approaches to model criticism is in existence, the field remains one in which the need for further research is recognised. In this paper, building on earlier contributions by the authors, we introduce a new approach to assessing the validity of spatial kernels, the latent likelihood ratio tests, which use likelihood-based discrepancy variables to compare the fit of competing models, and we compare the capacity of this approach to detect model mis-specification with that of tests based on the use of infection-link residuals. We demonstrate that the new approach can be used to formulate tests with greater power than infection-link residuals to detect kernel mis-specification, particularly when the degree of mis-specification is modest. These new tests avoid the use of a fully Bayesian approach, which may introduce undesirable complications related to computational complexity and prior sensitivity.
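To make the kernel-comparison setting concrete, here is a toy under our own assumptions (it is not the authors' spatio-temporal model or the latent likelihood ratio test itself): a discrete-time spatial SI epidemic in which a susceptible host j escapes infection at each step with probability exp(-beta * sum_i K(d_ij)) over currently infected hosts i, with a light-tailed and a heavy-tailed candidate kernel compared through their maximised log-likelihoods.

```python
# Toy spatial SI epidemic (our assumptions, not the authors' model): compare
# an exponential and a Cauchy-type transmission kernel by maximised
# log-likelihood, the raw material of a likelihood ratio comparison.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
pos = rng.uniform(0.0, 10.0, size=(60, 2))        # host locations
dists = cdist(pos, pos)
exp_kernel = lambda d: np.exp(-d)                 # light-tailed candidate
cauchy_kernel = lambda d: 1.0 / (1.0 + d ** 2)    # heavy-tailed candidate

def simulate(kernel, beta, steps=15):
    """Run the epidemic; record (infected, newly infected, at risk) per step."""
    infected = np.zeros(len(pos), bool)
    infected[0] = True
    events = []
    for _ in range(steps):
        risk = ~infected
        press = beta * kernel(dists[np.ix_(risk, np.flatnonzero(infected))]).sum(axis=1)
        hit = rng.random(risk.sum()) < 1.0 - np.exp(-press)
        new = np.zeros(len(pos), bool)
        new[np.flatnonzero(risk)[hit]] = True
        events.append((np.flatnonzero(infected), new, risk))
        infected |= new
    return events

def loglik(events, kernel, beta):
    ll = 0.0
    for inf, new, risk in events:
        press = beta * kernel(dists[np.ix_(risk, inf)]).sum(axis=1)
        p = 1.0 - np.exp(-press)
        ll += np.log(np.where(new[risk], p, 1.0 - p) + 1e-300).sum()
    return ll

def max_loglik(events, kernel):
    res = minimize_scalar(lambda b: -loglik(events, kernel, b),
                          bounds=(1e-6, 10.0), method="bounded")
    return -res.fun

events = simulate(exp_kernel, beta=0.5)           # truth: exponential kernel
lr = 2.0 * (max_loglik(events, exp_kernel) - max_loglik(events, cauchy_kernel))
print(lr)   # positive values favour the (true) exponential kernel
```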
Project description:The likelihood ratio test (LRT) is widely used for comparing the relative fit of nested latent variable models. Following Wilks' theorem, the LRT is conducted by comparing the LRT statistic with its asymptotic distribution under the restricted model, a chi-square distribution with degrees of freedom equal to the difference in the number of free parameters between the two nested models under comparison. For models with latent variables such as factor analysis, structural equation models and random effects models, however, it is often found that the chi-square approximation does not hold. In this note, we show how the regularity conditions of Wilks' theorem may be violated, using three examples of models with latent variables. In addition, a more general theory for the LRT is given that provides the correct asymptotic theory for these LRTs. This general theory was first established in Chernoff (Ann Math Stat 25:573-578, 1954) and discussed in both van der Vaart (Asymptotic statistics, Cambridge University Press, Cambridge, 2000) and Drton (Ann Stat 37:979-1012, 2009), but it does not seem to have received enough attention. We illustrate this general theory with the three examples.
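A one-parameter toy (our own, not one of the note's three latent variable examples) shows the most familiar violation, a parameter on the boundary: testing H0: theta = 0 against theta >= 0 for N(theta, 1) data yields the LRT statistic max(sqrt(n)*Xbar, 0)^2, whose null law is the Chernoff-type mixture 0.5*chi2_0 + 0.5*chi2_1 rather than chi2_1.

```python
# Boundary toy (ours, not from the note): under H0 the LRT statistic is
# max(sqrt(n)*Xbar, 0)^2 ~ 0.5*chi2_0 + 0.5*chi2_1, so the naive chi2(1)
# critical value is conservative; the mixture's 95% point is chi2(1)'s
# 90% point.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 50, 20_000
xbar = rng.normal(0.0, 1.0, (reps, n)).mean(axis=1)     # data under H0
lrt = np.maximum(np.sqrt(n) * xbar, 0.0) ** 2           # closed-form LRT

print((lrt > stats.chi2.ppf(0.95, 1)).mean())           # ~0.025, not 0.05
print((lrt > stats.chi2.ppf(0.90, 1)).mean())           # ~0.05 under the mixture
```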
Project description:Genotype imputation has become standard practice in modern genetic studies. As sequencing-based reference panels continue to grow, more and more markers are being imputed well, but at the same time even more markers with relatively low minor allele frequency (MAF) are being imputed with low imputation quality. Here, we propose new methods that incorporate imputation uncertainty into downstream association analysis, with improved power and/or computational efficiency. We consider two scenarios: (I) when posterior probabilities of all potential genotypes are estimated; and (II) when only a one-dimensional summary statistic, the imputed dosage, is available. For scenario I, we have developed an expectation-maximization likelihood-ratio test (EM-LRT) for association based on the posterior probabilities. When only imputed dosages are available (scenario II), we first sample the genotype probabilities from their posterior distribution given the dosages and then apply the EM-LRT to the sampled probabilities. Our simulations show that the type I error of the proposed EM-LRT methods is protected under both scenarios. Compared with existing methods, EM-LRT-Prob (for scenario I) offers optimal statistical power across a wide spectrum of MAF and imputation quality. EM-LRT-Dose (for scenario II) achieves a similar level of statistical power as EM-LRT-Prob and outperforms the standard Dosage method, especially for markers with relatively low MAF or imputation quality. Applications to two real data sets, the Cebu Longitudinal Health and Nutrition Survey study and the Women's Health Initiative Study, provide further support for the validity and efficiency of our proposed methods.
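The scenario-I construction can be sketched in a simplified form. The code below is our own Gaussian-trait toy, not the authors' implementation (the posterior-probability noise model and effect size are invented): the genotype is treated as missing data, the observed-data likelihood sum_g P(g) * N(y; mu + beta*g, sigma^2) is maximised by EM, and the LRT compares it with the beta = 0 null.

```python
# Gaussian-trait toy of an EM-LRT with uncertain genotypes (our sketch, not
# the authors' code): maximise the observed-data likelihood by EM, then form
# an LRT against the no-association null.
import numpy as np
from scipy import stats

def em_loglik(y, probs, iters=200):
    """probs: (n, 3) posterior genotype probabilities; returns max loglik."""
    g = np.array([0.0, 1.0, 2.0])
    mu, beta, sig2 = y.mean(), 0.0, y.var()
    for _ in range(iters):
        # E-step: responsibilities over genotypes given current parameters.
        dens = stats.norm.pdf(y[:, None], mu + beta * g, np.sqrt(sig2))
        w = probs * dens
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted least squares on the genotype-expanded data.
        X = np.column_stack([np.ones(y.size * 3), np.tile(g, y.size)])
        yy, ww = np.repeat(y, 3), w.ravel()
        coef, *_ = np.linalg.lstsq(X * np.sqrt(ww)[:, None],
                                   yy * np.sqrt(ww), rcond=None)
        mu, beta = coef
        sig2 = (ww * (yy - X @ coef) ** 2).sum() / y.size
    dens = stats.norm.pdf(y[:, None], mu + beta * g, np.sqrt(sig2))
    return np.log((probs * dens).sum(axis=1)).sum()

rng = np.random.default_rng(4)
n, maf = 400, 0.1
g_true = rng.binomial(2, maf, n)
probs = 0.8 * np.eye(3)[g_true] + 0.2 / 3            # noisy posteriors (toy)
y = 0.3 * g_true + rng.normal(0.0, 1.0, n)

ll1 = em_loglik(y, probs)
ll0 = stats.norm.logpdf(y, y.mean(), y.std()).sum()  # beta = 0 null MLE
print(stats.chi2.sf(2.0 * (ll1 - ll0), df=1))        # EM-LRT p-value
```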
Project description:Laboratory tests are performed to make effective clinical decisions. However, inappropriate laboratory test ordering hampers patient care and increases the financial burden on healthcare. An automated laboratory test recommendation system can provide rapid and appropriate test selection, potentially improving the workflow and helping physicians spend more time treating patients. The main objective of this study was to develop a deep learning-based automated system to recommend appropriate laboratory tests. A retrospective data collection was performed using the National Health Insurance database between 1 January 2013 and 31 December 2013. We included all prescriptions that had at least one laboratory test. A total of 1,463,837 prescriptions from 530,050 unique patients were included in our study. Of these patients, 296,541 (55.95%) were women, and ages ranged from 1 to 107 years. The deep learning (DL) model achieved a high area under the receiver operating characteristic curve (AUROC micro = 0.98, AUROC macro = 0.94). The findings of this study show that the DL model can accurately and efficiently identify laboratory tests. This model can be integrated into existing workflows to reduce under- and over-utilization problems.
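The two AUROC figures summarise multi-label performance differently: micro-averaging pools every patient-test pair into a single ROC curve, while macro-averaging computes one AUROC per laboratory test and then averages, so rarely ordered tests carry more weight under macro. A small synthetic sketch (our own toy setup, using scikit-learn's roc_auc_score):

```python
# Synthetic toy (not the study's data or model): micro- vs macro-averaged
# AUROC for a multi-label test-recommendation output.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
n, n_tests = 1000, 20
y_true = rng.binomial(1, 0.2, (n, n_tests))              # tests actually ordered
scores = np.clip(0.6 * y_true + 0.6 * rng.random((n, n_tests)), 0.0, 1.0)

print(roc_auc_score(y_true, scores, average="micro"))
print(roc_auc_score(y_true, scores, average="macro"))
```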
Project description:Human Phenotype Ontology (HPO)-based analysis has become standard for genomic diagnostics of rare diseases. Current algorithms use a variety of semantic and statistical approaches to prioritize the typically long lists of genes with candidate pathogenic variants. These algorithms do not provide robust estimates of the strength of the predictions beyond the placement in a ranked list, nor do they provide measures of how much any individual phenotypic observation has contributed to the prioritization result. However, given that the overall success rate of genomic diagnostics is only around 25%-50%, or less, in many cohorts, a good ranking cannot be taken to imply that the gene or disease at rank one is necessarily a good candidate. Here, we present an approach to genomic diagnostics that exploits the likelihood ratio (LR) framework to provide estimates of (1) the posttest probability of candidate diagnoses, (2) the LR for each observed HPO phenotype, and (3) the predicted pathogenicity of observed genotypes. LIkelihood Ratio Interpretation of Clinical AbnormaLities (LIRICAL) placed the correct diagnosis within the first three ranks in 92.9% of 384 case reports comprising 262 Mendelian diseases, and the correct diagnosis had a mean posttest probability of 67.3%. Simulations show that LIRICAL is robust to many typically encountered forms of genomic and phenomic noise. In summary, LIRICAL provides accurate, clinically interpretable results for phenotype-driven genomic diagnostics.
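The LR arithmetic underlying this framework is compact enough to sketch; the numbers below are invented for illustration and are not LIRICAL's computed values. Pretest odds for a candidate diagnosis are multiplied by one LR per observed HPO phenotype (and a genotype LR), and the posttest odds are converted back to a probability.

```python
# Sketch of the LR framework's bookkeeping (illustrative numbers, not
# LIRICAL's outputs): posttest odds = pretest odds * product of LRs.
def posttest_probability(pretest_prob, likelihood_ratios):
    odds = pretest_prob / (1.0 - pretest_prob)
    for lr in likelihood_ratios:        # one LR per observed feature
        odds *= lr
    return odds / (1.0 + odds)

# Hypothetical rare-disease prior, three phenotype LRs and one genotype LR.
print(posttest_probability(1e-4, [12.0, 8.0, 3.5, 40.0]))   # ~0.57
```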
Project description:Heterozygous ARID1B variants result in Coffin-Siris syndrome. Features may include hypoplastic nails, slow growth, characteristic facial features, hypotonia, hypertrichosis, and sparse scalp hair. Most reported cases are due to ARID1B loss-of-function variants. We report a boy with developmental delay, feeding difficulties, aspiration, recurrent respiratory infections, slow growth, and hypotonia without a clinical diagnosis, in whom a previously unreported ARID1B missense variant was classified as a variant of uncertain significance. The classification of this variant was refined through combined methodologies, including genome-wide methylation signature analysis (EpiSign), machine learning (ML) facial phenotyping, and LIRICAL. Trio exome sequencing and EpiSign were performed. ML facial phenotyping compared facial images using FaceMatch and GestaltMatcher against syndrome-specific libraries to prioritize the trio exome bioinformatic pipeline gene list output. Phenotype-driven variant prioritization was performed with LIRICAL. A de novo heterozygous missense variant, ARID1B p.(Tyr1268His), was reported as a variant of uncertain significance. The ACMG classification was refined to likely pathogenic on the basis of a supportive methylation signature, ML facial phenotyping, and prioritization through LIRICAL. The ARID1B genotype-phenotype spectrum has thus been expanded through extended analysis of missense variation using genome-wide methylation signatures, ML facial phenotyping, and likelihood-ratio gene prioritization.
Project description:BACKGROUND: An outbreak of severe acute respiratory syndrome (SARS) began in Canada in February 2003. The initial diagnosis of SARS was based on clinical and epidemiological criteria. During the outbreak, molecular and serologic tests for the SARS-associated coronavirus (SARS-CoV) became available. However, without a "gold standard," it was impossible to determine the usefulness of these tests. We describe how these tests were used during the first phase of the SARS outbreak in Toronto and offer some recommendations that may be useful if SARS returns. METHODS: We examined the results of all diagnostic laboratory tests used in 117 patients admitted to hospitals in Toronto who met the Health Canada criteria for suspect or probable SARS. Focusing on tests for SARS-CoV, we attempted to determine the optimal specimen types and timing of specimen collection. RESULTS: Diagnostic test results for SARS-CoV were available for 110 of the 117 patients. SARS-CoV was detected by means of reverse-transcriptase polymerase chain reaction (RT-PCR) in at least one specimen in 59 (54.1%) of 109 patients. Serologic test results of convalescent samples were positive in 50 (96.2%) of 52 patients for whom paired serum samples were collected during the acute and convalescent phases of the illness. Of the 110 patients, 78 (70.9%) had specimens that tested positive by means of RT-PCR, serologic testing or both methods. The proportion of RT-PCR test results that were positive was similar between patients who met the criteria for suspect SARS (50.8%, 95% confidence interval [CI] 38.4%-63.2%) and those who met the criteria for probable SARS (58.0%, 95% CI 44.2%-70.7%). SARS-CoV was detected in nasopharyngeal swabs in 33 (32.4%) of 102 patients, in stool specimens in 19 (63.3%) of 30 patients, and in specimens from the lower respiratory tract in 10 (58.8%) of 17 patients. INTERPRETATION: These findings suggest that the rapid diagnostic tests in use at the time of the initial outbreak lack sufficient sensitivity to be used clinically to rule out SARS. As tests for SARS-CoV continue to be optimized, evaluation of the clinical presentation and elucidation of a contact history must remain the cornerstone of SARS diagnosis. In patients with SARS, specimens taken from the lower respiratory tract and stool samples test positive by means of RT-PCR more often than do samples taken from other areas.
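For readers wanting to reproduce the interval estimates quoted above from the counts: the abstract does not state which binomial CI method was used, so the Wilson score interval is assumed in this sketch.

```python
# Wilson score interval for a binomial proportion (the report's exact CI
# method is not stated; this is an assumed reconstruction).
import math

def wilson_ci(k, n, z=1.96):
    p = k / n
    centre = (p + z * z / (2 * n)) / (1 + z * z / n)
    half = (z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
            / (1 + z * z / n))
    return p, (centre - half, centre + half)

print(wilson_ci(59, 109))   # overall RT-PCR detection: 54.1% and its 95% CI
```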