Project description:Machine learning promises to revolutionize clinical decision making and diagnosis. In medical diagnosis, a doctor aims to explain a patient's symptoms by determining the diseases causing them. However, existing machine learning approaches to diagnosis are purely associative, identifying diseases that are strongly correlated with a patient's symptoms. We show that this inability to disentangle correlation from causation can result in sub-optimal or dangerous diagnoses. To overcome this, we reformulate diagnosis as a counterfactual inference task and derive counterfactual diagnostic algorithms. We compare our counterfactual algorithms to the standard associative algorithm and 44 doctors using a test set of clinical vignettes. While the associative algorithm achieves an accuracy placing in the top 48% of doctors in our cohort, our counterfactual algorithm places in the top 25% of doctors, achieving expert clinical accuracy. Our results show that causal reasoning is a vital missing ingredient for applying machine learning to medical diagnosis.
Project description:Advances in high-throughput sequencing technologies have reduced the cost of genotyping dramatically and led to genomic prediction being widely used in animal and plant breeding, and increasingly in human genetics. Inspired by the computational efficiency of linear mixed models and the prediction accuracy of Bayesian methods, we propose KAML, a machine learning-based method that incorporates cross-validation, multiple regression, grid search, and bisection algorithms and aims to combine high prediction accuracy with computing efficiency. KAML exhibits higher prediction accuracy than existing methods, and it is available at https://github.com/YinLiLin/KAML.
Project description:BACKGROUND: With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that, due to repeated sequences, many aligned pairs of reads nevertheless do not overlap. However, no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps. RESULTS: We present an approach that extends the Minimus assembler with a data-driven step that classifies overlaps as true or false prior to contig construction. We trained several different classification models within the Weka framework using various statistics derived from overlaps of reads available from prior sequencing projects. These statistics included percent mismatch and k-mer frequencies within the overlaps as well as a comparative genomics score derived from mapping reads to multiple reference genomes. We show that in real whole-genome sequencing data from the E. coli and S. aureus genomes, by providing a curated set of overlaps to the contigging phase of the assembler, we nearly doubled the median contig length (N50) without sacrificing coverage of the genome or increasing the number of mis-assemblies. CONCLUSIONS: Machine learning methods that use comparative and non-comparative features to classify overlaps as true or false can be used to improve the quality of a sequence assembly.
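As a point of reference for the N50 figure mentioned above, the following is a minimal sketch of how N50 is commonly computed: it is the contig length at which contigs of that length or longer account for at least half of the total assembled bases, so it behaves like a length-weighted median rather than a plain median. The function name `n50` and the example lengths are illustrative assumptions and are not taken from the paper or from Minimus.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (common definition).

    Hypothetical helper for illustration only; not part of any assembler.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig where the running sum reaches half the total.
        if 2 * running >= total:
            return length


# Made-up lengths: merging four 50-base contigs into two 100-base contigs
# keeps the total at 200 bases but doubles the N50.
before = n50([50, 50, 50, 50])
after = n50([100, 100])
```

Because N50 depends only on the length distribution, it says nothing about correctness on its own, which is presumably why the abstract reports coverage and mis-assemblies alongside it.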
Project description:Introduction: Machine learning (ML) is a set of models and methods that can detect patterns in vast amounts of data and use this information to perform various kinds of decision-making under uncertain conditions. This review explores the current role of this technology in plastic surgery by outlining the applications in clinical practice, diagnostic and prognostic accuracies, and proposed future direction for clinical applications and research. Methods: EMBASE, MEDLINE, CENTRAL and ClinicalTrials.gov were searched from 1990 to 2020. Any clinical studies (including case reports) which present the diagnostic and prognostic accuracies of machine learning models in the clinical setting of plastic surgery were included. Data collected were clinical indication, model utilised, reported accuracies, and comparison with clinical evaluation. Results: The database search identified 1181 articles, of which 51 articles were included in this review. The clinical utility of these algorithms was to assist clinicians in diagnosis prediction (n=22), outcome prediction (n=21) and pre-operative planning (n=8). The mean accuracies were 88.80%, 86.11% and 80.28%, respectively. The most commonly used models were neural networks (n=31), support vector machines (n=13), decision trees/random forests (n=10) and logistic regression (n=9). Conclusions: ML has demonstrated high accuracies in diagnosis and prognostication of burn patients, congenital or acquired facial deformities, and in cosmetic surgery. There are no studies comparing ML to clinicians' performance. Future research can be enhanced using larger datasets or utilising data augmentation, employing novel deep learning models, and applying these to other subspecialties of plastic surgery.
Project description:The accurate quantification of wall loss caused by corrosion is critical to the reliable life estimation of pipes and pressure vessels. Traditional thickness gauging by scanning a probe is slow and requires access to all points on the surface; this is impractical in many cases as corrosion often occurs where access is restricted, such as beneath supports where water collects. Guided wave tomography presents a solution to this; by transmitting guided waves through the region of interest and exploiting their dispersive nature, it is possible to build up a map of thickness. While the best results have been seen when using the fundamental modes A0 and S0 at low frequency, the complex scattering of the waves causes errors within the reconstruction. It is demonstrated that these lead to an underestimate in wall loss for A0 but an overestimate for S0. Further analysis showed that this error was related to density variation, which was proportional to thickness. It was demonstrated how this could be corrected for in the reconstructions, in many cases resulting in the near-elimination of the error across a range of defects, and greatly improving the accuracy of life estimates from guided wave tomography.
Project description:We apply causal machine learning algorithms to assess the causal effect of a marketing intervention, namely a coupon campaign, on the sales of a retailer. Besides assessing the average impacts of different types of coupons, we also investigate the heterogeneity of causal effects across different subgroups of customers, e.g., between clients with relatively high vs. low prior purchases. Finally, we use optimal policy learning to determine (in a data-driven way) which customer groups should be targeted by the coupon campaign in order to maximize the marketing intervention's effectiveness in terms of sales. We find that only two out of the five coupon categories examined, namely coupons applicable to the product categories of drugstore items and other food, have a statistically significant positive effect on retailer sales. The assessment of group average treatment effects reveals substantial differences in the impact of coupon provision across customer groups, particularly across customer groups as defined by prior purchases at the store, with drugstore coupons being particularly effective among customers with high prior purchases and other food coupons among customers with low prior purchases. Our study provides a use case for the application of causal machine learning in business analytics to evaluate the causal impact of specific firm policies (like marketing campaigns) for decision support.
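To make the notion of a group average treatment effect concrete, here is a minimal sketch that uses a plain difference in mean outcomes between treated and untreated customers within each subgroup. This is a deliberately simplified stand-in, valid only under the assumption that coupons were assigned at random within groups; it is not the causal machine learning estimator used in the study. The function name, group labels, and sales figures are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def group_average_effects(records):
    """Difference in mean outcome (treated minus control) per group.

    records: iterable of (group, treated, outcome) tuples.
    Hypothetical helper; assumes random treatment assignment within groups.
    """
    outcomes = defaultdict(lambda: {True: [], False: []})
    for group, treated, outcome in records:
        outcomes[group][bool(treated)].append(outcome)

    effects = {}
    for group, arms in outcomes.items():
        # Only report groups that contain both treated and control units.
        if arms[True] and arms[False]:
            effects[group] = mean(arms[True]) - mean(arms[False])
    return effects


# Toy data (invented): (prior-purchase group, received coupon, sales).
toy = [
    ("high_prior", True, 12.0), ("high_prior", True, 14.0),
    ("high_prior", False, 10.0), ("high_prior", False, 10.0),
    ("low_prior", True, 5.0), ("low_prior", True, 5.0),
    ("low_prior", False, 4.0), ("low_prior", False, 6.0),
]
effects = group_average_effects(toy)
```

With observational data, a raw difference in means like this would generally be confounded, which is the motivation for the covariate-adjusted causal machine learning methods the abstract refers to.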