Project description: In this study, we developed machine learning (ML)-based prediction models for early childhood caries (ECC) and compared their performance with that of a traditional regression model. We analyzed data on 4195 children aged 1-5 years from the Korea National Health and Nutrition Examination Survey (2007-2018). In addition to logistic regression, we developed prediction models using the XGBoost (version 1.3.1), random forest, and LightGBM (version 3.1.1) algorithms. Two methods were applied for variable selection: regression-based backward elimination and random forest-based permutation importance. We compared the area under the receiver operating characteristic curve (AUROC) values and misclassification rates of the models. All four prediction models had AUROC values between 0.774 and 0.785, and no significant difference was observed among them. These results indicate that both traditional logistic regression and ML-based models can perform well and can be used to predict ECC, identify high-risk groups, and implement active preventive treatment. However, further research is needed to improve the performance of the prediction models using more recent methods, such as deep learning.
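The two metrics used to compare the models can be computed directly from each model's predicted probabilities. The following is a minimal, dependency-free sketch rather than the study's code: the 0.5 decision threshold and the example probabilities are assumptions, and the significance test behind the AUROC comparison is not reproduced.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case scores higher than a
    random negative case (Mann-Whitney formulation; ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def misclassification_rate(labels: Sequence[int], scores: Sequence[float],
                           threshold: float = 0.5) -> float:
    """Share of children whose thresholded prediction disagrees with the observed label."""
    errors = sum(int(s >= threshold) != y for y, s in zip(labels, scores))
    return errors / len(labels)


# Hypothetical predicted probabilities of ECC from two models on the same children.
labels = [1, 1, 0, 0, 1, 0]
predictions = {
    "logistic_regression": [0.81, 0.47, 0.35, 0.12, 0.66, 0.58],
    "random_forest": [0.74, 0.52, 0.41, 0.20, 0.59, 0.49],
}
for name, probs in predictions.items():
    print(name, auroc(labels, probs), misclassification_rate(labels, probs))
```

The pairwise loop is O(n²), which is adequate for a cohort of a few thousand children; a rank-based formula gives the same value faster.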
Project description: In this work, plasma samples were collected from 5 patients with metabolic syndrome and 5 healthy volunteers, and high-throughput RNA sequencing was performed to detect the expression of coding RNAs in plasma.
Project description: Blood flow through the major vessels holds great diagnostic potential for identifying cardiovascular complications and is therefore routinely assessed with current diagnostic modalities. Heart valves are subject to high hydrodynamic loads, which make them prone to premature degradation. Failing native aortic valves are routinely replaced with bioprosthetic heart valves, but the durability of these prostheses is often shorter than the patient's life expectancy. Frequent assessment of valvular function can therefore help ensure good long-term outcomes and support the planning of reinterventions. In this article, we describe how unsupervised novelty detection algorithms can automate the interpretation of blood flow data to improve outcomes through early detection of adverse cardiovascular events, without requiring repeated check-ups in a clinical environment. The proposed method was tested in an in vitro flow loop that simulated a failing aortic valve in a laboratory setting. Aortic regurgitation of increasing severity was deliberately introduced with tube-shaped inserts that prevented complete valve closure during diastole. Blood flow recordings from a flow meter at the location of the ascending aorta were analyzed with the algorithms introduced in this article, and a diagnostic index was defined that reflects the severity of valvular degradation. The results indicate that the proposed methodology is highly sensitive to pathological changes in valvular function and can automatically identify valvular degradation. Such methods may be a step toward computer-assisted diagnostics and telemedicine that provide clinicians with novel tools to improve patient care.
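The article's own algorithms are not specified in this description. As a hedged stand-in, the sketch below uses a k-nearest-neighbor distance detector, one common unsupervised novelty detection technique: features from recordings of the intact valve form the reference set, and a recording's mean distance to its k nearest reference points, divided by the typical leave-one-out score of the reference set, serves as a diagnostic index. The two features, their values, and k = 3 are hypothetical; in practice the features would be standardized first.

```python
import math
from statistics import median
from typing import List, Sequence

Vector = Sequence[float]


def knn_score(x: Vector, reference: List[Vector], k: int) -> float:
    """Mean Euclidean distance from x to its k nearest reference vectors."""
    distances = sorted(math.dist(x, r) for r in reference)
    return sum(distances[:k]) / k


def fit_baseline(healthy: List[Vector], k: int = 3) -> float:
    """Median leave-one-out novelty score of the healthy recordings (the scale of 'normal')."""
    scores = [knn_score(h, healthy[:i] + healthy[i + 1:], k) for i, h in enumerate(healthy)]
    return median(scores)


def diagnostic_index(x: Vector, healthy: List[Vector], baseline: float, k: int = 3) -> float:
    """Novelty of x relative to the healthy baseline: values near 1 look normal,
    larger values indicate a larger departure from the reference condition."""
    return knn_score(x, healthy, k) / baseline


# Hypothetical per-recording features: (peak forward flow in L/min, backflow index in a.u.).
healthy = [(5.0, 0.10), (5.1, 0.12), (4.9, 0.09), (5.05, 0.11), (4.95, 0.10)]
baseline = fit_baseline(healthy)
for label, rec in [("normal", (5.02, 0.105)), ("mild", (5.0, 0.4)), ("severe", (4.8, 1.5))]:
    print(label, round(diagnostic_index(rec, healthy, baseline), 2))
```

Because the detector only needs recordings of the intact valve, no labeled examples of regurgitation are required, which matches the unsupervised setting described above.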
Project description: Autism spectrum disorder is a neuropsychiatric condition that affects 53 million children worldwide and for which early diagnosis is critical to the outcome of behavioral therapies. Machine learning applied to features manually extracted from readily accessible videos (e.g., from smartphones) has the potential to scale this diagnostic process. However, nearly unavoidable variability in video quality can lead to missing features that degrade algorithm performance. To manage this uncertainty, we evaluated the impact of missing values and feature imputation methods on two previously published autism detection classifiers, trained on standard-of-care instrument scoresheets and tested on ratings of YouTube videos of 140 children. We compared the baseline method of listwise deletion with classic univariate and multivariate techniques. We also introduce a feature replacement method that, based on a score, selects a feature from an expanded dataset to fill in the missing value. The selected replacement feature can be identical for all records (general) or automatically adjusted to the record considered (dynamic). Our results show that general and dynamic feature replacement methods achieve higher performance than classic univariate and multivariate methods, supporting the hypothesis that algorithmic management can maintain the fidelity of video-based diagnostics in the face of missing values and variable video quality.
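The compared strategies can be sketched on records with missing ratings. In this minimal illustration, the feature names are hypothetical, mean imputation stands in for the classic univariate methods, and the replacement score (how often a candidate feature agrees with the missing feature when both are observed) is an assumption; the description does not define the actual score.

```python
from statistics import mean
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]  # None marks a rating that could not be scored


def listwise_deletion(records: List[Record], features: List[str]) -> List[Record]:
    """Baseline: drop every record that has any missing feature."""
    return [r for r in records if all(r.get(f) is not None for f in features)]


def mean_imputation(records: List[Record], feature: str) -> List[Record]:
    """Classic univariate imputation: fill gaps with the observed mean of the feature."""
    fill = mean(r[feature] for r in records if r.get(feature) is not None)
    return [{**r, feature: r[feature] if r.get(feature) is not None else fill} for r in records]


def agreement(records: List[Record], a: str, b: str) -> float:
    """Hypothetical replacement score: how often b equals a when both are observed."""
    pairs = [(r[a], r[b]) for r in records if r.get(a) is not None and r.get(b) is not None]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0


def feature_replacement(records: List[Record], feature: str, candidates: List[str],
                        dynamic: bool = False) -> List[Record]:
    """Fill a missing feature with the value of a substitute feature.

    general (dynamic=False): every record uses the single best-scoring candidate.
    dynamic (dynamic=True): each record uses the best-scoring candidate it actually has.
    """
    ranked = sorted(candidates, key=lambda c: agreement(records, feature, c), reverse=True)
    pool = ranked if dynamic else ranked[:1]
    filled = []
    for r in records:
        if r.get(feature) is not None:
            filled.append(dict(r))
        else:
            value = next((r[c] for c in pool if r.get(c) is not None), None)
            filled.append({**r, feature: value})
    return filled


# Hypothetical 0/1 ratings of three behaviors; "eye_contact" is sometimes unscorable.
records = [
    {"eye_contact": 1, "social_smile": 1, "pointing": 1},
    {"eye_contact": 0, "social_smile": 0, "pointing": 1},
    {"eye_contact": 1, "social_smile": 1, "pointing": 0},
    {"eye_contact": None, "social_smile": 0, "pointing": 1},
    {"eye_contact": None, "social_smile": None, "pointing": 0},
]
```

In this toy data, the general variant leaves the last record unfilled because its single best substitute is also missing, whereas the dynamic variant falls back to the next-ranked feature.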
Project description: A rising global population and the realities of climate change dictate that agricultural productivity must be accelerated. Results from current traditional research approaches are difficult to extrapolate to all possible fields because they depend on specific combinations of soil type, weather conditions, and background management that are neither applicable nor translatable to all farms. No method currently exists that accurately evaluates the effectiveness of the virtually unlimited number of cropping-system interactions (involving multiple management practices) for increasing maize and soybean yields across the US. Here, we use extensive databases and artificial intelligence algorithms to show that complex interactions, which cannot be evaluated in replicated trials, are associated with large crop yield variability and thus with potential for substantial yield increases. Our approach can accelerate agricultural research, identify sustainable practices, and help meet future food demand.
Project description: Tea trees are grown in shade to increase their chlorophyll content, which influences green tea quality. Monitoring changes in chlorophyll content under low-light conditions is therefore important for managing tea trees and producing high-quality green tea. Hyperspectral remote sensing is one of the most frequently used methods for estimating chlorophyll content. However, the many hyperspectral indices and radiative transfer models developed in numerous studies from data collected under relatively low-stress conditions perform poorly for shade-grown tea. We tested the performance of four machine learning algorithms, namely random forest, support vector machine, deep belief nets, and kernel-based extreme learning machine (KELM), in evaluating data collected from tea leaves cultivated under different shade treatments. KELM performed best, with a root-mean-square error of 8.94 ± 3.05 μg cm⁻² and ratio of performance to deviation (RPD) values from 1.70 to 8.04 for the test data. These results suggest that combining hyperspectral reflectance with KELM has the potential to trace changes in the chlorophyll content of shaded tea leaves.
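KELM has a closed-form solution: with kernel matrix K over the training samples and regularization constant C, the output weights are beta = (I/C + K)^-1 y, and a new sample x is predicted as the sum over j of beta_j * k(x, x_j). The sketch below implements this standard formulation with a Gaussian kernel in plain Python, together with RMSE and RPD (commonly defined as the standard deviation of the reference values divided by the RMSE). The kernel choice and the values of C and gamma are assumptions, not the study's settings.

```python
import math
from statistics import stdev
from typing import List, Sequence


def rbf(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class KELM:
    """Kernel extreme learning machine regressor: beta = (I/C + K)^-1 y,
    prediction f(x) = sum_j beta_j * k(x, x_j)."""

    def __init__(self, C: float = 100.0, gamma: float = 1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X = [list(x) for x in X]
        # Kernel matrix with 1/C added on the diagonal (ridge-style regularization).
        K = [[rbf(a, b, self.gamma) + (1.0 / self.C if i == j else 0.0)
              for j, b in enumerate(self.X)] for i, a in enumerate(self.X)]
        self.beta = solve(K, list(y))
        return self

    def predict(self, X):
        return [sum(bj * rbf(x, xj, self.gamma) for bj, xj in zip(self.beta, self.X)) for x in X]


def rmse(y, y_hat) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def rpd(y, y_hat) -> float:
    """Ratio of performance to deviation: standard deviation of the reference values / RMSE."""
    return stdev(y) / rmse(y, y_hat)
```

Usage: `KELM(C=1000.0, gamma=10.0).fit(X_train, y_train).predict(X_test)`, where each row of the hypothetical `X_train` holds reflectance features for one leaf sample and `y_train` holds the measured chlorophyll contents.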
Project description: Background: Gene circuits are important in many aspects of biology, and perform a wide variety of different functions. For example, some circuits oscillate (e.g. the cell cycle), some are bistable (e.g. as cells differentiate), some respond sharply to environmental signals (e.g. ultrasensitivity), and some pattern multicellular tissues (e.g. Turing's model). Often, one starts from a given circuit, and using simulations, asks what functions it can perform. Here we want to do the opposite: starting from a prescribed function, can we find a circuit that executes this function? Whilst simple in principle, this task is challenging from a computational perspective, since gene circuit models are complex systems with many parameters. In this work, we adapted machine-learning algorithms to significantly accelerate gene circuit discovery. Results: We use gradient-descent optimization algorithms from machine learning to rapidly screen and design gene circuits. With this approach, we found that we could rapidly design circuits capable of executing a range of different functions, including those that: (1) recapitulate important in vivo phenomena, such as oscillators, and (2) perform complex tasks for synthetic biology, such as counting noisy biological events. Conclusions: Our computational pipeline will facilitate the systematic study of natural circuits in a range of contexts, and allow the automatic design of circuits for synthetic biology. Our method can be readily applied to biological networks of any type and size, and is provided as an open-source and easy-to-use python module, GeneNet.
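To illustrate the core idea of designing a circuit by gradient descent, the sketch below tunes two parameters of a toy one-gene circuit so that its simulated response matches a prescribed input-output function. It does not use the GeneNet API: the model (Hill activation with coefficient 2 and unit decay), the learning rate, and the use of central finite differences in place of automatic differentiation are all assumptions made for a self-contained example.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy circuit: input signal s activates gene x through a Hill function
# (coefficient 2), and x decays at unit rate:
#     dx/dt = alpha * s^2 / (K^2 + s^2) - x
# Parameters are optimized in log-space so alpha and K stay positive.


def simulate(log_params: Sequence[float], s: float, t_end: float = 8.0, dt: float = 0.05) -> float:
    """Integrate the circuit from x = 0 with forward Euler and return x(t_end)."""
    alpha, K = math.exp(log_params[0]), math.exp(log_params[1])
    x = 0.0
    for _ in range(round(t_end / dt)):
        x += dt * (alpha * s**2 / (K**2 + s**2) - x)
    return x


def loss(log_params, inputs, targets) -> float:
    """Mean squared error between the simulated and the prescribed responses."""
    return sum((simulate(log_params, s) - t) ** 2 for s, t in zip(inputs, targets)) / len(inputs)


def grad(log_params, inputs, targets, h: float = 1e-4) -> List[float]:
    """Central finite differences, used here in place of automatic differentiation."""
    g = []
    for i in range(len(log_params)):
        up, down = list(log_params), list(log_params)
        up[i] += h
        down[i] -= h
        g.append((loss(up, inputs, targets) - loss(down, inputs, targets)) / (2 * h))
    return g


def design(inputs, targets, init, lr: float = 0.1, steps: int = 300) -> Tuple[List[float], List[float]]:
    """Plain gradient descent; returns the final log-parameters and the loss history."""
    p = list(init)
    history = [loss(p, inputs, targets)]
    for _ in range(steps):
        p = [pi - lr * gi for pi, gi in zip(p, grad(p, inputs, targets))]
        history.append(loss(p, inputs, targets))
    return p, history


# Prescribed function: a graded switch that approaches 2 at high input.
inputs = [0.25, 0.5, 1.0, 2.0, 4.0]
targets = [2 * s**2 / (1 + s**2) for s in inputs]
params, history = design(inputs, targets, init=(0.0, math.log(2.0)))
print("alpha =", math.exp(params[0]), "K =", math.exp(params[1]), "loss =", history[-1])
```

Real circuits have many more parameters and interacting genes, which is where exact gradients from automatic differentiation pay off over finite differences.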
Project description: Healthcare researchers have been working on mortality prediction for COVID-19 patients with differing levels of severity. A rapid and reliable clinical evaluation of disease severity will assist in the allocation and prioritization of mortality mitigation resources. The novelty of this work is an early prediction model of high mortality risk for both COVID-19 and non-COVID-19 patients that provides state-of-the-art performance in an external validation cohort from a different population. A retrospective study was performed on two separate hospital datasets from two countries for model development and validation. The first dataset comprised COVID-19 and non-COVID-19 patients admitted to the emergency department in Boston (24 March 2020 to 30 April 2020), and the second comprised 375 COVID-19 patients admitted to Tongji Hospital in China (10 January 2020 to 18 February 2020). The key parameters for predicting mortality risk in COVID-19 and non-COVID-19 patients were identified, and a nomogram-based scoring technique was developed using the five top-ranked parameters. Age, lymphocyte count, D-dimer, C-reactive protein (CRP), and creatinine (ALDCC), all acquired at hospital admission, were identified by the logistic regression model as the primary predictors of hospital death. The areas under the curve (AUCs) were 0.987, 0.999, and 0.992 for the development cohort and the internal and external validation cohorts, respectively. Patients were categorized into three risk groups using the ALDCC score and death probability: Low (probability < 5%), Moderate (5% < probability < 50%), and High (probability > 50%). The prognostic model, nomogram, and ALDCC score can assist in the early identification of both COVID-19 and non-COVID-19 patients with high mortality risk, helping physicians improve patient management.
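A nomogram is a graphical rendering of a regression model's linear predictor. The sketch below shows how a logistic model turns admission values into a death probability and how that probability maps onto the three risk bands. The coefficients must be supplied by the caller because the fitted ALDCC values are not given here, and placing exactly 5% and exactly 50% in the Moderate band is an assumption, since the stated ranges leave those boundaries open.

```python
import math
from typing import Dict


def death_probability(features: Dict[str, float], coeffs: Dict[str, float]) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i))).

    `coeffs` must hold an "intercept" entry plus one coefficient per feature name
    (e.g. hypothetical keys "age", "lymphocyte_count", "d_dimer", "crp", "creatinine").
    """
    z = coeffs["intercept"] + sum(coeffs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def risk_group(p: float) -> str:
    """Map a death probability to the three risk bands.

    Assumption: exactly 5% and exactly 50% are placed in the Moderate band.
    """
    if p < 0.05:
        return "Low"
    if p <= 0.50:
        return "Moderate"
    return "High"
```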
Project description: Background: Psychiatry relies almost entirely on patient self-report, and few objective and reliable tests or sources of collateral information are available to support diagnosis and assessment. Technology offers opportunities to collect objective digital data that complement patients' reported experience and facilitate more informed treatment decisions. Objective: We aimed to develop computational algorithms based on internet search activity to support diagnosis and relapse identification in individuals with schizophrenia spectrum disorders. Methods: We extracted 32,733 time-stamped search queries from 42 participants with schizophrenia spectrum disorders and 74 healthy volunteers aged 15 to 35 years (mean 24.4 years; 44.0% male) and built machine learning diagnostic and relapse classifiers using the timing, frequency, and content of online search activity. Results: The classifiers predicted a diagnosis of schizophrenia spectrum disorders with an area under the curve (AUC) of 0.74 and predicted psychotic relapse in individuals with schizophrenia spectrum disorders with an AUC of 0.71. Compared with healthy participants, those with schizophrenia spectrum disorders made fewer searches, and their searches contained fewer words. Before a relapse hospitalization, participants with schizophrenia spectrum disorders were more likely to use words related to hearing, perception, and anger and less likely to use words related to health. Conclusions: Online search activity holds promise as a source of objective, easily accessed indicators of psychiatric symptoms. Using search activity as collateral behavioral health information would represent a major advance in efforts to capitalize on objective digital data to improve mental health monitoring.
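The classifiers were built on the timing, frequency, and content of searches. The sketch below shows one hypothetical way to turn a participant's time-stamped queries into such features: searches per day, words per query, the share of night-time searches, and the share of words from small category word lists. The lexicons, the midnight-to-6 a.m. window, and the feature names are assumptions for illustration, not the study's feature set.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple

# HYPOTHETICAL mini-lexicons; a real analysis would use validated word-category dictionaries.
LEXICON = {
    "hearing": {"hear", "heard", "hearing", "voices", "listen"},
    "anger": {"angry", "hate", "rage", "furious"},
    "health": {"doctor", "medicine", "symptoms", "health"},
}


def search_features(queries: List[Tuple[datetime, str]], days: float) -> Dict[str, float]:
    """Summarize one participant's time-stamped queries over a window of `days` days."""
    feats = {"searches_per_day": len(queries) / days}
    words = [text.lower().split() for _, text in queries]
    total_words = sum(len(w) for w in words)
    feats["words_per_query"] = total_words / len(queries) if queries else 0.0
    # Timing: share of searches made between midnight and 6 a.m. (assumed window).
    feats["night_fraction"] = sum(t.hour < 6 for t, _ in queries) / len(queries) if queries else 0.0
    # Content: share of all query words that fall in each hypothetical category.
    counts = Counter(word for w in words for word in w)
    for category, vocab in LEXICON.items():
        hits = sum(counts[v] for v in vocab)
        feats[f"{category}_fraction"] = hits / total_words if total_words else 0.0
    return feats


queries = [
    (datetime(2020, 1, 1, 2, 0), "why do I hear voices"),
    (datetime(2020, 1, 1, 14, 0), "pizza near me"),
    (datetime(2020, 1, 2, 3, 0), "hate everyone"),
]
print(search_features(queries, days=2))
```

Feature vectors of this kind, one per participant or per time window before an event, could then be passed to any standard classifier.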
Project description: Malaria detection through microscopic examination of stained blood smears is a diagnostic challenge that relies heavily on the expertise of trained microscopists. This paper presents an automated method for detecting and staging red blood cells (RBCs) infected by the malaria parasite Plasmodium falciparum at the trophozoite or schizont stage. Unlike previous efforts in this area, this study uses quantitative phase images of unstained cells. Erythrocytes are automatically segmented by thresholding the optical phase and are refocused to enable quantitative comparison of phase images. The refocused images are analyzed to extract 23 morphological descriptors based on the phase information. Although every individual descriptor differs significantly between infected and uninfected cells, no single descriptor separates the populations well enough for clinical utility. To improve diagnostic capacity, we applied several machine learning techniques, including linear discriminant classification (LDC), logistic regression (LR), and k-nearest neighbor classification (NNC), to build algorithms that combine all of the calculated physical parameters to distinguish cells more effectively. Results show that LDC provides the highest accuracy, up to 99.7%, in distinguishing schizont-stage infected cells from uninfected RBCs. NNC showed slightly better accuracy (99.5%) than LDC (99.0%) or LR (99.1%) in discriminating late trophozoites from uninfected RBCs. For early trophozoites, however, LDC produced the best accuracy, 98%. Discrimination of infection stage was less accurate, with high specificity (99.8%) but only 45.0%-66.8% sensitivity: early trophozoites were most often mistaken for the late trophozoite or schizont stage, and the late trophozoite and schizont stages were most often confused with each other. Overall, this methodology points to the significant clinical potential of quantitative phase imaging for detecting and staging malaria infection without staining or expert analysis.
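Linear discriminant classification combines all descriptors into a single projection. The sketch below is a textbook two-class LDC with a pooled within-class covariance and equal priors, plus the sensitivity and specificity used to report stage discrimination; it is not necessarily the exact configuration used in the paper. The two descriptors and their values are hypothetical stand-ins for the 23 phase-derived descriptors, and the code accepts any number of descriptors.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def pooled_covariance(groups: List[List[Vector]]) -> List[List[float]]:
    """Within-class covariance pooled over all classes (denominator N - number of classes)."""
    d = len(groups[0][0])
    n_total = sum(len(g) for g in groups)
    S = [[0.0] * d for _ in range(d)]
    for g in groups:
        mu = [mean(col) for col in zip(*g)]
        for x in g:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    S[i][j] += diff[i] * diff[j]
    return [[v / (n_total - len(groups)) for v in row] for row in S]


class LDC:
    """Two-class linear discriminant with equal priors: label 1 (infected) if w.x > c."""

    def fit(self, infected: List[Vector], uninfected: List[Vector]) -> "LDC":
        mu1 = [mean(col) for col in zip(*infected)]
        mu0 = [mean(col) for col in zip(*uninfected)]
        S = pooled_covariance([infected, uninfected])
        self.w = solve(S, [a - b for a, b in zip(mu1, mu0)])
        # Threshold halfway between the projected class means (equal priors).
        self.c = sum(wi * (a + b) / 2 for wi, a, b in zip(self.w, mu1, mu0))
        return self

    def predict(self, X: List[Vector]) -> List[int]:
        return [int(sum(wi * xi for wi, xi in zip(self.w, x)) > self.c) for x in X]


def sensitivity_specificity(y: List[int], y_hat: List[int]) -> Tuple[float, float]:
    tp = sum(a == 1 and b == 1 for a, b in zip(y, y_hat))
    fn = sum(a == 1 and b == 0 for a, b in zip(y, y_hat))
    tn = sum(a == 0 and b == 0 for a, b in zip(y, y_hat))
    fp = sum(a == 0 and b == 1 for a, b in zip(y, y_hat))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical descriptors per cell: (mean optical phase, projected area), arbitrary units.
infected = [(2.0, 1.0), (2.2, 1.1), (1.9, 0.8), (2.1, 1.2)]
uninfected = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2)]
model = LDC().fit(infected, uninfected)
X, y = infected + uninfected, [1] * 4 + [0] * 4
print(sensitivity_specificity(y, model.predict(X)))
```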