Project description:Diabetic nephropathy (DN), a multifaceted disease with various contributing factors, presents challenges in understanding its underlying causes. Uncovering biomarkers linked to this condition can shed light on its pathogenesis and support the creation of new diagnostic and treatment methods. Gene expression data were sourced from publicly accessible databases, and Weighted Gene Co-expression Network Analysis (WGCNA) was employed to pinpoint gene co-expression modules relevant to DN. Subsequently, several machine learning techniques, namely random forest, least absolute shrinkage and selection operator (LASSO) regression, and support vector machine-recursive feature elimination (SVM-RFE), were used to distinguish DN cases from controls using the identified gene modules. Additionally, functional enrichment analyses were conducted to explore the biological roles of these genes. Our analysis revealed 131 genes showing distinct expression patterns between controlled and uncontrolled groups. During the integrated WGCNA, we identified 61 co-expressed genes encompassing both categories. The enrichment analysis highlighted involvement in various immune responses and complex activities. Random forest, LASSO, and SVM-RFE were applied to pinpoint key hub genes, leading to the identification of VWF and DNASE1L3. In the context of DN, these two genes demonstrated significant consistency in both expression and function. Our research uncovered potential biomarkers for DN through the application of WGCNA and several machine learning methods. The results indicate that the two hub genes could serve as novel diagnostic indicators and therapeutic targets for this disease. This discovery offers fresh perspectives on the development of DN and could contribute to the advancement of new diagnostic and treatment approaches.
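As a rough illustration of the co-expression step, the sketch below computes the unsigned soft-threshold adjacency commonly used in WGCNA, a_ij = |cor(x_i, x_j)|^beta. The gene names, expression values, and the choice beta = 6 are assumptions made for this example and are not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def adjacency(profiles, beta=6):
    """Unsigned soft-threshold adjacency: a_ij = |cor(x_i, x_j)| ** beta."""
    genes = sorted(profiles)
    return {
        (g, h): abs(pearson(profiles[g], profiles[h])) ** beta
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }

# Hypothetical expression values across five samples (not from the study).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.1, 6.2, 7.9, 10.0],  # nearly proportional to geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly related to geneA
}
adj = adjacency(profiles)
```

Raising the correlation to a power suppresses weak correlations while keeping strong ones close to 1, which is why strongly co-expressed genes end up grouped into the same module.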
Project description:Ransomware-related cyber-attacks have been on the rise over the last decade, considerably disrupting organizations. Developing new and better ways to detect this type of malware is necessary. This research applies dynamic analysis and machine learning to identify the ever-evolving ransomware signatures using selected dynamic features. Since most of the attributes are shared by diverse ransomware-affected samples, our study can be used for detecting current and even new variants of the threat. This research has the following objectives: (1) Execute experiments with encryptor and locker ransomware combined with goodware to generate JSON files with dynamic parameters using a sandbox. (2) Analyze and select the most relevant and non-redundant dynamic features for distinguishing encryptor and locker ransomware from goodware. (3) Generate and make public a dynamic features dataset that includes these selected parameters for samples of different artifacts. (4) Apply the dynamic feature dataset to obtain models with machine learning algorithms. Five platforms, 20 ransomware, and 20 goodware artifacts were evaluated. The final feature dataset is composed of 2000 records with 50 features each. In 10-fold cross-validation, this dataset yields an average detection accuracy above 0.99 for gradient boosted regression trees, random forest, and neural networks.
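The evaluation protocol mentioned above can be sketched in a few lines. This is a minimal illustration of shuffled 10-fold cross-validation, assuming generic `fit` and `predict` callables (hypothetical names); it is not the authors' pipeline and does not reproduce their models.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of near-equal size
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def cross_val_accuracy(fit, predict, X, y, k=10):
    """Average accuracy over k folds for user-supplied fit/predict callables."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)

# With 2000 records, each fold holds out 200 records and trains on 1800.
splits = list(k_fold_indices(2000, k=10))
```

Each record is used for testing exactly once, so the averaged accuracy reflects performance on data the model did not see during training.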
Project description:In this study, we developed machine learning-based prediction models for early childhood caries (ECC) and compared their performance with that of the traditional regression model. We analyzed the data of 4195 children aged 1-5 years from the Korea National Health and Nutrition Examination Survey (2007-2018). We developed prediction models using the XGBoost (version 1.3.1), random forest, and LightGBM (version 3.1.1) algorithms in addition to logistic regression. Two different methods were applied for variable selection: regression-based backward elimination and a random forest-based permutation importance classifier. We compared the area under the receiver operating characteristic curve (AUROC) values and misclassification rates of the different models and observed that all four prediction models had AUROC values ranging between 0.774 and 0.785. Furthermore, no significant difference was observed between the AUROC values of the four models. Based on the results, we can confirm that both traditional logistic regression and ML-based models can show favorable performance and can be used to predict ECC, identify ECC high-risk groups, and implement active preventive treatments. However, further research is needed to improve the performance of the prediction model using recent methods, such as deep learning.
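The two comparison metrics can be computed directly from labels and predicted risks. The sketch below uses the pairwise (Mann-Whitney) formulation of AUROC and a 0.5 decision threshold for the misclassification rate. The threshold and the toy data are assumptions for illustration, not values from the study.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def misclassification_rate(labels, scores, threshold=0.5):
    """Fraction of records whose thresholded prediction differs from the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p != l for p, l in zip(preds, labels)) / len(labels)

# Hypothetical labels and predicted risks, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
```

AUROC depends only on how the model ranks cases, whereas the misclassification rate also depends on the chosen threshold, so two models can have similar AUROC values and still differ in error rate.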
Project description:In this work, plasma samples of 5 metabolic syndrome patients and 5 healthy volunteers were collected. Then, high-throughput RNA sequencing was performed to detect the expression of plasma coding RNA.
Project description:The use of offensive terms in user-generated content is one of the major concerns for social media platforms. Offensive terms have a negative impact on individuals, which may lead to the degradation of societal and civilized manners. The immense amount of content, generated at high speed, makes it humanly impossible to categorise and detect offensive terms. Besides, detecting such terminology automatically remains an open challenge for natural language processing (NLP). Substantial efforts have been made for high-resource languages such as English. However, the task becomes more challenging for resource-poor languages such as Urdu because of the lack of standard datasets and pre-processing tools for automatic offensive term detection. This paper introduces a combinatorial pre-processing approach for developing a classification model for cross-platform (Twitter and YouTube) use. The approach uses datasets from the two platforms for training and testing the model, which is built with decision tree, random forest, and naive Bayes algorithms. The proposed combinatorial pre-processing approach is applied to check how machine learning models behave with different combinations of standard pre-processing techniques for a low-resource language in the cross-platform setting. The experimental results demonstrate the effectiveness of the machine learning model over different subsets of traditional pre-processing approaches in building a classification model for automatic offensive term detection for a low-resource language, i.e., Urdu, in the cross-platform scenario. In the experiments, when dataset D1 is used for training and D2 for testing, stopword removal alone produced the best results, with an accuracy of 83.27%. Conversely, when dataset D2 is used for training and D1 for testing, the combination of stopword removal and punctuation removal performed best, with an accuracy of 74.54%. The combinatorial approach proposed in this paper outperformed the benchmark for the considered datasets using classical as well as ensemble machine learning, with accuracies of 82.9% and 97.2% for datasets D1 and D2, respectively.
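The idea of trying every combination of pre-processing steps can be sketched as follows. The step set, the toy English stopword list, and the ASCII-only punctuation table are assumptions chosen for readability. A real Urdu pipeline would need Urdu stopwords and punctuation marks, and the abstract does not list the exact steps used.

```python
from itertools import combinations
import string

# Toy stopword list, assumed for illustration only.
STOPWORDS = {"the", "is", "a"}

def remove_stopwords(text):
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)

def remove_punctuation(text):
    # string.punctuation covers ASCII marks only.
    return text.translate(str.maketrans("", "", string.punctuation))

def remove_digits(text):
    return "".join(ch for ch in text if not ch.isdigit())

STEPS = {
    "stopword_removal": remove_stopwords,
    "punctuation_removal": remove_punctuation,
    "digit_removal": remove_digits,
}

def step_combinations(steps):
    """Yield every non-empty subset of step names, smallest subsets first."""
    names = list(steps)
    for r in range(1, len(names) + 1):
        yield from combinations(names, r)

def apply_steps(text, combo):
    """Apply the named steps to the text in the order given."""
    for name in combo:
        text = STEPS[name](text)
    return text

combos = list(step_combinations(STEPS))  # 2**3 - 1 = 7 combinations
```

Each combination would then be used to pre-process the training and test sets before fitting a classifier, so that accuracies can be compared per combination and per train/test direction.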
Project description:Background: This study applied machine learning (ML) algorithms to construct a model for predicting enteral nutrition (EN) initiation for patients in the intensive care unit (ICU) and identifying populations in need of EN at an early stage. Methods: This study collected patient information from the Medical Information Mart for Intensive Care IV database. All patients enrolled were split randomly into a training set and a validation set. Six ML models were established to predict the initiation of EN, and the best model was determined according to the area under the curve (AUC) and accuracy. The best model was interpreted using the Local Interpretable Model-Agnostic Explanations (LIME) algorithm and SHapley Additive exPlanation (SHAP) values. Results: A total of 53,150 patients were included in the study. They were divided into a training set (42,520, 80%) and a validation set (10,630, 20%). In the validation set, XGBoost had the best prediction performance, with an AUC of 0.895. The SHAP values revealed that sepsis, sequential organ failure assessment score, and acute kidney injury were the three most important factors affecting EN initiation. The individualized forecasts were displayed using the LIME algorithm. Conclusion: The XGBoost model was established and validated for early prediction of EN initiation in ICU patients.
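The random 80/20 partition described above can be sketched as follows. The seed and the use of integer patient identifiers are assumptions for this example, and the original study may have split the cohort differently.

```python
import random

def train_validation_split(ids, train_fraction=0.8, seed=42):
    """Shuffle the identifiers and cut them into training and validation sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# 53,150 patients give 42,520 training and 10,630 validation records.
train_ids, val_ids = train_validation_split(range(53150))
```

Fixing the seed makes the partition reproducible, which matters when several models are compared on the same validation set.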
Project description:Background: Epilepsy is the fourth-most common neurological disorder, affecting an estimated 50 million patients globally. Nearly 40% of patients have uncontrolled seizures yet incur 80% of the cost. Anti-epileptic drugs commonly result in resistance and reversion to uncontrolled drug-resistant epilepsy and are often associated with significant adverse effects. This has led to a trial-and-error system in which physicians spend months to years attempting to identify the optimal therapeutic approach. Objective: To investigate the potential clinical utility of characterizing cellular electrophysiology for predicting the optimal therapeutic approach. It is well established that genomic data alone can sometimes be predictive of an effective therapeutic approach. Thus, to assess the predictive power of electrophysiological data, machine learning strategies are implemented to predict a subject's genetically defined class in an in silico model using brief electrophysiological recordings obtained from simulated neuronal networks. Methods: A dynamic network of isogenic neurons is modeled in silico for 1 s for each of 228 dynamically modeled patients falling into one of three categories: healthy, general sodium channel gain of function, or inhibitory sodium channel loss of function. Data from previous studies investigating the electrophysiological and cellular properties of neurons in vitro are used to define the parameters governing these models. Ninety-two electrophysiological features describing the nature and consistency of network connectivity, activity, waveform shape, and complexity are extracted for each patient network, and t-tests are used for feature selection for the following machine learning algorithms: Neural Network, Support Vector Machine, Gaussian Naïve Bayes Classifier, Decision Tree, and Gradient Boosting Decision Tree. Finally, their performance in accurately predicting which genetic category the subjects fall under is assessed. Results: Several machine learning algorithms excel in using electrophysiological data from isogenic neurons to accurately predict genetic class. The Gaussian Naïve Bayes Classifier achieves the best accuracy, area under the curve, and F1 score for the healthy and gain-of-function classes and overall. The Gradient Boosting Decision Tree performs best for loss-of-function models by the same metrics. Conclusions: It is possible for machine learning algorithms to use electrophysiological data to predict clinically valuable metrics such as the optimal therapeutic approach, especially when combining several models.
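A t-test-based feature filter can be sketched as follows. The example ranks features by the absolute value of Welch's t statistic between two classes. The feature names and values are hypothetical, and the abstract does not state which t-test variant, threshold, or class pairing was used, so those choices are assumptions.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def rank_features(group_a, group_b):
    """Return feature names sorted by |t|, largest first."""
    t = {f: welch_t(group_a[f], group_b[f]) for f in group_a}
    return sorted(t, key=lambda f: abs(t[f]), reverse=True)

# Hypothetical feature values for two classes (not taken from the study).
healthy = {
    "firing_rate": [5.1, 4.9, 5.3, 5.0],
    "burst_len": [2.0, 2.4, 1.9, 2.2],
}
gain_of_function = {
    "firing_rate": [8.2, 7.9, 8.5, 8.1],
    "burst_len": [2.1, 2.3, 2.0, 2.2],
}
ranking = rank_features(healthy, gain_of_function)
```

Features at the top of the ranking separate the two classes most clearly and would be passed on to the classifiers, while features near the bottom add little beyond noise.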
Project description:The blood flow through the major vessels holds great diagnostic potential for the identification of cardiovascular complications and is therefore routinely assessed with current diagnostic modalities. Heart valves are subject to high hydrodynamic loads which render them prone to premature degradation. Failing native aortic valves are routinely replaced with bioprosthetic heart valves. This type of prosthesis is limited by a durability that is often less than the patient's life expectancy. Frequent assessment of valvular function can therefore help to ensure good long-term outcomes and to plan reinterventions. In this article, we describe how unsupervised novelty detection algorithms can be used to automate the interpretation of blood flow data to improve outcomes through early detection of adverse cardiovascular events without requiring repeated check-ups in a clinical environment. The proposed method was tested in an in-vitro flow loop which allowed simulating a failing aortic valve in a laboratory setting. Aortic regurgitation of increasing severity was deliberately introduced with tube-shaped inserts, preventing complete valve closure during diastole. Blood flow recordings from a flow meter at the location of the ascending aorta were analyzed with the algorithms introduced in this article and a diagnostic index was defined that reflects the severity of valvular degradation. The results indicate that the proposed methodology offers a high sensitivity towards pathological changes of valvular function and that it is capable of automatically identifying valvular degradation. Such methods may be a step towards computer-assisted diagnostics and telemedicine that provide the clinician with novel tools to improve patient care.
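The abstract does not name the novelty detection algorithms it uses, so the sketch below substitutes a deliberately simple stand-in: a baseline of per-beat features is learned from reference recordings, and a new beat is scored by its largest absolute z-score. The feature choice, values, and units are hypothetical.

```python
from statistics import mean, stdev

def fit_baseline(beats):
    """Per-feature mean and standard deviation from reference (healthy) beats."""
    cols = list(zip(*beats))
    return [mean(c) for c in cols], [stdev(c) for c in cols]

def novelty_index(beat, baseline):
    """Largest absolute z-score of a beat's features relative to the baseline."""
    mu, sd = baseline
    return max(abs(x - m) / s for x, m, s in zip(beat, mu, sd))

# Hypothetical per-beat features: (peak forward flow, reverse flow volume).
reference = [(25.0, 1.0), (24.5, 1.2), (25.4, 0.9), (25.1, 1.1), (24.8, 1.0)]
baseline = fit_baseline(reference)

normal = novelty_index((25.0, 1.05), baseline)  # close to the reference beats
leaky = novelty_index((24.0, 6.0), baseline)    # large reverse flow
```

Because only reference data are needed to fit the baseline, no labeled examples of failing valves are required, which is the property that makes an unsupervised approach attractive for remote monitoring.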
Project description:Autism Spectrum Disorder is a neuropsychiatric condition affecting 53 million children worldwide, for which early diagnosis is critical to the outcome of behavior therapies. Machine learning applied to features manually extracted from readily accessible videos (e.g., from smartphones) has the potential to scale this diagnostic process. However, nearly unavoidable variability in video quality can lead to missing features that degrade algorithm performance. To manage this uncertainty, we evaluated the impact of missing values and feature imputation methods on two previously published autism detection classifiers, trained on standard-of-care instrument scoresheets and tested on ratings of 140 YouTube videos of children. We compare the baseline method of listwise deletion to classic univariate and multivariate techniques. We also introduce a feature replacement method that, based on a score, selects a feature from an expanded dataset to fill in the missing value. The replacement feature selected can be identical for all records (general) or automatically adjusted to the record considered (dynamic). Our results show that general and dynamic feature replacement methods achieve higher performance than classic univariate and multivariate methods, supporting the hypothesis that algorithmic management can maintain the fidelity of video-based diagnostics in the face of missing values and variable video quality.
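The two simplest baselines mentioned above, listwise deletion and univariate imputation, can be contrasted in a short sketch. Column-mean imputation is used here as one example of a univariate technique. The ratings are hypothetical, and the feature replacement method introduced in the study is not reproduced.

```python
from statistics import mean

def listwise_deletion(records):
    """Drop every record that has at least one missing (None) feature."""
    return [r for r in records if None not in r]

def mean_impute(records):
    """Replace each missing value with the mean of the observed values in its column."""
    cols = list(zip(*records))
    fills = [mean(v for v in c if v is not None) for c in cols]
    return [[f if v is None else v for v, f in zip(r, fills)] for r in records]

# Hypothetical feature ratings; None marks a feature that could not be scored.
ratings = [[1, 2, 0], [2, None, 1], [3, 1, None], [0, 3, 2]]

kept = listwise_deletion(ratings)  # only the two complete records remain
filled = mean_impute(ratings)      # all four records remain, gaps filled
```

Listwise deletion discards half of this toy dataset, which illustrates why imputation or feature replacement becomes attractive when missing values are common.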
Project description:Rising global population and climate change realities dictate that agricultural productivity must be accelerated. Results from current traditional research approaches are difficult to extrapolate to all possible fields because they depend on specific soil types, weather conditions, and background management combinations that are neither applicable nor translatable to all farms. A method that accurately evaluates the effectiveness of infinite cropping system interactions (involving multiple management practices) to increase maize and soybean yield across the US does not exist. Here, we utilize extensive databases and artificial intelligence algorithms and show that complex interactions, which cannot be evaluated in replicated trials, are associated with large crop yield variability and thus potential for substantial yield increases. Our approach can accelerate agricultural research, identify sustainable practices, and help meet future food demands.