Project description:Gene expression profiles were generated from 199 primary breast cancer patients. Samples 1-176 were used in another study, GEO Series GSE22820, and form the training data set in this study. Sample numbers 200-222 form a validation set. This data is used to model a machine learning classifier for Estrogen Receptor Status. RNA was isolated from 199 primary breast cancer patients. A machine learning classifier was built to predict ER status using only three gene features.
Project description:In recent years, China's e-commerce industry has developed at a high speed, and the scale of various industries has continued to expand. Service-oriented enterprises such as e-commerce transactions and information technology came into being. This paper analyzes the shortcomings and challenges of traditional online shopping behavior prediction methods, and proposes an online shopping behavior analysis and prediction system. The paper chooses linear model logistic regression and decision tree based XGBoost model. After optimizing the model, it is found that the nonlinear model can make better use of these features and get better prediction results. In this paper, we first combine the single model, and then use the model fusion algorithm to fuse the prediction results of the single model. The purpose is to avoid the accuracy of the linear model easy to fit and the decision tree model over-fitting. The results show that the model constructed by the article has further improvement than the single model. Finally, through two sets of contrast experiments, it is proved that the algorithm selected in this paper can effectively filter the features, which simplifies the complexity of the model to a certain extent and improves the classification accuracy of machine learning. The XGBoost hybrid model based on p/n samples is simpler than a single model. Machine learning models are not easily over-fitting and therefore more robust.
Project description:Gene expression profiles were generated from 199 primary breast cancer patients. Samples 1-176 were used in another study, GEO Series GSE22820, and form the training data set in this study. Sample numbers 200-222 form a validation set. This data is used to model a machine learning classifier for Estrogen Receptor Status.
Project description:BackgroundThe American College of Radiology Breast Imaging Reporting and Data System (ACR BI-RADS) used with ultrasonography cannot guide the individual management of solid breast tumors, but preoperative core biopsy categories (CBCs) can. We aimed to use machine learning to analyze clinical and ultrasonic features for predicting CBCs and to aid in the development of a new ultrasound (US) imaging reporting system for solid tumors of the breast.MethodsThis retrospective study included women with solid breast tumors who underwent US-guided core needle biopsy from March 1, 2019, to December 31, 2019. All patients were randomly assigned to a training or validation cohort (7:3 ratio). CBC was predicted using 5 machine learning models: random forest (RF), support vector machine (SVM), k-nearest-neighbor (KNN), multilayer perceptron (MLP), and ridge regression (RR). In the validation cohort, the area under the curve (AUC) and accuracy were ascertained for every algorithm. Based on AUC values, the optimal algorithm was determined, and the features' importance was depicted.ResultsA total of 1,082 female patients were included (age range, 12-96 years; mean age ± standard deviation, 42.22±13.37 years). The proportion of the 4 CBCs was 4% (44/1,185) for the B1 group, 60% (714/1,185) for the B2 group, 5% (57/1,185) for the B3 group, and 31% (370/1,185) for the B5 group. In the validation cohort, AUCs of the optimal algorithm constructed RF were 0.78, 0.88, 0.64, and 0.92 for B1, B2, B3, and B5, respectively, with an accuracy of 0.82.ConclusionsMachine learning could strongly predict CBC, particularly in B2 and B5 categories of solid breast tumors, with RF being the optimal machine learning model.
Project description:Customer satisfaction and their positive sentiments are some of the various goals for successful companies. However, analyzing customer reviews to predict accurate sentiments have been proven to be challenging and time-consuming due to high volumes of collected data from various sources. Several researchers approach this with algorithms, methods, and models. These include machine learning and deep learning (DL) methods, unigram and skip-gram based algorithms, as well as the Artificial Neural Network (ANN) and bag-of-word (BOW) regression model. Studies and research have revealed incoherence in polarity, model overfitting and performance issues, as well as high cost in data processing. This experiment was conducted to solve these revealing issues, by building a high performance yet cost-effective model for predicting accurate sentiments from large datasets containing customer reviews. This model uses the fastText library from Facebook's AI research (FAIR) Lab, as well as the traditional Linear Support Vector Machine (LSVM) to classify text and word embedding. Comparisons of this model were also done with the author's a custom multi-layer Sentiment Analysis (SA) Bi-directional Long Short-Term Memory (SA-BLSTM) model. The proposed fastText model, based on results, obtains a higher accuracy of 90.71% as well as 20% in performance compared to LSVM and SA-BLSTM models.
Project description:Objective: Early identification of individuals who are at risk for suicide is crucial in supporting suicide prevention. Machine learning is emerging as a promising approach to support this objective. Machine learning is broadly defined as a set of mathematical models and computational algorithms designed to automatically learn complex patterns between predictors and outcomes from example data, without being explicitly programmed to do so. The model's performance continuously improves over time by learning from newly available data. Method: This concept paper explores how machine learning approaches applied to healthcare data obtained from electronic health records, including billing and claims data, can advance our ability to accurately predict future suicidal behavior. Results: We provide a general overview of machine learning concepts, summarize exemplar studies, describe continued challenges, and propose innovative research directions. Conclusion: Machine learning has potential for improving estimation of suicide risk, yet important challenges and opportunities remain. Further research can focus on incorporating evolving methods for addressing data imbalances, understanding factors that affect generalizability across samples and healthcare systems, expanding the richness of the data, leveraging newer machine learning approaches, and developing automatic learning systems.
Project description:BackgroundThe logarithmic acid dissociation constant pKa reflects the ionization of a chemical, which affects lipophilicity, solubility, protein binding, and ability to pass through the plasma membrane. Thus, pKa affects chemical absorption, distribution, metabolism, excretion, and toxicity properties. Multiple proprietary software packages exist for the prediction of pKa, but to the best of our knowledge no free and open-source programs exist for this purpose. Using a freely available data set and three machine learning approaches, we developed open-source models for pKa prediction.MethodsThe experimental strongest acidic and strongest basic pKa values in water for 7912 chemicals were obtained from DataWarrior, a freely available software package. Chemical structures were curated and standardized for quantitative structure-activity relationship (QSAR) modeling using KNIME, and a subset comprising 79% of the initial set was used for modeling. To evaluate different approaches to modeling, several datasets were constructed based on different processing of chemical structures with acidic and/or basic pKas. Continuous molecular descriptors, binary fingerprints, and fragment counts were generated using PaDEL, and pKa prediction models were created using three machine learning methods, (1) support vector machines (SVM) combined with k-nearest neighbors (kNN), (2) extreme gradient boosting (XGB) and (3) deep neural networks (DNN).ResultsThe three methods delivered comparable performances on the training and test sets with a root-mean-squared error (RMSE) around 1.5 and a coefficient of determination (R2) around 0.80. Two commercial pKa predictors from ACD/Labs and ChemAxon were used to benchmark the three best models developed in this work, and performance of our models compared favorably to the commercial products.ConclusionsThis work provides multiple QSAR models to predict the strongest acidic and strongest basic pKas of chemicals, built using publicly available data, and provided as free and open-source software on GitHub.
Project description:BACKGROUND:Prognostication is an essential tool for risk adjustment and decision making in the intensive care unit (ICU). Research into prognostication in ICU has so far been limited to data from admission or the first 24 hours. Most ICU admissions last longer than this, decisions are made throughout an admission, and some admissions are explicitly intended as time-limited prognostic trials. Despite this, temporal changes in prognostic ability during ICU admission has received little attention to date. Current predictive models, in the form of prognostic clinical tools, are typically derived from linear models and do not explicitly handle incremental information from trends. Machine learning (ML) allows predictive models to be developed which use non-linear predictors and complex interactions between variables, thus allowing incorporation of trends in measured variables over time; this has made it possible to investigate prognosis throughout an admission. METHODS AND FINDINGS:This study uses ML to assess the predictability of ICU mortality as a function of time. Logistic regression against physiological data alone outperformed APACHE-II and demonstrated several important interactions including between lactate & noradrenaline dose, between lactate & MAP, and between age & MAP consistent with the current sepsis definitions. ML models consistently outperformed logistic regression with Deep Learning giving the best results. Predictive power was maximal on the second day and was further improved by incorporating trend data. Using a limited range of physiological and demographic variables, the best machine learning model on the first day showed an area under the receiver-operator characteristic curve (AUC) of 0.883 (σ = 0.008), compared to 0.846 (σ = 0.010) for a logistic regression from the same predictors and 0.836 (σ = 0.007) for a logistic regression based on the APACHE-II score. Adding information gathered on the second day of admission improved the maximum AUC to 0.895 (σ = 0.008). Beyond the second day, predictive ability declined. CONCLUSION:This has implications for decision making in intensive care and provides a justification for time-limited trials of ICU therapy; the assessment of prognosis over more than one day may be a valuable strategy as new information on the second day helps to differentiate outcomes. New ML models based on trend data beyond the first day could greatly improve upon current risk stratification tools.
Project description:Gene expression profiles were generated from 199 primary breast cancer patients. Samples 1-176 were used in another study, GEO Series GSE22820, and form the training data set in this study. Sample numbers 200-222 form a validation set. This data is used to model a machine learning classifier for Estrogen Receptor Status. RNA was isolated from 199 primary breast cancer patients. A machine learning classifier was built to predict ER status using only three gene features.
Project description:Numerous postural sway metrics have been shown to be sensitive to balance impairment and fall risk in individuals with MS. Yet, there are no guidelines concerning the most appropriate postural sway metrics to monitor impairment. This investigation implemented a machine learning approach to assess the accuracy and feature importance of various postural sway metrics to differentiate individuals with MS from healthy controls as a function of physiological fall risk. 153 participants (50 controls and 103 individuals with MS) underwent a static posturography assessment and a physiological fall risk assessment. Participants were further classified into four subgroups based on fall risk: controls, low-risk MS (n?=?34), moderate-risk MS (n?=?27), high-risk MS (n?=?42). Twenty common sway metrics were derived following standard procedures and subsequently used to train a machine learning algorithm (random forest - RF, with 10-fold cross validation) to predict individuals' fall risk grouping. The sway-metric based RF classifier had high accuracy in discriminating controls from MS individuals (>86%). Sway sample entropy was identified as the strongest feature for classification of low-risk MS individuals from healthy controls. Whereas for all other comparisons, mediolateral sway amplitude was identified as the strongest predictor for fall risk groupings.