Dataset Information

Random distributed logistic regression framework for predicting potential lncRNA‒disease association.

ABSTRACT:

SUBMITTER: Sun Y

PROVIDER: S-EPMC8373264 | biostudies-literature | 2021 Aug

REPOSITORIES: biostudies-literature

Publications

Random distributed logistic regression framework for predicting potential lncRNA‒disease association.

Sun Yichen, Zhao Hongqian, Zhou Gang, Guan Tianhao, Wang Yujie, Gao Jie

Journal of Molecular Cell Biology, 2021-08-01, issue 5

PMID: 33493268

Similar Datasets

Global network random walk for predicting potential human lncRNA-disease associations.

Project description: There is more and more evidence that the mutation and dysregulation of long non-coding RNA (lncRNA) are associated with numerous diseases, including cancers. However, experimental methods to identify associations between lncRNAs and diseases are expensive and time-consuming. Effective computational approaches to identify disease-related lncRNAs are in high demand and would benefit the detection of lncRNA biomarkers for disease diagnosis, treatment, and prevention. In light of some limitations of existing computational methods, we developed a global network random walk model for predicting lncRNA-disease associations (GrwLDA) to reveal the potential associations between lncRNAs and diseases. GrwLDA is a universal network-based method and does not require negative samples. This method can be applied to a disease with no known associated lncRNA (isolated disease) and to lncRNA with no known associated disease (novel lncRNA). The leave-one-out cross validation (LOOCV) method was implemented to evaluate the prediction performance of GrwLDA. As a result, GrwLDA obtained reliable AUCs of 0.9449, 0.8562, and 0.8374 for overall, novel lncRNA and isolated disease prediction, respectively, significantly outperforming previous methods. Case studies of colon, gastric, and kidney cancers were also implemented, and the top 5 disease-lncRNA associations were reported for each disease. Interestingly, 13 (out of the 15) associations were confirmed by literature mining.

| S-EPMC5622075 | biostudies-literature

Random effect based tests for multinomial logistic regression in genetic association studies.

Project description: Not available

| S-EPMC9209005 | biostudies-literature

Differentially private distributed logistic regression using private and public data.

Project description: Background: Privacy protection is an important issue in medical informatics and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. Methodology: In this paper, we modify the update step in Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. Experiments and results: We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Conclusion: Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models trained on private or public datasets alone without sacrificing the rigorous privacy guarantee.

| S-EPMC4101668 | biostudies-literature
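
For readers unfamiliar with the Newton-Raphson update that the entry above says it modifies, the following is a minimal sketch of the plain, non-private, single-site update for a one-feature logistic regression. It is not the authors' algorithm: it adds no privacy noise and does no distributed aggregation. The function names, the toy data, and the tiny `ridge` term (added only to keep the 2x2 system invertible) are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def newton_logistic(xs, ys, iters=25, ridge=1e-8):
    """Fit p(y=1|x) = sigmoid(b0 + b1*x) with plain Newton-Raphson steps."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of X^T W X (the negative Hessian)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        h00 += ridge             # hypothetical safeguard, not part of the method
        h11 += ridge
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X^T W X)^{-1} * gradient
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1

# Toy, non-separable data (illustrative only).
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = newton_logistic(xs, ys)
print(b0, b1)
```

A differentially private or distributed variant would change what each site contributes to the gradient and Hessian sums before the step is taken; those details are in the paper and are not reproduced here.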

Random forest vs. logistic regression: Predicting angiographic in-stent restenosis after second-generation drug-eluting stent implantation.

Project description:As the rate of percutaneous coronary intervention increases, in-stent restenosis (ISR) has become a burden. Random forest (RF) could be superior to logistic regression (LR) for predicting ISR due to its robustness. We developed an RF model and compared its performance with the LR one for predicting ISR. We retrospectively included 1501 patients (age: 64.0 ± 10.3; male: 76.7%; ISR events: 279) who underwent coronary angiography at 9 to 18 months after implantation of 2nd generation drug-eluting stents. The data were randomly split into a pair of train and test datasets for model development and validation with 50 repeats. The predictive performance was assessed by the area under the curve (AUC) of the receiver operating characteristic (ROC). The RF models predicted ISR with larger AUC-ROCs of 0.829 ± 0.025 compared to 0.784 ± 0.027 of the LR models. The difference was statistically significant in 29 of the 50 repeats. The RF and LR models had similar sensitivity using the same cutoff threshold, but the specificity was significantly higher in the RF models, reducing 25% of the false positives. By removing the high leverage outliers, the LR models had comparable AUC-ROC to the RF models. Compared to the LR, the RF was more robust and significantly improved the performance for predicting ISR. It could cost-effectively identify patients with high ISR risk and help the clinical decision of coronary stenting.

| S-EPMC9126385 | biostudies-literature

Predicting LncRNA-Disease Association by a Random Walk With Restart on Multiplex and Heterogeneous Networks.

Project description: Studies have found that long non-coding RNAs (lncRNAs) play important roles in many human biological processes, and it is critical to explore potential lncRNA-disease associations, especially cancer-associated lncRNAs. However, traditional biological experiments are costly and time-consuming, so it is of great significance to develop effective computational models. We developed a random walk algorithm with restart on multiplex and heterogeneous networks of lncRNAs and diseases to predict lncRNA-disease associations (MHRWRLDA). First, multiple disease similarity networks are constructed by using different approaches to calculate similarity scores between diseases, and multiple lncRNA similarity networks are also constructed by using different approaches to calculate similarity scores between lncRNAs. Then, a multiplex and heterogeneous network was constructed by integrating multiple disease similarity networks and multiple lncRNA similarity networks with the lncRNA-disease associations, and a random walk with restart on the multiplex and heterogeneous network was performed to predict lncRNA-disease associations. Leave-one-out cross-validation (LOOCV) gave an area under the curve (AUC) of 0.68736, which improved on the classical algorithm of recent years. Finally, we confirmed a few novel predicted lncRNAs associated with specific diseases like colon cancer by literature mining. In summary, MHRWRLDA contributes to predicting lncRNA-disease associations.

| S-EPMC8417042 | biostudies-literature
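
The random walk with restart named in the entry above can be illustrated on a single small network. This is a generic sketch, not MHRWRLDA itself: it uses one plain graph rather than a multiplex, heterogeneous one, and the function name, the restart probability of 0.5, and the four-node path graph are assumptions chosen for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk restarting at `seeds`."""
    n = len(adj)
    # Column-normalize so W[i][j] is the probability of moving from node j to node i.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # Initial distribution: uniform over the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        # p_next = (1 - r) * W p + r * p0
        nxt = [(1.0 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        delta = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if delta < tol:
            break
    return p

# Path graph 0 - 1 - 2 - 3, walk seeded at node 0 (illustrative only).
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adj, seeds=[0])
print(scores)
```

Nodes closer to the seed receive higher scores, which is the property such methods rely on when ranking candidate lncRNAs for a disease of interest.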

EXpectation Propagation LOgistic REgRession (EXPLORER): distributed privacy-preserving online model learning.

Project description:We developed an EXpectation Propagation LOgistic REgRession (EXPLORER) model for distributed privacy-preserving online learning. The proposed framework provides a high level guarantee for protecting sensitive information, since the information exchanged between the server and the client is the encrypted posterior distribution of coefficients. Through experimental results, EXPLORER shows the same performance (e.g., discrimination, calibration, feature selection, etc.) as the traditional frequentist logistic regression model, but provides more flexibility in model updating. That is, EXPLORER can be updated one point at a time rather than having to retrain the entire data set when new observations are recorded. The proposed EXPLORER supports asynchronized communication, which relieves the participants from coordinating with one another, and prevents service breakdown from the absence of participants or interrupted communications.

| S-EPMC3676314 | biostudies-literature

Random forest versus logistic regression: a large-scale benchmark experiment.

Project description: Background and goal: The Random Forest (RF) algorithm for regression and classification has considerably gained popularity since its introduction in 2001. Meanwhile, it has grown to a standard classification approach competing with logistic regression in many innovation-friendly scientific fields. Results: In this context, we present a large scale benchmarking experiment based on 243 real datasets comparing the prediction performance of the original version of RF with default parameters and LR as binary classification tools. Most importantly, the design of our benchmark experiment is inspired from clinical trial methodology, thus avoiding common pitfalls and major sources of biases. Conclusion: RF performed better than LR according to the considered accuracy measured in approximately 69% of the datasets. The mean difference between RF and LR was 0.029 (95%-CI = [0.022, 0.038]) for the accuracy, 0.041 (95%-CI = [0.031, 0.053]) for the Area Under the Curve, and -0.027 (95%-CI = [-0.034, -0.021]) for the Brier score, all measures thus suggesting a significantly better performance of RF. As a side-result of our benchmarking experiment, we observed that the results were noticeably dependent on the inclusion criteria used to select the example datasets, thus emphasizing the importance of clear statements regarding this dataset selection process. We also stress that neutral studies similar to ours, based on a high number of datasets and carefully designed, will be necessary in the future to evaluate further variants, implementations or parameters of random forests which may yield improved accuracy compared to the original version with default values.

| S-EPMC6050737 | biostudies-literature
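
The three measures reported in the entry above (accuracy, area under the ROC curve, and Brier score) can be computed as in the sketch below. The function names, the 0.5 classification threshold, and the four-sample toy data are assumptions for illustration and are not taken from the benchmark. Accuracy and AUC are better when higher, while the Brier score is better when lower, which is why a negative RF-minus-LR difference in Brier score favors RF.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and predicted probabilities (illustrative only).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(accuracy(y_true, y_prob), auc(y_true, y_prob), brier_score(y_true, y_prob))
```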

Predicting lncRNA-protein interactions through deep learning framework employing multiple features and random forest algorithm.

Project description:RNA-protein interaction (RPI) is crucial to the life processes of diverse organisms. Various researchers have identified RPI through long-term and high-cost biological experiments. Although numerous machine learning and deep learning-based methods for predicting RPI currently exist, their robustness and generalizability have significant room for improvement. This study proposes LPI-MFF, an RPI prediction model based on multi-source information fusion, to address these issues. The LPI-MFF employed protein-protein interactions features, sequence features, secondary structure features, and physical and chemical properties as the information sources with the corresponding coding scheme, followed by the random forest algorithm for feature screening. Finally, all information was combined and a classification method based on convolutional neural networks is used. The experimental results of fivefold cross-validation demonstrated that the accuracy of LPI-MFF on RPI1807 and NPInter was 97.60% and 97.67%, respectively. In addition, the accuracy rate on the independent test set RPI1168 was 84.9%, and the accuracy rate on the Mus musculus dataset was 90.91%. Accordingly, LPI-MFF demonstrated greater robustness and generalization than other prevalent RPI prediction methods.

| S-EPMC10929084 | biostudies-literature

Evaluating logistic regression and geographically weighted logistic regression models for predicting orange-fleshed sweet potato adoption intention in Benin.

Project description:The low adoption rate of biofortified crops, like orange-fleshed sweet potatoes (OFSP), by farmers remains a major food security concern. Accurate forecasting models for OFSP adoption intention are essential for breeding and introduction projects. This study aims to (i) identify key predictors of OFSP adoption intention among farmers in Benin, integrating various factors, and (ii) investigate regional variations in these predictors through different modeling approaches. We used a diverse set of predictors, including social, geographical, and psychological constructs, to model adoption intention in different sweet potato production areas in Benin. Both logistic regression (LR) and geographically weighted logistic regression (GWLR) models were developed and assessed. The GWLR model significantly outperformed the LR model, achieving a validated result of 94.2%, compared to 87% for the LR model. The GWLR model accurately identified areas with medium and high adoption propensities, mainly in northern Benin, aligning closely with observed data. Driving factors showed robust spatial heterogeneities, influencing OFSP adoption intentions differently across regions, with correlations ranging from positive to negative. The GWLR model excels in elucidating the spatial nuances of diverse factors, offering a promising avenue for more reliable predictions for OFSP adoption.

| S-EPMC11909162 | biostudies-literature

A random forest based computational model for predicting novel lncRNA-disease associations.

Project description: Background: Accumulated evidence shows that the abnormal regulation of long non-coding RNA (lncRNA) is associated with various human diseases. Accurately identifying disease-associated lncRNAs is helpful to study the mechanism of lncRNAs in diseases and explore new therapies of diseases. Many lncRNA-disease association (LDA) prediction models have been implemented by integrating multiple kinds of data resources. However, most of the existing models ignore the interference of noisy and redundancy information among these data resources. Results: To improve the ability of LDA prediction models, we implemented a random forest and feature selection based LDA prediction model (RFLDA in short). First, the RFLDA integrates the experiment-supported miRNA-disease associations (MDAs) and LDAs, the disease semantic similarity (DSS), the lncRNA functional similarity (LFS) and the lncRNA-miRNA interactions (LMI) as input features. Then, the RFLDA chooses the most useful features to train prediction model by feature selection based on the random forest variable importance score that takes into account not only the effect of individual feature on prediction results but also the joint effects of multiple features on prediction results. Finally, a random forest regression model is trained to score potential lncRNA-disease associations. In terms of the area under the receiver operating characteristic curve (AUC) of 0.976 and the area under the precision-recall curve (AUPR) of 0.779 under 5-fold cross-validation, the performance of the RFLDA is better than several state-of-the-art LDA prediction models. Moreover, case studies on three cancers demonstrate that 43 of the 45 lncRNAs predicted by the RFLDA are validated by experimental data, and the other two predicted lncRNAs are supported by other LDA prediction models. Conclusions: Cross-validation and case studies indicate that the RFLDA has excellent ability to identify potential disease-associated lncRNAs.

| S-EPMC7099795 | biostudies-literature
