Browse
Submit Data
Databases
API
Help

Dataset Information

10 Views

0 Connections

0 Citations

0 Reanalyses

0 Downloads

Omics score: 0

ILearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization.

ABSTRACT: Sequence-based analysis and prediction are fundamental bioinformatic tasks that facilitate understanding of the sequence(-structure)-function paradigm for DNAs, RNAs and proteins. Rapid accumulation of sequences requires equally pervasive development of new predictive models, which depends on the availability of effective tools that support these efforts. We introduce iLearnPlus, the first machine-learning platform with graphical- and web-based interfaces for the construction of machine-learning pipelines for analysis and predictions using nucleic acid and protein sequences. iLearnPlus provides a comprehensive set of algorithms and automates sequence-based feature extraction and analysis, construction and deployment of models, assessment of predictive performance, statistical analysis, and data visualization; all without programming. iLearnPlus includes a wide range of feature sets which encode information from the input sequences and over twenty machine-learning algorithms that cover several deep-learning approaches, outnumbering the current solutions by a wide margin. Our solution caters to experienced bioinformaticians, given the broad range of options, and biologists with no programming background, given the point-and-click interface and easy-to-follow design process. We showcase iLearnPlus with two case studies concerning prediction of long noncoding RNAs (lncRNAs) from RNA transcripts and prediction of crotonylation sites in protein chains. iLearnPlus is an open-source platform available at https://github.com/Superzchen/iLearnPlus/ with the webserver at http://ilearnplus.erc.monash.edu/.

SUBMITTER: Chen Z

PROVIDER: S-EPMC8191785 | biostudies-literature |

REPOSITORIES: biostudies-literature

ACCESS DATA

Json Xml

Similar Datasets

Prediction of Breast Cancer Estrogen Receptor Status using Machine Learning

Project description:Gene expression profiles were generated from 199 primary breast cancer patients. Samples 1-176 were used in another study, GEO Series GSE22820, and form the training data set in this study. Sample numbers 200-222 form a validation set. This data is used to model a machine learning classifier for Estrogen Receptor Status. RNA was isolated from 199 primary breast cancer patients. A machine learning classifier was built to predict ER status using only three gene features.

2013-01-01 | E-GEOD-29210 | biostudies-arrayexpress

Comprehensive hepatotoxicity prediction: ensemble model integrating machine learning and deep learning.

Project description:Chemicals may lead to acute liver injuries, posing a serious threat to human health. Achieving the precise safety profile of a compound is challenging due to the complex and expensive testing procedures. In silico approaches will aid in identifying the potential risk of drug candidates in the initial stage of drug development and thus mitigating the developmental cost. In current studies, QSAR models were developed for hepatotoxicity predictions using the ensemble strategy to integrate machine learning (ML) and deep learning (DL) algorithms using various molecular features. A large dataset of 2588 chemicals and drugs was randomly divided into training (80%) and test (20%) sets, followed by the training of individual base models using diverse machine learning or deep learning based on three different kinds of descriptors and fingerprints. Feature selection approaches were employed to proceed with model optimizations based on the model performance. Hybrid ensemble approaches were further utilized to determine the method with the best performance. The voting ensemble classifier emerged as the optimal model, achieving an excellent prediction accuracy of 80.26%, AUC of 82.84%, and recall of over 93% followed by bagging and stacking ensemble classifiers method. The model was further verified by an external test set, internal 10-fold cross-validation, and rigorous benchmark training, exhibiting much better reliability than the published models. The proposed ensemble model offers a dependable assessment with a good performance for the prediction regarding the risk of chemicals and drugs to induce liver damage.

| S-EPMC11373136 | biostudies-literature

Machine learning-based e-commerce platform repurchase customer prediction model.

Project description:In recent years, China's e-commerce industry has developed at a high speed, and the scale of various industries has continued to expand. Service-oriented enterprises such as e-commerce transactions and information technology came into being. This paper analyzes the shortcomings and challenges of traditional online shopping behavior prediction methods, and proposes an online shopping behavior analysis and prediction system. The paper chooses linear model logistic regression and decision tree based XGBoost model. After optimizing the model, it is found that the nonlinear model can make better use of these features and get better prediction results. In this paper, we first combine the single model, and then use the model fusion algorithm to fuse the prediction results of the single model. The purpose is to avoid the accuracy of the linear model easy to fit and the decision tree model over-fitting. The results show that the model constructed by the article has further improvement than the single model. Finally, through two sets of contrast experiments, it is proved that the algorithm selected in this paper can effectively filter the features, which simplifies the complexity of the model to a certain extent and improves the classification accuracy of machine learning. The XGBoost hybrid model based on p/n samples is simpler than a single model. Machine learning models are not easily over-fitting and therefore more robust.

| S-EPMC7714352 | biostudies-literature

PASSer2.0: Accurate Prediction of Protein Allosteric Sites Through Automated Machine Learning.

Project description:Allostery is a fundamental process in regulating protein activities. The discovery, design, and development of allosteric drugs demand better identification of allosteric sites. Several computational methods have been developed previously to predict allosteric sites using static pocket features and protein dynamics. Here, we define a baseline model for allosteric site prediction and present a computational model using automated machine learning. Our model, PASSer2.0, advanced the previous results and performed well across multiple indicators with 82.7% of allosteric pockets appearing among the top three positions. The trained machine learning model has been integrated with the Protein Allosteric Sites Server (PASSer) to facilitate allosteric drug discovery.

| S-EPMC9309527 | biostudies-literature

Nucleic acid visualization with UCSF Chimera.

Project description:With the increase in the number of large, 3D, high-resolution nucleic acid structures, particularly of the 30S and 50S ribosomal subunits and the intact bacterial ribosome, advancements in the visualization of nucleic acid structural features are essential. Large molecular structures are complicated and detailed, and one goal of visualization software is to allow the user to simplify the display of some features and accent others. We describe an extension to the UCSF Chimera molecular visualization system for the purpose of displaying and highlighting nucleic acid characteristics, including a new representation of sugar pucker, several options for abstraction of base geometries that emphasize stacking and base pairing, and an adaptation of the ribbon backbone to accommodate the nucleic acid backbone. Molecules are displayed and manipulated interactively, allowing the user to change the representations as desired for small molecules, proteins and nucleic acids. This software is available as part of the UCSF Chimera molecular visualization system and thus is integrated with a suite of existing tools for molecular graphics.

| S-EPMC1368656 | biostudies-literature

Microfluidic platform for isolating nucleic acid targets using sequence specific hybridization.

Project description:The separation of target nucleic acid sequences from biological samples has emerged as a significant process in today's diagnostics and detection strategies. In addition to the possible clinical applications, the fundamental understanding of target and sequence specific hybridization on surface modified magnetic beads is of high value. In this paper, we describe a novel microfluidic platform that utilizes a mobile magnetic field in static microfluidic channels, where single stranded DNA (ssDNA) molecules are isolated via nucleic acid hybridization. We first established efficient isolation of biotinylated capture probe (BP) using streptavidin-coated magnetic beads. Subsequently, we investigated the hybridization of target ssDNA with BP bound to beads and explained these hybridization kinetics using a dual-species kinetic model. The number of hybridized target ssDNA molecules was determined to be about 6.5 times less than that of BP on the bead surface, due to steric hindrance effects. The hybridization of target ssDNA with non-complementary BP bound to bead was also examined, and non-specific hybridization was found to be insignificant. Finally, we demonstrated highly efficient capture and isolation of target ssDNA in the presence of non-target ssDNA, where as low as 1% target ssDNA can be detected from mixture. The microfluidic method described in this paper is significantly relevant and is broadly applicable, especially towards point-of-care biological diagnostic platforms that require binding and separation of known target biomolecules, such as RNA, ssDNA, or protein.

| S-EPMC3745474 | biostudies-other

Automated machine learning for endemic active tuberculosis prediction from multiplex serological data.

Project description:Serological diagnosis of active tuberculosis (TB) is enhanced by detection of multiple antibodies due to variable immune responses among patients. Clinical interpretation of these complex datasets requires development of suitable algorithms, a time consuming and tedious undertaking addressed by the automated machine learning platform MILO (Machine Intelligence Learning Optimizer). MILO seamlessly integrates data processing, feature selection, model training, and model validation to simultaneously generate and evaluate thousands of models. These models were then further tested for generalizability on out-of-sample secondary and tertiary datasets. Out of 31 antigens evaluated, a 23-antigen model was the most robust on both the secondary dataset (TB vs healthy) and the tertiary dataset (TB vs COPD) with sensitivity of 90.5% and respective specificities of 100.0% and 74.6%. MILO represents a user-friendly, end-to-end solution for automated generation and deployment of optimized models, ideal for applications where rapid clinical implementation is critical such as emerging infectious diseases.

| S-EPMC8429671 | biostudies-literature

Automated prediction of mastitis infection patterns in dairy herds using machine learning.

Project description:Mastitis in dairy cattle is extremely costly both in economic and welfare terms and is one of the most significant drivers of antimicrobial usage in dairy cattle. A critical step in the prevention of mastitis is the diagnosis of the predominant route of transmission of pathogens into either contagious (CONT) or environmental (ENV), with environmental being further subdivided as transmission during either the nonlactating "dry" period (EDP) or lactating period (EL). Using data from 1000 farms, random forest algorithms were able to replicate the complex herd level diagnoses made by specialist veterinary clinicians with a high degree of accuracy. An accuracy of 98%, positive predictive value (PPV) of 86% and negative predictive value (NPV) of 99% was achieved for the diagnosis of CONT vs ENV (with CONT as a "positive" diagnosis), and an accuracy of 78%, PPV of 76% and NPV of 81% for the diagnosis of EDP vs EL (with EDP as a "positive" diagnosis). An accurate, automated mastitis diagnosis tool has great potential to aid non-specialist veterinary clinicians to make a rapid herd level diagnosis and promptly implement appropriate control measures for an extremely damaging disease in terms of animal health, productivity, welfare and antimicrobial use.

| S-EPMC7062853 | biostudies-literature

Prediction of Breast Cancer Estrogen Receptor Status using Machine Learning

2013-01-01 | GSE29210 | GEO

Quantifying Biopolymer Sequence Recognition using Biophysically Informed Machine Learning

Project description:Quantifying sequence-specific protein-ligand interactions is critical for understanding and exploiting numerous cellular processes, including gene expression regulation and signal transduction. Given their importance, next-generation sequencing (NGS) based assays that characterize such recognition with high-throughput are increasingly being used to profile a range of protein classes and interactions. However, these methods do not measure the biophysical parameters that have long been used to uncover the quantitative rules underlying sequence recognition. We developed a highly flexible machine learning framework, called ProBound, to quantify sequence recognition in terms of biophysical parameters based on NGS data. ProBound quantifies transcription factor (TF) behavior with models that accurately predict binding affinity over a range exceeding that of previous resources, captures the impact of DNA modifications and conformational flexibility of multi-TF complexes, and infers specificity directly from \textit{in vivo} data such as ChIP-seq without peak calling. When coupled with a new assay called Kd-seq, it quantifies the absolute affinity of protein-ligand interactions. Its applicability extends beyond thermodynamic equilibrium binding, to the kinetics of kinase-substrate interactions. Altogether, ProBound provides a versatile algorithmic framework for understanding sequence recognition in a wide variety of biological contexts.

2021-06-02 | GSE175942 | GEO

OmicsDI is part of the ELIXIR infrastructure

OmicsDI is an Elixir interoperability service. Learn more ›

Tweets

OmicsDI Databases

PRIDE
PeptideAtlas
MassIVE
JPOST Repository
Physiome Model Repository

EGA
EVA
ENA
LINCS
PAXDB
Cell Collective

MetaboLights
Metabolomics Workbench
MetabolomeExpress
GNPS
BioModels
FAIRDOMHub

ArrayExpress
dbGaP
ExpressionAtlas
GEO
NODE

Information

Databases
Help
API
Contact us
Code on GitHub
Terms of use
Submit Data