Browse
Submit Data
Databases
API
Help

Dataset Information

21 Views

0 Connections

0 Citations

0 Reanalyses

0 Downloads

Omics score: 0

Corn Yield Prediction With Ensemble CNN-DNN.

ABSTRACT: We investigate the predictive performance of two novel CNN-DNN machine learning ensemble models in predicting county-level corn yields across the US Corn Belt (12 states). The developed data set is a combination of management, environment, and historical corn yields from 1980 to 2019. Two scenarios for ensemble creation are considered: homogenous and heterogenous ensembles. In homogenous ensembles, the base CNN-DNN models are all the same, but they are generated with a bagging procedure to ensure they exhibit a certain level of diversity. Heterogenous ensembles are created from different base CNN-DNN models which share the same architecture but have different hyperparameters. Three types of ensemble creation methods were used to create several ensembles for either of the scenarios: Basic Ensemble Method (BEM), Generalized Ensemble Method (GEM), and stacked generalized ensembles. Results indicated that both designed ensemble types (heterogenous and homogenous) outperform the ensembles created from five individual ML models (linear regression, LASSO, random forest, XGBoost, and LightGBM). Furthermore, by introducing improvements over the heterogenous ensembles, the homogenous ensembles provide the most accurate yield predictions across US Corn Belt states. This model could make 2019 yield predictions with a root mean square error of 866 kg/ha, equivalent to 8.5% relative root mean square and could successfully explain about 77% of the spatio-temporal variation in the corn grain yields. The significant predictive power of this model can be leveraged for designing a reliable tool for corn yield prediction which will in turn assist agronomic decision makers.

SUBMITTER: Shahhosseini M

PROVIDER: S-EPMC8364956 | biostudies-literature |

REPOSITORIES: biostudies-literature

ACCESS DATA

Json Xml

Similar Datasets

A CNN-RNN Framework for Crop Yield Prediction.

Project description:Crop yield prediction is extremely challenging due to its dependence on multiple factors such as crop genotype, environmental factors, management practices, and their interactions. This paper presents a deep learning framework using convolutional neural networks (CNNs) and recurrent neural networks (RNNs) for crop yield prediction based on environmental data and management practices. The proposed CNN-RNN model, along with other popular methods such as random forest (RF), deep fully connected neural networks (DFNN), and LASSO, was used to forecast corn and soybean yield across the entire Corn Belt (including 13 states) in the United States for years 2016, 2017, and 2018 using historical data. The new model achieved a root-mean-square-error (RMSE) 9% and 8% of their respective average yields, substantially outperforming all other methods that were tested. The CNN-RNN has three salient features that make it a potentially useful method for other crop yield prediction studies. (1) The CNN-RNN model was designed to capture the time dependencies of environmental factors and the genetic improvement of seeds over time without having their genotype information. (2) The model demonstrated the capability to generalize the yield prediction to untested environments without significant drop in the prediction accuracy. (3) Coupled with the backpropagation method, the model could reveal the extent to which weather conditions, accuracy of weather predictions, soil conditions, and management practices were able to explain the variation in the crop yields.

| S-EPMC6993602 | biostudies-literature

County-Level Soybean Yield Prediction Using Deep CNN-LSTM Model.

Project description:Yield prediction is of great significance for yield mapping, crop market planning, crop insurance, and harvest management. Remote sensing is becoming increasingly important in crop yield prediction. Based on remote sensing data, great progress has been made in this field by using machine learning, especially the Deep Learning (DL) method, including Convolutional Neural Network (CNN) or Long Short-Term Memory (LSTM). Recent experiments in this area suggested that CNN can explore more spatial features and LSTM has the ability to reveal phenological characteristics, which both play an important role in crop yield prediction. However, very few experiments combining these two models for crop yield prediction have been reported. In this paper, we propose a deep CNN-LSTM model for both end-of-season and in-season soybean yield prediction in CONUS at the county-level. The model was trained by crop growth variables and environment variables, which include weather data, MODIS Land Surface Temperature (LST) data, and MODIS Surface Reflectance (SR) data; historical soybean yield data were employed as labels. Based on the Google Earth Engine (GEE), all these training data were combined and transformed into histogram-based tensors for deep learning. The results of the experiment indicate that the prediction performance of the proposed CNN-LSTM model can outperform the pure CNN or LSTM model in both end-of-season and in-season. The proposed method shows great potential in improving the accuracy of yield prediction for other crops like corn, wheat, and potatoes at fine scales in the future.

| S-EPMC6832950 | biostudies-literature

Investigation of REFINED CNN ensemble learning for anti-cancer drug sensitivity prediction.

Project description:MotivationAnti-cancer drug sensitivity prediction using deep learning models for individual cell line is a significant challenge in personalized medicine. Recently developed REFINED (REpresentation of Features as Images with NEighborhood Dependencies) CNN (Convolutional Neural Network)-based models have shown promising results in improving drug sensitivity prediction. The primary idea behind REFINED-CNN is representing high dimensional vectors as compact images with spatial correlations that can benefit from CNN architectures. However, the mapping from a high dimensional vector to a compact 2D image depends on the a priori choice of the distance metric and projection scheme with limited empirical procedures guiding these choices.ResultsIn this article, we consider an ensemble of REFINED-CNN built under different choices of distance metrics and/or projection schemes that can improve upon a single projection based REFINED-CNN model. Results, illustrated using NCI60 and NCI-ALMANAC databases, demonstrate that the ensemble approaches can provide significant improvement in prediction performance as compared to individual models. We also develop the theoretical framework for combining different distance metrics to arrive at a single 2D mapping. Results demonstrated that distance-averaged REFINED-CNN produced comparable performance as obtained from stacking REFINED-CNN ensemble but with significantly lower computational cost.Availability and implementationThe source code, scripts, and data used in the paper have been deposited in GitHub (https://github.com/omidbazgirTTU/IntegratedREFINED).Supplementary informationSupplementary data are available at Bioinformatics online.

| S-EPMC8275339 | biostudies-literature

GCSENet: A GCN, CNN and SENet ensemble model for microRNA-disease association prediction.

Project description:Recently, an increasing number of studies have demonstrated that miRNAs are involved in human diseases, indicating that miRNAs might be a potential pathogenic factor for various diseases. Therefore, figuring out the relationship between miRNAs and diseases plays a critical role in not only the development of new drugs, but also the formulation of individualized diagnosis and treatment. As the prediction of miRNA-disease association via biological experiments is expensive and time-consuming, computational methods have a positive effect on revealing the association. In this study, a novel prediction model integrating GCN, CNN and Squeeze-and-Excitation Networks (GCSENet) was constructed for the identification of miRNA-disease association. The model first captured features by GCN based on a heterogeneous graph including diseases, genes and miRNAs. Then, considering the different effects of genes on each type of miRNA and disease, as well as the different effects of the miRNA-gene and disease-gene relationships on miRNA-disease association, a feature weight was set and a combination of miRNA-gene and disease-gene associations was added as feature input for the convolution operation in CNN. Furthermore, the squeeze and excitation blocks of SENet were applied to determine the importance of each feature channel and enhance useful features by means of the attention mechanism, thus achieving a satisfactory prediction of miRNA-disease association. The proposed method was compared against other state-of-the-art methods. It achieved an AUROC score of 95.02% and an AUPR score of 95.55% in a 10-fold cross-validation, which led to the finding that the proposed method is superior to these popular methods on most of the performance evaluation indexes.

| S-EPMC8205154 | biostudies-literature

SaPt-CNN-LSTM-AR-EA: a hybrid ensemble learning framework for time series-based multivariate DNA sequence prediction.

Project description:Biological sequence data mining is hot spot in bioinformatics. A biological sequence can be regarded as a set of characters. Time series is similar to biological sequences in terms of both representation and mechanism. Therefore, in the article, biological sequences are represented with time series to obtain biological time sequence (BTS). Hybrid ensemble learning framework (SaPt-CNN-LSTM-AR-EA) for BTS is proposed. Single-sequence and multi-sequence models are respectively constructed with self-adaption pre-training one-dimensional convolutional recurrent neural network and autoregressive fractional integrated moving average fused evolutionary algorithm. In DNA sequence experiments with six viruses, SaPt-CNN-LSTM-AR-EA realized the good overall prediction performance and the prediction accuracy and correlation respectively reached 1.7073 and 0.9186. SaPt-CNN-LSTM-AR-EA was compared with other five benchmark models so as to verify its effectiveness and stability. SaPt-CNN-LSTM-AR-EA increased the average accuracy by about 30%. The framework proposed in this article is significant in biology, biomedicine, and computer science, and can be widely applied in sequence splicing, computational biology, bioinformation, and other fields.

| S-EPMC10559882 | biostudies-literature

Coupling machine learning and crop modeling improves crop yield prediction in the US Corn Belt.

Project description:This study investigates whether coupling crop modeling and machine learning (ML) improves corn yield predictions in the US Corn Belt. The main objectives are to explore whether a hybrid approach (crop modeling + ML) would result in better predictions, investigate which combinations of hybrid models provide the most accurate predictions, and determine the features from the crop modeling that are most effective to be integrated with ML for corn yield prediction. Five ML models (linear regression, LASSO, LightGBM, random forest, and XGBoost) and six ensemble models have been designed to address the research question. The results suggest that adding simulation crop model variables (APSIM) as input features to ML models can decrease yield prediction root mean squared error (RMSE) from 7 to 20%. Furthermore, we investigated partial inclusion of APSIM features in the ML prediction models and we found soil moisture related APSIM variables are most influential on the ML predictions followed by crop-related and phenology-related variables. Finally, based on feature importance measure, it has been observed that simulated APSIM average drought stress and average water table depth during the growing season are the most important APSIM inputs to ML. This result indicates that weather information alone is not sufficient and ML models need more hydrological inputs to make improved yield predictions.

| S-EPMC7810832 | biostudies-literature

Integrating plant morphological traits with remote-sensed multispectral imageries for accurate corn grain yield prediction.

Project description:Sustainable crop production requires adequate and efficient management practices to reduce the negative environmental impacts of excessive nitrogen (N) fertilization. Remote sensing has gained traction as a low-cost and time-efficient tool for monitoring and managing cropping systems. In this study, vegetation indices (VIs) obtained from an unmanned aerial vehicle (UAV) were used to detect corn (Zea mays L.) response to varying N rates (ranging from 0 to 208 kg N ha-1) and fertilizer application methods (liquid urea ammonium nitrate (UAN), urea side-dressing and slow-release fertilizer). Four VIs were evaluated at three different growth stages of corn (V6, R3, and physiological maturity) along with morphological traits including plant height and leaf chlorophyll content (SPAD) to determine their predictive capability for corn yield. Our results show no differences in grain yield (average 13.2 Mg ha-1) between furrow-applied slow-release fertilizer at ≥156 kg N ha-1 and 208 kg N ha-1 side-dressed urea. Early season remote-sensed VIs and morphological data collected at V6 were least effective for grain yield prediction. Moreover, multivariate grain yield prediction was more accurate than univariate. Late-season measurements at the R3 and mature growth stages using a combination of normalized difference vegetation index (NDVI) and green normalized difference vegetation index (GNDVI) in a multilinear regression model showed effective prediction for corn yield. Additionally, a combination of NDVI and normalized difference red edge index (NDRE) in a multi-exponential regression model also demonstrated good prediction capabilities.

| S-EPMC10986971 | biostudies-literature

An ensemble approach for imbalanced multiclass malware classification using 1D-CNN.

Project description:Dependence on the internet and computer programs demonstrates the significance of computer programs in our day-to-day lives. Such demands motivate malware developers to create more malware, both in terms of quantity and variety. Researchers are constantly faced with hurdles while attempting to protect themselves from potential hazards and risks due to malware authors' usage of code obfuscation techniques. Metamorphic and polymorphic variations are easily able to elude the widely utilized signature-based detection procedures. Researchers are more interested in deep learning approaches than machine learning techniques to analyze the behavior of such a vast number of virus variants. Researchers have been drawn to the categorization of malware within itself in addition to the classification of malware against benign programs to examine the behavioral differences between them. In order to investigate the relationship between the application programming interface (API) calls throughout API sequences and classify them, this work uses the one-dimensional convolutional neural network (1D-CNN) model to solve a multiclass classification problem. On API sequences, feature vectors for distinctive APIs are created using the Word2Vec word embedding approach and the skip-gram model. The one-vs.-rest approach is used to train 1D-CNN models to categorize malware, and all of them are then combined with a suggested ModifiedSoftVoting algorithm to improve classification. On the open benchmark dataset Mal-API-2019, the suggested ensembled 1D-CNN architecture captures improved evaluation scores with an accuracy of 0.90, a weighted average F1-score of 0.90, and an AUC score of more than 0.96 for all classes of malware.

| S-EPMC10702937 | biostudies-literature

Gene prediction of aging-related diseases based on DNN and Mashup.

Project description:BackgroundAt present, the bioinformatics research on the relationship between aging-related diseases and genes is mainly through the establishment of a machine learning multi-label model to classify each gene. Most of the existing methods for predicting pathogenic genes mainly rely on specific types of gene features, or directly encode multiple features with different dimensions, use the same encoder to concatenate and predict the final results, which will be subject to many limitations in the applicability of the algorithm. Possible shortcomings of the above include: incomplete coverage of gene features by a single type of biomics data, overfitting of small dimensional datasets by a single encoder, or underfitting of larger dimensional datasets.MethodsWe use the known gene disease association data and gene descriptors, such as gene ontology terms (GO), protein interaction data (PPI), PathDIP, Kyoto Encyclopedia of genes and genomes Genes (KEGG), etc, as input for deep learning to predict the association between genes and diseases. Our innovation is to use Mashup algorithm to reduce the dimensionality of PPI, GO and other large biological networks, and add new pathway data in KEGG database, and then combine a variety of biological information sources through modular Deep Neural Network (DNN) to predict the genes related to aging diseases.Result and conclusionThe results show that our algorithm is more effective than the standard neural network algorithm (the Area Under the ROC curve from 0.8795 to 0.9153), gradient enhanced tree classifier and logistic regression classifier. In this paper, we firstly use DNN to learn the similar genes associated with the known diseases from the complex multi-dimensional feature space, and then provide the evidence that the assumed genes are associated with a certain disease.

| S-EPMC8680025 | biostudies-literature

EP-DNN: A Deep Neural Network-Based Global Enhancer Prediction Algorithm.

Project description:We present EP-DNN, a protocol for predicting enhancers based on chromatin features, in different cell types. Specifically, we use a deep neural network (DNN)-based architecture to extract enhancer signatures in a representative human embryonic stem cell type (H1) and a differentiated lung cell type (IMR90). We train EP-DNN using p300 binding sites, as enhancers, and TSS and random non-DHS sites, as non-enhancers. We perform same-cell and cross-cell predictions to quantify the validation rate and compare against two state-of-the-art methods, DEEP-ENCODE and RFECS. We find that EP-DNN has superior accuracy with a validation rate of 91.6%, relative to 85.3% for DEEP-ENCODE and 85.5% for RFECS, for a given number of enhancer predictions and also scales better for a larger number of enhancer predictions. Moreover, our H1 → IMR90 predictions turn out to be more accurate than IMR90 → IMR90, potentially because H1 exhibits a richer signature set and our EP-DNN model is expressive enough to extract these subtleties. Our work shows how to leverage the full expressivity of deep learning models, using multiple hidden layers, while avoiding overfitting on the training data. We also lay the foundation for exploration of cross-cell enhancer predictions, potentially reducing the need for expensive experimentation.

| S-EPMC5144062 | biostudies-literature

OmicsDI is part of the ELIXIR infrastructure

OmicsDI is an Elixir interoperability service. Learn more ›

Tweets

OmicsDI Databases

PRIDE
PeptideAtlas
MassIVE
JPOST Repository
Physiome Model Repository

EGA
EVA
ENA
LINCS
PAXDB
Cell Collective

MetaboLights
Metabolomics Workbench
MetabolomeExpress
GNPS
BioModels
FAIRDOMHub

ArrayExpress
dbGaP
ExpressionAtlas
GEO
NODE

Information

Databases
Help
API
Contact us
Code on GitHub
Terms of use
Submit Data