Dataset Information

MoleculeNet: a benchmark for molecular machine learning.

ABSTRACT: Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.

SUBMITTER: Wu Z

PROVIDER: S-EPMC5868307 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature

Publications

MoleculeNet: a benchmark for molecular machine learning.

Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay Pande

Chemical Science, 2017-10-31, issue 2

PMID: 29629118

Similar Datasets

Project description: Objective: Tracking seizures is crucial for epilepsy monitoring and treatment evaluation. Current epilepsy care relies on caretaker seizure diaries, but clinical seizure monitoring may miss seizures. Wearable devices may be better tolerated and more suitable for long-term ambulatory monitoring. This study evaluates the seizure detection performance of custom-developed machine learning (ML) algorithms across a broad spectrum of epileptic seizures utilizing wrist- and ankle-worn multisignal biosensors. Methods: We enrolled patients admitted to the epilepsy monitoring unit and asked them to wear a wearable sensor on either their wrists or ankles. The sensor recorded body temperature, electrodermal activity, accelerometry (ACC), and photoplethysmography, which provides blood volume pulse (BVP). We used electroencephalographic seizure onset and offset as determined by a board-certified epileptologist as a standard comparison. We trained and validated ML for two different algorithms: Algorithm 1, ML methods for developing seizure type-specific detection models for nine individual seizure types; and Algorithm 2, ML methods for building general seizure type-agnostic detection, lumping together all seizure types. Results: We included 94 patients (57.4% female, median age = 9.9 years) and 548 epileptic seizures (11 066 h of sensor data) for a total of 930 seizures and nine seizure types. Algorithm 1 detected eight of nine seizure types better than chance (area under the receiver operating characteristic curve [AUC-ROC] = .648-.976). Algorithm 2 detected all nine seizure types better than chance (AUC-ROC = .642-.995); a fusion of ACC and BVP modalities achieved the best AUC-ROC (.752) when combining all seizure types together. Significance: Automatic seizure detection using ML from multimodal wearable sensor data is feasible across a broad spectrum of epileptic seizures. Preliminary results show better than chance seizure detection. The next steps include validation of our results in larger datasets, evaluation of the detection utility tool for additional clinical seizure types, and integration of additional clinical information.
