Dataset Information

Block-wise Exploration of Molecular Descriptors with Multi-block Orthogonal Component Analysis (MOCA).


ABSTRACT: Data tables for machine learning and quantitative structure-activity relationship (QSAR) modelling are often naturally organized in blocks, where each block is a molecular representation or a set of descriptors. Multi-block Orthogonal Component Analysis (MOCA), a new analytical tool, can be used to explore such data structures in a single model, identifying principal components that are unique to a single block or joint over multiple blocks. We applied MOCA to two sets of 550 and 300 molecules and up to 9213 molecular descriptors organized in 11 blocks. The MOCA models reveal relationships between the blocks and overarching trends across the whole dataset. Based on the MOCA joint components, we propose a quantitative metric for the redundancy of blocks, useful for a priori block-wise feature selection or for evaluating new molecular representations. The second data set includes 7 ecotoxicological study endpoints for crop protection chemicals, for which we (re-)discovered some general trends and linked them to molecular properties. Using a single MOCA model, we estimated the predictive potential of each block and the modelability of the target block.
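The idea of variation that is joint over several descriptor blocks, as opposed to unique to one block, can be illustrated with a small sketch. This is not the MOCA algorithm, which the abstract does not specify. It is a simplified stand-in that runs a separate PCA on each block and compares the resulting score subspaces through their principal angles. The function names (`block_scores`, `subspace_overlap`), the synthetic blocks, and the overlap measure itself are all assumptions made for illustration and do not reproduce the redundancy metric proposed by the authors.

```python
import numpy as np

def block_scores(X, n_components):
    """Orthonormal basis of the leading principal-component scores of one block."""
    Xc = X - X.mean(axis=0)  # column-centre the descriptors
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components]

def subspace_overlap(X_a, X_b, n_components=2):
    """Mean squared cosine of the principal angles between two score subspaces.

    Values near 1 suggest the blocks carry largely the same leading variation;
    values near 0 suggest largely unrelated variation. (Illustrative measure only.)
    """
    Q_a = block_scores(X_a, n_components)
    Q_b = block_scores(X_b, n_components)
    cosines = np.linalg.svd(Q_a.T @ Q_b, compute_uv=False)
    return float(np.mean(cosines ** 2))

# Synthetic example: blocks A and B are driven by the same two latent factors,
# block C is pure noise. Rows play the role of molecules, columns of descriptors.
rng = np.random.default_rng(0)
n_molecules = 200
shared = rng.normal(size=(n_molecules, 2))
block_a = shared @ rng.normal(size=(2, 30)) + 0.1 * rng.normal(size=(n_molecules, 30))
block_b = shared @ rng.normal(size=(2, 50)) + 0.1 * rng.normal(size=(n_molecules, 50))
block_c = rng.normal(size=(n_molecules, 40))

overlap_ab = subspace_overlap(block_a, block_b)
overlap_ac = subspace_overlap(block_a, block_c)
print(f"A vs B overlap: {overlap_ab:.3f}")
print(f"A vs C overlap: {overlap_ac:.3f}")
```

In this toy setting the A-B overlap should come out much higher than the A-C overlap, which mirrors the intuition that a block whose leading variation is already captured by another block adds little new information.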

SUBMITTER: Schmidt S 

PROVIDER: S-EPMC9285065 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC6090891 | biostudies-other
| S-EPMC2873089 | biostudies-literature
| S-EPMC7999099 | biostudies-literature
| PRJEB15590 | ENA
| S-EPMC4736107 | biostudies-literature
| S-EPMC7666528 | biostudies-literature
| S-EPMC4908129 | biostudies-literature