Dataset Information

Hard Data Analytics Problems Make for Better Data Analysis Algorithms: Bioinformatics as an Example.

ABSTRACT: Data mining and knowledge discovery techniques have greatly progressed in the last decade. They are now able to handle larger and larger datasets, process heterogeneous information, integrate complex metadata, and extract and visualize new knowledge. Often these advances were driven by new challenges arising from real-world domains, with biology and biotechnology a prime source of diverse and hard (e.g., high volume, high throughput, high variety, and high noise) data analytics problems. The aim of this article is to show the broad spectrum of data mining tasks and challenges present in biological data, and how these challenges have driven us over the years to design new data mining and knowledge discovery procedures for biodata. This is illustrated with the help of two kinds of case studies. The first kind is focused on the field of protein structure prediction, where we have contributed in several areas: by designing, through regression, functions that can distinguish between good and bad models of a protein's predicted structure; by creating new measures to characterize aspects of a protein's structure associated with individual positions in a protein's sequence, measures containing information that might be useful for protein structure prediction; and by creating accurate estimators of these structural aspects. The second kind of case study is focused on omics data analytics, a class of biological data characterized by extremely high dimensionality. Our methods were able not only to generate very accurate classification models, but also to discover new biological knowledge that was later confirmed by experimentalists.
Finally, we describe several strategies to tightly integrate knowledge extraction and data mining in order to create a new class of biodata mining algorithms that can natively embrace the complexity of biological data, efficiently generate accurate information in the form of classification/regression models, and extract valuable new knowledge. Thus, a complete data-to-information-to-knowledge pipeline is presented.

SUBMITTER: Bacardit J

PROVIDER: S-EPMC4174911 | biostudies-literature | 2014 Sep

REPOSITORIES: biostudies-literature

Publications

Hard Data Analytics Problems Make for Better Data Analysis Algorithms: Bioinformatics as an Example.

Jaume Bacardit, Paweł Widera, Nicola Lazzarini, Natalio Krasnogor

Big Data, 2014-09-01, issue 3

PMID: 25276500

Similar Datasets

Abstractions, algorithms and data structures for structural bioinformatics in PyCogent. | S-EPMC3253748 | biostudies-literature

DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. | S-EPMC1933169 | biostudies-literature

Bitter and sweet make tomato hard to (b)eat. | S-EPMC8126962 | biostudies-literature

The recurrent clubfoot: can gait analysis help us make better preoperative decisions? | S-EPMC2664418 | biostudies-literature

Cord Blood Transplantation: Can We Make it Better? | S-EPMC3774998 | biostudies-literature

Managing, analysing, and integrating big data in medical bioinformatics: open problems and future perspectives. | S-EPMC4165507 | biostudies-literature

Can sign language make you better at hand processing? | S-EPMC5874053 | biostudies-literature

Ten simple rules to make your publication look better. | S-EPMC8136654 | biostudies-literature

Toward a better analysis of secreted proteins: the example of the myeloid cells secretome. | S-EPMC2386146 | biostudies-literature