Dataset Information

Exploratory data analysis of a clinical study group: Development of a procedure for exploring multidimensional data.

ABSTRACT: Thorough knowledge of the structure of the analyzed data makes it possible to form detailed scientific hypotheses and research questions. The structure of data can be revealed with methods for exploratory data analysis. Because of the multitude of available methods, selecting those that work well together and facilitate data interpretation is not an easy task. In this work we present a well-fitted set of tools for a complete exploratory analysis of a clinical dataset and perform a case study analysis on a set of 515 patients. The proposed procedure comprises several steps: 1) robust data normalization, 2) outlier detection with Mahalanobis distances (MD) and robust Mahalanobis distances (rMD), 3) hierarchical clustering with Ward's algorithm, 4) Principal Component Analysis with biplot vectors. The analyzed set comprised elderly patients who participated in the PolSenior project. Each patient was characterized by over 40 biochemical and socio-geographical attributes. Introductory analysis showed that the case-study dataset comprises two clusters separated along the axis of sex hormone attributes. Further analysis was carried out separately for male and female patients. The optimal partitioning of the male set resulted in five subgroups. Two of them were related to diseased patients: 1) diabetes and 2) hypogonadism patients. Analysis of the female set suggested that it was more homogeneous than the male set; no evidence of pathological patient subgroups was found in it. In the study we showed that outlier detection with MD and rMD not only identifies outliers but can also be used to assess the heterogeneity of a dataset. The case study proved that our procedure is well suited for identification and visualization of biologically meaningful patient subgroups.
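
The steps listed in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the data are synthetic stand-ins (the PolSenior attributes are not reproduced here), median/MAD scaling is only one possible choice of robust normalization, and the helper names `robust_normalize`, `mahalanobis_distances` and `pca_biplot` are hypothetical. Only steps 1, 2 (classical MD) and 4 are shown; the robust distances and Ward's clustering are left out and would typically come from a library, for example scikit-learn's `MinCovDet` and SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"`.

```python
import numpy as np


def robust_normalize(X):
    """Center each column on its median and scale by its MAD (an assumed robust choice)."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return (X - med) / (1.4826 * mad)


def mahalanobis_distances(X):
    """Classical Mahalanobis distance of every row from the sample mean."""
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))


def pca_biplot(X, n_components=2):
    """Return sample scores, attribute loading vectors and explained variance ratios."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # one biplot vector per attribute
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained


# Synthetic stand-in data: 200 "patients", 5 attributes, one planted outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[0] = 15.0

Z = robust_normalize(X)
md = mahalanobis_distances(Z)
scores, loadings, explained = pca_biplot(Z)

print("sample with the largest MD:", int(np.argmax(md)))
print("explained variance ratios:", explained)
```

In a biplot, `scores` would be drawn as points and each row of `loadings` as an arrow from the origin, so that attributes driving the separation between groups become visible.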

SUBMITTER: Konopka BM

PROVIDER: S-EPMC6107146 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature

Publications

Exploratory data analysis of a clinical study group: Development of a procedure for exploring multidimensional data.

Konopka Bogumil M, Lwow Felicja, Owczarz Magdalena, Łaczmański Łukasz

PLoS One, 2018-08-23, issue 8

PMID: 30138442

Similar Datasets

Project description: Background: Analysis of dynamic metabolomics data holds the promise to improve our understanding of underlying mechanisms in metabolism. For example, it may detect changes in metabolism due to the onset of a disease. Dynamic or time-resolved metabolomics data can be arranged as a three-way array with entries organized according to a subjects mode, a metabolites mode and a time mode. While such time-evolving multiway data sets are increasingly collected, revealing the underlying mechanisms and their dynamics from such data remains challenging. One of the complexities of such data is the superposition of several sources of variation: induced variation (due to experimental conditions or inborn errors), individual variation, and measurement error. Multiway data analysis (also known as tensor factorization) has been successfully used in data mining to find the underlying patterns in multiway data. To explore how well multiway data analysis methods reveal the underlying mechanisms in dynamic metabolomics data, simulated data with known ground truth can be studied. Results: We focus on simulated data arising from dynamic models of increasing complexity, i.e., a simple linear system, a yeast glycolysis model, and a human cholesterol model. We generate data with induced variation as well as individual variation. Systematic experiments are performed to demonstrate the advantages and limitations of multiway data analysis in analyzing such dynamic metabolomics data and its capacity to disentangle the different sources of variation. We use simulations because knowing the ground truth makes it easier to assess the capability of multiway data analysis methods. Conclusion: Our numerical experiments demonstrate that despite the increasing complexity of the studied dynamic metabolic models, the tensor factorization methods CANDECOMP/PARAFAC (CP) and Parallel Profiles with Linear Dependences (Paralind) can disentangle the sources of variation and thereby reveal the underlying mechanisms and their dynamics.
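
As a rough illustration of the CP model named above, the sketch below fits a CANDECOMP/PARAFAC decomposition to a synthetic subjects × metabolites × time array with plain alternating least squares. It is not the method or the data of the described project: the array sizes, the rank, and the helper names `khatri_rao` and `cp_als` are assumptions made for this example, and dedicated tensor libraries would normally be preferred.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product; rows are indexed by (j, k) pairs."""
    return np.einsum("jr,kr->jkr", B, C).reshape(-1, B.shape[1])


def cp_als(T, rank, n_iter=100, seed=0):
    """Fit T[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.normal(size=(I, rank))
    B = rng.normal(size=(J, rank))
    C = rng.normal(size=(K, rank))
    errors = []
    for _ in range(n_iter):
        # Each update solves a linear least-squares problem for one factor matrix
        # while the other two are held fixed.
        A = T.reshape(I, J * K) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = T.transpose(1, 0, 2).reshape(J, I * K) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = T.transpose(2, 0, 1).reshape(K, I * J) @ np.linalg.pinv(khatri_rao(A, B).T)
        recon = np.einsum("ir,jr,kr->ijk", A, B, C)
        errors.append(np.linalg.norm(T - recon) / np.linalg.norm(T))
    return A, B, C, errors


# Synthetic rank-2 array: 10 subjects, 6 metabolites, 8 time points (assumed sizes).
rng = np.random.default_rng(1)
A0 = rng.normal(size=(10, 2))
B0 = rng.normal(size=(6, 2))
C0 = rng.normal(size=(8, 2))
T = np.einsum("ir,jr,kr->ijk", A0, B0, C0)

A, B, C, errors = cp_als(T, rank=2)
print("relative error after first and last sweep:", errors[0], errors[-1])
```

Each column of `A`, `B` and `C` describes one component along the subjects, metabolites and time modes respectively, which is what allows a CP model to separate sources of variation that share a common temporal or metabolite profile.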

Project description: BACKGROUND: A number of research funders, biomedical journals, pharmaceutical companies, and regulatory agencies have adopted policies advocating or mandating that clinical trialists share data with external investigators. We therefore sought to determine whether certain characteristics of trialists or their trials are associated with more unfavorable perceptions of data sharing. To date, no prior research has addressed this issue. METHODS: We conducted an exploratory analysis of responses to a cross-sectional, web-based survey. The survey sample consisted of trialists who were corresponding authors of clinical trials published in 2010 or 2011 in one of six general medical journals with the highest impact factors in 2011. The following key characteristics were examined: trialists' academic productivity and geographic location, trial funding source and size, and the journal in which it was published. Main outcome measures included: support for data sharing in principle, concerns with data sharing through repositories, and reasons for granting or denying requests. Chi-squared tests and Fisher's exact tests were used to assess statistical significance. RESULTS: Of 683 potential respondents, 317 completed the survey (response rate 46%). Both support for data sharing and reporting of specific concerns with sharing data through repositories exceeded 75%, but neither differed by trialist or trial characteristics. However, there were some significant differences in explicit reasons to share or withhold data. Respondents located in Western Europe more frequently indicated they have or would share data in order to receive academic benefits or recognition when compared with respondents located in the United States or Canada (58 versus 31%). In addition, respondents who were the most academically productive less frequently indicated they have or would withhold data in order to protect research subjects when compared with less academically productive respondents (24 versus 40%), as did respondents who received industry funding when compared with those who had not (24 versus 43%). CONCLUSIONS: Respondents indicated strong support for data sharing overall. There were few notable differences in how trialists viewed the benefits and risks of data sharing when categorized by trialists' academic productivity and geographic location, trial funding source and size, and the journal in which it was published.
