Dataset Information

A complementary graphical method for reducing and analyzing large data sets. Case studies demonstrating thresholds setting and selection.

ABSTRACT:

OBJECTIVES: Graphical displays can make data more understandable; however, large graphs can challenge human comprehension. We have previously described a filtering method to provide high-level summary views of large data sets. In this paper we demonstrate our method for setting and selecting thresholds to limit graph size while retaining important information by applying it to large single and paired data sets taken from patient and bibliographic databases.

METHODS: Four case studies are used to illustrate our method. The data are either patient discharge diagnoses (coded using the International Classification of Diseases, Clinical Modification [ICD9-CM]) or MEDLINE citations (coded using the Medical Subject Headings [MeSH]). We use combinations of different thresholds to obtain filtered graphs for detailed analysis. The case studies demonstrate in detail how thresholds are set and selected, including thresholds for node counts, class counts, ratio values, p values (for diff data sets), and percentiles of selected class count thresholds. The main steps are data preparation, data manipulation, computation, and threshold selection and visualization. We also describe the data models for different types of thresholds and the considerations for threshold selection.

RESULTS: The filtered graphs are 1%-3% of the size of the original graphs. For our case studies, the graphs provide 1) the most heavily used ICD9-CM codes, 2) the codes with the most patients in a research hospital in 2011, 3) a profile of publications on "heavily represented topics" in MEDLINE in 2011, and 4) validated knowledge about adverse effects of rosiglitazone and new areas of interest in the ICD9-CM hierarchy associated with patients taking pioglitazone.

CONCLUSIONS: Our filtering method reduces large graphs to a manageable size by removing relatively unimportant nodes. The graphical method provides summary views based on computation of usage frequency and semantic context of hierarchical terminology. The method is applicable to large data sets (such as a hundred thousand records or more) and can be used to generate new hypotheses from data sets coded with hierarchical terminologies.
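The threshold idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a node count is the number of records coded directly with a term, that a class count is that number plus the counts of all descendant terms, that the terminology is a strict tree, and that a node is kept when it meets either threshold. The function names, the term labels, and the toy counts are all hypothetical.

```python
from collections import defaultdict


def class_counts(parents, node_counts):
    """Return each node's own count plus the counts of all its descendants.

    `parents` maps every node to its parent (None for the root).
    Assumes a strict tree, so no record is counted twice.
    """
    children = defaultdict(list)
    for child, parent in parents.items():
        if parent is not None:
            children[parent].append(child)

    totals = {}

    def total(node):
        # Memoized recursion: own count plus the totals of the children.
        if node not in totals:
            totals[node] = node_counts.get(node, 0) + sum(
                total(c) for c in children[node]
            )
        return totals[node]

    for node in parents:
        total(node)
    return totals


def filter_nodes(parents, node_counts, min_node_count, min_class_count):
    """Keep nodes that reach either threshold (an assumed rule)."""
    totals = class_counts(parents, node_counts)
    return {
        n
        for n in parents
        if node_counts.get(n, 0) >= min_node_count
        or totals[n] >= min_class_count
    }


# Toy hierarchy with hypothetical labels and counts.
toy_parents = {"root": None, "A": "root", "B": "root",
               "A.1": "A", "A.2": "A", "B.1": "B"}
toy_counts = {"A.1": 120, "A.2": 5, "B.1": 40}

kept = filter_nodes(toy_parents, toy_counts,
                    min_node_count=100, min_class_count=100)
print(sorted(kept))
```

Raising either threshold shrinks the retained graph. The ratio, p value, and percentile thresholds mentioned in the abstract would need paired data sets and are not covered by this sketch.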

SUBMITTER: Jing X

PROVIDER: S-EPMC4209908 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

A complementary graphical method for reducing and analyzing large data sets. Case studies demonstrating thresholds setting and selection.

Jing X, Cimino JJ

Methods of Information in Medicine, 2014-04-14, issue 3

PMID: 24727931

Similar Datasets

Peculiar Genes Selection: A new features selection method to improve classification performances in imbalanced data sets.
| S-EPMC5555681 | biostudies-literature

A graphical method for analyzing distance restraints using residual dipolar couplings for structure determination of symmetric protein homo-oligomers.
| S-EPMC3104227 | biostudies-literature

PRO-Angoff method for remote standard setting: establishing clinical thresholds for the upper digestive disease tool.
| S-EPMC10933216 | biostudies-literature

Selection and estimation for mixed graphical models.
| S-EPMC5018402 | biostudies-literature

Efficient Bayesian Regularization for Graphical Model Selection.
| S-EPMC7592715 | biostudies-literature

Hierarchical sets: analyzing pangenome structure through scalable set visualizations.
| S-EPMC5447240 | biostudies-literature

Analyzing biomarker discovery: Estimating the reproducibility of biomarker sets.
| S-EPMC9333302 | biostudies-literature

MARIS: Method for Analyzing RNA following Intracellular Sorting
2014-01-17 | E-GEOD-54179 | biostudies-arrayexpress

Bayesian graphical models for regression on multiple data sets with different variables.
| S-EPMC2648903 | biostudies-literature

GenomeFlow: a comprehensive graphical tool for modeling and analyzing 3D genome structure.
| S-EPMC6477968 | biostudies-literature