Dataset Information

The role and robustness of the Gini coefficient as an unbiased tool for the selection of Gini genes for normalising expression profiling data.

ABSTRACT: We recently introduced the Gini coefficient (GC) for assessing the expression variation of a particular gene in a dataset, as a means of selecting improved reference genes over the cohort ('housekeeping genes') typically used for normalisation in expression profiling studies. Those genes (transcripts) that we determined to be useable as reference genes differed greatly from previous suggestions based on hypothesis-driven approaches. A limitation of this initial study is that a single (albeit large) dataset was employed for both tissues and cell lines. We here extend this analysis to encompass seven other large datasets. Although their absolute values differ a little, the Gini values and median expression levels of the various genes are well correlated with each other between the various cell line datasets, implying that our original choice of the more ubiquitously expressed low-Gini-coefficient genes was indeed sound. In tissues, the Gini values and median expression levels of genes showed a greater variation, with the GC of genes changing with the number and types of tissues in the data sets. In all data sets, regardless of whether they were derived from tissues or cell lines, we also show that the GC is a robust measure of gene expression stability. Using the GC as a measure of expression stability, we illustrate its utility to find tissue- and cell line-optimised housekeeping genes without any prior bias, that again include only a small number of previously reported housekeeping genes. We also independently confirmed this experimentally using RT-qPCR with 40 candidate GC genes in a panel of 10 cell lines. These were termed the Gini Genes. In many cases, the variation in the expression levels of classical reference genes is very large (e.g. 44-fold for GAPDH in one data set), suggesting that the cure (of using them as normalising genes) may in some cases be worse than the disease (of not doing so). We recommend the present data-driven approach for the selection of reference genes by using the easy-to-calculate and robust GC.
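
To make the idea in the abstract concrete, the following is a minimal sketch of how a per-gene Gini coefficient could be computed and used to shortlist reference-gene candidates. It is not the authors' code: the function names (`gini`, `rank_reference_candidates`), the `min_median` expression cut-off, and the toy expression values are all assumptions made for illustration. The sketch uses the standard sorted-rank formula for the Gini coefficient, which gives 0 when a gene is expressed equally in every sample and approaches 1 when expression is concentrated in a single sample.

```python
import numpy as np


def gini(values):
    """Gini coefficient of a 1-D array of non-negative values.

    Uses the sorted-rank formula G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n,
    with x sorted ascending and i running from 1 to n.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        # No expression at all: treat as perfectly equal (illustrative choice).
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)


def rank_reference_candidates(expr, gene_names, min_median=1.0):
    """Rank genes by ascending Gini coefficient.

    expr       : 2-D array, rows are genes and columns are samples (hypothetical layout).
    gene_names : one name per row of expr.
    min_median : hypothetical cut-off that drops genes with low median expression.
    Returns a list of (gene, gini, median) tuples, most stable gene first.
    """
    expr = np.asarray(expr, dtype=float)
    medians = np.median(expr, axis=1)
    ginis = np.array([gini(row) for row in expr])
    keep = medians >= min_median
    order = np.argsort(ginis[keep])
    names = np.asarray(gene_names)[keep][order]
    return [
        (str(name), float(g), float(m))
        for name, g, m in zip(names, ginis[keep][order], medians[keep][order])
    ]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_expr = np.array([
        [10.0, 11.0, 9.0, 10.0],  # stable across samples
        [1.0, 40.0, 2.0, 5.0],    # highly variable
        [0.0, 0.0, 0.1, 0.0],     # barely expressed, filtered out
    ])
    for name, g, m in rank_reference_candidates(toy_expr, ["gene_A", "gene_B", "gene_C"]):
        print(f"{name}: Gini={g:.3f}, median={m:.1f}")
```

Filtering on median expression before ranking reflects the abstract's emphasis on ubiquitously expressed low-Gini genes; the threshold value itself would need to be chosen for the platform and units of a real dataset.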

SUBMITTER: Wright Muelas M

PROVIDER: S-EPMC6884504 | biostudies-literature | 2019 Nov

REPOSITORIES: biostudies-literature

Publications

The role and robustness of the Gini coefficient as an unbiased tool for the selection of Gini genes for normalising expression profiling data.

Marina Wright Muelas, Farah Mughal, Steve O'Hagan, Philip J. Day, Douglas B. Kell

Scientific Reports, 2019 Nov 29, issue 1

PMID: 31784565

Similar Datasets

Project description:

Background: Understanding inequality in infectious disease burden requires clear and unbiased indicators. The Gini coefficient, conventionally used as a macroeconomic descriptor of inequality, is potentially useful to quantify epidemiological heterogeneity. With a potential range from 0 (all populations equal) to 1 (populations having maximal differences), this coefficient is used here to show the extent and persistence of inequality of malaria infection burden at a wide variety of population levels.

Methods: First, the Gini coefficient was applied to quantify variation among World Health Organization world regions for malaria and other major global health problems. Malaria heterogeneity was then measured among countries within the geographical sub-region where burden is greatest, among the major administrative divisions in several of these countries, and among selected local communities. Data were analysed from previous research studies, national surveys, and global reports, and Gini coefficients were calculated together with confidence intervals using bootstrap resampling methods.

Results: Malaria showed a very high level of inequality among the world regions (Gini coefficient, G = 0.77, 95% CI 0.66-0.81), more extreme than for any of the other major global health problems compared at this level. Within the most highly endemic geographical sub-region, there was substantial inequality in estimated malaria incidence among countries of West Africa, which did not decrease between 2010 (G = 0.28, 95% CI 0.19-0.36) and 2018 (G = 0.31, 95% CI 0.22-0.39). There was a high level of sub-national variation in prevalence among states within Nigeria (G = 0.30, 95% CI 0.26-0.35), contrasting with more moderate variation within Ghana (G = 0.18, 95% CI 0.12-0.25) and Sierra Leone (G = 0.17, 95% CI 0.12-0.22). There was also significant inequality in prevalence among local village communities, generally more marked during dry seasons when there was lower mean prevalence. The Gini coefficient correlated strongly with the standard coefficient of variation, which has no finite range.

Conclusions: The Gini coefficient is a useful descriptor of epidemiological inequality at all population levels, with confidence intervals and interpretable bounds. Wider use of the coefficient would give broader understanding of malaria heterogeneity revealed by multiple types of studies, surveys and reports, providing more accessible insight from available data.
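
The project description above reports Gini coefficients together with bootstrap confidence intervals. The sketch below shows one common way such an interval could be obtained, a percentile bootstrap, purely as an illustration. The description does not say which bootstrap variant was used, so the percentile method, the function name `bootstrap_gini_ci`, the number of resamples, and the example values are all assumptions.

```python
import numpy as np


def gini(values):
    """Gini coefficient of a 1-D array of non-negative values (sorted-rank formula)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)


def bootstrap_gini_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap interval for the Gini coefficient.

    values : burden or prevalence figures, one per population unit (hypothetical input).
    n_boot : number of resamples drawn with replacement (illustrative default).
    alpha  : 0.05 gives a 95% interval.
    seed   : fixed so that repeated runs return the same interval.
    """
    x = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row holds the indices of one resample of the same size as the data.
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot = np.array([gini(x[row]) for row in idx])
    lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return gini(x), float(lower), float(upper)


if __name__ == "__main__":
    # Invented figures for five population units, for illustration only.
    estimate, lower, upper = bootstrap_gini_ci([1.0, 2.0, 3.0, 4.0, 50.0])
    print(f"G = {estimate:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

With only a handful of population units, as in this toy example, a percentile interval can be wide and unstable, so the numbers it prints should not be read as comparable to the published estimates.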
