Dataset Information

Kernel density weighted loess normalization improves the performance of detection within asymmetrical data.

ABSTRACT: BACKGROUND: Normalization of gene expression data has been studied for many years and various strategies have been formulated to deal with various types of data. Most normalization algorithms rely on the assumption that the number of up-regulated genes and the number of down-regulated genes are roughly the same. However, the well-known Golden Spike experiment presents a unique situation in which differentially regulated genes are biased toward one direction, thereby challenging the conclusions of previous bench mark studies. RESULTS: This study proposes two novel approaches, KDL and KDQ, based on kernel density estimation to improve upon the basic idea of invariant set selection. The key concept is to provide various importance scores to data points on the MA plot according to their proximity to the cluster of the null genes under the assumption that null genes are more densely distributed than those that are differentially regulated. The comparison is demonstrated in the Golden Spike experiment as well as with simulation data using the ROC curves and compression rates. KDL and KDQ in combination with GCRMA provided the best performance among all approaches. CONCLUSIONS: This study determined that methods based on invariant sets are better able to resolve the problem of asymmetry. Normalization, either before or after expression summary for probesets, improves performance to a similar degree.

SUBMITTER: Hsieh WP

PROVIDER: S-EPMC3118355 | biostudies-literature | 2011

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Kernel density weighted loess normalization improves the performance of detection within asymmetrical data.

Hsieh Wen-Ping WP Chu Tzu-Ming TM Lin Yu-Min YM Wolfinger Russell D RD

BMC bioinformatics 20110601

<h4>Background</h4>Normalization of gene expression data has been studied for many years and various strategies have been formulated to deal with various types of data. Most normalization algorithms rely on the assumption that the number of up-regulated genes and the number of down-regulated genes are roughly the same. However, the well-known Golden Spike experiment presents a unique situation in which differentially regulated genes are biased toward one direction, thereby challenging the conclu ...[more]

PMID: 21631915

Similar Datasets

Project description:BackgroundAntiretroviral drugs are a very effective therapy against HIV infection. However, the high mutation rate of HIV permits the emergence of variants that can be resistant to the drug treatment. Predicting drug resistance to previously unobserved variants is therefore very important for an optimum medical treatment. In this paper, we propose the use of weighted categorical kernel functions to predict drug resistance from virus sequence data. These kernel functions are very simple to implement and are able to take into account HIV data particularities, such as allele mixtures, and to weigh the different importance of each protein residue, as it is known that not all positions contribute equally to the resistance.ResultsWe analyzed 21 drugs of four classes: protease inhibitors (PI), integrase inhibitors (INI), nucleoside reverse transcriptase inhibitors (NRTI) and non-nucleoside reverse transcriptase inhibitors (NNRTI). We compared two categorical kernel functions, Overlap and Jaccard, against two well-known noncategorical kernel functions (Linear and RBF) and Random Forest (RF). Weighted versions of these kernels were also considered, where the weights were obtained from the RF decrease in node impurity. The Jaccard kernel was the best method, either in its weighted or unweighted form, for 20 out of the 21 drugs.ConclusionsResults show that kernels that take into account both the categorical nature of the data and the presence of mixtures consistently result in the best prediction model. The advantage of including weights depended on the protein targeted by the drug. In the case of reverse transcriptase, weights based in the relative importance of each position clearly increased the prediction performance, while the improvement in the protease was much smaller. This seems to be related to the distribution of weights, as measured by the Gini index. All methods described, together with documentation and examples, are freely available at https://bitbucket.org/elies_ramon/catkern.

Dataset Information

Kernel density weighted loess normalization improves the performance of detection within asymmetrical data.

Publications

Kernel density weighted loess normalization improves the performance of detection within asymmetrical data.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets