Dataset Information

Empirical evaluation of data normalization methods for molecular classification.

ABSTRACT: Background: Data artifacts due to variations in experimental handling are ubiquitous in microarray studies, and they can lead to biased and irreproducible findings. A popular approach to correct for such artifacts is through post hoc data adjustment such as data normalization. Statistical methods for data normalization have been developed and evaluated primarily for the discovery of individual molecular biomarkers. Their performance has rarely been studied for the development of multi-marker molecular classifiers, an increasingly important application of microarrays in the era of personalized medicine.

Methods: In this study, we set out to evaluate the performance of three commonly used methods for data normalization in the context of molecular classification, using extensive simulations based on re-sampling from a unique pair of microRNA microarray datasets for the same set of samples. The data and code for our simulations are freely available as R packages at GitHub.

Results: In the presence of confounding handling effects, all three normalization methods tended to improve the accuracy of the classifier when evaluated on independent test data. The level of improvement and the relative performance among the normalization methods depended on the relative level of molecular signal, the distributional pattern of handling effects (e.g., location shift vs scale change), and the statistical method used for building the classifier. In addition, cross-validation was associated with biased estimation of classification accuracy in the over-optimistic direction for all three normalization methods.

Conclusion: Normalization may improve the accuracy of molecular classification for data with confounding handling effects; however, it cannot circumvent the over-optimistic findings associated with cross-validation for assessing classification accuracy.

SUBMITTER: Huang HC

PROVIDER: S-EPMC5899419 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature

Publications

Empirical evaluation of data normalization methods for molecular classification.

Huang HC, Qin LX

PeerJ, 2018-04-11

PMID: 29666754

Similar Datasets

Project description: In numerous classification problems, class distribution is not balanced. For example, positive examples are rare in the fields of disease diagnosis and credit card fraud detection. General machine learning methods are known to be suboptimal for such imbalanced classification. One popular solution is to balance training data by oversampling the underrepresented (or undersampling the overrepresented) classes before applying machine learning algorithms. However, despite its popularity, the effectiveness of sampling has not been rigorously and comprehensively evaluated. This study assessed combinations of seven sampling methods and eight machine learning classifiers (56 varieties in total) using 31 datasets with varying degrees of imbalance. We used the areas under the precision-recall curve (AUPRC) and receiver operating characteristics curve (AUROC) as the performance measures. The AUPRC is known to be more informative for imbalanced classification than the AUROC. We observed that sampling significantly changed the performance of the classifier (paired t-tests P < 0.05) in only a few cases (12.2% in AUPRC and 10.0% in AUROC). Surprisingly, sampling was more likely to reduce rather than improve the classification performance. Moreover, the adverse effects of sampling were more pronounced in AUPRC than in AUROC. Among the sampling methods, undersampling performed worse than others. Also, sampling was more effective for improving linear classifiers. Most importantly, we did not need sampling to obtain the optimal classifier for most of the 31 datasets. In addition, we found two interesting examples in which sampling significantly reduced AUPRC while significantly improving AUROC (paired t-tests P < 0.05). In conclusion, the applicability of sampling is limited because it could be ineffective or even harmful. Furthermore, the choice of the performance measure is crucial for decision making. Our results provide valuable insights into the effect and characteristics of sampling for imbalanced classification.

Project description: Context: Color normalization techniques for histology have not been empirically tested for their utility for computational pathology pipelines. Aims: We compared two contemporary techniques for achieving a common intermediate goal - epithelial-stromal classification. Settings and design: Expert-annotated regions of epithelium and stroma were treated as ground truth for comparing classifiers on original and color-normalized images. Materials and methods: Epithelial and stromal regions were annotated on thirty diverse-appearing H and E stained prostate cancer tissue microarray cores. Corresponding sets of thirty images each were generated using the two color normalization techniques. Color metrics were compared for original and color-normalized images. Separate epithelial-stromal classifiers were trained and compared on test images. Main analyses were conducted using a multiresolution segmentation (MRS) approach; comparative analyses using two other classification approaches (convolutional neural network [CNN], Wndchrm) were also performed. Statistical analysis: For the main MRS method, which relied on classification of super-pixels, the number of variables used was reduced using backward elimination without compromising accuracy, and test areas under the curve (AUCs) were compared for original and normalized images. For CNN and Wndchrm, pixel classification test AUCs were compared. Results: The Khan method reduced color saturation while the Vahadane method reduced hue variance. Super-pixel-level test AUC for MRS was 0.010-0.025 (95% confidence interval limits ± 0.004) higher for the two normalized image sets compared to the original in the 10-80 variable range. Improvement in pixel classification accuracy was also observed for CNN and Wndchrm for color-normalized images. Conclusions: Color normalization can give a small incremental benefit when a super-pixel-based classification method is used with features that perform implicit color normalization, while the gain is higher for patch-based classification methods for classifying epithelium versus stroma.
