Dataset Information

Distribution-preserving data augmentation.

ABSTRACT: In the last decade, deep learning has been applied to a wide range of problems with tremendous success. This success mainly comes from large data availability, increased computational power, and theoretical improvements in the training phase. As a dataset grows, it represents the real world better, making it possible to develop a model that can generalize. However, creating a labeled dataset is expensive and time-consuming, and in some domains it is challenging or even infeasible. Therefore, researchers have proposed data augmentation methods that increase dataset size and variety by creating variations of the existing data. For image data, variations can be obtained by applying color or spatial transformations, alone or in combination. Color transformations perform linear or nonlinear operations on the entire image or on patches to create variations of the original image. Current color-based augmentation methods are usually based on image processing operations such as equalizing, solarizing, and posterizing. Nevertheless, these color-based data augmentation methods do not guarantee plausible variations of the image. This paper proposes a novel distribution-preserving data augmentation method that creates plausible image variations by shifting pixel colors to another point in the image color distribution. We achieved this by defining a regularized density-decreasing direction to create paths from the original pixels' colors to the distribution tails. The proposed method provides superior performance compared to existing data augmentation methods, as shown in a transfer learning scenario on the UC Merced Land-use, Intel Image Classification, and Oxford-IIIT Pet datasets for classification and segmentation tasks.
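
The idea of moving pixel colors along a density-decreasing direction can be illustrated with a rough sketch. This is not the authors' algorithm: it assumes a coarse 3-D histogram as the color density estimate, finite differences as the gradient, and a simple relative `eps` term as the regularizer. The function name `density_decreasing_shift` and all of its parameters are hypothetical.

```python
import numpy as np

def density_decreasing_shift(image, bins=16, step=4.0, n_steps=5, eps=1e-3):
    """Shift each pixel colour a few steps toward lower colour density.

    Illustrative sketch only (assumed histogram density, not the paper's method).
    image: uint8 array of shape (H, W, 3).
    """
    pixels = image.reshape(-1, 3).astype(np.float64)

    # Estimate the image's colour density with a coarse 3-D histogram.
    hist, _ = np.histogramdd(pixels, bins=bins, range=[(0, 256)] * 3)
    density = hist / hist.sum()

    # Finite-difference gradient of the density along the R, G and B axes.
    grad = np.stack(np.gradient(density), axis=-1)  # (bins, bins, bins, 3)
    grad_norm = np.linalg.norm(grad, axis=-1, keepdims=True)

    # Regularised density-decreasing direction: the eps term damps the
    # direction where the gradient is close to zero (flat regions, peaks).
    direction_field = -grad / (grad_norm + eps * grad_norm.max() + 1e-12)

    bin_width = 256.0 / bins
    shifted = pixels.copy()
    for _ in range(n_steps):
        # Look up the direction of the histogram bin each colour falls in.
        idx = np.clip((shifted // bin_width).astype(int), 0, bins - 1)
        direction = direction_field[idx[:, 0], idx[:, 1], idx[:, 2]]
        shifted = np.clip(shifted + step * direction, 0.0, 255.0)

    return shifted.reshape(image.shape).round().astype(np.uint8)
```

Calling `density_decreasing_shift(img)` on an `(H, W, 3)` `uint8` array returns an array of the same shape and type. Larger `step` or `n_steps` values push colors further toward sparsely populated regions of the color distribution.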

SUBMITTER: Saran NA

PROVIDER: S-EPMC8176531 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

Distribution-preserving data augmentation.

Saran NA, Saran M, Nar F

PeerJ Computer Science, 2021-05-27

PMID: 34141893

Similar Datasets

Optimal distribution-preserving downsampling of large biomedical data sets (opdisDownsampling). | S-EPMC8341664 | biostudies-literature

Anatomical Remnant-Preserving Double-Bundle ACL Reconstruction With a New Remnant Augmentation Technique. | S-EPMC7029215 | biostudies-literature

Revealing the spatial distribution of a disease while preserving privacy. | S-EPMC2584758 | biostudies-literature

Privacy-preserving storage of sequenced genomic data. | S-EPMC8487550 | biostudies-literature

Privacy-preserving data sharing via probabilistic modeling. | S-EPMC8276015 | biostudies-literature

Arthroscopic Bursa-Augmented Rotator Cuff Repair: A Vasculature-preserving Technique for Subacromial Bursal Harvest and Tendon Augmentation. | S-EPMC8185525 | biostudies-literature

Privacy-preserving techniques of genomic data-a survey. | S-EPMC6585383 | biostudies-literature

Virtual data augmentation method for reaction prediction. | S-EPMC9556613 | biostudies-literature

Data Augmentation Enhances Plant-Genomic-Enabled Predictions. | S-EPMC10969940 | biostudies-literature

Efficient Data Augmentation for Fitting Stochastic Epidemic Models to Prevalence Data. | S-EPMC6275108 | biostudies-literature