Abstract
This article compares combinations of Generative Adversarial Network (GAN) models and feature selection methods for generating synthetic data, with the goal of improving the validity of a classification model. Synthetic data generation creates new samples from existing data to increase its diversity and help the model generalize better. The data are multidimensional, meaning that each sample is described by multiple features or variables. GAN models have proven effective at preserving the statistical properties of the original data. However, the order in which data augmentation and feature selection are applied is crucial for building robust and accurate predictive models. By comparing different GAN models paired with feature selection methods on multidimensional datasets, the article aims to determine which combination best supports the validity of a classification model.
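To make the ordering question concrete, the following is a minimal sketch of the two possible pipelines: selecting features before augmenting, and augmenting before selecting. It is not the article's method. The abstract does not name the GAN models or selection methods used, so the sketch assumes a simple Gaussian-jitter resampler as a stand-in for a GAN generator and a Pearson-correlation ranking as one possible filter method. All function names and the toy dataset are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def filter_select(X, y, k):
    """Filter method (assumed): keep the k features most correlated with the label."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: -scores[j])
    return sorted(ranked[:k])


def augment(X, y, n_new, rng, noise=0.1):
    """Stand-in for a GAN generator: jitter randomly chosen real rows with Gaussian noise."""
    X_new, y_new = [], []
    for _ in range(n_new):
        i = rng.randrange(len(X))
        X_new.append([v + rng.gauss(0.0, noise) for v in X[i]])
        y_new.append(y[i])
    return X + X_new, y + y_new


def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]


def select_then_augment(X, y, k, n_new, rng):
    """Order A: choose features on real data only, then synthesize in the reduced space."""
    cols = filter_select(X, y, k)
    X_aug, y_aug = augment(project(X, cols), y, n_new, rng)
    return cols, X_aug, y_aug


def augment_then_select(X, y, k, n_new, rng):
    """Order B: synthesize in the full space, then choose features on real plus synthetic data."""
    X_aug, y_aug = augment(X, y, n_new, rng)
    cols = filter_select(X_aug, y_aug, k)
    return cols, project(X_aug, cols), y_aug


# Toy data: feature 0 carries the label signal, features 1 and 2 are pure noise.
data_rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[label + data_rng.gauss(0, 0.3), data_rng.gauss(0, 1), data_rng.gauss(0, 1)]
     for label in y]

cols_a, Xa, ya = select_then_augment(X, y, k=1, n_new=20, rng=random.Random(1))
cols_b, Xb, yb = augment_then_select(X, y, k=1, n_new=20, rng=random.Random(1))
```

In order A the synthetic samples never influence which features are kept, while in order B they do. On this easy toy dataset both orders are expected to pick the same informative feature; on real multidimensional data the two orders can diverge, which is the kind of difference the comparison in the article addresses.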
Original language | English |
---|---|
Article number | 37 |
Pages (from-to) | 1-20 |
Number of pages | 20 |
Journal | Journal of Data and Information Quality |
Volume | 15 |
Issue number | 3 |
Publication status | Published - 28 Sept 2023 |
Keywords
- data augmentation
- filter method
- model validity
- multidimensional data
- wrapper method
ASJC Scopus subject areas
- Information Systems
- Information Systems and Management