Synthetic generation of multidimensional data to improve classification model validity

Ahmad Al-Qerem, Ali Mohd Ali, Hani Attar*, Shadi Nashwan, Lianyong Qi, Mohammad Kazem Moghimi, Ahmed Solyman

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)
68 Downloads (Pure)

Abstract

This article compares Generative Adversarial Network (GAN) models and feature selection methods for generating synthetic data, with the goal of improving the validity of a classification model. Synthetic data generation creates new samples from existing data to increase the diversity of the data and help the model generalize better. The data are multidimensional in that each sample is described by multiple features or variables. GAN models have proven effective at preserving the statistical properties of the original data. However, the order in which data augmentation and feature selection are applied is crucial to building robust and accurate predictive models. By comparing different GAN models combined with feature selection methods on multidimensional datasets, the article aims to determine which combination best supports the validity of a classification model.
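The point about ordering can be illustrated with a minimal sketch. It is not the article's method: a per-class Gaussian sampler stands in for a GAN, a simple mean-difference score stands in for a filter method, and the toy dataset and every function name (`filter_scores`, `select_top_k`, `project`, `augment`) are assumptions made for illustration only. The sketch shows just how the two orderings differ structurally.

```python
import random
import statistics


def filter_scores(rows, labels):
    """Illustrative filter score per feature: |mean_0 - mean_1| / overall std."""
    scores = []
    for j in range(len(rows[0])):
        a = [r[j] for r, y in zip(rows, labels) if y == 0]
        b = [r[j] for r, y in zip(rows, labels) if y == 1]
        spread = statistics.pstdev(a + b) or 1.0  # avoid division by zero
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return scores


def select_top_k(rows, labels, k):
    """Return the indices of the k highest-scoring features, in column order."""
    scores = filter_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(rows, cols):
    """Keep only the selected feature columns."""
    return [[r[j] for j in cols] for r in rows]


def augment(rows, labels, n_new, rng):
    """Stand-in generator (NOT a GAN): sample each feature from a per-class Gaussian."""
    new_rows, new_labels = [], []
    for cls in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == cls]
        stats = [(statistics.mean(c), statistics.pstdev(c)) for c in zip(*members)]
        for _ in range(n_new):
            new_rows.append([rng.gauss(mu, sigma) for mu, sigma in stats])
            new_labels.append(cls)
    return rows + new_rows, labels + new_labels


# Toy data (assumption): feature 0 is informative, features 1-3 are noise.
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
rows = [[y * 3.0 + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(3)]
        for y in labels]

# Order A: select features on the original data, then augment the reduced data.
cols_a = select_top_k(rows, labels, 2)
rows_a, labels_a = augment(project(rows, cols_a), labels, 20, rng)

# Order B: augment the full data, then select features on the augmented data.
rows_aug, labels_aug = augment(rows, labels, 20, rng)
cols_b = select_top_k(rows_aug, labels_aug, 2)
rows_b = project(rows_aug, cols_b)

print("Order A keeps features", cols_a, "with", len(rows_a), "samples")
print("Order B keeps features", cols_b, "with", len(rows_b), "samples")
```

In order A the generator only ever sees the selected features, while in order B the feature ranking is computed on a mix of real and synthetic samples, so the two pipelines can select different features and produce different training sets.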

Original language: English
Article number: 37
Pages (from-to): 1-20
Number of pages: 20
Journal: Journal of Data and Information Quality
Volume: 15
Issue number: 3
Publication status: Published - 28 Sept 2023

Keywords

  • data augmentation
  • filter method
  • model validity
  • multidimensional data
  • wrapper method

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management
