Generative Methods for Imbalanced Tabular Datasets

While modeling human comfort (e.g., thermal or visual), we found that the available datasets are often limited. These datasets, used as ground truth, record participants' subjective responses in a given environmental context and frame the task as a multi-class problem. However, even in controlled experiments, it is hard to collect the same number of responses for every comfort class, so the resulting datasets are imbalanced.

Given this, we explore the use of deep generative methods to produce synthetic samples that complement the real ones, with the goal of a richer and more robust dataset for training machine learning models.
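
To make the idea of complementing real samples concrete, the following is a minimal sketch, not the method used in this work. It computes how many synthetic rows each class would need to match the largest class and then tops up the dataset. The function names, the toy class sizes, and the per-class Gaussian sampler are all assumptions for illustration; the Gaussian sampler is only a stand-in for whatever deep generative model would actually be trained.

```python
import numpy as np

def synthetic_counts(labels):
    """Synthetic samples each class needs to match the largest class."""
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    return {c: int(target - n) for c, n in zip(classes.tolist(), counts.tolist())}

def gaussian_stand_in(X_class, n, rng):
    """Hypothetical placeholder generator: an independent Gaussian per feature,
    fitted to the real rows of one class. A deep generative model would go here."""
    mean = X_class.mean(axis=0)
    std = X_class.std(axis=0)
    return rng.normal(mean, std, size=(n, X_class.shape[1]))

def augment(X, y, rng):
    """Return the real rows plus enough synthetic rows to balance the classes."""
    X_parts, y_parts = [X], [y]
    for cls, n in synthetic_counts(y).items():
        if n > 0:
            X_parts.append(gaussian_stand_in(X[y == cls], n, rng))
            y_parts.append(np.full(n, cls, dtype=y.dtype))
    return np.concatenate(X_parts), np.concatenate(y_parts)

rng = np.random.default_rng(0)
# Toy imbalanced data (made-up sizes): three classes with 50, 20 and 5 responses.
y = np.repeat([0, 1, 2], [50, 20, 5])
X = rng.normal(loc=y[:, None], scale=1.0, size=(75, 2))

X_aug, y_aug = augment(X, y, rng)
print(np.bincount(y_aug))  # per-class counts after augmentation
```

Keeping the real rows untouched and only appending synthetic ones means a model can still be evaluated on real responses alone, which is one way to check whether the added samples help.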

Preliminary findings show the potential of combining real and synthetic samples for modeling. Additionally, generative methods for multi-class tabular datasets have not been explored as much as their image counterparts, leaving room for new techniques.

[BuildSys ‘19 Abstract]