Class imbalance refers to a situation in which the number of samples across the classes of a dataset is uneven, with some classes having significantly more samples than others. It is a common issue in real-world datasets, especially in fields such as healthcare, fraud detection, and anomaly detection, where rare events or conditions are often the most critical to identify but are underrepresented in the data. The main approaches to addressing class imbalance in deep learning include [1]:

- Pre-processing techniques, which modify the dataset before training the model, e.g., random undersampling and random oversampling (a sketch of random oversampling is given below).
- Post-processing techniques, which are applied after model training and adjust the model's predictions to correct for imbalance, for instance by adjusting the decision threshold of the model to increase sensitivity to the minority class (sketched below).
- Learning-stage algorithms, which modify the model's learning process to handle class imbalance, for example by adjusting the loss function to assign higher penalties to misclassified minority-class samples.

Cross entropy (CE) is not well suited to imbalanced datasets, as it treats all classes equally. Weighted cross entropy (WCE) assigns different weights to the classes based on their frequency in the dataset [2]; by assigning higher weights to minority classes, it encourages the model to pay more attention to those classes during training. Focal loss is a modification of the CE loss that down-weights the loss assigned to well-classified samples, thereby focusing the model's attention on hard-to-classify samples, such as those from the minority class [3]. Sketches of both losses are given below. A novel framework for loss function design that generalizes the CE and focal losses as polynomial expansions is proposed in [4]; the paper emphasizes the simple poly-1 loss, which modifies only the leading term of the polynomial expansion of the CE loss.

Although these approaches have shown promise in addressing class imbalance in deep learning, there is no one-size-fits-all solution, and the choice of method depends on the specific characteristics of the dataset and the problem at hand. Loss functions such as the focal loss and the polynomial loss require careful tuning of their hyperparameters to achieve optimal performance. In [4], the authors give a theoretical explanation of the effectiveness of the polynomial loss based on the gradients of the loss function: they show that setting the leading polynomial term of the CE loss to zero can help in addressing class imbalance, and further hypothesize that making the leading term positive can increase the model's confidence in its predictions.

Less research is available on explainable AI for class imbalance in deep learning models [1]. Previous works [5, 6] have made significant contributions toward reducing the bias that class imbalance induces in logistic regression; in future work, these methods will be explored in the context of deep learning models. The works in [7, 8, 9] have shown that robust methods can improve model performance in the presence of outliers and noisy labels; these methods will be explored in the context of robust learning in deep learning models.
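As an illustration of the pre-processing route, the following is a minimal PyTorch sketch of random oversampling via sampling with replacement. The `labels` tensor and the commented `train_dataset` name are assumptions for illustration; the class counts are borrowed from the dataset described in the experiments below.

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical 1-D tensor of class labels for the training set; the
# counts mirror the dataset described in the experiments below.
labels = torch.cat([torch.zeros(14812, dtype=torch.long),  # majority class
                    torch.ones(2932, dtype=torch.long)])   # minority class

class_counts = torch.bincount(labels)        # samples per class
class_weights = 1.0 / class_counts.float()   # inverse-frequency weights
sample_weights = class_weights[labels]       # one weight per sample

# Drawing with replacement picks minority samples more often, which is
# equivalent in expectation to random oversampling of the minority class.
sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(labels),
                                replacement=True)
# loader = DataLoader(train_dataset, batch_size=64, sampler=sampler)
```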

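For the post-processing route, a minimal sketch of decision-threshold adjustment follows. The threshold value 0.3 is a hypothetical choice for illustration; in practice it would be tuned on a validation set.

```python
import torch

def predict_with_threshold(logits: torch.Tensor,
                           threshold: float = 0.3) -> torch.Tensor:
    """Binary predictions with an adjustable decision threshold.

    Lowering the threshold below the default 0.5 raises sensitivity to
    the minority (positive) class at some cost in specificity. The
    value 0.3 is illustrative, not a recommendation.
    """
    probs = torch.softmax(logits, dim=1)[:, 1]  # P(class 1) per sample
    return (probs >= threshold).long()
```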

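Minimal PyTorch sketches of the WCE and focal losses discussed above are given below. The inverse-frequency weighting scheme is one common choice among several, and gamma = 2 is the default suggested in [3]; both would normally be tuned per dataset.

```python
import torch
import torch.nn.functional as F

# Weighted cross entropy: per-class weights inversely proportional to
# class frequency (counts here mirror the dataset used below).
class_counts = torch.tensor([14812.0, 2932.0])
class_weights = class_counts.sum() / (len(class_counts) * class_counts)
wce = torch.nn.CrossEntropyLoss(weight=class_weights)

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0) -> torch.Tensor:
    """Focal loss [3]: cross entropy scaled by (1 - P_t)^gamma.

    Well-classified samples (P_t close to 1) are down-weighted, so the
    loss concentrates on hard samples.
    """
    ce = F.cross_entropy(logits, targets, reduction="none")  # -log(P_t)
    pt = torch.exp(-ce)                                      # P_t
    return ((1.0 - pt) ** gamma * ce).mean()
```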
Experimental Results
Sampling techniques such as random oversampling and random undersampling were applied to the Biocon dataset to address class imbalance in the binary task of classifying white-light oral cavity images as suspicious or non-suspicious for oral cancer screening. The dataset contains 2932 images in the minority (suspicious) class and 14812 images in the majority (non-suspicious) class. WCE, focal loss, and polynomial loss were also tested on the dataset to address class imbalance. Table 1 shows the results of the experiments. The experiments were conducted with the MobileViTv2 model, trained for 50 epochs with a batch size of 64. Ten model instances were trained for each experiment, and the mean and standard deviation of sensitivity and specificity are reported. The results show that random undersampling, WCE, and focal loss outperform the baseline CE loss in terms of sensitivity and specificity on this dataset, indicating that addressing class imbalance with sampling techniques or weighted loss functions can improve the model's performance on imbalanced datasets.

The hypothesis given in [4], that making the leading polynomial term of the CE loss positive can increase the model's confidence in predicting the true class ($P_t$), was tested on the Biocon dataset using the poly-1 loss, defined as [4]:
$$\mathcal{L}_{\text{poly-1}} = -\log(P_t) + \epsilon\,(1 - P_t) = (1 + \epsilon)(1 - P_t) + \frac{1}{2}(1 - P_t)^2 + \frac{1}{3}(1 - P_t)^3 + \cdots$$
The mean $P_t$ for class 0 and class 1 was calculated for the poly-1 loss with the coefficient of the leading term set to 0 ($\epsilon = -1$) and to 2 ($\epsilon = 1$). The mean $P_t$ was also calculated for CE, WCE, and focal loss, along with random undersampling, on the Biocon dataset. Class 0 is the majority class and class 1 the minority class. The results are shown in Figure 1. The obtained results do not support the hypothesis that making the leading polynomial term of the loss function positive increases the model's confidence in its predictions.
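A minimal PyTorch sketch of the poly-1 loss, following the definition above, together with a per-class mean $P_t$ computation of the kind used in this experiment, is given below; the function names are illustrative.

```python
import torch
import torch.nn.functional as F

def poly1_loss(logits: torch.Tensor, targets: torch.Tensor,
               epsilon: float = -1.0) -> torch.Tensor:
    """Poly-1 loss [4]: -log(P_t) + epsilon * (1 - P_t).

    epsilon = -1 zeroes the leading term (1 + epsilon)(1 - P_t) of the
    polynomial expansion of CE; epsilon = 1 sets its coefficient to 2,
    the two settings compared above.
    """
    ce = F.cross_entropy(logits, targets, reduction="none")  # -log(P_t)
    pt = torch.exp(-ce)                                      # P_t
    return (ce + epsilon * (1.0 - pt)).mean()

@torch.no_grad()
def mean_pt_per_class(logits: torch.Tensor, targets: torch.Tensor) -> dict:
    """Mean predicted probability of the true class, grouped by class."""
    pt = torch.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return {int(c): pt[targets == c].mean().item() for c in targets.unique()}
```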
References