Class imbalance problem in CNNs

A systematic study of the class imbalance problem in convolutional neural networks

  • Several methods: oversampling, undersampling, two-phase training, thresholding
  • Problem: some classes have a significantly higher number of examples in the training set than other classes (class imbalance). E.g. in computer vision, medical diagnosis, and fraud detection, the frequency of one class can be 1000 times lower than that of another class
  • A new method for CNNs was introduced, two-phase training, in which the network is first trained on balanced data and then the output layers are fine-tuned on the original imbalanced data
  • Methods for addressing imbalance

    • Data level methods:

      • Oversampling: random minority oversampling, which simply replicates randomly selected samples from the minority classes

      • Undersampling:

        examples are removed randomly from majority classes until all classes have the same number of examples

        disadvantage: discards a portion of the available data
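A minimal numpy sketch of both resampling schemes, applied to an array of class labels (the 10:1 ratio and class sizes are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy labels with a 10:1 imbalance: 100 majority (0) vs 10 minority (1)
y = np.array([0] * 100 + [1] * 10)
majority = np.where(y == 0)[0]
minority = np.where(y == 1)[0]

# random minority oversampling: replicate random minority samples
# until both classes have the same number of examples
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
over_idx = np.concatenate([majority, minority, extra])

# random undersampling: drop random majority samples until balanced
# (this discards a portion of the available data)
keep = rng.choice(majority, size=len(minority), replace=False)
under_idx = np.concatenate([keep, minority])
```

In practice the sampled index arrays would be used to assemble the training set fed to the CNN.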

    • Classifier level methods:

      • Thresholding:

        adjust the decision threshold of a classifier

        prior probabilities are estimated for each class from its frequency in the imbalanced dataset, before any sampling is applied

        networks trained with cross-entropy estimate Bayesian a posteriori probabilities:

        for a given data point x, the output for class i is y_i(x) = p(i | x) = p(i) * p(x | i) / p(x)
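The prior correction can be sketched in a few lines of numpy; the class counts and "softmax outputs" below are made-up illustrations, not the paper's data:

```python
import numpy as np

# class frequencies in the imbalanced training set (illustrative)
counts = np.array([900, 90, 10])
priors = counts / counts.sum()            # p(i) = [0.9, 0.09, 0.01]

# stand-in for softmax outputs of a trained network, y_i(x) = p(i | x)
probs = np.array([[0.6, 0.3, 0.1],
                  [0.5, 0.3, 0.2]])

# dividing by p(i) rescores each class proportionally to p(x | i) / p(x),
# so rare classes are no longer penalized by their low prior
scores = probs / priors
pred = scores.argmax(axis=1)
```

Here the plain argmax would pick the majority class for both inputs, while the prior-corrected scores pick the rare class.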

      • Cost-sensitive learning:

        assigns different costs to the misclassification of examples from different classes

        One approach is threshold moving or post scaling

        Another adaptation for neural networks is to modify the learning rate such that higher-cost examples contribute more to the weight updates

        Finally, we can train the network by minimizing the misclassification cost instead of the standard loss function
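A hedged numpy sketch of the last option, using a class-weighted cross-entropy; weighting inversely to class frequency is one common choice, not necessarily the paper's exact cost assignment:

```python
import numpy as np

# illustrative class frequencies and inverse-frequency weights
counts = np.array([900, 100])
weights = counts.sum() / (len(counts) * counts)   # approx [0.56, 5.0]

# stand-in predicted probabilities and true labels for two examples
probs = np.array([[0.9, 0.1],
                  [0.7, 0.3]])
y = np.array([0, 1])

# per-example cross-entropy, then scaled by the cost of each true class
per_example = -np.log(probs[np.arange(len(y)), y])
plain_loss = per_example.mean()
weighted_loss = (weights[y] * per_example).mean()
```

The minority-class mistake dominates the weighted loss, so its gradient contributes more to the weight updates.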

      • One-class classification:

        recognizes positive instances only, rather than discriminating between classes

        the network learns an identity function; novelty is measured by the reconstruction error between the input and output patterns, e.g. absolute error, squared sum of errors, Euclidean or Mahalanobis distance

        proves to work well for extremely high imbalance, when the classification problem turns into anomaly detection
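The scoring rule can be sketched as follows; the "reconstructions" are hard-coded stand-ins for a trained autoencoder's output, and the threshold would normally be chosen on validation data:

```python
import numpy as np

def reconstruction_error(x, x_hat):
    # squared sum of errors between input and reconstructed output
    return np.sum((x - x_hat) ** 2, axis=1)

# a normal point reconstructed well, and an anomaly reconstructed poorly
x     = np.array([[1.0, 2.0], [10.0, -3.0]])
x_hat = np.array([[1.1, 1.9], [2.0, 0.0]])

scores = reconstruction_error(x, x_hat)
threshold = 1.0                 # illustrative; tuned on validation data
is_anomaly = scores > threshold
```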

      • Hybrid methods:

        Two-phase training, originally applied to a task approached as pixel-level classification

        involves pre-training the network on a balanced dataset and then fine-tuning the last output layer before the softmax on the original, imbalanced data
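A toy numpy sketch of the two phases, using a small 2-layer MLP in place of a CNN so the example stays self-contained; the data, layer sizes, and learning rate are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, y, params, steps, lr, freeze_hidden=False):
    W1, b1, W2, b2 = params
    for _ in range(steps):
        H = np.maximum(X @ W1 + b1, 0.0)      # hidden ReLU layer
        P = softmax(H @ W2 + b2)
        G = P.copy()
        G[np.arange(len(y)), y] -= 1.0        # dL/dlogits of cross-entropy
        G /= len(y)
        if not freeze_hidden:                 # phase 2 skips these updates
            GH = (G @ W2.T) * (H > 0)
            W1 = W1 - lr * X.T @ GH
            b1 = b1 - lr * GH.sum(axis=0)
        W2 = W2 - lr * H.T @ G                # output layer always updated
        b2 = b2 - lr * G.sum(axis=0)
    return W1, b1, W2, b2

# toy imbalanced data: 900 majority vs 100 minority points in 2-D
X = np.vstack([rng.normal([-2, 0], 1.0, size=(900, 2)),
               rng.normal([+2, 0], 1.0, size=(100, 2))])
y = np.array([0] * 900 + [1] * 100)

# balanced subset for phase 1 via random minority oversampling
extra = rng.choice(np.where(y == 1)[0], size=800, replace=True)
bal = np.concatenate([np.arange(len(y)), extra])

params = (rng.normal(0, 0.5, (2, 8)), np.zeros(8),
          rng.normal(0, 0.5, (8, 2)), np.zeros(2))
params = train(X[bal], y[bal], params, steps=300, lr=0.5)            # phase 1
params = train(X, y, params, steps=100, lr=0.5, freeze_hidden=True)  # phase 2
```

Phase 2 touches only the output-layer weights, which mirrors fine-tuning the last layer before the softmax on the original imbalanced data.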

  • Experiments

    • Forms of imbalance:

      • step imbalance:the number of examples is equal within minority classes and equal within majority classes but differs between the majority and minority classes
      • linear imbalance:the difference between consecutive pairs of classes is constant
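The two forms can be sketched by generating per-class example counts; the number of classes and the minimum/maximum sizes below are illustrative:

```python
import numpy as np

n_classes, n_min, n_max = 10, 500, 5000

# step imbalance: equal counts within minority classes and within majority
# classes (here half of the classes are minority, imbalance ratio 10)
n_minority = n_classes // 2
step = np.array([n_min] * n_minority + [n_max] * (n_classes - n_minority))

# linear imbalance: counts interpolate linearly from n_min to n_max, so the
# difference between consecutive pairs of classes is constant
linear = np.linspace(n_min, n_max, n_classes).astype(int)
```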
    • Evaluation metrics and testing

      • The metric:overall accuracy

        the area under the receiver operating characteristic curve (ROC AUC)

        ROC AUC has also been used to compare the performance of classifiers trained on imbalanced datasets
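A self-contained sketch of ROC AUC via its rank-based (Mann-Whitney) formulation, averaged one-vs-rest over classes for the multiclass case; ties in the scores are ignored for brevity:

```python
import numpy as np

def binary_auc(scores, labels):
    # ROC AUC = probability a random positive outranks a random negative
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def multiclass_auc(prob, y):
    # unweighted one-vs-rest average, so minority classes count equally
    return np.mean([binary_auc(prob[:, c], (y == c).astype(int))
                    for c in np.unique(y)])
```

Unlike overall accuracy, this score is insensitive to the class priors, which is why it is the preferred comparison metric under imbalance.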

  • Conclusions

    • The effect of class imbalance on classification performance is detrimental

    • The influence of imbalance on classification performance increases with the scale of a task

    • The impact of imbalance cannot be explained by the lower total number of training cases and depends on the distribution of examples among classes

    • The method that in most cases outperformed all others with respect to multiclass ROC AUC was oversampling

    • For extreme imbalance ratios and a large portion of classes being in the minority, undersampling performs on a par with oversampling

    • To achieve the best accuracy, one should apply thresholding to compensate for prior class probabilities. A combination of thresholding with the baseline or with oversampling is the most preferable, whereas thresholding should not be combined with undersampling

    • Oversampling should be applied to the level that entirely eliminates the imbalance, whereas undersampling can perform better when the imbalance is only removed to some extent

    • As opposed to some classical machine learning models, oversampling does not necessarily cause overfitting of convolutional neural networks