Noise-adaptive synthetic oversampling technique

Authors: Vo, Minh Thanh (57203062383); Nguyen, Trang (57194430442); Vo, H. Anh (57222549479); Le, Tuong (55513981400)
Affiliations: Faculty of Information Technology, Ho Chi Minh City University of Technology (HUTECH), Ho Chi Minh City, Viet Nam | Informetrics Research Group, Ton Duc Thang University, Ho Chi Minh City, Viet Nam | Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Viet Nam | Faculty of Information Technology, Ho Chi Minh City Open University, Ho Chi Minh City, Viet Nam

Applied Intelligence, No. 11, 2021 (Vol. 51, pp. 7827-7836)

ISSN: 0924669X

DOI: 10.1007/s10489-021-02341-2

Document category:

Article

English

Keywords: Machine learning; Class imbalance problems; Empirical experiments; Hybrid techniques; Imbalanced dataset; Machine learning models; Number of samples; Oversampling technique; Predictive performance; Large dataset
English abstract
In the field of supervised learning, the problem of class imbalance is one of the most difficult problems, and has attracted a great deal of research attention in recent years. In an imbalanced dataset, minority classes are those that contain very small numbers of data samples, while the remaining classes have a very large number of data samples. This type of imbalance reduces the predictive performance of machine learning models. There are currently three approaches for dealing with the class imbalance problem: algorithm-level, data-level, and ensemble-based approaches. Of these, data-level approaches are the most widely used, and consist of three sub-categories: under-sampling, oversampling, and hybrid techniques. Oversampling techniques generate synthetic samples for the minority class to balance an imbalanced dataset. However, existing oversampling approaches do not have a strategy for handling noise samples in imbalanced and noisy datasets, which leads to a reduction in the predictive performance of machine learning models. This study therefore proposes a noise-adaptive synthetic oversampling technique (NASOTECH) to deal with the class imbalance problem in imbalanced and noisy datasets. The noise-adaptive synthetic oversampling (NASO) strategy is first introduced, which is used to identify the number of samples generated for each sample in the minority class, based on the concept of the noise ratio. Next, the NASOTECH algorithm is proposed, based on the NASO strategy, to handle the class imbalance problem in imbalanced and noisy datasets. Finally, empirical experiments are conducted on several synthetic and real datasets to verify the effectiveness of the proposed approach. The experimental results confirm that NASOTECH outperforms three state-of-the-art oversampling techniques in terms of accuracy and geometric mean (G-mean) on imbalanced and noisy datasets. 
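The abstract says the NASO strategy decides how many synthetic samples to generate for each minority sample from its noise ratio, but it does not define that ratio. The sketch below is only an illustration under stated assumptions, not the authors' algorithm: it assumes the noise ratio of a minority sample is the fraction of its k nearest neighbours that belong to another class, and that the synthetic budget is split in proportion to (1 - noise ratio). The function names `noise_ratios` and `allocate` and the parameter `k` are hypothetical.

```python
import math


def noise_ratios(X, y, minority_label, k=5):
    """Assumed definition (not from the paper): for each minority sample,
    the fraction of its k nearest neighbours (Euclidean distance,
    excluding itself) whose label differs from the minority label."""
    ratios = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue
        # Sort all other samples by distance to sample i.
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbours = [j for _, j in dists[:k]]
        if not neighbours:
            ratios[i] = 0.0
            continue
        foreign = sum(1 for j in neighbours if y[j] != minority_label)
        ratios[i] = foreign / len(neighbours)
    return ratios


def allocate(ratios, n_synthetic):
    """Assumed allocation rule: split n_synthetic across minority samples
    in proportion to (1 - noise ratio), so likely-noisy samples seed
    fewer synthetic points. Largest-remainder rounding keeps the total exact."""
    weights = {i: 1.0 - r for i, r in ratios.items()}
    total = sum(weights.values())
    if total == 0:
        return {i: 0 for i in ratios}
    raw = {i: n_synthetic * w / total for i, w in weights.items()}
    counts = {i: int(v) for i, v in raw.items()}
    leftover = n_synthetic - sum(counts.values())
    by_remainder = sorted(raw, key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Toy data: three clean minority points and one minority point
# sitting inside the majority cluster.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (5.2, 5.2)]
y = [1, 1, 1, 0, 0, 0, 1]
ratios = noise_ratios(X, y, minority_label=1, k=2)
counts = allocate(ratios, n_synthetic=6)
```

In this toy case the minority point surrounded by majority samples gets a noise ratio of 1 and therefore seeds no synthetic samples, while the budget is shared among the three clean minority points. The synthetic points themselves would still have to be generated, for example by SMOTE-style interpolation, which this sketch leaves out.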
© 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
