Machine Learning with Applications: Dealing with Interpretability and Imbalanced Datasets

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (30 July 2022) | Viewed by 11728

Special Issue Editors


Prof. Dr. Maja Matetić
Guest Editor
Faculty of Informatics and Digital Technologies, University of Rijeka, 51000 Rijeka, Croatia
Interests: artificial intelligence; data science; machine learning; explainable artificial intelligence; explainable machine learning; human-centric AI; trustworthy Internet of Things systems

Prof. Dr. Xiaoshuan Zhang
Guest Editor
Department of Mechatronics at the College of Engineering, Beijing Lab of Food Quality and Safety, China Agricultural University (East Campus), Beijing 100083, China
Interests: sensors (IoT, flexible sensors) and data processing in food supply chain/industrial engineering; live animal management

Dr. Marija Brkić Bakarić
Guest Editor
Faculty of Informatics and Digital Technologies, University of Rijeka, Radmile Matejčić 2, 51000 Rijeka, Croatia
Interests: artificial intelligence; machine learning; interpretable machine learning; educational data mining; natural language processing; machine translation

Special Issue Information

Dear Colleagues,

A major disadvantage of machine learning is that insights about the data are hidden in increasingly complex models. Moreover, the best-performing models are often ensembles, which cannot be interpreted even when each constituent model can be. Explainable Machine Learning (Explainable Artificial Intelligence, XAI) summarizes the reasons for black-box behaviour with the aim of gaining users' trust.

This Special Issue of Electronics will provide a forum for discussing exciting research on applying Interpretable Machine Learning (IML) methods to data captured by sensors or generated by users' interactions with systems in a variety of domains. Both original research articles and comprehensive review papers are welcome. We also invite submissions dealing with the imbalanced classification problem, in which the distribution of examples across the known classes is biased or skewed.

Topics of interest for this Special Issue include, but are not limited to:

  • Interpretability (intrinsic or post hoc)
    • Global model interpretability
    • Local model interpretability
  • Feature selection techniques
  • Imbalanced classification
  • Explainable AI decision support systems
  • Real-world applications of Interpretable Machine Learning in areas such as:
    • Intelligent transportation systems
    • Food safety
    • Agriculture
    • Natural Language Processing
    • Education
    • Healthcare
    • Finance
    • Smart cities

Prof. Dr. Maja Matetić
Prof. Dr. Xiaoshuan Zhang
Dr. Marija Brkić Bakarić
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Explainable machine learning
  • Interpretable machine learning
  • Model interpretability
  • Model-agnostic techniques
  • Trustworthy Internet of Things (IoT) systems
  • Rare event prediction
  • Extreme event prediction
  • Class imbalance
  • Feature selection

Published Papers (5 papers)

Research

13 pages, 395 KiB  
Article
KDE-Based Ensemble Learning for Imbalanced Data
by Firuz Kamalov, Sherif Moussa and Jorge Avante Reyes
Electronics 2022, 11(17), 2703; https://doi.org/10.3390/electronics11172703 - 29 Aug 2022
Cited by 6 | Viewed by 1623
Abstract
Imbalanced class distribution affects many applications in machine learning, including medical diagnostics, text classification, intrusion detection, and many others. In this paper, we propose a novel ensemble classification method designed to deal with imbalanced data. The proposed method trains each tree in the ensemble using uniquely generated, synthetically balanced data. The data balancing is carried out via kernel density estimation, which offers a natural and effective approach to generating new sample points. We show that the proposed method results in a lower variance of the model estimator. The proposed method is tested against benchmark classifiers on a range of simulated and real-life datasets. The experimental results show that the proposed classifier significantly outperforms the benchmark methods.
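
The core idea admits a compact sketch. The Python snippet below is a minimal illustration under stated assumptions (a Gaussian KernelDensity from scikit-learn, one decision tree per round, majority voting; the bandwidth and ensemble size are placeholders), not the authors' implementation:

    # Minimal sketch of a KDE-balanced tree ensemble (illustrative, not the paper's code).
    import numpy as np
    from sklearn.neighbors import KernelDensity
    from sklearn.tree import DecisionTreeClassifier

    def fit_kde_ensemble(X, y, n_trees=50, bandwidth=0.5, seed=0):
        rng = np.random.default_rng(seed)
        X_min, X_maj = X[y == 1], X[y == 0]  # assumes 1 marks the minority class
        kde = KernelDensity(bandwidth=bandwidth).fit(X_min)  # density of the minority class
        trees = []
        for _ in range(n_trees):
            # Each tree gets its own, uniquely generated synthetic minority sample.
            X_syn = kde.sample(len(X_maj) - len(X_min), random_state=int(rng.integers(1 << 31)))
            X_bal = np.vstack([X_maj, X_min, X_syn])
            y_bal = np.hstack([np.zeros(len(X_maj)), np.ones(len(X_min) + len(X_syn))]).astype(int)
            trees.append(DecisionTreeClassifier().fit(X_bal, y_bal))
        return trees

    def predict_kde_ensemble(trees, X):
        # Simple majority vote over the ensemble.
        votes = np.mean([tree.predict(X) for tree in trees], axis=0)
        return (votes >= 0.5).astype(int)

Because each tree sees its own KDE draw, the synthetic balancing also injects diversity into the ensemble, which is consistent with the variance reduction the paper reports.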

16 pages, 3852 KiB  
Article
Rapid and Accurate Diagnosis of COVID-19 Cases from Chest X-ray Images through an Optimized Features Extraction Approach
by K. G. Satheesh Kumar, Arunachalam Venkatesan, Deepika Selvaraj and Alex Noel Joseph Raj
Electronics 2022, 11(17), 2682; https://doi.org/10.3390/electronics11172682 - 26 Aug 2022
Cited by 3 | Viewed by 1259
Abstract
Mutants of the novel coronavirus (COVID-19, or SARS-CoV-2) are spreading in different variants across the globe, affecting human health and the economy. Rapidly detecting and providing timely treatment for those infected with COVID-19 is a major challenge. For fast and cost-effective detection, artificial intelligence (AI) can play a key role in enhancing chest X-ray images and classifying them as infected/non-infected. However, AI needs huge datasets to train on and to detect COVID-19 infection, which may impact the overall system speed. Therefore, a Deep Neural Network (DNN) is preferred over standard AI models to speed up the classification with a set of features from the datasets. Further, for accurate feature extraction, an algorithm that combines the Zernike Moment Feature (ZMF) and Gray Level Co-occurrence Matrix Feature (GF) is proposed and implemented. The proposed algorithm uses 36 Zernike Moment features with variance and contrast textures, which helps to detect COVID-19 infection accurately. Finally, the Region Blocking (RB) approach with an optimal sub-image size (32 × 32) is employed to improve the processing speed by up to 2.6 times per image. The implementation achieves an accuracy (A) of 93.4%, sensitivity (Se) of 72.4%, specificity (Sp) of 95%, precision (Pr) of 74.9%, and F1-score (F1) of 72.3%. These metrics show that the proposed model can identify COVID-19 infection with a smaller dataset and accuracy up to 1.3 times higher than existing state-of-the-art models.
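
As a rough illustration of the feature-extraction step, the sketch below combines off-the-shelf implementations: Zernike moments from the mahotas library (degree 10, which yields exactly 36 values) and GLCM contrast from scikit-image, computed per 32 × 32 block. The GLCM offsets and the plain-variance stand-in are assumptions, not the paper's exact formulation:

    # Illustrative per-block Zernike + GLCM feature extraction (assumptions noted above).
    import numpy as np
    import mahotas
    from skimage.feature import graycomatrix, graycoprops  # scikit-image >= 0.19

    def block_features(img, block=32):
        """Tile a grayscale uint8 image into block x block regions and extract
        36 Zernike moments plus GLCM contrast and a variance texture per tile."""
        feats = []
        for i in range(0, img.shape[0] - block + 1, block):
            for j in range(0, img.shape[1] - block + 1, block):
                tile = img[i:i + block, j:j + block]
                # Degree-10 Zernike moments yield exactly 36 values.
                zm = mahotas.features.zernike_moments(tile, radius=block // 2, degree=10)
                glcm = graycomatrix(tile, distances=[1], angles=[0],
                                    levels=256, symmetric=True, normed=True)
                contrast = graycoprops(glcm, "contrast")[0, 0]
                variance = tile.astype(float).var()  # simple stand-in for a GLCM variance
                feats.append(np.concatenate([zm, [contrast, variance]]))
        return np.array(feats)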

17 pages, 5812 KiB  
Article
The Impact of Partial Balance of Imbalanced Dataset on Classification Performance
by Qing Li, Chang Zhao, Xintai He, Kun Chen and Runze Wang
Electronics 2022, 11(9), 1322; https://doi.org/10.3390/electronics11091322 - 21 Apr 2022
Cited by 4 | Viewed by 1784
Abstract
The imbalance of network data seriously affects the classification performance of algorithms. Most studies have used only a rough description of data imbalance, with little exploration of the specific factors affecting classification performance, which makes it difficult to put forward targeted solutions. In this paper, we find that the impact of medium-sized categories on classification performance cannot be ignored, and we therefore propose the concept of partial balance, consisting of the Class Number of Partial Balance (β) and the Balance Degree of Partial Samples (μ). Combined with the Global Slope (α), a parameterized model is established to describe the differences between imbalanced datasets. Experiments are performed on the Moore Dataset and the CICIDS 2017 Dataset. The experimental results with Random Forest, Decision Tree, and Deep Neural Network classifiers show that increasing α improves the performance on minority classes and on all classes overall. When β of the dominant categories increases, that of the inferior classes decreases, which lowers the average performance on minority classes. The lower μ is, the closer the sample size of the medium classes is to that of the minority classes, and the better the average performance is. Based on these conclusions, we propose and verify some basic strategies using various classical algorithms.
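
The abstract does not reproduce the exact formulas for α, β, and μ; purely as a hedged illustration of the general idea, the snippet below summarizes an imbalanced label vector by the slope of a linear fit to its sorted class proportions, a rough stand-in for the Global Slope:

    # Hedged illustration only: the definition below is a stand-in, not the paper's formula.
    import numpy as np
    from collections import Counter

    def global_slope(y):
        """Slope of a linear fit to the sorted per-class proportions,
        a rough analogue of a 'global slope' of an imbalanced dataset."""
        counts = np.sort(np.array(list(Counter(y).values())))[::-1]
        props = counts / counts.sum()
        return np.polyfit(np.arange(len(props)), props, 1)[0]  # more negative = steeper imbalance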

18 pages, 1934 KiB  
Article
Utilization of Explainable Machine Learning Algorithms for Determination of Important Features in ‘Suncrest’ Peach Maturity Prediction
by Dejan Ljubobratović, Marko Vuković, Marija Brkić Bakarić, Tomislav Jemrić and Maja Matetić
Electronics 2021, 10(24), 3115; https://doi.org/10.3390/electronics10243115 - 14 Dec 2021
Cited by 3 | Viewed by 2533
Abstract
Peaches (Prunus persica (L.) Batsch) are a popular fruit in Europe and Croatia. Maturity at harvest has a crucial influence on peach fruit quality, storage life, and, consequently, consumer acceptance. The main goal of this study is to develop a machine learning model that detects the most important features for predicting peach maturity by first training models and then using their importance ratings to detect nonlinear (and linear) relationships. In this way, the most important peach features at a given stage of ripening can be revealed. To date, this method has not been used for this purpose, and it also has the potential to be applied to other similar peach varieties. A total of 33 fruit features are measured on the harvested peaches, and three imbalanced datasets are created using firmness thresholds of 1.84, 3.57, and 4.59 kg·cm−2. These datasets are balanced using the SMOTE and ROSE techniques, and the Random Forest machine learning model is trained on them. Permutation Feature Importance (PFI), Variable Importance (VI), and LIME interpretability methods are used to detect the variables that most influence the predictions of the given machine learning models. PFI shows that the a* ground color parameter, the COL ground color index, and the SSC/TA and TA inner quality parameters are among the ten most contributing variables in all three models. Meanwhile, VI shows that this is the case for the a* ground color parameter, the COL and CCL ground color indexes, and the SSC/TA inner quality parameter. The fruit flesh ratio is highly positioned (among the top three according to PFI) in two models, but it is not even among the top ten in the third.
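
The balancing-and-interpretation workflow maps directly onto standard Python tooling. The sketch below (imbalanced-learn's SMOTE, scikit-learn's RandomForestClassifier and permutation_importance) illustrates the pipeline rather than reproducing the study's code; all hyperparameters are placeholder assumptions, and the ROSE and LIME steps are omitted:

    # Illustrative SMOTE + Random Forest + permutation-importance pipeline.
    from imblearn.over_sampling import SMOTE
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    def rank_features(X, y, feature_names, seed=42):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=seed)
        # Balance only the training split so the held-out data stays untouched.
        X_bal, y_bal = SMOTE(random_state=seed).fit_resample(X_tr, y_tr)
        model = RandomForestClassifier(n_estimators=500, random_state=seed).fit(X_bal, y_bal)
        # Permutation Feature Importance on the held-out split.
        pfi = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=seed)
        order = pfi.importances_mean.argsort()[::-1]
        return [(feature_names[i], pfi.importances_mean[i]) for i in order[:10]]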

33 pages, 4326 KiB  
Article
Estimation and Interpretation of Machine Learning Models with Customized Surrogate Model
by Mudabbir Ali, Asad Masood Khattak, Zain Ali, Bashir Hayat, Muhammad Idrees, Zeeshan Pervez, Kashif Rizwan, Tae-Eung Sung and Ki-Il Kim
Electronics 2021, 10(23), 3045; https://doi.org/10.3390/electronics10233045 - 6 Dec 2021
Cited by 2 | Viewed by 2849
Abstract
Machine learning has the potential to predict unseen data and thus improve the productivity and processes of daily life activities. Notwithstanding its adaptiveness, several sensitive applications based on such technology cannot compromise our trust in them; thus, highly accurate machine learning models require reasons for their predictions. Such models are black boxes for end-users. The concept of interpretability therefore plays the role of assisting users in several ways. Interpretable models are models that possess the quality of explaining their predictions. Different strategies have been proposed for this purpose, but some of them require an excessive amount of effort, lack generalization, are not model-agnostic, or are computationally expensive. In this work, we propose a strategy that tackles these issues. A surrogate model assisted us in building interpretable models. Moreover, it helped us achieve results with accuracy close to that of the black-box model but with less processing time, making the proposed technique computationally cheaper than traditional methods. The significance of such a technique is that data science developers will not have to perform strenuous hands-on activities to undertake feature engineering tasks, and end-users will receive graphical explanations of complex models in a comprehensible way, consequently building trust in the machine.
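
For context, a classic global surrogate, the family of techniques the paper's customized approach builds on, fits an interpretable model to the black box's predictions and measures how faithfully it mimics them. The sketch below is a generic illustration (a shallow decision tree approximating a random forest), not the authors' customized method:

    # Generic global-surrogate sketch: approximate a black box with a shallow tree.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.tree import DecisionTreeClassifier

    def fit_surrogate(X_train, y_train, max_depth=3):
        black_box = RandomForestClassifier(n_estimators=300).fit(X_train, y_train)
        # Train the surrogate on the black box's predictions, not the true labels.
        y_bb = black_box.predict(X_train)
        surrogate = DecisionTreeClassifier(max_depth=max_depth).fit(X_train, y_bb)
        # Fidelity: how often the interpretable surrogate agrees with the black box.
        fidelity = accuracy_score(y_bb, surrogate.predict(X_train))
        return black_box, surrogate, fidelity

Training the surrogate on the black box's predictions rather than the true labels is what makes the fidelity score meaningful: it measures agreement with the model being explained, not predictive accuracy.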
