# Bio-Inspired Machine Learning Approach to Type 2 Diabetes Detection

## Abstract

## 1. Introduction

- Apply two bio-inspired metaheuristic algorithms, the cuttlefish algorithm (CFA) and the genetic algorithm (GA), to the PID and HFD datasets for feature selection. For the CFA, the features in both datasets were reduced using a logistic regression cost function.
- Combine the CFA and GA bio-inspired metaheuristic algorithms with several classification algorithms to predict type 2 diabetes.
- Analyze the performance of the CFA and GA over two datasets: PID and HFD.
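
As an illustration of the first contribution, the sketch below evaluates a candidate feature subset with a logistic regression cost. The source does not give the exact form of the cost function, so the standard cross-entropy (log-loss) is assumed here; the names `sigmoid`, `logistic_cost`, and `select_columns` are hypothetical helpers, not the authors' code.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def select_columns(rows, subset):
    # Keep only the columns (0-based indices) of the candidate feature subset.
    return [[row[j] for j in subset] for row in rows]

def logistic_cost(weights, bias, rows, labels, eps=1e-12):
    # Assumed cost: mean cross-entropy of a logistic regression model.
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)

# Toy data (made up for illustration): three features, two kept.
rows = [[1.0, 0.2, 5.0], [0.5, 0.9, 1.0], [1.5, 0.1, 4.0]]
labels = [1, 0, 1]
reduced = select_columns(rows, [0, 2])
print(logistic_cost([0.3, 0.1], -0.5, reduced, labels))
```

A lower cost means the model fitted on the subset explains the labels better, so a metaheuristic that maximizes fitness could use the negated cost as its objective.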

## 2. Related Works

## 3. Preliminaries

#### 3.1. Cuttlefish Algorithm

**Algorithm 1.** The cuttlefish algorithm.

Line | Step |
---|---|
1 | Input: Max iteration, v_{1}, v_{2}, r_{1}, r_{2}, Upper, Lower |
2 | Output: The best 4 features |
3 | Initialize the population with the given number of dimensions |
4 | Evaluate the fitness of the population |
5 | Store the best solution in Best |
6 | Divide the cells into four groups G_{1}, G_{2}, G_{3}, and G_{4} |
7 | while I <= Max iteration do |
8 | Calculate the average of the best solution and store it in AVbest |
9 | for each cell in G_{1} do //Cases 1&2 |
10 | Generate new solution using (1), (2), and (3) |
11 | Ref = rand(r_{1}, r_{2}) × G_{1}[i].Points[j] |
12 | Vis = rand(v_{1}, v_{2}) × (Best.Points[j] − G_{1}[i].Points[j]) |
13 | Calculate the fitness for the new_sol |
14 | if (fitness > best subset) then current = new_sol |
15 | end if |
16 | end for |
17 | for each cell in G_{2} do //Cases 3&4 |
18 | Generate new solution using (1) and (3) |
19 | Ref = Best.Points[j] |
20 | Vis = rand(v_{1}, v_{2}) × (Best.Points[j] − G_{2}[i].Points[j]) |
21 | Calculate the fitness for the new_sol |
22 | if (fitness > best subset) then current = new_sol |
23 | end if |
24 | end for |
25 | for each cell in G_{3} do //Case 5 |
26 | Generate new solution using (1) and (7) |
27 | Ref = Best.Points[j] |
28 | Vis = rand(v_{1}, v_{2}) × (Best.Points[j] − AVbest) |
29 | Calculate the fitness for the new_sol |
30 | if (fitness > best subset) then current = new_sol |
31 | end if |
32 | end for |
33 | for each cell in G_{4} do //Case 6 |
34 | Generate random solution using (1) |
35 | P[i].Points[j] = rand × (Upper − Lower) + Lower |
36 | Calculate the fitness for the new_sol |
37 | if (fitness > best subset) then current = new_sol |
38 | end if |
39 | end for |
40 | I = I + 1 |
41 | end while |
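
The listing above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `cuttlefish_maximize`, the `pop_size` and `seed` parameters, the split into four equal groups, the clipping of new points to [Lower, Upper], and the rule of replacing a cell whenever its new solution scores higher are all assumptions made to obtain a runnable example.

```python
import random

def cuttlefish_maximize(fitness, dim, lower, upper, r1, r2, v1, v2,
                        pop_size=20, max_iter=50, seed=0):
    """Maximize `fitness` over [lower, upper]^dim with a CFA-style search."""
    rng = random.Random(seed)

    def clip(x):
        # Assumption: keep new points inside the search bounds.
        return min(max(x, lower), upper)

    def rand_point():
        return [rng.uniform(lower, upper) for _ in range(dim)]

    # Initialize the population, evaluate it, and store the best solution.
    cells = [rand_point() for _ in range(pop_size)]
    scores = [fitness(c) for c in cells]
    best_i = max(range(pop_size), key=scores.__getitem__)
    best, best_score = cells[best_i][:], scores[best_i]

    # Divide the cells into four groups G1..G4 (assumed equal quarters).
    q = pop_size // 4
    groups = [range(0, q), range(q, 2 * q), range(2 * q, 3 * q),
              range(3 * q, pop_size)]

    for _ in range(max_iter):
        av_best = sum(best) / dim  # average of the best solution's points
        for g, members in enumerate(groups):
            for i in members:
                if g == 3:
                    new = rand_point()  # Case 6: random solution
                else:
                    new = []
                    for j in range(dim):
                        v = rng.uniform(v2, v1)
                        if g == 0:    # Cases 1 and 2
                            ref = rng.uniform(r2, r1) * cells[i][j]
                            vis = v * (best[j] - cells[i][j])
                        elif g == 1:  # Cases 3 and 4
                            ref = best[j]
                            vis = v * (best[j] - cells[i][j])
                        else:         # Case 5
                            ref = best[j]
                            vis = v * (best[j] - av_best)
                        new.append(clip(ref + vis))  # reflection + visibility
                new_score = fitness(new)
                if new_score > scores[i]:
                    cells[i], scores[i] = new, new_score
                if new_score > best_score:
                    best, best_score = new[:], new_score
    return best, best_score

# Toy objective (made up): the maximum is at x = 3 in every dimension.
toy = lambda p: -sum((x - 3.0) ** 2 for x in p)
print(cuttlefish_maximize(toy, dim=4, lower=1.0, upper=8.0,
                          r1=1.5, r2=-1.5, v1=2.5, v2=-2.5))
```

For feature selection, the toy objective would be replaced by a function that decodes a point into a feature subset and scores a classifier trained on that subset.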

#### 3.2. Genetic Algorithm

## 4. Methodology

#### 4.1. Approach

#### 4.2. Datasets

#### 4.3. Feature Selection

#### 4.4. Classification

#### 4.5. Evaluation

## 5. Results and Discussion

## 6. Conclusions and Future Directions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Yach, D.; Hawkes, C.; Gould, C.L.; Hofman, K.J. The Global Burden of Chronic Diseases: Overcoming Impediments to Prevention and Control. JAMA **2004**, 291, 2616–2622.
2. Vaishali, R.; Sasikala, R.; Ramasubbareddy, S.; Remya, S.; Nalluri, S. Genetic algorithm based feature selection and MOE Fuzzy classification algorithm on Pima Indians Diabetes dataset. In Proceedings of the 2017 International Conference on Computing Networking and Informatics (ICCNI), Lagos, Nigeria, 29–31 October 2017; pp. 1–5.
3. Khanam, J.J.; Foo, S.Y. A comparison of machine learning algorithms for diabetes prediction. ICT Express **2021**, 7, 432–439.
4. Khalid, S.; Khalil, T.; Nasreen, S. A survey of feature selection and feature extraction techniques in machine learning. In Proceedings of the 2014 Science and Information Conference, London, UK, 27–29 August 2014; pp. 372–378.
5. Swapna, G.; Vinayakumar, R.; Soman, K.P. Diabetes detection using deep learning algorithms. ICT Express **2018**, 4, 243–246.
6. Yu, L.; Liu, H. Efficient feature selection via analysis of relevance and redundancy. J. Mach. Learn. Res. **2004**, 5, 1205–1224.
7. Ismail, L.; Materwala, H.; Tayefi, M.; Ngo, P.; Karduck, A.P. Type 2 Diabetes with Artificial Intelligence Machine Learning: Methods and Evaluation. Arch. Comput. Methods Eng. **2021**, 29, 313–333.
8. Yusta, S.C. Different metaheuristic strategies to solve the feature selection problem. Pattern Recognit. Lett. **2009**, 30, 525–534.
9. Gandomi, A.H.; Yang, X.-S.; Alavi, A.H. Cuckoo search algorithm: A metaheuristic approach to solve structural optimization problems. Eng. Comput. **2013**, 29, 17–35.
10. Yang, X.-S. Nature-Inspired Metaheuristic Algorithms; Luniver Press: London, UK, 2010.
11. Negi, A.; Jaiswal, V. A first attempt to develop a diabetes prediction method based on different global datasets. In Proceedings of the 2016 Fourth International Conference on Parallel, Distributed and Grid Computing (PDGC), Waknaghat, India, 22–24 December 2016; pp. 237–241.
12. Tigga, N.P.; Garg, S. Prediction of Type 2 Diabetes using Machine Learning Classification Methods. Procedia Comput. Sci. **2020**, 167, 706–716.
13. Lukmanto, R.B.; Suharjito; Nugroho, A.; Akbar, H. Early Detection of Diabetes Mellitus using Feature Selection and Fuzzy Support Vector Machine. Procedia Comput. Sci. **2019**, 157, 46–54.
14. Sneha, N.; Gangil, T. Analysis of diabetes mellitus for early prediction using optimal features selection. J. Big Data **2019**, 6, 13.
15. Nibareke, T.; Laassiri, J. Using Big Data-machine learning models for diabetes prediction and flight delays analytics. J. Big Data **2020**, 7, 78.
16. Ellouze, A.; Kahouli, O.; Ksantini, M.; Alsaif, H.; Aloui, A.; Kahouli, B. Artificial Intelligence-Based Diabetes Diagnosis with Belief Functions Theory. Symmetry **2022**, 14, 2197.
17. Gupta, D.; Julka, A.; Jain, S.; Aggarwal, T.; Khanna, A.; Arunkumar, N.; de Albuquerque, V.H.C. Optimized cuttlefish algorithm for diagnosis of Parkinson’s disease. Cogn. Syst. Res. **2018**, 52, 36–48.
18. Abu Khurmaa, R.; Aljarah, I.; Sharieh, A. An intelligent feature selection approach based on moth flame optimization for medical diagnosis. Neural Comput. Appl. **2020**, 33, 7165–7204.
19. Uzma; Al-Obeidat, F.; Tubaishat, A.; Shah, B.; Halim, Z. Gene encoder: A feature selection technique through unsupervised deep learning-based clustering for large gene expression data. Neural Comput. Appl. **2020**, 34, 8309–8331.
20. Shah, S.H.; Iqbal, M.J.; Ahmad, I.; Khan, S.; Rodrigues, J.J.P.C. Optimized gene selection and classification of cancer from microarray gene expression data using deep learning. Neural Comput. Appl. **2020**.
21. Malakar, S.; Ghosh, M.; Bhowmik, S.; Sarkar, R.; Nasipuri, M. A GA based hierarchical feature selection approach for handwritten word recognition. Neural Comput. Appl. **2020**, 32, 2533–2552.
22. Gandomi, A.H.; Yang, X.-S.; Talatahari, S.; Alavi, A.H. Metaheuristic algorithms in modeling and optimization. In Metaheuristic Applications in Structures and Infrastructures; Elsevier: Amsterdam, The Netherlands, 2013; pp. 1–24.
23. Almomani, A.; Alweshah, M.; Al, S. Metaheuristic algorithms-based feature selection approach for intrusion detection. In Machine Learning for Computer and Cyber Security; CRC Press: Boca Raton, FL, USA, 2019.
24. Eesa, A.S.; Brifcani, A.M.A.; Orman, Z. Cuttlefish algorithm-a novel bio-inspired optimization algorithm. Int. J. Sci. Eng. Res. **2013**, 4, 1978–1986.
25. Eesa, A.S.; Brifcani, A.M.A.; Orman, Z. A new tool for global optimization problems-cuttlefish algorithm. Int. J. Math. Comput. Nat. Phys. Eng. **2014**, 8, 1208–1211.
26. Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence; MIT Press: Cambridge, MA, USA, 1992.
27. Azbeg, K.; Boudhane, M.; Ouchetto, O.; Jai Andaloussi, S. Diabetes emergency cases identification based on a statistical predictive model. J. Big Data **2022**, 9, 31.
28. Jayanthi, N.; Babu, B.V.; Rao, N.S. Survey on clinical prediction models for diabetes prediction. J. Big Data **2017**, 4, 26.
29. Ben-David, A. Comparison of classification accuracy using Cohen’s Weighted Kappa. Expert Syst. Appl. **2008**, 34, 825–832.
30. Vieira, S.M.; Kaymak, U.; Sousa, J.M.C. Cohen’s kappa coefficient as a performance measure for feature selection. In Proceedings of the International Conference on Fuzzy Systems, Barcelona, Spain, 18–23 July 2010; pp. 1–8.
31. Landis, J.R.; Koch, G.G. The measurement of observer agreement for categorical data. Biometrics **1977**, 33, 159–174.
32. Rodríguez-Rodríguez, I.; Rodríguez, J.-V.; González-Vidal, A.; Zamora, M.-Á. Feature Selection for Blood Glucose Level Prediction in Type 1 Diabetes Mellitus by Using the Sequential Input Selection Algorithm (SISAL). Symmetry **2019**, 11, 1164.
33. Aslan, M.F.; Sabanci, K. A Novel Proposal for Deep Learning-Based Diabetes Prediction: Converting Clinical Data to Image Data. Diagnostics **2023**, 13, 796.
34. Mahafzah, B.A. Performance evaluation of parallel multithreaded A* heuristic search algorithm. J. Inf. Sci. **2014**, 40, 363–375.
35. Mahafzah, B.A. Parallel multithreaded IDA* heuristic search: Algorithm design and performance evaluation. Int. J. Parallel Emergent Distrib. Syst. **2011**, 26, 61–82.
36. Al-Adwan, A.; Sharieh, A.; Mahafzah, B.A. Parallel heuristic local search algorithm on OTIS hyper hexa-cell and OTIS mesh of trees optoelectronic architectures. Appl. Intell. **2019**, 49, 661–688.
37. Al-Adwan, A.; Mahafzah, B.A.; Sharieh, A. Solving traveling salesman problem using parallel repetitive nearest neighbor algorithm on OTIS-Hypercube and OTIS-Mesh optoelectronic architectures. J. Supercomput. **2018**, 74, 1–36.
38. Al-Shaikh, A.A.; Mahafzah, B.A.; Alshraideh, M. Hybrid harmony search algorithm for social network contact tracing of COVID-19. Soft Comput. **2021**, 27, 3343–3365.
39. Mahafzah, B.A.; Jabri, R.; Murad, O. Multithreaded scheduling for program segments based on chemical reaction optimizer. Soft Comput. **2021**, 25, 2741–2766.
40. Al-Shaikh, A.; Mahafzah, B.A.; Alshraideh, M. Metaheuristic approach using grey wolf optimizer for finding strongly connected components in digraphs. J. Theor. Appl. Inf. Technol. **2019**, 97, 4439–4452.
41. Khattab, H.; Sharieh, A.; Mahafzah, B.A. Most valuable player algorithm for solving minimum vertex cover problem. Int. J. Adv. Comput. Sci. Appl. **2019**, 10, 159–167.

**Figure 1.** Diagram of cuttlefish skin detailing the three main skin structures [24].

No. | Feature | Description |
---|---|---|
1 | Pregnancies | Number of times pregnant |
2 | Glucose | Plasma glucose concentration at 2 h in an oral glucose tolerance test |
3 | Blood Pressure | Diastolic blood pressure (mm Hg) |
4 | Skin Thickness | Triceps skinfold thickness (mm) |
5 | Insulin | 2-hour serum insulin (μU/mL) |
6 | BMI | Body mass index (weight in kg/(height in m)^{2}) |
7 | Age | Age (years) |
8 | Diabetes Pedigree Function | Diabetes diagnostic history of the person’s relatives |

Dataset | Algorithm | Selected Features |
---|---|---|
PID | CFA | Glucose, skin thickness, BMI, and insulin |
PID | GA | Glucose, BMI, diabetes pedigree function, and age |
HFD | CFA | Diabetes pedigree function, age, glucose, and BMI |
HFD | GA | Pregnancies, glucose, insulin, and age |

Actual / Predicted | Recommended | Not Recommended |
---|---|---|
Preferred | True Positive (TP) | False Negative (FN) |
Not preferred | False Positive (FP) | True Negative (TN) |
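
The four cells of the confusion matrix above are enough to compute classification accuracy, (TP + TN) / (TP + FN + FP + TN). The sketch below shows one way to count the cells from 0/1 labels and derive accuracy; the function names and the toy labels are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    # Count TP, FN, FP, TN for binary labels (1 = positive, 0 = negative).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn

def accuracy(tp, fn, fp, tn):
    # Share of all cases that were classified correctly.
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total

# Toy labels, made up for illustration.
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 1]
print(accuracy(*confusion_counts(y_true, y_pred)))
```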

Kappa | Strength of Agreement |
---|---|
<0.00 | Poor |
0.00–0.20 | Slight |
0.21–0.40 | Fair |
0.41–0.60 | Moderate |
0.61–0.80 | Substantial |
0.81–1.00 | Almost perfect |
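
To make the kappa bands above concrete, the following sketch computes Cohen's kappa from binary confusion-matrix counts and maps the value to a strength-of-agreement label. It uses the standard definition, kappa = (p_o − p_e) / (1 − p_e); the function names and example counts are assumptions for illustration.

```python
def cohen_kappa(tp, fn, fp, tn):
    # Observed agreement p_o and chance agreement p_e from the marginals.
    n = tp + fn + fp + tn
    p_o = (tp + tn) / n
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)

def kappa_strength(kappa):
    # Map a kappa value to the agreement bands listed in the table.
    if kappa < 0.0:
        return "Poor"
    for upper, label in [(0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")]:
        if kappa <= upper:
            return label
    return "Almost perfect"

# Example counts, made up for illustration.
k = cohen_kappa(tp=20, fn=10, fp=5, tn=65)
print(round(k, 3), kappa_strength(k))
```

Unlike accuracy, kappa discounts the agreement a classifier would reach by chance, which matters when the two classes are imbalanced.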

Parameter | Description | Value |
---|---|---|
Dimension | Number of features | 4 |
Upper | Maximum limit to initialize population | 8 |
Lower | Minimum limit to initialize population | 1 |
r_{1} | Maximum limit to find reflection | 1.5 |
r_{2} | Minimum limit to find reflection | −1.5 |
v_{1} | Maximum limit to find visibility | 2.5 |
v_{2} | Minimum limit to find visibility | −2.5 |
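
With Lower = 1, Upper = 8, and Dimension = 4, each cell can be read as four positions in the range of the eight feature numbers. The source does not say how a continuous position is turned into a feature index, so the sketch below simply assumes rounding to the nearest feature number and dropping repeats; `CFA_PARAMS`, `FEATURES`, and `decode_subset` are hypothetical names.

```python
# Parameter values copied from the table above.
CFA_PARAMS = {"dimension": 4, "upper": 8, "lower": 1,
              "r1": 1.5, "r2": -1.5, "v1": 2.5, "v2": -2.5}

# Feature numbers 1..8 as listed in the dataset description table.
FEATURES = ["Pregnancies", "Glucose", "Blood Pressure", "Skin Thickness",
            "Insulin", "BMI", "Age", "Diabetes Pedigree Function"]

def decode_subset(point, feature_names, lower=1, upper=8):
    # Assumption: round each coordinate to the nearest feature number,
    # clamp it to [lower, upper], and keep the first occurrence of each.
    indices = []
    for value in point:
        idx = int(round(min(max(value, lower), upper)))
        if idx not in indices:
            indices.append(idx)
    return [feature_names[i - 1] for i in indices]

print(decode_subset([2.2, 4.1, 6.4, 5.0], FEATURES))
```

Under this decoding, two coordinates that round to the same number yield fewer than four features, so a real implementation would need a rule for repairing such cells.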

Dataset | Population Iteration | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
---|---|---|---|---|---|---|---|---|---|---|---|
PID | 10 | 0.77 | 0.78 | 0.77 | 0.77 | 0.79 | 0.78 | 0.77 | 0.80 | 0.79 | 0.79 |
PID | 20 | 0.75 | 0.75 | 0.75 | 0.76 | 0.76 | 0.77 | 0.76 | 0.79 | 0.79 | 0.80 |
PID | 30 | 0.75 | 0.76 | 0.77 | 0.78 | 0.79 | 0.79 | 0.80 | 0.79 | 0.80 | 0.80 |
PID | 40 | 0.74 | 0.75 | 0.75 | 0.76 | 0.77 | 0.77 | 0.79 | 0.79 | 0.80 | 0.79 |
PID | 50 | 0.74 | 0.75 | 0.77 | 0.77 | 0.76 | 0.78 | 0.79 | 0.80 | 0.80 | 0.80 |
PID | 60 | 0.75 | 0.76 | 0.77 | 0.77 | 0.77 | 0.78 | 0.79 | 0.79 | 0.80 | 0.80 |
PID | 70 | 0.76 | 0.77 | 0.77 | 0.78 | 0.77 | 0.78 | 0.78 | 0.80 | 0.80 | 0.80 |
HFD | 10 | 0.76 | 0.77 | 0.76 | 0.75 | 0.76 | 0.76 | 0.76 | 0.74 | 0.75 | 0.73 |
HFD | 20 | 0.76 | 0.74 | 0.75 | 0.74 | 0.73 | 0.75 | 0.77 | 0.74 | 0.76 | 0.74 |
HFD | 30 | 0.75 | 0.75 | 0.76 | 0.76 | 0.75 | 0.76 | 0.77 | 0.75 | 0.75 | 0.75 |
HFD | 40 | 0.74 | 0.76 | 0.76 | 0.76 | 0.77 | 0.77 | 0.78 | 0.73 | 0.74 | 0.75 |
HFD | 50 | 0.75 | 0.76 | 0.76 | 0.77 | 0.77 | 0.77 | 0.78 | 0.75 | 0.75 | 0.76 |
HFD | 60 | 0.74 | 0.77 | 0.77 | 0.77 | 0.76 | 0.77 | 0.77 | 0.74 | 0.75 | 0.76 |
HFD | 70 | 0.74 | 0.75 | 0.76 | 0.76 | 0.76 | 0.77 | 0.77 | 0.74 | 0.76 | 0.77 |

**Table 7.** Difference between the performance of the CFA and the GA using different classification algorithms on the PID dataset.

Classifier | Algorithm | Accuracy ± STD | Maximum Accuracy | Minimum Accuracy | Kappa | MAE |
---|---|---|---|---|---|---|
LR | CFA | 0.80 ± 0.03 | 0.82 | 0.70 | 0.49 | 0.2 |
LR | GA | 0.78 ± 0.04 | 0.80 | 0.70 | 0.4 | 0.24 |
RF | CFA | 0.77 ± 0.04 | 0.77 | 0.73 | 0.3 | 0.23 |
RF | GA | 0.78 ± 0.03 | 0.79 | 0.72 | 0.39 | 0.25 |
K-NN | CFA | 0.72 ± 0.02 | 0.73 | 0.69 | 0.30 | 0.29 |
K-NN | GA | 0.74 ± 0.02 | 0.75 | 0.71 | 0.38 | 0.25 |
SVM | CFA | 0.80 ± 0.03 | 0.81 | 0.70 | 0.48 | 0.21 |
SVM | GA | 0.76 ± 0.03 | 0.77 | 0.73 | 0.4 | 0.25 |
NB | CFA | 0.76 ± 0.02 | 0.77 | 0.69 | 0.4 | 0.24 |
NB | GA | 0.75 ± 0.03 | 0.76 | 0.73 | 0.34 | 0.26 |
DT | CFA | 0.69 ± 0.02 | 0.70 | 0.64 | 0.35 | 0.29 |
DT | GA | 0.72 ± 0.03 | 0.75 | 0.67 | 0.28 | 0.31 |

**Table 8.** Difference between the performance of the CFA and the GA using different classification algorithms on the HFD dataset.

Classifier | Algorithm | Accuracy ± STD | Maximum Accuracy | Minimum Accuracy | Kappa | MAE |
---|---|---|---|---|---|---|
LR | CFA | 0.79 ± 0.02 | 0.78 | 0.69 | 0.46 | 0.22 |
LR | GA | 0.73 ± 0.02 | 0.73 | 0.69 | 0.37 | 0.26 |
RF | CFA | 0.97 ± 0.01 | 0.97 | 0.90 | 0.91 | 0.03 |
RF | GA | 0.96 ± 0.03 | 0.97 | 0.89 | 0.92 | 0.03 |
K-NN | CFA | 0.77 ± 0.04 | 0.82 | 0.72 | 0.53 | 0.19 |
K-NN | GA | 0.76 ± 0.03 | 0.78 | 0.74 | 0.52 | 0.21 |
SVM | CFA | 0.75 ± 0.02 | 0.78 | 0.69 | 0.45 | 0.22 |
SVM | GA | 0.73 ± 0.03 | 0.74 | 0.70 | 0.4 | 0.26 |
NB | CFA | 0.75 ± 0.02 | 0.77 | 0.69 | 0.46 | 0.22 |
NB | GA | 0.72 ± 0.04 | 0.73 | 0.69 | 0.36 | 0.28 |
DT | CFA | 0.95 ± 0.04 | 0.97 | 0.76 | 0.89 | 0.04 |
DT | GA | 0.93 ± 0.01 | 0.96 | 0.94 | 0.86 | 0.06 |
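
The accuracy summary and MAE columns in Tables 7 and 8 can be illustrated with the short sketch below. It is not the authors' evaluation code: the function names are hypothetical, the run accuracies are made up, and the use of the sample standard deviation is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, stdev

def mean_absolute_error(y_true, y_pred):
    # Average absolute difference between true and predicted labels.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def summarize_runs(accuracies):
    # Mean, spread, best, and worst accuracy over repeated runs.
    return {"mean": mean(accuracies),
            "std": stdev(accuracies),  # assumption: sample standard deviation
            "max": max(accuracies),
            "min": min(accuracies)}

# Made-up accuracies from three hypothetical runs.
print(summarize_runs([0.70, 0.80, 0.82]))

# For hard 0/1 predictions, MAE is the share of misclassified cases.
print(mean_absolute_error([1, 0, 1, 1], [1, 1, 1, 0]))
```

When predictions are hard 0/1 labels, MAE equals 1 minus accuracy, which matches rows such as LR with the CFA in Table 7 (accuracy 0.80, MAE 0.2).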


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Al-Tawil, M.; Mahafzah, B.A.; Al Tawil, A.; Aljarah, I.
Bio-Inspired Machine Learning Approach to Type 2 Diabetes Detection. *Symmetry* **2023**, *15*, 764.
https://doi.org/10.3390/sym15030764
