# An Improved Three-Way K-Means Algorithm by Optimizing Cluster Centers


## Abstract


## 1. Introduction

- The clustering results of k-means depend on the random selection of the initial cluster centers, so the algorithm readily converges to a local optimum.
- Traditional k-means algorithms assume that a cluster is represented by a single set with a sharp boundary, so only two relationships between an object and a cluster are considered: belongs to and does not belong to. A sharp boundary makes the clustering results easy to analyze, but may not adequately capture the uncertainty in the dataset.

## 2. Related Work

#### 2.1. Three-Way Clustering

#### 2.2. Three-Way k-Means

- (1) If $T\ne \emptyset $, then $v\in \mathrm{support}\left({C}_{i}\right)$ and $v\in \mathrm{support}\left({C}_{j}\right)$.
- (2) If $T=\emptyset $, then $v\in \mathrm{support}\left({C}_{i}\right)$.
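The two cases can be sketched in Python as follows; building the candidate set $T$ from a distance threshold $\delta$ is an illustrative assumption standing in for the paper's exact definition, and the support of each cluster is represented by separate core and fringe lists:

```python
import numpy as np

def three_way_assign(X, centers, delta):
    """Assign each sample to the core or fringe region of clusters.

    For a sample v with nearest cluster C_i, T collects the other clusters
    whose centers lie within d(v, C_i) + delta of v (assumed construction).
    If T is non-empty, v falls in the fringe (support) of C_i and of every
    C_j in T; otherwise v belongs to the core of C_i only.
    """
    k = len(centers)
    core = [[] for _ in range(k)]
    fringe = [[] for _ in range(k)]
    for idx, v in enumerate(X):
        d = np.linalg.norm(centers - v, axis=1)  # distances to all centers
        i = int(np.argmin(d))                    # nearest cluster C_i
        T = [j for j in range(k) if j != i and d[j] <= d[i] + delta]
        if T:                                    # case (1): T is non-empty
            fringe[i].append(idx)
            for j in T:
                fringe[j].append(idx)
        else:                                    # case (2): T is empty
            core[i].append(idx)
    return core, fringe
```

Samples whose candidate set $T$ is empty go to the core of their nearest cluster; the rest are placed in the fringe of every sufficiently close cluster, which yields the overlapping support regions characteristic of three-way clustering.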

Algorithm 1: Three-way k-means [21]

## 3. The Improved Three-Way k-Means

#### 3.1. Random Probability Selection Strategy
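Random probability (roulette-wheel) selection is the mechanism ant colony algorithms use to pick a candidate with probability proportional to its score. A minimal sketch; treating candidate cluster centers' quality scores as the weights here is an assumption for illustration:

```python
import random

def roulette_select(weights, rng=random):
    """Pick an index with probability proportional to its weight
    (fitness-proportional / roulette-wheel selection)."""
    total = sum(weights)
    r = rng.uniform(0, total)   # a random point on the cumulative wheel
    cum = 0.0
    for i, w in enumerate(weights):
        cum += w
        if r <= cum:
            return i
    return len(weights) - 1     # guard against floating-point rounding
```

In ant colony algorithms the weights are typically pheromone intensities, so candidates that repeatedly prove good accumulate a higher selection probability over iterations instead of being chosen uniformly at random.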

#### 3.2. The Improved Three-Way k-Means Algorithm

Algorithm 2: The improved three-way k-means based on the ant colony algorithm.

## 4. Experimental Analysis

#### 4.1. Evaluation Indices

- Accuracy ($ACC$).$$ACC=\frac{{\sum }_{i=1}^{k}{C}_{i}}{n},$$where $n$ is the total number of samples in the dataset, ${C}_{i}$ is the number of samples correctly assigned to cluster $i$, and $k$ is the number of clusters. $ACC$ is the ratio of correctly partitioned samples to the total number of samples; a greater $ACC$ value implies a better clustering result. When $ACC=1$, the result of the clustering algorithm is consistent with the real result.
- Davies–Bouldin index ($DBI$) [47].$$DBI=\frac{1}{c}\sum _{i=1}^{c}\underset{j\ne i}{max}\left\{\frac{S\left({C}_{i}\right)+S\left({C}_{j}\right)}{d\left({x}_{i},{x}_{j}\right)}\right\},$$$$S\left({C}_{i}\right)=\frac{{\sum }_{v\in {C}_{i}}\Vert v-{x}_{i}\Vert }{\mid {C}_{i}\mid },$$where $c$ is the number of clusters, ${x}_{i}$ is the center of cluster ${C}_{i}$, $S\left({C}_{i}\right)$ is the average distance from the members of ${C}_{i}$ to its center, and $d\left({x}_{i},{x}_{j}\right)$ is the distance between the centers of ${C}_{i}$ and ${C}_{j}$. Because $DBI$ is the ratio of within-cluster scatter to between-cluster separation, a lower value means a better clustering.
- Average silhouette index ($AS$) [47].$$AS=\frac{1}{n}\sum _{i=1}^{n}{S}_{i},$$$${S}_{i}=\frac{{b}_{i}-{a}_{i}}{max\{{a}_{i},{b}_{i}\}},$$where ${a}_{i}$ is the average distance between ${x}_{i}$ and all other objects in its own cluster, and ${b}_{i}$ is the minimum, over the other clusters, of the average distance between ${x}_{i}$ and the objects in that cluster. The range of the average silhouette index is $[-1,1]$; a larger value means a better clustering result.
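All three indices can be computed directly from a hard partition. A minimal NumPy sketch; the majority-label matching used for $ACC$ is an assumption for illustration (a Hungarian assignment between clusters and classes is a common alternative):

```python
import numpy as np

def acc(labels_true, labels_pred):
    """ACC: each cluster is credited with its majority true label (assumed matching)."""
    correct = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        correct += np.bincount(members).max()  # C_i: correctly placed samples
    return correct / len(labels_true)

def dbi(X, labels, centers):
    """Davies-Bouldin index: lower is better."""
    c = len(centers)
    # S(C_i): average distance from cluster members to their center
    S = np.array([np.mean(np.linalg.norm(X[labels == i] - centers[i], axis=1))
                  for i in range(c)])
    worst = []
    for i in range(c):
        ratios = [(S[i] + S[j]) / np.linalg.norm(centers[i] - centers[j])
                  for j in range(c) if j != i]
        worst.append(max(ratios))              # worst-case overlap for cluster i
    return sum(worst) / c

def avg_silhouette(X, labels):
    """Average silhouette in [-1, 1]: higher is better."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    s = np.empty(n)
    for i in range(n):
        same = labels == labels[i]
        same[i] = False                        # exclude the point itself from a_i
        a = D[i][same].mean() if same.any() else 0.0
        b = min(D[i][labels == c].mean()       # nearest other cluster on average
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()
```

On two well-separated clusters these give $ACC=1$, a small $DBI$, and an $AS$ close to 1, matching the interpretation of the indices above.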

#### 4.2. Performances of Proposed Algorithm

#### 4.3. Experimental Results Analysis

- Similar to k-means, the proposed method achieves good results on convex datasets; on non-convex datasets, it fails to give good results.
- The time and computational complexity of the proposed algorithm are higher than those of k-means and three-way k-means, so it is less suitable for large-scale data.

## 5. Conclusions and Future Work

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Fujita, H.; Li, T.R.; Yao, Y.Y. Advances in three-way decisions and granular computing. Knowl.-Based Syst. **2016**, 91, 1–3.
2. Yao, Y.Y. Integrative levels of granularity. In Human-Centric Information Processing through Granular Modelling; Springer: Berlin/Heidelberg, Germany, 2009; pp. 31–47.
3. Fujita, H.; Gaeta, A.; Loia, V.; Orciuoli, F. Resilience analysis of critical infrastructures: A cognitive approach based on granular computing. IEEE Trans. Cybern. **2018**, 49, 1835–1848.
4. Pedrycz, W. Granular Computing: Analysis and Design of Intelligent Systems; CRC Press: Boca Raton, FL, USA, 2013.
5. Xu, R.; Wunsch, D.C. Survey of clustering algorithms. IEEE Trans. Neural Netw. Learn. Syst. **2005**, 16, 645–678.
6. Yang, X.B.; Qi, Y.S.; Song, X.N.; Yang, J.Y. Test cost sensitive multigranulation rough set: Model and minimal cost selection. Inf. Sci. **2013**, 250, 184–199.
7. Xu, W.H.; Guo, Y.T. Generalized multigranulation double-quantitative decision-theoretic rough set. Knowl.-Based Syst. **2016**, 105, 190–205.
8. Li, W.T.; Xu, W.H.; Zhang, X.Y.; Zhang, J. Updating approximations with dynamic objects based on local multigranulation rough sets in ordered information systems. Artif. Intell. Rev. **2022**, 55, 1821–1855.
9. Xu, W.H.; Yu, J.H. A novel approach to information fusion in multi-source datasets: A granular computing viewpoint. Inf. Sci. **2017**, 378, 410–423.
10. Chen, X.W.; Xu, W.H. Double-quantitative multigranulation rough fuzzy set based on logical operations in multi-source decision systems. Int. J. Mach. Learn. Cybern. **2022**, 13, 1021–1048.
11. Xu, W.H.; Yuan, K.H.; Li, W.T. Dynamic updating approximations of local generalized multigranulation neighborhood rough set. Appl. Intell. **2022**, 52, 9148–9173.
12. Yang, X.B.; Yao, Y.Y. Ensemble selector for attribute reduction. Appl. Soft Comput. **2018**, 70, 1–11.
13. Jiang, Z.H.; Yang, X.B.; Yu, H.L.; Liu, D.; Wang, P.X.; Qian, Y.H. Accelerator for multi-granularity attribute reduction. Knowl.-Based Syst. **2019**, 177, 145–158.
14. Li, J.Z.; Yang, X.B.; Song, X.N.; Li, J.H.; Wang, P.X.; Yu, D.-J. Neighborhood attribute reduction: A multi-criterion approach. Int. J. Mach. Learn. Cybern. **2019**, 10, 731–742.
15. Liu, K.Y.; Yang, X.B.; Yu, H.L.; Fujita, H.; Chen, X.J.; Liu, D. Supervised information granulation strategy for attribute reduction. Int. J. Mach. Learn. Cybern. **2020**, 11, 2149–2163.
16. Xu, S.P.; Yang, X.B.; Yu, H.L.; Yu, D.-J.; Yang, J.; Tsang, E.C.C. Multi-label learning with label-specific feature reduction. Knowl.-Based Syst. **2016**, 104, 52–61.
17. Liu, K.Y.; Yang, X.B.; Fujita, H. An efficient selector for multi-granularity attribute reduction. Inf. Sci. **2019**, 505, 457–472.
18. Liu, K.Y.; Yang, X.B.; Yu, H.L.; Mi, J.S.; Wang, P.X.; Chen, X.J. Rough set based semi-supervised feature selection via ensemble selector. Knowl.-Based Syst. **2020**, 165, 282–296.
19. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics; University of California Press: Berkeley, CA, USA, 1967; pp. 281–297.
20. Zhang, G.; Zhang, C.C.; Zhang, H.Y. Improved k-means algorithm based on density canopy. Knowl.-Based Syst. **2018**, 145, 289–297.
21. Wang, P.X.; Shi, H.; Yang, X.B.; Mi, J.S. Three-way k-means: Integrating k-means and three-way decision. Int. J. Mach. Learn. Cybern. **2019**, 10, 2767–2777.
22. Yao, Y.Y. Three-way decisions with probabilistic rough sets. Inf. Sci. **2010**, 180, 341–353.
23. Yao, Y.Y. The superiority of three-way decisions in probabilistic rough set models. Inf. Sci. **2011**, 181, 1080–1096.
24. Colorni, A.; Dorigo, M.; Maniezzo, V. Distributed optimization by ant colonies. In Proceedings of the European Conference on Artificial Life, Paris, France, 11–13 December 1991.
25. Yao, Y.Y. Three-way decision and granular computing. Int. J. Approx. Reason. **2018**, 103, 107–123.
26. Luo, J.F.; Hu, M.J.; Qin, K.Y. Three-way decision with incomplete information based on similarity and satisfiability. Int. J. Approx. Reason. **2020**, 120, 151–183.
27. Xu, J.F.; Zhang, Y.J.; Miao, D.Q. Three-way confusion matrix for classification: A measure driven view. Inf. Sci. **2020**, 507, 772–794.
28. Qi, J.J.; Qian, T.; Wei, L. The connections between three-way and classical concept lattices. Knowl.-Based Syst. **2016**, 91, 143–151.
29. Li, J.H.; Huang, C.C.; Qi, J.J.; Qian, Y.H.; Liu, W.Q. Three-way cognitive concept learning via multi-granularity. Inf. Sci. **2017**, 378, 244–263.
30. Yuan, K.H.; Xu, W.H.; Li, W.T.; Ding, W.P. An incremental learning mechanism for object classification based on progressive fuzzy three-way concept. Inf. Sci. **2022**, 584, 127–147.
31. Li, X.N. Three-way fuzzy matroids and granular computing. Int. J. Approx. Reason. **2019**, 114, 44–50.
32. Fujita, H.; Gaeta, A.; Loia, V.; Orciuoli, F. Improving awareness in early stages of security analysis: A zone partition method based on GrC. Appl. Intell. **2019**, 49, 1063–1077.
33. Yang, X.; Li, T.R.; Liu, D.; Fujita, H. A temporal-spatial composite sequential approach of three-way granular computing. Inf. Sci. **2019**, 486, 171–189.
34. Hu, C.X.; Zhang, L. Incremental updating probabilistic neighborhood three-way regions with time-evolving attributes. Int. J. Approx. Reason. **2020**, 120, 1–23.
35. Yu, H. A framework of three-way cluster analysis. In Proceedings of the International Joint Conference on Rough Sets, Olsztyn, Poland, 3–7 July 2017; Volume 425, pp. 300–312.
36. Shah, A.; Azam, N.; Ali, B.; Khan, M.T.; Yao, J.T. A three-way clustering approach for novelty detection. Inf. Sci. **2021**, 569, 650–668.
37. Wang, P.X.; Yao, Y.Y. CE3: A three-way clustering method based on mathematical morphology. Knowl.-Based Syst. **2018**, 155, 54–65.
38. Jia, X.Y.; Rao, Y.; Li, W.W.; Yang, S.C.; Yu, H. An automatic three-way clustering method based on sample similarity. Int. J. Mach. Learn. Cybern. **2021**, 12, 1545–1556.
39. Fan, J.C.; Wang, P.X.; Jiang, C.M.; Yang, X.B.; Song, J.J. Ensemble learning using three-way density-sensitive spectral clustering. Int. J. Approx. Reason. **2022**, 149, 70–84.
40. Shah, A.; Azam, N.; Alanazi, E.; Yao, J.T. Image blurring and sharpening inspired three-way clustering approach. Appl. Intell. **2022**.
41. Afridi, M.K.; Azam, N.; Yao, J.T. A three-way clustering approach for handling missing data using GTRS. Int. J. Approx. Reason. **2018**, 98, 11–24.
42. Wang, P.X.; Chen, X.J. Three-way ensemble clustering for incomplete data. IEEE Access **2020**, 8, 91855–91864.
43. Wang, P.X.; Yang, X.B. Three-way clustering method based on stability theory. IEEE Access **2021**, 9, 33944–33953.
44. Yu, H.; Chen, L.Y.; Yao, J.T. A three-way density peak clustering method based on evidence theory. Knowl.-Based Syst. **2021**, 211, 106532.
45. Fan, J.C.; Wang, X.X.; Wu, T.F.; Zhu, J.; Wang, P.X. Three-way ensemble clustering based on sample’s perturbation theory. Mathematics **2022**, 10, 2598.
46. Wu, T.F.; Fan, J.C.; Wang, P.X. An improved three-way clustering based on ensemble strategy. Mathematics **2022**, 10, 1457.
47. Vinh, N.X.; Epps, J.; Bailey, J. Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. J. Mach. Learn. Res. **2010**, 11, 2837–2854.
48. Bezdek, J.C. Pattern recognition with fuzzy objective function algorithms. Adv. Appl. Pattern Recognit. **1981**, 22, 203–239.

**Figure 1.** The TAO of three-way decision (adapted from [25]).

**Figure 4.** Time comparison of different algorithms on UCI datasets. (**1**) Wine. (**2**) Class. (**3**) Ecoli. (**4**) Forest. (**5**) Bank. (**6**) Iris. (**7**) Contraceptive. (**8**) Molecular Biology. (**9**) Libras. (**10**) Caffeine Consumption.

ID | Datasets | Samples | Attributes | Classes |
---|---|---|---|---|
1 | Wine | 178 | 13 | 3 |
2 | Class | 214 | 9 | 6 |
3 | Ecoli | 366 | 7 | 8 |
4 | Forest | 523 | 27 | 4 |
5 | Bank | 1372 | 4 | 2 |
6 | Iris | 150 | 4 | 3 |
7 | Contraceptive | 1473 | 9 | 3 |
8 | Molecular Biology | 106 | 52 | 2 |
9 | Libras | 360 | 90 | 15 |
10 | Caffeine Consumption | 1885 | 12 | 7 |

ID | Datasets | k-Means | FCM | Three-Way k-Means | Proposed Algorithm |
---|---|---|---|---|---|
1 | Wine | 0.6573 | 0.6692 | 0.6831 | 0.6911 |
2 | Class | 0.5981 | 0.6007 | 0.6112 | 0.6366 |
3 | Ecoli | 0.6339 | 0.6335 | 0.6652 | 0.6773 |
4 | Forest | 0.7795 | 0.7540 | 0.7807 | 0.8294 |
5 | Bank | 0.5758 | 0.5969 | 0.6123 | 0.6131 |
6 | Iris | 0.8866 | 0.8933 | 0.9040 | 0.9040 |
7 | Contraceptive | 0.2145 | 0.2179 | 0.2822 | 0.2826 |
8 | Molecular Biology | 0.6037 | 0.6226 | 0.6547 | 0.6659 |
9 | Libras | 0.8611 | 0.9162 | 0.9256 | 0.9240 |
10 | Caffeine Consumption | 0.2005 | 0.1960 | 0.2411 | 0.2422 |

ID | Datasets | k-Means | FCM | Three-Way k-Means | Proposed Algorithm |
---|---|---|---|---|---|
1 | Wine | 1.7835 | 1.6922 | 1.5542 | 1.5431 |
2 | Class | 1.0475 | 1.2233 | 0.7855 | 0.7596 |
3 | Ecoli | 1.1504 | 1.0273 | 0.9667 | 0.9425 |
4 | Forest | 1.2774 | 1.2253 | 1.200 | 1.1879 |
5 | Bank | 1.1913 | 1.1952 | 1.1332 | 1.1267 |
6 | Iris | 0.7609 | 0.7507 | 0.7236 | 0.7355 |
7 | Contraceptive | 1.2716 | 1.2539 | 1.2323 | 1.2220 |
8 | Molecular Biology | 4.9588 | 4.8236 | 4.6783 | 4.6689 |
9 | Libras | 1.9240 | 1.9126 | 1.9033 | 1.9023 |
10 | Caffeine Consumption | 1.9116 | 1.8072 | 1.6655 | 1.6048 |

ID | Datasets | k-Means | FCM | Three-Way k-Means | Proposed Algorithm |
---|---|---|---|---|---|
1 | Wine | 0.3383 | 0.2337 | 0.3347 | 0.3574 |
2 | Class | 0.5309 | 0.5543 | 0.5887 | 0.6038 |
3 | Ecoli | 0.4419 | 0.4326 | 0.4433 | 0.4524 |
4 | Forest | 0.4029 | 0.4302 | 0.4559 | 0.4669 |
5 | Bank | 0.5000 | 0.4954 | 0.5111 | 0.5280 |
6 | Iris | 0.6959 | 0.7091 | 0.7114 | 0.7188 |
7 | Contraceptive | 0.4236 | 0.4309 | 0.4597 | 0.4672 |
8 | Molecular Biology | 0.0553 | 0.0538 | 0.0558 | 0.0585 |
9 | Libras | 0.3519 | 0.3000 | 0.3533 | 0.3556 |
10 | Caffeine Consumption | 0.3150 | 0.3491 | 0.3517 | 0.3563 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Guo, Q.; Yin, Z.; Wang, P.
An Improved Three-Way K-Means Algorithm by Optimizing Cluster Centers. *Symmetry* **2022**, *14*, 1821.
https://doi.org/10.3390/sym14091821
