# Validating Syntactic Correctness Using Unsupervised Clustering Algorithms

## Abstract

## 1. Introduction

## 2. Related Work

## 3. Automated Checker for the Statements of Requirement Specifications

#### 3.1. Design of Automated Checker Using Unsupervised Clustering Techniques

#### 3.2. Implementation of Automated Checker

## 4. Experimental Results

#### 4.1. The Number of Natural Clusters Using Unsupervised Clustering Techniques

#### 4.2. Recommending Similar Templates Using k-Means Clustering Algorithm

## 5. Conclusions and Future Research

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Abbreviations

| Abbreviation | Definition |
|---|---|
| CNL | Controlled Natural Language |
| STK | Standard Technical Korean |
| EM | Expectation-Maximization algorithm |
| DBSCAN | Density-Based Spatial Clustering of Applications with Noise algorithm |
| RESTful API | REpresentational State Transfer Application Programming Interface |
| JSON | JavaScript Object Notation |
| MSE | Mean Squared Error |

**Figure 1.** The overall validation process used to prove the correctness of a requirement specification.

**Figure 2.** The architecture of the automated checker for the correctness of requirement specifications.

**Figure 5.** The architecture of the unsupervised clustering techniques used to identify the number of natural clusters.

**Figure 7.** The average number of clusters after 10 runs using EM and DBSCAN algorithms with autoencoding.

**Figure 8.** Comparing the performance of similarity (**a**) using a Euclidean distance metric, and (**b**) using a cosine similarity metric.
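The two metrics named in the Figure 8 caption can rank the same candidates differently. The following is a minimal sketch of that difference, not the authors' implementation: the function names, the three feature vectors, and the template labels `A`, `B`, and `C` are all hypothetical and chosen only for illustration.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors (smaller = more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (larger = more similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention used here for zero vectors
    return dot / (norm_u * norm_v)


def rank_templates(query, templates, metric):
    """Return (name, score) pairs ordered from most to least similar."""
    if metric == "euclidean":
        scored = [(name, euclidean_distance(query, vec)) for name, vec in templates.items()]
        return sorted(scored, key=lambda pair: pair[1])
    if metric == "cosine":
        scored = [(name, cosine_similarity(query, vec)) for name, vec in templates.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)
    raise ValueError(f"unknown metric: {metric}")


# Hypothetical feature vectors; these are NOT data from the paper.
templates = {
    "A": [1.0, 0.0, 2.0],  # same direction as the query, but shorter
    "B": [3.0, 1.0, 5.0],  # close to the query in absolute position
    "C": [0.0, 4.0, 0.0],  # orthogonal to the query
}
query = [3.0, 0.0, 6.0]

print(rank_templates(query, templates, "euclidean"))
print(rank_templates(query, templates, "cosine"))
```

With these illustrative vectors, Euclidean distance places `B` first because it is nearest in absolute position, while cosine similarity places `A` first because it points in the same direction as the query regardless of magnitude.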

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Noh, S.; Chung, K.; Shim, J.
Validating Syntactic Correctness Using Unsupervised Clustering Algorithms. *Electronics* **2022**, *11*, 2113.
https://doi.org/10.3390/electronics11142113
