Fuzzy sets (FSs) were introduced in 1965 by Zadeh [1], and a number of new orders of FSs have been introduced over the years, with many successful applications in various fields. Intuitionistic fuzzy sets (IFSs), introduced by Atanassov [2] as a generalization of the traditional mathematical framework of FSs, have found application in image segmentation [3] and preprocessing [5], decision making [6] and pattern recognition [9].
The main characteristic of IFSs is that they express, through functions, both the degree of membership (belongingness) and the degree of non-membership (non-belongingness) of the elements of a universe. A notable notion in the literature is that of vague sets, proposed by Gau and Buehrer [10], which, as pointed out by Bustince and Burillo [11], coincide with IFSs. Other extensions of IFS theory have also been proposed, such as intuitionistic trapezoidal fuzzy multi-numbers [12] or fuzzy soft expert sets [13], defining basic union operations.
With the introduction of IFS theory and its application to the aforementioned fields, appropriate measures for comparing the information carried by two IFSs needed to be defined. Consequently, many studies have proposed different types of measures, the most notable being distance [14] and similarity [17] measures, with the literature focusing more on the latter. This is also reflected in the reviews conducted over the years, for example [20], which show great interest in the field and the need for appropriate measures.
The above highlights the recognized importance of this field and the number of studies conducted to define such measures. With the continuing growth of the field, the proposal of even more measures and the research that depends on them, the development of a library that implements those measures, and supports the general application of IFS theory, becomes very important. There are many examples in the literature of the growth a field can experience through the release of such a library (following the open-source paradigm), with some famous examples being TensorFlow [23] and PyTorch [24] for deep learning, or OpenCV [25] for computer vision.
Numerous libraries and tools are available in the literature, each focusing on a different part of FS theory. Fuzzy-rough-learn [26] builds upon the scikit-learn [27] library to allow the user to apply machine learning with fuzzy rough sets, providing numerous preprocessors, classifiers, data descriptors and other functionalities. Fuzzycreator [28] implements useful tools for generating fuzzy sets from data, visualizing them, representing fuzzy sets (Interval/General Type 2 and others) and calculating different types of membership values. However, the number of implemented measures and types of measures is very small, and, to date, the toolkit itself has not received any important updates. Lastly, S. Topal et al. [29] presented a Python tool for Bipolar Neutrosophic Matrices that supports operations on such matrices and can also be applied to fuzzy matrices.
Despite the useful tools existing in the literature, none of them focuses extensively on the implementation of fuzzy measures, an important aspect of FS theory and its application in other disciplines. Therefore, this paper introduces fsmpy, a Python library that follows the open-source paradigm, supports both functional and object-oriented programming, and exploits the performance of the NumPy [31] library. Fsmpy implements both distance and similarity measures that have been proposed in the literature. The library also provides utility functions and objects for other processes required in classification problems, such as an estimator compatible with the well-known scikit-learn library. The library aims to facilitate the practical application of FS measures, as well as their extension and application in other fields.
2. Design Overview
In this section, we present the overall architecture of fsmpy, which is divided into three modules: the Main, Utils and Tests modules. For better presentation, Figure 1 shows an outline of the library’s modules and sub-packages.
2.1. Main Module
The main module consists of the distances, similarities and miscellaneous measures, the datasets and the sets files. The measures files contain all the measures that were analyzed in the comparative study conducted by G.A. Papakostas et al. [20], with some additional similarities from studies [32]. For now, the miscellaneous measures consist of the Fuzzy Divergence, proposed by J. Fan and W. Xie [39], and the Index of Fuzziness, proposed by T. Chaira and A. Ray [3].
For the representation of an FS or IFS, the fuzzy set and intuitionistic fuzzy set objects were implemented, and they are used throughout the library by any process that requires them. The datasets module contains data from a medical diagnosis dataset that is commonly used in the literature to compare the performance of proposed IFS measures. As work on the library continues, more measures, types of measures, types of fuzzy sets and datasets will be added.
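To make the notion of an intuitionistic fuzzy set object more concrete, the following is a minimal sketch of such a container. It relies only on the standard definition of an IFS (membership and non-membership degrees in [0, 1] whose sum does not exceed 1, with the remainder being the hesitation degree). The class name, attribute names and validation logic are assumptions made for this illustration and are not taken from the fsmpy source code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntuitionisticFuzzySet:
    """Hypothetical IFS container (not the fsmpy class)."""
    memberships: tuple
    non_memberships: tuple

    def __post_init__(self):
        if len(self.memberships) != len(self.non_memberships):
            raise ValueError("both degree sequences must have the same length")
        for mu, nu in zip(self.memberships, self.non_memberships):
            # Standard IFS constraint: 0 <= mu, nu <= 1 and mu + nu <= 1.
            if not (0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0):
                raise ValueError("degrees must lie in [0, 1]")
            if mu + nu > 1.0 + 1e-12:
                raise ValueError("membership + non-membership must not exceed 1")

    @property
    def hesitation(self):
        """Hesitation degree of each element: 1 - mu - nu."""
        return tuple(1.0 - mu - nu
                     for mu, nu in zip(self.memberships, self.non_memberships))


example_set = IntuitionisticFuzzySet((0.6, 0.2), (0.3, 0.7))
hesitation_degrees = example_set.hesitation
```

An ordinary fuzzy set corresponds to the special case in which every hesitation degree is zero, so the same container could represent both kinds of set.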
Table 1, Table 2 and Table 3 show the main features of the methods that are implemented in the library. The first two columns give the names of the authors of each measure and the name of the method that implements it. The third column lists the parameters, other than the two fuzzy sets A and B, that the function requires in order to calculate the corresponding measure. The types of these parameters are described in the documentation. The fourth column gives the name that the authors use in the original paper to describe the measure mathematically.
2.2. Utils Module
In this module, several useful tools are implemented, including a classifier object intended to cover several classification tasks. To date, a text classifier has been developed, which follows the methodology proposed by P. Intarapaiboon [54]. Furthermore, this module provides a function for calculating the degree of confidence, a widely used performance index proposed by A. Hatzimichailidis et al. [55].
As Python has already become a very popular and widely used programming language in the scientific field, and novel IFS measures continue to be developed, the proposed library provides a function that performs the appropriate checks to validate whether a similarity measure satisfies the following defined properties:
with A and B being two IFSs.
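As an illustration of what such a validation might involve, the sketch below checks three properties that are commonly required of IFS similarity measures in the literature: boundedness (0 ≤ S(A, B) ≤ 1), symmetry (S(A, B) = S(B, A)) and reflexivity (S(A, A) = 1). The symbol S, the function names, the example measure and the representation of an IFS as a list of (membership, non-membership) pairs are all assumptions made for this example; they do not reproduce the fsmpy interface or the exact list of properties it checks.

```python
import math
from itertools import product


def example_similarity(a, b):
    """Hypothetical measure: 1 minus the normalized Hamming distance."""
    dist = sum(abs(ma - mb) + abs(na - nb)
               for (ma, na), (mb, nb) in zip(a, b))
    return 1.0 - dist / (2.0 * len(a))


def check_similarity_properties(similarity, sets, tol=1e-9):
    """Empirically check boundedness, symmetry and reflexivity on `sets`."""
    results = {"bounded": True, "symmetric": True, "reflexive": True}
    for a, b in product(sets, repeat=2):
        s_ab = similarity(a, b)
        if not (-tol <= s_ab <= 1.0 + tol):
            results["bounded"] = False
        if not math.isclose(s_ab, similarity(b, a), abs_tol=tol):
            results["symmetric"] = False
    for a in sets:
        if not math.isclose(similarity(a, a), 1.0, abs_tol=tol):
            results["reflexive"] = False
    return results


# Each IFS is a list of (membership, non-membership) pairs.
ifs_samples = [
    [(0.6, 0.3), (0.2, 0.7)],
    [(0.5, 0.4), (0.9, 0.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]
report = check_similarity_properties(example_similarity, ifs_samples)
```

Such a check is only empirical: passing on a finite set of samples does not prove that a measure satisfies the properties in general, but a failure does expose a counterexample.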
2.3. Tests Module
Most studies that propose a measure report the results obtained by applying it to specific examples. Using those examples and results, appropriate test functions were implemented to verify that the corresponding measure is implemented correctly.
In this section, the application of the proposed library is presented in different fields, specifically medical diagnosis, text classification and image segmentation. The examples used in the following subsections are widely used in the literature to showcase the performance of a fuzzy measure. Through those examples, the usability of the library is highlighted.
Figure 2 shows a high-level process diagram of the library, highlighting the representation of the fuzzy sets inside the library, while also showing when and where the fuzzy measures can be applied.
3.1. Medical Diagnosis
The application of IFSs in medical diagnosis has been considered by quite a few studies in the literature. In this application example, data that have been used in many studies were utilized. The data consist of a set of 4 patients, 6 diagnoses and 5 symptoms. The sets are defined as follows:
The aforementioned data are represented using IFS theory, with a membership and a non-membership value representing each characteristic of a diagnosis and each symptom of a patient. Table 4 shows the characteristics of each diagnosis and Table 5 shows each patient’s symptoms, where each element is given as the pair of membership and non-membership values of the corresponding case.
To find the correct diagnosis for each patient, the fuzzy measure between the sets of a patient’s symptoms and that of a diagnosis’ characteristics has to be calculated. Depending on the type of measure that is applied, the correct diagnosis is determined according to the minimum distance or the maximum similarity between the two sets.
In general, the presented library provides a function for classifying a sample set to a specific class, given the class patterns. This function allows the user to classify a sample set with a single line of code; its parameters are the patterns of each class, the sample’s patterns, the measure to be used, the measure’s parameters and whether the measure provided is a distance or not.
Therefore, by using the aforementioned function, we classified each patient’s symptoms to their corresponding diagnosis, with Table 6 showing the results. As the aim of this study was not to compare the measures included in the library but to showcase its usability, we used the Euclidean distance between two IFSs, proposed by Atanassov [2]. Along with the predicted diagnosis, the degree of confidence is also provided (rounded to two decimals). Algorithm 1 shows the pseudo-code steps that are followed to classify each patient’s symptoms and to obtain the degree of confidence of each prediction.
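The minimum-distance rule described above can be sketched as follows. The sketch assumes the normalized two-term form of the Euclidean distance between IFSs, that is, the square root of the mean of (Δμ² + Δν²)/2 over the elements; whether fsmpy uses exactly this normalization is not stated here, so it should be read as an assumption. The function name, the diagnosis labels and all numeric values are made up for illustration and are not taken from Table 4 or Table 5.

```python
import math


def euclidean_distance(a, b):
    """Normalized two-term Euclidean distance between two IFSs (assumed form)."""
    if len(a) != len(b):
        raise ValueError("both IFSs must be defined over the same universe")
    total = sum((ma - mb) ** 2 + (na - nb) ** 2
                for (ma, na), (mb, nb) in zip(a, b))
    return math.sqrt(total / (2.0 * len(a)))


# Made-up (membership, non-membership) pairs, one per symptom.
patient = [(0.8, 0.1), (0.6, 0.1), (0.2, 0.8)]
diagnoses = {
    "diagnosis_a": [(0.7, 0.0), (0.5, 0.2), (0.1, 0.7)],
    "diagnosis_b": [(0.1, 0.8), (0.2, 0.6), (0.8, 0.0)],
}

distances = {name: euclidean_distance(patient, pattern)
             for name, pattern in diagnoses.items()}
# For a distance measure, the predicted diagnosis is the closest pattern.
predicted = min(distances, key=distances.get)
```

With a similarity measure the same scheme applies, except that the pattern with the maximum value is selected.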
3.2. Text Classification
Text classification using IFS theory has also been considered by a few studies. In this example, two popular news datasets [56] (BBC and BBCSport) were used. The BBC dataset consists of a total of 2225 news articles belonging to 5 topical areas.
As mentioned in the previous section, the text classifier follows the methodology proposed by P. Intarapaiboon [54], and so the same preprocessing steps are applied to the datasets as well. Thus, each article of the BBC dataset is represented by the frequencies of 9635 words, while the BBCSport dataset contains 737 articles on 5 areas, each represented by the frequencies of 4613 words. For the sake of space and simplicity, the similarity measure used for the classification process is the one proposed by Z. Liang and P. Shi [47], and the membership and non-membership weights were set to 1.
Thus, similar results to the ones in [54] were obtained (Table 7). The classifier object provided by fsmpy can easily be used for text classification with any measure provided by the library (or a personally developed one). Additionally, as mentioned in Section 2.2, because the classifier follows the estimator implementation of the scikit-learn library, the user can easily connect it with the grid search algorithm provided by that library to find the best parameters for any chosen measure. To highlight this feature, we applied the grid search over a fixed set of parameter values, obtaining better results.
|Algorithm 1. Pseudo-code of the classification algorithm implemented in fsmpy.|
|Input: class_patterns, sample_pattern, measure_caller, is_distance = True, return_confidence = False, **kwargs|
|measures = []|
|for class_pattern in class_patterns do|
| measure = measure_caller(sample_pattern, class_pattern, **kwargs)|
| measure = measure * (is_distance == True ? 1 : −1)|
| measures.append(measure)|
|end for|
|prediction = index_of_min(measures)|
|min_measure = measures[prediction]|
|other_measures = measures.remove_at(prediction)|
|if return_confidence == True then|
| return prediction, confidence_degree(min_measure, other_measures)|
|else|
| return prediction|
|end if|
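The steps of Algorithm 1 can be rendered in Python roughly as follows. This is a sketch under stated assumptions rather than the library’s code: the function names are hypothetical, the toy distance operates on plain membership lists, and the degree of confidence is assumed to be the sum of the absolute differences between the winning measure and all the other measures.

```python
def confidence_degree(best, others):
    """Assumed definition: sum of absolute gaps to the winning measure."""
    return sum(abs(best - other) for other in others)


def classify(class_patterns, sample_pattern, measure_caller,
             is_distance=True, return_confidence=False, **kwargs):
    """Return the index of the best-matching class pattern."""
    # Similarities are negated so that the minimum is always the best match.
    sign = 1.0 if is_distance else -1.0
    measures = [sign * measure_caller(sample_pattern, pattern, **kwargs)
                for pattern in class_patterns]
    prediction = min(range(len(measures)), key=measures.__getitem__)
    if not return_confidence:
        return prediction
    others = measures[:prediction] + measures[prediction + 1:]
    return prediction, confidence_degree(measures[prediction], others)


def hamming(a, b, scale=1.0):
    """Toy distance on membership lists; `scale` illustrates **kwargs."""
    return scale * sum(abs(x - y) for x, y in zip(a, b))


patterns = [[0.1, 0.2], [0.8, 0.9], [0.5, 0.5]]
label, confidence = classify(patterns, [0.7, 0.9], hamming,
                             return_confidence=True)
```

Negating similarity values lets a single index-of-minimum step serve both kinds of measure, which is the role of the `is_distance` flag in Algorithm 1.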
3.3. Image Segmentation
Another common application of IFSs is in image segmentation. The implemented algorithm follows the methodology proposed by T. Chaira [3] (and extended by I. Vlachos and G. Sergiadis [16]), which aims to minimize the fuzzy divergence between the calculated fuzzy set of the segmented image (for a given threshold value) and the ideally segmented image (membership values equal to 1).
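The threshold search described above can be sketched as follows. For brevity, the sketch works with ordinary membership values rather than full IFSs, and it makes two assumptions that should be checked against the cited studies: each pixel’s membership to its own region decays exponentially with its distance from the region mean, and the divergence from the ideal image is the symmetric exponential form evaluated against a membership of 1. All function names and the toy image are hypothetical.

```python
import math


def memberships(pixels, threshold):
    """Membership of each pixel to its own region (assumed exponential form)."""
    low = [p for p in pixels if p <= threshold]
    high = [p for p in pixels if p > threshold]
    mean_low = sum(low) / len(low)
    mean_high = sum(high) / len(high)
    c = 1.0 / (max(pixels) - min(pixels))
    return [math.exp(-c * abs(p - (mean_low if p <= threshold else mean_high)))
            for p in pixels]


def divergence_from_ideal(mu_values):
    """Symmetric exponential divergence to an all-ones membership (assumed)."""
    return sum(2.0 - (2.0 - mu) * math.exp(mu - 1.0) - mu * math.exp(1.0 - mu)
               for mu in mu_values)


def best_threshold(pixels):
    """Try every threshold that leaves both regions non-empty."""
    candidates = range(min(pixels), max(pixels))
    return min(candidates,
               key=lambda t: divergence_from_ideal(memberships(pixels, t)))


toy_image = [10, 12, 11, 13, 200, 205, 198, 202]  # flattened gray levels
threshold = best_threshold(toy_image)
segmented = [int(p > threshold) for p in toy_image]
```

The sketch requires an image with at least two distinct gray levels, since otherwise no threshold separates the pixels into two regions. Swapping `divergence_from_ideal` for another measure mirrors how the library lets the user choose the measure used during segmentation.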
The proposed library provides a number of grayscale images that are widely used in the literature, for example the rice image, along with the corresponding ground truth segmented image for result evaluation. Figure 3 shows the results after applying the segmentation algorithm with different distance and similarity measures.
Again, the aim of presenting these results is not to compare the performance of the implemented measures, but to present the library’s usability and features. To segment a given image, the user only needs to call the function provided by the utils module. This function takes as its parameters the image to be segmented, the measure to be calculated, the measure’s parameters (if needed) and whether the measure provided is a distance measure or not.
In this study, we presented a novel Python library named fsmpy for comparing fuzzy sets of different types. The library provides implementations of a number of different distance, similarity and other types of measures, along with other useful tools required for their implementation and application. By following the implementation designs of other libraries, fsmpy becomes fast, easy to use and easy to interconnect with other libraries, such as scikit-learn and NumPy.
Compared to other available toolkits and libraries, fsmpy implements a large number of different fuzzy measures. Moreover, it provides tools for their application in different classification problems and their interconnection with other libraries. It is the first library to present very simple functionalities for different applications (medical diagnosis, text classification and image segmentation). With the open-source paradigm, the library can be extended with additional measures, types of measures and types of FSs, as well as more datasets, classifiers and membership value calculation functions.
At this point, some of the library’s limitations should be discussed. Firstly, some of the tests that validate the correct implementation of a measure fail. This is currently being addressed with high priority, but may also be attributed to typographic errors in the publications of the original measures. Similarly, the results obtained in the image segmentation application are not exactly the same as the ones presented in [16]. Secondly, a very small number of types of fuzzy sets and miscellaneous functions are implemented, and only one evaluation dataset is provided (medical diagnosis). Lastly, when compared to the Fuzzycreator library (a very similar one), fsmpy currently lacks any visualization functionalities and implements only one membership calculation function.
Future work will focus on addressing the aforementioned limitations, as well as on the following aspects:
Implementation of additional measures (distance, similarity and other types), classifiers and membership value calculation functions.
Further study of the implemented measures and their tests.
Additional tools useful for the development of novel measures (such as the testing of theorems).
Extension of library application and utility functions.