Article

The α-Groups under Condorcet Clustering

by
Tarik Faouzi
1,*,†,
Luis Firinguetti-Limone
1,†,
José Miguel Avilez-Bozo
1,† and
Rubén Carvajal-Schiaffino
2,†
1
Departamento de Estadística, Universidad del Bío-Bío, Concepción 4051381, Chile
2
Departamento de Matemática y Ciencia de la Computación, Universidad de Santiago de Chile, Santiago 9170020, Chile
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2022, 10(5), 718; https://doi.org/10.3390/math10050718
Submission received: 29 December 2021 / Revised: 16 February 2022 / Accepted: 22 February 2022 / Published: 24 February 2022
(This article belongs to the Section Probability and Statistics)

Abstract

We introduce a new approach to clustering categorical data: Condorcet clustering with a fixed number of groups, denoted α-Condorcet. Like k-modes, this approach is essentially based on similarity and dissimilarity measures. The paper is divided into three parts: first, we propose a new Condorcet criterion with a fixed number of groups (to assign cases to clusters). In the second part, we propose a heuristic algorithm to carry out the task. In the third part, we compare α-Condorcet clustering with k-modes clustering. The comparison is made with a quality index, an accuracy measure, and a within-cluster sum-of-squares index. Our findings are illustrated using real datasets: the feline dataset and the US Census 1990 dataset.

1. Introduction

In 1909, Jan Czekanowski proposed the first clustering method [1]. This kind of method has become fundamental to many branches of statistics and the social sciences. With clustering, we seek to classify a set of objects into relatively homogeneous groups, usually referred to as clusters. That is, for a given dataset, the goal of a cluster analysis is to define a set of clusters and to assign observations to them so that, according to some distance or similarity measure, observations within a cluster are close to each other, while observations in different clusters are far apart. There are increasing discussions surrounding the best clustering method, as one can gather from the large number of review articles (see for example [2,3,4,5,6]). Many authors have proposed different clustering algorithms, and most techniques and algorithms deal with quantitative data. However, categorical data are common, particularly in the social sciences [7,8,9,10,11,12]. As such, applying clustering methods to categorical data is important, and methods have been proposed to deal with these types of data. An extension of the k-means approach to clustering, k-modes clustering [13], is prominent among these. In this paper, we present a novel algorithm to group qualitative data: an extension of Condorcet clustering [14]. We demonstrate that, with a fixed number of clusters, a unique partition of the data can be achieved by maximizing a Condorcet criterion [14]. We developed a heuristic algorithm that proved to be very useful. Moreover, an adjustment rate index was used to evaluate the quality of the partitions of k-modes and α-Condorcet on the basis of real datasets. The rest of the paper is organized as follows: in Section 2, we present some related work. In Section 3, we introduce some relevant concepts and definitions. In Section 4, we present some theoretical results. The clustering algorithm is presented in Section 5. Using real data, in Section 6, we compare α-Condorcet clustering to k-modes clustering. Finally, our concluding remarks are given in Section 7.

2. Related Work

Clusters may be regarded as crisp or fuzzy. In fuzzy clustering, an observation may belong to more than one cluster with given probabilities, whereas in crisp clustering, an observation belongs to one and only one cluster. Most clustering algorithms, but not all, may be classified into two categories: partitioning and hierarchical algorithms.
k-means is prominent among the partitional methods, and is one of the most popular techniques for clustering quantitative data [15,16,17,18]. Given a set of n multivariate observations x_1, x_2, …, x_n, where x_i is a d-dimensional vector, the k-means algorithm partitions the data into k ≤ n clusters, S = (S_1, S_2, …, S_k), such that the sum of squares within each cluster is minimized. That is, k-means seeks to minimize:
\[ \operatorname*{arg\,min}_{S} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2 , \]
where μ_i is the mean of the points in S_i. This algorithm is fast and easy to implement [16,18]. Once the number of clusters is defined, this method chooses, at random, k points in the attribute space as initial values. After that, observations are assigned to the closest cluster and the centroids are updated. Because the algorithm does not guarantee convergence to the global optimum and since it is usually a fast algorithm, it is common to run it multiple times with different starting conditions. This method may, however, be badly affected by outliers.
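As a point of reference for the categorical methods discussed next, the following minimal R sketch runs k-means on simulated quantitative data; the simulated data and the choice k = 3 are illustrative assumptions of ours, not part of the original study.

## k-means on simulated quantitative data (base R, stats::kmeans)
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),   # two artificial point clouds
           matrix(rnorm(100, mean = 3), ncol = 2))
fit <- kmeans(x, centers = 3, nstart = 25)           # several random starts, as discussed above
fit$cluster                                          # cluster label of each observation
fit$tot.withinss                                     # within-cluster sum of squares being minimized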
Several methods have been proposed to deal with qualitative or mixed data. Prominent among these are the k-modes and k-prototype methods proposed by Huang [13], which are extensions of k-means (see Table 1).
k-modes, in particular, is the k-means method, but with the Euclidean distance metric substituted by a simple matching dissimilarity measure, where the centers of the clusters are represented by their modes instead of the means. To introduce k-modes, let X and Z be two objects described by n categorical attributes. Then, a simple dissimilarity measure between these objects is the total number of mismatches of the corresponding values of the attributes of the two objects. That is
\[ d_1(X, Z) = \sum_{j=1}^{n} \delta(x_j, z_j), \]
where
\[ \delta(x_j, z_j) = \begin{cases} 0 & \text{if } x_j = z_j, \\ 1 & \text{if } x_j \neq z_j. \end{cases} \]
Let S = {X_1, X_2, …, X_m} be a set of m objects described by n categorical attributes denoted by υ_j, j = 1, 2, …, n. Then a mode of S is a vector Q = [q_1, q_2, …, q_n] that minimizes
\[ D(S, Q) = \sum_{i=1}^{m} d_1(X_i, Q), \]
Q not necessarily an object of S. Finally, the k-modes algorithm partitions the set of m objects described by n categorical attributes into k clusters, S_i, i = 1, …, k, by minimizing the following expression:
\[ D(S, Q) = \sum_{i=1}^{k} \sum_{X \in S_i} d_1(X, Q_i), \]
where Q i is the mode of cluster S i . For a survey of k-modes see [19]. For a different approach to clustering categorical data, see [20].
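To make the measure concrete, here is a small R sketch of the simple matching dissimilarity d_1 and of the column-wise mode of a set of categorical objects; the function names and the toy data are ours, not code from [13].

## Simple matching dissimilarity and mode of a set of categorical objects
d1 <- function(x, z) sum(x != z)                 # number of mismatched attributes

mode_of <- function(S) {
  ## the column-wise most frequent category minimizes D(S, Q) = sum_i d1(X_i, Q)
  apply(S, 2, function(col) names(which.max(table(col))))
}

S <- rbind(c("a", "b", "c"),
           c("a", "b", "d"),
           c("e", "b", "c"))
Q <- mode_of(S)                                  # "a" "b" "c"
D <- sum(apply(S, 1, d1, z = Q))                 # total dissimilarity of S to its mode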
Although the k-modes method has the advantage of being scalable to very large datasets, the final solution may be influenced by the initialization criterion of using random initial modes as centers. A number of suggestions have been made to overcome the shortcomings of k-modes. For example, Lakshmi et al. [21] propose a different algorithm to overcome the initialization problem of k-modes. Moreover, Dorman and Maitra [22] adapt the Hartigan algorithm for k-means and develop several approaches to select the initial centroids in order to improve the efficiency of k-modes. Two other approaches to initialize the k-modes algorithm are given in [23,24]. A fuzzy version of the k-modes algorithm is proposed by Huang and Ng [25] to improve the performance of k-modes. Other fuzzy versions of the k-modes method are given in [26,27,28]. Ng et al. [29] modify the simple matching dissimilarity measure to obtain clusters with greater intra-similarity and describe extensions of k-modes to efficiently cluster large categorical datasets. A different dissimilarity measure is provided by Cao et al. [30].
For different approaches to clustering categorical data, see [20,31,32,33].
Besides the k-modes algorithm, Huang [25] also proposes the k-prototype, an algorithm that integrates the k-means and k-modes algorithms to cluster mixed types of objects. The dissimilarity between two mixed-type objects, X and Z, which are described by attributes υ_1^r, υ_2^r, …, υ_p^r, υ_{p+1}^c, …, υ_n^c, may be measured by
\[ d_2(X, Z) = \sum_{j=1}^{p} (x_j - z_j)^2 + \gamma \sum_{j=p+1}^{n} \delta(x_j, z_j). \]
Of course, the first term corresponds to the squared Euclidean distance, which is applied to the quantitative attributes and the second term is the simple matching dissimilarity measure, which is applied to the qualitative attributes. γ is a weight used to avoid favoring either type of attribute. Thus k-prototype seeks to minimize the following cost function:
\[ P(W, Q) = \sum_{l=1}^{k} \left[ \sum_{i=1}^{m} w_{il} \sum_{j=1}^{p} (x_{ij} - q_{lj})^2 + \gamma \sum_{i=1}^{m} w_{il} \sum_{j=p+1}^{n} \delta(x_{ij}, q_{lj}) \right], \]
where W is an m × k partition matrix with elements w_{il}, i = 1, 2, …, m and l = 1, 2, …, k, and Q = {Q_1, Q_2, …, Q_k} is a set of objects in the same object domain.
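As a rough illustration (our own sketch, not Huang's code), the mixed dissimilarity d_2 can be written in R as follows, with gamma a user-chosen weight:

## Mixed dissimilarity used by k-prototypes: squared Euclidean distance on the
## numeric attributes plus gamma times the simple matching count on the categorical ones
d2 <- function(x_num, z_num, x_cat, z_cat, gamma = 1) {
  sum((x_num - z_num)^2) + gamma * sum(x_cat != z_cat)
}

d2(x_num = c(1.2, 0.5), z_num = c(0.8, 1.0),
   x_cat = c("a", "b"),  z_cat = c("a", "c"), gamma = 0.5)   # 0.16 + 0.25 + 0.5*1 = 0.91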
Marcotorchino and Michaud [14,34,35,36] were the first to propose a clustering method for categorical data using a dissimilarity measure. These authors developed relational analysis theory and introduced the relation aggregation problem in order to solve Condorcet's paradox in voting systems, relating it to the similarity problem.
This approach consists of using pairwise comparisons and applying the simple majority decision rule. Indeed, aggregating equivalence relations using the simple majority decision rule guarantees optimal solutions under some constraints and without fixing a priori the number of groups. In our work, we used the approach introduced by Michaud and Marcotorchino, setting a priori the number of groups.

3. Materials and Methods

Let N = {v_1, …, v_n} be a set of n variables and S = {x_1, …, x_m} a set of m objects. Let C be a Condorcet matrix, with elements c_{x_i x_j} corresponding to the number of variables for which x_i is similar to x_j, denoted by x_i ∼_{v_k} x_j, and let Y = (y_{x_i x_j})_{i,j=1}^{m} be a matrix such that
\[ y_{x_i x_j} = \begin{cases} 1 & \text{if } x_i \sim x_j, \\ 0 & \text{if } x_i \nsim x_j. \end{cases} \]
For two given objects x_i and x_j, by x_i ∼_{v_k} x_j we mean that x_i and x_j have the same value for the variable v_k, with k = 1, …, n, while x_i ∼ x_j means that x_i and x_j are similar.
In the relational analysis methodology, Marcotorchino and Michaud [14] suggest the maximization of Condorcet's criterion, under some restrictions, given by
\[ f(Y) = \sum_{x_j} \sum_{x_i} \left( c_{x_i x_j}\, y_{x_i x_j} + \bar c_{x_i x_j}\, \bar y_{x_i x_j} \right), \]
with y_{x_i x_j} + ȳ_{x_i x_j} = 1, c̄_{x_i x_j} + c_{x_i x_j} = n, and Y = (y_{x_i x_j})_{i,j=1}^{m} the matrix that maximizes the function f(·) given in Equation (6). This matrix takes values 0 and 1.
Then, the model associated with the absolute global majority is defined by:
\[ (P)\quad \begin{cases} \max_{Y} f(Y) \\ y_{x_i x_j} \in \{0, 1\} \\ y_{x_i x_j} = y_{x_j x_i} \\ 0 \le y_{x_i x_j} + y_{x_j x_k} - y_{x_i x_k} \le 1, \end{cases} \]
where Y is the matrix of similarities. The first constraint represents binarity, the second represents symmetry, and the third represents transitivity.
The following example explains how to obtain the matrix Y, which maximizes the Condorcet’s criterion under the restrictions given below.
Let E be a dataset that is composed of three items ( x 1 , x 2 , x 3 ) , with three qualitative variables v 1 , v 2 , v 3 being measured. The dataset is presented in Table 2.
Using Table 2, we identify the Condorcet matrix C, which is given by
\[ C = \begin{pmatrix} 3 & 2 & 1 \\ 2 & 3 & 0 \\ 1 & 0 & 3 \end{pmatrix}. \]
Then, the possible solutions Y that satisfy the constraints are
\[ Y_1 = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad Y_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \quad Y_3 = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}, \quad Y_4 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad Y_5 = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}. \]
Next, we compute the function f(·) for each matrix Y_i, i = 1, 2, 3, 4, 5. Indeed, we have f(Y_1) = 23, f(Y_2) = 15, f(Y_3) = 19, f(Y_4) = 21 and f(Y_5) = 15. We deduce that Y_1 maximizes the Condorcet criterion. Finally, we obtain the number of clusters, equal to 2, and the clusters are {x_1, x_2} and {x_3}.
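The arithmetic above can be checked with a few lines of R; this is a verification sketch of ours, using only the matrix C of the example (n = 3 variables):

## f(Y) = sum_ij ( c_ij y_ij + cbar_ij (1 - y_ij) ), with cbar = n - C
C <- matrix(c(3, 2, 1,
              2, 3, 0,
              1, 0, 3), nrow = 3, byrow = TRUE)
n <- 3
f <- function(Y) sum(C * Y + (n - C) * (1 - Y))

Y1 <- matrix(c(1, 1, 0,  1, 1, 0,  0, 0, 1), nrow = 3, byrow = TRUE)
f(Y1)       # 23, the maximum among the feasible matrices
f(diag(3))  # 21, the value of Y4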
Although the proposed clustering method does not require fixing the number of classes beforehand, there are instances where this is not convenient, as is the case in a psychometric analysis. In this paper, we take this point of view and assume that the number of clusters is given.
Therefore, in this paper, we fix the number of groups denoted by α , and focus on finding the solution of Equation (7), giving its algorithm and comparing its results to k-modes for some fixed value of α .
Next, let us denote by S = ⋃_{k=1}^{α} S_k a partition of the set of objects S. In this partition, the number of clusters is α.
Recall that the matrix C represents the similarities between pairs of objects that we want to cluster. Similarly, we introduce a matrix of dissimilarities between pairs of the same objects and denote it by C̄. Next, we define n categorical variables denoted by v_k, k = 1, …, n, and let v_k^i be a modality of v_k assigned to object i ∈ S. Then, each variable v_k is associated with a matrix C^k. As a consequence, we obtain
\[ \sum_{k=1}^{n} C^k = C, \]
where the elements of matrix C^k are given by
\[ c^k_{x_i x_j} = \begin{cases} 1 & \text{if } x_i \text{ and } x_j \text{ have the same modality of } v_k, \\ 0 & \text{otherwise}. \end{cases} \]
By abuse of notation, we write c_{x_i x_j} = c_{ij} and y_{x_i x_j} = y_{ij}.
Using Equation (8), the general terms of the collective relational matrix C are given by c_{ij} = Σ_{k=1}^{n} c^k_{ij}. Furthermore, we define the general terms of the collective relational matrix C̄ as c̄_{ij} = Σ_{k=1}^{n} c̄^k_{ij}. Note that c̄_{ij} represents the number of variables for which x_i and x_j are not similar.
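A small R sketch of how the collective matrix C can be built from a categorical data matrix follows; the function name is ours, and the check uses dataset E of Table 2.

## c_ij = number of variables on which objects i and j take the same modality
condorcet_matrix <- function(D) {
  m <- nrow(D)
  C <- matrix(0, m, m)
  for (i in 1:m)
    for (j in 1:m)
      C[i, j] <- sum(D[i, ] == D[j, ])
  C
}

D <- rbind(c(1, 1, 3), c(1, 2, 3), c(2, 1, 2))   # dataset E of Table 2
condorcet_matrix(D)                              # reproduces the matrix C of the example above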

4. Main Theoretical Results

4.1. α -Condorcet Criterion Function

Now, we present our first important result: a new Condorcet criterion function.
Definition 1.
Let (S_k)_{k ∈ {1, …, α}} be a partition of a set of objects S. We define a new Condorcet criterion function g as
\[ g(S; \alpha) = \sum_{k=1}^{\alpha} \sum_{1 \le i, j \le m} \left[ c_{ij}\, S_i^k S_j^k + \bar c_{ij} \left( \bar S_i^k S_j^k + S_i^k \bar S_j^k \right) \right], \]
with i ∈ S_k if S_i^k = 1 and i ∉ S_k if S_i^k = 0.
Using the relations c_{ij} + c̄_{ij} = n and S_i^k + S̄_i^k = 1, the formula above can be rewritten as follows
\[ g(S; \alpha) = \sum_{k=1}^{\alpha} \sum_{1 \le i, j \le m} \left[ (3 c_{ij} - 2n)\, S_i^k S_j^k + \bar c_{ij} \left( S_i^k + S_j^k \right) \right]. \]
Knowing the exact number of groups, the following model allows us to group the objects by similarity, in the sense of having common characteristics. Then, we obtain the partition S = ⋃_{k=1}^{α} S_k by maximizing the following function
\[ (P)\quad \begin{cases} \max_{S} g(S; \alpha) \\ S_i^k \in \{0, 1\} \\ \sum_{i=1}^{m} S_i^k \ge 1 \\ \sum_{k=1}^{\alpha} S_i^k = 1, \end{cases} \]
where
\[ S_i^k = \begin{cases} 1 & \text{if } x_i \in S_k, \\ 0 & \text{otherwise}. \end{cases} \]
Using the third restriction of P , the function g can be simplified to the following expression
\[ g(S; \alpha) = \sum_{i,j=1}^{m} \left[ \sum_{k=1}^{\alpha} (3 c_{ij} - 2n)\, S_i^k S_j^k + 2 \bar c_{ij} \right]. \]
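For a given 0/1 membership matrix this simplified criterion is straightforward to evaluate; the following R sketch (our own helper, reused later) assumes S is an m × α matrix with exactly one 1 per row, C the Condorcet matrix, and n the number of variables.

## Simplified alpha-Condorcet criterion g(S; alpha)
g_alpha <- function(S, C, n) {
  Cbar <- n - C                                         # dissimilarity counts
  sum((3 * C - 2 * n) * (S %*% t(S))) + 2 * sum(Cbar)
}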
The next theorem ensures the existence of at least one solution to the problem given in Equation (10).
Theorem 1.
Let S = ⋃_{k=1}^{α} S_k = {x_1, …, x_m}, with S_i ∩ S_j = ∅ for i ≠ j and α ≤ m. Then, there exists at least one partition of S that maximizes (P).
Proof of Theorem 1.
We know that if the number of objects m is smaller than the number of clusters α, then no solution of (P) exists.
We now suppose that the number of objects is greater than or equal to the number of clusters α. Then, we have d possible partitions, where the parameter d is the number of partitions of the set of m objects into α clusters, which can be expressed as follows
\[ d = \sum_{\substack{w_1 + w_2 + \cdots + w_\alpha = m \\ w_1 \ge w_2 \ge \cdots \ge w_\alpha}} \frac{m!}{w_1!\, w_2! \cdots w_\alpha!}, \]
where w_i, i = 1, …, α, is the number of objects in cluster S_i. Furthermore, d is a positive integer. Finally, there exists at least one partition of S that maximizes (P). □
Theorem 2.
We assume that the dataset does not present Condorcet's paradox. Then, for some value of α, there exists a unique partition of S that maximizes (P).
Proof of Theorem 2.
To simplify the calculations, we consider m = 3. We suppose that the object x_1 is similar to x_2 and not similar to x_3; then, under the absence of Condorcet's paradox, we conclude that x_2 is not similar to x_3. Finally, there exists a unique partition ({x_1, x_2}, {x_3}) that maximizes (P). □
Next, we present some axiomatic conditions, studied by Michaud [37], that are verified by Condorcet's rule and that respond to K. Arrow's impossibility theorem, presented below.
Theorem 3
(Theorem of K. Arrow). According to Arrow’s impossibility theorem, it is impossible to formulate a social ordering without violating one of the following conditions:
1. Non-dictatorship: a single voter's preference cannot represent a whole community. The wishes of multiple voters should be taken into consideration.
2. Pareto efficiency: unanimous individual preferences must be respected. If every voter prefers candidate A over candidate B, candidate A should win.
3. Independence of irrelevant alternatives: if a choice is removed, then the order of the others should not change. If candidate A ranks ahead of candidate B, candidate A should still be ahead of candidate B, even if a third candidate, candidate C, is removed from participation.
4. Unrestricted domain: voting must account for all individual preferences.
5. Social ordering: each individual should be able to order their choices in a connected and transitive relation.
Then, Michaud [37] proved that the rule of Condorcet verifies some conditions given in the following Theorem. The following result concerns the verification of these conditions by the α -Condorcet method given in Equation (10).
Theorem 4
(Axiomatic conditions). In the context of similarity aggregation problems, the rule of Condorcet verifies the following conditions for some values of α:
1. Non-dictatorship condition: no variable can, by itself, determine an item (individual or object) classification maximizing the Condorcet criterion.
2. Pareto pair unanimity condition: if two items agree on all variables, then these two items must be found in the same cluster.
3. Condition of total neutrality: the classification obtained must be independent of the order of the individuals, items, or variables.
4. Condition of coherent union: if two disjoint sets of variables give the same partition, then the union of the two sets will give the same partition.

4.2. Total Inertia

We now focus on the inertia or within-cluster sum-of-squares. First, we present some preliminaries. Let (S_k)_{k = 1, …, α} be a partition of S. We build a cloud of points in R^n, denoted N(S), in which each dimension corresponds to a category of the variable v_j, j = 1, …, n. Let N(S) = {(A_i, μ_i) : i ∈ S} be a cloud of mass points, where A_i is the coordinate point of object x_i and μ_i is its corresponding mass. In general, the expression of the within-cluster sum-of-squares is given by
\[ I_w = \sum_{k=1}^{\alpha} \sum_{i \in S_k} \mu_i \lVert A_i - G_k \rVert^2 , \]
where G_k is the center of gravity of the cloud of cluster S_k.
The within-cluster sum-of-squares measures how similar the objects are within each cluster. By itself, this measure does not allow decisions to be made regarding the quality of the partition; however, the closer the inertia is to zero, the better the quality of the partition.
Next, let p_1, …, p_n be the numbers of modalities of the variables v_1, …, v_n, respectively, and let p be a fixed parameter such that p = Σ_{k=1}^{n} p_k. Then, we define ĉ_{ij} = Σ_{k=1}^{p} ϑ_{ik} ϑ_{jk} / ϑ_{·k}, where ϑ_{ik} is one if object i has modality k and zero otherwise, and ϑ_{·k} is the total number of objects that have modality k.
In the next theorem, we present an expression for inertia given by [38,39].
Theorem 5.
A relational expression of the within-cluster sum-of-squares is given by
\[ I_w = \frac{1}{n} \left( p - \sum_{i=1}^{m} \sum_{j=1}^{m} \hat c_{ij}\, \frac{y_{ij}}{y_{i\cdot}} \right). \]
Note that this expression does not require the number of clusters to be specified a priori.
The following result is an application of Equation (12) and Theorem 5.
Theorem 6.
A new relational expression of the within-cluster sum-of-squares, with the number of clusters α fixed, is given by
\[ I_w = \frac{1}{n} \left( p - \sum_{i=1}^{m} \sum_{j=1}^{m} \hat c_{ij} \sum_{k=1}^{\alpha} \frac{S_i^k S_j^k}{|S_k|} \right), \]
where | S k | is the cardinality of cluster S k .
Proof of Theorem 6.
To prove this result, we need to consider the chi-square metric to find and compute a closed form of the expression of the within-cluster sum-of-squares. First, we define some preliminary elements. Let k_{ij} be a general term given by
\[ k_{ij} = \begin{cases} 1 & \text{if } x_i \text{ has the modality } j, \\ 0 & \text{otherwise}, \end{cases} \]
with k_{i·} = Σ_{j=1}^{p} k_{ij}, k_{·j} = Σ_{i=1}^{m} k_{ij} and k_{··} = Σ_{i=1}^{m} k_{i·}.
Then, we have
\[ I_w = \sum_{k=1}^{\alpha} \sum_{i \in S_k} \mu_i \lVert A_i - G_k \rVert^2 = \sum_{k=1}^{\alpha} \sum_{i \in S_k} \mu_i \sum_{j=1}^{p} \frac{k_{\cdot\cdot}}{k_{\cdot j}} \left( A_{ij} - G_{kj} \right)^2 , \]
where A_{ij} = k_{ij}/n, G_{kj} = (1/ν_k) Σ_{i ∈ S_k} μ_i A_{ij} and ν_k = Σ_{i ∈ S_k} μ_i = n_k/m, with n_k = |S_k| the cardinality of class S_k.
Note that μ_i is the mass of each individual, given by μ_i = k_{i·} / Σ_{ij} k_{ij} = n/(nm) = 1/m.
It follows that
\[ I_w = \sum_{k=1}^{\alpha} \sum_{i \in S_k} \frac{1}{m} \sum_{j=1}^{p} \frac{n m}{k_{\cdot j}} \left( \frac{k_{ij}}{n} - \frac{m}{n_k m} \sum_{i' \in S_k} \frac{k_{i'j}}{n} \right)^2 = \sum_{k=1}^{\alpha} \sum_{i \in S_k} \sum_{j=1}^{p} \frac{1}{n} \left( \frac{k_{ij}}{\sqrt{k_{\cdot j}}} - \frac{1}{n_k} \sum_{i' \in S_k} \frac{k_{i'j}}{\sqrt{k_{\cdot j}}} \right)^2 = \sum_{k=1}^{\alpha} \sum_{j=1}^{p} \frac{1}{n} \left[ \sum_{i \in S_k} \frac{k_{ij}}{k_{\cdot j}} - \frac{\left( \sum_{i \in S_k} k_{ij} \right)^2}{n_k\, k_{\cdot j}} \right] . \]
Now, we compute both terms on the right-hand side of Equation (17). We have
\[ \sum_{k=1}^{\alpha} \sum_{j=1}^{p} \frac{1}{n} \sum_{i \in S_k} \frac{k_{ij}}{k_{\cdot j}} = \frac{1}{n} \sum_{j=1}^{p} \frac{k_{\cdot j}}{k_{\cdot j}} = \frac{p}{n} , \]
and
\[ \sum_{k=1}^{\alpha} \sum_{j=1}^{p} \frac{1}{n} \frac{\left( \sum_{i \in S_k} k_{ij} \right)^2}{n_k\, k_{\cdot j}} = \sum_{j=1}^{p} \frac{1}{n\, k_{\cdot j}} \sum_{k=1}^{\alpha} \frac{1}{n_k} \sum_{i \in S_k} \sum_{i' \in S_k} k_{ij} k_{i'j} = \sum_{j=1}^{p} \frac{1}{n\, k_{\cdot j}} \sum_{k=1}^{\alpha} \sum_{i=1}^{m} \sum_{i'=1}^{m} k_{ij} k_{i'j}\, \frac{S_i^k S_{i'}^k}{|S_k|} = \sum_{i=1}^{m} \sum_{i'=1}^{m} \frac{1}{n} \sum_{j} \frac{k_{ij} k_{i'j}}{k_{\cdot j}} \sum_{k=1}^{\alpha} \frac{S_i^k S_{i'}^k}{|S_k|} = \frac{1}{n} \sum_{i=1}^{m} \sum_{j=1}^{m} \hat c_{ij} \sum_{k=1}^{\alpha} \frac{S_i^k S_j^k}{|S_k|} . \]
Finally, we obtain
\[ I_w = \frac{1}{n} \left( p - \sum_{i=1}^{m} \sum_{j=1}^{m} \hat c_{ij} \sum_{k=1}^{\alpha} \frac{S_i^k S_j^k}{|S_k|} \right). \qquad \square \]
We next introduce a quality index given by
\[ C_a = \frac{\max_{S_1, \ldots, S_\alpha} g(S_1, \ldots, S_\alpha; \alpha)}{m^2\, n} . \]
This index was given in [14], and measures the quality of the partition.
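Assuming the data are coded in complete disjunctive (one-hot) form, both indexes can be computed with a few matrix products; the following is our own sketch, with V the m × p one-hot matrix, S the m × α membership matrix, C the Condorcet matrix, and n the number of variables.

## Relational within-cluster sum-of-squares of Theorem 6
inertia_w <- function(V, S, n) {
  Chat <- V %*% diag(1 / colSums(V), nrow = ncol(V)) %*% t(V)   # \hat{c}_ij
  Y    <- S %*% diag(1 / colSums(S), nrow = ncol(S)) %*% t(S)   # sum_k S_ik S_jk / |S_k|
  (ncol(V) - sum(Chat * Y)) / n                                 # p = ncol(V) total modalities
}

## Value of the quality index attained by a given partition
## (Equation (21) takes the maximum of g over all partitions)
quality_Ca <- function(S, C, n) {
  g_alpha(S, C, n) / (nrow(S)^2 * n)   # uses g_alpha() sketched in Section 4.1
}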

5. Algorithm

Over the past three decades, several algorithms have been proposed to solve problem (7) by linear programming techniques when the population under study is relatively small. Unfortunately, classical linear programming techniques require many restrictions. For this reason, heuristic methods have been adopted in order to process large amounts of data. Although these heuristic algorithms are fast, they do not always ensure an optimal solution.
Next, for all points (x_i, x_j) ∈ S × S, we maximize the expression
\[ \sum_{k=1}^{\alpha} \sum_{i,j=1}^{m} (3 c_{ij} - 2n)\, S_i^k S_j^k . \]
Note that the term 2 c̄_{ij} is omitted because it is constant.
The above formula represents the series of links between objects x_i and x_j, denoted by L_{ij}, and we write L_{ij} = 3 c_{ij} − 2n. Moreover, we denote the general link matrix by L = 3C − 2n·1, where 1 is a matrix with all elements equal to 1.
The α-Condorcet clustering algorithm is summarized in Algorithm 1. We are given a database D of m points in R^n and an initial partition S_1, …, S_m of S such that S_k = {x_k}, k = 1, …, m.
Similar to the algorithm given by [40], we compute the following steps.
Algorithm 1. Heuristic Algorithm α-Condorcet.
Input:  α: number of partitions
        sc: number of observations
        x_k, k = 1, …, sc: observations
        nv: number of variables
        D_{sc×nv}: feature matrix
Output: S_1, …, S_α: a partition into α clusters
1   C_{sc×sc} ← GenCondorcetGM(D_{sc×nv})        // generation of the Condorcet matrix C
2   F ← 2 × C_{sc×sc} − nv × 1_{sc×sc}           // where 1_{sc×sc} is a matrix of ones
3   S_k ← {x_k}, k = 1, …, sc
4   Ini ← sc
5   P_1 ← ⋃_{k=1}^{Ini} S_k
6   for j ← 1 to Ini do
7       for i ← 1 to Ini do
8           if (i ≠ j) then
9               L_{i,j} ← C_{i,j}
10          endif
11      endfor
12      K_j ← Max(L_{·,j})                        // K_j is the largest value of the j-th column L_{·,j} of matrix L
13  endfor
14  Posa2 ← FirstPosition(Max(K))                 // position of the first occurrence of the largest value of vector K
15  S_{Posa1} ← {x_{Posa1}, x_{Posa2}}
16  EliminateGroup(S_{Posa2})                     // elimination of the cluster S_{Posa2} = {x_{Posa2}}
17  Ini ← Ini − 1
18  a ← Max(K)
19  C_{Posa2,Posa1} ← C_{Posa1,Posa2} ← Null
20  b ← Max(C_{·,Posa2})
21  Posb ← FirstPosition(Max(C_{·,Posa2}))
22  if (a = b ∧ Ini > α) then
23      G ← GenCombi(S_{Posa1}, x_{Posb1}, F)     // GenCombi gives the combination that maximizes the link F
24      r ← Card(S_{Posa1})                       // Card is the cardinality function
25      if (r < Card(G)) then
26          Ini ← Ini − 1
27          C_{Posa2,Posa1} ← C_{Posb1,Posb2} ← Null
28          S_{Posa1} ← G; goto 5
29      else
30          goto 5
31      endif
32  endif
33  if (a > b ∧ Ini > α) then
34      goto 5
35  endif
36  if (Ini = α) then
37      return (S_1, …, S_α)
38  endif
1. First, we find the largest value in each column of the Condorcet matrix C, which corresponds to the number of characteristics that a pair of observations share. We then take the largest of these values, denoted by a, together with its position. In this case, we put those observations in the same group S_k, with k representing the kth column.
2. Next, we remove the value a from the matrix C and define b as the largest value of the kth column.
3. We distinguish some conditions:
3.1 If a > b, we repeat the first point.
3.2 If a = b, then the Condorcet criterion is applied: we group the elements that maximize the Condorcet criterion.
4. We repeat the process.
5. The process stops when the α groups are identified.
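The following R sketch captures the merging idea behind these steps in a simplified greedy form (start from singletons, repeatedly merge the pair of clusters that adds the largest total link L_ij = 3c_ij − 2n, until α clusters remain); it is our own simplification, not a line-by-line transcription of Algorithm 1, and it ignores the tie-handling combinatorics of step 3.2.

## Simplified greedy alpha-Condorcet clustering (assumption: ties resolved by first occurrence)
alpha_condorcet_greedy <- function(D, alpha) {
  m <- nrow(D); n <- ncol(D)
  C <- outer(1:m, 1:m, Vectorize(function(i, j) sum(D[i, ] == D[j, ])))  # Condorcet matrix
  L <- 3 * C - 2 * n                                                     # link matrix
  groups <- as.list(seq_len(m))                                          # one singleton per object
  while (length(groups) > alpha) {
    best <- c(NA, NA); best_gain <- -Inf
    for (a in 1:(length(groups) - 1)) {
      for (b in (a + 1):length(groups)) {
        gain <- sum(L[groups[[a]], groups[[b]]])   # cross-links created by merging a and b
        if (gain > best_gain) { best_gain <- gain; best <- c(a, b) }
      }
    }
    groups[[best[1]]] <- c(groups[[best[1]]], groups[[best[2]]])
    groups[[best[2]]] <- NULL
  }
  groups
}

D <- rbind(c(1,1,3), c(1,2,3), c(2,1,2), c(1,1,3), c(2,1,1), c(2,1,2))   # dataset D of Table 3
alpha_condorcet_greedy(D, alpha = 2)   # groups {x1, x2, x4} and {x3, x5, x6}, as in Section 5.1 below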

5.1. Illustrative Example Using the Heuristic Algorithm

We now consider a dataset D, which is composed of six items ( x 1 , x 2 , , x 6 ) with three qualitative variables v 1 , v 2 , v 3 being measured. The dataset is presented in Table 3.
Then, the Condorcet matrix C is given in Table 4.
In general, the diagonal of the Condorcet matrix represents the number of variables measured on this group of observations; it also represents the maximum possible similarity that can occur between two observations x_i and x_j, with i, j = 1, 2, …, 6. For our heuristic algorithm, we replace the diagonal entries by zero.
The goal of this example is to create our partition P such that P = G_i ∪ G_j, i, j = 1, …, 6, fixing the number of groups at α = 2. Before creating this partition, we suppose that each element represents a group and we write G_i = {x_i}, with i = 1, …, 6. Let K be a vector whose elements are the maxima of the columns of the Condorcet matrix. Then, we have K = (3, 2, 3, 3, 2, 3). So, we identify the first maximum of the vector K, called a, with a = 3, and its position p = (4, 1), which represents the fourth position of the first column.
The following step is to put together the elements x 4 and x 1 in the same group G 1 . In the first column, we eliminate the fourth value, and we write:
     x1  x2  x3  x4  x5  x6
x1   -   2   1   3   1   1
x2   2   -   0   2   0   0
x3   1   0   -   1   2   3
x4   -   2   1   -   1   1
x5   1   0   2   1   -   2
x6   1   0   3   1   2   -
Computing the maximum of the first column, we obtain b = 2 . Comparing the value of both parameters a and b, we find that a > b . Thus, we have K = ( 3 , 2 , 3 , 3 , 2 , 3 ) . Then, the vector K is recalculated without considering columns 1 and 4, obtaining K = ( 0 , 2 , 3 , 0 , 2 , 3 ) . So, we identify the first maximum number of the vector K with a = 3 , and its position p = ( 6 , 3 ) that represents the last position of the third column. In the third column, we eliminate the sixth value, and we write:
     x1  x2  x3  x4  x5  x6
x1   -   2   1   3   1   1
x2   2   -   0   2   0   0
x3   1   0   -   1   2   3
x4   -   2   1   -   1   1
x5   1   0   2   1   -   2
x6   1   0   -   1   2   -
Computing the maximum of the third column, we obtain b = 2. In this case, a > b. Again, the vector K is recalculated without considering columns 1, 3, 4 and 6, and we have K = (0, 2, 0, 0, 2, 0). The first maximum of vector K is in the second position, and we have a = 2, with position in the Condorcet matrix given by p = (1, 2). This last position leads to adding the element x_2 to the group G_1. We eliminate the first value of the second column, and write:
     x1  x2  x3  x4  x5  x6
x1   -   -   1   3   1   1
x2   2   -   0   2   0   0
x3   1   0   -   1   2   3
x4   -   2   1   -   1   1
x5   1   0   2   1   -   2
x6   1   0   -   1   2   -
Next, the maximum of the second column is equal to 2, and we have b = 2 with position (4, 2). Both parameters a and b are equal. In this case, it is not necessary to carry out combinatorics between {x_1, x_4} and {x_2}, because the element x_4 of position (4, 2) already belongs to G_1. Now, we define a new vector K without the second column, given by K = (0, 0, 0, 0, 2, 0). The maximum of vector K can be found in positions p = (3, 5) and p = (6, 5), meaning that the element x_5 can be in the first group or the second group. In this case, we must check which of the two partitions maximizes the Condorcet criterion function. After simple calculations, we deduce that x_5 belongs to G_2. Finally, we obtain two groups, G_1 = {x_1, x_2, x_4} and G_2 = {x_3, x_5, x_6}.
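Because the dataset is tiny, the result can be double-checked by brute force; the following sketch of ours enumerates every assignment of the six items into two non-empty groups and keeps the one maximizing the α-Condorcet criterion (up to the constant term 2Σc̄_ij).

## Exhaustive check of the example of Table 3 with alpha = 2
D <- rbind(c(1,1,3), c(1,2,3), c(2,1,2), c(1,1,3), c(2,1,1), c(2,1,2))
m <- nrow(D); n <- ncol(D); alpha <- 2
C <- outer(1:m, 1:m, Vectorize(function(i, j) sum(D[i, ] == D[j, ])))

best <- NULL; best_val <- -Inf
for (code in 0:(2^m - 1)) {
  lab <- as.integer(intToBits(code))[1:m] + 1     # group label (1 or 2) for each item
  if (length(unique(lab)) < alpha) next           # both groups must be non-empty
  S <- outer(lab, 1:alpha, "==") * 1              # m x alpha membership matrix
  val <- sum((3 * C - 2 * n) * (S %*% t(S)))      # criterion up to a constant
  if (val > best_val) { best_val <- val; best <- lab }
}
split(seq_len(m), best)    # items 1, 2, 4 in one group and 3, 5, 6 in the other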

5.2. Advantage of Heuristic Algorithm

The main goals of this section are threefold. Firstly, we compare the partition quality, given in Equation (21), for the feline dataset using both exact and heuristic algorithms. Secondly, we use the inertia index, given in Equation (13), to compare the exact and heuristic algorithms. Finally, the execution time of the two methods is compared. For each step, we choose the first m = 5 , 7 , 9 felines of the feline dataset.
Table 5 shows that the use of the exact algorithm becomes impractical as the sample size increases. Furthermore, the inertia and quality indexes of the heuristic and exact algorithms are almost equal.
Finally, observing the last column of Table 5, when the data size is m = 9 and α = 4, the execution of the exact algorithm takes 15.26 s, while the execution time of the heuristic algorithm is 0.36 s even for m = 30 and α = 4. Fixing again m = 9 and α = 6, we observe that the execution time of the exact algorithm increases considerably, to 586.95 s, compared to the execution time for α = 4, 5. Furthermore, for m = 30, the quality and inertia indexes cannot be computed for the exact algorithm; however, we know that the exact algorithm provides an optimal solution. We can therefore deduce that its quality index is at least as large as that of the heuristic algorithm. Consequently, we confirm that the exact algorithm is computationally very expensive compared to the heuristic algorithm. Note that, for the exact algorithm, the use of large datasets generates two important problems: the first is related to the execution time, while the second concerns the temporary storage space required by the programs being used at a particular moment (e.g., the R project).

6. Comparison between α -Condorcet and k -Modes

Firstly, in this section, we describe the experiments and their results. We ran our algorithm on the feline dataset obtained from [14] and presented in Table A1 and Table A2 of Appendix A. We tested the performance of α-Condorcet clustering against the k-modes algorithm. Our algorithms were implemented in the R language. The α-Condorcet algorithm was implemented according to the description given above, and for k-modes we used an implementation already available in R. The quality of the partition was compared using the fit rate [14] given by Equation (21).
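As a hypothetical illustration of this set-up (the paper does not name the R package used for k-modes; we assume here the kmodes() function from the klaR package and a data frame felines holding the categorical variables of Table A1):

library(klaR)
set.seed(2022)
fit <- kmodes(felines, modes = 4)            # k-modes with four clusters
S   <- outer(fit$cluster, 1:4, "==") * 1     # m x 4 membership matrix
## S can then be passed to the g_alpha()/quality_Ca() and inertia_w() sketches
## above to obtain the fit rate and the within-cluster sum-of-squares.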
In previous studies [14,37], the similarity aggregation method gave an optimal solution of four groups. This solution was close to the classification by species and genus recognized by zoologists (Figure 1).
On the other hand, in the partition into four groups obtained by applying the k-modes algorithm, it is observed that certain species belong to more than one group and that the result does not agree with the classification recognized by zoologists (Figure 2).
The accuracy of measurement, given in Equation (22), of the solutions shown in Figure 1 and Figure 2 is 1 and 0.83, respectively, where
\[ \text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} . \]
Next, a comparison was made between the α-Condorcet method and the k-modes method for different values of α, in order to find the method that best fits the feline data, using the within-class inertia index and the adjustment rate given in Equations (13) and (21), respectively.
Figure 3 contrasts the quality of the groupings by means of the adjustment rate. It is observed that the α-Condorcet method presents a better partition quality than the k-modes method for the different values of α.
Figure 4 contrasts the quality of the clustering fit through the inertia, on the same dataset. From this figure, it is concluded that the intra-class inertia is almost the same for both methods for the different values of α.
We now use the 1990 US Census dataset to compare the heuristic algorithm with the k-modes algorithm. This dataset contains a 1% sample of the public use microdata sample person records drawn from the full 1990 census sample. For further references, see https://archive.ics.uci.edu/ml/datasets/US+Census+Data+%281990%29 (accessed on 20 December 2021). The comparisons between both methods were made with 50, 100, 150, and 200 observations.
Table 6 shows that the inertia index is almost the same for both algorithms. However, we observe that the heuristic algorithm is better than the k-modes algorithm from the point of view of the quality index.

7. Conclusions

In clustering categorical data, many researchers have succeeded in developing unsupervised classification methods without fixing the number of classes a priori. Having to fix the number of clusters beforehand may be a major drawback.
However, it is sometimes convenient to identify the number of groups beforehand, as, for instance, in psychometrics. Several methods have been proposed for a known number of clusters. We believe, however, that these methods do not always provide optimal solutions. For this reason, we proposed a new method with a fixed number of groups. This new approach is an extension of the Condorcet method. Although the exact algorithm of this new approach gives an optimal solution, it consumes too much time. Hence, the heuristic algorithm was introduced. Table 5 shows that the proposed algorithm produces almost the same values of the quality and inertia indexes as the exact algorithm.
Next, comparing our approach with k-modes for the feline data, we found that the accuracy index gave a better result for the heuristic algorithm. In this case, the comparison was made with the precision index because we know a priori that the number of clusters, α, is 4. This comparison was also made with the US Census 1990 data, using both the partition quality and the intra-class inertia indexes. The results in Table 6 show that both methods have almost the same inertia. However, the heuristic algorithm shows an improvement over k-modes in terms of partition quality. Consequently, the following may be concluded:
  • We proposed a heuristic algorithm as an alternative to the exact one. It gives the same solution as the exact algorithm, or an approximation to it.
  • From the simulations presented in Table 5, we can conclude that the heuristic algorithm is faster than the exact algorithm.
  • The heuristic algorithm produces results similar to (or even better than) k-modes.
  • We conclude that α-Condorcet is a valid competitor to the k-modes clustering technique.

Author Contributions

Conceptualization: T.F. and L.F.-L.; methodology: T.F. and L.F.-L.; software: R.C.-S., T.F. and J.M.A.-B.; formal analysis, writing—review, and editing: T.F. and L.F.-L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by FONDECYT (grant 11200749) and the University of Bío-Bío (grant DIUBB 2020525 IF/R). Partial support was provided by the University of Bío-Bío to Luis Firinguetti-Limone (grant DIUBB 183808 3/R).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The dataset of Table A1 has 30 felines and 14 variables that describe the characteristics of each feline. The full name and the modalities of each variable are described in the following Table A2.
Table A1. The feline dataset is a multivariate dataset introduced by P. Michaud and F. Marcotorchino in their articles [14,34,37].
French Name  English Name  Tipopiel  Longpoill  Retract  Comport  Orielles  Larynx  Tailler  Poids  Longueurs  Queue  Dents  Typproie  Arbre  Chasse
Lion  Lion  1.00  0.00  1.00  1.00  1.00  1.00  3.00  3.00  3.00  2.00  1.00  1.00  0.00  1.00
Tigre  Tiger  3.00  0.00  1.00  3.00  1.00  1.00  3.00  3.00  3.00  2.00  1.00  1.00  0.00  0.00
Jaguar  Jaguar  2.00  0.00  1.00  2.00  1.00  1.00  3.00  3.00  2.00  1.00  1.00  1.00  1.00  0.00
Leopardo  Leopard  2.00  0.00  1.00  3.00  1.00  1.00  3.00  3.00  2.00  2.00  1.00  2.00  1.00  0.00
Once  Oncilla  2.00  1.00  1.00  1.00  1.00  1.00  2.00  2.00  2.00  3.00  1.00  2.00  1.00  0.00
Guepardo  Cheetah  2.00  0.00  0.00  1.00  1.00  0.00  3.00  2.00  2.00  3.00  0.00  2.00  0.00  1.00
Puma  Puma  1.00  0.00  1.00  2.00  1.00  0.00  2.00  3.00  2.00  3.00  1.00  2.00  1.00  0.00
Nebul  Clouded leopard  4.00  0.00  1.00  3.00  1.00  1.00  2.00  2.00  2.00  3.00  1.00  3.00  1.00  0.00
Serval  Serval  2.00  0.00  1.00  1.00  2.00  0.00  2.00  2.00  2.00  1.00  0.00  3.00  1.00  1.00
Ocelot  Ocelot  2.00  0.00  1.00  2.00  1.00  0.00  2.00  2.00  2.00  2.00  0.00  3.00  1.00  0.00
Lynx  Lynx  2.00  1.00  1.00  2.00  2.00  0.00  2.00  2.00  2.00  1.00  1.00  2.00  1.00  0.00
Caracal  Caracal  1.00  0.00  1.00  2.00  2.00  0.00  2.00  2.00  1.00  1.00  0.00  3.00  1.00  1.00
Viverrin  Fishing cat  2.00  0.00  1.00  2.00  1.00  0.00  1.00  1.00  2.00  2.00  0.00  3.00  0.00  0.00
Yaguarun  Jaguarundi  1.00  0.00  1.00  2.00  1.00  0.00  1.00  2.00  2.00  3.00  0.00  3.00  1.00  0.00
Chaus  Chaus  1.00  1.00  1.00  3.00  2.00  0.00  1.00  2.00  1.00  2.00  0.00  3.00  1.00  0.00
Dore  Golden cat  1.00  0.00  1.00  3.00  1.00  0.00  1.00  1.00  1.00  2.00  0.00  3.00  1.00  0.00
Merguay  Margay  2.00  0.00  1.00  3.00  1.00  0.00  1.00  1.00  1.00  2.00  0.00  3.00  1.00  0.00
Margerit  Sand cat  1.00  1.00  1.00  2.00  1.00  0.00  1.00  1.00  1.00  2.00  0.00  3.00  0.00  0.00
Cafer  Caffer cat  3.00  0.00  1.00  3.00  1.00  0.00  1.00  1.00  1.00  2.00  0.00  3.00  1.00  1.00
Chine  Chinese mountain cat  1.00  0.00  1.00  2.00  2.00  0.00  1.00  1.00  1.00  1.00  0.00  3.00  1.00  0.00
Bengale  Bengal cat  2.00  0.00  1.00  3.00  1.00  0.00  1.00  1.00  1.00  2.00  0.00  3.00  1.00  0.00
rouilleu  Rusty spotted cat  2.00  0.00  1.00  2.00  1.00  0.00  1.00  1.00  1.00  2.00  0.00  3.00  1.00  0.00
Malais  Malai  1.00  1.00  1.00  3.00  1.00  0.00  1.00  1.00  1.00  1.00  0.00  3.00  1.00  0.00
Borneo  Bornean bay cat  1.00  0.00  1.00  3.00  1.00  0.00  1.00  1.00  1.00  2.00  0.00  3.00  1.00  0.00
Nigripes  Black footed cat  2.00  0.00  1.00  2.00  1.00  0.00  1.00  1.00  1.00  1.00  0.00  3.00  1.00  1.00
Manul  Manul  1.00  1.00  1.00  3.00  1.00  0.00  1.00  1.00  1.00  1.00  0.00  3.00  1.00  0.00
Marbre  Marbled cat  4.00  0.00  1.00  3.00  1.00  0.00  1.00  1.00  1.00  3.00  0.00  3.00  1.00  0.00
Tigrin  Tiger cat  2.00  0.00  1.00  3.00  1.00  0.00  1.00  1.00  1.00  2.00  0.00  3.00  1.00  0.00
Temminck  Temminck  1.00  0.00  1.00  3.00  1.00  0.00  1.00  1.00  1.00  2.00  0.00  3.00  1.00  0.00
Andes  Andean mountain cat  2.00  1.00  1.00  3.00  1.00  0.00  1.00  1.00  2.00  2.00  0.00  2.00  1.00  0.00
Table A2. Description of variables given in Table A1.
Variable  Description  Modalities
Typpel  Appearance of the coat  Unblemished, plain; Spotted; Striped; Marble
Longpoill  Fur  Short hairs; Long hairs
Retract  Retractable claws  Yes; No
Comport  Predatory behavior  Diurnal; Diurnal or nocturnal; Nocturnal
Orielles  Type of ears  Round or rounded; Pointed
Larynx  Presence of hyoid bone  Yes; No
Taille  Waist at the withers  Small; Average; Big
Poids  Weight  Low; Middle; Heavy
Longueur  Body length  Small; Middle; Big
Queue  The relative length of the tail  Short; Medium; Long
Dents  Developed fangs  Yes; No
Typproie  Type of prey  Big; Big or small; Small
Arbres  Climb tree  Yes; No
Chasse  On the run or on the lookout (prowl)  Yes; No

References

1. Czekanowski, J. Zur Differentialdiagnose der Neandertalgruppe. Korespondentblatt der Deutschen Gesellschaft für Anthropologie Ethnologie und Urgeschichte 1909, XL, 44–47.
2. Harkanth, S.; Phulpagar, B.D. A survey on clustering methods and algorithms. Int. J. Comput. Sci. Inf. Technol. 2013, 4, 687–691.
3. Madhulatha, T.S. An overview on clustering methods. arXiv 2012, arXiv:1205.1117.
4. Madhulatha, T.S. An overview of clustering methods. Intell. Data Anal. 2007, 11, 583–605.
5. Xu, R.; Wunsch, D. Survey of clustering algorithms. IEEE Trans. Neural Netw. 2005, 16, 645–678.
6. Xu, D.; Tian, Y. A comprehensive survey of clustering algorithms. Ann. Data Sci. 2015, 2, 165–193.
7. Ahlquist, J.S.; Breunig, C. Model-based clustering and typologies in the social sciences. Political Anal. 2010, 20, 325–346.
8. Aldenderfer, M.S.; Blashfield, R.K. A review of clustering methods. Clust. Anal. 1984, 33–61.
9. Díaz-Costa, E.; Fernández-Cano, A.; Faouzi, T.; Henríquez, C.F. Validación del constructo subyacente en una escala de evaluación del impacto de la investigación educativa sobre la práctica docente mediante análisis factorial confirmatorio. Rev. Investig. Educ. 2015, 33, 47–63.
10. Díaz-Costa, E.; Fernández-Cano, A.; Faouzi-Nadim, T.; Caamaño-Carrillo, C. Modelamiento y estimación del índice de impacto de la investigación sobre la docencia. Revista Electrónica Interuniversitaria de Formación del Profesorado 2019, 22, 211–228.
11. Fonseca, J.R.S. Clustering in the field of social sciences: That is your choice. Int. J. Soc. Res. Methodol. 2013, 16, 403–428.
12. Rice, P.M.; Saffer, M.E. Cluster analysis of mixed-level data: Pottery provenience as an example. J. Archaeol. Sci. 1982, 9, 395–409.
13. Huang, Z. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min. Knowl. Discov. 1998, 2, 283–304.
14. Marcotorchino, F.; Michaud, P. Agregation de similarites en classification automatique. Rev. Stat. Appl. 1982, 30, 21–44.
15. Bock, H.-H. Origins and extensions of the k-means algorithm in cluster analysis. Electron. J. Hist. Probab. Stat. 2008, 4, 1–18.
16. Forgy, E.W. Cluster analysis of multivariate data: Efficiency versus interpretability of classifications. Biometrics 1965, 21, 768–769.
17. Jain, A.K. Data clustering: 50 years beyond K-means. IEEE Pattern Recognit. Lett. 2010, 31, 651–666.
18. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Oakland, CA, USA, 1967; Volume 1, pp. 281–297.
19. Goyal, M.; Aggarwal, S. A Review on K-Mode Clustering Algorithm. Int. J. Adv. Res. Comput. Sci. 2017, 8, 1615–1620.
20. Xiong, T.; Wang, S.; Mayers, A.; Monga, E. DHCC: Divisive hierarchical clustering of categorical data. Data Min. Knowl. Discov. 2012, 24, 103–135.
21. Lakshmi, K.; Visalakshi, N.K.; Shanthi, S.; Parvathavarthini, S. Clustering categorical data using K-Modes based on cuckoo search optimization algorithm. ICTACT J. Soft Comput. 2017, 8, 1561–1566.
22. Dorman, K.S.; Maitra, R. An Efficient k-modes Algorithm for Clustering Categorical Datasets. Stat. Anal. Data Min. ASA Data Sci. J. 2022, 15, 83–97.
23. Ali, D.S.; Ghoneim, A.; Saleh, M. K-modes and Entropy Cluster Centers Initialization Methods. In ICORES; SciTePress: Setúbal, Portugal, 2017; pp. 447–454.
24. Khan, S.S.; Kant, S. Computation of Initial Modes for K-modes Clustering Algorithm Using Evidence Accumulation. In IJCAI; Morgan Kaufmann Publishers: San Francisco, CA, USA, 2007; pp. 2784–2789.
25. Huang, Z.; Ng, M.K. A fuzzy k-modes algorithm for clustering categorical data. IEEE Trans. Fuzzy Syst. 1999, 7, 446–452.
26. Gan, G.; Ma, C.; Wu, J. Data Clustering: Theory, Algorithms, and Applications; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2020.
27. Jiang, Z.; Liu, X. A novel consensus fuzzy k-modes clustering using coupling DNA-chain-hypergraph P system for categorical data. Processes 2020, 8, 1326.
28. Kim, D.-W.; Lee, K.H.; Lee, D. Fuzzy clustering of categorical data using fuzzy centroids. Pattern Recognit. Lett. 2004, 25, 1263–1271.
29. Ng, M.K.; Li, M.J.; Huang, J.Z.; He, Z. On the impact of dissimilarity measure in k-modes clustering algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 503–507.
30. Cao, F.; Liang, J.; Li, D.; Bai, L.; Dang, C. A dissimilarity measure for the k-modes clustering algorithm. Knowl.-Based Syst. 2012, 26, 120–127.
31. Hazarika, I.; Mahanta, A.K.; Das, D. A New Categorical Data Clustering Technique Based on Genetic Algorithm. Int. J. Appl. Eng. Res. 2017, 12, 12075–12082.
32. Khandelwal, G.; Sharma, R. A simple yet fast clustering approach for categorical data. Processes 2015, 120, 25–30.
33. Seman, A.; Bakar, Z.A.; Sapawi, A.M.; Othman, I.R. A medoid-based method for clustering categorical data. J. Artif. Intell. 2013, 6, 257.
34. Michaud, P.; Marcotorchino, F. Modèles d’optimisation en analyse des données relationnelles. Math. Sci. Hum. 1979, 67, 7–38.
35. Michaud, P. Condorcet: A man of the avant–garde. Appl. Stoch. Model. Data Anal. 1987, 3, 173–189.
36. Michaud, P. The true rule of the Marquis de Condorcet. In Compromise, Negotiation and Group Decision; Springer: New York, NY, USA, 1988; pp. 83–100.
37. Michaud, P. Agrégation à la Majorité II: Analyse du Résultat d’un Vote; Centre Scientifique IBM France: Paris, France, 1985.
38. Hägele, G.; Pukelsheim, F. Llul’s writings on electoral systems. Stud. Lul. 2001, 41, 3–38.
39. Marcotorchino, F. Liaison Analyse Factorielle-Analyse Relationnelle: Dualité Burt-Condorcet; IEEE Centre Scientifique IBM France: Paris, France, 1989.
40. Lebbah, M.; Bennani, Y.; Grozavu, N.; Benhadda, H. Relational analysis for clustering consensus. Mach. Learn. 2010, 45–59.
Figure 1. Optimal solution by the Condorcet method.
Figure 2. Optimal solution by k-modes, fixing the number of groups at four.
Figure 3. The partitioning quality of feline data under both k-modes and α-Condorcet methods.
Figure 4. The optimal solution minimizing within-cluster sum-of-squares under both k-modes and α-Condorcet methods.
Table 1. Some classical methods of clustering for categorical or mixed data.
Method  Data Type  Metric
k-modes  Categorical data  Measure of similarity
k-prototype  Mixed data  Huang cost function
Condorcet  Categorical data  Measure of similarity
Table 2. Example of dataset E.
     v1  v2  v3
x1   1   1   3
x2   1   2   3
x3   2   1   2
Table 3. Example of dataset D.
     v1  v2  v3
x1   1   1   3
x2   1   2   3
x3   2   1   2
x4   1   1   3
x5   2   1   1
x6   2   1   2
Table 4. The Condorcet matrix C.
     x1  x2  x3  x4  x5  x6
x1   -   2   1   3   1   1
x2   2   -   0   2   0   0
x3   1   0   -   1   2   3
x4   3   2   1   -   1   1
x5   1   0   2   1   -   2
x6   1   0   3   1   2   -
Table 5. Time comparison between an exact algorithm and a heuristic algorithm using the feline dataset.
Method               Data Size  α  Quality Index  Inertia Index  Time (seconds)
Exact algorithm      5          2  0.69           0.55           0.05
                                3  0.62           0.32           0.03
                                4  0.57           0.15           0.11
Heuristic algorithm  5          2  0.69           0.55           0.03
                                3  0.61           0.28           0.03
                                4  0.57           0.15           0.017
Exact algorithm      7          2  0.65           0.83           0.05
                                3  0.64           0.56           0.18
                                4  0.63           0.41           1.20
                                5  0.59           0.26           4.90
                                6  0.57           0.11           14.70
Heuristic algorithm  7          2  0.65           0.83           0.04
                                3  0.64           0.56           0.03
                                4  0.61           0.36           0.02
                                5  0.59           0.21           0.03
                                6  0.57           0.11           0.02
Exact algorithm      9          2  0.64           1.61           0.11
                                3  0.64           1.18           0.99
                                4  0.64           0.92           15.26
                                5  0.62           0.77           111.80
                                6  0.60           0.30           586.95
                                7  0.58           0.22           2186.42
                                8  0.56           0.11           6514.23
Heuristic algorithm  9          2  0.64           1.61           0.09
                                3  0.64           1.18           0.06
                                4  0.64           0.92           0.07
                                5  0.60           0.65           0.05
                                6  0.59           0.30           0.04
                                7  0.58           0.22           0.05
                                8  0.56           0.11           0.05
Exact algorithm      30         4  ≥0.67          NA             NA
Heuristic algorithm  30         4  0.67           0.90           0.36
Table 6. Comparison between k-modes and the heuristic algorithm for different sample sizes of the US Census 1990 dataset with α = 4, 5, 6, 7, 8.
Method               Data Size  α  Quality Index  Inertia Index
k-modes              50         4  0.57           3.95
                                5  0.56           3.86
                                6  0.54           3.66
                                7  0.52           3.82
                                8  0.54           2.91
Heuristic algorithm  50         4  0.61           3.49
                                5  0.60           3.32
                                6  0.58           3.22
                                7  0.57           3.10
                                8  0.56           3.03
k-modes              100        4  0.58           5.39
                                5  0.57           5.08
                                6  0.55           4.87
                                7  0.53           4.77
                                8  0.51           4.80
Heuristic algorithm  100        4  0.60           4.63
                                5  0.60           4.54
                                6  0.59           4.45
                                7  0.59           4.37
                                8  0.58           4.34
k-modes              150        4  0.53           6.04
                                5  0.52           5.97
                                6  0.52           6.09
                                7  0.53           6.06
                                8  0.53           5.84
Heuristic algorithm  150        4  0.60           5.71
                                5  0.60           5.65
                                6  0.59           5.57
                                7  0.59           5.52
                                8  0.58           5.44
k-modes              200        4  0.54           6.97
                                5  0.52           6.88
                                6  0.53           6.82
                                7  0.51           6.76
                                8  0.51           6.70
Heuristic algorithm  200        4  0.59           6.89
                                5  0.59           6.81
                                6  0.59           6.75
                                7  0.58           6.70
                                8  0.58           6.60
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
