Article

A Knowledge-Guided Competitive Co-Evolutionary Algorithm for Feature Selection

Junyi Zhou, Haowen Zheng, Shaole Li, Qiancheng Hao, Haoyang Zhang, Wenze Gao and Xianpeng Wang

1 School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
2 School of Information Science and Engineering, Northeastern University, Shenyang 110004, China
3 School of Mechanical Engineering and Automation, Northeastern University, Shenyang 110819, China
4 Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University, Shenyang 110819, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(11), 4501; https://doi.org/10.3390/app14114501
Submission received: 19 April 2024 / Revised: 19 May 2024 / Accepted: 22 May 2024 / Published: 24 May 2024
(This article belongs to the Special Issue Applied Machine Learning III)

Abstract

In real-world applications, feature selection is crucial for enhancing the performance of data science and machine learning models. Feature selection is typically a complex combinatorial and multi-objective optimization problem whose primary goals are to reduce the dimensionality of the dataset and enhance the performance of machine learning algorithms. Selecting features in high-dimensional datasets is difficult because of the intricate relationships between features, which strain both the performance and the computational efficiency of algorithms. This paper introduces a Knowledge-Guided Competitive Co-Evolutionary Algorithm (KCCEA) for feature selection, especially for high-dimensional data. The proposed algorithm improves the foundational dominance-based multi-objective evolutionary algorithm in two aspects. First, feature correlation is used as knowledge to guide evolution, which improves the search speed and solution quality of the traditional multi-objective evolutionary algorithm. Second, a dynamically allocated competitive–cooperative evolutionary mechanism is proposed that integrates the knowledge-guided evolution with the traditional evolutionary algorithm, further enhancing search efficiency and solution diversity. Through rigorous empirical testing on various datasets, the KCCEA demonstrates superior performance compared to basic multi-objective evolutionary algorithms, providing effective solutions to multi-objective feature selection problems while enhancing the interpretability and effectiveness of prediction models.

1. Introduction

In today’s machine learning landscape, the widespread presence of high-dimensional data in real-world scenarios presents substantial challenges [1,2]. As datasets burgeon with features, from image characteristics in computer vision to genetic markers in bioinformatics, the demand for robust feature selection has become more pressing than ever [3,4]. These complex datasets frequently include irrelevant, noisy, or redundant features that not only hinder the learning process but also impair the effectiveness of algorithms [5,6]. Effective feature selection techniques boost model accuracy by isolating and preserving only the most impactful features for predictive modeling [7]. Additionally, by reducing the number of features, these techniques cut down on storage needs and computational expenses, thus enhancing the interpretability and efficiency of models [8].
Feature selection is inherently a challenging combinatorial optimization problem aimed at reducing the feature space while simultaneously enhancing the classifier’s performance [9]. These objectives typically conflict; a minimal set of features might omit critical information, adversely impacting the model’s performance. This problem’s complexity escalates with the exponential growth in the number of potential feature subsets as the initial feature count increases, making a robust search strategy essential for effective feature selection [10].
In response to these challenges, the past few decades have seen the emergence of many effective algorithms, including filter-based methods, embedded methods, and wrapper methods. Filter-based methods, such as chi-square tests [11] and mutual information, evaluate the relevance of features independent of any learning algorithm. Embedded methods, like LASSO regression [12], perform feature selection during the model training process. Wrapper methods, such as Recursive Feature Elimination (RFE) [13], use a predictive model to evaluate feature subsets. Among all methods, metaheuristic algorithms stand out, particularly multi-objective evolutionary algorithms (MOEAs) [14,15], which excel in balancing conflicting objectives without requiring gradient information. These algorithms, such as the NSGA-II [16], utilize principles of natural evolution—such as selection, crossover, and mutation—to explore and exploit the search space effectively. By mimicking natural selection processes, MOEAs can efficiently navigate the large and complex search spaces found in high-dimensional feature selection tasks.
Despite their advantages, the computational complexity of MOEAs scales significantly with the increase in the number of features [17]. This complexity can be attributed to the need to evaluate a vast number of potential solutions, each representing a different subset of features. The performance of these algorithms often deteriorates as the dimensionality of the data increases, necessitating innovative approaches to reduce computational overhead and enhance the search process’s efficiency [18].
The evolving landscape of feature selection has prompted the development of more sophisticated approaches. Balancing convergence and diversity in multi-objective optimization problems (MOPs) [19] with a single algorithm is challenging, so the current trend is to combine multiple algorithms [20,21]. Such hybrid approaches fall into two main types: multi-algorithm methods (using various algorithms within the same population) [22,23] and multi-population methods (using multiple populations, each targeting a specific objective) [24,25]. In multi-algorithm methods, integrating various algorithmic strategies makes it possible to balance exploration speed and population diversity effectively and to synergize different evolutionary strategies across various feature selection scenarios. This not only leverages the strengths of individual algorithms but also fosters a more dynamic and adaptive search process.
Evolutionary algorithms (EAs) are known for their high computational complexity, which often limits their practical application despite their potential for performance improvement [26]. In this study, we propose a Knowledge-Guided Competitive Co-Evolutionary Algorithm (KCCEA) to address this issue. By leveraging feature correlation knowledge and a dynamically allocated competitive–cooperative mechanism, the KCCEA significantly reduces computational complexity while enhancing performance. This approach balances the trade-off between computational efficiency and performance gain, making it a valuable contribution to real-world applications where high-dimensional data and limited computational resources are common. The main contributions of this work can be summarized as follows:
  • We propose a correlation-based feature grouping method and, building on it, introduce a Knowledge-Guided Evolutionary Algorithm (KGEA). Initially, the correlations between features are assessed using Spearman’s correlation coefficient. Then, using a predefined optimal threshold, features are grouped based on their correlations. Finally, the obtained grouping information serves as knowledge to guide the evolutionary process. In this way, the speed and quality of evolutionary feature selection can be significantly enhanced.
  • We design a Knowledge-Guided Competitive–Cooperative Evolutionary Algorithm (KCCEA) framework. Initially, an allocation-ratio-based method allows the two algorithms to cooperate. Then, during the evolutionary process, the allocation ratio is dynamically updated by assessing each algorithm's success rate in producing superior offspring, allowing the two algorithms to compete. This mechanism improves both search speed and solution quality.
  • To verify the performance of the proposed methods, we conduct a series of experiments. The experimental results demonstrate that the proposed methods effectively enhance the performance of evolutionary algorithms in feature selection tasks.
The structure of this paper is as follows: Section 2 introduces the background. Section 3 describes the Knowledge-Guided Evolutionary Algorithm (KGEA). Section 4 presents the competitive–cooperative evolutionary mechanism (KCCEA). Section 5 and Section 6 describe the experimental setup and results, respectively. Section 7 concludes the paper and outlines several themes for future research.

2. Background

2.1. Related Work

2.1.1. Multi-Objective Optimization Problem

An MOP involves finding the minimum or maximum values of multiple objective functions under certain constraints [27,28]. It is generally formulated as follows:
$$
\begin{aligned}
\min / \max \quad & f(X) = \bigl( f_1(X), f_2(X), \ldots, f_r(X) \bigr) \\
\text{s.t.} \quad & g_i(X) \le 0 \quad (i = 1, 2, \ldots, k) \\
& h_i(X) = 0 \quad (i = 1, 2, \ldots, l)
\end{aligned}
$$
In the above formulation, the decision vector X = (x_1, x_2, ..., x_n) and the objective vector f(X) admit multiple optimization combinations, which can be summarized into the following three scenarios:
  • Maximize all sub-objective functions;
  • Minimize all sub-objective functions;
  • Maximize some sub-objective functions while minimizing others.
It is possible to transform all optimization objectives into a unified form, either all maximizing or all minimizing, to facilitate the algorithm finding the optimal solution X* = (x_1*, x_2*, ..., x_n*) such that f(X*) is optimal while the constraints are satisfied.
In this work, for the feature selection problem, we need to optimize two objectives: (1) the number of selected features and (2) the performance of the machine learning task using the selected features. Assuming a solution selects k out of D features, the first objective f_1(X) = k/D is the ratio of selected features, while the second objective f_2(X) represents classification performance. So that all sub-objectives are minimized, we define the classification objective as f_2(X) = 1 − F1_score, where, for a multi-class problem, F1_score is the weighted average F1 score over classes.
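To make these two objectives concrete, the following minimal Python sketch evaluates one candidate feature subset under these definitions. The wrapper classifier (a k-nearest-neighbour model from scikit-learn) and the train/test arguments are illustrative assumptions rather than the exact experimental setup used later.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

def evaluate_subset(mask, X_train, y_train, X_test, y_test):
    """Evaluate one candidate solution (a binary mask over the D features).

    Returns (f1_obj, f2_obj): the ratio of selected features and
    1 - weighted F1 score, both to be minimized.
    """
    D = X_train.shape[1]
    selected = np.flatnonzero(mask)
    if selected.size == 0:            # an empty subset cannot be classified
        return 1.0, 1.0
    clf = KNeighborsClassifier()
    clf.fit(X_train[:, selected], y_train)
    y_pred = clf.predict(X_test[:, selected])
    f1_obj = selected.size / D                                    # objective 1: k / D
    f2_obj = 1.0 - f1_score(y_test, y_pred, average="weighted")   # objective 2: 1 - F1 score
    return f1_obj, f2_obj
```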

2.1.2. Non-Dominated Sorting Genetic Algorithm II

The Non-Dominated Sorting Genetic Algorithm II (NSGA-II) is a refined version of the original NSGA [29] and is prominently used within the field of Pareto-Based Evolutionary Multi-Objective Optimization (EMO) algorithms [30]. Specifically crafted to address multi-objective optimization challenges, the NSGA-II excels in efficiently identifying a broad spectrum of Pareto optimal solutions. The core mechanism of the NSGA-II hinges on a swift non-dominated sorting procedure which organizes solutions into several tiers or ‘fronts’. These fronts are formed based on how solutions dominate or are dominated by one another.
  • Solutions in the first (lowest-numbered) front are not dominated by any other solution;
  • Each subsequent front contains solutions that are dominated only by solutions in the preceding fronts.
In terms of operational dynamics, the NSGA-II follows a structured process:
1. Population Generation: The NSGA-II begins with the generation of a random initial population;
2. Evaluation and Sorting: Each solution is evaluated and ranked based on objective function performance and then sorted into fronts;
3. Selection: Selection favors solutions on better-ranked fronts. Within these fronts, a further selection criterion based on crowding distance determines choice, emphasizing less crowded areas to enhance diversity among the chosen solutions;
4. Genetic Operations: Following selection, genetic mechanisms like crossover and mutation are employed to generate new solutions, aiding in the exploration and expansion of the solution space;
5. Elitism and Generational Transition: To ensure robustness across generations, the NSGA-II integrates elitism, preserving top-performing solutions and carrying them forward. The blended population of parents and offspring undergoes another round of sorting and selection, emphasizing the superior solutions for the next generation.
Through this structured approach and emphasis on maintaining a diverse set of high-quality solutions, the NSGA-II not only adeptly explores the Pareto front but also ensures rapid convergence to optimal solutions, effectively addressing the complexities associated with multiple objectives.
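For illustration, the fast non-dominated sorting step at the core of this procedure can be sketched in a few lines of Python for a minimization problem; this is a generic reference implementation, not the authors' code.

```python
def fast_non_dominated_sort(objs):
    """Sort objective vectors (minimization) into Pareto fronts.

    `objs` is a list of tuples; the result is a list of fronts, each a list
    of indices into `objs`, with front 0 containing the non-dominated solutions.
    """
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    n = len(objs)
    dominated_by = [[] for _ in range(n)]   # indices that solution i dominates
    dom_count = [0] * n                     # how many solutions dominate i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominated_by[i].append(j)
            elif dominates(objs[j], objs[i]):
                dom_count[i] += 1
        if dom_count[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        next_front = []
        for i in fronts[k]:
            for j in dominated_by[i]:
                dom_count[j] -= 1
                if dom_count[j] == 0:
                    next_front.append(j)
        fronts.append(next_front)
        k += 1
    return fronts[:-1]                      # drop the trailing empty front
```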

2.2. Major Motivations

Evolutionary algorithms (EAs) have been widely applied to multi-objective feature selection [18,31,32], but most studies merely apply EAs without considering the relationships among features. We recognize that, with the increasing complexity of real-world data, especially the rise of big data, features in high-dimensional datasets are likely to encompass various complex relationships [33] which might not be merely linearly positive or negative correlations. These potential high-dimensional relationships could significantly impact the exploration process of feature selection. We can attempt to capture such relationships and examine their contributions to evolutionary optimization. The motivation to integrate knowledge-guided strategies stems from the observation that not all features contribute equally to the outcomes of predictive models. Some features are more interdependent than others and can be utilized to enhance the search process. By incorporating knowledge of feature correlations directly into the evolutionary process, this method can more effectively guide the search, reduce computational overhead, accelerate the search speed, and improve the quality of the selected feature subsets.
Furthermore, in multi-objective optimization, particularly when applied to feature selection, balancing exploration of the search space and exploitation of promising areas are crucial. This balance is difficult to achieve with a single algorithm because of the high risk of convergence to local optima [34,35]. Traditional static cooperative algorithms cannot adapt to changes in the problem landscape that might occur during the search process. A dynamic, adaptive approach can better respond to such changes, optimizing resource allocation based on the performance of competing methods, thereby ensuring that the most effective strategies are employed as conditions evolve. These motivations guide the development of the KCCEA, focusing on creating a versatile, efficient, and effective tool for feature selection that addresses the unique challenges posed by modern, high-dimensional datasets in various domains. The ultimate goal is to provide a method that not only performs well in academic benchmarks but also offers practical, real-world applicability and scalability.

3. Knowledge-Guided Evolutionary Algorithm

In the evolutionary optimization problem for feature selection, high-dimensional features often lead to a large number of decision variables. With the increase in decision variables, the search space for the problem expands dramatically. To effectively enhance the efficiency of solving the problem, this paper designs and improves a knowledge-guided evolutionary method, the KGEA, based on the NSGA-II [16]. Firstly, a method of grouping features based on correlation is proposed. In this method, we fully utilize the correlation among different features to achieve balanced grouping results (considered as knowledge in subsequent evolution). Then, during the evolutionary process, the algorithm uses the acquired knowledge to guide crossover and mutation, reduce the search space, and enhance the efficiency of the search while maximizing the diversity of the population.

3.1. Feature Grouping Based on Correlation

Datasets in real applications include various features, and the correlations among them are diverse: some features are related and some are not, some are positively correlated and others negatively. Additionally, there may be numerous high-dimensional relationships between features that are difficult to identify but could significantly impact the outcome. For instance, in finance, stock prices and market indices might show positive correlations, whereas interest rates and stock prices might be negatively correlated. Therefore, we aim to use correlation to group features. This strategy aims to improve the efficiency of subsequent evolutionary searches while maximizing the diversity of the population. Ideally, the grouping should be balanced across feature groups, i.e., it should avoid overly large groups and a multitude of isolated features. Through experimentation, we propose a simple yet effective threshold-based grouping method.
The pseudo-code of the described algorithm is shown in Algorithm 1. Initially, the algorithm conducts a detailed analysis of the correlation among features. By calculating the Pearson correlation coefficient between features, we obtain a correlation matrix where each element of the matrix represents the correlation strength between a pair of features. After obtaining the correlation matrix, we traverse each feature and check its correlation with the features in the existing groups based on a preset threshold determined through experimentation. If a feature’s absolute correlation with all features in a certain group exceeds the threshold, then the feature is added to the corresponding group. If a feature cannot be placed into any existing group, a new feature group is created. This grouping mechanism ensures high correlation within groups while maintaining relative independence between groups.
Algorithm 1 Feature_Grouping(correlation_matrix, threshold)
Input: correlation_matrix of size n × n, threshold
Output: list of feature groups
1:  Initialize feature_groups to an empty list
2:  num_features ← number of columns in correlation_matrix
3:  for i = 0 to num_features − 1 do
4:      added ← False
5:      for each group in feature_groups do
6:          if |correlation_matrix[i, j]| > threshold for all j in group then
7:              Append i to group
8:              added ← True
9:              Break
10:         end if
11:     end for
12:     if not added then
13:         Append [i] to feature_groups
14:     end if
15: end for
16: return [list(group) for group in feature_groups]
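For readers who prefer an executable form, Algorithm 1 translates almost line for line into the following Python sketch, assuming the correlation matrix is an n × n NumPy array.

```python
import numpy as np

def feature_grouping(correlation_matrix, threshold):
    """Group features so that every pair within a group has an absolute
    correlation above the threshold (direct transcription of Algorithm 1)."""
    feature_groups = []
    num_features = correlation_matrix.shape[1]
    for i in range(num_features):
        added = False
        for group in feature_groups:
            # feature i joins the first group it is strongly correlated with
            if all(abs(correlation_matrix[i, j]) > threshold for j in group):
                group.append(i)
                added = True
                break
        if not added:
            feature_groups.append([i])      # start a new group with feature i
    return feature_groups
```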
This feature grouping mechanism based on correlation analysis acts as knowledge and provides guidance for the subsequent knowledge-guided evolutionary process. Its main objective is to reduce the dimensions of the search space by focusing on combinations of features with high correlation, capturing possible higher-dimensional feature relationships and thereby improving the search efficiency of the evolutionary algorithm. In this manner, the KGEA can effectively balance exploration and exploitation in evolutionary searches, promoting the diversity and quality of solutions.

3.2. Knowledge-Guided Crossover and Mutation

The unique aspect of the KGEA lies in its utilization of predetermined groupings of features to direct the genetic operators of crossover and mutation. This approach contrasts sharply with traditional methods where each gene (or feature) is considered independently. In the realm of evolutionary optimization, particularly for feature selection in high-dimensional datasets, the techniques of knowledge-guided crossover and mutation play a pivotal role. By intelligently grouping features based on their correlations—as delineated in prior sections—this method not only simplifies the search space but also ensures a focused exploration that significantly enhances efficiency.
In the KGEA, features are encoded using a binary system. As shown in Figure 1, when the encoding position x_1 for Feature1 is 1, it indicates that we have selected this feature. This encoding is crucial as it directly influences the genetic operations that follow.
Figure 2 provides an example of knowledge-guided crossover and mutation, assuming we have six features {F_1, F_2, F_3, F_4, F_5, F_6} with corresponding chromosome encoding {x_1, x_2, x_3, x_4, x_5, x_6}. Also assume that we have obtained the feature grouping {[x_1], [x_2, x_3], [x_4], [x_5, x_6]} through the method described in the previous subsection. In this setup, when performing mutations or crossovers, the algorithm does not randomly pick single genes but rather entire groups, thereby preserving the intrinsic correlation structure within the data.
During a mutation event, instead of a single gene mutation, entire groups such as {x_1} and {x_5, x_6} are mutated. This enhances the mutation's effectiveness by ensuring that changes are made to related features together, thus maintaining the logical structure of the data. Similarly, during crossover events, entire groups like {x_2, x_3} and {x_5, x_6} might be swapped between two chromosomes. This group-based approach not only maintains the integrity of feature relationships but also introduces a new dimension of diversity to the gene pool.
This enhanced method of guiding the genetic operators using prior knowledge effectively reduces the dimensionality of the problem space. By focusing mutations and crossovers on groups of features that are likely to interact, the KGEA significantly boosts the algorithm’s efficiency. It balances the exploration of the search space with the exploitation of known, valuable regions, thereby improving both the speed and quality of the search for optimal feature sets.
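A minimal sketch of such group-wise operators is shown below; the group lists come from Section 3.1, while the function names and the crossover and mutation probabilities are illustrative placeholders rather than the tuned values reported later.

```python
import random

def group_crossover(parent1, parent2, groups, p_cx=0.5):
    """Swap whole feature groups between two binary chromosomes."""
    child1, child2 = list(parent1), list(parent2)
    for group in groups:
        if random.random() < p_cx:          # swap this group as a single unit
            for idx in group:
                child1[idx], child2[idx] = child2[idx], child1[idx]
    return child1, child2

def group_mutation(chromosome, groups, p_mut=0.1):
    """Flip all bits of a selected feature group together instead of single genes."""
    mutant = list(chromosome)
    for group in groups:
        if random.random() < p_mut:
            for idx in group:
                mutant[idx] = 1 - mutant[idx]
    return mutant
```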

4. Competition–Cooperation Evolution

In the KGEA, by introducing data analysis techniques and utilizing knowledge to guide the evolutionary process, the quality of the solutions and the search efficiency are significantly enhanced. However, because grouping features reduces the explored solution space, the solutions obtained by the algorithm may fall into local optima. To further improve the quality of solutions and increase their convergence and diversity, we propose a competition–cooperation evolutionary mechanism, the KCCEA, that combines the KGEA with the traditional NSGA-II. This method initializes an allocation ratio, allows both models to evolve simultaneously, and, by mixing the offspring generated by the two models and randomly assigning them as parents for the next generation, facilitates the exchange and cooperation of information between the two models. Furthermore, the allocation ratio of resources is adjusted generation by generation based on the success rates of the two evolutionary approaches, achieving dynamic allocation of computational resources and thus enabling the two algorithms to compete effectively. The overall framework of the algorithm is shown in Figure 3.

4.1. Competition–Cooperation Evolutionary Mechanism

The dynamic interaction between the KGEA and NSGA-II within our proposed KCCEA model is fundamentally rooted in the principles of competition and cooperation. This interaction not only enhances the diversity of the solutions but also improves their quality by preventing premature convergence to suboptimal solutions.
Initial Setup and Population Division: At the beginning of the evolutionary process, we assume the total number of individuals in the parent population P_t is denoted by N_ind, and the initial allocation ratio is predetermined based on prior experimental tuning. The parent population is then randomly divided into two sub-populations, S_1 and S_2. The size of each sub-population, n_1 for S_1 and n_2 for S_2, is determined by the allocation ratio as follows:
$$n_1 = \mathit{allocation\_ratio} \times N_{ind}$$
$$n_2 = (1 - \mathit{allocation\_ratio}) \times N_{ind}$$
Evolutionary Operations: Each sub-population undergoes evolutionary operations independently. Sub-population S_1 is processed using the KGEA, which emphasizes knowledge-guided mutations and selections, whereas S_2 undergoes a more traditional evolutionary path via the NSGA-II, focusing on Pareto optimality. This dual approach ensures that, while one algorithm might explore more deeply into a certain region of the solution space, the other may explore more broadly, thereby enhancing the overall explorative capabilities of the combined algorithm.
Crossover and Mutation: After the initial evolutionary operations, both sub-populations undergo crossover and mutation processes. Mutation and crossover are applied differently in the two sub-populations to maintain their distinctive characteristics. For S_1, mutation is knowledge guided, whereas for S_2 it is more random and exploratory.
Integration and Selection: The offspring from S_1 and S_2, denoted as Q_1 and Q_2, respectively, are then integrated to form a unified offspring population. This integration is crucial as it brings together the diverse traits developed independently in each sub-population. The unified population undergoes non-dominated sorting followed by crowding distance selection to ensure that the next generation, Q_t, maintains a balanced representation of both exploration and exploitation traits. The size of Q_t is set equal to N_ind, maintaining a consistent population size through the generations.
Feedback and Adjustment: The performance of each sub-population in producing high-quality offspring is monitored. Based on their performance, the allocation ratio is dynamically adjusted for the next generation, ensuring that more resources are allocated to the more successful approach in the current generation. This adaptive mechanism allows the KCCEA model to respond to the evolving landscape of the problem space effectively.
By fostering a balance between competition and cooperation, the KCCEA not only mitigates the risk of converging prematurely to local optima but also leverages the strengths of both evolutionary algorithms, leading to a robust mechanism for finding optimal solutions.

4.2. Dynamic Resource Allocation Based on Success Rate

Assuming the population size N_ind is 100, the allocation ratio is initially set to 0.5, meaning that, during the allocation, individuals are equally distributed between the two algorithms, forming their respective parent populations with 50 individuals each. Individuals generated by the KGEA are marked as 0, and those generated by the traditional NSGA-II are marked as 1.
By merging the parent and offspring populations of the two algorithms, and selecting 100 individuals to form a new generation, the elite retention mechanism ensures that excellent parent individuals will also appear in the new generation. We count the number of individuals marked as 0 and 1 to calculate the success rates of the two search methods.
Suppose that, in the new generation, there are 40 individuals marked as 0 and 30 individuals marked as 1. Then, the success rate of the KGEA is 0.8, and the success rate of the NSGA-II is 0.6. Normalize 0.8 and 0.6 so their sum equals 1, yielding new search ratios of 0.57 and 0.43. Update the allocation ratio to 0.57, and set the mark of all individuals to 2. According to the new search ratio, randomly select the corresponding number of individuals from the solutions generated by the two search methods to form the offspring. The dynamic resource allocation mechanism based on success rate is shown in Figure 4.
In the process of population evolution, it may occur that, due to the strong optimization capability, one method’s success rate is extremely high. Without restriction, this could lead to one model monopolizing all evolutionary resources, causing the loss of the other search method. In the later stages of iterative evolution, when the quality of individuals in the population is high, it may occur that the success rate of a certain search method is 0, which also requires manual intervention in the search ratio. The treatment for special cases is as shown in Table 1.
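Putting the success-rate computation and the special-case handling of Table 1 together, the per-generation update can be sketched as follows; it assumes each surviving individual carries the origin mark (0 for KGEA, 1 for NSGA-II) described above.

```python
def update_allocation_ratio(tags, n1, n2):
    """Recompute the allocation ratio from the survivors of each algorithm.

    `tags` holds the origin mark of every individual kept in the new generation:
    0 for KGEA offspring and 1 for NSGA-II offspring.
    """
    success_kgea = tags.count(0) / n1       # e.g. 40 / 50 = 0.8
    success_nsga2 = tags.count(1) / n2      # e.g. 30 / 50 = 0.6
    total = success_kgea + success_nsga2
    if total == 0:
        return 0.5                          # both success rates are zero (Table 1)
    ratio = success_kgea / total            # e.g. 0.8 / 1.4 ≈ 0.57
    return min(max(ratio, 0.1), 0.9)        # clamp extreme ratios as in Table 1
```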
An additional simple method is to adopt a nonlinear approach for resource allocation instead of directly distributing resources based on performance proportions. For instance, a sigmoid function based on success rates can be used for resource allocation, ensuring that, even with significant performance differences, resource distribution does not become too extreme. The general form of the function used is as follows:
$$\operatorname{sigmoid}(x) = \frac{1}{1 + e^{-\beta (x - \alpha)}}$$
Here, α and β are adjustable parameters controlling the center point and the slope of the function, respectively. We set α to 0.5 (assuming an ideal state where both populations equally share resources), and β can be adjusted based on actual needs to control the smoothness of resource distribution. A higher β value makes the resource allocation more sensitive to performance differences, while a lower value results in more even distribution.
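As an illustration of this smoothing, the sketch below maps the proportional success share through the sigmoid above; the default parameters follow the settings reported later in Section 5.3 (α = 0.5, β = 2).

```python
import math

def sigmoid_allocation(success_kgea, success_nsga2, alpha=0.5, beta=2.0):
    """Smooth the allocation ratio so large performance gaps do not
    translate into extreme resource splits."""
    total = success_kgea + success_nsga2
    if total == 0:
        return 0.5
    raw_share = success_kgea / total                      # proportional share
    return 1.0 / (1.0 + math.exp(-beta * (raw_share - alpha)))
```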

5. Experimental Setup

5.1. Classification Datasets

This study utilizes five diverse datasets for classification tasks, all sourced from online repositories such as the UCI Machine Learning Repository [36]. Each dataset is carefully selected to encapsulate a wide spectrum of challenges commonly encountered in the machine learning domain. These challenges range from managing datasets with low to high dimensionality to addressing problems with varying numbers of classes, from simple binary classification to more complex multi-class scenarios. This variety is crucial for testing the robustness and adaptability of the proposed algorithms under varied conditions. Moreover, the datasets are not only diverse in their characteristics but also relevant to real-world applications. They include data from health, chemistry, voice recognition, and consumer preferences, which are critical areas where machine learning models are extensively applied.
Table 2 provides a concise overview of these datasets, detailing the number of instances, features, and classes for each. This diverse collection not only tests the adaptability and efficiency of the algorithms but also enhances the robustness of our experimental findings by covering a broad spectrum of data characteristics.

5.2. Performance Metrics

This article utilizes two principal performance indicators to assess the quality of solutions obtained by multi-objective evolutionary algorithms (MOEAs): Hypervolume (HV) [37] and Inverted Generational Distance (IGD) [38]. These indicators are critical for evaluating an algorithm’s efficacy in terms of both convergence to the Pareto front and distribution of the solutions across it.
For HV, normalization of all objective values obtained by each algorithm is performed initially against the ideal and nadir points, scaling them to the [0, 1] range in each objective direction. The ideal and nadir points are set at (0, 0) and (1.1, 1.1), respectively, with the HV’s reference point subsequently established at (1, 1) [32]. This process ensures a standardized basis for comparing the extent of the objective space covered by the solutions from different algorithms.
In contrast, the calculation of IGD faces unique challenges in feature selection problems, which are discrete and complex, rendering the exact Pareto front nearly impossible to ascertain. Unlike in traditional benchmark test problems such as DTLZ [39] or WFG [40], this study adopts a novel approach for fairness. We aggregate the Pareto fronts collected from multiple runs of different algorithms into a unified population. This collective set is then subjected to non-dominated sorting, retaining only the foremost non-dominated front as the reference point set for IGD. The IGD metric, therefore, provides a comprehensive assessment by reflecting both the proximity of solutions to the Pareto front and their distribution. Lower IGD values signify superior algorithm performance, an inverse relationship compared to HV values.

5.2.1. Hypervolume

The Hypervolume indicator measures the volume covered by the members of a solution set in the objective space [37]. It quantifies both the convergence and diversity of the solutions by calculating the volume enclosed between the solutions and a reference point. Formally, the HV of a solution set S with respect to a reference point r = (r_1, r_2, ..., r_m) in an m-dimensional objective space is defined as the volume of the union of the hypercubes formed by each solution in S and r. The reference point is typically chosen to be worse than the nadir point of the Pareto front. For normalization purposes, all objective values are scaled to the interval [0, 1], with ideal and nadir points set to (0, 0) and (1.1, 1.1), respectively. The HV calculation can be represented as follows:
$$\mathrm{HV}(S) = \lambda\left( \bigcup_{x \in S} \bigl[ f_1(x), r_1 \bigr] \times \cdots \times \bigl[ f_m(x), r_m \bigr] \right) = \lambda\left( \bigcup_{x \in S} \prod_{k=1}^{m} \bigl[ f_k(x), r_k \bigr] \right)$$
where λ denotes the Lebesgue measure, and f_k(x) is the k-th objective function value of solution x. The larger the HV, the better, indicating that the solution set covers a larger portion of the objective space up to the reference point.
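For the bi-objective case considered in this paper, the HV reduces to a sum of rectangle areas and can be computed with a simple sweep; the sketch below is a generic illustration for minimization problems, not the specific implementation used in the experiments.

```python
def hypervolume_2d(solutions, ref=(1.0, 1.0)):
    """Hypervolume of 2-objective (minimization) points w.r.t. the reference point.

    Sweeps the points in ascending order of the first objective and sums
    the rectangles they dominate up to `ref`.
    """
    pts = [p for p in solutions if p[0] < ref[0] and p[1] < ref[1]]
    pts.sort()                              # ascending in f1
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:                    # only non-dominated points add area
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv
```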

5.2.2. Inverted Generational Distance

The Inverted Generational Distance (IGD) measures both the convergence and the diversity of the solutions [38]. For each point on the true Pareto front, the IGD finds the minimum distance to a point on the Pareto front obtained by the algorithm and then averages these minimum distances. A lower IGD value indicates that the algorithm’s solutions are both closer to the true Pareto front and more evenly distributed. The specific formula for calculating IGD is as follows:
$$\mathrm{IGD} = \frac{\sum_{j=1}^{|P^*|} d_{\min}(x_j^*, P)}{|P^*|}, \qquad d_{\min}(x_j^*, P) = \min_{i = 1, \ldots, |P|} \sqrt{\sum_{k=1}^{m} \bigl( f_k(x_i) - f_k(x_j^*) \bigr)^2}$$
where P is the set of Pareto optimal solutions obtained by the multi-objective evolutionary algorithm, P^* is the set of true Pareto optimal solutions, |P| is the number of obtained optimal solutions, |P^*| is the number of true optimal solutions, x_i represents the i-th solution in P, x_j^* is the j-th solution in P^*, m is the number of objectives, and f_k(x) is the function value of individual x on the k-th objective. For instance, in a two-dimensional objective space, as shown in Figure 5, the algorithm obtains three Pareto optimal solutions, H, I, and J, while the true Pareto front is formed by solutions A, B, C, D, E, F, and G. The minimum distances from the points on the true Pareto front to the points H, I, and J are d(A, H), d(B, H), d(C, I), d(D, I), d(E, J), d(F, J), and d(G, J), respectively. Hence, the IGD value is calculated as follows:
$$\mathrm{IGD} = \frac{1}{7}\bigl( d(A, H) + d(B, H) + d(C, I) + d(D, I) + \cdots + d(G, J) \bigr)$$
When solving problems using multi-objective evolutionary algorithms, the diversity of the solutions is also an important consideration. Compared to GD, the IGD can identify uneven distribution of solutions, making the evaluation more objective. Therefore, the IGD is chosen as the final evaluation metric.
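A direct NumPy transcription of this definition might look as follows; the reference front is the merged non-dominated set described in Section 5.2, passed in as an array of objective vectors.

```python
import numpy as np

def igd(reference_front, obtained_front):
    """Inverted Generational Distance: mean Euclidean distance from every
    point of the reference front to its nearest obtained point."""
    ref = np.asarray(reference_front, dtype=float)   # shape (|P*|, m)
    obt = np.asarray(obtained_front, dtype=float)    # shape (|P|, m)
    dists = np.linalg.norm(ref[:, None, :] - obt[None, :, :], axis=2)
    return dists.min(axis=1).mean()
```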

5.3. Parameter Settings

In this study, each algorithm undergoes 30 distinct trials on every dataset to ensure robust statistical analysis. Consistency across these trials is maintained by using a fixed initial random seed for all algorithms. As outlined in [32], we adapt the population size based on the number of features present in each dataset. Specifically, the population size N is dynamically adjusted to match the number of features up to a maximum of 200. This cap is imposed to strike an optimal balance between algorithmic diversity and computational efficiency. Consequently, even for datasets with more than 200 features, the population size remains limited to 200. For the sigmoid function that equilibrates the resource allocation, α is set to 0.5 (assuming that, ideally, the two populations distribute the resources equally), and β is set to 2 to control the degree of smoothing of the resource allocation. The algorithms are set to run for 100 generations, providing a deep exploration of the solution space. For the purpose of evaluating classification performance, we partition each dataset into training and testing subsets. The training set comprises roughly 70% of the data, with the remaining 30% allocated to the testing set. This split is consistently applied across all 30 executions of each algorithm on a given dataset, ensuring comparability of results. However, the specific partition varies from one dataset to another, maintaining rigorous evaluation standards across different data contexts.

6. Experimental Results and Analysis

6.1. Grouping Threshold Sensitivity Analysis

This work conducted an analysis on the sensitivity to grouping thresholds. We varied the threshold from 0.5 to 0.95 and observed how the number of groups and the balance within the groups (standard deviation of group sizes) changed. As illustrated in Figure 6, we found that, as the threshold increased, the number of groups decreased while the balance improved. By normalizing the number of groups and balance to the same scale, and identifying the threshold where the difference between them was minimal, we determined the optimal grouping threshold, as shown in Table 3. This threshold provides the best compromise between the number of groups and balance during the grouping process. Subsequently, we applied the optimal threshold for each dataset within the KCCEA and compared it with a default threshold of 0.5. The results of these comparative experiments are presented in Table 4.
As can be seen from the results, the algorithm performs best when using the optimal thresholds we obtained, outperforming the default threshold of 0.5 on several test problems, especially on problems with high feature dimensions.
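A sketch of this threshold-selection procedure is given below; it reuses the feature_grouping routine sketched in Section 3.1 and assumes the standard deviation of group sizes as the balance measure, which is our reading of the normalization rather than a confirmed detail.

```python
import numpy as np

def select_grouping_threshold(correlation_matrix, thresholds=np.arange(0.5, 0.96, 0.05)):
    """Pick the threshold where the normalized number of groups and the
    normalized imbalance of group sizes are closest to each other."""
    n_groups, imbalance = [], []
    for t in thresholds:
        groups = feature_grouping(correlation_matrix, t)   # Algorithm 1
        sizes = [len(g) for g in groups]
        n_groups.append(len(groups))
        imbalance.append(np.std(sizes))                    # balance measure
    def minmax(v):
        v = np.asarray(v, dtype=float)
        return (v - v.min()) / (v.max() - v.min() + 1e-12)
    gap = np.abs(minmax(n_groups) - minmax(imbalance))
    return float(thresholds[int(np.argmin(gap))])
```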

6.2. Effectiveness of Knowledge-Guided Evolution

The improved KGEA is compared with the traditional multi-objective evolutionary algorithm to verify the effectiveness of the knowledge-guided evolution method. The experimental setup is the same as above: the NSGA-II is configured according to the parameters in [16], the KGEA mutation and crossover probabilities are both set to 0.5, and the grouping threshold is the optimal threshold calculated in the previous section.
The empirical results shown in Table 5 and Table 6 indicate that the Knowledge-Guided Evolutionary Algorithm (KGEA) outperforms the traditional NSGA-II in four out of five experiments within a set number of evolutionary generations. The effectiveness of the KGEA is demonstrated through comparison with the baseline NSGA-II, which does not utilize knowledge-based feature grouping. The key differentiator is the use of correlation-based feature grouping in the KGEA, which guides the crossover and mutation processes and significantly narrows the search space. This strategy leads to more informed genetic operations and enhances the efficiency of the algorithm.
These comparisons underline the benefits of integrating domain knowledge into the evolutionary process, particularly in scenarios where the relationships among decision variables significantly impact the outcome. It supports the premise that a knowledge-guided approach in evolutionary algorithms can significantly enhance performance, especially in complex problem domains such as feature selection in high-dimensional datasets.

6.3. Effectiveness of Dynamic Resource Allocation

The dynamic resource allocation mechanism in the KCCEA significantly improves performance over the KGEA by adjusting the allocation ratio based on the success rates of the two algorithms. This adaptive approach enhances the exploration and exploitation balance, leading to better overall results. As shown in Table 5 and Table 6, the KCCEA achieves lower IGD values and higher HV values across multiple datasets, particularly in high-dimensional feature spaces. For instance, in the ISOLET5 dataset, the KCCEA achieves an IGD of 0.021 and an HV of 0.544, outperforming the KGEA.
This mechanism ensures that computational resources are allocated optimally, leveraging the strengths of both the KGEA and NSGA-II. By maintaining a balance between competition and cooperation, the KCCEA prevents the dominance of a single algorithm, resulting in a more robust and efficient evolutionary process. The dynamic resource allocation effectively adapts the search strategy based on real-time performance, improving the quality of solutions and providing a competitive edge in tackling complex feature selection tasks in high-dimensional datasets.

6.4. Knowledge-Guided Competitive–Cooperative Evolutionary Algorithm against Baseline Algorithm

The Knowledge-Guided Competitive–Cooperative Evolutionary Algorithm (KCCEA) effectively integrates the Knowledge-Guided Evolutionary Algorithm (KGEA) with the traditional NSGA-II, showcasing significant performance enhancements in complex feature selection tasks across high-dimensional datasets. This integration facilitates a robust comparative analysis that highlights the benefits of incorporating domain-specific knowledge into evolutionary strategies. As evidenced by the results presented in Table 5 and Table 6, the KCCEA surpasses the NSGA-II in four out of five experiments. The advantage is particularly notable in datasets with complex feature interactions, such as ISOLET5 and Toxic, where the knowledge-guided approach of the KCCEA significantly improves solution quality by effectively navigating the expansive search space.
This performance enhancement stems from the structured crossover and mutation processes of the KCCEA, which utilize feature correlations to maintain relevant interactions and reduce computational overhead, enabling faster convergence towards optimal solutions. Additionally, the KCCEA framework adjusts allocation ratios dynamically based on real-time performance metrics, which helps the algorithm adapt to the most promising solution paths and prevents premature convergence to suboptimal solutions. This strategic integration of knowledge within evolutionary cycles substantiates the effectiveness of the KCCEA approach, marking it as a substantial improvement over traditional methods in tackling the complexities of advanced feature selection.

6.5. Analysis of Pareto Front Distributions

To gain a deeper and more intuitive understanding of the algorithmic performance across various setups, Figure 7 and Figure 8 illustrate the distribution of Pareto fronts for algorithms exhibiting median IGD performance. We focus on two datasets, MUSK1 and Toxic, due to their distinct differences in the spread and quality of non-dominated solutions as produced by different algorithms. It is observed that the KCCEA consistently achieves a broader and more diverse set of Pareto optimal solutions compared to other methods. This indicates not only a wider exploration of the solution space but also superior convergence properties, managing to maintain lower classification errors with a reduced set of selected features.

6.6. Discussion

The experimental results demonstrate the efficacy of the proposed KCCEA for high-dimensional feature selection. The KCCEA’s superior performance, indicated by lower IGD values and higher HV values across various datasets, underscores the advantages of integrating domain knowledge and competitive–cooperative mechanisms into evolutionary algorithms. Furthermore, the design of the KCCEA considers the scalability to handle large-scale datasets. Our proposed feature correlation knowledge grouping only requires a one-time calculation using the given algorithm before the evolution starts to acquire the knowledge. This grouping method demonstrates excellent efficiency and scalability, making it easily applicable to large-scale datasets.
While the KCCEA shows significant improvements in search efficiency and solution quality, there are still some limitations in this work. For instance, noisy and imbalanced data can significantly impact the effectiveness of feature selection [41]. It would be valuable to analyze the robustness of the algorithm under different conditions, such as in the presence of noise or data imbalance. Additionally, exploring the applicability of the KCCEA to other domains, such as regression and unsupervised learning tasks, could be interesting future work. Considering the scope of this work, data complexity and imbalance will be addressed in future studies. Furthermore, traditional machine learning methods often require prior feature selection, whereas neural networks [42] might perform better with selected features. Therefore, it is valuable to evaluate the proposed method’s performance before and after feature selection using neural networks. Exploring the performance of various classifiers with the KCCEA could also be considered in future work. Despite these challenges, the KCCEA represents a significant advancement in the field of high-dimensional feature selection.

7. Conclusions

This article proposes and examines the Knowledge-Guided Competitive–Cooperative Evolutionary Algorithm (KCCEA), demonstrating its significant advancements in the field of complex feature selection within high-dimensional datasets. Our paper not only provides an analysis but also proposes the integration of the Knowledge-Guided Evolutionary Algorithm (KGEA) with the traditional NSGA-II under a competitive–cooperative framework, which enhances the evolutionary strategy and efficiently navigates the expanded search space. The theoretical contribution of this research is the novel method of embedding domain-specific knowledge within the evolutionary processes, optimizing genetic operations and significantly boosting the algorithm’s performance.
On a practical level, the KCCEA’s dynamic resource allocation strategy, which adjusts based on real-time performance metrics, ensures the algorithm continuously adapts to the most promising solution paths, thus preventing premature convergence to suboptimal solutions. This adaptability underscores the practical utility of the KCCEA in real-world applications where decision variables are complexly interrelated. Nevertheless, the approach is heavily reliant on the initial accuracy of the domain knowledge used to guide the feature grouping, which is a limitation. Future work could address this by exploring autonomous methods for knowledge acquisition and integration. Further research might also consider applying the competitive–cooperative framework to other optimization domains, potentially broadening the scope and impact of this advanced evolutionary approach.

Author Contributions

Conceptualization, J.Z. and H.Z. (Haowen Zheng); Methodology, J.Z.; Software, J.Z.; Validation, J.Z., H.Z. (Haoyang Zhang), and X.W.; Formal Analysis, J.Z.; Investigation, S.L. and Q.H.; Resources, W.G.; Data Curation, H.Z. (Haoyang Zhang); Writing—Original Draft Preparation, J.Z.; Writing—Review and Editing, X.W.; Visualization, J.Z.; Supervision, X.W.; Project Administration, X.W.; Funding Acquisition, W.G. and X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Training Program of Innovation and Entrepreneurship for Undergraduates, Project 241221.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Qiu, J.; Wu, Q.; Ding, G.; Xu, Y.; Feng, S. A survey of machine learning for big data processing. EURASIP J. Adv. Signal Process. 2016, 2016, 67.
2. Feng, S.; Zhao, L.; Shi, H.; Wang, M.; Shen, S.; Wang, W. One-dimensional VGGNet for high-dimensional data. Appl. Soft Comput. 2023, 135, 110035.
3. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
4. Khan, M.A.; Alqahtani, A.; Khan, A.; Alsubai, S.; Binbusayyis, A.; Ch, M.M.I.; Yong, H.S.; Cha, J. Cucumber leaf diseases recognition using multi level deep entropy-ELM feature selection. Appl. Sci. 2022, 12, 593.
5. Karlupia, N.; Abrol, P. Wrapper-based optimized feature selection using nature-inspired algorithms. Neural Comput. Appl. 2023, 35, 12675–12689.
6. Khan, W.A.; Chung, S.H.; Awan, M.U.; Wen, X. Machine learning facilitated business intelligence (Part I) Neural networks learning algorithms and applications. Ind. Manag. Data Syst. 2020, 120, 164–195.
7. Ahadzadeh, B.; Abdar, M.; Safara, F.; Khosravi, A.; Menhaj, M.B.; Suganthan, P.N. SFE: A simple, fast and efficient feature selection algorithm for high-dimensional data. IEEE Trans. Evol. Comput. 2023, 27, 1896–1911.
8. Li, J.; Cheng, K.; Wang, S.; Morstatter, F.; Trevino, R.P.; Tang, J.; Liu, H. Feature selection: A data perspective. ACM Comput. Surv. (CSUR) 2017, 50, 1–45.
9. Jiao, R.; Nguyen, B.H.; Xue, B.; Zhang, M. A Survey on Evolutionary Multiobjective Feature Selection in Classification: Approaches, Applications, and Challenges. IEEE Trans. Evol. Comput. 2023, 1.
10. Pan, H.; Chen, S.; Xiong, H. A high-dimensional feature selection method based on modified Gray Wolf Optimization. Appl. Soft Comput. 2023, 135, 110031.
11. Alabrah, A. A novel study: GAN-based minority class balancing and machine-learning-based network intruder detection using chi-square feature selection. Appl. Sci. 2022, 12, 11662.
12. Patil, A.R.; Kim, S. Combination of ensembles of regularized regression models with resampling-based lasso feature selection in high dimensional data. Mathematics 2020, 8, 110.
13. Jeon, H.; Oh, S. Hybrid-recursive feature elimination for efficient feature selection. Appl. Sci. 2020, 10, 3211.
14. Zhou, A.; Qu, B.Y.; Li, H.; Zhao, S.Z.; Suganthan, P.N.; Zhang, Q. Multiobjective evolutionary algorithms: A survey of the state of the art. Swarm Evol. Comput. 2011, 1, 32–49.
15. Almutairi, M.S. Evolutionary Multi-Objective Feature Selection Algorithms on Multiple Smart Sustainable Community Indicator Datasets. Sustainability 2024, 16, 1511.
16. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
17. Xu, Q.; Xu, Z.; Ma, T. A survey of multiobjective evolutionary algorithms based on decomposition: Variants, challenges and future directions. IEEE Access 2020, 8, 41588–41614.
18. Li, L.; Xuan, M.; Lin, Q.; Jiang, M.; Ming, Z.; Tan, K.C. An evolutionary multitasking algorithm with multiple filtering for high-dimensional feature selection. IEEE Trans. Evol. Comput. 2023, 27, 802–816.
19. Deb, K. Multi-Objective Optimization Using Evolutionary Algorithms; John Wiley & Sons: Hoboken, NJ, USA, 2001; Volume 16.
20. Goh, C.K.; Tan, K.C.; Liu, D.; Chiam, S.C. A competitive and cooperative co-evolutionary approach to multi-objective particle swarm optimization algorithm design. Eur. J. Oper. Res. 2010, 202, 42–54.
21. Zhou, X.; Cai, X.; Zhang, H.; Zhang, Z.; Jin, T.; Chen, H.; Deng, W. Multi-strategy competitive-cooperative co-evolutionary algorithm and its application. Inf. Sci. 2023, 635, 328–344.
22. Xie, D.; Ding, L.; Hu, Y.; Wang, S.; Xie, C.; Jiang, L. A Multi-Algorithm Balancing Convergence and Diversity for Multi-Objective Optimization. J. Inf. Sci. Eng. 2013, 29, 811–834.
23. Xiang, Y.; Lu, X.; Cai, D.; Chen, J.; Bao, C. Multi-algorithm fusion–based intelligent decision-making method for robotic belt grinding process parameters. Int. J. Adv. Manuf. Technol. 2024, 1–16.
24. Zhan, Z.H.; Li, J.; Cao, J.; Zhang, J.; Chung, H.S.H.; Shi, Y.H. Multiple populations for multiple objectives: A coevolutionary technique for solving multiobjective optimization problems. IEEE Trans. Cybern. 2013, 43, 445–463.
25. Zou, J.; Sun, R.; Liu, Y.; Hu, Y.; Yang, S.; Zheng, J.; Li, K. A multi-population evolutionary algorithm using new cooperative mechanism for solving multi-objective problems with multi-constraint. IEEE Trans. Evol. Comput. 2023, 28, 267–280.
26. Katoch, S.; Chauhan, S.S.; Kumar, V. A review on genetic algorithm: Past, present, and future. Multimed. Tools Appl. 2021, 80, 8091–8126.
27. Gunantara, N. A review of multi-objective optimization: Methods and its applications. Cogent Eng. 2018, 5, 1502242.
28. Wang, Z.; Pei, Y.; Li, J. A survey on search strategy of evolutionary multi-objective optimization algorithms. Appl. Sci. 2023, 13, 4643.
29. Srinivas, N.; Deb, K. Muiltiobjective optimization using nondominated sorting in genetic algorithms. Evol. Comput. 1994, 2, 221–248.
30. Dutta, S.; Das, K.N. A survey on pareto-based EAs to solve multi-objective optimization problems. In Soft Computing for Problem Solving: SocProS 2017, Volume 2; Springer: Singapore, 2019; pp. 807–820.
31. Espinosa, R.; Jiménez, F.; Palma, J. Surrogate-assisted and filter-based multiobjective evolutionary feature selection for deep learning. IEEE Trans. Neural Netw. Learn. Syst. 2023.
32. Xu, H.; Xue, B.; Zhang, M. A duplication analysis-based evolutionary algorithm for biobjective feature selection. IEEE Trans. Evol. Comput. 2020, 25, 205–218.
33. Strehl, A. Relationship-Based Clustering and Cluster Ensembles for High-Dimensional Data Mining; The University of Texas at Austin: Austin, TX, USA, 2002.
34. Yang, J.M.; Kao, C.Y. A combined evolutionary algorithm for real parameters optimization. In Proceedings of the IEEE International Conference on Evolutionary Computation, Nagoya, Japan, 20–22 May 1996; IEEE: New York, NY, USA, 1996; pp. 732–737.
35. Feng, Z.K.; Niu, W.J.; Liu, S. Cooperation search algorithm: A novel metaheuristic evolutionary intelligence algorithm for numerical optimization and engineering optimization problems. Appl. Soft Comput. 2021, 98, 106734.
36. Dua, D.; Graff, C. UCI Machine Learning Repository. 2017. Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 21 May 2024).
37. While, L.; Hingston, P.; Barone, L.; Huband, S. A faster algorithm for calculating hypervolume. IEEE Trans. Evol. Comput. 2006, 10, 29–38.
38. Zitzler, E.; Thiele, L.; Laumanns, M.; Fonseca, C.M.; Da Fonseca, V.G. Performance assessment of multiobjective optimizers: An analysis and review. IEEE Trans. Evol. Comput. 2003, 7, 117–132.
39. Deb, K.; Thiele, L.; Laumanns, M.; Zitzler, E. Scalable test problems for evolutionary multiobjective optimization. In Evolutionary Multiobjective Optimization: Theoretical Advances and Applications; Springer: London, UK, 2005; pp. 105–145.
40. Huband, S.; Barone, L.; While, L.; Hingston, P. A scalable multi-objective test problem toolkit. In Proceedings of the Evolutionary Multi-Criterion Optimization: Third International Conference, EMO 2005, Guanajuato, Mexico, 9–11 March 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 280–295.
41. Khan, W.A. Balanced weighted extreme learning machine for imbalance learning of credit default risk and manufacturing productivity. Ann. Oper. Res. 2023, 1–29.
42. Khan, W.A.; Masoud, M.; Eltoukhy, A.E.; Ullah, M. Stacked encoded cascade error feedback deep extreme learning machine network for manufacturing order completion time. J. Intell. Manuf. 2024, 1–27.
Figure 1. Binary encoding for feature selection.
Figure 2. An example of knowledge-guided crossover and mutation.
Figure 3. Framework of the competition–cooperation evolutionary mechanism.
Figure 4. Resource allocation mechanism based on success rate.
Figure 5. The calculation method of IGD.
Figure 6. Grouping performance by threshold.
Figure 7. Pareto fronts obtained by each algorithm in the objective space: MUSK1.
Figure 8. Pareto fronts obtained by each algorithm in the objective space: Toxic.
Table 1. Special case handling.

Special Case | Allocation_Rate
allocation_rate > 0.9 | 0.9
allocation_rate < 0.1 | 0.1
success_rate1 = 0 AND success_rate2 ≠ 0 | 0.1
success_rate1 ≠ 0 AND success_rate2 = 0 | 0.9
success_rate1 = 0 AND success_rate2 = 0 | 0.5
Table 2. Classification datasets.

No. | Dataset Name | Instances | Features | Classes
1 | Wine | 178 | 13 | 3
2 | MUSK1 | 476 | 166 | 2
3 | LSVT_Voice | 126 | 310 | 2
4 | ISOLET5 | 1559 | 617 | 26
5 | Toxic | 171 | 1203 | 2
Table 3. The optimal threshold for each dataset.

Dataset | Optimal Threshold | Features | Groups
1 | 0.65 | 13 | 11
2 | 0.76 | 166 | 71
3 | 0.77 | 310 | 96
4 | 0.72 | 617 | 260
5 | 0.75 | 1203 | 350
Table 4. Mean IGD performance on test data. The best performance is highlighted in gray.

Dataset | KCCEA (Optimal) Mean | KCCEA (Optimal) Variance | KCCEA (0.5) Mean | KCCEA (0.5) Variance
1 | 2.55 × 10⁻² | 2.63 × 10⁻⁴ | 2.74 × 10⁻² | 2.91 × 10⁻⁴
2 | 2.55 × 10⁻² | 1.11 × 10⁻⁵ | 2.01 × 10⁻² | 1.93 × 10⁻⁵
3 | 2.54 × 10⁻² | 3.55 × 10⁻⁵ | 2.23 × 10⁻² | 1.39 × 10⁻⁵
4 | 2.10 × 10⁻² | 4.12 × 10⁻⁴ | 1.87 × 10⁻² | 5.13 × 10⁻⁴
5 | 3.63 × 10⁻² | 1.05 × 10⁻⁴ | 3.29 × 10⁻² | 2.70 × 10⁻⁴
Table 5. Mean IGD performance on test data. The best performance is highlighted in gray.

Dataset | NSGA-II Mean | NSGA-II Variance | KGEA Mean | KGEA Variance | KCCEA Mean | KCCEA Variance
1 | 2.05 × 10⁻² | 1.43 × 10⁻⁴ | 2.85 × 10⁻² | 2.71 × 10⁻⁴ | 2.55 × 10⁻² | 2.63 × 10⁻⁴
2 | 1.66 × 10⁻¹ | 1.09 × 10⁻⁴ | 5.60 × 10⁻² | 1.93 × 10⁻⁴ | 2.55 × 10⁻² | 1.11 × 10⁻⁵
3 | 2.55 × 10⁻¹ | 1.19 × 10⁻⁵ | 6.73 × 10⁻² | 8.62 × 10⁻⁵ | 2.54 × 10⁻² | 3.55 × 10⁻⁵
4 | 1.03 × 10⁻¹ | 2.70 × 10⁻⁵ | 3.08 × 10⁻² | 3.20 × 10⁻⁴ | 2.10 × 10⁻² | 4.12 × 10⁻⁴
5 | 8.63 × 10⁻² | 1.05 × 10⁻⁵ | 4.60 × 10⁻² | 4.94 × 10⁻⁴ | 3.63 × 10⁻² | 1.05 × 10⁻⁴
Table 6. Mean HV performance on test data. The best performance is highlighted in gray.

Dataset | NSGA-II Mean | NSGA-II Variance | KGEA Mean | KGEA Variance | KCCEA Mean | KCCEA Variance
1 | 8.98 × 10⁻¹ | 3.36 × 10⁻⁵ | 8.95 × 10⁻¹ | 1.80 × 10⁻⁴ | 8.94 × 10⁻¹ | 2.77 × 10⁻⁴
2 | 1.66 × 10⁻¹ | 1.09 × 10⁻⁴ | 4.67 × 10⁻³ | 1.93 × 10⁻⁴ | 8.42 × 10⁻¹ | 1.32 × 10⁻⁴
3 | 6.24 × 10⁻¹ | 5.71 × 10⁻⁵ | 8.34 × 10⁻¹ | 1.29 × 10⁻⁴ | 8.93 × 10⁻¹ | 3.55 × 10⁻⁵
4 | 2.41 × 10⁻¹ | 6.70 × 10⁻⁴ | 5.16 × 10⁻¹ | 3.20 × 10⁻⁴ | 5.44 × 10⁻¹ | 3.77 × 10⁻⁴
5 | 3.36 × 10⁻¹ | 1.57 × 10⁻⁴ | 3.84 × 10⁻¹ | 1.13 × 10⁻⁴ | 3.96 × 10⁻¹ | 2.23 × 10⁻⁴
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
