Article

A Feature Construction Method That Combines Particle Swarm Optimization and Grammatical Evolution

by Ioannis G. Tsoulos * and Alexandros Tzallas
Department of Informatics and Telecommunications, University of Ioannina, 451 10 Ioannina, Greece
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(14), 8124; https://doi.org/10.3390/app13148124
Submission received: 9 May 2023 / Revised: 24 June 2023 / Accepted: 11 July 2023 / Published: 12 July 2023

Abstract

The problem of data classification or data fitting is widely applicable in a multitude of scientific areas, and for this reason, a number of machine learning models have been developed. However, in many cases, these models present problems of overfitting and cannot generalize satisfactorily to unknown data. Furthermore, in many cases, many of the features of the input data do not contribute to learning, or there may even be hidden correlations between the features of the dataset. The purpose of the proposed method is to significantly reduce data classification or regression errors through the usage of a technique that utilizes the particle swarm optimization method and grammatical evolution. This method is divided into two phases. In the first phase, artificial features are constructed using grammatical evolution, and the progress of the creation of these features is controlled by the particle swarm optimization method. In addition, this new technique utilizes penalty factors to limit the generated features to a range of values to make training machine learning models more efficient. In the second phase of the proposed technique, these features are exploited to transform the original dataset, and then any machine learning method can be applied to this dataset. The performance of the proposed method was measured on some benchmark datasets from the relevant literature. Also, the method was tested against a series of widely used machine learning models. The experiments performed showed a significant improvement of 30% on average in the classification datasets and an even greater improvement of 60% in the data fitting datasets.

1. Introduction

A multitude of everyday problems from various sciences can be treated as classification or data fitting problems, such as problems that appear in the fields of physics [1,2,3,4], chemistry [5,6,7], economics [8,9], environmental problems [10,11,12], and medical problems [13,14]. In the relevant literature, there is a wide range of techniques that one can use to handle such problems, such as the k nearest neighbors model (k-NN) [15,16], artificial neural networks (ANNs) [17,18], radial basis function (RBF) networks [19,20], support vector machines (SVM) [21,22], and decision trees [23,24]. Also, many practical problems have been tackled using machine learning approaches, such as prediction of non-breaking waves [25], energy conservation problems [26], and the prediction of scour depth at seawalls using genetic programming and neural networks [27]. Furthermore, machine learning models have been used in various complex tasks such as neural machine translation [28], oil distribution [29], image processing [30], robotics [31], and hydrocarbon production [32]. A brief description of the methods that can be used for classification datasets is given in the publication of Kotsiantis et al. [33].
In the majority of cases, machine learning models have a number of parameters that should be determined through some algorithms. These parameters include the weights of artificial neural networks, which can be estimated with techniques such as the backpropagation method [34,35] or genetic algorithms [36,37,38], as well as the hyperparameters of learning models, which require different approaches [39,40,41]. However, most of the time, there are some problems in the parameterization of machine learning models:
  • A long training time is required, which is proportional to the dimension of the input data. For example, in a neural network with 1 hidden layer equipped with 10 processing nodes and a provided dataset with 10 inputs, more than $N = 100$ parameters are required to build the neural network. Therefore, the size of the network will grow proportionally to the problem, and longer training times will be required for the model.
  • Another important problem presented in machine learning techniques is the fact that many models require significant storage space in the computer’s memory for their parameters, and in fact, this space increases significantly with the increase in the dimension of the objective problem. For example, in the BFGS [42] optimization method, $O\left(N^2\right)$ storage space will be required for the training model and for the partial derivatives required by the optimization method. This issue was thoroughly discussed in a paper by Verleysen et al. [43]. Some common approaches proposed in order to reduce the dimension of the input datasets are the principal component analysis (PCA) method [44,45,46] as well as the minimum redundancy feature selection (MRMR) technique [47,48]. Moreover, Pourzangbar proposed a feature selection method [49] based on genetic programming for the determination of the most effective parameters for scour depth at seawalls. The proposed technique, in addition to creating artificial features, also essentially selects features at the same time, since it can remove from the final features those that will not bring significant benefits to the learning of the objective problem. Furthermore, Wang et al. proposed an auto-encoder reduction method, which was applied on a series of large datasets [50].
  • Another interesting problem, which has been tackled by dozens of researchers in the last few decades, is that of overfitting of neural networks or machine learning models in general. In this problem, although the machine learning model has achieved a satisfactory level of training, this is not reflected in its performance on unknown patterns (the test set) that were not present during training. The paper by Geman et al. [51] as well as the article by Hawkins [52] thoroughly discussed the topic of overfitting. Examples of techniques proposed to tackle this problem are the weight sharing methods [53,54], methods that reduce the number of parameters of the model (pruning methods) [55,56], weight elimination [57,58,59], weight decaying methods [60,61], dropout methods [62,63], the SARPROP method [64], and positive correlation methods [65]. Recently, a variety of papers have proposed methods to handle the overfitting problem in various cases, such as the usage of genetic algorithms for training data selection in RBF networks [66], the evolution of RBF models using genetic algorithms for rainfall prediction [67], and pruning decision trees using genetic algorithms [68,69].
This paper recommends a two-phase method for data classification or regression problems. In the first phase, a global optimization method directs the production of artificial features from the existing ones with the help of grammatical evolution [70]. Grammatical evolution is a variation of genetic programming where the chromosomes are production rules of the target BNF grammar, and it has been used successfully in a variety of applications, such as music composition [71], economics [72], symbolic regression [73], robotics [74], and caching algorithms [75]. The global optimization method used in this work is the particle swarm optimization (PSO) method [76,77,78]. The PSO method was selected as the optimization method due to its simplicity and the small number of parameters that should be set. Also, the PSO method has been used in many difficult problems in all areas of the sciences, such as problems that arise in physics [79,80], chemistry [81,82], medicine [83,84], and economics [85]. Furthermore, the PSO method was successfully applied recently in many practical problems such as flow shop scheduling [86], the successful development of electric vehicle charging strategies [87], emotion recognition [88], robotics [89], the optimal design of a brace-viscous damper and pendulum tuned mass damper [90], application to high-dimensional expensive industrial problems [91], and RFID readers [92]. The generated artificial features are nonlinear combinations of the original ones, and any machine learning model can be used to effectively estimate their dynamics. In the present implementation, the RBF network was used since it is a widely tested machine learning model, but also because its training is much faster compared with other models. In the second phase, the best features obtained from the first phase are also used to modify the test set of the objective problem, and a machine learning method can be used to estimate the error on the test set.
The idea of creating artificial features using grammatical evolution was first introduced in the paper by Gavrilis et al. [93], and it has been successfully applied on a series of problems, such as spam identification [94], fetal heart classification [95], epileptic oscillations [96], the construction of COVID-19 predictive models [97], and performance and early drop prediction for higher education students [98].
Feature selection using neural networks has also been proposed in a series of papers, such as the work of Verikas and Bacauskiene [99] or the work of Kabir et al. [100]. Moreover, Devi utilized a simulated annealing approach [101] to select the most important features for classification datasets. Also, Neshatian et al. [102] developed a genetic programming approach that constructs features using an entropy-based fitness function.
The rest of this article is organized as follows. In Section 2, the steps of the proposed method are fully described. In Section 3, the experimental datasets used as well as the results obtained by the incorporation of the proposed method are outlined. Finally, in Section 4, some conclusions are listed.

2. The Proposed Method

This section will introduce the main parts of the proposed two-step method. The first subsection will introduce the basics of grammatical evolution and give a complete example of building a valid function from a chromosome. Next, the process by which the grammatical evolution chromosomes can be used to create artificial features from existing ones will be presented in Section 2.2. The procedure by which the fitness of each chromosome can be assessed is presented in Section 2.3. Finally, in Section 2.4, the overall algorithm is presented along with a flowchart for its graphical representation.

2.1. The Technique of Grammatical Evolution

The process of grammatical evolution uses chromosomes that represent the production rules of the underlying Backus–Naur form (BNF) grammar [103] of the objective problem. BNF grammars have been widely used to describe the syntax of programming languages. Any BNF grammar is a set $G = (N, T, S, P)$ where the following are true:
  • The set N represents the non-terminal symbols of the grammar. Any non-terminal symbol is analyzed to a series of terminal symbols using the production rules of the grammar.
  • T is the set of terminal symbols.
  • The non-terminal symbol S represents the start symbol of the grammar.
  • The set P contains the production rules of the grammar. Typically, any production rule is expressed in the form $A \rightarrow a$ or $A \rightarrow aB$, with $A, B \in N$ and $a \in T$.
The process that creates a valid program starts from the symbol S and gradually replaces non-terminal symbols with the right-hand side of the selected production rule from the provided chromosome. The rule is selected with the following steps:
  • In the first step, the next element is taken from the current chromosome. Let us denote this as V.
  • The next production rule is selected through
    $\text{Rule} = V \bmod N_R$
    where $N_R$ is the total number of production rules for the current non-terminal symbol.
The BNF grammar for the proposed method is shown in Figure 1. Symbols in < > brackets denote non-terminal symbols that belong to set N. In every line of the grammar, a production rule is shown for every non-terminal symbol. The numbers in parentheses represent the sequence number of the production rule for the corresponding non-terminal symbol. For example, the non-terminal symbol <op> has four production rules, with each leading to a terminating arithmetic operation symbol. The constant N is the dimension of the input dataset.
An example that produces a valid expression for the chromosome
$x = \left[ 10, 12, 20, 8, 11 \right]$
with $N = 3$ is shown in Table 1. This chromosome represents a series of sequential numbers of production rules from the above grammar. The grammatical evolution method takes the elements of the chromosome one by one and finds the corresponding production rule by using the remainder of the division of each element by the number of production rules of the current non-terminal symbol. The final expression created is $f(x) = \sin\left(x_3\right)$.
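To make the decoding mechanism concrete, the following C++ sketch maps an integer chromosome to an expression string using the modulo rule given above. The grammar in the comments is a simplified stand-in for the grammar of Figure 1 (only a handful of production rules are included), so the sketch illustrates the mechanism rather than reproducing the exact implementation; with this simplified grammar, the example chromosome of Table 1 again decodes to sin(x3).
```cpp
// Minimal sketch of grammatical evolution decoding (illustrative, not the authors' code).
// Simplified stand-in grammar, production rules numbered per non-terminal:
//   <expr>     ::= (<expr> <op> <expr>) (0) | <func>(<expr>) (1) | <terminal> (2)
//   <op>       ::= +   (0) | -   (1) | *   (2) | /   (3)
//   <func>     ::= sin (0) | cos (1) | exp (2) | log (3)
//   <terminal> ::= x1  (0) | x2  (1) | ... | xN  (N-1)
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

static std::vector<int> chromosome;   // the chromosome currently being decoded
static std::size_t pos = 0;           // next chromosome element to consume
static int N = 3;                     // dimension of the input dataset

int nextGene() { int v = chromosome[pos % chromosome.size()]; ++pos; return v; }

std::string op()   { const char *t[] = { "+", "-", "*", "/" };         return t[nextGene() % 4]; }
std::string func() { const char *t[] = { "sin", "cos", "exp", "log" }; return t[nextGene() % 4]; }
std::string term() { return "x" + std::to_string(nextGene() % N + 1); }

std::string expr()
{
    switch (nextGene() % 3) {                 // Rule = V mod N_R, with N_R = 3 for <expr>
        case 0: { std::string a = expr(), o = op(), b = expr(); return "(" + a + o + b + ")"; }
        case 1: { std::string f = func(), e = expr();           return f + "(" + e + ")"; }
        default:  return term();
    }
}

int main()
{
    chromosome = { 10, 12, 20, 8, 11 };       // the example chromosome of Table 1
    std::cout << expr() << std::endl;         // prints sin(x3) for this simplified grammar
    return 0;
}
```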

2.2. Feature Construction

In the proposed technique, the chromosomes of grammatical evolution are used as a set of functions that create artificial features as nonlinear combinations of the existing ones. This process can also be considered a feature selection method, since it is possible that only a part of the original features can be used in the generated features. The proposed method creates $N_f$ artificial features from the original ones, and the process for any chromosome p is as follows:
  • Divide p into $N_f$ parts. Every part is denoted as the sub-particle $p_i$.
  • For each sub-particle $p_i$, a new artificial feature $g_i\left(x, p_i\right)$ is constructed with the grammar of Figure 1 as a nonlinear combination of the original set of features $x$.
The final set of features will be considered mapping functions of the original ones. For example, the set
$g(x, p) = \begin{cases} g_1\left(x, p_1\right) = x_1^2 + 2 x_3 \\ g_2\left(x, p_2\right) = 3\cos\left(x_2\right) \end{cases}$
is a set of mapping functions for the original features $x = \left(x_1, x_2, x_3\right)$. However, sometimes the generated features can lead to extreme values, and this will result in generalization problems for the machine learning models used. For this reason, in the present work, penalty factors are used so that the mapping functions do not lead to extreme values. These penalty factors also modify the fitness function that the particle swarm optimization technique will minimize each time, and they are considered next.
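As a small illustration of the splitting step, the C++ sketch below divides a particle into $N_f$ equal parts and decodes each part into one feature expression. The function decodeFeature() is a hypothetical placeholder for the grammatical evolution decoder of Section 2.1, not part of the actual implementation, so the printed expressions are dummies.
```cpp
// Sketch: divide a particle into N_f sub-particles and decode each one into an
// artificial feature. decodeFeature() is a hypothetical stand-in for the
// grammatical evolution mapping of Section 2.1.
#include <iostream>
#include <string>
#include <vector>

// Placeholder decoder: a real implementation returns the expression g_i(x, p_i).
std::string decodeFeature(const std::vector<int> &subParticle, int N)
{
    (void)subParticle; (void)N;
    return "sin(x3)";
}

std::vector<std::string> constructFeatures(const std::vector<int> &p, int Nf, int N)
{
    std::vector<std::string> features;
    int len = static_cast<int>(p.size()) / Nf;                 // length of every sub-particle
    for (int i = 0; i < Nf; ++i) {
        std::vector<int> pi(p.begin() + i * len, p.begin() + (i + 1) * len);
        features.push_back(decodeFeature(pi, N));              // feature g_i(x, p_i)
    }
    return features;                                           // N_f mapping functions
}

int main()
{
    std::vector<int> particle = { 10, 12, 20, 8, 11, 9, 4, 17, 2, 6 };
    for (const std::string &g : constructFeatures(particle, 2, 3))
        std::cout << g << std::endl;                           // two artificial features
    return 0;
}
```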

2.3. Fitness Calculation

Each chromosome in grammatical evolution produces a series of artificial features, which are nonlinear functions of the existing features. However, an evaluation and a distinction should be made between those sets of features that will contribute more to the learning process and those that will contribute less. This is accomplished by assessing the appropriateness of these features. In order to compute the fitness of each group of features, the original training set should be reduced using the artificial features that have been produced, and the following steps should be executed for any given chromosome p (a short code sketch is given after the list):
  • Denote as $\text{TO} = \left\{ \left(x_1, y_1\right), \left(x_2, y_2\right), \ldots, \left(x_M, y_M\right) \right\}$ the original training set.
  • Set $V = 0$ for the penalty factor.
  • Compute the mapping function $g(x, p)$ as suggested in Section 2.2.
  • Set $\text{TF} = \emptyset$ for the modified training set.
  • For $i = 1, \ldots, M$, carry out the following steps.
    (a)
      Set $\tilde{x}_i = g\left(x_i, p\right)$.
    (b)
      Set $\text{TF} = \text{TF} \cup \left\{ \left(\tilde{x}_i, y_i\right) \right\}$.
    (c)
      If $\left| \tilde{x}_i \right| > L_{\max}$, then $V = V + 1$, where $L_{\max}$ is a predefined positive value.
  • End For.
  • Train an RBF network $C(x)$ with $H$ processing nodes on TF and obtain the following training error:
    $f_p = \sum_{j=1}^{M} \left( C\left(\tilde{x}_j\right) - y_j \right)^2$
  • Compute the final fitness value:
    $f_p = f_p \times \left( 1 + \lambda V^2 \right)$
    where $\lambda > 0$.
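A compact sketch of this fitness computation is given below. Both the mapping and the learning model are illustrative stand-ins: the mapping reuses the example of Section 2.2, and a constant mean predictor replaces the RBF network $C(x)$ only to keep the sketch self-contained. The penalty is counted once per pattern whose mapped values leave the interval $[-L_{\max}, L_{\max}]$, which is one reading of the condition above.
```cpp
// Sketch of the fitness calculation of Section 2.3 (illustrative stand-ins only).
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Example mapping g(x, p): three original features -> two artificial ones.
std::vector<double> mapPattern(const std::vector<double> &x)
{
    return { x[0] * x[0] + 2.0 * x[2], 3.0 * std::cos(x[1]) };
}

// Stand-in for training the RBF network C(x): the sum of squared errors of a
// constant mean predictor is returned instead of the real training error.
double trainError(const std::vector<std::vector<double>> &X, const std::vector<double> &y)
{
    double mean = 0.0;
    for (double v : y) mean += v;
    mean /= y.size();
    double sum = 0.0;
    for (std::size_t j = 0; j < X.size(); ++j) sum += (mean - y[j]) * (mean - y[j]);
    return sum;
}

double fitness(const std::vector<std::vector<double>> &X, const std::vector<double> &y,
               double Lmax, double lambda)
{
    std::vector<std::vector<double>> XF;             // modified training set TF
    int V = 0;                                       // penalty counter
    for (const std::vector<double> &x : X) {
        std::vector<double> xt = mapPattern(x);      // x~_i = g(x_i, p)
        for (double v : xt)
            if (std::fabs(v) > Lmax) { ++V; break; } // one penalty per out-of-range pattern
        XF.push_back(xt);
    }
    double fp = trainError(XF, y);                   // f_p = sum_j (C(x~_j) - y_j)^2
    return fp * (1.0 + lambda * V * V);              // f_p * (1 + lambda * V^2)
}

int main()
{
    std::vector<std::vector<double>> X = { { 1.0, 0.5, -2.0 }, { 0.3, 1.2, 4.0 }, { 2.0, -0.4, 0.1 } };
    std::vector<double> y = { 0.0, 1.0, 1.0 };
    std::cout << "fitness = " << fitness(X, y, 10.0, 0.01) << std::endl;
    return 0;
}
```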

2.4. The Used PSO Method

The main steps of this algorithm are outlined in detail in Algorithm 1.
Algorithm 1 The base PSO algorithm executed in one processing unit.
  • Initialization Step.
    (a)
    Set  iter = 0 .
    (b)
    Set m as the total number of particles.
    (c)
    Set $\text{iter}_{\max}$ as the maximum number of iterations allowed.
    (d)
    Initialize randomly the positions $p_1, p_2, \ldots, p_m$ for the particles. For the grammatical evolution, every particle is a vector of randomly selected integers.
    (e)
    Initialize randomly the velocities $u_1, u_2, \ldots, u_m$. For the current work, every vector of velocities is a series of randomly selected integers in the range $\left[ u_{\min}, u_{\max} \right]$. In the current work, $u_{\min} = -5$ and $u_{\max} = 5$.
    (f)
    For $i = 1..m$, set $b_i = p_i$. The vector $b_i$ denotes the best located position of particle $p_i$.
    (g)
    Set $p_{\text{best}} = \arg\min_{i \in 1..m} f\left(p_i\right)$.
  • Termination Check Step. If $\text{iter} \geq \text{iter}_{\max}$, then go to the Test step.
  • For $i = 1..m$, do the following:
    (a)
    Compute the velocity $u_i$ as a combination of the vectors $u_i$, $p_i$, $b_i$, and $p_{\text{best}}$.
    (b)
    Set the new position for the particle to $p_i = p_i + u_i$.
    (c)
    Calculate the fitness $f\left(p_i\right)$ for particle $p_i$ using the procedure described in Section 2.3.
    (d)
    If $f\left(p_i\right) \leq f\left(b_i\right)$, then $b_i = p_i$.
  • End For.
  • Set $p_{\text{best}} = \arg\min_{i \in 1..m} f\left(p_i\right)$.
  • Set  iter = iter + 1 .
  • Goto step 2.
  • Test step. Apply the mapping function of the best particle $p_{\text{best}}$ to the test set of the problem, and apply a machine learning model to obtain the corresponding test error.
The above algorithm calculates at every iteration the new position of particle $i$ using
$p_i = p_i + u_i$
In most cases, the new velocity is a linear combination of the previously computed velocity and the corresponding vectors for the best values $b_i$ and $p_{\text{best}}$, and it can be defined as follows:
$u_i = \omega u_i + r_1 c_1 \left( b_i - p_i \right) + r_2 c_2 \left( p_{\text{best}} - p_i \right)$
where the following are true:
  • The variables $r_1, r_2$ are random numbers defined in $[0, 1]$.
  • The constants $c_1, c_2$ are defined in the range $[1, 2]$.
  • The variable $\omega$, commonly called the inertia, was suggested by Shi and Eberhart [76]. In the original paper, they proposed the idea that large values for the inertia coefficient can lead to a better exploration of the search space, while smaller values of the coefficient lead to the method being concentrated around regions likely to contain the global minimum. Hence, in their work, the value of the inertia factor generally started with large values and decreased as the iterations progressed. In the current work, the inertia value was computed through the following equation:
$\omega = 0.5 + \frac{r}{2}$
The variable r is a random number with $r \in [0, 1]$. This inertia calculation was proposed in [104]. With this calculation of the inertia variable, an even better exploration of the search space is achieved thanks to the randomness it introduces, something that was also found in the publication of Charilogis and Tsoulos [105].
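A minimal C++ sketch of one velocity and position update with this random inertia follows. Keeping real-valued velocities, clamping them to $\left[u_{\min}, u_{\max}\right]$, and rounding them when moving the integer particles are assumptions of this sketch rather than details stated in the text.
```cpp
// Sketch of one PSO update step with the random inertia omega = 0.5 + r/2.
// Particles are integer vectors (as required by grammatical evolution).
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

void psoUpdate(std::vector<int> &p, std::vector<double> &u,
               const std::vector<int> &b, const std::vector<int> &pbest,
               double c1, double c2, std::mt19937 &gen)
{
    std::uniform_real_distribution<double> U(0.0, 1.0);
    const double umin = -5.0, umax = 5.0;
    const double omega = 0.5 + U(gen) / 2.0;                 // random inertia in [0.5, 1.0]
    const double r1 = U(gen), r2 = U(gen);                    // r1, r2 in [0, 1]
    for (std::size_t j = 0; j < p.size(); ++j) {
        u[j] = omega * u[j]
             + r1 * c1 * (b[j] - p[j])                        // pull toward the particle's best
             + r2 * c2 * (pbest[j] - p[j]);                   // pull toward the swarm's best
        u[j] = std::max(umin, std::min(umax, u[j]));          // keep the velocity in range
        p[j] += static_cast<int>(std::lround(u[j]));          // new integer position p_i = p_i + u_i
    }
}

int main()
{
    std::mt19937 gen(42);
    std::vector<int> p = { 10, 12, 20, 8, 11 }, b = p, pbest = { 9, 3, 20, 8, 2 };
    std::vector<double> u = { 1.0, -2.0, 0.0, 3.0, -1.0 };
    psoUpdate(p, u, b, pbest, 1.5, 1.5, gen);                 // c1, c2 chosen in [1, 2]
    return 0;
}
```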
A flowchart for the overall process is shown in Figure 2.

3. Experiments

The ability of the proposed technique to produce effective artificial features for class prediction and feature learning will be measured in this section on some datasets from the relevant literature. These problems have been studied by various researchers and cover a wide range of research areas from physics to economics. The datasets were obtained from publicly available dataset repositories, such as the KEEL repository [106].
The proposed technique will be compared with a series of known machine learning techniques, and the experimental results are then presented in the relevant tables.

3.1. Experimental Datasets

The classification problems used in the experiments are the following:
  • Appendicitis, a medical dataset [107,108];
  • The Australian dataset [109], a dataset concerning economic transactions in banks;
  • The Balance dataset, a dataset generated to model psychological experimental results [110];
  • The Bands dataset, a dataset used in rotogravure printing [111];
  • The Dermatology dataset [112], a medical dataset used to detect a type of erythemato-squamous disease;
  • The Hayes Roth dataset [113];
  • The Heart dataset [114], a medical dataset used to detect heart diseases;
  • The House Votes dataset [115], a dataset related to the congressional voting records of the USA;
  • The Ionosphere dataset, used to classify measurements from the ionosphere, which has been examined in a variety of research papers [116,117];
  • The Liver disorder dataset [118,119], a dataset used for medical purposes;
  • The Mammography dataset [120], a medical dataset used for breast cancer diagnosis;
  • The Parkinson’s dataset [121,122], a dataset used to detect Parkinson’s disease using voice measurements;
  • The Pima dataset [123], a dataset used for medical purposes;
  • The Pop failures dataset [124], a dataset related to meteorological data;
  • The Regions2 dataset, a medical dataset for liver biopsy images [125];
  • The Saheart dataset [126], a medical dataset;
  • The Segment dataset [127], a dataset related to image segmentation;
  • The BC dataset [128], which is used for breast tumor diagnosis;
  • The Wine dataset, a dataset related to chemical analysis of wines [129,130];
  • The EEG dataset [131,132], from which the following cases were used in the experiments:
    (a)
    Z_F_S;
    (b)
    ZO_NF_S;
    (c)
    ZONF_S.
  • The Zoo dataset [133].
The regression datasets used in the relevant experiments are the following:
  • The Abalone dataset [134], a dataset used to predict the age of abalones;
  • The Airfoil dataset, a dataset provided by NASA [135] which was obtained from a series of aerodynamic and acoustic tests;
  • The Baseball dataset, a dataset related to the salaries of baseball players;
  • The BK dataset [136], a dataset used to calculate the points in a basketball game;
  • The BL dataset, which is used in machine problems;
  • The Concrete dataset [137], a civil engineering dataset for calculating concrete’s compressive strength;
  • The Dee dataset, which is used to estimate the daily average price of 1 kWh of electrical energy in Spain;
  • The Diabetes dataset, which is a medical dataset;
  • The Housing dataset [138];
  • The FA dataset, which is used to fit body fat to other measurements;
  • The MB dataset [136];
  • The MORTGAGE dataset, holding economic data from the USA, where the goal is to predict the 30-year conventional mortgage rate;
  • The PY dataset (pyrimidines problem) [139];
  • The Quake dataset, which is used to approximate the strength of an earthquake given the depth of its focal point and its latitude and longitude;
  • The Treasure dataset, which contains economic data for the USA, where the goal is to predict the one-month CD rate.

3.2. Experimental Results

In order to give greater credibility to the experiments carried out, the method of tenfold cross validation was incorporated for every experimental dataset. Every experiment was repeated 30 times, using different seeds for the random generator each time. All the code used was implemented in ANSI C++ using the OPTIMUS programming library for optimization purposes, which is freely available at https://github.com/itsoulos/OPTIMUMS/ (accessed on 10 July 2023). For the classification datasets, the average classification error as measured in the test set is reported, while for the regression datasets, the average regression error is reported. Here, by the term classification error, we mean the percentage of patterns in the test set that were classified into a different class than expected. Also, in every table, an additional column, “AVERAGE”, was added to show the average classification or regression error for the corresponding datasets. The values for the experimental parameters are shown in Table 2.
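The evaluation protocol described above could be sketched as follows. The function evaluateFold() is a hypothetical stand-in for one complete run of the two-phase method on a single train/test split, so the code only illustrates the 30 repetitions of tenfold cross validation and the averaging of the measured test errors.
```cpp
// Sketch of the evaluation protocol: 30 repetitions of tenfold cross-validation,
// each repetition with a different random seed, averaging the test errors.
#include <iostream>
#include <numeric>
#include <vector>

// Dummy stand-in: a real implementation trains on nine folds (with the given
// seed) and returns the error measured on the remaining fold.
double evaluateFold(int fold, unsigned seed)
{
    return 0.1 * (fold % 3) + 0.01 * (seed % 5);   // placeholder value only
}

double averageTestError()
{
    const int repetitions = 30, folds = 10;
    std::vector<double> errors;
    for (int r = 0; r < repetitions; ++r)
        for (int f = 0; f < folds; ++f)
            errors.push_back(evaluateFold(f, 1u + r));      // fresh seed per repetition
    return std::accumulate(errors.begin(), errors.end(), 0.0) / errors.size();
}

int main()
{
    std::cout << "average test error: " << averageTestError() << std::endl;
    return 0;
}
```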
In all techniques, the same parameter sets and the same random numbers were used in order to have a fair comparison of the experimental results.
In the first phase, the proposed technique will generate new artificial features from the existing ones with the help of a technique guided by the partnership of particle swarm optimization and grammatical evolution. In the second phase of the technique, these features will be used to modify the original test set, to which any machine learning method can now be applied. In the second phase of the current work, two different techniques will be used: an RBF neural network and an artificial neural network, which will be trained using a genetic algorithm. This is performed in order to establish the potential of the proposed procedure to improve the performance of both simple and more complex machine learning models. The proposed technique that created artificial features was compared on the same datasets against a series of well-known methods from the relevant literature:
  • A genetic algorithm with m chromosomes, denoted as GENETIC in the experimental tables, is used to train an artificial neural network with H hidden nodes. After termination of the genetic algorithm, the local optimization method BFGS is applied to the best chromosome of the population.
  • The radial basis function (RBF) network [140] with H processing nodes.
  • The optimization method Adam [141], which is used to train an artificial neural network with H hidden nodes.
  • The Rprop optimization method [142,143], which is used to train an artificial neural network with H hidden nodes.
  • The NEAT method (NeuroEvolution of Augmenting Topologies) [144].
The experimental results using the above methods on the classification datasets are shown in Table 3, and the results for the regression datasets are illustrated in Table 4.
The results using the proposed method and for the construction of two, three, and four artificial features are presented in Table 5 and Table 6. The RBF column represents the experimental results in which, after the construction of the artificial features, an RBF network with H processing nodes was applied on the modified dataset. Also, the column marked “GENETIC” in Table 5 and Table 6 stands for the results obtained by the application of a genetic algorithm with m chromosomes to the modified dataset when the feature creation procedure was finished.
The experimental results are of great interest, as one can see from their careful study that the proposed technique was able to significantly reduce the error in the corresponding test sets. Especially in the case of regression problems, the reduction in error was, on average, greater than 50%. Moreover, the usage of a neural network trained by a genetic algorithm on the modified datasets gave clearly better results than the use of an RBF neural network, especially in the classification datasets. An additional test was performed for the regression datasets, where the number of particles in the PSO algorithm increased from 100 to 400, and the results are graphically illustrated in Figure 3.
Judging from the results, we can observe that the selection of 200 particles in the experimental results was an optimal choice and a compromise between the speed and efficiency of the method, as adding another 200 particles to the particle swarm optimization did not significantly improve the efficiency of the proposed method.
Moreover, a graphical comparison between the genetic algorithm when applied to artificial datasets and the genetic algorithm when applied to the original classification datasets is given in Figure 4. The same graphical comparison is also shown for the RBF model in Figure 5, and in these figures, the ability of the proposed method to drastically reduce the learning error through the construction of artificial features is evident.
Also, in Figure 6, a comparison for the regression datasets is outlined between the proposed method with the genetic algorithm applied in the second phase and the genetic algorithm applied to the original regression datasets.
Finally, using the Wilcoxon test, a comparison was made between the proposed method and all mentioned machine learning methods for the classification datasets. This comparison is graphically outlined in Figure 7.
These results suggest that the proposed method has a distinct advantage over the well-known classification methods when it comes to constructing artificial features for classification tasks. It offered improved performance and provided a more effective solution for these datasets.

4. Conclusions

A hybrid technique that utilizes a particle swarm optimizer and a feature creation method using grammatical evolution was introduced here. The proposed method can identify possible dependencies between the original features and can also reduce the number of required features to a limited number. Also, the method can remove from the set of features those features that may not contribute to the learning of the dataset by some machine learning model. In addition, to make learning more efficient, the values of the generated features are bounded within a value interval using penalty factors. The constructed features are evaluated in terms of their effectiveness with the help of a fast machine learning model such as the RBF network, even though other more effective models could also be used. Among the advantages of the proposed procedure is the fact that it does not require any prior knowledge of the dataset to which it will be applied, and furthermore, the procedure is exactly the same whether it is a data classification problem or a data fitting problem. The particle swarm optimization method was used for the production of the characteristics, as it has been proven by the relevant literature to be an extremely efficient technique with a limited number of parameters that must be defined by the user.
The current work was applied on an extended series of widely used datasets from various fields and was compared against some machine learning models on the same datasets. From the experimental results, it was seen that the proposed technique dramatically improved the performance of traditional learning techniques when applied to artificial features. The proposed two-stage technique generated artificial features in the first stage guided by the particle swarm optimization technique, and in the second stage, either a neural network trained by a genetic algorithm or an RBF network was applied to the modified test set. In both cases, the improvement in the test error from artificial feature generation was significant for each learning model. This improvement reached an average of 30% for data classification and 50% for data fitting problems. In fact, in many cases, the improvement in the test error exceeded 75%. Moreover, the method appears to be quite robust, since increasing the number of particles in the particle swarm optimization method did not appear to significantly reduce the average error in the test sets. Furthermore, increasing the number of features constructed did not seem to have a dramatic effect on the performance of the method, which means that the method was able to achieve good generalization results even with a limited number of features, which in turn led to greatly reducing the number of dimensions of the original problem. Future work on the method may include the use of parallel techniques for feature construction to drastically reduce the required execution time.

Author Contributions

I.G.T. and A.T. conceived the idea and the methodology, and I.G.T. implemented the corresponding software. I.G.T. conducted the experiments, employing objective functions as test cases, and provided the comparative experiments. A.T. performed the necessary statistical tests. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The source code used here is part of the freely available software OPTIMUS, available at https://github.com/itsoulos/OPTIMUMS/ (accessed on 10 July 2023).

Acknowledgments

The experiments of this research work were performed with the high-performance computing system established at the Knowledge and Intelligent Computing Laboratory in the Department of Informatics and Telecommunications at the University of Ioannina, acquired with the project “Educational Laboratory equipment of TEI of Epirus” and with MIS 5007094 funded by the Operational Programme “Epirus” (2014–2020), the ERDF, and national funds.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Metodiev, E.M.; Nachman, B.; Thaler, J. Classification without labels: Learning from mixed samples in high energy physics. J. High Energy Phys. 2017, 2017, 174. [Google Scholar] [CrossRef] [Green Version]
  2. Baldi, P.; Cranmer, K.; Faucett, T.; Sadowski, P.; Whiteson, D. Parameterized neural networks for high-energy physics. Eur. Phys. J. C 2016, 76, 1–7. [Google Scholar] [CrossRef] [Green Version]
  3. Aniyan, A.K.; Thorat, K. Classifying Radio Galaxies with the Convolutional Neural Network. Astrophys. J. Suppl. 2017, 230, 20. [Google Scholar] [CrossRef] [Green Version]
  4. Carleo, G.; Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 2017, 355, 602–606. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Wei, J.N.; Duvenaud, D.; Guzik, A.A. Neural Networks for the Prediction of Organic Chemistry Reactions. ACS Cent. Sci. 2016, 2, 725–732. [Google Scholar] [CrossRef]
  6. Qi, C.; Fourie, A.; Chen, Q. Neural network and particle swarm optimization for predicting the unconfined compressive strength of cemented paste backfill. Constr. Build. Mater. 2018, 159, 473–478. [Google Scholar] [CrossRef]
  7. Gao, H.; Struble, T.J.; Coley, C.W.; Wang, Y.; Green, W.H.; Jensen, K.F. Using Machine Learning To Predict Suitable Conditions for Organic Reactions. ACS Cent. Sci. 2018, 4, 1465–1476. [Google Scholar] [CrossRef] [Green Version]
  8. Hafezi, R.; Shahrabi, J.; Hadavandi, E. A bat-neural network multi-agent system (BNNMAS) for stock price prediction: Case study of DAX stock price. Appl. Soft Comput. 2015, 29, 196–210. [Google Scholar] [CrossRef]
  9. Pang, X.; Zhou, Y.; Wang, P.; Lin, W.; Chang, V. An innovative neural network approach for stock market prediction. J. Supercomput. 2020, 76, 2098–2118. [Google Scholar] [CrossRef]
  10. Russo, A.; Lind, P.G.; Raischel, F.; Trigo, R.; Mendes, M. Neural network forecast of daily pollution concentration using optimal meteorological data at synoptic and local scales. Atmos. Res. 2015, 6, 540–549. [Google Scholar] [CrossRef] [Green Version]
  11. Azid, A.; Juahir, H.; Toriman, M.E.; Kamarudin, M.K.A.; Saudi, A.S.M.; Hasnam, C.N.C.; Aziz, N.A.A.; Azaman, F.; Latif, M.T.; Zainuddin, S.F.M.; et al. Prediction of the Level of Air Pollution Using Principal Component Analysis and Artificial Neural Network Techniques: A Case Study in Malaysia. Water Air Soil Pollut. 2014, 225, 2063. [Google Scholar] [CrossRef]
  12. Maleki, H.; Sorooshian, A.; Goudarzi, G.; Baboli, Z.; Tahmasebi Birgani, Y.; Rahmati, M. Air pollution prediction by using an artificial neural network model. Clean Technol. Environ. Policy 2019, 21, 1341–1352. [Google Scholar] [CrossRef] [PubMed]
  13. Baskin, I.I.; Winkler, D.; Tetko, I.V. A renaissance of neural networks in drug discovery. Expert Opin. Drug Discov. 2016, 11, 78–795. [Google Scholar] [CrossRef]
  14. Bartzatt, R. Prediction of Novel Anti-Ebola Virus Compounds Utilizing Artificial Neural Network (ANN). Chem. Fac. 2018, 49, 16–34. [Google Scholar]
  15. Liu, Z.G.; Pan, Q.; Dezert, J. A new belief-based K-nearest neighbor classification method. Pattern Recognit. 2012, 46, 834–844. [Google Scholar] [CrossRef]
  16. Deng, Z.; Zhu, X.; Cheng, D.; Zong, M.; Zhang, S. Efficient kNN classification algorithm for big data. Neurocomputing 2016, 195, 143–148. [Google Scholar] [CrossRef]
  17. Graupe, D. Principles of Artificial Neural Networks; World Scientific: Singapore, 2013; Volume 7. [Google Scholar]
  18. Samarasinghe, S. Neural Networks for Applied Sciences and Engineering: From Fundamentals to Complex Pattern Recognition; CRC Press: Boca Raton, FL, USA, 2016. [Google Scholar]
  19. Chen, D. Research on Traffic Flow Prediction in the Big Data Environment Based on the Improved RBF Neural Network. IEEE Trans. Ind. Inform. 2017, 13, 2000–2008. [Google Scholar] [CrossRef]
  20. Yang, Z.; Mourshed, M.; Liu, K.; Xu, X.; Feng, S. A novel competitive swarm optimized RBF neural network model for short-term solar power generation forecasting. Neurocomputing 2020, 397, 415–421. [Google Scholar] [CrossRef]
  21. Iranmehr, A.; Masnadi-Shirazi, H.; Vasconcelos, N. Cost-sensitive support vector machines. Neurocomputing 2019, 343, 50–64. [Google Scholar] [CrossRef] [Green Version]
  22. Cervantes, J.; Lamont, F.G.; Mazahua, L.R.; Lopez, A. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing 2020, 408, 189–215. [Google Scholar] [CrossRef]
  23. Kotsiantis, S.B. Decision trees: A recent overview. Artif. Intell. Rev. 2013, 39, 261–283. [Google Scholar] [CrossRef]
  24. Bertsimas, D.; Dunn, J. Optimal classification trees. Mach. Learn. 2017, 106, 1039–1082. [Google Scholar] [CrossRef]
  25. Pourzangbar, A.; Losada, M.A.; Saber, A.; Rasoul Ahari, L.; Larroudé, P.; Vaezi, M.; Brocchini, M. Prediction of non-breaking wave induced scour depth at the trunk section of breakwaters using Genetic Programming and Artificial Neural Networks. Coast. Eng. 2017, 121, 107–118. [Google Scholar] [CrossRef]
  26. Afsarian, F.; Saber, A.; Pourzangbar, A.; Olabi, A.G.; Khanmohammadi, M.A. Analysis of recycled aggregates effect on energy conservation using M5 model tree algorithm. Energy 2018, 156, 264–277. [Google Scholar] [CrossRef]
  27. Pourzangbar, A.; Saber, A.; Bakhtiary, A.Y.; Ahari, L.R. Predicting scour depth at seawalls using GP and ANNs. J. Hydroinform. 2017, 19, 349–363. [Google Scholar] [CrossRef]
  28. Liu, X.; He, J.; Yin, M.L.Z.; Yin, L.; Zheng, W.A. Scenario-Generic Neural Machine Translation Data Augmentation Method. Electronics 2023, 12, 2320. [Google Scholar] [CrossRef]
  29. Xu, X.; Lin, Z.; Li, X.; Shang, C.; Shen, Q. Multi-objective robust optimisation model for MDVRPLS in refined oil distribution. Int. J. Prod. Res. 2022, 60, 6772–6792. [Google Scholar] [CrossRef]
  30. Lu, S.; Ding, Y.; Liu, M.; Yin, Z.; Yin, L.; Zheng, W. Multiscale Feature Extraction and Fusion of Image and Text in VQA. Int. J. Comput. Intell. Syst. 2023, 16, 54. [Google Scholar] [CrossRef]
  31. Zheng, C.; An, Y.; Wang, Z.; Wu, H.; Qin, X.; Eynard, B.; Zhang, Y. Hybrid offline programming method for robotic welding systems. Robot. Comput.-Integr. Manuf. 2022, 73, 102238. [Google Scholar] [CrossRef]
  32. Zhang, K.; Wang, Z.; Chen, G.; Zhang, L.; Yang, Y.; Yao, C.; Wang, J.; Yao, J. Training effective deep reinforcement learning agents for real-time life-cycle production optimization. J. Petroleum Sci. Eng. 2022, 208, 109766. [Google Scholar] [CrossRef]
  33. Kotsiantis, S.B.; Zaharakis, I.D.; Pintelas, P.E. Machine learning: A review of classification and combining techniques. Artif. Intell. Rev. 2006, 26, 159–190. [Google Scholar] [CrossRef]
  34. Li, J.; Cheng, J.; Shim, J.; Huang, F. Brief Introduction of Back Propagation (BP) Neural Network Algorithm and Its Improvement. In Advances in Computer Science and Information Engineering; Jin, D., Lin, S., Eds.; Advances in Intelligent and Soft Computing; Springer: Berlin/Heidelberg, Germany, 2012; Volume 169. [Google Scholar]
  35. Chen, T.; Zhong, S. Privacy-Preserving Backpropagation Neural Network Learning. IEEE Trans. Neural Netw. 2009, 20, 1554–1564. [Google Scholar] [CrossRef]
  36. Sedki, A.; Ouazar, D.; Mazoudi, E.E. Evolving neural network using real coded genetic algorithm for daily rainfall–runoff forecasting. Expert Syst. Appl. 2009, 36, 4523–4527. [Google Scholar] [CrossRef]
  37. Ruehle, F. Evolving neural networks with genetic algorithms to study the string landscape. J. High Energy Phys. 2017, 2017, 38. [Google Scholar] [CrossRef] [Green Version]
  38. Majdi, A.; Beiki, M. Evolving neural network using a genetic algorithm for predicting the deformation modulus of rock masses. Int. J. Rock Mech. Min. Sci. 2010, 47, 246–253. [Google Scholar] [CrossRef]
  39. Lentzas, A.; Nalmpantis, C.; Vrakas, D. Hyperparameter Tuning using Quantum Genetic Algorithms. In Proceedings of the 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), Portland, OR, USA, 4–6 November 2019; pp. 1412–1416. [Google Scholar]
  40. Raji, I.D.; Bello-Salau, H.; Umoh, I.J.; Onumany, A.J.; Adegboye, M.A.; Salawudeen, A.T. Simple Deterministic Selection-Based Genetic Algorithm for Hyperparameter Tuning of Machine Learning Models. Appl. Sci. 2022, 12, 1186. [Google Scholar] [CrossRef]
  41. Shanthi, D.L.; Chethan, N. Genetic Algorithm Based Hyper-Parameter Tuning to Improve the Performance of Machine Learning Models. SN Comput. Sci. 2023, 4, 119. [Google Scholar] [CrossRef]
  42. Powell, M.J.D. A Tolerant Algorithm for Linearly Constrained Optimization Calculations. Math. Program. 1989, 45, 547–566. [Google Scholar] [CrossRef]
  43. Verleysen, M.; Francois, D.; Simon, G.; Wertz, V. On the effects of dimensionality on data analysis with neural networks. In Artificial Neural Nets Problem Solving Methods. IWANN 2003; Mira, J., Álvarez, J.R., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2003; Volume 2687. [Google Scholar]
  44. Erkmen, B.; Yıldırım, T. Improving classification performance of sonar targets by applying general regression neural network with PCA. Expert Syst. Appl. 2008, 35, 472–475. [Google Scholar] [CrossRef]
  45. Zhou, J.; Guo, A.; Celler, B.; Su, S. Fault detection and identification spanning multiple processes by integrating PCA with neural network. Appl. Soft Comput. 2014, 14, 4–11. [Google Scholar] [CrossRef]
  46. Ravi Kumar, G.; Nagamani, K.; Anjan Babu, G. A Framework of Dimensionality Reduction Utilizing PCA for Neural Network Prediction. In Advances in Data Science and Management; Borah, S., Emilia Balas, V., Polkowski, Z., Eds.; Lecture Notes on Data Engineering and Communications Technologies; Springer: Singapore, 2020; Volume 37. [Google Scholar]
  47. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238. [Google Scholar] [CrossRef] [PubMed]
  48. Radovic, M.; Ghalwash, M.; Filipovic, N. Minimum redundancy maximum relevance feature selection approach for temporal gene expression data. BMC Bioinform. 2017, 18, 9. [Google Scholar] [CrossRef] [Green Version]
  49. Pourzangbar, A. Determination of the most effective parameters on scour depth at seawalls using genetic programming (GP). In Proceedings of the 10th International Conference on Coasts, Ports and Marine Structures (ICOPMASS 2012), Tehran, Iran, 19–21 November 2012. [Google Scholar]
  50. Wang, Y.; Yao, H.; Zhao, S. Auto-encoder based dimensionality reduction. Neurocomputing 2016, 184, 232–242. [Google Scholar] [CrossRef]
  51. Geman, S.; Bienenstock, E.; Doursat, R. Neural networks and the bias/variance dilemma. Neural Comput. 1992, 4, 1–58. [Google Scholar] [CrossRef]
  52. Hawkins, D.M. The Problem of Overfitting. J. Chem. Inf. Comput. Sci. 2004, 44, 1–12. [Google Scholar] [CrossRef]
  53. Kim, J.K.; Lee, M.Y.; Kim, J.Y.; Kim, B.J.; Lee, J.H. An efficient pruning and weight sharing method for neural network. In Proceedings of the 2016 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), Seoul, Republic of Korea, 26–28 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–2. [Google Scholar]
  54. Roth, W.; Pernkopf, F. Bayesian Neural Networks with Weight Sharing Using Dirichlet Processes. IEEE Trans. On Pattern Anal. Mach. Intell. 2020, 42, 246–252. [Google Scholar] [CrossRef]
  55. Augasta, M.; Kathirvalavakumar, T. Pruning algorithms of neural networks—A comparative study. Cent. Eur. Comput. Sci. 2003, 3, 105–115. [Google Scholar] [CrossRef] [Green Version]
  56. Hewahi, N.M. Neural network pruning based on input importance. J. Intell. Fuzzy Syst. 2019, 37, 2243–2252. [Google Scholar] [CrossRef]
  57. Hergert, F.; Finnoff, W.; Zimmermann, H.G. A comparison of weight elimination methods for reducing complexity in neural networks. In Proceedings of the 1992 IJCNN International Joint Conference on Neural Networks, Baltimore, MD, USA, 7–11 June 1992; Volume 3, pp. 980–987. [Google Scholar]
  58. Cottrell, M.; Girard, B.; Girard, Y.; Mangeas, M.; Muller, C. Neural modeling for time series: A statistical stepwise method for weight elimination. IEEE Trans. Neural Netw. 1995, 6, 1355–1364. [Google Scholar] [CrossRef]
  59. Ennett, C.M.; Frize, M. Weight-elimination neural networks applied to coronary surgery mortality prediction. IEEE Trans. Inf. Technol. Biomed. 2003, 7, 86–92. [Google Scholar] [CrossRef]
  60. Carvalho, M.; Ludermir, T.B. Particle Swarm Optimization of Feed-Forward Neural Networks with Weight Decay. In Proceedings of the 2006 Sixth International Conference on Hybrid Intelligent Systems (HIS’06), Rio de Janeiro, Brazil, 13–15 December 2006; p. 5. [Google Scholar]
  61. Tsoulos, I.G.T.; Tzallas, A.; Tsalikakis, D. Evolutionary Based Weight Decaying Method for Neural Network Training. Neural Process Lett. 2018, 47, 463–473. [Google Scholar] [CrossRef]
  62. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
  63. Iosifidis, A.; Tefas, A.; Pitas, I. DropELM: Fast neural network regularization with Dropout and DropConnect. Neurocomputing 2015, 162, 57–66. [Google Scholar] [CrossRef] [Green Version]
  64. Treadgold, N.K.; Gedeon, T.D. Simulated annealing and weight decay in adaptive learning: The SARPROP algorithm. IEEE Trans. Neural Netw. 1998, 9, 662–668. [Google Scholar] [CrossRef]
  65. Shahjahan, M.D.; Kazuyuki, M. Neural network training algorithm with positive correlation. IEEE Trans. Inf. Syst. 2005, 88, 2399–2409. [Google Scholar] [CrossRef]
  66. Reeves, C.R.; Bush, D.R. Using Genetic Algorithms for Training Data Selection in RBF Networks. In Instance Selection and Construction for Data Mining; Liu, H., Motoda, H., Eds.; The Springer International Series in Engineering and Computer Science; Springer: Boston, MA, USA, 2001; Volume 608. [Google Scholar]
  67. Wu, J.; Long, J.; Liu, M. Evolving RBF neural networks for rainfall prediction using hybrid particle swarm optimization and genetic algorithm. Neurocomputing 2015, 148, 136–142. [Google Scholar] [CrossRef]
  68. Chen, J.; Wang, X.; Zhai, J. Pruning Decision Tree Using Genetic Algorithms. In Proceedings of the 2009 International Conference on Artificial Intelligence and Computational Intelligence, Shanghai, China, 7–8 November 2009; p. 244248. [Google Scholar]
  69. Mijwil, M.M.; Abttan, R.A. Utilizing the Genetic Algorithm to Pruning the C4.5 Decision Tree Algorithm. Asian J. Appl. Sci. 2021, 9, 45–52. [Google Scholar] [CrossRef]
  70. O’Neill, M.; Ryan, C. Grammatical evolution. IEEE Trans. Evol. Comput. 2001, 5, 349–358. [Google Scholar] [CrossRef] [Green Version]
  71. Loughran, R.; McDermott, J.; O’Neill, M. Tonality driven piano compositions with grammatical evolution. In Proceedings of the 2015 IEEE Congress on Evolutionary Computation (CEC), Sendai, Japan, 25–28 May 2015; pp. 2168–2175. [Google Scholar]
  72. Gabrielsson, P.; Johansson, U.; König, R. Co-evolving online high-frequency trading strategies using grammatical evolution. In Proceedings of the 2014 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr), London, UK, 27–28 March 2014; pp. 473–480. [Google Scholar]
  73. Ali, M.S.; Kshirsagar, M.; Naredo, E.; Ryan, C. Towards Automatic Grammatical Evolution for Real-world Symbolic Regression. In Proceedings of the 13th International Joint Conference on Computational Intelligence (IJCCI 2021), Online, 25–27 October 2021; pp. 68–78. [Google Scholar]
  74. Ferrante, E.; Duéñez-Guzmán, E.; Turgut, A.E.; Wenseleers, T. GESwarm: Grammatical evolution for the automatic synthesis of collective behaviors in swarm robotics. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, Amsterdam, The Netherlands, 6–10 July 2013; pp. 17–24. [Google Scholar]
  75. Díaz Álvarez, J.; Colmenar, J.M.; Risco-Martín, J.L.; Lanchares, J.; Garnica, O. Optimizing L1 cache for embedded systems through grammatical evolution. Soft Comput. 2016, 20, 2451–2465. [Google Scholar] [CrossRef]
  76. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95—International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948. [Google Scholar] [CrossRef]
  77. Poli, R.; Kennedy, J.; Blackwell, T. Particle swarm optimization: An overview. Swarm Intell. 2007, 1, 33–57. [Google Scholar] [CrossRef]
  78. Trelea, I.C. The particle swarm optimization algorithm: Convergence analysis and parameter selection. Inf. Process. Lett. 2003, 85, 317–325. [Google Scholar] [CrossRef]
  79. de Moura Meneses, A.A.; Machado, M.D.; Schirru, R. Particle Swarm Optimization applied to the nuclear reload problem of a Pressurized Water Reactor. Progress Nucl. Energy 2009, 51, 319–326. [Google Scholar] [CrossRef]
  80. Wang, Y.; Miao, M.; Lv, J.; Zhu, L.; Yin, K.; Liu, H.; Ma, Y. An effective structure prediction method for layered materials based on 2D particle swarm optimization algorithm. J. Chem. Phys. 2012, 137, 224108. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  81. Chen, X.; Du, W.; Qi, R.; Qian, F.; Tianfield, H. Hybrid gradient particle swarm optimization for dynamic optimization problems of chemical processes. Asia-Pac. J. Chem. Eng. 2013, 8, 708–720. [Google Scholar] [CrossRef]
  82. Fang, H.; Zhou, J.; Wang, Z.; Qiu, Z.; Sun, Y.; Lin, Y.; Chen, K.; Zhou, X.; Pan, M. Hybrid method integrating machine learning and particle swarm optimization for smart chemical process operations. Front. Chem. Sci. Eng. 2022, 16, 274–287. [Google Scholar] [CrossRef]
  83. Chang, P.C.; Lin, J.J.; Liu, C.H. An attribute weight assignment and particle swarm optimization algorithm for medical database classifications. Comput. Methods Programs Biomed. 2012, 107, 382–392. [Google Scholar] [CrossRef]
  84. Radha, R.; Gopalakrishnan, R. A medical analytical system using intelligent fuzzy level set brain image segmentation based on improved quantum particle swarm optimization. Microprocess. Microsyst. 2020, 79, 103283. [Google Scholar] [CrossRef]
  85. Park, J.B.; Jeong, Y.W.; Shin, J.R.; Lee, K.Y. An Improved Particle Swarm Optimization for Nonconvex Economic Dispatch Problems. IEEE Trans. Power Syst. 2010, 25, 156–166. [Google Scholar] [CrossRef]
  86. Liu, B.; Wang, L.; Jin, Y.H. An Effective PSO-Based Memetic Algorithm for Flow Shop Scheduling. IEEE Trans. Syst. Cybern. Part B 2007, 37, 18–27. [Google Scholar] [CrossRef]
  87. Yang, J.; He, L.; Fu, S. An improved PSO-based charging strategy of electric vehicles in electrical distribution grid. Appl. Energy 2014, 128, 82–92. [Google Scholar] [CrossRef]
  88. Mistry, K.; Zhang, L.; Neoh, S.C.; Lim, C.P.; Fielding, B. A Micro-GA Embedded PSO Feature Selection Approach to Intelligent Facial Emotion Recognition. IEEE Trans. Cybern. 2017, 47, 1496–1509. [Google Scholar] [CrossRef] [Green Version]
  89. Han, S.; Shan, X.; Fu, J.; Xu, W.; Mi, H. Industrial robot trajectory planning based on improved pso algorithm. J. Phys. Conf. Ser. 2021, 1820, 012185. [Google Scholar] [CrossRef]
  90. Pourzangbar, A.; Vaezi, M. Optimal design of brace-viscous damper and pendulum tuned mass damper using Particle Swarm Optimization. Appl. Ocean. Res. 2021, 112, 102706. [Google Scholar] [CrossRef]
  91. Tian, J.; Hou, M.; Bian, H.; Li, J. Variable surrogate model-based particle swarm optimization for high-dimensional expensive problems. Complex Intell. Syst. 2022, 1–49. [Google Scholar] [CrossRef]
  92. Cao, B.; Gu, Y.; Lv, Z.; Yang, S.; Zhao, J.; Li, Y. RFID Reader Anticollision Based on Distributed Parallel Particle Swarm Optimization. IEEE Internet Things J. 2021, 8, 3099–3107. [Google Scholar] [CrossRef]
  93. Gavrilis, D.; Tsoulos, I.G.; Dermatas, E. Selecting and constructing features using grammatical evolution. Pattern Recognit. Lett. 2008, 29, 1358–1365. [Google Scholar] [CrossRef]
  94. Gavrilis, D.; Tsoulos, I.G.; Dermatas, E. Neural Recognition and Genetic Features Selection for Robust Detection of E-Mail Spam. In Advances in Artificial Intelligence Volume 3955 of the Series Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2006; pp. 498–501. [Google Scholar]
  95. Georgoulas, G.; Gavrilis, D.; Tsoulos, I.G.; Stylios, C.; Bernardes, J.; Groumpos, P.P. Novel approach for fetal heart rate classification introducing grammatical evolution. Biomed. Signal Process. Control 2007, 2, 69–79. [Google Scholar] [CrossRef]
  96. Smart, O.; Tsoulos, I.G.; Gavrilis, D.; Georgoulas, G. Grammatical evolution for features of epileptic oscillations in clinical intracranial electroencephalograms. Expert Syst. Appl. 2011, 38, 9991–9999. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  97. Tsoulos, I.G.; Stylios, C.; Charalampous, V. COVID-19 Predictive Models Based on Grammatical Evolution. SN Comput. Sci. 2023, 4, 191. [Google Scholar] [CrossRef]
  98. Christou, V.; Tsoulos, I.G.; Loupas, V.; Tzallas, A.T.; Gogos, C.; Karvelis, P.S.; Antoniadis, N.; Glavas, E.; Giannakeas, N. Performance and early drop prediction for higher education students using machine learning. Expert Syst. Appl. 2023, 225, 120079. [Google Scholar] [CrossRef]
  99. Verikas, A.; Bacauskiene, M. Feature selection with neural networks. Pattern Recognit. Lett. 2002, 23, 1323–1335. [Google Scholar] [CrossRef]
  100. Kabir, M.M.; Islam, M.M.; Murase, K. A new wrapper feature selection approach using neural network. Neurocomputing 2010, 73, 3273–3283. [Google Scholar] [CrossRef]
  101. Devi, V.S. Class Specific Feature Selection Using Simulated Annealing. In Mining Intelligence and Knowledge Exploration; Prasath, R., Vuppala, A., Kathirvalavakumar, T., Eds.; MIKE 2015, Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 9468. [Google Scholar]
  102. Neshatian, K.; Zhang, M.; Andreae, P. A Filter Approach to Multiple Feature Construction for Symbolic Learning Classifiers Using Genetic Programming. IEEE Trans. Evol. Comput. 2012, 16, 645–661. [Google Scholar] [CrossRef]
  103. Backus, J.W. The Syntax and Semantics of the Proposed International Algebraic Language of the Zurich ACM-GAMM Conference. In Proceedings of the International Conference on Information Processing, UNESCO, Paris, France, 15–20 June 1959; pp. 125–132. [Google Scholar]
  104. Eberhart, R.C.; Shi, Y.H. Tracking and optimizing dynamic systems with particle swarms. In Proceedings of the 2001 Congress on Evolutionary Computation, Seoul, Republic of Korea, 27–30 May 2001. [Google Scholar]
  105. Charilogis, V.; Tsoulos, I.G. Toward an Ideal Particle Swarm Optimizer for Multidimensional Functions. Information 2022, 13, 217. [Google Scholar] [CrossRef]
  106. Alcalá-Fdez, J.; Fernandez, A.; Luengo, J.; Derrac, J.; García, S.; Sánchez, L.; Herrera, F. KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework. J. Mult.-Valued Log. Soft Comput. 2011, 17, 255–287. [Google Scholar]
  107. Weiss, S.M.; Kulikowski, C.A. Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems; Morgan Kaufmann Publishers Inc.: Burlington, MA, USA, 1991. [Google Scholar]
  108. Wang, M.; Zhang, Y.Y.; Min, F. Active learning through multi-standard optimization. IEEE Access 2019, 7, 56772–56784. [Google Scholar] [CrossRef]
  109. Quinlan, J.R. Simplifying Decision Trees. Int. J. Man-Mach. Stud. 1987, 27, 221–234. [Google Scholar] [CrossRef] [Green Version]
  110. Shultz, T.; Mareschal, D.; Schmidt, W. Modeling Cognitive Development on Balance Scale Phenomena. Mach. Learn. 1994, 16, 59–88. [Google Scholar] [CrossRef] [Green Version]
  111. Evans, B.; Fisher, D. Overcoming process delays with decision tree induction. IEEE Expert 1994, 9, 60–66. [Google Scholar] [CrossRef]
  112. Demiroz, G.; Govenir, H.A.; Ilter, N. Learning Differential Diagnosis of Eryhemato-Squamous Diseases using Voting Feature Intervals. Artif. Intell. Med. 1998, 13, 147–165. [Google Scholar]
  113. Hayes-Roth, B.; Hayes-Roth, B.F. Concept learning and the recognition and classification of exemplars. J. Verbal Learn. Verbal Behav. 1977, 16, 321–338. [Google Scholar] [CrossRef]
  114. Kononenko, I.; Šimec, E.; Robnik-Šikonja, M. Overcoming the Myopia of Inductive Learning Algorithms with RELIEFF. Appl. Intell. 1997, 7, 39–55. [Google Scholar] [CrossRef]
  115. French, R.M.; Chater, N. Using noise to compute error surfaces in connectionist networks: A novel means of reducing catastrophic forgetting. Neural Comput. 2002, 14, 1755–1769. [Google Scholar] [CrossRef] [PubMed]
  116. Dy, J.G.; Brodley, C.E. Feature Selection for Unsupervised Learning. J. Mach. Learn. Res. 2004, 5, 845–889. [Google Scholar]
  117. Perantonis, S.J.; Virvilis, V. Input Feature Extraction for Multilayered Perceptrons Using Supervised Principal Component Analysis. Neural Process. Lett. 1999, 10, 243–252. [Google Scholar] [CrossRef]
  118. Mcdermott, J.; Forsyth, R.S. Diagnosing a disorder in a classification benchmark. Pattern Recognit. Lett. 2016, 73, 41–43. [Google Scholar] [CrossRef]
  119. Garcke, J.; Griebel, M. Classification with sparse grids using simplicial basis functions. Intell. Data Anal. 2002, 6, 483–502. [Google Scholar] [CrossRef]
  120. Elter, M.; Schulz-Wendtland, R.; Wittenberg, T. The prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize an intelligible decision process. Med. Phys. 2007, 34, 4164–4172. [Google Scholar] [CrossRef]
  121. Little, M.; Mcsharry, P.; Roberts, S.; Costello, D.; Moroz, I. Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection. BioMed Eng. OnLine 2007, 6, 23. [Google Scholar] [CrossRef] [Green Version]
  122. Little, M.A.; McSharry, P.E.; Hunter, E.J.; Spielman, J.; Ramig, L.O. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. IEEE Trans. Biomed. Eng. 2009, 56, 1015–1022. [Google Scholar] [CrossRef] [Green Version]
  123. Smith, J.W.; Everhart, J.E.; Dickson, W.C.; Knowler, W.C.; Johannes, R.S. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care; IEEE Computer Society Press: New York, NY, USA, 1988; pp. 261–265. [Google Scholar]
  124. Lucas, D.D.; Klein, R.; Tannahill, J.; Ivanova, D.; Brandon, S.; Domyancic, D.; Zhang, Y. Failure analysis of parameter-induced simulation crashes in climate models. Geosci. Model Dev. 2013, 6, 1157–1171. [Google Scholar] [CrossRef] [Green Version]
  125. Giannakeas, N.; Tsipouras, M.G.; Tzallas, A.T.; Kyriakidi, K.; Tsianou, Z.E.; Manousou, P.; Hall, A.; Karvounis, E.C.; Tsianos, V.; Tsianos, E. A clustering based method for collagen proportional area extraction in liver biopsy images. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, Milan, Italy, 25–29 August 2015; pp. 3097–3100. [Google Scholar]
  126. Hastie, T.; Tibshirani, R. Non-parametric logistic and proportional odds regression. JRSS-C (Appl. Stat.) 1987, 36, 260–276. [Google Scholar] [CrossRef]
  127. Dash, M.; Liu, H.; Scheuermann, P.; Tan, K.L. Fast hierarchical clustering and its validation. Data Knowl. Eng. 2003, 44, 109–138. [Google Scholar] [CrossRef]
  128. Wolberg, W.H.; Mangasarian, O.L. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc. Natl. Acad. Sci. USA 1990, 87, 9193–9196. [Google Scholar] [CrossRef] [PubMed]
  129. Raymer, M.; Doom, T.E.; Kuhn, L.A.; Punch, W.F. Knowledge discovery in medical and biological datasets using a hybrid Bayes classifier/evolutionary algorithm. IEEE Trans. Syst. Man Cybern. Part B Cybern. Publ. IEEE Syst. Cybern. Soc. 2003, 33, 802–813. [Google Scholar] [CrossRef] [PubMed]
  130. Zhong, P.; Fukushima, M. Regularized nonsmooth Newton method for multi-class support vector machines. Optim. Methods Softw. 2007, 22, 225–236. [Google Scholar] [CrossRef]
  131. Andrzejak, R.G.; Lehnertz, K.; Mormann, F.; Rieke, C.; David, P.; Elger, C.E. Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Phys. Rev. E 2001, 64, 061907. [Google Scholar] [CrossRef] [Green Version]
  132. Tzallas, A.T.; Tsipouras, M.G.; Fotiadis, D.I. Automatic Seizure Detection Based on Time-Frequency Analysis and Artificial Neural Networks. Comput. Intell. Neurosci. 2007, 2007, 80510. [Google Scholar] [CrossRef]
  133. Koivisto, M.; Sood, K. Exact Bayesian Structure Discovery in Bayesian Networks. J. Mach. Learn. Res. 2004, 5, 549–573. [Google Scholar]
  134. Nash, W.J.; Sellers, T.L.; Talbot, S.R.; Cawthor, A.J.; Ford, W.B. The Population Biology of Abalone (Haliotis Species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait, Sea Fisheries Division; Technical Report No. 48; Department of Primary Industry and Fisheries, Tasmania: Hobart, Australia, 1994; ISSN 1034-3288. [Google Scholar]
  135. Brooks, T.F.; Pope, D.S.; Marcolini, A.M. Airfoil Self-Noise and Prediction; Technical Report; NASA RP-1218; NASA: Washington, DC, USA, 1989. [Google Scholar]
  136. Simonoff, J.S. Smoothing Methods in Statistics; Springer: Berlin/Heidelberg, Germany, 1996. [Google Scholar]
  137. Yeh, I.C. Modeling of strength of high performance concrete using artificial neural networks. Cem. Concr. Res. 1998, 28, 1797–1808. [Google Scholar] [CrossRef]
  138. Harrison, D.; Rubinfeld, D.L. Hedonic prices and the demand for clean ai. J. Environ. Econ. Manag. 1978, 5, 81–102. [Google Scholar] [CrossRef] [Green Version]
  139. King, R.D.; Muggleton, S.; Lewis, R.; Sternberg, M.J.E. Drug design by machine learning: The use of inductive logic programming to model the structure-activity relationships of trimethoprim analogues binding to dihydrofolate reductase. Proc. Nat. Acad. Sci. USA 1992, 89, 11322–11326. [Google Scholar] [CrossRef] [PubMed]
  140. Yu, H.; Xie, T.; Paszczynski, S.; Wilamowski, B.M. Advantages of Radial Basis Function Networks for Dynamic System Design. IEEE Trans. Ind. Electron. 2011, 58, 5438–5450. [Google Scholar] [CrossRef]
  141. Kingma, D.P.; Ba, J.L. ADAM: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015; pp. 1–15. [Google Scholar]
  142. Pajchrowski, T.; Zawirski, K.; Nowopolski, K. Neural Speed Controller Trained Online by Means of Modified PROP Algorithm. IEEE Trans. Ind. Inform. 2015, 11, 560–568. [Google Scholar] [CrossRef]
  143. Hermanto, R.P.S.; Nugroho, A. Waiting-Time Estimation in Bank Customer Queues using PROP Neural Networks. Procedia Comput. Sci. 2018, 135, 35–42. [Google Scholar] [CrossRef]
  144. Stanley, K.O.; Miikkulainen, R. Evolving Neural Networks through Augmenting Typologies. Evol. Comput. 2002, 10, 99–127. [Google Scholar] [CrossRef] [PubMed]
Figure 1. BNF grammar of the proposed method.
Figure 2. The flowchart for the proposed method.
Figure 3. Average regression error for all regression datasets using the proposed method. The number of PSO particles increased from 100 to 400, and the number of constructed features was set to f = 2.
Figure 4. Comparison of the proposed genetic classification algorithm for the construction of 2, 3, and 4 artificial features (blue color) versus the well-known genetic classification algorithm (gray color) on the 24 classification datasets.
Figure 5. Comparison of the proposed RBF classification network for the construction of two, three, and four artificial features (blue color) versus the well-known RBF classification network (gray color) on the 24 classification datasets.
Figure 6. Comparison of the proposed algorithm for the construction of two, three, and four artificial features (blue color) versus the genetic algorithm (gray color) for the regression datasets.
Figure 7. Comparison between the proposed method for constructing 2, 3, and 4 artificial features (blue color) and several well-known classification methods (black color) on the 24 classification datasets. Statistical significance was assessed with the Wilcoxon signed-rank test; across all 24 datasets, the proposed method consistently achieved a lower classification error than the well-known methods.
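For readers who wish to reproduce the significance analysis summarized in Figure 7, the sketch below shows one plausible way to run such a paired comparison with SciPy. It is not the authors' code: the error lists, variable names, and the 0.05 threshold mentioned in the comments are illustrative assumptions; in practice the lists would hold the per-dataset classification errors of Tables 3 and 5.

```python
# Illustrative sketch (not the authors' code): paired Wilcoxon signed-rank
# test between the proposed method and a baseline over the benchmark datasets.
# The values below are placeholders standing in for per-dataset errors (%).
from scipy.stats import wilcoxon

proposed_errors = [14.33, 14.48, 2.89, 38.13, 30.37, 27.92]   # placeholders
baseline_errors = [18.10, 32.21, 8.97, 35.75, 30.58, 56.18]   # placeholders

stat, p_value = wilcoxon(proposed_errors, baseline_errors)
print(f"Wilcoxon statistic = {stat:.3f}, p-value = {p_value:.4f}")
# A small p-value (e.g., below 0.05) would indicate a statistically
# significant difference between the two paired error distributions.
```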
Table 1. Steps to produce a valid expression from the BNF grammar.

Expression | Chromosome | Operation
<expr> | 10, 12, 20, 8, 11 | 10 mod 3 = 1
<func>(<expr>) | 12, 20, 8, 11 | 12 mod 4 = 0
sin(<expr>) | 20, 8, 11 | 20 mod 3 = 2
sin(<terminal>) | 8, 11 | 8 mod 2 = 0
sin(<xlist>) | 11 | 11 mod 3 = 2
sin(x3) | |
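To make the derivation in Table 1 concrete, the following minimal Python sketch performs the standard grammatical-evolution mapping: the leftmost non-terminal is repeatedly replaced by the production selected with codon mod number-of-rules. The grammar dictionary is a simplified stand-in consistent with the rule counts implied by Table 1 (three productions for <expr>, four for <func>, two for <terminal>, three for <xlist>); the full grammar used by the method is the one shown in Figure 1, and wrapping of the chromosome is omitted.

```python
import re

# Simplified stand-in grammar, consistent with the rule counts implied by
# Table 1; the full grammar of the method is the one shown in Figure 1.
GRAMMAR = {
    "<expr>":     ["(<expr><op><expr>)", "<func>(<expr>)", "<terminal>"],
    "<op>":       ["+", "-", "*", "/"],
    "<func>":     ["sin", "cos", "exp", "log"],
    "<terminal>": ["<xlist>", "<digit>"],
    "<xlist>":    ["x1", "x2", "x3"],
    "<digit>":    ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
}

def map_chromosome(chromosome, start="<expr>"):
    """Grammatical-evolution mapping: repeatedly replace the leftmost
    non-terminal using the rule index (codon mod number of rules)."""
    expr, codons = start, list(chromosome)
    while True:
        match = re.search(r"<[^<>]+>", expr)      # leftmost non-terminal
        if match is None:
            return expr                           # expression is complete
        if not codons:
            raise ValueError("chromosome exhausted (wrapping not modelled)")
        rules = GRAMMAR[match.group(0)]
        chosen = rules[codons.pop(0) % len(rules)]
        expr = expr[:match.start()] + chosen + expr[match.end():]

# Reproduces the derivation of Table 1: the chromosome (10, 12, 20, 8, 11)
# maps to the artificial feature sin(x3).
print(map_chromosome([10, 12, 20, 8, 11]))        # -> sin(x3)
```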
Table 2. The values for every parameter used in the experiments.

Parameter | Meaning | Value
m | Particles or chromosomes | 200
H | Number of hidden nodes | 10
iter_max | Maximum number of iterations | 200
L_max | Limit used in penalty calculation | 100
λ | Penalty factor | 100
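As a rough illustration of how the parameters of Table 2 could enter the fitness evaluation, the sketch below assumes a simple penalty scheme in which constructed-feature values whose magnitude exceeds L_max are penalized with the factor λ. The exact penalty expression of the proposed method is the one defined in the paper; this is only a hedged approximation of that idea.

```python
# Hedged sketch only: one plausible penalized-fitness form using the
# parameter values of Table 2. The exact penalty formula of the proposed
# method may differ; this merely illustrates how L_max and the penalty
# factor lambda could bound the range of the constructed features.
L_MAX = 100.0            # limit used in the penalty calculation
PENALTY_LAMBDA = 100.0   # penalty factor

def penalized_fitness(training_error, feature_values):
    """training_error: error of the learner on the transformed training set.
    feature_values: values taken by the constructed features on that set."""
    violations = sum(1 for v in feature_values if abs(v) > L_MAX)
    return training_error + PENALTY_LAMBDA * violations

# Example: an error of 0.25 with two out-of-range feature values.
print(penalized_fitness(0.25, [3.2, -150.0, 42.0, 250.0]))   # -> 200.25
```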
Table 3. Average classification error for the classification datasets using the well-known methods.

DATASET | GENETIC | RBF | ADAM | RPROP | NEAT
Appendicitis | 18.10% | 12.23% | 16.50% | 16.30% | 17.20%
Australian | 32.21% | 34.89% | 35.65% | 36.12% | 31.98%
Balance | 8.97% | 33.42% | 7.87% | 8.81% | 23.14%
Bands | 35.75% | 37.22% | 36.25% | 36.32% | 34.30%
Dermatology | 30.58% | 62.34% | 26.14% | 15.12% | 32.43%
Hayes Roth | 56.18% | 64.36% | 59.70% | 37.46% | 50.15%
Heart | 28.34% | 31.20% | 38.53% | 30.51% | 39.27%
House Votes | 6.62% | 6.13% | 7.48% | 6.04% | 10.89%
Ionosphere | 15.14% | 16.22% | 16.64% | 13.65% | 19.67%
Liver disorder | 31.11% | 30.84% | 41.53% | 40.26% | 30.67%
Lymography | 23.26% | 25.31% | 29.26% | 24.67% | 33.70%
Mammographic | 19.88% | 21.38% | 46.25% | 18.46% | 22.85%
Parkinson's | 18.05% | 17.42% | 24.06% | 22.28% | 18.56%
Pima | 32.19% | 25.78% | 34.85% | 34.27% | 34.51%
Pop failures | 5.94% | 7.04% | 5.18% | 4.81% | 7.05%
Regions2 | 29.39% | 38.29% | 29.85% | 27.53% | 33.23%
Saheart | 34.86% | 32.19% | 34.04% | 34.90% | 34.51%
Segment | 57.72% | 59.68% | 49.75% | 52.14% | 66.72%
WDBC | 8.56% | 7.27% | 35.35% | 21.57% | 12.88%
Wine | 19.20% | 31.41% | 29.40% | 30.73% | 25.43%
Z_F_S | 10.73% | 13.16% | 47.81% | 29.28% | 38.41%
ZO_NF_S | 8.41% | 9.02% | 47.43% | 6.43% | 43.75%
ZONF_S | 2.60% | 4.03% | 11.99% | 27.27% | 5.44%
ZOO | 16.67% | 21.93% | 14.13% | 15.47% | 20.27%
AVERAGE | 22.94% | 26.78% | 30.24% | 24.60% | 28.63%
Table 4. Average regression error using the well-known methods for the regression datasets.

DATASET | GENETIC | RBF | ADAM | RPROP | NEAT
ABALONE | 7.17 | 7.37 | 4.30 | 4.55 | 9.88
AIRFOIL | 0.003 | 0.27 | 0.005 | 0.002 | 0.067
BASEBALL | 103.60 | 93.02 | 77.90 | 92.05 | 100.39
BK | 0.027 | 0.02 | 0.03 | 1.599 | 0.15
BL | 5.74 | 0.01 | 0.28 | 4.38 | 0.05
CONCRETE | 0.0099 | 0.011 | 0.078 | 0.0086 | 0.081
DEE | 1.013 | 0.17 | 0.63 | 0.608 | 1.512
DIABETES | 19.86 | 0.49 | 3.03 | 1.11 | 4.25
HOUSING | 43.26 | 57.68 | 80.20 | 74.38 | 56.49
FA | 1.95 | 0.02 | 0.11 | 0.14 | 0.19
MB | 3.39 | 2.16 | 0.06 | 0.055 | 0.061
MORTGAGE | 2.41 | 1.45 | 9.24 | 9.19 | 14.11
PU | 1.21 | 0.02 | 0.09 | 0.039 | 0.075
QUAKE | 0.04 | 0.071 | 0.06 | 0.041 | 0.298
TREASURY | 2.929 | 2.02 | 11.16 | 10.88 | 15.52
AVERAGE | 12.84 | 10.30 | 11.70 | 12.44 | 12.70
Table 5. Experimental results for the classification datasets using the proposed method. The numbers in the cells denote the average classification error as measured on the test set. The variable f corresponds to the number of artificial features created by the proposed method.

DATASET | RBF (f = 2) | GENETIC (f = 2) | RBF (f = 3) | GENETIC (f = 3) | RBF (f = 4) | GENETIC (f = 4)
APPENDICITIS | 15.40% | 14.33% | 16.90% | 15.77% | 15.97% | 17.30%
AUSTRALIAN | 15.49% | 14.48% | 14.53% | 15.33% | 14.75% | 15.97%
BALANCE | 16.67% | 2.89% | 22.54% | 4.94% | 17.26% | 4.62%
BANDS | 38.09% | 38.13% | 37.09% | 39.22% | 37.22% | 35.51%
DERMATOLOGY | 41.58% | 30.37% | 35.46% | 25.44% | 40.45% | 21.97%
HAYES ROTH | 37.41% | 27.92% | 38.10% | 25.74% | 39.59% | 25.82%
HEART | 21.53% | 17.13% | 17.64% | 16.87% | 19.63% | 15.69%
HOUSE VOTES | 6.36% | 3.78% | 7.17% | 3.25% | 4.25% | 3.52%
IONOSPHERE | 10.32% | 10.17% | 10.12% | 10.01% | 11.42% | 9.02%
LIVERDISORDER | 34.23% | 32.33% | 35.84% | 32.97% | 35.93% | 30.74%
LYMOGRAPHY | 34.93% | 28.67% | 32.00% | 23.00% | 29.00% | 23.83%
MAMMOGRAPHIC | 16.92% | 16.51% | 16.47% | 16.35% | 17.54% | 16.50%
PARKINSONS | 11.14% | 13.00% | 11.30% | 11.11% | 12.95% | 9.42%
PIMA | 22.85% | 22.76% | 24.93% | 24.67% | 24.25% | 24.20%
POPFAILURES | 7.32% | 7.41% | 6.96% | 7.62% | 5.96% | 5.67%
REGIONS2 | 28.52% | 26.84% | 24.91% | 25.28% | 25.35% | 24.93%
SAHEART | 29.29% | 28.63% | 28.92% | 30.31% | 28.25% | 30.25%
SEGMENT | 52.69% | 45.59% | 46.83% | 41.06% | 50.15% | 39.52%
WDBC | 5.00% | 4.66% | 5.76% | 4.97% | 5.13% | 3.84%
WINE | 8.92% | 7.22% | 6.76% | 5.75% | 6.00% | 5.86%
Z_F_S | 7.91% | 8.37% | 7.89% | 7.67% | 5.21% | 6.86%
ZO_NF_S | 6.90% | 6.85% | 6.95% | 5.65% | 6.24% | 5.28%
ZONF_S | 3.08% | 3.40% | 2.44% | 2.52% | 3.47% | 3.33%
ZOO | 26.47% | 7.83% | 31.73% | 10.03% | 28.70% | 11.57%
AVERAGE | 20.79% | 17.47% | 20.39% | 16.90% | 20.19% | 16.30%
Table 6. Experimental results on the regression datasets using the proposed method. The numbers in the cells denote the average regression error as measured on the test set. The variable f corresponds to the number of artificial features created by the proposed method.

DATASET | RBF (f = 2) | GENETIC (f = 2) | RBF (f = 3) | GENETIC (f = 3) | RBF (f = 4) | GENETIC (f = 4)
ABALONE | 4.361 | 3.518 | 4.159 | 3.839 | 4.859 | 3.786
AIRFOIL | 0.003 | 0.001 | 0.003 | 0.001 | 0.003 | 0.001
BASEBALL | 66.00 | 53.74 | 60.79 | 57.04 | 66.19 | 61.69
BK | 0.022 | 0.031 | 0.021 | 0.029 | 0.019 | 0.023
BL | 0.413 | 0.0001 | 0.019 | 0.007 | 0.043 | 0.011
CONCRETE | 0.008 | 0.006 | 0.007 | 0.005 | 0.008 | 0.004
DEE | 0.259 | 0.252 | 0.339 | 0.286 | 0.609 | 0.5
DIABETES | 0.611 | 0.832 | 0.634 | 1.411 | 0.857 | 1.157
HOUSING | 22.387 | 15.583 | 18.614 | 13.602 | 14.83 | 13.208
FA | 0.056 | 0.011 | 0.015 | 0.011 | 0.015 | 0.012
MB | 0.258 | 0.087 | 0.115 | 0.078 | 0.342 | 0.072
MORTGAGE | 0.621 | 0.046 | 0.65 | 0.037 | 0.078 | 0.04
PU | 2.894 | 0.14 | 0.936 | 0.029 | 0.724 | 0.031
QUAKE | 0.069 | 0.036 | 0.057 | 0.037 | 0.04 | 0.037
TREASURY | 0.912 | 0.088 | 0.874 | 0.084 | 0.173 | 0.076
AVERAGE | 6.59 | 4.96 | 5.82 | 5.10 | 5.92 | 5.38