Article

A Hyperparameter Self-Evolving SHADE-Based Dendritic Neuron Model for Classification

1 Graduate School of Technology, Industrial and Social Sciences, Tokushima University, Tokushima 770-8506, Japan
2 Department of Engineering, Wesoft Company Ltd., Kawasaki 210-0024, Japan
3 Advanced Institute of Industrial Technology, Tokyo 140-0011, Japan
4 College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China
5 Faculty of Engineering, University of Toyama, Toyama 930-8555, Japan
6 Industrial Collaborative Innovation Center, Linyi Vocational University of Science and Technology, Linyi 276000, China
* Authors to whom correspondence should be addressed.
Axioms 2023, 12(11), 1051; https://doi.org/10.3390/axioms12111051
Submission received: 19 September 2023 / Revised: 9 November 2023 / Accepted: 14 November 2023 / Published: 15 November 2023
(This article belongs to the Special Issue Mathematical Modelling of Complex Systems)

Abstract
In recent years, artificial neural networks (ANNs), which build on the foundational model established by McCulloch and Pitts in 1943, have been at the forefront of computational research. Despite their prominence, ANNs face a number of challenges, including hyperparameter tuning and the need for vast datasets. Because many strategies have focused predominantly on increasing the depth and intricacy of these networks, the processing capabilities of individual neurons are occasionally overlooked. Consequently, a biologically accurate dendritic neuron model (DNM) that mirrors the spatio-temporal features of real neurons was introduced. However, while the DNM shows outstanding performance in classification tasks, it struggles with complex parameter adjustments. In this study, we introduce the hyperparameters of the DNM into an evolutionary algorithm, transforming the setting of the DNM's hyperparameters from manual adjustment to adaptive adjustment as the algorithm iterates. The newly proposed framework represents a neuron that evolves alongside the iterations, thus simplifying the parameter-tuning process. Comparative evaluation on benchmark classification datasets from the UCI Machine Learning Repository indicates that our minor enhancements lead to significant improvements in the performance of the DNM, surpassing other leading-edge algorithms in both accuracy and efficiency. In addition, we analyzed the iterative process using complex networks, and the results indicate that the information interaction during the iteration and evolution of the DNM follows a power-law distribution. This finding offers insights for the study of neuron model training.

1. Introduction

In 1943, McCulloch and Pitts introduced a mathematical representation of neural activity, laying the foundation for ANNs [1]. Although ANNs have gained prominence, they come with challenges, such as hyperparameter tuning complexities, the necessity for abundant labeled training datasets, excessive model refinement, and inherent opaqueness stemming from their black-box nature [2,3]. Interestingly, these approaches often overlook the essence of deep neural networks: the efficient data processing capability of individual neurons. Instead, they frequently boost model performance through statistical theories or intricate learning strategies. Some researchers have also chosen to reconceptualize neuron models rather than merely deepening neural networks [4]. Furthermore, the McCulloch–Pitts neuron model, which solely represents connection strength between two neurons using a weight, has faced criticism for its simplification [5].
A real biological neuron possesses intricate spatial and temporal features. Drawing inspiration from the neuron's ability to process temporal information, a unique and biologically accurate dendritic neuron model (DNM) was proposed. This model stands out due to its distinct architecture and excitation functions, incorporating sigmoid functions for synaptic interactions and a multiplication operation to emulate dendritic interactions [6]. As a new model, there is ample scope for refining and improving the DNM. When classification problems are addressed through combinatorial optimization with algorithmically trained artificial neurons, utilizing more advanced algorithms in conjunction with sophisticated neuron models can substantially enhance performance. Recently, an upgraded iteration of the DNM has emerged as a promising choice for training neurons, demonstrating superior outcomes in classification tasks. Notably, among these advancements, the refined DNM-R model has achieved the highest classification accuracy when paired with the same algorithm [7]. Nonetheless, this optimization approach comes with a notable drawback: the DNM employed as the training model encompasses numerous parameters requiring adjustment. Typically, this involves a minimum of three parameters, namely, $k$, $q$, and $M$, which correspond to the amplification factor, the discrimination factor, and the number of dendritic branches, respectively. In prior investigations, $k$ and $q$ were often assigned 5 candidate values each for tuning experiments, while $M$ sometimes entailed as many as 20 candidate values. From the perspective of orthogonal experiments, this implies that parametric experiments alone would require a minimum of 500 repetitions. This imposes a substantial burden, both on research endeavors and in practical applications.
Therefore, it is imperative to introduce a methodology capable of adaptively optimizing the model parameters to facilitate improvements.
Given that the training issue associated with the neuron model may constitute an NP-hard problem [8], evolutionary algorithms (EAs) emerge as a potent solution. The genesis of EAs can be traced back to the genetic algorithm (GA) [9], which subsequently spurred the creation of various algorithms, including differential evolution (DE) [10] and the success-history-based parameter adaptation for differential evolution (SHADE) [11]. A key enhancement of DE over GA is its mutation strategy, which leverages differences among individuals instead of mere random variations. In contrast, SHADE refines the DE approach by differentially linking offspring to the optimal parent individuals. Moreover, parameters from top-performing individuals are preserved through iterations to guide the learning of subsequent generations. The efficacy of DE-derived algorithms has been well documented [12], with their enhanced versions frequently securing leading spots in the IEEE CEC contests [13].
Various adaptation strategies can be employed in optimization processes, including random variations in crossover and variance rates using probability distributions like normal and Cauchy distributions, a common practice in many differential evolution algorithms [14]. Additionally, some research endeavors have explored the utilization of fitness–distance balance strategies to adaptively fine-tune algorithmic parameters [15,16]. These investigations have, to varying degrees, shown that adaptive strategies not only obviate the need for meticulous parameter tuning but also contribute to enhanced algorithmic performance. The implementation of suitable adaptive strategies is particularly pertinent in the context of optimizing real-world problems, such as classification tasks. It is important to recognize that the training process of algorithms on artificial neurons essentially involves iterating their weights, constituting an adaptive process in itself. Consequently, the algorithm itself can be construed as an efficient adaptive instrument endowed with nonlinear properties, setting it apart from conventional mathematical techniques. Advanced algorithms are inherently designed to tackle intricate black-box problems, which typically necessitate robust exploitation and exploration capabilities. These attributes render them well suited for the adaptive adjustment of hyperparameters. In essence, it is advisable to treat the hyperparameters within artificial neurons as variables subject to iteration within the algorithmic framework, leveraging the algorithm’s evolutionary prowess to fine-tune these hyperparameters.
In this study, we have integrated the key hyperparameters of the DNM with SHADE. We have implemented an adaptive hyperparameter-adjustment approach, leveraging the inherent evolutionary capabilities of these algorithms. The resultant novel optimization framework is denoted as hyperparameter-tuning success-history-based parameter adaptation for differential evolution (HSHADE), with a type of neuron that can self-evolve as the algorithm iterates. We conducted a comprehensive evaluation of HSHADE using a benchmark comprising 10 real-world problems commonly employed for assessing algorithmic performance in classification tasks. Comparative analyses were performed against the original algorithm, well-established algorithms with a track record of effectiveness, and contemporary state-of-the-art algorithms within the same problem set. The findings conclusively demonstrate that HSHADE exhibits a notable advantage in terms of classification accuracy and significantly streamlines the parameter tuning process.
The main contributions of this study are summarized below:
(1)
HSHADE adaptively evolves the previously fixed hyperparameters of the DNM using evolutionary algorithms, thus reducing the tuning workload.
(2)
HSHADE achieves the same or better accuracy than the state-of-the-art algorithms on the same set of classification problems.
(3)
HSHADE maintains the fast problem-solving characteristics of a single neuron and also achieves very high accuracy.
(4)
The power-law distribution of information interaction networks observed during the iterative process of HSHADE provides new insights for future neural model training.
The remainder of the paper is structured as follows: the DNMs, EAs, and self-evolving DNM are formulated in Section 2. The experimental results are analyzed in Section 3. Section 4 presents the discussion and conclusions.

2. Materials and Methods

This section will cover the use of DNMs for classification problem processing as well as different types of EAs that can be used to optimize DNMs. We will first introduce the structure of DNMs and their learning process, then describe some of the EA steps, including SHADE, which is used in this paper to work in conjunction with DNMs, and finally present the concept of self-evolving DNMs.

2.1. Dendritic Neuron Model

Figure 1 depicts the entire integrated framework of the dendritic neuron model, which is very similar to the characteristics of biological neurons. A dendritic neuron can be divided into four different layers, with signals entering from the synaptic layer to the dendrite layer, and then conducting to the membrane layer, and finally the soma layer for output.
Meanwhile, Figure 2 depicts the DNM learning process, which includes five major steps: training algorithms, morphological conversion based on four connection states, dendritic pruning and its output, along with synaptic pruning and its output. Pruning can remove unnecessary dendrites and synapses, and it can also infer the number of dendrites in the presynaptic axon terminal and dendritic connection position, as well as how they are connected.

2.1.1. Synaptic Layer

Signals from axon terminals to dendrites are processed by the synaptic layer, and receptors on the postsynaptic cell take up the particular ions received; the potentials of these ions vary depending on the state of the synaptic connection, which can be inhibitory or excitatory. Here, $\{X_1, \ldots, X_i, \ldots, X_I\}$ $(i = 1, 2, \ldots, I)$ is the mathematical representation of the $I$ external inputs to the presynaptic axon terminal, and the function of the synaptic layer can be denoted by the following sigmoid function [17]:
$$Q_{ij} = \frac{1}{1 + e^{-k(\theta_{ij} x_i - p_{ij})}}. \quad (1)$$
In the above equation, there are two synaptic parameters: one is the synaptic weight $\theta_{ij}$, which represents the state of synaptic connectivity, either excitatory ($\theta_{ij} > 0$) or inhibitory ($\theta_{ij} < 0$); the other is the threshold parameter $p_{ij}$. Together, $\theta_{ij}$ and $p_{ij}$ control the dendritic and axonal morphology of the DNM, and both can be trained. In addition, $k$ represents the distance parameter. The entire formula $Q_{ij}$ represents the ability of the $j$th ($j = 1, 2, \ldots, J$) post-dendritic cell to distance itself from the $i$th presynaptic axon end.
As shown in Figure 2, by optimizing $\theta_{ij}$ and $p_{ij}$, the learning algorithm may imitate the synaptic plasticity mechanism. Furthermore, as demonstrated in Figure 3, learned synaptic plasticity can be realized by splitting synaptic connections into four different states, i.e., the four connection states show the neuron's morphology biologically by locally identifying the position of each dendritic and synaptic species in the process of morphology shift [18], which involves the following:
(1)
Constant-1 connection: the potential of the postsynaptic cell stays close to 1 despite the input varying between 0 and 1, when (i) $p_{ij} < 0 < \theta_{ij}$ or (ii) $p_{ij} < \theta_{ij} < 0$.
(2)
Constant-0 connection: the potential stays close to 0 despite the input varying between 0 and 1, when (i) $\theta_{ij} < 0 < p_{ij}$ or (ii) $0 < \theta_{ij} < p_{ij}$.
(3)
Excitatory connection: the potential is always proportional to the input signal as it varies between 0 and 1, as long as $0 < p_{ij} < \theta_{ij}$.
(4)
Inhibitory connection: the potential is always inversely proportional to the input signal as it varies between 0 and 1, as long as $\theta_{ij} < p_{ij} < 0$.
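The four connection states above can be read directly off a trained $(\theta_{ij}, p_{ij})$ pair. The following is a minimal Python sketch of that classification; the function and state names are illustrative, not taken from the paper's code:

```python
def connection_state(theta, p):
    """Classify a trained synapse (theta, p) into one of the four
    connection states; names are illustrative, not from the paper."""
    if (p < 0 < theta) or (p < theta < 0):
        return "constant-1"   # output stays near 1 for inputs in [0, 1]
    if (theta < 0 < p) or (0 < theta < p):
        return "constant-0"   # output stays near 0 for inputs in [0, 1]
    if 0 < p < theta:
        return "excitatory"   # output proportional to the input
    if theta < p < 0:
        return "inhibitory"   # output inversely proportional to the input
    return "undefined"        # boundary cases, e.g. theta == p
```

In the pruning step of Figure 2, synapses classified as constant-1 are candidates for removal (they do not affect the dendritic product), and a dendrite receiving any constant-0 synapse can be pruned entirely.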

2.1.2. Dendritic Layer

The dendritic layer employs multiplication, which has been considered the simplest nonlinear operation found in neurons [19] and is performed by each dendrite. In addition, the multiplication operator is equivalent to a logical AND operation when the dendritic layer receives constant-1 or constant-0 connections. As all signals transmitted by the synaptic layer will be received by the dendrites in the dendritic layer, the function of the $j$th dendrite $Y_j$ is as follows:
$$Y_j = \prod_{i=1}^{I} Q_{ij}. \quad (2)$$

2.1.3. Membrane Layer

Then, the membrane layer captures all signals from the dendritic layer. Simultaneously, an accumulation operator similar to a logical OR operation is applied to obtain the membrane potential $V$, and the summed signal is sent to the soma to stimulate the neuron. The equation is as follows:
$$V = \sum_{j=1}^{J} Y_j. \quad (3)$$

2.1.4. Soma Layer

Finally, the soma applies a sigmoid function to the membrane potential $V$, with the goal of determining whether or not the neuron fires based on the overall model output. Furthermore, $q$ is the firing parameter, ranging between 0 and 1, indicating the soma's threshold. The formula is as follows:
$$W = \frac{1}{1 + e^{-k(V - q)}}. \quad (4)$$
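Putting the four layers together, the DNM forward pass is compact enough to sketch in a few lines. This is an illustrative reimplementation under our own naming conventions (the authors' code is in MATLAB), not the published implementation:

```python
import numpy as np

def dnm_forward(x, theta, p, k, q):
    """Forward pass of the dendritic neuron model, Equations (1)-(4).
    x: (I,) input vector; theta, p: (I, J) synaptic weights and
    thresholds; k, q: scalar hyperparameters."""
    # Synaptic layer: sigmoid response of each input on each dendrite.
    Q = 1.0 / (1.0 + np.exp(-k * (theta * x[:, None] - p)))  # (I, J)
    # Dendritic layer: multiplication (soft logical AND) per branch.
    Y = Q.prod(axis=0)                                       # (J,)
    # Membrane layer: accumulation (soft logical OR) over branches.
    V = Y.sum()
    # Soma layer: sigmoid firing decision.
    return 1.0 / (1.0 + np.exp(-k * (V - q)))
```

With $J = M$ dendritic branches, the trainable parameters are exactly the $2 \cdot I \cdot J$ entries of `theta` and `p`, which is the learning space an EA must search.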

2.2. Evolutionary Algorithms

Evolutionary algorithms (EAs) usually include population initialization, search operation, evaluation, and selection operation. In this section, these steps are described in detail, using SHADE as an example.
The assignment of initial parameters and the process of generating initial individuals in the solution space constitute the initialization operation. $X_i$ represents the initialized individuals, $i = 1, 2, \ldots, G$, where $G$ represents the population size of SHADE. SHADE is an improved DE algorithm: it updates the values of $F$ and $CR$ by sampling from the Cauchy and normal distributions, respectively, while keeping track of a history of successful parameters.
$$F_i = C(M_{F,r},\, 0.1), \qquad CR_i = N(M_{C,r},\, 0.1), \quad (5)$$
where $C$ and $N$ denote the Cauchy and normal distributions, and $M_{F,r}$ and $M_{C,r}$ denote the memory archives of $F$ and $CR$. These adaptive parameters balance the exploitation and exploration capabilities of DE, allowing it to achieve excellent performance on the standard test set. The memory archive is updated in Equation (6):
$$M_{F,r} = (1 - c) \cdot M_{F,r} + c \cdot \mathrm{mean}_L(F_1, F_2, \ldots, F_G), \qquad M_{C,r} = (1 - c) \cdot M_{C,r} + c \cdot \mathrm{mean}_A(CR_1, CR_2, \ldots, CR_G), \quad (6)$$
where $c$ is a random number in $[0.05, 0.2]$ and $\mathrm{mean}_A$ is the arithmetic mean. The Lehmer mean $\mathrm{mean}_L$ is computed as follows:
$$\mathrm{mean}_L(F_1, F_2, \ldots, F_G) = \frac{\sum_{i=1}^{G} F_i^2}{\sum_{i=1}^{G} F_i}. \quad (7)$$
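The sampling and memory update above can be sketched as follows. The truncation of $F$ to $(0, 1]$ and clipping of $CR$ to $[0, 1]$ are standard SHADE conventions that we assume here; the names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_F_CR(M_F, M_CR, r):
    """Sample one individual's F and CR around memory slot r.
    A sketch assuming standard SHADE handling of out-of-range draws."""
    # F ~ Cauchy(M_F[r], 0.1) via the inverse CDF; redraw while F <= 0.
    F = M_F[r] + 0.1 * np.tan(np.pi * (rng.random() - 0.5))
    while F <= 0:
        F = M_F[r] + 0.1 * np.tan(np.pi * (rng.random() - 0.5))
    F = min(F, 1.0)
    # CR ~ N(M_CR[r], 0.1), clipped into [0, 1].
    CR = float(np.clip(rng.normal(M_CR[r], 0.1), 0.0, 1.0))
    return F, CR

def lehmer_mean(F_success):
    """mean_L of the successful F values: sum(F^2) / sum(F)."""
    F = np.asarray(F_success, dtype=float)
    return float((F ** 2).sum() / F.sum())
```

The Lehmer mean weights large successful $F$ values more heavily than the arithmetic mean would, which counteracts the downward drift that plain averaging of Cauchy samples introduces.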

2.2.1. Search Operation

The following equation shows the search operation of SHADE, where $t$ denotes the number of iterations, and $r_1$ and $r_2$ are two numbers chosen at random from $\{1, 2, \ldots, G\}$ that are distinct from each other and from $i$. $X_\alpha^t$ is a top-ranking member of the parent population. The random individual in the parent population is represented by $X_{r_1}^t$. $Y_{r_2}^t$ is an individual chosen at random from the union of the present and older populations. The adaptation parameters are $F_i$ and $CR_i$.
$$V_{i,j}^t = \begin{cases} X_{i,j}^t + F_i \cdot \left( X_{\alpha,j}^t - X_{i,j}^t + X_{r_1,j}^t - Y_{r_2,j}^t \right), & \text{if } \mathrm{rand}(0,1) < CR_i \text{ or } j = j_{rand}, \\ X_{i,j}^t, & \text{otherwise}, \end{cases} \quad (8)$$
where $V_i^t$ is a temporary offspring individual whose retention in the new population requires selection through an evaluation operation, and $X_i^t$ is the parent individual. $j_{rand}$ is an integer, serving as a safeguard in instances where all random values exceed $CR_i$. Its primary function is to guarantee at least a single information exchange, thus optimizing the utilization of computational resources and preventing wastage.
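The search operation (current-to-pbest-style mutation followed by binomial crossover with the $j_{rand}$ safeguard) can be sketched as below. The archive handling and the top-ranking fraction `p_best` are our assumptions based on standard SHADE, not details taken from this paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def shade_trial(X, Y_arch, fitness, i, F, CR, p_best=0.1):
    """Build one trial vector for individual i (a sketch).
    X: (G, D) parent population; Y_arch: population plus archive;
    fitness: (G,) parent fitness values (lower is better)."""
    G, D = X.shape
    # X_alpha: a random pick among the top-ranking parents.
    top = np.argsort(fitness)[: max(1, int(p_best * G))]
    alpha = rng.choice(top)
    # r1 from the population, r2 from population-plus-archive,
    # all distinct from i and from each other.
    r1 = rng.choice([j for j in range(G) if j != i])
    r2 = rng.choice([j for j in range(len(Y_arch)) if j not in (i, r1)])
    V = X[i] + F * (X[alpha] - X[i] + X[r1] - Y_arch[r2])
    # Binomial crossover: take V where rand < CR, plus a guaranteed
    # j_rand dimension; otherwise keep the parent's component.
    mask = rng.random(D) < CR
    mask[rng.integers(D)] = True   # the j_rand safeguard
    return np.where(mask, V, X[i])
```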

2.2.2. Evaluation Operation

The next step is the evaluation operation, in which the fitness values of all $V_i^t$ are evaluated, and the individuals that outperform $X_i^t$ are retained in the next generation by comparing $f(V_i^t)$ to $f(X_i^t)$ using a one-to-one greedy selection strategy. The evaluation function $f(\cdot)$ evaluates each individual's fitness in the population. The selection operation is formulated as follows:
$$X_i^{t+1} = \begin{cases} V_i^t, & \text{if } f(V_i^t) < f(X_i^t), \\ X_i^t, & \text{otherwise}, \end{cases} \quad (9)$$
where $X_i^{t+1}$ is the individual retained for the next generation.
After the algorithm completes the selection operation, it proceeds to the subsequent iterative search operation.

2.3. Self-Evolving DNM

Unlike the weight parameters, the hyperparameters within the DNM framework bear practical similarities to the concept of a learning rate. Typically, these hyperparameters remain constant throughout the training process, with their values determined through repeated experimentation. It is worth noting that these hyperparameters often possess specific tuning ranges, and values falling outside these predefined ranges can significantly diminish the neuron's problem-solving capacity. Hence, it is entirely feasible to treat hyperparameters as variables and incorporate them as dimensions within the iterative process of the algorithm. In general, the value of $q$ in the DNM is in the range $[0, 1]$, and $k$ is used twice in the two sigmoid functions in the range $[0, 20]$. Given that SHADE employs upper and lower bounds of $[-1, 1]$ for training the DNM optimization in classification problems, the newly introduced variables will also be normalized. This normalization is implemented to enhance their alignment with the algorithm. The value of $k$ is scaled up by a factor of ten after each iteration to satisfy its value domain. Therefore, the $i$th individual of HSHADE is $\{X_{i,1}, X_{i,2}, \ldots, X_{i,dim}, X_{i,dim+1}, X_{i,dim+2}\}$, where $dim$ is the number of dimensions. Before solving the problem using the DNM, its parameters and hyperparameters are determined as follows:
$$\theta_{i,j} = X_{i,j}, \quad 1 \le j \le \tfrac{dim}{2}; \qquad p_{i,j} = X_{i,\,\frac{dim}{2}+j}, \quad 1 \le j \le \tfrac{dim}{2}; \qquad q_i = X_{i,\,dim+1}; \qquad k_i = 10 \cdot X_{i,\,dim+2}. \quad (10)$$
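Assuming $dim = 2 \cdot I \cdot J$ (one weight and one threshold per synapse), the decoding of one HSHADE individual into DNM parameters can be sketched as:

```python
import numpy as np

def decode_individual(x, I, J):
    """Split one HSHADE individual of length dim + 2 into DNM
    parameters. A sketch: the dim = 2*I*J layout (theta first,
    then p) is our assumption about the encoding."""
    dim = 2 * I * J
    theta = x[: dim // 2].reshape(I, J)   # synaptic weights
    p = x[dim // 2 : dim].reshape(I, J)   # synaptic thresholds
    q = x[dim]                             # soma threshold (normalized)
    k = 10.0 * x[dim + 1]                  # rescaled into its [0, 20] domain
    return theta, p, q, k
```

Because $q$ and $k$ are just two extra entries of the search vector, SHADE evolves them with the same mutation and selection machinery it applies to the weights, which is the essence of the self-evolving neuron.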
Altering the number of branches $M$ will shift the dimensionality of the algorithmic solution process, and employing variable-dimensional matrix operations typically necessitates specialized encoding techniques. Currently, extant variable-dimension algorithms exhibit subpar performance when optimizing continuous problems. Consequently, there is as yet no effective solution for the adaptive adjustment of $M$. The complete execution flow of HSHADE is demonstrated in Algorithm 1.
Algorithm 1: Pseudocode of HSHADE.

3. Results

In the experiment, the number of iterations is 30,000. All experiments were performed using MATLAB on a PC with a 3.00 GHz Intel(R) Core(TM) i7-9700 CPU and 36 GB of RAM. All experiments were run independently 30 times. The associated code will be publicly available as detailed in the Data Availability Statement. Additionally, a Python-based implementation will be developed in subsequent work.

3.1. Dataset Description and Parameter Settings

To evaluate the efficacy of the introduced HSHADE in training the DNM, we utilized ten commonly employed classification datasets from the UCI Machine Learning Repository: Tic-tac-toe, Heart, Australia, Congress, Vote, Spect, German, Breast, Ionosphere, and KrVsKpEW. Table 1 provides an overview of each dataset, detailing the number of attributes (inclusive of the class label), the total number of samples, and the DNM learning space dimension for each respective problem. It is important to emphasize that, in the table, $k$ and $q$ are hyperparameters that need to be adjusted when training the DNM using traditional algorithms. However, when training the DNM with HSHADE, only the hyperparameter $M$ requires adjustment; in this study, $M$ is an integer ranging from 1 to 20. In addition, obtaining the optimal hyperparameter combinations of the DNM in Table 1 for the various datasets requires extensive parameter discussions [5], which significantly consumes valuable computational resources.

3.2. Evaluation Criteria

The following assessment tools were used to gauge how well the algorithms performed:
(1)
Symbols “+”, “=”, and “−”: The notation “+” is used when HSHADE outperforms other EAs, while “=” denotes comparable performance. If HSHADE is inferior, this is represented by “−”. The evaluation is grounded in the outcomes of the Wilcoxon rank-sum test.
(2)
Wilcoxon rank-sum test (significance threshold $p < 0.05$): This non-parametric test assesses the null hypothesis that two sample sets are derived from an identical population. After 30 independent optimization runs, this test was applied to discern differences between HSHADE's results and those of other meta-heuristics. The distinctions are symbolized by $W/T/L$. Here, $W$ represents the count of functions where HSHADE notably surpasses other methods, $T$ denotes those where HSHADE's performance aligns with other methods, and $L$ marks functions where HSHADE lags behind.
(3)
Box-and-whisker plots: The top line illustrates the peak value, whereas the line at the base signifies the least value. The box’s upper and lower boundaries correspond to the third and the first quartiles, respectively. The central red line marks the median, with the red “+” symbol highlighting outliers. A wider gap between maximal and minimal values suggests more pronounced algorithmic variability.
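The rank-sum comparison above can be reproduced with a few lines of stdlib Python using the normal approximation (a sketch with midranks for ties and no tie correction; the authors presumably used a statistics package):

```python
import math

def rank_sum_test(a, b):
    """Wilcoxon rank-sum z statistic and two-sided p-value for two
    samples, via the normal approximation. Illustrative helper,
    not the authors' code."""
    n1, n2 = len(a), len(b)
    pooled = sorted(list(a) + list(b))

    def midrank(v):
        # Average 1-based rank of v's duplicates in the pooled sample.
        lo = pooled.index(v)
        hi = lo + pooled.count(v) - 1
        return (lo + hi) / 2 + 1

    R1 = sum(midrank(v) for v in a)          # rank sum of sample a
    mu = n1 * (n1 + n2 + 1) / 2              # mean under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (R1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value
    return z, p
```

A "+" is then recorded when $p < 0.05$ and HSHADE's accuracies rank higher, "−" when they rank lower, and "=" otherwise.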

3.3. Experimental Setup

Numerous optimization algorithms have been introduced for DNM training. In this study, we evaluate the performance of the proposed HSHADE in relation to its contemporaries. This includes IFDE [5], the winning algorithm for DNM training; L-SHADE [20], the top-performing algorithm in IEEE CEC2014; CJADE [21], a cutting-edge DE variant; SCJADE [22], an enhanced version of CJADE; BBO [23], known for its robust performance with the DNM; SASS [24], a newly introduced meta-heuristic; and BP [25]. Each algorithm was executed independently 30 times on an identical computer setup, adhering to the parameter specifications suggested in their respective papers. It is noteworthy that the dataset was split with 70% used for training and the remaining 30% designated for testing.

3.4. Performance Comparison of EA-Trained DNMs

In this part, we compare the performance of DNMs trained by EAs on the classification problem; Table 2 summarizes the results. The initial comparison involves BP, given that backpropagation coupled with gradient descent stands as the most frequently employed algorithm for neural network training. BP demonstrates a pronounced advantage in SpectEW's classification task, potentially attributable to its capacity for rapid convergence towards local optima. However, within a limited number of solutions, meta-heuristic algorithms strive for enhanced global-optimum-solving capabilities, leading to HSHADE significantly outperforming BP across multiple problems. HSHADE exhibits substantial superiority over both conventional algorithms and their enhanced counterparts. CJADE and SCJADE have previously been acknowledged as effective DNM training algorithms, boasting higher accuracy than most MHAs on a majority of problems; HSHADE possesses a substantial and decisive advantage over them. In a similar vein, HSHADE demonstrates enhanced performance when pitted against state-of-the-art algorithms such as L-SHADE and SASS. In past studies, SHADE's performance lagged significantly behind these two algorithms, underscoring the effectiveness of our improvements, which enable HSHADE's impressive performance. Ultimately, HSHADE even surpasses the newest and most efficient IFDE, signifying that it not only conserves substantial computational resources in terms of parameter tuning but also secures the top position in accuracy among contemporary training algorithms.
Table 3 shows the performance of DNMs trained by HSHADE on the classification problem. Those results marked in bold represent the outcomes with the highest average accuracy in the given problem. It is evident that the variation in the number of branches has a noticeable impact on the results. Therefore, conducting experiments to explore the effects of different branch numbers remains essential. The values of M employed by HSHADE represent the optimal outcomes achieved through these twenty experiments.
Table 4 shows the ablation experiment of HSHADE, where SHADE-R denotes that the original SHADE was used to train DNM-R, and SHADE-A denotes that HSHADE was used to train the original DNM. SHADE represents the result of training the original DNM using the original SHADE. Figure 4 and Figure 5 show the box and convergence graphs of the ablation experiment of HSHADE. First and foremost, when examining the convergence graph, it becomes evident that HSHADE outperforms the other three approaches significantly in terms of its convergence capability. It excels in its ability to discover globally optimal solutions, consistently approaching closer results when the other methods begin to stagnate. However, it is important to note that in classification problems, the issue of overfitting arises. Thus, a better fit during training does not necessarily translate to higher accuracy during testing. To account for this, we generated box plots of all the test results. For instance, in the Tic-tac-toe problem, HSHADE not only exhibits the highest median accuracy but also achieves the highest accuracy in a single instance. On the other hand, in the Vote problem, while HSHADE's median accuracy falls slightly below that of SHADE-A, its optimal solution demonstrates significantly higher accuracy than the other methods. This underscores the practical value of HSHADE, as it consistently possesses the potential to achieve the best solution. The difference between SHADE's results in training DNM and DNM-R is not particularly pronounced, and its level of accuracy varies across different problems. However, SHADE-A, which relies solely on hyperparameter adaptation, exhibits a significant drop in performance on certain problems. Notably, it produces entirely incorrect training results for the Ionosphere problem.
This issue arises from the introduction of two new dimensions, which significantly expands the algorithm’s search space and adversely affects its original convergence capability. Therefore, HSHADE should be used in conjunction with DNM-R. DNM-R, with its signal amplification capability, outperforms the DNM in terms of problem sensitivity, thereby strengthening the algorithm’s upper bound of convergence ability. On the other hand, the adaptive tuning of hyperparameters in HSHADE enhances the model’s feature resolution compared to setting parameters with a single decimal place. In summary, the experimental data strongly support the effectiveness and indispensability of HSHADE’s improvements, and HSHADE consistently demonstrates excellent performance.

3.5. Analysis of Population Interaction Network Fitting Results for HSHADE

Table 5 shows the population interaction network fitting results for HSHADE. For the specific fitting process, refer to our previous research [26]. SSE stands for the sum of squared errors and $R^2$ for the coefficient of determination. In general, the smaller the SSE and the larger the $R^2$, the greater the similarity to that distribution. The results indicate that during the iteration process of HSHADE, the network formed by information interaction aligns most closely with the characteristics of a power-law distribution. This is consistent with the findings from our study on predicting time series with the DNM [27]. These results suggest that a power-law-distributed information interaction network has a positive impact on the training process of the DNM.
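As a rough illustration of such a fit, a power law can be checked by linear regression in log-log space over the network's degree distribution; this is a generic sketch, not the exact procedure of [26]:

```python
import numpy as np

def powerlaw_fit_r2(degrees):
    """Fit P(k) ~ k^(-gamma) to a degree sequence (all degrees >= 1)
    by least squares in log-log space; returns (gamma, SSE, R2).
    Illustrative only, not the cited fitting procedure."""
    ks, counts = np.unique(np.asarray(degrees), return_counts=True)
    pk = counts / counts.sum()                 # empirical P(k)
    x, y = np.log(ks), np.log(pk)              # log-log coordinates
    slope, intercept = np.polyfit(x, y, 1)     # linear fit
    y_hat = slope * x + intercept
    sse = float(((y - y_hat) ** 2).sum())      # sum of squared errors
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - sse / ss_tot if ss_tot > 0 else 1.0
    return -slope, sse, r2
```

A degree sequence that follows a power law closely yields an $R^2$ near 1 and an SSE near 0, matching the reading of Table 5 described above.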

4. Discussions and Conclusions

In the realm of artificial neural networks, the deep understanding and optimization of individual neuron dynamics have always held profound significance. The DNM is one such example that has caught the attention of the research community, particularly due to its unique architectural nuances and data processing capabilities. Despite its promising features, the DNM poses certain complexities, especially when it comes to parameter tuning. Recognizing this challenge, this study has pioneered the development of HSHADE—an innovative approach that synergistically marries the evolutionary prowess of the SHADE algorithm with the DNM’s hyperparameters. This integration heralds a new generation of neurons, ones that are characterized by their ability to self-evolve during algorithmic iterations.
The comparative evaluations and benchmarks indicate that the HSHADE framework significantly outperforms its contemporaries, particularly in terms of classification accuracy. Moreover, it presents a more streamlined and efficient method for parameter tuning, thereby addressing one of the primary challenges associated with the DNM. However, as with most pioneering endeavors, the HSHADE is not without its limitations. A salient challenge that still remains is the auto-tuning of the parameter M, which corresponds to the count of dendritic branches in the neuron model. The crux of the challenge lies in the fact that any change in M results in variations in the computational dimensions, making matrix operations unfeasible. This, in essence, implies that while the HSHADE framework has made significant strides in auto-tuning certain parameters, automation of the M parameter remains elusive.
As we gaze into the future of this research, the overarching goal will pivot towards devising methodologies that can seamlessly and effectively auto-tune the M parameter. As previously highlighted, variations in M result in non-uniform dimensions among individuals within the population. In this context, the problem being optimized is termed a metameric variable-length problem [28]. Contemporary mainstream algorithms rely on matrix operations to generate new individuals, making them incompatible with the metameric variable-length problem setting. Essentially, this study serves as an intermediary step. Our primary objective was to incorporate all parameters (including M) into the individuals of the evolutionary algorithm, facilitating the self-evolution of these parameters. However, only the self-evolution of k and q was successfully achieved. Tackling this challenge would not only enhance the efficacy of the HSHADE framework but also further solidify its position as a game-changer in the domain of artificial neuron modeling and optimization.
In terms of threats to validity, the sole variable examined in this study is the integration of hyperparameters into the algorithm's iterative process, with all other experimental conditions held constant; consequently, there are no discernible threats to internal validity at present. Nonetheless, given that our datasets were confined to the UCI Machine Learning Repository, future studies should broaden the spectrum of datasets used to address potential threats to external validity.

Author Contributions

Conceptualization, H.Y. and Y.Z.; methodology, Y.Z. and H.Y.; software, H.Y. and Y.Y.; validation, Y.Z., Y.Y. and W.X.; formal analysis, Y.Y. and Y.Z.; investigation, Y.Z.; resources, H.Y.; data curation, C.Z. and W.X.; writing—original draft preparation, Y.Z.; writing—review and editing, Z.Z.; visualization, Y.Z.; supervision, C.Z.; project administration, H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work is partially supported by the Tokushima University Tenure-Track Faculty Development Support System, Tokushima University, Japan.

Data Availability Statement

Data will be made available upon request via the corresponding author's email address. The source code of the model will be made public at https://velvety-frangollo-5d54c2.netlify.app/ (accessed on 20 November 2023).

Conflicts of Interest

The authors declare no conflict of interest. Notably, author Yuxin Zhang is affiliated with Wesoft Company Ltd., but this affiliation does not pose a conflict regarding the research and content of this manuscript.

Abbreviations

The following abbreviations are used in this manuscript:
ANN: Artificial neural network
BBO: Biogeography-based optimization
BP: Backpropagation
CJADE: Chaotic-local-search-based differential evolution algorithms
DE: Differential evolution
DNM: Dendritic neuron model
DNM-R: DNM with a receptor function
EA: Evolutionary algorithm
GA: Genetic algorithm
HSHADE: Hyperparameter-tuning SHADE
IFDE: Information feedback-enhanced DE
L-SHADE: SHADE with linear population size reduction
MHA: Meta-heuristic algorithm
PIN: Population interaction network
SASS: Spherical search algorithm
SCJADE: Improved variant of CJADE
SHADE: Success-history based parameter adaptation for DE
SSE: Standardized squared error

References

1. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133.
2. Stanley, K.O.; Clune, J.; Lehman, J.; Miikkulainen, R. Designing neural networks through neuroevolution. Nat. Mach. Intell. 2019, 1, 24–35.
3. Townsend, J.; Chaton, T.; Monteiro, J.M. Extracting Relational Explanations from Deep Neural Networks: A Survey from a Neural-Symbolic Perspective. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 3456–3470.
4. Ostojic, S.; Brunel, N. From spiking neuron models to linear-nonlinear models. PLoS Comput. Biol. 2011, 7, e1001056.
5. Xu, Z.; Wang, Z.; Li, J.; Jin, T.; Meng, X.; Gao, S. Dendritic neuron model trained by information feedback-enhanced differential evolution algorithm for classification. Knowl. Based Syst. 2021, 233, 107536.
6. Todo, Y.; Tamura, H.; Yamashita, K.; Tang, Z. Unsupervised learnable neuron model with nonlinear interaction on dendrites. Neural Netw. 2014, 60, 96–103.
7. Yang, Y.; Li, X.; Li, H.; Zhang, C.; Todo, Y.; Yang, H. Yet Another Effective Dendritic Neuron Model Based on the Activity of Excitation and Inhibition. Mathematics 2023, 11, 1701.
8. Šíma, J. Training a Single Sigmoidal Neuron Is Hard. Neural Comput. 2002, 14, 2709–2728.
9. Holland, J.H. Genetic algorithms. Sci. Am. 1992, 267, 66–73.
10. Storn, R.; Price, K. Differential evolution—A simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 1997, 11, 341–359.
11. Tanabe, R.; Fukunaga, A. Success-history based parameter adaptation for Differential Evolution. In Proceedings of the 2013 IEEE Congress on Evolutionary Computation, Cancun, Mexico, 20–23 June 2013; pp. 71–78.
12. Das, S.; Suganthan, P.N. Differential Evolution: A Survey of the State-of-the-Art. IEEE Trans. Evol. Comput. 2011, 15, 4–31.
13. Awad, N.H.; Ali, M.Z.; Suganthan, P.N.; Reynolds, R.G. An ensemble sinusoidal parameter adaptation incorporated with L-SHADE for solving CEC2014 benchmark problems. In Proceedings of the 2016 IEEE Congress on Evolutionary Computation (CEC), Vancouver, BC, Canada, 24–29 July 2016; pp. 2958–2965.
14. Zhang, J.; Sanderson, A.C. JADE: Adaptive Differential Evolution with Optional External Archive. IEEE Trans. Evol. Comput. 2009, 13, 945–958.
15. Peng, L.; Liu, S.; Liu, R.; Wang, L. Effective long short-term memory with differential evolution algorithm for electricity price prediction. Energy 2018, 162, 1301–1314.
16. Yang, H.; Tao, S.; Zhang, Z.; Cai, Z.; Gao, S. Spatial information sampling: Another feedback mechanism of realising adaptive parameter control in meta-heuristic algorithms. Int. J. Bio-Inspired Comput. 2022, 19, 48–58.
17. Poirazi, P.; Brannon, T.; Mel, B.W. Pyramidal neuron as two-layer neural network. Neuron 2003, 37, 989–999.
18. Gao, S.; Zhou, M.; Wang, Y.; Cheng, J.; Yachi, H.; Wang, J. Dendritic neuron model with effective learning algorithms for classification, approximation, and prediction. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 601–614.
19. Koch, C.; Poggio, T. Multiplying with synapses and neurons. In Single Neuron Computation; Elsevier: Amsterdam, The Netherlands, 1992; pp. 315–345.
20. Tanabe, R.; Fukunaga, A.S. Improving the search performance of SHADE using linear population size reduction. In Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC), Beijing, China, 6–11 July 2014; pp. 1658–1665.
21. Gao, S.; Yu, Y.; Wang, Y.; Wang, J.; Cheng, J.; Zhou, M. Chaotic Local Search-Based Differential Evolution Algorithms for Optimization. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 3954–3967.
22. Xu, Z.; Gao, S.; Yang, H.; Lei, Z. SCJADE: Yet Another State-of-the-Art Differential Evolution Algorithm. IEEJ Trans. Electr. Electron. Eng. 2021, 16, 644–646.
23. Simon, D. Biogeography-Based Optimization. IEEE Trans. Evol. Comput. 2008, 12, 702–713.
24. Kumar, A.; Misra, R.K.; Singh, D.; Mishra, S.; Das, S. The spherical search algorithm for bound-constrained global optimization problems. Appl. Soft Comput. 2019, 85, 105734.
25. Todo, Y.; Tang, Z.; Todo, H.; Ji, J.; Yamashita, K. Neurons with multiplicative interactions of nonlinear synapses. Int. J. Neural Syst. 2019, 29, 1950012.
26. Yang, Y.; Tao, S.; Yang, H.; Yuan, Z.; Tang, Z. Dynamic Complex Network, Exploring Differential Evolution Algorithms from Another Perspective. Mathematics 2023, 11, 2979.
27. Zhang, Y.; Yang, Y.; Li, X.; Yuan, Z.; Todo, Y.; Yang, H. A dendritic neuron model optimized by meta-heuristics with a power-law-distributed population interaction network for financial time-series forecasting. Mathematics 2023, 11, 1251.
28. Ryerkerk, M.L.; Averill, R.C.; Deb, K.; Goodman, E.D. Solving metameric variable-length optimization problems using genetic algorithms. Genet. Program. Evolvable Mach. 2017, 18, 247–277.
Figure 1. The structure of dendritic neuron model.
Figure 2. The learning process of a DNM.
Figure 3. Connection cases of the synaptic layer.
Figure 4. Convergence graphs.
Figure 5. Box graphs.
Table 1. Experimental datasets.

Dataset | Attributes | Instances | M | k | q | Dimensions of Learning Space
Tic-tac-toe | 9 | 958 | 10 | 5 | 0.9 | 180
Heart | 13 | 270 | 5 | 5 | 0.7 | 130
Australia | 14 | 690 | 5 | 20 | 0.3 | 140
Congress | 16 | 435 | 3 | 20 | 0.1 | 96
Vote | 16 | 300 | 10 | 15 | 0.3 | 320
Spect | 22 | 267 | 20 | 5 | 0.1 | 880
German | 24 | 1000 | 10 | 10 | 0.1 | 480
Breast | 30 | 568 | 3 | 5 | 0.5 | 180
Ionosphere | 34 | 351 | 3 | 5 | 0.5 | 204
KrVsKpEW | 36 | 3196 | 20 | 5 | 0.1 | 1440
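A pattern worth noting: the last column of Table 1 appears consistent with each attribute-branch synapse contributing two trainable parameters (a weight and a threshold) in the DNM's synaptic layer, giving a learning-space dimension of 2 × attributes × M. A quick check over the table's rows:

```python
# Verify that every row of Table 1 satisfies dimensions = 2 * attributes * M,
# consistent with each synapse holding two trainable parameters (a weight w
# and a threshold q).
table1 = {
    # dataset: (attributes, M, dimensions of learning space)
    "Tic-tac-toe": (9, 10, 180),
    "Heart": (13, 5, 130),
    "Australia": (14, 5, 140),
    "Congress": (16, 3, 96),
    "Vote": (16, 10, 320),
    "Spect": (22, 20, 880),
    "German": (24, 10, 480),
    "Breast": (30, 3, 180),
    "Ionosphere": (34, 3, 204),
    "KrVsKpEW": (36, 20, 1440),
}
for name, (attrs, m, dim) in table1.items():
    assert dim == 2 * attrs * m, name
print("dimensions = 2 * attributes * M holds for all ten datasets")
```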
Table 2. Performance of DNMs trained by EAs on the classification problem.

Dataset | HSHADE | IFDE | L-SHADE | SCJADE | CJADE | SASS | BBO | BP
Australia | 86.05% | 85.64% + | 84.73% + | 65.19% + | 68.37% + | 85.86% + | 83.90% + | 83.64% +
BreastEW | 88.59% | 93.82% | 84.72% + | 64.00% + | 63.53% + | 88.00% + | 92.27% − | 63.53% +
CongressEW | 95.51% | 97.03% | 84.73% + | 82.15% + | 79.28% + | 96.92% − | 95.41% + | 89.74% +
German | 74.80% | 76.38% | 74.72% + | 35.82% + | 35.12% + | 75.82% − | 70.60% + | 40.44% +
Heart | 93.57% | 86.29% + | 67.60% + | 77.32% + | 77.65% + | 92.88% + | 89.98% + | 94.41%
Ionosphere | 96.49% | 94.22% + | 96.31% + | 17.88% + | 17.88% + | 94.15% + | 94.19% + | 17.88% +
KrVsKpEW | 90.17% | 79.75% + | 91.58% | 48.34% + | 47.89% + | 80.38% + | 75.33% + | 47.26% +
SpectEW | 77.40% | 81.82% − | 40.84% + | 89.11% − | 84.21% − | 79.06% − | 80.27% − | 88.56%
Tic-tac-toe | 82.13% | 78.54% + | 81.24% + | 61.91% + | 59.20% + | 77.26% + | 74.22% + | 64.13% +
Vote | 93.25% | 92.56% + | 47.26% + | 82.11% + | 77.36% + | 93.06% + | 92.19% + | 84.36% +
W/T/L | -/-/- | 6/0/4 | 9/0/1 | 9/0/1 | 10/0/0 | 5/2/3 | 7/1/2 | 8/0/2
Table 3. Performance of DNMs trained by HSHADE on the classification problem.

Branches (M) | Australia | BreastEW | CongressEW | German | Heart | Ionosphere | KrVsKpEW | SpectEW | Tic-tac-toe | Vote
1 | 86.05% | 66.90% | 95.15% | 74.80% | 86.95% | 96.49% | 90.17% | 77.40% | 60.68% | 93.08%
2 | 85.85% | 71.39% | 94.97% | 74.20% | 90.16% | 96.11% | 90.14% | 77.17% | 67.83% | 93.25%
3 | 85.28% | 74.45% | 95.51% | 73.80% | 92.21% | 95.74% | 89.68% | 77.17% | 74.50% | 92.47%
4 | 85.27% | 73.06% | 95.05% | 73.14% | 92.90% | 95.50% | 89.35% | 77.02% | 78.62% | 92.69%
5 | 85.07% | 76.45% | 95.26% | 73.12% | 93.22% | 95.19% | 89.73% | 77.08% | 78.73% | 92.44%
6 | 85.19% | 80.45% | 95.23% | 72.97% | 93.50% | 94.86% | 89.11% | 75.92% | 80.34% | 93.06%
7 | 85.01% | 78.51% | 95.23% | 72.50% | 93.54% | 95.43% | 89.07% | 76.47% | 80.68% | 92.94%
8 | 85.25% | 77.35% | 95.08% | 71.79% | 93.57% | 93.89% | 89.26% | 74.99% | 80.16% | 92.64%
9 | 84.93% | 84.43% | 95.31% | 72.60% | 93.19% | 94.81% | 88.77% | 75.40% | 82.13% | 93.22%
10 | 84.99% | 83.69% | 95.31% | 71.97% | 93.00% | 95.19% | 88.82% | 75.29% | 80.03% | 92.17%
11 | 85.25% | 80.16% | 95.33% | 71.83% | 92.84% | 93.89% | 88.42% | 74.83% | 81.08% | 92.33%
12 | 84.99% | 82.80% | 95.36% | 71.36% | 93.11% | 93.25% | 88.69% | 75.04% | 81.06% | 92.72%
13 | 85.09% | 88.06% | 95.18% | 71.63% | 92.27% | 93.97% | 88.11% | 75.61% | 78.42% | 92.97%
14 | 85.15% | 81.43% | 95.26% | 72.02% | 91.72% | 92.38% | 89.03% | 74.51% | 79.94% | 92.69%
15 | 85.41% | 87.67% | 95.33% | 71.71% | 91.44% | 91.52% | 87.37% | 74.88% | 79.07% | 91.94%
16 | 85.46% | 88.59% | 95.46% | 71.10% | 91.13% | 93.16% | 87.84% | 74.40% | 77.89% | 92.33%
17 | 85.12% | 86.75% | 95.36% | 71.42% | 91.21% | 93.55% | 87.82% | 73.89% | 75.67% | 92.39%
18 | 85.22% | 87.96% | 95.23% | 71.66% | 90.67% | 92.01% | 86.50% | 73.99% | 76.79% | 92.31%
19 | 85.15% | 88.49% | 95.21% | 71.10% | 90.61% | 93.11% | 86.87% | 73.49% | 76.41% | 91.81%
20 | 85.36% | 86.67% | 94.64% | 71.21% | 90.58% | 93.36% | 86.00% | 74.60% | 75.67% | 92.17%
Table 4. Ablation experiment of HSHADE.

Dataset | HSHADE | SHADE-R | SHADE | SHADE-A
Australia | 86.05% | 73.76% + | 84.99% + | 80.19% +
BreastEW | 88.59% | 83.74% + | 91.69% | 63.53% +
CongressEW | 95.51% | 92.61% + | 96.13% − | 96.28%
German | 74.80% | 88.45% | 73.83% + | 65.62% +
Heart | 93.57% | 95.87% | 87.22% + | 89.44% +
Ionosphere | 96.49% | 91.55% + | 94.17% + | 17.88% +
KrVsKpEW | 90.17% | 96.21% | 78.41% + | 47.26% +
SpectEW | 77.40% | 84.57% | 80.78% − | 80.87% −
Tic-tac-toe | 82.13% | 74.74% + | 76.76% + | 73.02% +
Vote | 93.25% | 75.03% + | 92.72% + | 93.67%
W/T/L | -/-/- | 6/0/4 | 7/0/3 | 7/0/3
Table 5. Analysis of PIN fitting results for HSHADE.

Dataset | Normal SSE | Normal R² | Gamma SSE | Gamma R² | Poisson SSE | Poisson R² | Exp SSE | Exp R² | Logistic SSE | Logistic R² | Power Law SSE | Power Law R²
Ionosphere | 32.3246 | 0.042527 | 1.782 | 0.79448 | 2.819 | 0.7952 | 1.8445 | 0.77065 | 1.0849 | 0.86272 | 0.011742 | 0.98565
Australia | 33.1752 | 0.041274 | 1.7839 | 0.79512 | 2.8254 | 0.79561 | 1.8472 | 0.77123 | 0.90466 | 0.89639 | 0.011791 | 0.98739
KrVsKpEW | 33.303 | 0.040344 | 1.8087 | 0.79237 | 2.8473 | 0.79395 | 1.873 | 0.76797 | 1.0819 | 0.8796 | 0.012128 | 0.98508
BreastEW | 54.2068 | 0.023278 | 1.9801 | 0.77925 | 3.1568 | 0.77939 | 2.0406 | 0.75538 | 2.1971 | 0.80714 | 0.006997 | 0.99136
German | 36.5741 | 0.049809 | 1.8025 | 0.66516 | 2.7998 | 0.79693 | 1.834 | 0.77374 | 1.4862 | 0.85046 | 0.032933 | 0.96517
Heart | 56.6934 | 0.020287 | 1.8924 | 0.7917 | 2.8557 | 0.80069 | 1.9794 | 0.763 | 1.6371 | 0.86115 | 0.011856 | 0.99012
CongressEW | 31.464 | 0.044235 | 1.7581 | 0.79679 | 2.7907 | 0.79688 | 1.8202 | 0.77328 | 1.0456 | 0.85937 | 0.014252 | 0.98255
SpectEW | 30.4986 | 0.045194 | 1.7622 | 0.79571 | 2.7774 | 0.79709 | 1.8248 | 0.77189 | 1.0079 | 0.8498 | 0.012513 | 0.98547
Tic-tac-toe | 39.7001 | 0.033316 | 1.8322 | 0.79286 | 2.8606 | 0.79587 | 1.9028 | 0.7674 | 1.0778 | 0.86414 | 0.007815 | 0.99162
Vote | 31.6266 | 0.04429 | 1.7689 | 0.79544 | 2.8234 | 0.79465 | 1.8284 | 0.77241 | 0.9807 | 0.86524 | 0.014295 | 0.98401
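The kind of comparison summarized in Table 5 can be sketched with numpy alone: fit candidate distributions to a degree distribution and rank them by SSE. The snippet below uses synthetic power-law data and simple log-space linear fits rather than the paper's actual PIN degree sequences and fitting procedure, so it only illustrates why the power law wins when the underlying data are power-law distributed.

```python
import numpy as np

# Synthetic degree distribution p(k) ~ k^-2, standing in for a PIN's
# degree sequence (the real data are not reproduced here).
k = np.arange(1, 51, dtype=float)
p = k ** -2.0
p /= p.sum()

# Power law p = c * k^-a: linear in log-log space.
slope_pl, intercept_pl = np.polyfit(np.log(k), np.log(p), 1)
fit_pl = np.exp(intercept_pl) * k ** slope_pl
sse_pl = np.sum((fit_pl - p) ** 2)

# Exponential p = c * exp(-b * k): linear in semi-log space.
slope_ex, intercept_ex = np.polyfit(k, np.log(p), 1)
fit_ex = np.exp(intercept_ex + slope_ex * k)
sse_ex = np.sum((fit_ex - p) ** 2)

print(f"power law SSE: {sse_pl:.3e}, exponential SSE: {sse_ex:.3e}")
# On power-law data the power-law SSE is near zero, mirroring Table 5,
# where the power law attains the smallest SSE on every dataset.
```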
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yang, H.; Zhang, Y.; Zhang, C.; Xia, W.; Yang, Y.; Zhang, Z. A Hyperparameter Self-Evolving SHADE-Based Dendritic Neuron Model for Classification. Axioms 2023, 12, 1051. https://doi.org/10.3390/axioms12111051