Article

A Discrete JAYA Algorithm Based on Reinforcement Learning and Simulated Annealing for the Traveling Salesman Problem

1
School of Systems Science, Beijing Jiaotong University, Beijing 100044, China
2
School of Modern Post, Beijing University of Posts and Telecommunications, Beijing 100876, China
3
School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(14), 3221; https://doi.org/10.3390/math11143221
Submission received: 13 June 2023 / Revised: 19 July 2023 / Accepted: 19 July 2023 / Published: 22 July 2023
(This article belongs to the Special Issue Combinatorial Optimization: Trends and Applications)

Abstract

The JAYA algorithm is a population-based meta-heuristic algorithm proposed in recent years which has proved suitable for solving global optimization and engineering optimization problems because of its simplicity, easy implementation, and guiding characteristic of striving for the best and avoiding the worst. In this study, an improved discrete JAYA algorithm based on reinforcement learning and simulated annealing (QSA-DJAYA) is proposed to solve the well-known traveling salesman problem in combinatorial optimization. More specifically, firstly, the basic Q-learning algorithm in reinforcement learning is embedded into the proposed algorithm so that it can choose the most promising transformation operator for the current state when updating the solution. Secondly, in order to balance the exploration and exploitation capabilities of the QSA-DJAYA algorithm, the Metropolis acceptance criterion of the simulated annealing algorithm is introduced to determine whether to accept candidate solutions. Thirdly, 3-opt is applied to the best solution of the current iteration at a certain frequency to improve the efficiency of the algorithm. Finally, to evaluate the performance of the QSA-DJAYA algorithm, it was tested on 21 benchmark datasets taken from TSPLIB and compared with other competitive algorithms in two groups of comparative experiments. The experimental and statistical significance test results show that the QSA-DJAYA algorithm achieves significantly better results in most instances.

1. Introduction

In the real world, a large number of complex problems are, in essence, combinatorial optimization problems. The traveling salesman problem (TSP) is considered one of the most common problems in the field of combinatorial optimization, especially in logistics transportation and distribution. In this problem, a traveler starts from one city, passes through all the other cities exactly once, and finally returns to the starting city. Since Dantzig and Ramser [1] proposed this problem in 1959, it has attracted increasing attention. However, the TSP is an NP-hard problem [2,3]; that is, no polynomial-time algorithm is known for it. Solving such problems remains challenging. It should be noted that other combinatorial optimization problems, such as knapsack problems, assignment problems, and job-shop scheduling problems, are also NP-hard problems similar to the TSP. If the TSP can be solved efficiently, this will also provide promising solutions for other similar problems.
Generally speaking, most exact approaches to the TSP are based on linear programming, branch-and-bound, and dynamic programming [4]. The basic idea of exact algorithms is to find the optimal solution by traversing the entire solution space, which accounts for their high time complexity. Therefore, such exact algorithms can only effectively solve small-scale TSPs and struggle with medium- and large-scale instances. The explosion of the solution space as the problem size grows is the most severe obstacle in solving the TSP, and traditional exact algorithms face prominent disadvantages here [4]. To overcome this difficulty, researchers have designed many meta-heuristic algorithms inspired by the laws of physical change and biological systems in nature, such as the Genetic Algorithm (GA) [5], Cuckoo Search (CS) [6], Particle Swarm Optimization (PSO) [7,8,9], the Bat Algorithm (BA) [2], Simulated Annealing (SA) [10], the Ant Colony Optimization algorithm (ACO) [8,11,12,13,14], the Frog-Leaping Algorithm (FLA) [15], and the Artificial Bee Colony (ABC) [16,17,18,19,20]. These meta-heuristic algorithms are simple in principle, flexible in mechanism, and can find an approximately optimal solution in a short time.
The JAYA algorithm is a meta-heuristic algorithm, proposed by Rao in 2016, which is used to solve constrained and unconstrained continuous optimization problems [21]. It has been applied to solve different kinds of problems, such as flexible job-shop scheduling [22], text clustering [23], solid oxide fuel cell parameter optimization [24], feature selection [25], hydropower reservoir operation optimization [26], etc. Gunduz and Aslan [27] discretized the JAYA algorithm by modifying its encoding mode and updating mechanism and applied it to the TSP for the first time. Although the results on the selected cases are good in terms of convergence speed and solution quality, the algorithm is still inferior to the Discrete Tree-Seed Algorithm (DTSA) [28] used for comparison. Fortunately, the traditional JAYA algorithm can be improved by combining it with other meta-heuristic algorithms and introducing advanced ideas such as reinforcement learning to mitigate instability and the tendency to fall into local optima.
In this paper, an improved discrete JAYA algorithm based on reinforcement learning and SA, viz., the QSA-DJAYA algorithm, is proposed to solve the TSP. In the QSA-DJAYA algorithm, six transformation operators are used for producing the candidate solutions. Unlike the DJAYA algorithm of [27], which used roulette wheel selection to select transformation operators, the proposed QSA-DJAYA algorithm utilizes the Q-learning algorithm in reinforcement learning to choose the most promising transformation operator. Then, inspired by SA, we introduce the Metropolis acceptance criterion to determine whether to accept the candidate solution. In addition, 3-opt is applied to the best solution of the current iteration at a certain frequency to further improve the efficiency of the algorithm. To evaluate the performance of the proposed approach, it was compared with four representative algorithms developed by ourselves and eight efficient methods from the literature on 21 instances from TSPLIB [29]. The experimental results and the statistical significance test show the effectiveness of the proposed QSA-DJAYA algorithm. The main contributions are as follows:
  • A novel improved discrete JAYA algorithm for the TSP is designed.
  • The Q-learning algorithm in reinforcement learning is embedded to adaptively select the promising transformation operator.
  • The Metropolis acceptance criterion of SA is introduced to help jump out of the local optima.
  • Compared with four typical algorithms developed by ourselves and eight efficient methods from the literature, the proposed algorithm displays great superiority in solving the TSP.
The remainder of this paper is organized as follows: Section 2 describes the related work. In Section 3, the proposed QSA-DJAYA algorithm is introduced. Section 4 discusses the experimental results of the proposed algorithm and other advanced competing algorithms on the TSP benchmark datasets. Section 5 draws conclusions and proposes future work.

2. Related Work on TSP and JAYA Algorithm

2.1. Research on the Meta-Heuristic Algorithms of TSP

2.1.1. The Traveling Salesman Problem

The aim of the TSP is to look for a tour that ensures the salesman visits each city exactly once and returns to the starting city with the minimum total travel distance. The TSP can be defined on a complete weighted graph $G(V, E)$, where $V = \{1, 2, \ldots, n\}$ is the set of vertices and $E$ is the set of edges. Mathematically, the model of the TSP can be formulated as follows:

$$\mathrm{Min}\; C = \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} x_{ij},$$  (1)

subject to:

$$\sum_{j=1}^{n} x_{ij} = 1, \quad \forall i \in V,$$  (2)

$$\sum_{i=1}^{n} x_{ij} = 1, \quad \forall j \in V,$$  (3)

$$\sum_{i \in S} \sum_{j \in \bar{S}} x_{ij} \ge 1, \quad \forall S \subset V,\; S \ne \emptyset,$$  (4)

$$x_{ij} \in \{0, 1\}, \quad \forall i, j \in V,$$  (5)

where $n$ is the number of cities to be visited and $d_{ij}$ is the distance between city $i$ and city $j$. Equation (1) is the objective function that minimizes the total distance traveled. Equations (2) and (3) ensure that each city is visited exactly once. Equation (4) is the subtour elimination constraint. And, in Equation (5), $x_{ij}$ is a binary variable denoting whether the arc from $i$ to $j$ is selected by the salesman.
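To make the objective concrete, the following minimal sketch (our own helper name; Euclidean distances are assumed, as in the TSPLIB EUC_2D instances) evaluates the objective $C$ of Equation (1) for a tour given as a permutation of city indices. Representing a solution as a permutation implicitly satisfies constraints (2)–(5).

```python
import math

def tour_length(tour, coords):
    """Total length of the closed tour, i.e., the objective C of Equation (1)."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[k]], coords[tour[(k + 1) % n]])
        for k in range(n)
    )

# Four cities at the corners of a unit square: the optimal tour has length 4.
coords = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(tour_length([0, 1, 2, 3], coords))  # 4.0
```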

2.1.2. Related Work on TSP

For the TSP, as the size of the problem increases, the number of solutions increases exponentially, and exact algorithms cannot give the optimal or an approximately optimal solution within a reasonable time. Inspired by natural phenomena, especially the collective behavior of social animals, researchers began to study various meta-heuristic algorithms to solve the TSP efficiently. Some meta-heuristic algorithms were proposed early and have been studied intensively, such as GA, ACO, PSO, ABC, and so on. These meta-heuristic algorithms have been further adapted to improve their performance. Ebadinezhad [11] proposed an adaptive ACO that can dynamically adjust its parameters in order to overcome the disadvantages of slow convergence and easily falling into local optima; the mechanism selected the starting point based on clustering to realize the shortest path. Zhong et al. [16] proposed a hybrid discrete ABC with a threshold acceptance criterion, in which employed and onlooker bees decided whether to accept newly generated solutions according to the criterion; this non-greedy acceptance strategy maintained the population diversity. Choong et al. [17] used the modified choice function (MCF) to automatically adjust the selection of neighborhood search heuristics adopted by the employed and onlooker bees and improved performance by incorporating the Lin–Kernighan local search strategy. Khan and Maiti [18] improved the ABC and adopted multiple updating rules and the K-opt operation to solve the TSP. Karaboga and Gorkemli [19] also proposed two new versions of the ABC: one was a combined version of the standard ABC, and the other a further refinement of the combined version.
In addition to the abovementioned meta-heuristic algorithms, a large number of other meta-heuristic algorithms have been proposed in recent years. For example, Hatamlou [30] studied the application of the Black Hole algorithm (BH) in solving the TSP, and the experiments showed that the algorithm can find a better-quality solution in a shorter time than the classical GA, ACO, and PSO. Similarly, Zhang and Han [31] proposed a discrete sparrow search algorithm (DSSA) with a global perturbation strategy to solve the TSP. Zheng et al. [32] solved a variant of the TSP, namely the multiple traveling salesmen problem (mTSP). For this problem, they proposed an iterated two-stage heuristic algorithm called ITSHA, whose first stage was an initialization phase aimed at generating high-quality and diverse initial solutions, and whose second stage was an improvement phase mainly using novel Variable Neighborhood Search (VNS) methods they developed to optimize the initial solutions. Liu et al. [33] studied an evolutionary algorithm to solve the multimodal multiobjective traveling salesman problem (MMTSP), which has the potential to solve many real-world problems. The proposed algorithm, in which two new edge assembly crossover operators were embedded, used a new environmental selection operator to maintain a balance between objective space diversity and decision space diversity. Tsai et al. [34] developed a novel method to improve biogeography-based optimization (BBO) for solving the TSP. The proposed method combined a greedy randomized adaptive search procedure and the 2-opt algorithm; however, it was only tested on three datasets, and the largest instance size was 100.
However, when solving the TSP, a single meta-heuristic algorithm is more likely to fall into local optima, which degrades its performance. Therefore, many scholars have proposed hybrid meta-heuristic algorithms, which fuse two or more algorithms together to make full use of the advantages of each. This balances the exploration and exploitation capabilities of the algorithms and helps them better solve complex problems. Baraglia et al. [35] solved the classical TSP by combining GA with Lin–Kernighan local search. Similarly, Yang and Pei [7] proposed a hybrid method based on ABC and PSO. Mahi et al. [8] proposed a hybrid algorithm combining PSO, ACO, and the 3-opt algorithm. Gulcu et al. [12] proposed a parallel cooperative hybrid algorithm combining ACO and the 3-opt algorithm. Saji and Barkatou [2] combined the random walk of Lévy flight with bat movement to improve the traditional BA for the classic TSP; in order to improve the diversity and convergence of the population, a uniform crossover operator from GA was embedded in the proposed algorithm. Yang et al. [9] proposed a new method to solve the TSP with an arbitrary neighborhood. In that hybrid algorithm, the outer loop used linearly decreasing inertia weight PSO to search the continuous access locations, while the inner loop used GA to optimize the discrete visiting sequence. The experiments showed that this hybrid algorithm can significantly reduce the search space without lowering the quality of solutions and can find high-quality solutions in a reasonable time. It can be seen that scientifically combining two or more meta-heuristic algorithms, redesigning the algorithm to exploit their advantages and discard their disadvantages, plays an important role in promoting the quality and efficiency of TSP solution algorithms.

2.2. Research on JAYA Algorithm

2.2.1. The Basic JAYA Algorithm

JAYA means victory in Sanskrit. The algorithm strives to achieve victory by obtaining the optimal solution; hence, it was named the JAYA algorithm. Based on the principle of continuous improvement, the JAYA algorithm approaches excellent individuals while constantly moving away from poor ones, thus improving the quality of solutions [21]. Unlike other evolutionary algorithms that require many algorithm-specific parameters, the JAYA algorithm only needs the common control parameters of the iterative process, such as the population size, together with the random numbers generated during updating, which avoids the burden of tuning many parameters during implementation. Therefore, compared with other meta-heuristic algorithms, the JAYA algorithm has a unique orientation characteristic of striving for the best and avoiding the worst. It has the advantages of few control parameters, a simple structure, a flexible mechanism, and being easy to understand and implement, which make it suitable for solving diverse optimization problems.
In the basic JAYA algorithm, each individual in the population iteratively evolves to obtain a new solution based on Equation (6), as follows:
$$x'_{k,i} = x_{k,i} + r_1 \left( x_{best,i} - |x_{k,i}| \right) - r_2 \left( x_{worst,i} - |x_{k,i}| \right),$$  (6)

where $x_{k,i}$ is the $i$-th dimensional variable of the $k$-th individual; $x_{best,i}$ and $x_{worst,i}$ are the $i$-th dimensional variables of the individuals with the best and the worst fitness values in the current iteration, respectively; both $r_1$ and $r_2$ are random numbers in the range [0,1]; and $x'_{k,i}$ is the updated value of the $i$-th dimensional variable of the $k$-th individual. It can be seen from Equation (6) that $r_1 ( x_{best,i} - |x_{k,i}| )$ represents the evolution trend of the current solution toward the current best one and $r_2 ( x_{worst,i} - |x_{k,i}| )$ represents the evolution trend of the current solution away from the worst one. Therefore, the core of the JAYA algorithm is to approach the best solution while staying away from the worst solution. The flowchart of the basic JAYA algorithm is shown in Figure 1.
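For a continuous problem, one application of Equation (6) can be sketched as follows (a minimal illustration with our own function name; in the basic JAYA algorithm, the updated vector then replaces the old one only if it improves the fitness):

```python
import random

def jaya_update(x, best, worst):
    """Apply Equation (6) to every dimension of the solution vector x."""
    return [
        x[i]
        + random.random() * (best[i] - abs(x[i]))   # r1 term: move toward the best
        - random.random() * (worst[i] - abs(x[i]))  # r2 term: move away from the worst
        for i in range(len(x))
    ]
```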

2.2.2. Related Work on JAYA Algorithm

The traditional JAYA algorithm is a powerful meta-heuristic algorithm proposed by Rao [21] in 2016, who demonstrated its excellent performance by solving 30 unconstrained benchmark problems. According to the related literature, the JAYA algorithm has a unique orientation characteristic of striving for the best and avoiding the worst. It has the advantages of few control parameters, a simple structure, and a flexible mechanism, which make it suitable for solving diverse optimization problems. And, the JAYA algorithm has been successfully applied in many fields in society and industry and has a wide range of application scenarios. Aslan et al. [36] proposed a binary optimization algorithm based on the JAYA algorithm, which replaced the updating rules of the traditional JAYA algorithm with newly designed transformation operators for binary optimization. Rao and More [37] proposed an adaptive JAYA algorithm for the design optimization and analysis of selected thermal devices. Pradhan and Bhende [38] introduced linear inertia weights and nonlinear inertia weights based on fuzzy logic, respectively, which effectively improved the ability of the JAYA algorithm to solve complex problems. Wang et al. [39] proposed a parallel JAYA algorithm based on graphics processors (GPUs) to estimate the model parameters of lithium-ion batteries, which can not only accurately estimate the model parameters of batteries but also greatly shorten the runtime. Xiong et al. [24] combined the JAYA algorithm and differential evolution into a hybrid meta-heuristic optimization algorithm, which has been successfully applied to the parameter optimization of solid oxide fuel cells. Thirumoorthy and Muneeswaran [23] applied the hybrid JAYA optimization algorithm to text document clustering, achieving the highest-quality clustering on all selected benchmark instances.
Gunduz and Aslan [27] improved upon the algorithm, leading to the discrete JAYA algorithm, and applied it for the first time to solve the TSP. They proved that the proposed DJAYA algorithm was a highly competitive and robust optimizer for the TSP. Li et al. [22] once again verified the effectiveness of the JAYA algorithm by solving the flexible job-shop scheduling problem with an improved JAYA algorithm.

3. The Proposed QSA-DJAYA Algorithm for TSP

As mentioned above, the JAYA algorithm was originally proposed by Rao to solve continuous optimization problems [21]. By means of discretization, the JAYA algorithm has been applied to solve flexible job-shop scheduling [22], text clustering [23], solid oxide fuel cell parameter optimization [24], feature selection [25], the hydropower reservoir operation optimization problem [26], the TSP [27], etc.
In the proposed algorithm, the initialization method is the same as that in [27]; that is, the first individual is constructed using the nearest neighbor algorithm and the remaining individuals in the population are generated via random permutation. Following the core of the basic JAYA algorithm, two search trend parameters $ST_1$ and $ST_2$ are used to control the selection of an individual among the best, the worst, and the current one for solution updating; this strategy is also taken from [27]. The selection of updating operators leverages the basic Q-learning algorithm in reinforcement learning, which is introduced in detail in Section 3.1. In addition, a characteristic of SA is introduced into the acceptance criterion of solutions, which is described in Section 3.2. Below, the proposed algorithm is called the QSA-DJAYA algorithm.
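The initialization described above can be sketched as follows (an illustration with our own function names, assuming Euclidean city coordinates):

```python
import math
import random

def nearest_neighbor_tour(coords, start=0):
    """Greedily construct the first individual: always visit the closest unvisited city."""
    unvisited = set(range(len(coords))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        # sorted() makes tie-breaking deterministic (smallest index wins)
        nxt = min(sorted(unvisited), key=lambda c: math.dist(coords[last], coords[c]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def init_population(coords, popsize):
    """One nearest-neighbor individual plus popsize - 1 random permutations."""
    n = len(coords)
    population = [nearest_neighbor_tour(coords)]
    for _ in range(popsize - 1):
        population.append(random.sample(range(n), n))
    return population
```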

3.1. Strategy Selection Based on Q-Learning Algorithm

Q-learning is a value-based algorithm in reinforcement learning. When an agent performs an action, the environment returns a corresponding reward $R$. Here, $Q(s, a)$ denotes the expected return obtained by taking action $a$ ($a \in A$) in state $s$ ($s \in S$). The main idea of the algorithm is therefore to build a Q-table indexed by state and action to store the Q-values and, then, to select the actions that yield the maximum benefit according to the Q-values. Learning is a dynamic process, and the action-value function $Q(s, a)$ is iteratively updated from the experiences collected under the current policy [40]. The update uses the following temporal-difference rule:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ R_t + \gamma \max_{a \in A} Q(s_{t+1}, a) - Q(s_t, a_t) \right],$$  (7)

where $\alpha$ is the learning rate and $\gamma$ is the reward decay coefficient. Usually, in the Q-learning algorithm, actions are selected via the $\varepsilon$-greedy strategy; that is, the action with the best Q-value in the current state is selected in most cases, and an action is selected at random in a few cases. Under this strategy, the action to implement at each step is chosen through the iteratively updated Q-values, which maximizes the reward and thus approaches the optimal policy. However, in order to improve efficiency, the proposed QSA-DJAYA algorithm selects the action using a combination of the roulette-wheel and greedy strategies. So that an action with a large Q-value has a high probability of being selected, the selection probability is calculated following Equation (8). The pseudo-code of Q-learning is shown in Algorithm 1.
$$prob(a_k) = \frac{\exp(Q(s, a_k))}{\sum_{k} \exp(Q(s, a_k))},$$  (8)

where $a_k$ represents the $k$-th action. Applying the exponential function to the Q-values guarantees that the denominator is never 0.
Algorithm 1 Pseudo-code of Q-learning.
  1: Initialize the Q-table
  2: Initialize the initial state $s_0$
  3: repeat
  4:       if $rand < \varepsilon$ then
  5:             Select an action $a_t$ from the set of all actions $A$ at random
  6:       else
  7:             Select the action $a_t$ that satisfies $a_t = \arg\max_{a \in A} Q(s_t, a)$
  8:       end if
  9:       Take action $a_t$, observe the reward $R_t$ and the next state $s_{t+1}$
10:       Update the Q-values according to Equation (7)
11: until Termination condition satisfied
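Algorithm 1 together with Equations (7) and (8) can be sketched as follows (a minimal illustration with our own names; as in the QSA-DJAYA algorithm, the states and actions are the six transformation operators):

```python
import math
import random

OPERATORS = ["swap", "shift", "symmetry", "insertion", "reversion", "2-opt"]

# Q-table: one row per state and one column per action (cf. Table 1).
Q = {s: {a: 0.0 for a in OPERATORS} for s in OPERATORS}

def update_q(Q, s, a, reward, s_next, alpha=0.8, gamma=0.8):
    """Temporal-difference update of Equation (7)."""
    target = reward + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

def select_operator(Q, s, eps=0.1):
    """Epsilon-greedy exploration combined with the roulette rule of Equation (8)."""
    if random.random() < eps:
        return random.choice(OPERATORS)
    weights = [math.exp(Q[s][a]) for a in OPERATORS]  # exp keeps the denominator > 0
    return random.choices(OPERATORS, weights=weights)[0]
```

With `alpha = 0.8` and an all-zero table, a reward of 1.0 moves the corresponding entry to 0.8 in a single update, so frequently rewarded operators quickly dominate the roulette weights.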
The QSA-DJAYA algorithm integrates the abovementioned idea of the Q-learning algorithm into the discrete JAYA algorithm, obtaining a discrete JAYA algorithm based on Q-learning. Specifically, the six operators—swap, shift, symmetry, insertion, reversion, and 2-opt—serve as both the states and the actions, and the most appropriate updating operator is selected through the Q-learning algorithm during the updating of each solution. The Q-table of the QSA-DJAYA algorithm is shown in Table 1.
In terms of operator selection, although the 3-opt operator is highly effective, its time cost is too high for the large-scale TSP. Therefore, the 3-opt updating operator is only applied to the current global best solution at a certain frequency. The six operators used in this stage are swap, insertion, reversion, shift, symmetry, and 2-opt. Brief descriptions of these operators are given below.
Swap: Two positions are randomly selected from the current route A (marked in green), and then, the elements at these two positions are exchanged. The transformed route is B. An example of a swap operation is shown diagrammatically in Figure 2.
Insertion: Two positions are randomly selected from the current route A (marked in green), and the element at the first position is removed and reinserted after the element at the second position. An example of the insertion operation is shown diagrammatically in Figure 3.
Reversion: Two positions are randomly selected from the current route A (marked in green), and then, the elements between these two positions are arranged in reverse order. An example of reversion operation is shown diagrammatically in Figure 4.
Shift: Two positions are randomly selected from the current route A, namely $pos_1$ and $pos_2$. The element at $pos_1$ (marked in green) is stored, the elements from the position immediately to the right of $pos_1$ up to $pos_2$ (marked in orange) are moved one position to the left in their original order, and finally the stored element is placed at $pos_2$. An example of the shift operation is shown diagrammatically in Figure 5.
Symmetry: According to the length of the current route A, a reasonable length L for a single transformation segment is randomly chosen, and then a starting point is randomly selected such that the route remains valid. Two consecutive segments of length L on A (marked in green and orange, respectively) are first switched in position, and then each of the two exchanged segments is reversed. An example of the symmetry operation is shown diagrammatically in Figure 6.
2-opt: 2-opt is a local search algorithm that operates by selecting two arcs from a route and exchanging them if this results in a new, shorter route. All possible 2-opt transformations for a route are tried, and then, the best 2-opt transformation is chosen. An example of a 2-opt operation is shown diagrammatically in Figure 7; the segments that need to be reversed are marked in green.
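Three of the operators above can be sketched as follows (minimal illustrations on Python lists; shift, symmetry, and 2-opt follow the same pattern of copying the route and rearranging a slice). Note that every operator returns a permutation of its input, so the tour stays feasible.

```python
import random

def swap(route):
    """Exchange the cities at two randomly chosen positions."""
    r = route[:]
    i, j = random.sample(range(len(r)), 2)
    r[i], r[j] = r[j], r[i]
    return r

def insertion(route):
    """Remove the city at one random position and reinsert it at another."""
    r = route[:]
    i, j = random.sample(range(len(r)), 2)
    r.insert(j, r.pop(i))
    return r

def reversion(route):
    """Reverse the segment between two randomly chosen positions."""
    r = route[:]
    i, j = sorted(random.sample(range(len(r)), 2))
    r[i:j + 1] = reversed(r[i:j + 1])
    return r
```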
After updating the solution with the selected operator, the corresponding reward is set according to the degree of improvement in the solution's fitness. The reward is calculated as in Equation (9):

$$reward = \max\left( 1 - \frac{fitness_{new}}{fitness_{old}},\; 0 \right),$$  (9)

where $fitness_{new}$ represents the fitness of the new solution obtained after updating with the operator bound to the action and $fitness_{old}$ represents the fitness of the solution before the action was implemented. According to the above equation, the reward is never less than 0; and the better the improvement achieved by the updating operator, the greater the reward.
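Equation (9) is a one-liner (illustrative only; the fitness here is the tour length, so smaller is better):

```python
def compute_reward(fitness_old, fitness_new):
    """Equation (9): positive only when the operator shortened the tour."""
    return max(1 - fitness_new / fitness_old, 0)
```

For example, shortening a tour from 100 to 90 yields a reward of about 0.1, while any deterioration yields exactly 0.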
Therefore, unlike [27], the QSA-DJAYA algorithm is inspired by the advanced idea of reinforcement learning in terms of solution updating and uses the Q-learning algorithm to dynamically select operators with superior performance to update the solution, making the updating mechanism more reasonable.

3.2. Acceptance Strategy Based on SA

SA is a stochastic optimization algorithm based on the Monte Carlo iterative strategy, inspired by the annealing process of solid matter in physics. SA is essentially a greedy algorithm, but its search process uses the Metropolis acceptance criterion; that is, it accepts a solution worse than the current one with a certain probability. Therefore, it is possible to jump out of a local optimum in search of the global optimum.
After evaluating the fitness of the new solution, the QSA-DJAYA algorithm introduces the Metropolis acceptance criterion to judge whether to accept the new solution, with the acceptance probability calculated as in Equation (10):

$$p = \begin{cases} 1, & \text{if } f(x_{new}) < f(x_{old}), \\ \exp\left( -\dfrac{f(x_{new}) - f(x_{old})}{T} \right), & \text{if } f(x_{new}) \ge f(x_{old}), \end{cases}$$  (10)

where $f(x_{new})$ denotes the fitness of the new solution and $f(x_{old})$ denotes the fitness of the current solution. $T$ denotes the current temperature, updated as $T = rate \cdot T_0$ with $rate \in (0, 1)$, where $T_0$ is the initial temperature. Therefore, according to the Metropolis criterion, if $f(x_{new})$ is less than $f(x_{old})$, the current solution is replaced by the new solution with probability 1; that is, an improving solution is always accepted. If $f(x_{new})$ is greater than or equal to $f(x_{old})$, the probability $p$ is calculated according to Equation (10), and when a random number $r \in (0, 1)$ is less than $p$, the current solution is replaced by the new solution.
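The acceptance rule of Equation (10) can be sketched as follows (our own function name; note that the negative exponent means a larger deterioration or a lower temperature gives a smaller acceptance probability):

```python
import math
import random

def metropolis_accept(f_new, f_old, T):
    """Equation (10): always accept an improvement; accept a worse
    solution with probability exp(-(f_new - f_old) / T)."""
    if f_new < f_old:
        return True
    return random.random() < math.exp(-(f_new - f_old) / T)
```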

3.3. The Proposed QSA-DJAYA Algorithm

As shown in Figure 8, the QSA-DJAYA algorithm starts by generating an initial population of N individuals: N-1 constructed via random permutation and 1 constructed via the nearest neighbor algorithm. At each iteration, the best and worst solutions in the current population, $Best$ and $Worst$, are updated. Guided by $ST_1$ and $ST_2$, a solution is selected from $Best$, $Worst$, and $x_k$ as the basis for the updating operation. Then, following the $\varepsilon$-greedy strategy, a transformation operator is either chosen at random or the most promising one is selected. Once the new solution is generated and its fitness value evaluated, whether to accept it is judged according to the Metropolis acceptance criterion. After the reward $R$ is calculated and the Q-table updated, the current best solution is further improved by 3-opt at a certain frequency. The algorithm stops when the termination condition is satisfied.

4. Experimental Results

In order to evaluate the performance of the designed QSA-DJAYA algorithm, we conducted two groups of comparative experiments. In experiment 1, we compared the proposed algorithm with four other representative algorithms, viz., DJAYA, GA, ACO, and SA, implemented by ourselves, to verify the excellence of the QSA-DJAYA algorithm's framework. In experiment 2, we conducted a comparison with eight efficient methods from the literature: ACO, PSO, GA, BH, ABC, the Hierarchic Approach (HA), DTSA, and DJAYA. This section details the experimental settings, experimental results, and comparative studies.

4.1. Experimental Settings

All experiments with the algorithms developed by us were run on a desktop with an Intel(R) Core(TM) i9-10900K 3.70 GHz processor and 64.0 GB of memory. All codes were implemented in MATLAB. The 21 test instances we selected are all benchmark TSP datasets obtained from TSPLIB [29] and are listed in Table 2, which gives the name of each instance, the scale of the problem, and the best known solution.
In comparative experiment 1, each algorithm was run 30 times for each instance, while in experiment 2, the number of runs was kept consistent with the experimental settings of the compared literature for a fair comparison. Over these test replicates, the shortest tour length obtained in each run and the computational time needed to obtain it were recorded. For each algorithm, we report the best value, the worst value, the average value, the standard deviation, and the $Gap$ value over the replicates of each instance. The $Gap$ value is a percentage calculated as shown in Equation (11).
$$Gap(\%) = \frac{Average - BKS}{BKS} \times 100,$$  (11)

where $Average$ is the average of the shortest tour lengths and $BKS$ is the currently best known solution of the instance. The optimization performance of each algorithm can thus be evaluated using this index.
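Equation (11) in code (illustrative only; $Average$ and $BKS$ are taken from the experiment tables):

```python
def gap(average, bks):
    """Equation (11): percentage deviation of the average result from the
    best known solution; 0 means the BKS was matched on every run."""
    return (average - bks) / bks * 100
```

For instance, averaging 101 on an instance whose BKS is 100 gives a Gap of 1%.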

4.2. Parameter Tuning

The values of $ST_1$ and $ST_2$ were the same as those in [27], viz., $ST_1 = ST_2 = 0.5$. And, although 3-opt is an efficient operator, for computation time reasons, the frequency of applying 3-opt, $\mu$, was set to 100 in all experiments. Apart from these two parameters, there are five parameters in the QSA-DJAYA algorithm that needed to be tuned. To determine them, the instance kroC100 was chosen as the test instance for the parameter tuning experiments. In each parameter tuning experiment, the algorithm stopped when the maximum number of function evaluations (MaxF) reached 300,000.
The first parameter tuning experiment concerned the population size popsize, which was varied from 10 to 100 with the learning factor α = 0.9, the discount factor γ = 0.8, the probability ε = 0.1 of escaping from a local optimum, and the initial SA temperature T0 = 0.045. The row corresponding to popsize in Table 3 shows the results of the QSA-DJAYA algorithm on kroC100 under the different popsize values. When popsize was set to 10, the average shortest tour length over the 10 runs was smallest, so popsize was fixed at 10 in subsequent experiments.
The second parameter tuning experiment concerned α, which was varied from 0.1 to 0.9 with popsize = 10, as determined in the first experiment. The row corresponding to α in Table 3 shows the results on kroC100 under the different α values; the best result was obtained with α = 0.8.
The third parameter tuning experiment concerned γ, which was varied from 0.1 to 0.9 with popsize = 10 and α = 0.8, as determined in the first two experiments. The row corresponding to γ in Table 3 shows the results on kroC100 under the different γ values; γ should be set to 0.8.
The fourth parameter tuning experiment concerned ε, which was varied from 0.1 to 0.5 with popsize = 10, α = 0.8, and γ = 0.8, as determined in the previous three experiments. The row corresponding to ε in Table 3 shows the results on kroC100 under the different ε values; ε should be set to 0.1.
The last parameter tuning experiment concerned T0, which was varied from 0.01 to 0.05 with popsize = 10, α = 0.8, γ = 0.8, and ε = 0.1, as determined in the previous experiments. The row corresponding to T0 in Table 3 shows the results on kroC100 under the different T0 values; according to these results, T0 should be set to 0.05 to optimize the performance of the QSA-DJAYA algorithm.
In summary, the parameters in the following experiments were set as follows: population size popsize = 10, learning factor α = 0.8, discount factor γ = 0.8, probability ε = 0.1 of escaping from a local optimum, and initial SA temperature T0 = 0.05.
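The one-parameter-at-a-time tuning procedure above can be sketched as follows (illustrative Python, not the paper's MATLAB code; `run_solver` is a hypothetical stand-in that returns the average tour length for a given parameter setting):

```python
# Tuned values from Section 4.2 (kroC100, MaxF = 300,000).
PARAMS = {
    "popsize": 10,    # population size
    "alpha":   0.8,   # Q-learning learning factor
    "gamma":   0.8,   # Q-learning discount factor
    "epsilon": 0.1,   # probability of escaping a local optimum
    "T0":      0.05,  # initial SA temperature
}

def tune_one_at_a_time(run_solver, grids, defaults):
    """Sweep one parameter grid at a time, holding the others at their
    current best values, and keep the value that minimizes the solver's
    average tour length (as done in the five experiments above)."""
    best = dict(defaults)
    for name, grid in grids.items():
        scores = {value: run_solver({**best, name: value}) for value in grid}
        best[name] = min(scores, key=scores.get)
    return best
```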

4.3. Experimental Results and Statistical Analysis

In this section, we first present the results of experiments 1 and 2. We then report the comparative analysis based on these results, with the significance of the differences verified using the non-parametric Friedman test and the non-parametric Mann–Whitney test. Note that the DJAYA algorithm in experiment 1 differs from that in [27]: it is embedded with the same six operators as the QSA-DJAYA algorithm to avoid any unfairness caused by differing operator efficiencies.

4.3.1. Results of Experiment 1

According to the experimental scheme determined in Section 4.1 and the parameter settings determined in Section 4.2, we conducted experiment 1 on a set of 20 TSP benchmark instances. Based on their size, the instances were divided into two sets: the first set of small-scale instances comprises six instances with 17 to 52 cities, and the second set of large-scale instances comprises fourteen instances with 70 to 225 cities. Because of the difference in scale, the stopping criterion was the maximum number of function evaluations (MaxF), set to N × 500, where N is the number of cities in the instance. The basic experimental results for the small-scale and large-scale instances are shown in Table 4 and Table 5, respectively.
In Table 4 and Table 5, the Best, Worst, Median, Average, and Std columns give the best, worst, median, and average values and the standard deviation of the shortest tour length over 30 independent runs on each instance. For the Best, Worst, Median, and Average columns, smaller values indicate a stronger search ability; for the Std column, a smaller value indicates higher reliability and stability. Each value in the Gap column is calculated according to Equation (11); since the currently best known solution of most instances is already optimal, a small Gap value indicates high algorithm efficiency. The best median and average tour lengths for each instance are shown in bold. In Table 4, the QSA-DJAYA algorithm achieves the best median and average on every instance except eil51, on which the DJAYA algorithm performs best, which confirms the effectiveness of the compared algorithms. In Table 5, the proposed algorithm likewise performs best in terms of median and average on all large-scale instances except pr124, for which the DJAYA algorithm obtains the best median and average but does not find the best known solution, whereas the QSA-DJAYA algorithm does. Although the QSA-DJAYA algorithm does not achieve the minimum standard deviation on every instance, it does so on seven instances, so its stability is better than that of the compared algorithms. In addition, the computation time of both the QSA-DJAYA and DJAYA algorithms is longer than that of the other algorithms because of the time-consuming transformation operators embedded in both.
However, with the same operators, the computation time of the QSA-DJAYA algorithm is much shorter than that of the DJAYA algorithm. Moreover, under this fair comparison, the QSA-DJAYA algorithm reaches the currently best known solution on nine instances. For one small-scale instance and three large-scale instances, the tours corresponding to the best solutions found are shown in Figure 9.
In summary, Table 4 and Table 5 show that the QSA-DJAYA algorithm outperforms all the compared algorithms. It is worth noting, however, that some compared algorithms such as GA and SA consumed less computational time under the same MaxF, although they produced poorer-quality solutions. Evaluating an algorithm requires considering solution quality and computational time together. Therefore, additional subexperiments of experiment 1 were performed with the same execution time; the execution times and results are provided in Table 6. Table 6 clearly shows that QSA-DJAYA achieves higher-quality solutions within the same execution time, which again confirms the superiority of the proposed QSA-DJAYA in terms of both solution quality and efficiency.
Further, to observe the differences between the results of the five algorithms on the same problems more intuitively, boxplots of the Gap values obtained by each algorithm on each instance were drawn, as shown in Figure 10; the red plus signs represent outliers. This facilitates comparing the stability of each algorithm's search ability. Because the results of GA are far worse than those of the other four algorithms, only the latter are plotted. The distribution and range of the Gap values in the boxplots show that QSA-DJAYA is better than GA, ACO, SA, and DJAYA.

4.3.2. Results of Experiment 2

In experiment 2, our proposed method was compared with eight efficient methods from the literature. Specifically, in the first subexperiment, QSA-DJAYA was compared with ACO, PSO, GA, and BH, whose results on seven instances were taken from [30]. In the second subexperiment, QSA-DJAYA was compared with the methods proposed in [27,28,41]: the results of ACO, ABC, and HA were taken from [41], those of DTSA from [28], and those of DJAYA from [27]. To provide a fair comparison, the experimental scheme of experiment 2 matched that of the compared methods. The results of the first and second subexperiments are presented in Table 7 and Table 8, respectively.
As seen from Table 7, QSA-DJAYA performed optimally on all seven instances as far as the average is concerned. Additionally, Table 8 reveals that QSA-DJAYA also achieved a shorter route length in all instances, shown in bold. The obtained results from QSA-DJAYA for all numerical instances are satisfactory in terms of the best, worst, and average values. However, we cannot compare the median values of each algorithm because the relevant results are not provided in the literature.

4.3.3. Results of Statistical Tests

To statistically compare the performance of the algorithms in experiments 1 and 2, the non-parametric Friedman test was applied to the average tour lengths obtained by each algorithm over the runs. In experiment 1, the p values of the first and second subexperiments are 1.9810 × 10⁻¹⁴ and 6.7936 × 10⁻⁶, respectively; in experiment 2, they are 9.2960 × 10⁻⁴ and 6.0690 × 10⁻⁷. The Friedman test results therefore confirm that there are significant differences between the results obtained by the competing algorithms. Statistical significance was also assessed with the non-parametric Mann–Whitney U test, under the null hypothesis that there is no significant difference between the Average values of the two compared algorithms, at a 95% confidence level. The Mann–Whitney U test results for all comparative experiments, based on the Average values, are summarized in Table 9. Although QSA-DJAYA is not significantly better than some algorithms at the 95% confidence level, it outperforms all competing methods from the literature in terms of solution quality.
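To illustrate how the Friedman statistic is formed from per-instance ranks, the following is a simplified sketch in Python (our own illustration; it ignores rank ties, which standard statistical packages handle, and the data layout is an assumption, not the paper's):

```python
def friedman_statistic(results):
    """Friedman chi-square statistic for k algorithms on n instances.
    results[i][j] is algorithm j's average tour length on instance i;
    lower is better, so the shortest length receives rank 1.
    Simplification: assumes no ties within a row."""
    n, k = len(results), len(results[0])
    rank_sums = [0.0] * k
    for row in results:
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    # chi^2_F = 12 / (n k (k+1)) * sum(R_j^2) - 3 n (k+1)
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
```

With complete separation (one algorithm always best, one always worst), the statistic reaches its maximum for the given n and k, yielding small p values like those reported above.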

5. Conclusions and Future Work

In this paper, we proposed an improved discrete JAYA algorithm based on reinforcement learning and SA (QSA-DJAYA) to solve the TSP. The algorithm modifies the discrete JAYA framework in two main aspects. First, the basic Q-learning algorithm from reinforcement learning is used to choose the transformation operator whenever a solution needs to be updated. Second, SA is introduced into the solution acceptance criterion: poor solutions are accepted with a certain probability to balance the exploration and exploitation capabilities of the algorithm. The performance of the QSA-DJAYA algorithm was tested on 21 widely used benchmark instances from TSPLIB, and the comparison results show that it is highly competitive.
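The two modifications can be summarized in a short sketch (illustrative Python; the operator implementations and outer solver loop are omitted, and the function names are ours, not the paper's):

```python
import math
import random

def select_action(Q, state, epsilon=0.1):
    """Epsilon-greedy choice over the transformation operators: explore a
    random operator with probability epsilon, otherwise exploit the
    operator with the highest Q-value for the current state."""
    if random.random() < epsilon:
        return random.randrange(len(Q[state]))
    return max(range(len(Q[state])), key=lambda a: Q[state][a])

def update_q(Q, s, a, reward, s_next, alpha=0.8, gamma=0.8):
    """Standard one-step Q-learning update with the tuned alpha and gamma."""
    Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])

def metropolis_accept(delta, T):
    """Metropolis criterion: always accept an improving candidate
    (delta <= 0); accept a worse one with probability exp(-delta / T)."""
    return delta <= 0 or random.random() < math.exp(-delta / T)
```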
In future work, we plan to analyze the applicability of the QSA-DJAYA algorithm to other routing problems, especially variants of the TSP, and to improve it by incorporating advanced ideas from reinforcement learning and transfer learning to solve more complex vehicle routing problems arising in practical applications.

Author Contributions

Conceptualization, J.X., W.H., W.G. and Y.Y.; Methodology, J.X., W.H., W.G. and Y.Y.; Validation, J.X., W.H., W.G. and Y.Y.; Investigation, J.X., W.H., W.G. and Y.Y.; Writing—review & editing, J.X., W.H., W.G. and Y.Y.; Supervision, W.H., W.G. and Y.Y.; Funding acquisition, W.H., W.G. and Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Fundamental Research Funds for the Central Universities under Grant 2023JBMC042, the Double First-class Talent Introduction Project of China under Grant No. 505022102 (505021149) and the National Natural Science Foundation of China under Grants 62173027 and 72288101.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Dantzig, G.B.; Ramser, J.H. The Truck Dispatching Problem. Manag. Sci. 1959, 6, 80–91. [Google Scholar] [CrossRef]
  2. Saji, Y.; Barkatou, M. A discrete bat algorithm based on Lévy flights for Euclidean traveling salesman problem. Expert Syst. Appl. 2021, 172, 114639. [Google Scholar] [CrossRef]
  3. Arora, S. Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems. J. ACM 1998, 45, 753–782. [Google Scholar] [CrossRef]
  4. Laporte, G. The traveling salesman problem: An overview of exact and approximate algorithms. Eur. J. Oper. Res. 1992, 59, 231–247. [Google Scholar] [CrossRef]
  5. Potvin, J.Y. Genetic algorithms for the traveling salesman problem. Ann. Oper. Res. 1996, 63, 337–370. [Google Scholar] [CrossRef]
  6. Zhang, Z.; Yang, J. A discrete cuckoo search algorithm for traveling salesman problem and its application in cutting path optimization. Comput. Ind. Eng. 2022, 169, 108157. [Google Scholar] [CrossRef]
  7. Yang, W.; Pei, Z. Hybrid ABC/PSO to solve travelling salesman problem. Int. J. Comput. Sci. Math. 2013, 4, 214–221. [Google Scholar] [CrossRef]
  8. Mahi, M.; Baykan, Ö.K.; Kodaz, H. A new hybrid method based on Particle Swarm Optimization, Ant Colony Optimization and 3-Opt algorithms for Traveling Salesman Problem. Appl. Soft Comput. 2015, 30, 484–490. [Google Scholar] [CrossRef]
  9. Yang, Z.; Xiao, M.Q.; Ge, Y.W.; Feng, D.L.; Zhang, L.; Song, H.F.; Tang, X.L. A double-loop hybrid algorithm for the traveling salesman problem with arbitrary neighbourhoods. Eur. J. Oper. Res. 2018, 265, 65–80. [Google Scholar] [CrossRef]
  10. Geng, X.; Chen, Z.; Yang, W.; Shi, D.; Zhao, K. Solving the traveling salesman problem based on an adaptive simulated annealing algorithm with greedy search. Appl. Soft Comput. 2011, 11, 3680–3689. [Google Scholar] [CrossRef]
  11. Ebadinezhad, S. DEACO: Adopting dynamic evaporation strategy to enhance ACO algorithm for the traveling salesman problem. Eng. Appl. Artif. Intell. 2020, 92, 103649. [Google Scholar] [CrossRef]
  12. Gülcü, Ş.; Mahi, M.; Baykan, Ö.K.; Kodaz, H. A parallel cooperative hybrid method based on ant colony optimization and 3-Opt algorithm for solving traveling salesman problem. Soft Comput. 2018, 22, 1669–1685. [Google Scholar] [CrossRef]
  13. Zhang, Z.; Xu, Z.; Luan, S.; Li, X.; Sun, Y. Opposition-based ant colony optimization algorithm for the traveling salesman problem. Mathematics 2020, 8, 1650. [Google Scholar] [CrossRef]
  14. Shahadat, A.S.B.; Akhand, M.; Kamal, M.A.S. Visibility Adaptation in Ant Colony Optimization for Solving Traveling Salesman Problem. Mathematics 2022, 10, 2448. [Google Scholar] [CrossRef]
  15. Dong, Y.; Wu, Q.; Wen, J. An improved shuffled frog-leaping algorithm for the minmax multiple traveling salesman problem. Neural Comput. Appl. 2021, 33, 17057–17069. [Google Scholar] [CrossRef]
  16. Zhong, Y.; Lin, J.; Wang, L.; Zhang, H. Hybrid discrete artificial bee colony algorithm with threshold acceptance criterion for traveling salesman problem. Inf. Sci. 2017, 421, 70–84. [Google Scholar] [CrossRef]
  17. Choong, S.S.; Wong, L.P.; Lim, C.P. An artificial bee colony algorithm with a Modified Choice Function for the traveling salesman problem. Swarm Evol. Comput. 2019, 44, 622–635. [Google Scholar] [CrossRef]
  18. Khan, I.; Maiti, M.K. A swap sequence based Artificial Bee Colony algorithm for Traveling Salesman Problem. Swarm Evol. Comput. 2019, 44, 428–438. [Google Scholar] [CrossRef]
  19. Karaboga, D.; Gorkemli, B. Solving Traveling Salesman Problem by Using Combinatorial Artificial Bee Colony Algorithms. Int. J. Artif. Intell. Tools 2019, 28, 1950004. [Google Scholar] [CrossRef]
  20. Pandiri, V.; Singh, A. A hyper-heuristic based artificial bee colony algorithm for k-Interconnected multi-depot multi-traveling salesman problem. Inf. Sci. 2018, 463-464, 261–281. [Google Scholar] [CrossRef]
  21. Venkata Rao, R. Jaya: A simple and new optimization algorithm for solving constrained and unconstrained optimization problems. Int. J. Ind. Eng. Comput. 2016, 7, 19–34. [Google Scholar] [CrossRef]
  22. Li, J.Q.; Deng, J.W.; Li, C.Y.; Han, Y.Y.; Tian, J.; Zhang, B.; Wang, C.G. An improved Jaya algorithm for solving the flexible job shop scheduling problem with transportation and setup times. Knowl.-Based Syst. 2020, 200, 106032. [Google Scholar] [CrossRef]
  23. Thirumoorthy, K.; Muneeswaran, K. A hybrid approach for text document clustering using Jaya optimization algorithm. Expert Syst. Appl. 2021, 178, 115040. [Google Scholar] [CrossRef]
  24. Xiong, G.; Zhang, J.; Shi, D.; Zhu, L.; Yuan, X. Optimal identification of solid oxide fuel cell parameters using a competitive hybrid differential evolution and Jaya algorithm. Int. J. Hydrogen Energy 2021, 46, 6720–6733. [Google Scholar] [CrossRef]
  25. Chaudhuri, A.; Sahu, T.P. A hybrid feature selection method based on Binary Jaya algorithm for micro-array data classification. Comput. Electr. Eng. 2021, 90, 106963. [Google Scholar] [CrossRef]
  26. Chong, K.L.; Lai, S.H.; Ahmed, A.N.; Wan Jaafar, W.Z.; El-Shafie, A. Optimization of hydropower reservoir operation based on hedging policy using Jaya algorithm. Appl. Soft Comput. 2021, 106, 107325. [Google Scholar] [CrossRef]
  27. Gunduz, M.; Aslan, M. DJAYA: A discrete Jaya algorithm for solving traveling salesman problem. Appl. Soft Comput. 2021, 105, 107275. [Google Scholar] [CrossRef]
  28. Cinar, A.C.; Korkmaz, S.; Kiran, M.S. A discrete tree-seed algorithm for solving symmetric traveling salesman problem. Eng. Sci. Technol. Int. J. 2020, 23, 879–890. [Google Scholar] [CrossRef]
  29. Reinelt, G. TSPLIB—A Traveling Salesman Problem Library. ORSA J. Comput. 1991, 3, 376–384. [Google Scholar] [CrossRef]
  30. Hatamlou, A. Solving travelling salesman problem using black hole algorithm. Soft Comput. 2018, 22, 8167–8175. [Google Scholar] [CrossRef]
  31. Zhang, Z.; Han, Y. Discrete sparrow search algorithm for symmetric traveling salesman problem. Appl. Soft Comput. 2022, 118, 108469. [Google Scholar] [CrossRef]
  32. Zheng, J.; Hong, Y.; Xu, W.; Li, W.; Chen, Y. An effective iterated two-stage heuristic algorithm for the multiple Traveling Salesmen Problem. Comput. Oper. Res. 2022, 143, 105772. [Google Scholar] [CrossRef]
  33. Liu, Y.; Xu, L.; Han, Y.; Zeng, X.; Yen, G.G.; Ishibuchi, H. Evolutionary Multimodal Multiobjective Optimization for Traveling Salesman Problems. IEEE Trans. Evol. Comput. 2023. [Google Scholar] [CrossRef]
  34. Tsai, C.H.; Lin, Y.D.; Yang, C.H.; Wang, C.K.; Chiang, L.C.; Chiang, P.J. A Biogeography-Based Optimization with a Greedy Randomized Adaptive Search Procedure and the 2-Opt Algorithm for the Traveling Salesman Problem. Sustainability 2023, 15, 5111. [Google Scholar] [CrossRef]
  35. Baraglia, R.; Hidalgo, J.; Perego, R. A hybrid heuristic for the traveling salesman problem. IEEE Trans. Evol. Comput. 2001, 5, 613–622. [Google Scholar] [CrossRef]
  36. Aslan, M.; Gunduz, M.; Kiran, M.S. JayaX: Jaya algorithm with xor operator for binary optimization. Appl. Soft Comput. 2019, 82, 105576. [Google Scholar] [CrossRef]
  37. Rao, R.; More, K. Design optimization and analysis of selected thermal devices using self-adaptive Jaya algorithm. Energy Convers. Manag. 2017, 140, 24–35. [Google Scholar] [CrossRef]
  38. Pradhan, C.; Bhende, C.N. Online load frequency control in wind integrated power systems using modified Jaya optimization. Eng. Appl. Artif. Intell. 2019, 77, 212–228. [Google Scholar] [CrossRef]
  39. Wang, L.; Zhang, Z.; Huang, C.; Tsui, K.L. A GPU-accelerated parallel Jaya algorithm for efficiently estimating Li-ion battery model parameters. Appl. Soft Comput. 2018, 65, 12–20. [Google Scholar] [CrossRef]
  40. Mazyavkina, N.; Sviridov, S.; Ivanov, S.; Burnaev, E. Reinforcement learning for combinatorial optimization: A survey. Comput. Oper. Res. 2021, 134, 105400. [Google Scholar] [CrossRef]
  41. Gündüz, M.; Kiran, M.S.; Özceylan, E. A hierarchic approach based on swarm intelligence to solve the traveling salesman problem. Turk. J. Electr. Eng. Comput. Sci. 2015, 23, 103–117. [Google Scholar] [CrossRef]
Figure 1. Flowchart of the basic JAYA algorithm.
Figure 2. An example of swap transformation for TSP.
Figure 3. An example of insertion transformation for TSP.
Figure 4. An example of reversion transformation for TSP.
Figure 5. An example of shift transformation for TSP.
Figure 6. An example of symmetry transformation for TSP.
Figure 7. An example of 2-opt transformation for TSP.
Figure 8. Flowchart of the QSA-DJAYA algorithm.
Figure 9. Tours corresponding to the best solutions found by QSA-DJAYA.
Figure 10. Boxplots of Gap values for the four algorithms. (a) Boxplot for the first subexperiment of experiment 1. (b) Boxplot for the second subexperiment of experiment 1.
Table 1. Q-table design of QSA-DJAYA algorithm.
         Action 1  Action 2  Action 3  Action 4  Action 5  Action 6
State 1  Q(1,1)    Q(1,2)    Q(1,3)    Q(1,4)    Q(1,5)    Q(1,6)
State 2  Q(2,1)    Q(2,2)    Q(2,3)    Q(2,4)    Q(2,5)    Q(2,6)
State 3  Q(3,1)    Q(3,2)    Q(3,3)    Q(3,4)    Q(3,5)    Q(3,6)
State 4  Q(4,1)    Q(4,2)    Q(4,3)    Q(4,4)    Q(4,5)    Q(4,6)
State 5  Q(5,1)    Q(5,2)    Q(5,3)    Q(5,4)    Q(5,5)    Q(5,6)
State 6  Q(6,1)    Q(6,2)    Q(6,3)    Q(6,4)    Q(6,5)    Q(6,6)
Table 2. List of experimental instances.
Number  Name      N    BKS
1       gr17      17   2085
2       bayg29    29   1610
3       bays29    29   2020
4       oliver30  30   420
5       swiss42   42   1273
6       eil51     51   426
7       berlin52  52   7542
8       st70      70   675
9       pr76      76   108,159
10      eil76     76   538
11      rat99     99   1211
12      kroA100   100  21,282
13      kroB100   100  22,141
14      kroC100   100  20,749
15      kroD100   100  21,294
16      kroE100   100  22,068
17      eil101    101  629
18      lin105    105  14,379
19      pr124     124  59,030
20      ch150     150  6528
21      tsp225    225  3919
Table 3. Results of parameter tuning.
Parameter  Value  Average    Std     Gap (%)
popsize    10     21,012.30  138.55  1.27
           20     21,066.90  175.61  1.53
           30     21,198.10  39.09   2.16
           40     21,161.20  65.40   1.99
           50     21,187.60  13.80   2.11
           60     21,183.00  0.00    2.09
           70     21,183.00  0.00    2.09
           80     21,183.00  0.00    2.09
           90     21,183.00  0.00    2.09
           100    21,183.00  0.00    2.09
α          0.1    20,933.00  114.30  0.89
           0.2    21,033.90  128.93  1.37
           0.3    21,008.50  89.24   1.25
           0.4    20,921.70  163.11  0.83
           0.5    20,971.50  159.76  1.07
           0.6    20,999.10  173.70  1.21
           0.7    20,926.90  150.25  0.86
           0.8    20,902.70  118.09  0.74
           0.9    21,012.30  138.55  1.27
γ          0.1    21,036.20  140.78  1.38
           0.2    21,000.00  164.05  1.21
           0.3    20,960.80  154.05  1.02
           0.4    20,910.50  141.26  0.78
           0.5    21,006.80  180.07  1.24
           0.6    21,036.50  131.14  1.39
           0.7    20,984.80  224.06  1.14
           0.8    20,902.70  118.09  0.74
           0.9    21,060.60  170.80  1.50
ε          0.1    20,902.70  118.09  0.74
           0.2    21,095.50  157.97  1.67
           0.3    20,998.60  164.42  1.20
           0.4    20,950.20  195.06  0.97
           0.5    20,940.00  181.61  0.92
T0         0.01   21,179.30  22.03   2.07
           0.02   20,995.40  129.59  1.19
           0.03   21,073.80  217.49  1.57
           0.04   21,028.60  227.02  1.35
           0.05   20,925.10  98.58   0.85
Table 4. Results of five representative compared algorithms on small-scale instances.
Name      Algorithm  Best    Worst   Median    Average    Std     Gap (%)  Time (s)
gr17      GA         2238    2489    2376.0    2377.53    60.52   14.03    0.24
          ACO        2085    2149    2115.0    2114.67    23.30   1.42     0.77
          SA         2085    2090    2087.5    2087.50    2.50    0.12     0.09
          DJAYA      2085    2088    2085.0    2085.70    1.27    0.03     0.80
          QSA-DJAYA  2085    2085    2085.0    2085.00    0.00    0.00     0.50
bayg29    GA         2378    2698    2531.0    2539.03    70.30   57.70    0.50
          ACO        1638    1672    1661.5    1657.83    6.58    2.97     3.90
          SA         1610    1683    1622.0    1626.47    19.64   1.02     0.15
          DJAYA      1615    1674    1634.0    1637.73    15.69   1.72     2.68
          QSA-DJAYA  1610    1646    1610.0    1618.83    12.32   0.55     2.08
bays29    GA         2858    3360    3213.5    3175.40    134.26  57.20    0.49
          ACO        2020    2062    2020.0    2024.60    9.44    0.23     3.79
          SA         2020    2082    2033.0    2038.43    19.58   0.91     0.15
          DJAYA      2026    2035    2026.0    2028.70    3.57    0.43     2.69
          QSA-DJAYA  2020    2034    2026.0    2028.40    3.68    0.42     2.09
swiss42   GA         2380    2710    2593.0    2587.63    84.98   103.27   0.87
          ACO        1287    1303    1299.0    1297.93    3.53    1.96     6.67
          SA         1273    1391    1335.5    1329.87    35.51   4.47     0.23
          DJAYA      1273    1274    1273.0    1273.23    0.42    0.02     6.93
          QSA-DJAYA  1273    1273    1273.0    1273.00    0.00    0.00     6.12
eil51     GA         881     984     921.0     921.70     22.17   116.36   1.22
          ACO        437     457     447.0     447.37     4.98    5.02     11.26
          SA         428     454     441.0     440.97     6.36    3.51     0.29
          DJAYA      428     438     432.0     432.30     3.68    1.48     12.26
          QSA-DJAYA  426     434     434.0     432.73     2.38    1.58     10.92
berlin52  GA         13,705  16,367  15,563.0  15,550.57  505.02  106.19   1.27
          ACO        7662    7767    7679.0    7683.53    24.27   1.88     11.84
          SA         7542    8317    7988.5    7954.83    190.78  5.47     0.29
          DJAYA      7542    7711    7657.0    7641.27    41.95   1.32     12.68
          QSA-DJAYA  7542    7798    7542.0    7557.33    57.19   0.20     11.74
Table 5. Results of five representative compared algorithms on large-scale instances.
Name     Algorithm  Best     Worst    Median     Average     Std      Gap (%)  Time (s)
st70     GA         1888     2093     2005.5     1994.07     51.53    195.42   2.01
         ACO        708      734      718.0      718.83      6.37     6.49     25.68
         SA         684      744      706.5      709.00      15.55    5.04     0.41
         DJAYA      687      714      703.0      701.90      8.87     3.99     39.36
         QSA-DJAYA  684      703      684.0      686.80      4.85     1.75     29.46
pr76     GA         306,770  329,482  322,696.5  320,291.60  6367.43  196.13   2.36
         ACO        115,846  121,443  118,745.5  118,676.93  1172.13  9.72     30.59
         SA         109,872  120,095  113,747.5  114,336.93  2762.89  5.71     0.46
         DJAYA      109,190  110,684  110,684.0  110,564.07  383.17   2.22     57.08
         QSA-DJAYA  108,159  110,858  109,653.0  109,530.73  907.21   1.27     38.73
eil76    GA         1335     1498     1425.0     1420.47     37.08    164.03   2.32
         ACO        558      568      565.0      564.57      2.08     4.94     32.23
         SA         553      585      567.0      567.43      8.53     5.47     0.47
         DJAYA      553      563      558.0      558.90      2.94     3.88     56.18
         QSA-DJAYA  540      553      551.0      550.60      2.23     2.34     38.34
rat99    GA         4120     4633     4474.0     4467.73     113.72   268.93   3.74
         ACO        1287     1337     1313.0     1312.17     11.30    8.35     61.79
         SA         1257     1349     1307.5     1307.33     25.28    7.95     0.65
         DJAYA      1253     1257     1256.0     1255.83     0.69     3.70     182.15
         QSA-DJAYA  1230     1256     1253.0     1253.20     4.52     3.48     95.56
kroA100  GA         83,919   93,133   90,348.5   89,856.23   2208.86  322.22   3.74
         ACO        22,428   23,318   22,748.5   22,760.13   238.53   6.95     67.86
         SA         21,829   23,595   22,495.0   22,610.57   513.41   6.24     0.65
         DJAYA      21,319   21,578   21,514.0   21,498.97   43.24    1.02     191.85
         QSA-DJAYA  21,292   21,389   21,292.0   21,295.37   17.40    0.06     97.56
kroB100  GA         83,257   92,138   89,154.5   88,831.47   2269.08  301.21   3.81
         ACO        22,901   23,446   23,256.5   23,254.00   114.49   5.03     67.49
         SA         22,647   24,310   23,691.5   23,670.10   402.66   6.91     0.65
         DJAYA      22,258   23,162   22,762.0   22,739.27   154.60   2.70     190.28
         QSA-DJAYA  22,220   22,724   22,708.0   22,635.70   130.59   2.23     96.98
kroC100  GA         83,948   94,564   88,964.0   89,056.27   2414.13  329.21   3.84
         ACO        21,511   21,775   21,680.0   21,661.67   68.92    4.40     67.93
         SA         21,449   24,221   22,067.5   22,282.13   593.60   7.39     0.66
         DJAYA      21,185   21,309   21,206.0   21,212.37   29.33    2.23     188.90
         QSA-DJAYA  20,965   21,331   21,183.0   21,180.80   48.06    2.08     97.18
kroD100  GA         81,680   89,459   86,842.0   86,690.60   1873.05  307.11   3.80
         ACO        22,572   23,360   23,020.0   23,007.33   159.53   8.05     68.44
         SA         21,822   24,076   22,701.0   22,741.30   523.69   6.80     0.67
         DJAYA      21,620   22,863   22,001.0   22,060.23   299.54   3.60     187.00
         QSA-DJAYA  21,495   21,896   21,575.0   21,583.57   79.89    1.36     97.08
kroE100  GA         85,061   93,912   90,908.5   90,557.90   1932.14  310.36   3.83
         ACO        23,196   23,877   23,667.0   23,661.23   142.12   7.22     68.02
         SA         22,712   24,138   23,354.0   23,364.07   360.65   5.87     0.65
         DJAYA      22,509   22,679   22,547.0   22,562.10   49.78    2.24     187.80
         QSA-DJAYA  22,130   22,475   22,466.0   22,429.53   74.21    1.64     94.22
eil101   GA         1872     2021     1963.0     1958.70     42.85    211.40   3.88
         ACO        677      705      693.0      693.70      6.24     10.29    66.95
         SA         647      689      667.0      666.73      10.50    6.00     0.67
         DJAYA      642      664      650.0      650.27      5.52     3.38     190.78
         QSA-DJAYA  630      646      635.0      635.27      3.02     1.00     99.91
lin105   GA         58,391   67,943   63,848.0   63,722.03   1998.94  343.16   4.18
         ACO        14,902   15,150   15,054.0   15,050.50   71.47    4.67     87.30
         SA         14,767   16,061   15,484.5   15,461.77   347.25   7.53     0.71
         DJAYA      14,576   15,071   14,877.5   14,856.93   118.66   3.32     233.07
         QSA-DJAYA  14,379   14,660   14,438.0   14,451.43   84.12    0.50     115.69
pr124    GA         339,211  373,718  362,675.0  360,680.93  8077.14  511.01   5.63
         ACO        60,590   63,297   61,714.5   61,795.57   570.39   4.69     114.72
         SA         60,220   69,852   62,575.0   62,844.47   2128.64  6.46     0.87
         DJAYA      59,246   59,792   59,246.0   59,350.17   194.25   0.54     510.69
         QSA-DJAYA  59,030   59,792   59,548.0   59,454.43   270.90   0.72     207.90
ch150    GA         30,083   32,150   31,169.5   31,084.30   479.35   376.17   8.46
         ACO        6758     6850     6824.0     6824.60     19.99    4.54     210.20
         SA         6862     7533     7241.5     7223.60     180.03   10.66    1.16
         DJAYA      6598     6633     6629.0     6625.07     9.38     1.49     1259.00
         QSA-DJAYA  6566     6624     6598.0     6596.73     10.44    1.05     372.08
tsp225   GA         22,763   24,184   23,645.0   23,618.03   368.21   502.65   19.53
         ACO        4225     4380     4293.5     4291.07     37.70    9.49     658.87
         SA         4313     4542     4388.0     4412.00     58.10    12.58    2.10
         DJAYA      4038     4069     4056.0     4054.80     8.11     3.47     11,326.09
         QSA-DJAYA  3994     4059     4012.0     4013.77     11.85    2.42     1715.49
Table 6. Results of five representative compared algorithms under the same execution time.
Name      Algorithm  Best     Worst    Median     Average     Std      Gap (%)  Time (s)
swiss42   GA         2350     2632     2515.0     2514.80     74.42    97.55    2.00
          ACO        1287     1303     1299.0     1298.07     3.53     1.97     2.00
          SA         1273     1398     1332.5     1332.07     34.51    4.64     2.00
          DJAYA      1273     1293     1281.0     1281.07     6.79     0.63     2.00
          QSA-DJAYA  1273     1274     1273.0     1273.03     0.18     0.00     2.00
berlin52  GA         13,727   15,744   15,078.5   15,033.30   385.67   99.33    10.00
          ACO        7547     7791     7679.0     7676.30     44.08    1.78     10.00
          SA         7542     8534     7970.0     7970.93     226.49   5.69     10.00
          DJAYA      7542     7657     7657.0     7633.30     38.79    1.21     10.00
          QSA-DJAYA  7542     7798     7542.0     7563.60     61.42    0.29     10.00
pr76      GA         276,734  314,038  305,579.5  303,113.67  7994.43  180.25   60.00
          ACO        114,953  120,653  117,885.5  117,736.00  1530.89  8.85     60.00
          SA         109,265  120,582  113,844.5  113,985.27  2870.60  5.39     60.00
          DJAYA      109,190  111,336  110,684.0  110,489.73  530.23   2.15     60.00
          QSA-DJAYA  108,159  111,464  109,190.0  109,429.10  1015.60  1.17     60.00
kroA100   GA         81,263   88,952   87,087.5   86,860.57   1506.34  308.14   120.00
          ACO        22,317   23,110   22,592.5   22,647.50   206.69   6.42     120.00
          SA         21,438   23,949   22,136.0   22,359.70   648.97   5.06     120.00
          DJAYA      21,292   21,389   21,389.0   21,385.77   17.41    0.49     120.00
          QSA-DJAYA  21,292   21,711   21,292.0   21,333.70   106.19   0.24     120.00
lin105    GA         57,147   63,181   60,680.5   60,620.53   1229.44  321.59   120.00
          ACO        14,844   15,203   15,042.5   15,037.40   79.55    4.58     120.00
          SA         14,988   16,029   15,296.0   15,323.40   265.98   6.57     120.00
          DJAYA      14,438   14,849   14,743.0   14,698.77   112.30   2.22     120.00
          QSA-DJAYA  14,379   14,706   14,463.5   14,473.60   96.34    0.66     120.00
pr124     GA         331,771  356,698  346,274.0  346,512.67  6497.63  487.01   120.00
          ACO        60,726   62,930   61,742.5   61,829.87   571.21   4.74     120.00
          SA         59,354   64,857   61,227.0   61,409.57   1364.10  4.03     120.00
          DJAYA      59,246   59,792   59,246.0   59,431.53   252.66   0.68     120.00
          QSA-DJAYA  59,076   59,792   59,246.0   59,396.73   246.66   0.62     120.00
ch150     GA         28,719   30,887   30,384.0   30,316.60   421.04   364.41   120.00
          ACO        6792     6889     6825.0     6828.03     21.07    4.60     120.00
          SA         6697     7147     6974.5     6961.63     124.89   6.64     120.00
          DJAYA      6624     6705     6692.5     6689.07     19.00    2.47     120.00
          QSA-DJAYA  6574     6625     6605.0     6601.83     10.05    1.13     120.00
tsp225    GA         22,872   23,777   23,375.5   23,342.80   257.03   495.63   120.00
          ACO        4261     4420     4330.5     4329.63     30.21    10.48    120.00
          SA         4060     4312     4183.0     4178.93     63.93    6.63     120.00
          DJAYA      4178     4196     4190.0     4189.13     3.12     6.89     120.00
          QSA-DJAYA  4037     4146     4093.0     4090.37     18.31    4.37     120.00
Table 7. The compared results of QSA-DJAYA with the ACO, PSO, GA, and BH algorithms.
Name      Algorithm  Best        Worst        Average    Std
bays29    ACO        9239.1973   11,014.4483  9823.20    722.42
          PSO        9120.3388   9498.1711    9195.91    168.97
          GA         9751.4255   10,513.9142  10,015.23  319.88
          BH         9396.475    9507.1701    9463.25    60.96
          QSA-DJAYA  2020        2026         2024.80    2.40
bayg29    ACO        9447.4929   11,033.5484  9882.22    675.83
          PSO        9329.25     11,332.7224  9947.03    799.41
          GA         9579.1234   10,411.1991  9771.95    127.11
          BH         9375.4418   9375.4418    9375.44    0.00
          QSA-DJAYA  1610        1626         1616.40    7.84
eil51     ACO        454.3895    469.0531     461.02     6.30
          PSO        469.1551    737.5258     574.80     107.24
          GA         448.8397    462.1142     453.48     9.42
          BH         437.893     526.8977     458.93     38.64
          QSA-DJAYA  427         434          432.60     2.80
berlin52  ACO        7757.0263   10,541.1228  8522.90    1152.20
          PSO        9218.4682   14,279.4331  11,089.53  2067.93
          GA         8779.7559   9565.3744    9288.45    1301.21
          BH         8188.0714   9356.7483    8455.83    508.99
          QSA-DJAYA  7542        7596         7552.80    21.60
st70      ACO        711.6515    855.2032     757.75     59.61
          PSO        1030.8484   1756.1227    1321.81    269.28
          GA         1112.3078   1242.2011    1158.85    52.17
          BH         723.2691    1081.1087    797.57     125.23
          QSA-DJAYA  683         697          686.60     5.24
eil76     ACO        574.2404    665.9995     594.14     40.22
          PSO        804.2667    1195.9021    975.64     152.41
          GA         619.2262    679.7864     652.06     122.10
          BH         566.243     925.8417     659.10     152.18
          QSA-DJAYA  551         551          551.00     0.00
eil101    ACO        725.0996    868.2047     763.92     59.97
          PSO        1158.704    1973.8192    1499.99    319.75
          GA         828.8806    854.4381     838.83     9.96
          BH         720.3838    1249.8684    897.38     210.14
          QSA-DJAYA  634         638          636.00     1.67
Table 8. The compared results of QSA-DJAYA with the ACO, ABC, HA, DTSA, and DJAYA algorithms.
Name | Algorithm | Average | Std | Gap (%)
oliver30 | ACO | 424.68 | 1.41 | 0.22
 | ABC | 462.55 | 12.47 | 9.16
 | HA | 423.74 | 0.00 | 0.00
 | DTSA | 428.50 | 4.21 | 1.12
 | DJAYA | 426.88 | 2.74 | 0.74
 | QSA-DJAYA | 423.65 | 2.94 | 0.87
eil51 | ACO | 457.86 | 4.07 | 6.76
 | ABC | 590.49 | 15.79 | 37.69
 | HA | 443.39 | 5.25 | 3.39
 | DTSA | 443.93 | 4.04 | 3.51
 | DJAYA | 440.18 | 4.95 | 2.64
 | QSA-DJAYA | 432.30 | 2.69 | 1.48
berlin52 | ACO | 7659.31 | 38.70 | 1.52
 | ABC | 10,390.26 | 439.69 | 37.72
 | HA | 7544.37 | 0.00 | 0.00
 | DTSA | 7545.83 | 21.00 | 0.02
 | DJAYA | 7580.30 | 80.60 | 0.48
 | QSA-DJAYA | 7552.20 | 43.33 | 0.14
st70 | ACO | 709.16 | 8.27 | 4.73
 | ABC | 1230.49 | 41.79 | 81.73
 | HA | 700.58 | 7.51 | 3.47
 | DTSA | 708.65 | 6.77 | 4.66
 | DJAYA | 702.30 | 9.56 | 3.72
 | QSA-DJAYA | 686.70 | 3.95 | 1.73
eil76 | ACO | 561.98 | 3.50 | 3.04
 | ABC | 931.44 | 24.86 | 70.78
 | HA | 557.98 | 4.10 | 2.31
 | DTSA | 578.58 | 3.93 | 6.09
 | DJAYA | 573.17 | 6.33 | 5.10
 | QSA-DJAYA | 550.40 | 2.65 | 2.30
pr76 | ACO | 116,321.22 | 885.79 | 7.55
 | ABC | 205,119.61 | 7379.16 | 89.65
 | HA | 115,072.29 | 742.90 | 6.39
 | DTSA | 114,930.03 | 1545.64 | 6.26
 | DJAYA | 113,258.29 | 1711.93 | 4.71
 | QSA-DJAYA | 109,417.35 | 944.67 | 1.16
kroA100 | ACO | 22,880.12 | 235.18 | 7.49
 | ABC | 53,840.03 | 2198.36 | 152.94
 | HA | 22,435.31 | 231.34 | 5.40
 | DTSA | 21,728.40 | 358.13 | 2.08
 | DJAYA | 21,735.31 | 331.33 | 2.13
 | QSA-DJAYA | 21,297.05 | 21.11 | 0.07
eil101 | ACO | 693.42 | 6.80 | 7.96
 | ABC | 1315.95 | 35.28 | 104.88
 | HA | 683.39 | 6.56 | 6.40
 | DTSA | 689.91 | 4.47 | 7.41
 | DJAYA | 677.37 | 4.87 | 5.46
 | QSA-DJAYA | 635.50 | 3.22 | 1.03
ch150 | ACO | 6702.87 | 20.73 | 2.61
 | ABC | 21,617.48 | 453.71 | 230.93
 | HA | 6677.12 | 19.30 | 2.22
 | DTSA | 6748.99 | 32.63 | 3.32
 | DJAYA | 6638.63 | 52.79 | 1.63
 | QSA-DJAYA | 6596.30 | 9.97 | 1.05
tsp225 | ACO | 4176.08 | 28.34 | 8.22
 | ABC | 17,955.12 | 387.35 | 365.28
 | HA | 4157.85 | 26.27 | 7.74
 | DTSA | 4230.45 | 58.76 | 9.93
 | DJAYA | 4095.02 | 42.54 | 6.12
 | QSA-DJAYA | 4011.10 | 8.61 | 2.35
Table 9. Results returned by the Mann–Whitney U test in two groups of comparative experiments.
Experiments | Algorithm | U_min | U_α | Decision
Experiment 1-1 | QSA-DJAYA vs. GA | 65 | 127 | Negate the null hypothesis.
 | QSA-DJAYA vs. ACO | 101 | 127 | Negate the null hypothesis.
 | QSA-DJAYA vs. SA | 100 | 127 | Negate the null hypothesis.
 | QSA-DJAYA vs. DJAYA | 100 | 127 | Negate the null hypothesis.
Experiment 1-2 | QSA-DJAYA vs. GA | 21 | 13 | Retain the null hypothesis.
 | QSA-DJAYA vs. ACO | 14 | 13 | Retain the null hypothesis.
 | QSA-DJAYA vs. SA | 15 | 13 | Retain the null hypothesis.
 | QSA-DJAYA vs. DJAYA | 14 | 13 | Retain the null hypothesis.
Experiment 2-1 | QSA-DJAYA vs. ACO | -2 | 8 | Negate the null hypothesis.
 | QSA-DJAYA vs. PSO | 8 | 8 | Retain the null hypothesis.
 | QSA-DJAYA vs. GA | -2 | 8 | Negate the null hypothesis.
 | QSA-DJAYA vs. BH | -2 | 8 | Negate the null hypothesis.
Experiment 2-2 | QSA-DJAYA vs. ACO | 27 | 23 | Retain the null hypothesis.
 | QSA-DJAYA vs. ABC | 19 | 23 | Negate the null hypothesis.
 | QSA-DJAYA vs. HA | 27 | 23 | Retain the null hypothesis.
 | QSA-DJAYA vs. DTSA | 26 | 23 | Retain the null hypothesis.
 | QSA-DJAYA vs. DJAYA | 27 | 23 | Retain the null hypothesis.
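The decisions in Table 9 follow the usual small-sample Mann–Whitney rule: compute both U statistics from the two samples of results, take the smaller one, and negate (reject) the null hypothesis of equal performance when it falls below the critical value U_α. The per-run samples behind the table are not reproduced here, so the sketch below only illustrates the statistic and the decision rule, using hypothetical per-instance gap values and ignoring the tie-corrected normal approximation used for large samples:

```python
def mann_whitney_u_min(a, b):
    """Smaller of the two Mann-Whitney U statistics for samples a and b.

    U_a counts, over all len(a) * len(b) pairs, how often an element of `a`
    exceeds one of `b` (ties count 0.5); U_b is the complement.
    """
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    return min(u_a, len(a) * len(b) - u_a)

def decision(u_min, u_crit):
    """Decision rule consistent with Table 9: reject when U_min < U_alpha."""
    return ("Negate the null hypothesis." if u_min < u_crit
            else "Retain the null hypothesis.")

# Hypothetical per-instance gaps (%) for two algorithms over four instances:
gaps_a = [0.29, 0.24, 0.62, 1.13]
gaps_b = [1.21, 0.49, 0.68, 2.47]
print(decision(mann_whitney_u_min(gaps_a, gaps_b), 3))
```

Note the strict inequality: it matches the table, where U_min = 8 against U_α = 8 (QSA-DJAYA vs. PSO) retains the null hypothesis.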

Xu, J.; Hu, W.; Gu, W.; Yu, Y. A Discrete JAYA Algorithm Based on Reinforcement Learning and Simulated Annealing for the Traveling Salesman Problem. Mathematics 2023, 11, 3221. https://doi.org/10.3390/math11143221