Modified Evolutionary Test Data Generation Algorithm Based on Dynamic Change in Fitness Function Weights

Avdeenko, Tatiana; Serdyukov, Konstantin

doi:10.3390/engproc2023033023

by Tatiana Avdeenko ^‡ and Konstantin Serdyukov ^*,‡

Applied Mathematics and Computer Science Department, Novosibirsk State Technical University, 20 Karla Marksa Ave., 630073 Novosibirsk, Russia

^* Author to whom correspondence should be addressed.

^† Presented at the 15th International Conference “Intelligent Systems” (INTELS’22), Moscow, Russia, 14–16 December 2022.

^‡ These authors contributed equally to this work.

Eng. Proc. 2023, 33(1), 23; https://doi.org/10.3390/engproc2023033023

Published: 13 June 2023

(This article belongs to the Proceedings of 15th International Conference “Intelligent Systems” (INTELS’22))


Abstract

In this paper, we investigate a modification of the method of test data generation for multiple code paths within a single launch of the genetic algorithm. This method takes into account the remoteness of paths initiated by different test cases by introducing an additional additive component into the fitness function. Previous studies have shown that the parameter defining the relationship between the components of the fitness function has a rather strong effect on code coverage. To eliminate this effect, we propose a modification of the first component of the fitness function, which is responsible for path complexity. This modification is based on a dynamic change in code statement weights between generations to achieve greater population diversity. We propose several methods for implementing this modification, divided into two groups. In the first group, the statement weights change depending only on whether the statement was covered in a generation, and the rate of change depends on the number of previous generations in which it was covered. In the second group, the rate of change depends on the proportion of test sets that covered the statement in the previous generation. Each of the proposed methods is investigated with respect to achieving complete coverage for different values of the parameter defining the ratio of the components of the fitness function. As a result, the best method is determined; it eliminates the need to determine this parameter for each program under test, thus making the algorithm more universal.

Keywords:

white box testing; test data generation; genetic algorithm; fitness function

1. Introduction

Testing is one of the most complex and time-consuming processes in software development, and is necessary to ensure high quality of the developed product [1]. Therefore, automating the testing process, or at least its subprocesses, is an important research task. One important subprocess of testing is test data generation. An analysis of existing studies, methods and approaches to automatic test data generation has shown [2,3,4] that in software development, a blind strategy of random data generation is mainly used. At the same time, an analysis of scientific studies has shown that there are approaches whose development and application can significantly improve the quality of generated test cases, expressed as the degree of coverage of the code being tested [5,6].

Among such advanced approaches to test data generation, static methods of symbolic analysis of program code were historically the first [7,8]. The generation of test data as a result of such analysis was reduced to automatic generation and resolution in symbolic form of a system of equations and inequalities obtained by logical union and the intersection of all conditions of the software-under-test (SUT). The undoubted advantage of the static approach is that it obtains results in symbolic form, which makes it possible to analytically determine subareas of test-case values that guarantee the passage of calculations over the given parts of the code.

However, a significant limitation of the possibility to apply the static approach is the problem of computational complexity of symbolic computations even for tasks of relatively small dimensionality. Therefore, a dynamic approach based on the actual execution of the SUT with specially generated values of input variables and subsequent analysis of data flows is currently more realistic and efficient for practical use in software development companies.

The most promising methods for implementing a dynamic approach to test data generation are evolutionary optimization methods [9,10,11]. The evolutionary paradigm, which is the basis of the genetic algorithm (GA), uses a set of random test data generated at the initial stage, after which a sequential “evolution” of the data is performed to improve the coverage of the code under test. Therefore, the assumption arises that the GA can be adapted to implement the idea of the evolutionary improvement of test data in terms of maximizing the coverage of the code under test. However, the existing research focuses on solving local problems, such as finding a specific set of test data that covers some given statements. At the same time, to solve the practical task of comprehensive software testing, it is relevant to develop methods for generating test sets that provide the maximum coverage of the entire SUT, taking into account its multi-connected complex structure.

The paper is organized as follows. Section 1 introduces the problem. Section 2 describes the use of the GA for test data generation. In Section 3, we formulate the fitness function for achieving maximum coverage. In Section 4, we propose a modification of the algorithm with dynamic weight assignment. Section 5 provides the results. Section 6 concludes the paper.

2. Theoretical Background

The genetic algorithm [12,13,14] works iteratively, performing several consecutive steps at each iteration until the completion conditions are reached. With each new iteration, the GA creates a new generation of the population based on the previous (parent) one. The main GA cycle for test data generation includes the following stages, which, except for the first, are repeated until a given coverage value or number of generations is reached:

Initialization. The initial population is formed randomly, taking into account the constraints on the values of input variables. The size of population m is chosen based on the size of the SUT (more specifically, the minimum number of different possible paths that the computation can take).
Fitness function calculation. Each chromosome in a population is evaluated by a fitness function.
Selection. The best 20% of chromosomes are selected unchanged for the next generation; the remaining 80% of chromosomes of the next generation will be obtained by crossover.
Crossover. Half of the chromosomes of the next generation are formed by randomly crossing the best 20% of the chromosomes of the previous generation with each other. The remaining chromosomes are obtained by randomly crossing all chromosomes of the previous generation with each other. Crossover proceeds by choosing a random constant $\beta_{i} \in [0, 1]$ for each $i = \overline{1, N}$; the $i$-th gene of the offspring is then a linear combination of the corresponding parent genes:

$var_{i}^{offspring} = \beta_{i} \times var_{i}^{mother} + (1 - \beta_{i}) \times var_{i}^{father}, \quad i = \overline{1, N}.$
Mutation. With a given mutation probability (0.05) each gene can change its value at random within given constraints. The main goal of mutation is to achieve greater diversity.
Formation of the elite chromosome pool. In each generation, individuals of the population are selected into the elite chromosome pool. Only the chromosomes that provide additional code coverage compared to the previous coverage are included in the pool.

After all GA stages have been executed, the completion conditions are checked; if they are not met, the process proceeds to the next iteration. This iterative character is what allows the GA to obtain new solutions. Each new generation is formed based on the previous one, i.e., the test sets of the previous generation participate in the formation of new sets, thus providing the “evolution” of previously obtained solutions.
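As a rough illustration of the selection, crossover and mutation stages, the following Python sketch builds one new generation. It is not the authors' implementation: the function names, the `bounds` list of per-variable intervals, the reading of "half" as half of the crossover offspring, and the choice to mutate only the offspring are assumptions made for this example. The elite pool stage is omitted.

```python
import random


def crossover(mother, father, rng=random):
    """Each offspring gene is beta*mother + (1-beta)*father, with its own beta."""
    child = []
    for gene_m, gene_f in zip(mother, father):
        beta = rng.random()  # beta_i drawn independently per gene
        child.append(beta * gene_m + (1.0 - beta) * gene_f)
    return child


def mutate(chromosome, bounds, p_mut=0.05, rng=random):
    """With probability p_mut, resample a gene uniformly within its bounds."""
    return [
        rng.uniform(lo, hi) if rng.random() < p_mut else gene
        for gene, (lo, hi) in zip(chromosome, bounds)
    ]


def next_generation(population, fitness, bounds, rng=random):
    """Build the next generation from the current one (hypothetical helper)."""
    m = len(population)
    ranked = sorted(population, key=fitness, reverse=True)
    n_elite = max(1, m // 5)            # best 20% pass on unchanged
    elite = ranked[:n_elite]
    n_children = m - n_elite
    n_from_elite = n_children // 2      # ASSUMPTION: half of the offspring
    children = []
    for c in range(n_children):
        # first half: parents from the best 20%; rest: parents from everyone
        pool = elite if c < n_from_elite else population
        mother, father = rng.choice(pool), rng.choice(pool)
        child = crossover(mother, father, rng)
        children.append(mutate(child, bounds, rng=rng))  # ASSUMPTION: offspring only
    return elite + children
```

Because each offspring gene is a convex combination of two in-range parent genes, crossover alone never leaves the variable bounds; only mutation introduces values outside the span of the parents.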

3. Multi-Path Algorithm for Maximum Code Coverage

The input variables of the code under test are the variables $var_{i}$, $i = \overline{1, N}$, which are either part of an input statement or input parameters of procedures and functions, and which initiate calculations along a certain code path. The vector of input variables can thus be written as $(var_{1}, var_{2}, \dots, var_{N})$, and the entire definition area as $D = D_{1} \times D_{2} \times \dots \times D_{N}$, where $D_{i}$ is the definition area of the input variable $var_{i}$. A chromosome $x_{i} \in D$ is then represented by an $N$-dimensional vector $x_{i} = [var_{1}^{i}, var_{2}^{i}, \dots, var_{N}^{i}]$.

The purpose of automatic test data generation is to find a set of test cases $\{x_{1}, x_{2}, \dots, x_{m}\}$ that initiate passing through a given set of reachable paths, i.e., the paths that can be covered by the test sets. The main coverage criterion is statement coverage [15]. We introduce the notation $g(x_{i})$ for the vector that indicates the statement coverage initiated by a test set $x_{i}$:

$$g(x_{i}) = (g_{1}(x_{i}), g_{2}(x_{i}), \dots, g_{n}(x_{i})),$$

where $n$ is the number of statements of the SUT, and

$$g_{j}(x_{i}) = \begin{cases} 1 & \text{if the path initiated by the set } x_{i} \text{ passes through the statement } j; \\ 0 & \text{otherwise.} \end{cases}$$

If we denote the vector of statement weights of the SUT as $(w_{1}, w_{2}, \dots, w_{n})$, then we can define the fitness function for a single chromosome $x_{i}$ as follows:

$$F(x_{i}) = \sum_{j = 1}^{n} w_{j} g_{j}(x_{i}), \tag{1}$$

where $w_{j}$ is the weight of the statement $j$, $g_{j}$ is the value of the coverage indicator, and $n$ is the number of statements.
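To make the coverage indicator and Formula (1) concrete, here is a small Python sketch. The instrumented function `triangle_kind`, its six numbered statements and the weight vector are all invented for illustration; they are not the SUT or the weights used in the paper.

```python
def triangle_kind(a, b, c, trace):
    """Hypothetical SUT with n = 6 instrumented statements."""
    trace[0] = 1                      # statement 1: entry
    if a == b and b == c:
        trace[1] = 1                  # statement 2
        kind = "equilateral"
    elif a == b or b == c or a == c:
        trace[2] = 1                  # statement 3
        kind = "isosceles"
    else:
        trace[3] = 1                  # statement 4
        kind = "scalene"
    if a + b <= c or a + c <= b or b + c <= a:
        trace[4] = 1                  # statement 5
        kind = "invalid"
    trace[5] = 1                      # statement 6: exit
    return kind


def coverage(x, n=6):
    """Coverage indicator vector g(x) for the test case x = (a, b, c)."""
    trace = [0] * n
    triangle_kind(*x, trace)
    return trace


def path_weight(weights, g):
    """Formula (1): sum of the weights of the covered statements."""
    return sum(w * g_j for w, g_j in zip(weights, g))


weights = [1, 5, 3, 2, 4, 1]          # assumed statement weights
g = coverage((3, 3, 5))               # [1, 0, 1, 0, 0, 1]
print(g, path_weight(weights, g))     # weight 1 + 3 + 1 = 5
```

A test case that reaches more heavily weighted statements, such as `(1, 2, 10)` here, receives a larger value of Formula (1).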

The greater the sum of the weights of the statements executed on the path initiated by the test case $x_{i}$, the greater the value of the fitness function $F(x_{i})$. To ensure greater population diversity, a component is added to Formula (1) that takes into account the remoteness of paths from each other. The remoteness of the paths is defined through the similarity operation. To calculate the $j$-th similarity coefficient $sim_{j}(x_{i_{1}}, x_{i_{2}})$ of two chromosomes $x_{i_{1}}$ and $x_{i_{2}}$, we check whether the $j$-th statement of the SUT, whose coverage is marked by the indicator $g_{j}$, is at the intersection of both paths initiated by the test cases $x_{i_{1}}$ and $x_{i_{2}}$:

$$sim_{j}(x_{i_{1}}, x_{i_{2}}) = \overline{g_{j}(x_{i_{1}}) \oplus g_{j}(x_{i_{2}})}, \quad j = \overline{1, n}, \tag{2}$$

where the logical operations “negation” (NOT, the overline on the right-hand side) and “exclusive OR” (⊕, XOR) are used.

The more the covered statements of two paths coincide, the greater the similarity value between the chromosomes. The following formula defines the similarity between two chromosomes as the weighted sum of the per-statement similarities over all code statements:

$$sim(x_{i_{1}}, x_{i_{2}}) = \sum_{j = 1}^{n} w_{j} \times sim_{j}(x_{i_{1}}, x_{i_{2}}). \tag{3}$$
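Formulas (2) and (3) translate almost directly into code. The sketch below is illustrative only; the function names and the example vectors are assumptions. Note that, as written, Formula (2) also returns 1 when neither path covers statement $j$, since NOT(0 XOR 0) = 1.

```python
def sim_j(g_a_j, g_b_j):
    """Formula (2): NOT (g_a_j XOR g_b_j) for 0/1 coverage indicators."""
    return 1 - (g_a_j ^ g_b_j)


def sim(weights, g_a, g_b):
    """Formula (3): weighted sum of the per-statement similarities."""
    return sum(w * sim_j(a, b) for w, a, b in zip(weights, g_a, g_b))


weights = [1, 2, 3]
g_a = [1, 1, 0]
g_b = [1, 0, 0]
# statements 1 and 3 agree (weights 1 and 3), statement 2 differs
print(sim(weights, g_a, g_b))   # 4
print(sim(weights, g_a, g_a))   # 6, identical paths reach the maximum
```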

The similarity value between the chromosome $x_{i}$ and the other chromosomes of the population is calculated as

$$f_{sim}(x_{i}) = \frac{1}{m - 1} \sum_{s = 1,\, s \neq i}^{m} sim(x_{s}, x_{i}), \tag{4}$$

where $m$ is the number of chromosomes in the population.

Now we can determine the average similarity value of paths in the entire population,

$$\overline{f_{sim}} = \frac{1}{m} \sum_{i = 1}^{m} f_{sim}(x_{i}), \tag{5}$$

and further formulate the additive component of the fitness function responsible for the diversity of paths in the population as the modulus of the difference between the average similarity of the population and the similarity of a particular chromosome:

$$F_{2}(x_{i}) = |\overline{f_{sim}} - f_{sim}(x_{i})|. \tag{6}$$

As a result, the fitness function for the chromosome $x_{i}$, taking into account the diversity of paths, is calculated by the formula

$$F(x_{i}) = F_{1}(x_{i}) + k \times F_{2}(x_{i}), \tag{7}$$

where $F_{1}(x_{i})$ is the weighted coverage sum given by Formula (1) and $F_{2}(x_{i})$ is defined by Formula (6). Accordingly, the first component $F_{1}(x_{i})$ determines the complexity of the path initiated by the chromosome $x_{i}$, and the second component $F_{2}(x_{i})$ determines the remoteness of this path from all other paths in the population. The parameter $k$ defines the relationship between the components.
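The chain from Formula (4) to Formula (7) can be sketched as follows. The function assumes that the path weights $F_1$ and an $m \times m$ matrix of pairwise similarities (Formula (3)) have already been computed; the name `combined_fitness` and the example numbers are hypothetical.

```python
def combined_fitness(f1, sim_matrix, k):
    """Formulas (4)-(7): F(x_i) = F1(x_i) + k * |mean(f_sim) - f_sim(x_i)|."""
    m = len(f1)
    # Formula (4): mean similarity of chromosome i to all other chromosomes
    f_sim = [
        sum(sim_matrix[s][i] for s in range(m) if s != i) / (m - 1)
        for i in range(m)
    ]
    mean_sim = sum(f_sim) / m                      # Formula (5)
    f2 = [abs(mean_sim - fs) for fs in f_sim]      # Formula (6)
    return [a + k * b for a, b in zip(f1, f2)]     # Formula (7)


# Three chromosomes; the first two follow the same path.
f1 = [3, 3, 4]
sim_matrix = [
    [6, 6, 1],
    [6, 6, 1],
    [1, 1, 6],
]
print(combined_fitness(f1, sim_matrix, k=10))
```

In this toy population the third chromosome, whose path differs from the other two, receives the largest diversity bonus; with `k=0` the result reduces to the path weights alone.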

Using Formula (7) as a fitness function leads to more diverse populations as a result of a single GA run. However, due to the use of a continuous version of the genetic algorithm in this research, the resulting diversity is not sufficient to fully cover the code within a single GA run.

The latter circumstance is related to the detected “swing effect”, which arises from the presence of indistinguishable chromosomes in the population. If indistinguishable chromosomes have a high value of the fitness function in one generation, they will be selected for crossover. Their offspring will then, with high probability, also be indistinguishable from their parents. The new generation will in this case contain a greater number of indistinguishable chromosomes, and similarity in the population will depend more on them. For all these chromosomes, the value of the additive component $F_{2}$ of the fitness function will be reduced, and for chromosomes passing through other paths, it will be increased. Now other chromosomes may be selected for crossover and will form a multitude of indistinguishable chromosomes in the next generation. Thus, the population is cyclically filled with indistinguishable chromosomes, which in the next generation leads to a decrease in the similarity value for them and, accordingly, to a decrease in $F_{2}$, reducing the priority of these chromosomes in the following iteration. A similar cycle is repeated for different sets, and as a result both path complexity ($F_{1}$) and similarity ($F_{2}$) cease to play an important role in the formation of a new population: different sets are constantly shuffled without the solution space being explored.

To exclude the “swing effect”, it is proposed to use the indicator $ind(x_{1}, \dots, x_{i})$, which is determined by the number of chromosomes from the set $\{x_{1}, \dots, x_{i - 1}\}$ indistinguishable from the chromosome $x_{i}$:

$$\tilde{F}(x_{i}) = F_{1}(x_{i}) + \frac{1}{1 + ind(x_{1}, \dots, x_{i})} \times F_{2}(x_{i}). \tag{8}$$

Indeed, the initial value is $ind(x_{1}) = 0$, because the set in which the indistinguishable chromosomes are identified is empty at the first step. At each subsequent step, the value $ind(x_{1}, \dots, x_{i})$ can either increase by 1 if the next chromosome is indistinguishable from one of the previous ones, or keep the same value if the next chromosome is unique. This allows chromosomes passing through different paths to be more evenly distributed throughout the population as a whole.
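A minimal sketch of Formula (8) is given below. It assumes that two chromosomes are "indistinguishable" when they produce the same coverage vector, which is one plausible reading of the text rather than a definition stated in it; the helper names are hypothetical.

```python
def ind_counts(coverages):
    """For each chromosome i, count earlier chromosomes with the same coverage."""
    seen = {}
    counts = []
    for g in coverages:
        key = tuple(g)                 # ASSUMPTION: same coverage = indistinguishable
        counts.append(seen.get(key, 0))
        seen[key] = seen.get(key, 0) + 1
    return counts


def penalized_fitness(f1, f2, coverages):
    """Formula (8): F1 + F2 / (1 + ind), damping F2 for repeated paths."""
    return [a + b / (1 + c) for a, b, c in zip(f1, f2, ind_counts(coverages))]


coverages = [[1, 0], [1, 0], [0, 1], [1, 0]]
print(ind_counts(coverages))                                  # [0, 1, 0, 2]
print(penalized_fitness([1, 1, 1, 1], [6, 6, 6, 6], coverages))
```

The first chromosome on each path keeps its full $F_2$ contribution, while every further copy of the same path receives a progressively smaller share.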

4. Modification of the Fitness Function Based on Dynamic Changes in Statements Weights

The studies in [16,17,18] showed a relatively strong influence of the value of $k$ on the coverage of the SUT. At $k = 0$, the coverage was minimal; it reached its maximum at $k = 10$, after which it began to decline. Clearly, the choice of $k$ can significantly affect the final results. The value $k = 10$ obtained in those studies was optimal only for the tested programs; for others, this value may not be optimal. Therefore, to achieve greater universality of the algorithm, it would be preferable to reduce the influence of $k$.

For this purpose, we propose a modification of the $F_{1}$ component of the fitness function (8), so that greater population diversity is achieved by it alone. The idea of modifying the $F_{1}$ component is inspired by other evolutionary methods. Studies [19,20,21] use some of these methods, in particular Particle Swarm Optimization (PSO), which is one of the Swarm Intelligence algorithms. However, the application of PSO in existing studies has been based more on comparing PSO and GA implementations than on the hybridization of approaches [22]. Other representatives of this family are Ant Colony Optimization (ACO), the Artificial Bee Colony Algorithm (ABC), Cuckoo Search (CS), and many other algorithms based on the collective interaction of different particles or agents.

The ACO [23] is one of the methods that allows solving pathfinding problems on graphs. It is based on simulating the behavior of a colony of ants. The ants, passing along certain paths, leave a trail of pheromones behind them. The better the solution found, the more pheromones there will be on one or another path. In the next generation, ants already form their paths based on the number of pheromones—the more pheromones on a certain path, the more ants will be directed to that path and continue exploring it. In this way, the colony gradually explores the entire solution space, gradually reaching better and better paths.

It is not possible to apply the ACO directly to the test data generation problem, because different paths are reached by different test sets, and the only way to change the path is to change the values of the test sets themselves. Nevertheless, the idea of using the “pheromones” model to prioritize pathfinding could have a positive effect in providing more diversity in the population. When applied to the problem of increasing the diversity of test sets, the idea of pheromones suggests dynamically (from generation to generation) increasing or decreasing the statement weights $w_{j}$, $j = \overline{1, n}$, depending on the number of chromosomes that passed through these statements in the previous generations. The dynamic change in the weights of statements can be represented as

$$\tilde{w}_{j}^{(q)} = Ph_{j}^{(q)} w_{j}, \quad j = \overline{1, n}; \; q = \overline{1, Q}, \tag{9}$$

where $\tilde{w}_{j}^{(q)}$ is the weight assigned to the statement $j$ in generation $q$, $Ph_{j}^{(q)}$ is the weight multiplier of the statement $j$ in generation $q$ ($0 \leq Ph_{j}^{(q)} \leq 1$), and $Q$ is the number of generations (iterations of the GA). Taking into account dependence (9), the dynamic variant of the $F_{1}$ fitness function component takes the form

$$F_{1}^{(q)}(x_{i}) = \sum_{j = 1}^{n} \tilde{w}_{j}^{(q)} g_{j}(x_{i}) = \sum_{j = 1}^{n} Ph_{j}^{(q)} w_{j} g_{j}(x_{i}), \quad q = \overline{1, Q}. \tag{10}$$

In expression (10), it is very important to determine how the multiplier $Ph_{j}^{(q)}$ ($0 \leq Ph_{j}^{(q)} \leq 1$) depends on its arguments, so that the statement weights in the fitness function respond in time to statement coverage in the previous generations. The resulting diversity of the population of test sets, and hence the degree to which they cover the SUT, depends on the choice of the variation method for $Ph_{j}^{(q)}$.

Two basic strategies were proposed for the initial behavior of the multiplier $Ph_{j}^{(q)}$ as a function of the generation number $q$: the direct and the reverse strategy. In the direct strategy, we assume $Ph_{j}^{(1)} = 0$ in the first generation, and this value then increases (or remains the same) depending on the coverage (or non-coverage) of the statement $j$ in the previous generation. In the reverse strategy, on the contrary, we assume $Ph_{j}^{(1)} = 1$ in the first generation, and this value then decreases (or remains the same) depending on the coverage (or non-coverage) of the statement $j$.

In both strategies, the multiplier can reach the boundaries of the interval $[0, 1]$. Thus, in the direct strategy, the value of $Ph_{j}^{(q)}$ increases monotonically, but after reaching the limit value $Ph_{j}^{(q)} = 1$ (this value corresponds to the maximum priority of the statement $j$ in the fitness function), it must begin to decrease in order to redirect the algorithm to other, still uncovered, statements. Then, after reaching the minimum possible value $Ph_{j}^{(q)} = 0$, which corresponds to excluding the statement $j$ from the fitness function, it starts increasing monotonically again, and so on. In the reverse strategy, the changes occur in the opposite directions: first decreasing, then increasing, etc.

This fluctuating change in the multiplier $Ph_{j}^{(q)}$ between the values 0 and 1 can occur at different rates, given by the parameter $\Delta Ph$, which affects the total number of fluctuations within the interval $[0, 1]$ during test data generation. Let $Trans_{j}^{(q)}$ be the number of complete passes from 0 to 1 or from 1 to 0 made by the multiplier $Ph_{j}^{(q)}$ up to the current generation $q$. Then, the behavior of the multiplier for the direct strategy can be written as

$$Ph_{j}^{(q)} = \begin{cases} 0 & \text{if } q = 1, \\ Ph_{j}^{(q - 1)} + \Delta Ph \times (-1)^{Trans_{j}^{(q)}} & \text{if } \tilde{m}_{j}^{(q - 1)} \neq 0, \\ Ph_{j}^{(q - 1)} & \text{if } \tilde{m}_{j}^{(q - 1)} = 0, \end{cases} \tag{11}$$

and the behavior of the multiplier for the reverse strategy takes the form

$$Ph_{j}^{(q)} = \begin{cases} 1 & \text{if } q = 1, \\ Ph_{j}^{(q - 1)} - \Delta Ph \times (-1)^{Trans_{j}^{(q)}} & \text{if } \tilde{m}_{j}^{(q - 1)} \neq 0, \\ Ph_{j}^{(q - 1)} & \text{if } \tilde{m}_{j}^{(q - 1)} = 0, \end{cases} \tag{12}$$

where $\tilde{m}_{j}^{(q - 1)}$ is the number of chromosomes, in a population of $m$ individuals, that covered the statement $j$ in generation $q - 1$.

The article considers several methods for determining the rate parameter $\Delta Ph$. The $Half$ method assumes that one full pass of the multiplier (from 0 to 1 or from 1 to 0) at the rate $\Delta Ph$ is completed when the statement $j$ is covered in half of the initially given number of generations $Q$ (i.e., in $Q / 2$ generations); the $Quarter$ method, in a quarter of the generations ($Q / 4$); and the $Tenth$ method, in one tenth of all generations ($Q / 10$). The fewer generations (iterations) needed for one complete pass, the greater the rate parameter $\Delta Ph$, and the more often the multiplier fluctuates between the limit values 0 and 1. Table 1 presents the main indicators used to implement the proposed methods for varying the multiplier $Ph_{j}^{(q)}$ with a constant rate of change $\Delta Ph$. Direct strategy methods are marked with a plus sign (+), and reverse strategy methods with a minus sign (−).
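The constant-rate update of Formulas (11) and (12) can be sketched in Python as below. Several details are assumptions of this sketch rather than statements from the paper: the multiplier is clamped to $[0, 1]$, the pass counter is incremented at the moment a boundary is reached, and a small tolerance `eps` guards against floating-point drift. The function names are hypothetical.

```python
def rate(Q, passes):
    """Delta Ph such that one full pass takes Q / passes covered generations.

    passes = 2 for Half, 4 for Quarter, 10 for Tenth.
    """
    return passes / Q


def step(ph, trans, covered, delta, direct=True, eps=1e-12):
    """One generation of Formula (11) (direct) or (12) (reverse).

    ph      -- multiplier from the previous generation
    trans   -- number of complete 0->1 or 1->0 passes made so far
    covered -- True if at least one chromosome covered the statement
    """
    if not covered:
        return ph, trans                   # uncovered: multiplier is unchanged
    sign = 1.0 if direct else -1.0
    ph += sign * delta * (-1) ** trans     # direction flips after each full pass
    if ph >= 1.0 - eps:                    # ASSUMPTION: clamp and count the pass
        return 1.0, trans + 1
    if ph <= eps:
        return 0.0, trans + 1
    return ph, trans


# Direct "Half" method with Q = 8: a full pass takes 4 covered generations.
ph, trans = 0.0, 0
for _ in range(8):
    ph, trans = step(ph, trans, covered=True, delta=rate(8, 2))
    print(ph, trans)
```

Starting the same loop from `ph = 1.0` with `direct=False` gives the reverse strategy, which first lowers the multiplier and then raises it again.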

Another method of determining the multiplier $Ph_{j}^{(q)}$, which we call $Count-$ (the method is based on the reverse strategy), changes it not by a constant value, but by a value depending on the coverage intensity of the statement $j$ in the previous generation:

$$Ph_{j}^{(q)} = \begin{cases} 1 & \text{if } q = 1, \\ Ph_{j}^{(q - 1)} \left(1 - \tilde{m}_{j}^{(q - 1)} / m\right) & \text{if } \tilde{m}_{j}^{(q - 1)} \neq 0, \\ 1 & \text{if } \tilde{m}_{j}^{(q - 1)} = 0. \end{cases} \tag{13}$$

In contrast to the previously proposed methods, in $Count-$ the value of $Ph_{j}^{(q)}$ decreases more strongly the more chromosomes covered the statement $j$ in the previous generation. There is no gradual increase in the multiplier in this case; instead, if the statement was not covered ($\tilde{m}_{j}^{(q - 1)} = 0$), the maximum value $Ph_{j}^{(q)} = 1$ is set. Thus, frequently covered statements cease to play a significant role in the search for test sets, and the algorithm mostly tries to generate sets for as yet uncovered paths.
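The $Count-$ rule of Formula (13), together with the dynamic path weight of Formula (10), might look as follows in Python. The function names and the example numbers are assumptions for illustration.

```python
def count_minus_update(ph_prev, covered_counts, m):
    """Formula (13): shrink Ph_j by the share of chromosomes covering statement j.

    ph_prev        -- multipliers from generation q - 1
    covered_counts -- how many of the m chromosomes covered each statement
    """
    return [
        1.0 if count == 0 else ph * (1.0 - count / m)
        for ph, count in zip(ph_prev, covered_counts)
    ]


def dynamic_f1(ph, weights, g):
    """Formula (10): path weight with per-statement multipliers applied."""
    return sum(p * w * g_j for p, w, g_j in zip(ph, weights, g))


ph = count_minus_update([1.0, 1.0, 0.5], covered_counts=[0, 5, 10], m=10)
print(ph)                                      # [1.0, 0.5, 0.0]
print(dynamic_f1(ph, [2, 4, 6], [1, 1, 1]))    # 4.0
```

One consequence of the formula as written is that a statement covered by every chromosome gets a multiplier of 0, and it stays at 0 for as long as the statement keeps being covered; it returns to 1 only after a generation in which no chromosome covers it.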

5. Results

Let us compare the methods proposed above for determining the multiplier $Ph_{j}^{(q)}$ on the test program SUT2 described in [24]. Figure 1 compares the average coverage for different values of the parameter $k$ relating the components of the fitness function (8), in which $F_{1}$ is calculated either by Formula (1), i.e., without modification, or by Formula (10) when using the modification by the methods $Half+$, $Quarter+$, $Tenth+$ and $Count-$. The average coverage is calculated based on 1500 runs. The GA parameters are $Q = 50$ and $m = 25$, at which full coverage is relatively rarely achieved.

In Figure 1, the average coverage obtained with Formula (8) (the static method) is shown in red, the methods of monotonic change in $Ph$ based on the direct strategy are shown in shades of blue, and the $Count-$ method is shown in black. The methods for determining the parameter $\Delta Ph$ based on the reverse strategy are not presented in the figure, but, in general, they have approximately similar values of average coverage.

Each of the proposed methods for determining the multiplier $Ph_{j}^{(q)}$ showed a higher average coverage than the static method (without modification) for each of the $k$ values. Figure 1 shows that for the static method, the average coverage gradually increases with increasing $k$, whereas with the modifications, the maximum average coverage is already reached at $k = 2$ and does not decrease thereafter. The best result among all proposed methods was shown by $Count-$, which is why this method will be used in further research on this modification of the fitness function.

Analysis of the results presented in Figure 1 allows us to conclude that the modification significantly increases the coverage even without using the previously determined optimal value of $k = 10$. Moreover, even at $k = 0$, i.e., without the additive component $F_{2}$ of the fitness function, a higher coverage is achieved than without modification. A comparison of the average coverage without modification and with the best method, $Count-$, can be seen in Figure 2. It shows the average coverage with the algorithm parameters $Q = 50$, $m = 25$.

Thus, the use of the modification makes it possible to increase the average coverage, which is especially noticeable at $k = 0$. More importantly, the maximum coverage is achieved when using any non-zero value of $k$, i.e., the ratio parameter $k$ of the fitness function components ceases to play a significant role in achieving the maximum coverage. As a result, the proposed modification based on the dynamic change in statement weights makes it possible to increase code coverage when generating test sets, as well as to eliminate the need to determine the $k$ value for each individual SUT.

6. Conclusions

The paper proposes a modification of the method for generating test data for multiple paths in one launch of the GA. The initial problem, namely the need to determine the value of the ratio parameter of the fitness function components, is solved by dynamically changing the weights of statements between generations. The methods proposed in the paper eliminate the need to define the parameter for each individual program, and one of the methods, $Count-$, achieves greater coverage even if the ratio parameter value is zero. Therefore, not only is the original goal of the modification achieved, but the diversity of generated test cases also increases, so overall coverage improves as well.

Author Contributions

Conceptualization, T.A.; methodology, T.A.; software, K.S.; investigation, K.S.; validation, T.A.; writing—original draft preparation, T.A. and K.S.; writing—review and editing, T.A.; visualization, K.S.; supervision, T.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Ministry of Science and Higher Education of Russian Federation (project No. FSUN-2020-0009).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. ISO/IEC TR 19759; Software Engineering - Guide to the Software Engineering Body of Knowledge (SWEBOK). ISO Copyright Office: Geneva, Switzerland, 2015.
2. Kumar, M.; Chaudhary, J. Reviewing Automatic Test Data Generation. Int. J. Eng. Sci. Comput. 2017, 7, 11432–11435.
3. Sui, J.; Gong, Y.; Jin, D.; Wang, Y. Statistical testing data generation for UAS. In Proceedings of the 3rd International Conference on Material Engineering and Advanced Manufacturing Technology, Shanghai, China, 26–28 April 2019; p. 7.
4. Xuan, J.; Jiang, H.; Ren, Z.; Hu, Y.; Luo, Z. A random walk based algorithm for structural test case generation. In Proceedings of the 2nd International Conference on Software Engineering and Data Mining, Chengdu, China, 23–25 June 2010; pp. 583–588.
5. Meudec, C. ATGen: Automatic Test Data Generation using Constraint Logic Programming and Symbolic Execution. Softw. Test. Verif. Reliab. 2001, 11, 81–96.
6. Gerlich, R. Automatic Test Data Generation and Model Checking with CHR. arXiv 2014, arXiv:1406.2122.
7. Clarke, L. A system to generate test data and symbolically execute programs. IEEE Trans. Softw. Eng. 1976, SE-2, 215–222.
8. Howden, W. Symbolic testing and the DISSECT symbolic evaluation system. IEEE Trans. Softw. Eng. 1977, SE-4, 266–278.
9. Girgis, M.R. Automatic Test Data Generation for Data Flow Testing Using a Genetic Algorithm. J. Univers. Comput. Sci. 2005, 11, 898–915.
10. Doungsa-ard, C.; Dahal, K.; Hossain, A.; Suwannasart, T. GA-based Automatic Test Data Generation for UML State Diagrams with Parallel Paths. In Advanced Design and Manufacture to Gain a Competitive Edge: New Manufacturing Techniques and Their Role in Improving Enterprise Performance; Springer: London, UK, 2008; pp. 147–156.
11. Sharma, A.; Patani, R.; Aggarwal, A. Software Testing Using Genetic Algorithms. Int. J. Comput. Sci. Eng. Surv. 2016, 7, 21–33.
12. Holland, J.H. Adaptation in Natural and Artificial Systems; MIT Press: Cambridge, MA, USA, 1975; p. 236.
13. Mitchell, M. An Introduction to Genetic Algorithms; A Bradford Book; The MIT Press: Cambridge, MA, USA, 1999; p. 162.
14. Simon, D. Evolutionary Optimization Algorithms: Biologically-Inspired and Population-Based Approaches to Computer Intelligence; John Wiley & Sons: Hoboken, NJ, USA, 2013; p. 784.
15. Spillner, A.; Linz, T.; Schaefer, H. Software Testing Foundations. In A Study Guide for the Certified Tester Exam; Rocky Nook: Kingston, MA, USA, 2014; p. 305.
16. Avdeenko, T.V.; Serdyukov, K.E. Genetic algorithm fitness function formulation for test data generation with maximum statement coverage. Lect. Notes Comput. Sci. 2021, 12689, 379–389.
17. Avdeenko, T.V.; Serdyukov, K.E. Development and Research of the Test Data Generation Approach Modifications. In Proceedings of the 2021 International Conference on Information Technology and Nanotechnology (ITNT), Samara, Russia, 20–24 September 2021; pp. 1–6.
18. Avdeenko, T.V.; Serdyukov, K.E.; Tsydenov, Z.B. Formulation and research of new fitness function in the genetic algorithm for maximum code coverage. Procedia Comput. Sci. 2021, 186, 713–720.
19. Singla, S.; Kumar, D.; Rai, M.; Singla, P. A hybrid PSO approach to automate test data generation for data flow coverage with dominance concepts. J. Adv. Sci. Technol. 2011, 37, 15–26.
20. Bueno, P.M.; Wong, W.E.; Jino, M. Automatic test data generation using particle systems. In Proceedings of the 2008 ACM Symposium on Applied Computing, New York, NY, USA, 16–20 March 2008; pp. 809–814.
21. Khan, S.A.; Nadeem, A. Automated Test Data Generation for Coupling Based Integration Testing of Object Oriented Programs Using Particle Swarm Optimization (PSO). In Genetic and Evolutionary Computing, Proceedings of the Seventh International Conference on Genetic and Evolutionary Computing, ICGEC 2013, Prague, Czech Republic, 25–27 August 2013; Springer International Publishing: Berlin/Heidelberg, Germany, 2014; Volume 238, pp. 115–124.
22. Dixit, S.; Tomar, P. Applying Computational Intelligence in Software Testing. J. Artif. Intell. Res. Adv. 2015, 2, 7–11.
23. Dorigo, M.; Birattari, M.; Stutzle, T. Ant colony optimization. IEEE Comput. Intell. Mag. 2006, 1, 28–39.
24. Avdeenko, T.; Serdyukov, K. Automated Test Data Generation Based on a Genetic Algorithm with Maximum Code Coverage and Population Diversity. Appl. Sci. 2021, 11, 4673.

Figure 1. Comparison of different modifications ($Q = 50$, $m = 25$).

Figure 2. Comparison of coverage with and without modification ($Q = 50$, $m = 25$).

Table 1. Methods for determining the rate parameter $\Delta Ph$ of the multiplier ${Ph}_{j}^{(q)}$.

Method	Formula for calculating the multiplier ${Ph}_{j}^{(q)}$	Rate parameter $\Delta Ph$
Half+	Equation (11)	$2 / Q$
Half−	Equation (13)	$2 / Q$
Quarter+	Equation (11)	$4 / Q$
Quarter−	Equation (13)	$4 / Q$
Tenth+	Equation (11)	$10 / Q$
Tenth−	Equation (13)	$10 / Q$
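
As a rough illustration of Table 1, the sketch below computes the rate parameter $\Delta Ph$ for each method at a given $Q$. The function name `rate_parameter` and the dictionary `RATE_NUMERATORS` are hypothetical and introduced only for this example. The code does not implement Equations (11) and (13) themselves; it only reproduces the arithmetic in the last column of the table.

```python
from fractions import Fraction

# Numerators k of the rate parameter ΔPh = k / Q, copied from Table 1.
# The "+" and "−" variants of a method share the same rate; per the table they
# differ only in which multiplier formula, Equation (11) or (13), is used.
RATE_NUMERATORS = {"Half": 2, "Quarter": 4, "Tenth": 10}


def rate_parameter(method: str, q: int) -> Fraction:
    """Return ΔPh for a Table 1 method name such as 'Half+' or 'Tenth−'."""
    if q <= 0:
        raise ValueError("Q must be positive")
    # Drop the sign suffix; accept both ASCII '-' and the Unicode minus sign.
    base = method.rstrip("+-−")
    return Fraction(RATE_NUMERATORS[base], q)


if __name__ == "__main__":
    Q = 50  # the value given in the captions of Figures 1 and 2
    for name in ("Half+", "Half−", "Quarter+", "Quarter−", "Tenth+", "Tenth−"):
        delta = rate_parameter(name, Q)
        print(f"{name}: ΔPh = {delta} = {float(delta):.2f}")
```

Exact fractions are used so that the rates can be compared without floating-point rounding; they are converted to floats only for display.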

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
