Article

FEDRak: Federated Learning-Based Symmetric Code Statement Ranking Model for Software Fault Forecasting

by
Abdulaziz Alhumam
Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, P.O. Box 400, Al-Ahsa 31982, Saudi Arabia
Symmetry 2023, 15(8), 1562; https://doi.org/10.3390/sym15081562
Submission received: 1 July 2023 / Revised: 7 August 2023 / Accepted: 8 August 2023 / Published: 10 August 2023

Abstract
Software Fault Forecasting (SFF) pertains to the timely identification of sections in software projects that are prone to faults and may result in significant development expenses. Deep learning models have become widespread in software fault monitoring and management; these models rely on design metrics and code pattern features to classify code as erroneous or safe. The proposed model is built on a collective formulation of the fault localization problem, in which model-specific metadata from local models is aggregated into a global model that performs software fault forecasting across programs. The model ranks suspicious code blocks based on the symmetry between the semantic features of the erroneous code and the implementation code. Feature selection and scaling are performed initially to precisely identify the features that contribute to fault forecasting. Information extraction, the intermediate phase, focuses on the code statements and ranks them based on the impact of the fault. A fine-tuned spectrum-based fault localization technique is used to rank the statements. FEDRak facilitates ongoing adaptation in instances where the feature contributions of the data change over time: the federated learning model updates the feature weights of the global model from the weights synchronized by the locally built fault forecasting approaches. FEDRak is statistically analyzed against other contemporary fault localization techniques in terms of metrics like sensitivity, specificity, accuracy, F1-score, and ROC curves, and its performance is reported for both the local and global models.

1. Introduction

Software Fault Forecasting (SFF) is a methodology to enhance software quality and minimize software testing expenses by developing divergent categorization or classification mechanisms utilizing diverse machine learning techniques. Numerous software development companies aim to anticipate issues to uphold software quality for customer satisfaction and to economize on testing expenses. Using Deep Learning (DL) methodology with historical data is a common practice for predicting suspicious code blocks in the software development life cycle. The structured method is a procedure that facilitates the development of software with superior quality while keeping costs low and meeting customer expectations within a short timeframe [1]. Federated Learning (FL) has emerged as a promising approach to overcome data isolation by enabling distinct sites to train a global model cooperatively rather than directly sharing data for training [2].
The mission of SFF is to ensure the provision of software of superior quality and reliability while fine-tuning the utilization of scarce resources. Consequently, software developers can give precedence to the usage of computing resources at every stage of the software development lifecycle. Numerous organizations involved in software development seek to anticipate software defects to preserve software quality, enhance customer satisfaction, and reduce testing expenses; a disciplined software development process is itself aimed at improving software quality [3]. Diverse classification models can be constructed using various machine intelligence techniques to conduct testing more efficiently, and different machine intelligence techniques have been explored to predict erroneous code statements in software modules to improve software quality and minimize software testing expenses [4]. The number and extent of the tests administered significantly influence their efficacy. Insufficient testing may leave faults undetected, which can subsequently manifest as undesirable behaviors. Excessive software testing can result in project delays and budget overruns due to unforeseen expenses. Early detection and rectification of software defects can reduce project costs and mitigate the risk of project duration overruns. Software developers can determine the software's fault susceptibility by analyzing its code during the initial phases of development through the proficient utilization of software metrics [5].
Software enterprises endeavor to develop software modules that are devoid of errors. Inherent defects compromise a software product's efficacy, rendering it incapable of executing tasks precisely and effectively. The identification of software defects is considered the most crucial phase of SFF and requires extensive testing [6]. This stage is of utmost importance as it ensures the quality and reliability of the software product. Effective defect management enhances the quality of software solutions and fosters a culture of quality consciousness throughout the project life cycle, leading to a sustained enhancement of deliverables. Identifying and fixing software problems as soon as possible is essential to enhancing the program's dependability and practicality [7]. The Federated Learning-based Software Fault Forecasting model is shown in Figure 1.
In federated learning, the global model is built collectively from the distributed local models, and the local models are built on the locally available instances. In this process, each device develops an individual local model by exclusively utilizing its local dataset. The local models are transmitted to a central coordinator and subsequently consolidated into a global model, which is then redistributed to all participants for either inference or additional training. The primary objective of FL is to facilitate collaborative efforts among participants, leading to the development of a superior model compared to individual efforts while ensuring the preservation of data privacy. This is accomplished by mandating that participants exchange model parameters rather than data. The main contributions of the current study are listed below.
  • This study identifies features based on the Gaussian probability density function, which assists in accurately selecting features across various programs, independently of the program and based on the distribution, to recognize potentially vulnerable statements within a software program.
  • The features are assigned feature weights that are significant in deciding the feature contribution in the classification procedure, and the weights are updated over the training rounds by evaluating the test cases.
  • The Spider Monkey Optimization algorithm is used in updating the statement ranks for every individual code from a global perspective, resulting in a precise ranking of the statements.
  • The global feature weights are updated from the weights in the local model in the federated learning setting to build a robust fault prediction model.
  • The statements are assigned vulnerability scores according to their overall ranking among the program’s code statements, updated over the training process, and the ranks would assist in localizing the suspicious code blocks.
  • Statistical analysis of the proposed federated learning-based software fault forecasting model against other conventional approaches concerning standard metrics like Sensitivity, Specificity, Accuracy, F1-score, and ROC curves.
The rest of the manuscript is organized as follows: Section 2 presents the literature on existing software fault forecasting models and federated learning models. Section 3 presents this study’s background, discussing feature selection, scaling, dataset description, and the implementation environment. Section 4 presents the proposed model, discussing the fine-tuned spectrum-based fault localization mechanism and the weight updating mechanism in federated learning. Section 5 presents the experimental results and discussion. Finally, Section 6 offers this study’s conclusion and future research directions.

2. Literature

Many studies on fault identification and prediction have previously been conducted using conventional machine learning and DL approaches. These studies target bug detection in code in various ways: some focus on the vulnerable lines of code and use classification techniques to differentiate them, while others rely on features for classification, run-time profiling, program log reports, breakpoints, and so on. Nevertheless, there is a demand for a framework that can forecast suspicious code blocks well in advance using run-time profiling and semantic features, which would assist in precisely classifying the suspicious portion of the software. A popular technique for defect prediction involves utilizing a classification algorithm to partition the source code into two distinct categories: defective and error-free code [8]. Despite this, methodologies reliant on manually constructed characteristics frequently fail to adequately encapsulate the semantics and syntax of a code block. Conventional code metrics cannot differentiate between code fragments that possess identical structure and complexity yet execute distinct functionalities. Rawat and Dubey [9] have conducted research that presents several models aimed at enhancing software quality. Their study analyzes the factors that impact software quality as well as strategies for improving the product and overall performance in the context of software. The study examined several metrics related to size and complexity and various computational models, including Bayesian belief networks, genetic algorithms, and neural networks, among other options.
Challagulla et al. [10] have analyzed the impact of ML and statistical models in assessing software quality. Experiments were conducted on four separate real-time software fault repositories using various prediction approaches. The results showed that the rule-based classification approach using the 1R technique and instance-based learning, when combined with the consistency-based subset assessment mechanism, outperformed other models in terms of precision. Based on this outcome, the authors demonstrated a comprehensive software defect analysis tool for analyzing flaws and monitoring software modules in real time. Tomar and Agarwal [11] have presented a study on the prediction of faulty software modules based on class-imbalanced learning that is efficient in dealing with imbalanced datasets. The issue with learning from imbalanced data is that the minority class is not afforded the same level of attention by the learning approach as the majority class. In the context of imbalanced datasets, the learning algorithm produces classification rules tailored to the atypical class, either generating overly specific rules or missing them altogether. The lack of generalizability of these rules to novel data renders them unsuitable for predictive purposes.
Wu et al. [12] have introduced a semi-supervised technique for dictionary learning in their study, which involved utilizing both labeled and unlabeled defect datasets over 16 different projects. Moreover, their approach considered the expenses incurred due to classification errors while performing dictionary learning. However, the dictionary-based process needs high-quality data for training. It is also imperative to note that this aspect is contingent on the context and cannot be universally applied, necessitating a case-by-case examination. The process entails an additional computational burden to identify the most salient features, which may be impeded by the existence of correlated or insignificant features. Yang et al. [13] have presented a generative model based on Deep Belief Networks (DBNs), which rely on a neural network that operates across multiple levels. This architecture enables the DBN to learn and represent complex patterns in the data it is trained on. The architecture of this network comprises a singular input layer, a singular output layer, and a multitude of hidden layers. The output layer generates the feature vector, which represents the input data. Each stratum comprises stochastic nodes. A crucial characteristic of the DBN is its restricted connectivity, whereby nodes are exclusively linked to nodes in adjacent layers, not those within the same layer. The primary limitation of the DBN lies in its inadequate ability to effectively encapsulate the contextual information of code elements, including but not limited to the sequential execution of statements and calls to functions.
Numerous studies are based on the Convolutional Neural Network (CNN), which applies multiple convolution filters for data processing; such a network is defined by two important characteristics. First, the local unit connection pattern is duplicated throughout the network, which allows the network to capture the short-term structural context of the source code. Second, all units share the same parameters, so the network can learn code element information regardless of location. Xu et al. [14] investigated defect detection using a CNN with triplet loss and weighted cross-entropy loss approaches. Another work, by Qiu et al. [15], uses the CNN model to provide a feature-learning approach. The model is intended to choose characteristics from token vectors in the Abstract Syntax Tree (AST) of the code and then learn transferable joint characteristics. Integrating deep-learning-generated characteristics with hand-crafted ones allows the technique to effectively conduct cross-project fault prediction. The fundamental disadvantage of these models is that they grow increasingly complicated as the dataset size increases, and they must be trained for a substantial number of epochs to recognize the features with good accuracy.
Mcmurray and Sodhro [16] experimented with a defect detection mechanism for security-related traceability in Smart Healthcare Applications, and the model performed reasonably well across divergent machine learning techniques such as Principal Component Analysis (PCA), Partial Least Squares Regression (PLS), and Feature Selection. Srinivasa Kumar et al. [17] conducted a study on software fault detection and recovery for business operations, using an independent program to examine the faults and recover normal functionality through test cases in a conventional manner. Batool and Khan [18] have proposed a software fault detection model using deep learning models based on Long Short-Term Memory (LSTM), Bi-directional Long Short-Term Memory (Bi-LSTM) [19], and the Radial Basis Function Network (RBFN) for fault prediction. The performances of the deep learning models are compared with other conventional approaches, and the experimental results show that LSTM and Bi-LSTM yield better accuracies of 93.66% and 93.45%, respectively, while the RBFN yields 82.18% accuracy but is considerably faster than the two deep learning models. In the study on software fault prediction by Borandag [20], a deep learning model based on Recurrent Neural Networks (RNNs) and ensemble techniques learns across divergent datasets; the proposed model yields an accuracy of 95.9% on one of the datasets considered in the experiment.
The above-discussed studies are just a few of the approaches used in software fault localization. All of them are local models: they are implemented locally over particular software and can classify only the suspicious code blocks they are trained for. A global model that can handle divergent software models with distinct erroneous classes is therefore needed. In the current study, a federated learning-based model is designed to address the requirement for a unified global model that deals with divergent error classes.

3. Background

The current section of the manuscript deals with the preliminaries of the proposed fine-tuned spectrum-based fault localization technique, which includes information about the feature selection and scaling mechanisms, information extraction, dataset description, and implementation environment, followed by the implementation of federated learning.

3.1. Feature Selection and Scaling

This study uses the Gaussian Probability Density Function (GPDF) [21] in feature selection. The Gaussian distribution is a prevalent form of continuous probability distribution. Gaussian distributions are statistically significant and frequently employed in feature analysis to depict random variables with real values. GPDF is widely used because it is the probability density function that emerges as a limit for the sum of random variables. It has been observed that, regardless of the probability density function of the individual variables, the probability density function of a combination of random variables that are independent resembles a Gaussian distribution as the total number of variables being summed increases [22]. The mathematical formulation of GPDF is shown in Equation (1).
f(v) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(v-\mu)^2}{2\sigma^2}\right)    (1)
In the above equation, σ² designates the distribution variance, which denotes the centralized degree of the GPDF, and μ designates the distribution mean over the input variable v. The probability distribution over the interval (α, β) is shown in Equation (2).
\rho(\alpha < v < \beta) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{\alpha}^{\beta}\exp\left(-\frac{(v-\mu)^2}{2\sigma^2}\right)dv    (2)
Here, α designates the lower bound of the probability range and β denotes its upper bound. Hence, it is plausible to consider the probability of v as an approximation of the integral from v to v + Δv, which can be evaluated using Equation (3).
\rho(v) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{v}^{v+\Delta v}\exp\left(-\frac{(v-\mu)^2}{2\sigma^2}\right)dv \approx \Delta v \times f(v)    (3)
The assessed value becomes the initial probabilistic value assigned to each feature, and these values are consistently updated over the iterations. Feature scaling is then employed to normalize the range of the features present in the input data set, since the input program's feature set encompasses diverse values during the learning phase while the loss function is simultaneously reduced. The scaling process is executed iteratively so that the localization algorithm reaches the global or local optimum quickly and accurately. The present investigation uses Min–Max normalization to scale the feature values within the range 0–1. Min–Max normalization offers several advantages over conventional scaling methods: it can effectively manage feature distributions that deviate from the Gaussian distribution, and it addresses the issue of precision loss in a gradient optimization method that aims to converge toward the global solution [23]. The process generates target values within the range of 0 to 1 using the minimum and maximum values of the column; the corresponding formula is shown in Equation (4). The normalized value is then taken as the feature's current value, as shown in Equation (5), for further processing.
V_{new} = \frac{v - v_{min}}{v_{max} - v_{min}}    (4)

\rho(v) = V_{new}    (5)
The variable V_{new} represents the newly normalized value in the range 0 to 1. The variable v_{min} corresponds to the lowest value of the feature, while v_{max} represents the highest value of the same feature. The symbol v represents the data sample under consideration.
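To make the two steps above concrete, the following is a minimal NumPy sketch of scoring feature values with the Gaussian density of Equation (1) and rescaling the scores with the Min–Max rule of Equations (4) and (5). The toy feature matrix, its dimensions, and the per-column statistics are illustrative assumptions, not the FEDRak implementation.

```python
import numpy as np

def gaussian_pdf(v: np.ndarray, mu: float, sigma: float) -> np.ndarray:
    """Equation (1): Gaussian probability density of feature values v."""
    return np.exp(-((v - mu) ** 2) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def min_max_scale(v: np.ndarray) -> np.ndarray:
    """Equation (4): rescale a feature column to the range [0, 1]."""
    v_min, v_max = v.min(), v.max()
    return (v - v_min) / (v_max - v_min)

# Toy feature matrix: rows are code statements, columns are software metrics.
rng = np.random.default_rng(0)
features = rng.normal(loc=5.0, scale=2.0, size=(100, 4))

# Initial probabilistic feature values (the Equation (3) approximation),
# followed by Min-Max normalization (Equation (5)).
initial_weights = np.column_stack([
    gaussian_pdf(col, mu=col.mean(), sigma=col.std())
    for col in features.T
])
scaled_weights = np.apply_along_axis(min_max_scale, 0, initial_weights)
print(scaled_weights.min(), scaled_weights.max())  # 0.0 1.0
```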

3.2. Information Extraction

The direct extraction of textual information from code content may pose a challenge in retrieving crucial information due to the complexity of the code file. The raw text contains a significant amount of noisy data, including comments and function descriptions. The token sequence, by contrast, exhibits concise information and lucid content, facilitating mapping to the code content. Thus, the retrieval of information from code text can be accomplished by utilizing token sequences. The attention module is a technique for mining key features in text: it can automatically recognize the significant features within text data, and it has gained significant traction in natural language processing. The utilization of the attention module may therefore facilitate the task of extracting textual information from the token sequence. The procedure by which the attention module extracts textual information from a set of queries S using a set of keys K over the values D is shown in Equation (6) [24].
E_C = \mathrm{softmax}\left(\frac{S K^{T}}{\sqrt{d_k}}\right) D    (6)
Initially, the tokens are mapped onto a high-dimensional space. A key-value pair is created and utilized as an input parameter to the attention model to represent a token's vector value. The attention score is calculated based on the degree of similarity between the query and the key. Using the Softmax function, the attention module generates a vector representation of the textual information in the code.
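As an illustration of Equation (6), the NumPy sketch below computes softmax(SK^T/√d_k)D for a toy token sequence. The token count, embedding sizes, and random inputs are assumptions made only for the example.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(S: np.ndarray, K: np.ndarray, D: np.ndarray) -> np.ndarray:
    """Queries S, keys K, values D; returns the extracted representation E_C."""
    d_k = K.shape[-1]
    scores = S @ K.T / np.sqrt(d_k)      # query-key similarity (attention scores)
    return softmax(scores, axis=-1) @ D  # attention-weighted sum of values

rng = np.random.default_rng(1)
tokens, d_k, d_v = 16, 32, 32            # assumed token count and embedding sizes
S = rng.normal(size=(tokens, d_k))
K = rng.normal(size=(tokens, d_k))
D = rng.normal(size=(tokens, d_v))
E_C = attention(S, K, D)                 # one extracted vector per token
print(E_C.shape)                         # (16, 32)
```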

3.3. Dataset Description

Multiple datasets are considered in the current study to evaluate the performance of the federated learning model. Each local model is designated to handle a particular dataset type, and the global model combines multiple local models. The local datasets include the NASA-Metrics Data Program (MDP) repository [25,26], which has the CM1, MW1, PC1, PC3, and PC4 instances. The PROMISE repository, which provides the KC1 and JM1 instances, and the Unix Utility Programs (UUP) repository, with the gzip, sed, flex, and grep instances, are also considered in the current study for evaluation.
The fused dataset in the NASA repository comprises 3579 instances and 38 attributes in total. Each of the chosen datasets represents a distinct software component, and the instances contained within the dataset correspond to the various software modules. The features denote the software metrics documented throughout the development process. The fused dataset comprises 38 features, with one feature designated as the output class for prediction and the remaining 37 features utilized in the prediction process. The output classification determines the existence or absence of defects in the module under consideration. The PROMISE repository consists of 9793 instances, of which 9593 are JM1 instances and 200 are KC1 instances; 1759 of the JM1 instances and 36 of the KC1 instances are defective [27].
The gzip utility accepts a set of 13 distinct parameters as input, in addition to a roster of files to be compressed. The software exhibits significant functionality, as evidenced by its 6573 lines of code and 211 test inputs. The Sed utility is employed to perform minor alterations to an input sequence. The primary application of this tool is to analyze textual input and implement alterations to the information as directed by the user. The program comprises 12,062 lines of code and encompasses 360 test inputs. The function of the flex program is to perform lexical analysis. The input files were generated from regular expressions and C code rules. The total number of lines of code amounts to 13,892, while the number of test inputs provided is 525. The grep command accepts two input parameters: patterns and files. The program prints the lines from any file that match any of the given patterns; it comprises 12,653 lines of code and 470 test inputs. The summarized information on instances associated with the various software fault repositories is shown in Table 1.

3.4. Implementation Environment

The test cases evaluate the reliability of the code excerpt across various parameters, such as disparate inputs and operational circumstances. The assessments are conducted locally using dedicated software deployed on a standalone computer. Table 2 presents the specifics of the experimental setting in which the experimentation is conducted.

3.5. Implementation of Federated Learning

The utilized technologies augment a self-contained, microservice-based, and fortified infrastructure for operational settings that necessitate the implementation of federated machine learning resolutions. The technologies mentioned above facilitate the aggregation of readily available services and third-party libraries, collectively constituting the stack of open-source tools that underpin the platform. Docker has been chosen as the primary tool for managing images and containers, serving as the initial level of abstraction that impacts all platform modules. Virtualization enables secure resource management through the implementation of hardware-agnostic and isolated execution. In addition to ensuring the secure implementation of federated tasks, it safeguards the host from user-generated code.
The implementation-level federated execution layer is built upon the Flower library, which offers robust functionalities that ensure efficient processing of computing modules without requiring specialized libraries for algorithm production. How communication is conducted varies based on the intended purpose and substance. If model parameters are being communicated, the gRPC protocol (implemented via Flower) is utilized, significantly improving the (de)serialization phase. In the absence of other requirements, RESTful actions are deemed sufficient to effectuate modifications to the state of nodes, whether through user-to-server or server-to-server interactions. In addition, a web-based Graphical User Interface (GUI) is incorporated into the system, which is implemented separately from the core API using Jinja templates. The API Gateway, Kong, consolidates various internal paths into a singular port, enhancing the system's usability and facilitating the exposure of external endpoints. The Kong proxy process is capable of handling gRPC and HTTP/1 protocols.
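As a hedged sketch of how such a client can be wired up with Flower, the snippet below shares only feature-weight arrays with the server, never the local data. The placeholder local update, the sample count, and the server address are assumptions; NumPyClient and start_numpy_client are standard Flower 1.x entry points.

```python
import flwr as fl
import numpy as np

class FaultForecastClient(fl.client.NumPyClient):
    """Shares locally learned feature weights with the federated server."""

    def __init__(self, n_features: int = 37):
        self.weights = np.zeros(n_features)  # local feature-weight vector

    def get_parameters(self, config):
        return [self.weights]

    def fit(self, parameters, config):
        self.weights = parameters[0]
        # Placeholder for local training: in FEDRak this is where the feature
        # weights would be updated from locally executed test cases
        # (Equations (7)-(10)); here we apply a dummy update.
        self.weights = self.weights + 0.01
        return [self.weights], 100, {}  # 100 = assumed local sample count

    def evaluate(self, parameters, config):
        loss = float(np.linalg.norm(parameters[0] - self.weights))
        return loss, 100, {}

if __name__ == "__main__":
    fl.client.start_numpy_client(server_address="127.0.0.1:8080",
                                 client=FaultForecastClient())
```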

4. Proposed Methodology

The current section of the manuscript presents the Spider Monkey Optimization-based fine-tuned spectrum-based fault localization technique, which assigns ranks to suspicious code blocks in fault forecasting, and then discusses the weight update mechanism in federated learning.

4.1. Fine-Tuned Spectrum-Based Fault Localization

The conventional spectrum-based fault localization is fine-tuned using the Spider Monkey Optimization algorithm for a better-refined ranking model. The more precise the rank assigned to the code blocks, the better the accuracy of the fault forecasting model. The spectrum-based fault localization model employs a method-specific mathematical formula to assign a potential vulnerability ranking and possible fault to each trace of program components, such as expressions, code statements, initialization statements, assignment statements, branching statements, and evaluation statements, gathered for each test case. The suspiciousness rank is a metric used to assess the probability of a statement or a code snippet within a software program being faulty. By employing a spectrum-based approach for fault localization, the dependency information of each code snippet is scrutinized during the execution of test cases. The suspicious code block assessment estimates suspicious ranks for each program element by integrating correlation and dependency information [28]. The fine-tuned spectrum-based fault localization framework is shown in Figure 2.
The frequency of successful and unsuccessful test case executions, relative to the number of executions, determines the ranking system. Let C identify the code block, consisting of a set of code blocks with elements \{c_1, c_2, c_3, \ldots, c_n\}, where it is assumed that \bigcup_{i=1}^{n} c_i = CB. Equation (7) presents the formula utilized for rank assessment. The probabilistic feature values, i.e., ρ(v), are assessed during training. The rank of a code block is assessed based on the number of test cases that the block passes. The total number of failed test cases is identified by T_{fail\_tc}.
Rank_{CB} = \rho(v) \times \frac{fail\_tc(CB)}{T_{fail\_tc} \times \left(fail\_tc(CB) + pass\_tc(CB)\right)}    (7)
In the above equation, fail_tc(CB) denotes the failed test cases associated with the code block, and pass_tc(CB) denotes the test cases that pass. The obtained rank is normalized using the Spider Monkey Optimization (SMO) algorithm. In conventional studies, code blocks are ranked based on the test cases alone, so the rank of a statement is confined to the program in which it appears; such a local ranking algorithm would not yield good results in a federated learning environment, since similar test cases and code statements are encountered in more than one program. The statement ranks are therefore refined through the SMO algorithm to maintain a global ranking mechanism.
The updated ranks for the subsequent phase of fault forecasting are identified through Spider Monkey Optimization, which considers both local and global best ranks. The fitness of the search space at the outset is established through a random selection of the individuals that initially make up the population. The formula presented in Equation (8) assigns the updated rank values, identified by the notation CB_r.
CB_r = Rank_{CB} + \alpha + \beta + (pb_r - Rank_{CB}) \times rand(-1, 1)    (8)

\alpha = \left|Rank_{CB} - CB_{lb}\right| \times rand(0, 0.5)    (9)

\beta = \left|Rank_{CB} - CB_{gb}\right| \times rand(0, 0.5)    (10)
In the above equations, pb_r designates the perturbation rate associated with the statement ranking. The notation CB_{lb} corresponds to the local best rank associated with the code block, and CB_{gb} corresponds to the global best rank. The two values α and β are used to normalize the local best and global best ranks, as shown in Equations (9) and (10). The function rand( ) generates uniformly distributed random values within the specified interval. The updated ranks take both the local best and the global best rank into account and help amend the code block ranks precisely based on the overall grading, which eases the forecasting of erroneous code blocks in a normalized manner.
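The following Python sketch traces Equations (7)-(10) on a toy set of code blocks. The pass/fail test-case counts, the probabilistic feature values, and the perturbation rate are illustrative assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

def rank_code_block(rho_v, fail_cb, pass_cb, total_fail):
    """Equation (7): suspiciousness rank of a code block."""
    return rho_v * fail_cb / (total_fail * (fail_cb + pass_cb))

def smo_refine(rank_cb, local_best, global_best, pb_r):
    """Equations (8)-(10): refine a rank using the local/global best ranks."""
    alpha = abs(rank_cb - local_best) * rng.uniform(0.0, 0.5)   # Equation (9)
    beta = abs(rank_cb - global_best) * rng.uniform(0.0, 0.5)   # Equation (10)
    return rank_cb + alpha + beta + (pb_r - rank_cb) * rng.uniform(-1.0, 1.0)

# Toy example: 5 code blocks with per-block failed/passed test-case counts.
fails = np.array([3, 0, 1, 5, 2])
passes = np.array([7, 10, 9, 5, 8])
rho = np.array([0.8, 0.2, 0.5, 0.9, 0.6])   # probabilistic feature values
ranks = rank_code_block(rho, fails, passes, total_fail=fails.sum())
refined = [smo_refine(r, ranks.min(), ranks.max(), pb_r=0.5) for r in ranks]
print(np.round(ranks, 4), np.round(refined, 4))
```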

4.2. Feature Weight Upgradation at Federated Server

This is the crucial phase of the proposed FEDRak approach to software fault forecasting. The feature weights are updated through federated learning rather than by sending the data over the internet: the updated feature weights are fed as input to the server to upgrade the global model [29]. For ease of understanding, the weight of a feature is identified by ω and is associated with two different approaches on two different clients. The weight matrices are designated ω_{a1}^{ca} and ω_{a2}^{cb}, corresponding to two clients c_a and c_b running two different algorithms a_1 and a_2 for software fault forecasting. Equations (11) and (12) present the weights associated with both algorithms.
\omega_{a_1}^{c_a} = \begin{bmatrix} \omega_{11} & \cdots & \omega_{1c_b} \\ \vdots & \ddots & \vdots \\ \omega_{r_a 1} & \cdots & \omega_{r_a c_b} \end{bmatrix}_{a \times b}    (11)

\omega_{a_2}^{c_b} = \begin{bmatrix} \omega_{11} & \cdots & \omega_{1c_n} \\ \vdots & \ddots & \vdots \\ \omega_{r_m 1} & \cdots & \omega_{r_m c_n} \end{bmatrix}_{m \times n}    (12)
In the above equations, ω_{1c_b} designates the feature weight in the first row and last column (the b-th column) of the matrix generated by the first algorithm. Similarly, ω_{r_a 1} denotes the weight in the last row (the a-th row) and first column of that matrix, and ω_{r_a c_b} the weight in its last row and last column. For the second algorithm, ω_{1c_n} denotes the feature weight in the first row and last column (the n-th column), ω_{r_m 1} the weight in the last row (the m-th row) and first column, and ω_{r_m c_n} the weight in the last row and last column. The ideal weights for the input to the processing layer in a federated server context are denoted by ω_R^{fs}, where R is the combination of both algorithms, i.e., R = a_1 + a_2; the corresponding formula for the ideal weights is shown in Equation (13).
\omega_{R}^{fs} = \frac{\omega_{a_1}^{c_a} + \omega_{a_2}^{c_b}}{2}    (13)
In the case of a single associated client, denoted by c, the feature weight over an algorithm a is shown in Equation (14).
\omega_{R}^{fs} = \omega_{a}^{c}    (14)
This aggregation encounters a challenge with the matrix addition property, since matrix addition requires consistent dimensions. Equations (11) and (12) indicate that adding the locally trained matrices is unfeasible when their dimensions differ. To address this, the dimensions of all relevant matrices must be made identical, which is accomplished by concatenating a matrix of zeros with each relevant matrix [30]. The first step is to determine the highest number of rows across all locally trained clients, as shown in Equation (15), and the highest number of columns, as shown in Equation (16), over two matrices of dimensions (d_1, d_2) and (d_3, d_4).
M_{rows} = \max(d_1, d_3)    (15)

M_{colms} = \max(d_2, d_4)    (16)
The zero matrices are embedded alongside each optimal weight matrix using the following methodology: the zero-valued matrices are horizontally concatenated with the weights of each locally trained model, as shown in Equations (17) and (18) [31]. The weight updating mechanism is depicted in Figure 3.
M_{z\_a_1} = \mathrm{zeros}(r_m,\; M_{colms} - d_2)    (17)

M_{z\_a_2} = \mathrm{zeros}(r_m,\; M_{colms} - d_4)    (18)
The horizontal concatenation of the matrices is illustrated as shown in Equations (19) and (20).
H_{z\_a_1} = \mathrm{horcat}(M_{z\_a_1},\; M(a_1))    (19)

H_{z\_a_2} = \mathrm{horcat}(M_{z\_a_2},\; M(a_2))    (20)
The result of Equation (21) will be utilized to acquire a global model, assuming both matrices are of identical size.
\omega_{H}^{fs} = 2 \times H_{z\_a_1} + 0.5 \times H_{z\_a_2}    (21)
In the above equations, ω_R^{fs} is used to update the weights when the clients' matrices have identical dimensions, and ω_H^{fs} denotes the weights when the matrices have divergent dimensions.
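The padding-and-combination procedure of Equations (15)-(21) can be sketched in NumPy as follows. The client matrix shapes are illustrative assumptions, the 2.0/0.5 combination coefficients follow Equation (21), and the helper pads rows as well as columns so that the shapes always align.

```python
import numpy as np

def pad_to(w: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Embed w in the top-left corner of a zero matrix of shape rows x cols."""
    padded = np.zeros((rows, cols))
    padded[: w.shape[0], : w.shape[1]] = w
    return padded

w_a1 = np.ones((3, 4))          # client c_a, algorithm a1: 3 x 4 weights
w_a2 = np.ones((2, 6)) * 2.0    # client c_b, algorithm a2: 2 x 6 weights

# Equations (15)-(16): target dimensions across the locally trained models.
m_rows = max(w_a1.shape[0], w_a2.shape[0])
m_cols = max(w_a1.shape[1], w_a2.shape[1])

# Equations (17)-(20): zero-pad each local weight matrix to the common shape.
h_a1 = pad_to(w_a1, m_rows, m_cols)
h_a2 = pad_to(w_a2, m_rows, m_cols)

# Equation (21): combine the aligned matrices into the global weights.
w_global = 2.0 * h_a1 + 0.5 * h_a2
print(w_global.shape)           # (3, 6)
```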

5. Results and Discussion

To evaluate the effectiveness of a fault forecasting methodology, it is imperative to employ suitable metrics like sensitivity, specificity, F1-score, accuracy [32], and ROC curves. A statistical analysis of the performance of the proposed software fault forecasting model is performed against contemporary techniques used in other studies: Naïve Bayes (NB), Artificial Neural Network (ANN), Decision Tree (DT), Random Forest (RF), Fuzzy Logic-Fused (FLF), Bayesian Network (BN), Support Vector Machine (SVM), Synthetic Minority Oversampling Technique (SMOTE), Heterogeneous Ensemble Classifier (HEC, a combination of ANN, DT, BN, and SVM), Bayesian Regularization (BR), Scaled Conjugate Gradient (SCG), BFGS Quasi-Newton (QN), Levenberg–Marquardt (LM), Variant-based Ensemble Learning (VEL), Artificial Bee Colony (ABC) optimization, Principal Component Analysis (PCA), and Principal Component-based Support Vector Machine (PC-SVM). In a few studies, the accuracies are analyzed for each program category; to make the analysis more fruitful, the mean of the accuracies for each class is taken for a more reliable estimate of performance. The model's performance is assessed for the standalone implementation, identified by FEDRak (L), and for the federated learning-based model, identified by FEDRak (G).
The performance of the proposed model is assessed using the standard evaluation parameters True Positive (T_pv), True Negative (T_nv), False Positive (F_pv), and False Negative (F_nv) [33]. The number of times the proposed model precisely identifies an erroneous code block is counted as true positives, and the number of instances in which it correctly identifies error-free blocks as non-suspicious is counted as true negatives. The number of cases in which the proposed model misinterprets non-erroneous code blocks as erroneous ones is counted as false positives, and the number of instances in which it misinterprets erroneous code blocks as non-erroneous ones is counted as false negatives [34]. Sensitivity measures the model's ability to correctly identify the erroneous code blocks; the corresponding formula is shown in Equation (22).
Sensitivity = \frac{T_{pv}}{T_{pv} + F_{nv}}    (22)
Specificity measures the ability to correctly recognize the non-erroneous code blocks. The corresponding formula is shown in Equation (23).
Specificity = \frac{T_{nv}}{T_{nv} + F_{pv}}    (23)
Accuracy is the other significant performance assessment metric, which summarizes the total number of accurately detected instances. It is most often used when all classes of datasets are equally essential. The corresponding formula for accuracy is shown in Equation (24).
Accuracy = \frac{T_{pv} + T_{nv}}{T_{pv} + T_{nv} + F_{pv} + F_{nv}}    (24)
The F1 score evaluates a model’s prediction capacity by focusing on its class-wise efficiency instead of its overall performance. The F1-score is a harmonic mean of precision and sensitivity. The corresponding formula for F1-score is shown in Equation (25), and the formula for precision is shown in Equation (26).
F1\text{-}score = \frac{2 \times (Precision \times Sensitivity)}{Precision + Sensitivity}    (25)
Precision = \frac{T_{pv}}{T_{pv} + F_{pv}}    (26)
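For reference, the metrics of Equations (22)-(26) can be computed directly from the four confusion counts, as in the short sketch below; the counts are illustrative.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    sensitivity = tp / (tp + fn)                     # Equation (22)
    specificity = tn / (tn + fp)                     # Equation (23)
    accuracy = (tp + tn) / (tp + tn + fp + fn)       # Equation (24)
    precision = tp / (tp + fp)                       # Equation (26)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # Equation (25)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "precision": precision, "f1": f1}

print(classification_metrics(tp=90, tn=85, fp=10, fn=15))
```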
The performance of the proposed model is analyzed across each category of the programs, in both local and global models. In some of the existing studies, the models are not evaluated concerning some of the metrics. All such instances are mentioned as not applicable (N/A) in the performance analysis tables. The obtained outcome for each category and the corresponding values are shown in Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11. The mean performances of local and global models in each program category, along with Unix Utility programs, are shown in Table 12.
The above tables show the proposed model's performance for individual programs and its overall performance when executed in a standalone environment and in a federated learning environment. It can be seen from the experimental results that standalone learning exhibits better performance than federated learning. Performance under federated learning is comparatively lower because features common to several programs carry different feature ranks across divergent program categories, which compromises the accuracy of the global model. The performance for each program class is evaluated against the existing models, as shown in Table 13. PC4 of the NASA-MDP repository attains the highest accuracy of 97.9%, and Sed of the Unix Utility programs has the lowest accuracy of 89%. The mean accuracy over all categories of programs is 93.7%. The graphs representing the resultant outcomes for all programs are shown in Figure 4.
The performance of the proposed software fault forecasting model is further evaluated using the Receiver Operator Characteristic (ROC) [38]. The ROC curve is especially useful with binary classes because it shows the probability relation between the true positive rate and the false positive rate. The ROC curve is generated by computing and graphing the rate of true positives relative to the rate of false positives for a single classifier across multiple thresholds. The formula for the ROC measure is shown in Equation (27).
ROC = \frac{Rank(p_s) - p_s \times \left((p_s + 1)/2\right)}{p_s + n_s}    (27)
From the above equation, the notation p s designates the number of positive samples, and the notation n s designates the number of negative samples in the entire dataset. The corresponding ROC curves for NASA-MDP programs are shown in Figure 5, the ROC curves for PROMISE programs are shown in Figure 6, and the ROC curves for Unix Utility programs are shown in Figure 7.

5.1. Performance Analysis

The cumulative performance of the proposed model concerning all the classes is summarized and analyzed against the performance of existing studies for fault localization, as shown in Table 12.
Table 12. Performance analysis of various faults.

| Approach | Sensitivity | Specificity | Accuracy |
|---|---|---|---|
| NB [25] | 0.356 | 0.922 | 0.833 |
| ANN [25] | 0.186 | 0.957 | 0.858 |
| DT [25] | 0.203 | 0.938 | 0.850 |
| FLF [25] | 0.328 | 0.989 | 0.910 |
| FEDRak | 0.934 | 0.934 | 0.937 |
As can be observed from the experimental analysis shown in the above table, the proposed fault forecasting model has shown reasonable performance. The accuracies of various algorithms across the NASA-MDP and PROMISE repositories are summarized in Table 13.
Table 13. Performance analysis of various faults.

| Approach | Dataset | Accuracy |
|---|---|---|
| NB [25] | NASA-MDP | 0.833 |
| ANN [25] | NASA-MDP | 0.858 |
| DT [25] | NASA-MDP | 0.850 |
| FLF [25] | NASA-MDP | 0.910 |
| BN [37] | NASA-MDP | 0.713 |
| DT [37] | NASA-MDP | 0.771 |
| BN + SMOTE [37] | NASA-MDP | 0.786 |
| DT + SMOTE [37] | NASA-MDP | 0.810 |
| ANN [39] | PROMISE | 0.865 |
| SVM [39] | PROMISE | 0.854 |
| NB [39] | PROMISE | 0.850 |
| TREE [39] | PROMISE | 0.836 |
| KNN [39] | PROMISE | 0.612 |
| SVM [40] | PROMISE | 0.772 |
| MLP [40] | PROMISE | 0.788 |
| RBF [40] | PROMISE | 0.795 |
| VEL [40] | PROMISE | 0.844 |
| NB-PCA [41] | PROMISE | 0.810 |
| SVM-PCA [41] | PROMISE | 0.830 |
| RF-PCA [41] | PROMISE | 0.830 |
| RF-Adaboost [42] | PROMISE | 0.900 |
| SVM-Adaboost [42] | PROMISE | 0.790 |
| Adaboost-RF [43] | PROMISE | 0.897 |
| Bag-RF [43] | PROMISE | 0.897 |
| FEDRak | MDP, PROMISE | 0.952 |
As can be observed from Table 12 and Table 13, the proposed fine-tuned spectrum-based fault localization technique has outperformed the conventional fault localization techniques. The average performance of the local and global models across the various evaluation metrics is taken as the performance of the proposed model when compared with the other state-of-the-art techniques.

5.2. Threats to Validity

Empirical findings demonstrate that the approach posited in this investigation exhibits superior performance in inter-program defect forecasting. However, certain variables and plausible hazards impinge on the method’s validity. Acquiring extensive project datasets that contain defect labels can be a challenging task. Only a limited number of datasets from NASA-MDP, PROMISE, and Unix Utility Programs have been utilized to conduct comparison experiments. To enhance the validity and reliability of the divergent defect model, it is recommended that additional software defect datasets from multiple companies be utilized in future research endeavors.
The metrics employed as independent variables for forecasting software defects pose a potential internal threat. Multiple datasets were used from various repositories, each with distinct metrics and granularity levels such as method, class, or file; using multiple datasets in the current study mitigates this potential risk. The potential for external threats exists when generalizing the conclusions derived from various client devices. The findings presented in this study are derived from datasets generated by multiple researchers, and discrepancies in the measurement techniques employed by these teams may impact the accuracy and reliability of the results. To address this potential risk, a deliberate selection process was used to identify datasets encompassing diverse implementation aspects, varying in scale and level of detail.

6. Conclusions

The software fault forecasting model presented in the current study performs exceptionally well in identifying the code blocks with possible code errors. The model works over the fine-tuned spectrum-based fault localization technique, assessing the symmetry between normal and erroneous reference code features. Feature selection and scaling, followed by information extraction, are performed to forecast the code blocks precisely. Once the standalone local model is built, the feature weights are synchronized with the global model through federated learning. The performance of the standalone and global models for every program category is analyzed, and it is observed that the proposed approach outperforms the other contemporary approaches used in fault localization. This study is confined to a limited number of program classes in the NASA-MDP and PROMISE repositories, whereas some of the previously reviewed studies included all of the programs in those repositories; the current research may therefore be assessed further using all types of programs in the repositories. The performance of the proposed model can also be evaluated using divergent datasets for fault localization over multiple clients to precisely assess the federated learning model. Time delay is another crucial parameter of the federated learning mechanism that has to be assessed in future studies.
The future research directions also include security constraints like data confidentiality and integrity mechanisms for the data exchanged between the clients and the server in the federated learning environment. The optimization of weights is expected to result in improved performance. At the same time, including auxiliary memory elements for state information maintenance is anticipated to enhance the efficiency of the fault forecasting approaches.

Funding

This research received no external funding.

Data Availability Statement

Not Applicable.

Acknowledgments

The author acknowledges the College of Computer Sciences and Information Technology, King Faisal University, Saudi Arabia, for providing the necessary resources for carrying out the research.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Khalid, A.; Badshah, G.; Ayub, N.; Shiraz, M.; Ghouse, M. Software Defect Prediction Analysis Using Machine Learning Techniques. Sustainability 2023, 15, 5517.
  2. Dang, T.K.; Lan, X.; Weng, J.; Feng, M. Federated Learning for Electronic Health Records. ACM Trans. Intell. Syst. Technol. 2022, 13, 72.
  3. Ali, M.M.; Huda, S.; Abawajy, J.; Alyahya, S.; Al-Dossari, H.; Yearwood, J. A parallel framework for software defect detection and metric selection on cloud computing. Clust. Comput. 2017, 20, 2267–2281.
  4. Alhumam, A. Software Fault Localization through Aggregation-Based Neural Ranking for Static and Dynamic Features Selection. Sensors 2021, 21, 7401.
  5. Anju, A.J.; Judith, J.E. Adaptive recurrent neural network for software defect prediction with the aid of quantum theory-particle swarm optimization. Multimed. Tools Appl. 2023, 82, 16257–16278.
  6. Herbold, S.; Trautsch, A.; Grabowski, J. A comparative study to benchmark cross-project defect prediction approaches. IEEE Trans. Softw. Eng. 2018, 44, 811–833.
  7. Alhumam, A. Explainable software fault localization model: From blackbox to whitebox. Comput. Mater. Contin. 2022, 73, 1463–1482.
  8. Akimova, E.N.; Bersenev, A.Y.; Deikov, A.A.; Kobylkin, K.S.; Konygin, A.V.; Mezentsev, I.P.; Misilov, V.E. A Survey on Software Defect Prediction Using Deep Learning. Mathematics 2021, 9, 1180.
  9. Rawat, M.; Dubey, S.K. Software Defect Prediction Models for Quality Improvement: A Literature Study. Int. J. Comput. Sci. 2012, 9, 288–296.
  10. Challagulla, V.U.; Bastani, F.B.; Yen, I.L. A Unified Framework for Defect Data Analysis Using the MBR Technique. In Proceedings of the 2006 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'06), Arlington, VA, USA, 13–15 November 2006; pp. 39–46.
  11. Tomar, D.; Agarwal, S. Prediction of Defective Software Modules Using Class Imbalance Learning. Appl. Comput. Intell. Soft Comput. 2016, 2016, 7658207.
  12. Wu, F.; Jing, X.Y.; Dong, X.; Cao, J.; Xu, M.; Zhang, H.; Ying, S.; Xu, B. Cross-project and within-project semi-supervised software defect prediction problems study using a unified solution. In Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C), Buenos Aires, Argentina, 20–28 May 2017; pp. 195–197.
  13. Yang, X.; Lo, D.; Xia, X.; Sun, J. TLEL: A two-layer ensemble learning approach for just-in-time defect prediction. Inf. Softw. Technol. 2017, 87, 206–220.
  14. Xu, Z.; Li, S.; Xu, J.; Liu, J.; Luo, X.; Zhang, Y.; Zhang, T.; Keung, J.; Tang, Y. LDFR: Learning deep feature representation for software defect prediction. J. Syst. Softw. 2019, 158, 110402.
  15. Qiu, S.; Lu, L.; Cai, Z.; Jiang, S. Cross-Project Defect Prediction via Transferable Deep Learning-Generated and Handcrafted Features. In Proceedings of the 31st International Conference on Software Engineering & Knowledge Engineering (SEKE 2019), Lisbon, Portugal, 10–12 July 2019; pp. 1–6. Available online: http://ksiresearch.org/seke/seke19paper/seke19paper_70.pdf (accessed on 30 May 2022).
  16. Mcmurray, S.; Sodhro, A.H. A Study on ML-Based Software Defect Detection for Security Traceability in Smart Healthcare Applications. Sensors 2023, 23, 3470.
  17. Srinivasa Kumar, D.; Sankar Rao, A.; Manoj Kumar, N.; Jeebaratnam, N.; Kalyan Chakravarthi, M.; Bhargavi Latha, S. A stochastic process of software fault detection and correction for business operations. J. High Technol. Manag. Res. 2023, 34, 100463.
  18. Batool, I.; Khan, T.A. Software fault prediction using deep learning techniques. Softw. Qual. J. 2023.
  19. Srinivasu, P.N.; Shafi, J.; Krishna, T.B.; Sujatha, C.N.; Praveen, S.P.; Ijaz, M.F. Using Recurrent Neural Networks for Predicting Type-2 Diabetes from Genomic and Tabular Data. Diagnostics 2022, 12, 3067.
  20. Borandag, E. Software Fault Prediction Using an RNN-Based Deep Learning Approach and Ensemble Machine Learning Techniques. Appl. Sci. 2023, 13, 1639.
  21. Hui, Z.; Liu, X. Research on Software Reliability Growth Model Based on Gaussian New Distribution. Procedia Comput. Sci. 2020, 166, 73–77.
  22. Wang, W.; Lu, L.; Wei, W. A Novel Supervised Filter Feature Selection Method Based on Gaussian Probability Density for Fault Diagnosis of Permanent Magnet DC Motors. Sensors 2022, 22, 7121.
  23. Ahsan, M.M.; Mahmud, M.A.P.; Saha, P.K.; Gupta, K.D.; Siddique, Z. Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance. Technologies 2021, 9, 52.
  24. Yao, W.; Shafiq, M.; Lin, X.; Yu, X. A Software Defect Prediction Method Based on Program Semantic Feature Mining. Electronics 2023, 12, 1546.
  25. Aftab, S.; Abbas, S.; Ghazal, T.M.; Ahmad, M.; Hamadi, H.A.; Yeun, C.Y.; Khan, M.A. A Cloud-Based Software Defect Prediction System Using Data and Decision-Level Machine Learning Fusion. Mathematics 2023, 11, 632.
  26. Alazba, A.; Aljamaan, H. Software Defect Prediction Using Stacking Generalization of Optimized Tree-Based Ensembles. Appl. Sci. 2022, 12, 4577.
  27. Shepperd, M.; Song, Q.; Sun, Z.; Mair, C. Data Quality: Some Comments on the NASA Software Defect Datasets. IEEE Trans. Softw. Eng. 2013, 39, 1208–1215.
  28. Ajibode, A.A.; Shu, T.; Ding, Z. Evolving Suspiciousness Metrics from Hybrid Data Set for Boosting a Spectrum Based Fault Localization. IEEE Access 2020, 8, 198451–198467.
  29. Abbas, S.; Issa, G.F.; Fatima, A.; Abbas, T.; Ghazal, T.M.; Ahmad, M.; Yeun, C.Y.; Khan, M.A. Fused Weighted Federated Deep Extreme Machine Learning Based on Intelligent Lung Cancer Disease Prediction Model for Healthcare 5.0. Int. J. Intell. Syst. 2023, 2023, 2599161.
  30. Zhong, K.; Liu, G. Communication-Efficient Federated Learning with Multi-layered Compressed Model Update and Dynamic Weighting Aggregation. In CICAI 2021: Artificial Intelligence; Lecture Notes in Computer Science; Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W., Eds.; Springer: Cham, Switzerland, 2021; Volume 13070.
  31. Billaud-Friess, M.; Falcó, A.; Nouy, A. Principal Bundle Structure of Matrix Manifolds. Mathematics 2021, 9, 1669.
  32. Srinivasu, P.N.; SivaSai, J.G.; Ijaz, M.F.; Bhoi, A.K.; Kim, W.; Kang, J.J. Classification of Skin Disease Using Deep Learning Neural Networks with MobileNet V2 and LSTM. Sensors 2021, 21, 2852.
  33. Batool, I.; Ahmed Khan, T. Software fault prediction using data mining, machine learning, and deep learning techniques: A systematic literature review. Comput. Electr. Eng. 2022, 100, 107886.
  34. Vulli, A.; Srinivasu, P.N.; Sashank, M.S.K.; Shafi, J.; Choi, J.; Ijaz, M.F. Fine-Tuned DenseNet-169 for Breast Cancer Metastasis Prediction Using FastAI and 1-Cycle Policy. Sensors 2022, 22, 2988.
  35. Daoud, M.S.; Aftab, S.; Ahmad, M.; Khan, M.A.; Iqbal, A.; Abbas, S.; Iqbal, M.; Ihnaini, B. Machine learning empowered software defect prediction system. Intell. Autom. Soft Comput. 2022, 31, 1287–1300.
  36. Mustaqeem, M.; Saqib, M. Principal component based support vector machine (PC-SVM): A hybrid technique for software defect detection. Clust. Comput. 2021, 24, 2581–2595.
  37. Balogun, A.O.; Lafenwa-Balogun, F.B.; Mojeed, H.A.; Adeyemo, V.E.; Akande, O.N.; Akintola, A.G.; Bajeh, A.O.; Usman-Hamza, F.E. SMOTE-Based Homogeneous Ensemble Methods for Software Defect Prediction. In ICCSA 2020: Computational Science and Its Applications—ICCSA 2020; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12254.
  38. Ahmed, S.; Srinivasu, P.N.; Alhumam, A.; Alarfaj, M. AAL and Internet of Medical Things for Monitoring Type-2 Diabetic Patients. Diagnostics 2022, 12, 2739.
  39. Goyal, S.; Bhatia, P.K. Heterogeneous stacked ensemble classifier for software defect prediction. Multimed. Tools Appl. 2022, 81, 37033–37055.
  40. Ali, U.; Aftab, S.; Iqbal, A.; Nawaz, Z.; Bashir, S.; Saeed, M.A. Software Defect Prediction Using Variant based Ensemble Learning and Feature Selection Techniques. Int. J. Mod. Educ. Comput. Sci. 2020, 12, 29–40.
  41. Cetiner, M.; Sahingoz, O.K. A Comparative Analysis for Machine Learning based Software Defect Prediction Systems. In Proceedings of the 2020 11th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; pp. 1–7.
  42. Alsaeedi, A.; Khan, M.Z. Software Defect Prediction Using Supervised Machine Learning and Ensemble Techniques: A Comparative Study. J. Softw. Eng. Appl. 2019, 12, 85–100.
  43. Iqbal, A.; Aftab, S.; Ullah, I.; Bashir, M.S.; Saeed, M.A. A feature selection-based ensemble classification framework for software defect prediction. Int. J. Mod. Educ. Comput. Sci. 2019, 11, 54.
Figure 1. The architecture of the Federated Learning-based Software Fault Forecasting model.
Figure 2. Fine-tuned Spectrum-Based Fault Localization Framework.
Figure 3. Weight updating mechanism in Federated Learning.
Figure 4. Resultant outcomes for all programs.
Figure 5. ROC curves generated for NASA-MDP programs using FEDRak.
Figure 6. ROC curves generated for PROMISE programs using FEDRak.
Figure 7. ROC curves generated for Unix Utility programs using FEDRak.
Table 1. Details of program instances associated with datasets.

| Program | Category | Number of Instances | Number of Features |
|---|---|---|---|
| CM1 | NASA-MDP | 505 | 43 |
| MW1 | NASA-MDP | 403 | 43 |
| PC1 | NASA-MDP | 1107 | 43 |
| PC3 | NASA-MDP | 1563 | 43 |
| PC4 | NASA-MDP | 1458 | 43 |
| JM1 | PROMISE | 10,885 | 22 |
| KC1 | PROMISE | 2109 | 22 |
| Gzip | UUP | 6573 | 21 |
| Sed | UUP | 12,062 | 21 |
| Flex | UUP | 13,892 | 21 |
| Grep | UUP | 12,653 | 21 |
Table 2. Details of the implementation environment.

| Environment Specifications | Details |
|---|---|
| Machine | Acer Aspire VX 15 |
| Processor | Intel Core i7-7700HQ |
| RAM | 16 GB |
| Operating System | Windows 10 Home |
| GPU | NVIDIA GeForce GTX 1050 |
| Programming Language | Python |
| Libraries Used | Pandas, Numpy, Scikit, Flower, Kong |
Table 3. Performance of various models concerning CM1.

| Approach | Sensitivity | Specificity | Accuracy | F1-Score |
|---|---|---|---|---|
| LM [35] | 0.911 | 0.911 | 0.911 | 0.911 |
| BFG [35] | 0.877 | 0.883 | 0.880 | 0.880 |
| CG [35] | 0.908 | 0.856 | 0.882 | 0.885 |
| Fused ANN-BR [35] | 0.978 | 0.969 | 0.974 | 0.974 |
| NB [36] | 0.786 | N/A | 0.645 | 0.340 |
| RF [36] | 0.712 | N/A | 0.609 | 0.321 |
| C4.5 [36] | 0.746 | N/A | 0.667 | 0.276 |
| ANN-ABC [36] | 0.810 | N/A | 0.680 | 0.330 |
| SVM [36] | 0.790 | N/A | 0.786 | 0.321 |
| KNN [36] | 0.847 | N/A | N/A | 0.843 |
| DT [36] | 0.742 | N/A | 0.734 | 0.812 |
| PC-SVM [36] | 0.990 | N/A | 0.952 | 0.975 |
| FEDRak (L) | 0.981 | 0.971 | 0.975 | 0.979 |
| FEDRak (G) | 0.972 | 0.932 | 0.957 | 0.960 |
Table 4. Performance of various models concerning MW1.

| Approach | Sensitivity | Specificity | Accuracy | F1-Score |
|---|---|---|---|---|
| LM [35] | 0.940 | 0.948 | 0.944 | 0.943 |
| BFG [35] | 0.935 | 0.936 | 0.932 | 0.931 |
| CG [35] | 0.936 | 0.936 | 0.936 | 0.936 |
| Fused ANN-BR [35] | 0.968 | 0.960 | 0.964 | 0.964 |
| FEDRak (L) | 0.980 | 0.969 | 0.972 | 0.970 |
| FEDRak (G) | 0.986 | 0.964 | 0.962 | 0.958 |
Table 5. Performance of various models concerning PC1.

| Approach | Sensitivity | Specificity | Accuracy | F1-Score |
|---|---|---|---|---|
| LM [35] | 0.938 | 0.938 | 0.938 | 0.938 |
| BFG [35] | 0.932 | 0.932 | 0.932 | 0.932 |
| CG [35] | 0.932 | 0.932 | 0.932 | 0.932 |
| Fused ANN-BR [35] | 0.986 | 0.980 | 0.983 | 0.983 |
| FEDRak (L) | 0.988 | 0.986 | 0.981 | 0.982 |
| FEDRak (G) | 0.959 | 0.970 | 0.969 | 0.968 |
Table 6. Performance of various models concerning PC3.

| Approach | Sensitivity | Specificity | Accuracy | F1-Score |
|---|---|---|---|---|
| LM [35] | 0.894 | 0.919 | 0.906 | 0.905 |
| BFG [35] | 0.872 | 0.887 | 0.880 | 0.879 |
| CG [35] | 0.884 | 0.882 | 0.883 | 0.883 |
| Fused ANN-BR [35] | 0.968 | 0.967 | 0.968 | 0.968 |
| BN [37] | N/A | N/A | 0.676 | 0.731 |
| DT [37] | N/A | N/A | 0.847 | 0.839 |
| BN + SMOTE [37] | N/A | N/A | 0.882 | 0.734 |
| DT + SMOTE [37] | N/A | N/A | 0.880 | 0.880 |
| FEDRak (L) | 0.970 | 0.975 | 0.973 | 0.972 |
| FEDRak (G) | 0.966 | 0.972 | 0.966 | 0.969 |
Table 7. Performance of various models concerning PC4.

| Approach | Sensitivity | Specificity | Accuracy | F1-Score |
|---|---|---|---|---|
| LM [35] | 0.911 | 0.904 | 0.907 | 0.908 |
| BFG [35] | 0.908 | 0.914 | 0.911 | 0.911 |
| CG [35] | 0.892 | 0.919 | 0.906 | 0.905 |
| Fused ANN-BR [35] | 0.981 | 0.979 | 0.980 | 0.980 |
| BN [37] | N/A | N/A | 0.728 | 0.767 |
| DT [37] | N/A | N/A | 0.869 | 0.869 |
| BN + SMOTE [37] | N/A | N/A | 0.861 | 0.861 |
| DT + SMOTE [37] | N/A | N/A | 0.913 | 0.913 |
| FEDRak (L) | 0.981 | 0.987 | 0.983 | 0.977 |
| FEDRak (G) | 0.969 | 0.973 | 0.975 | 0.970 |
Table 8. Performance of various models concerning JM1.

| Approach | Sensitivity | Specificity | Accuracy | F1-Score |
|---|---|---|---|---|
| LM [35] | 0.805 | 0.798 | 0.801 | 0.802 |
| BFG [35] | 0.813 | 0.784 | 0.799 | 0.801 |
| CG [35] | 0.792 | 0.804 | 0.798 | 0.797 |
| Fused ANN-BR [35] | 0.819 | 0.817 | 0.818 | 0.818 |
| FEDRak (L) | 0.912 | 0.932 | 0.932 | 0.923 |
| FEDRak (G) | 0.899 | 0.910 | 0.907 | 0.910 |
Table 9. Performance of various models concerning KC1.

| Approach | Sensitivity | Specificity | Accuracy | F1-Score |
|---|---|---|---|---|
| LM [35] | 0.802 | 0.804 | 0.803 | 0.803 |
| BFG [35] | 0.790 | 0.777 | 0.783 | 0.785 |
| CG [35] | 0.777 | 0.790 | 0.783 | 0.782 |
| Fused ANN-BR [35] | 0.857 | 0.854 | 0.855 | 0.856 |
| NB [36] | 0.743 | N/A | 0.658 | 0.357 |
| RF [36] | 0.758 | N/A | 0.679 | 0.379 |
| C4.5 [36] | 0.756 | N/A | 0.680 | 0.340 |
| ANN-ABC [36] | 0.770 | N/A | 0.690 | 0.330 |
| SVM [36] | 0.812 | N/A | 0.289 | 0.792 |
| KNN [36] | 0.847 | N/A | N/A | 0.843 |
| DT [36] | 0.941 | N/A | 0.863 | 0.877 |
| PC-SVM [36] | 0.996 | N/A | 0.866 | 0.928 |
| BN [37] | N/A | N/A | 0.683 | 0.689 |
| DT [37] | N/A | N/A | 0.741 | 0.717 |
| BN + SMOTE [37] | N/A | N/A | 0.724 | 0.724 |
| DT + SMOTE [37] | N/A | N/A | 0.797 | 0.797 |
| FEDRak (L) | 0.914 | 0.950 | 0.943 | 0.941 |
| FEDRak (G) | 0.909 | 0.915 | 0.918 | 0.917 |
Table 10. Performance of Unix utility programs.

| Class | Approach | Sensitivity | Specificity | Accuracy | F1-Score |
|---|---|---|---|---|---|
| Gzip | FEDRak (L) | 0.906 | 0.915 | 0.912 | 0.909 |
| Gzip | FEDRak (G) | 0.890 | 0.894 | 0.890 | 0.888 |
| Sed | FEDRak (L) | 0.900 | 0.910 | 0.907 | 0.906 |
| Sed | FEDRak (G) | 0.880 | 0.881 | 0.879 | 0.875 |
| Flex | FEDRak (L) | 0.909 | 0.917 | 0.918 | 0.910 |
| Flex | FEDRak (G) | 0.867 | 0.871 | 0.886 | 0.883 |
| Grep | FEDRak (L) | 0.917 | 0.923 | 0.920 | 0.914 |
| Grep | FEDRak (G) | 0.898 | 0.900 | 0.899 | 0.897 |
Table 11. Summary of performance analysis for all the classes.

| Program | Sensitivity | Specificity | Accuracy | F1-Score |
|---|---|---|---|---|
| CM1 | 0.976 | 0.951 | 0.966 | 0.969 |
| MW1 | 0.983 | 0.966 | 0.967 | 0.964 |
| PC1 | 0.973 | 0.978 | 0.975 | 0.975 |
| PC3 | 0.968 | 0.973 | 0.969 | 0.970 |
| PC4 | 0.975 | 0.980 | 0.979 | 0.973 |
| JM1 | 0.905 | 0.921 | 0.919 | 0.916 |
| KC1 | 0.911 | 0.932 | 0.930 | 0.929 |
| Gzip | 0.898 | 0.904 | 0.901 | 0.898 |
| Sed | 0.890 | 0.895 | 0.893 | 0.890 |
| Flex | 0.888 | 0.867 | 0.902 | 0.896 |
| Grep | 0.907 | 0.911 | 0.909 | 0.905 |
| Average Values | 0.934 | 0.934 | 0.937 | 0.935 |

