Article

A Method of Accuracy Increment Using Segmented Regression

1
Electrical Engineering Department, Faculty of Engineering Technology, Al-Balqa Applied University, Amman 19117, Jordan
2
Faculty of Artificial Intelligence, Al-Balqa Applied University, Amman 19117, Jordan
3
Department of Telecommunication and Radioelectronic Systems, National Aviation University, 03058 Kyiv, Ukraine
*
Author to whom correspondence should be addressed.
Algorithms 2022, 15(10), 378; https://doi.org/10.3390/a15100378
Submission received: 2 September 2022 / Revised: 12 October 2022 / Accepted: 13 October 2022 / Published: 17 October 2022

Abstract

The main purpose of mathematical model building while employing statistical data analysis is to obtain high accuracy of approximation within the range of observed data and sufficient predictive properties. One of the methods for creating mathematical models is to use the techniques of regression analysis. Regression analysis usually applies single polynomial functions of higher order as approximating curves. Such an approach provides high accuracy; however, in many cases, it does not match the geometrical structure of the observed data, which results in unsatisfactory predictive properties. Another approach is associated with the use of segmented functions as approximating curves. Such an approach has the problem of estimating the coordinates of the breakpoint between adjacent segments. This article proposes a new method for determining abscissas of the breakpoint for segmented regression, minimizing the standard deviation based on multidimensional paraboloid usage. The proposed method is explained by calculation examples obtained using statistical simulation and real data observation.

1. Introduction

Scientists use various models when studying different environmental phenomena. Mathematical models provide an opportunity to determine equations and dependencies to correlate the parameters of miscellaneous objects and processes. Mathematical models are built for various reasons, including the achievement of the best understanding of the objects under study, the possibility of mathematical analysis, and the possibility of conducting experimentation with the model in case it is difficult to repeat the experiment with the objects under study [1].
The process of mathematical model building contains several steps:
(1) Experimental study and the measuring of the parameters of real-world systems and phenomena;
(2) Collecting initial data for the model;
(3) Mathematical formulations and fitting one or more models;
(4) The statistical simulation of the model to validate it [2].
There are general rules for building mathematical models. These rules assume the following: (1) collecting background information for the phenomenon under study, (2) using simple models at the first stage, (3) determining all parameters and the quantities and correlations between them based on data analysis, (4) complicating the model based on the nature of the phenomenon under study, (5) estimating the efficiency of the model, and (6) others [3]. The efficiency analysis involves choosing the optimal mathematical model for the problem considered.
There are various efficiency measures for mathematical models. Generally, researchers use the following parameters:
(1) Accuracy: for the coincidence analysis of the output of a mathematical model with observed data;
(2) Reliability: for the analysis of the precision of a mathematical model;
(3) Transparency: for the analysis of choices and assumptions of the output expectations [4,5].
To analyze mathematical models, researchers can use additional criteria, such as model simplicity, calculation time, costs, depth level, and others.
The main parameters for the efficiency level of mathematical models in terms of accuracy analysis are standard deviation [6,7], the sum of absolute deviations between the model output and the observed data [8], a weighted sum of squared deviations [9,10], and the maximal deviation [11]. The criterion for these parameters is the minimum value of the estimated parameter [12,13].
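The four accuracy measures listed above can be computed side by side. The following is a minimal sketch, not taken from the article: the function name `accuracy_measures`, the optional `weights` argument, and the normalization of the standard deviation by `n - n_params` are assumptions made for illustration.

```python
import numpy as np

def accuracy_measures(y, y_hat, weights=None, n_params=0):
    """Return common accuracy measures between observations and model output.

    n_params is the number of fitted coefficients subtracted from n in the
    standard deviation (an assumption; use 0 for a plain RMS deviation).
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    r = y - y_hat                                    # residuals
    w = np.ones_like(r) if weights is None else np.asarray(weights, dtype=float)
    return {
        "std": float(np.sqrt(np.sum(r**2) / (r.size - n_params))),
        "sum_abs": float(np.sum(np.abs(r))),         # sum of absolute deviations
        "weighted_sse": float(np.sum(w * r**2)),     # weighted sum of squares
        "max_abs": float(np.max(np.abs(r))),         # maximal deviation
    }

m = accuracy_measures([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
```

For each measure, the preferred model is the one with the smallest value.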
This article contains seven sections. The first section discusses the background information for the problems of mathematical model building. The second section presents a literature review regarding the topic of research and presents the statement of the problem. The third section deals with the description of mathematical tools for segmented regression building while using ordinary least squares. The fourth section proposes the step-by-step procedure for accuracy increment during segmented regression usage. The fifth section concentrates on the analysis of the proposed method based on statistical simulations. The sixth section discusses the implementation of the proposed method in real data examples, and the seventh section presents the conclusions.

2. Literature Review and Statement of the Problem

Mathematical model building aims at decreasing the uncertainty level for the objects being studied [14,15]. The analysis of the level, location, and nature of uncertainty helps to obtain more reliable information and adequate knowledge [16,17].
To build mathematical models, researchers use methods from different sciences, such as mathematical analysis, probability theory, data science, regression analysis, mathematical statistics, recognition theory, applied geometry, and others [18].
This article concentrates on the techniques of regression analysis for mathematical model building, so corresponding methods are considered in detail. Regression analysis is used to determine the relationship between two or more variables [19] and is widely used to fit mathematical models to statistical data [20].
Regression analysis is frequently used in various applications due to its relative ease of calculation, high accuracy, and good predictive properties, which depend on the type of approximating function used. Regression analysis is applied in different fields, for example, in:
(1) Medicine: to detect Parkinson’s disease based on the analysis of finger-tapping data [21], to forecast the uptake of oxygen based on gene evaluation and to predict data on patient admission [22], and others;
(2) Econometrics: to predict the audit opinion using six financial indicators [23], to determine the dependence of economic growth on the level of environmental pollution [24], to describe the trends of economic parameters in correlation with various factors [25,26], and others;
(3) Transport systems: to determine the optimal periodicity of the implementation of operation processes [27,28], and to analyze possible routes and traffic intensity [29,30,31];
(4) Aviation: to identify flight conditions and situations based on diagnostic parameter monitoring [32,33], and to predict the human state and decision making depending on various environmental factors [34,35];
(5) Radar systems: to estimate the efficiency of signal detection [36], to determine the dependence of weather parameters on radar-received signals [37,38,39], and others;
(6) Navigation systems: to build a mathematical model for the optimal selection of the navigation equipment [40,41,42], to establish the correlation between navigation equipment failures [43], to approximate operational data trends for the prediction of possible aviation events [44], and others;
(7) Cybersecurity: to evaluate the efficiency of information web-resources functioning [45], to synthesize data-processing algorithms while detecting cyberattacks [46,47,48], to ensure high-level security against cyberattacks [49,50], and others;
(8) Engineering and control: to describe nonlinear dynamic object behavior [51,52], to build the mathematical model for statistical parameters while designing control systems [53], to make decisions based on statistical information processing [54,55], and others;
(9) Equipment maintenance: to build the mathematical model for diagnostic variable trends [56], and to determine the uncertainty level while conducting condition monitoring and maintenance preference analysis [57,58];
(10) Reliability analysis: to describe the behavior of reliability parameters [59,60], to statistically simulate nonstationary random processes of failure occurrence [61,62], to describe the processes of technical condition deterioration in the trend of failure rate [63,64], and others.
Regression analysis usually starts with research on the possibility of using a linear regression model. In the case of an unsatisfactory level of accuracy, more complicated models are used [65]. These models are nonlinear regression models [66]. Nonlinear regression models suggest parabolic, hyperbolic, exponential, segmented, and other approximating functions [65,67]. Because of the complicated calculations required when using a nonlinear regression model, various software can be utilized [68].
There are various methods for increasing the accuracy and predictive properties of mathematical models. One approach is to use segmented regression [69,70]. In this case, it is necessary to determine the coordinates of the breakpoint between adjacent segments. This problem can be solved using various algorithms [69,70,71,72,73,74,75]. These algorithms use the maximum likelihood estimator [69,70], Bayesian changepoint models [71,72], inverted F test [73], random search method, the method of cumulative sums [74,75], and others. A comparative analysis showed some flaws in the algorithms for determining breakpoint coordinates. These flaws are related to a need for prior limitations, as well as the effectiveness of the obtained estimate in terms of robustness and bias. Additionally, the discussed algorithms do not give the possibility to obtain a single mathematical formula for breakpoint coordinates and require the usage of the iterative numerical method described in [76].
This literature review motivates the authors to develop a new approach for calculating the optimal coordinates of breakpoints when using segmented regression and analyzing time series with nonstationary behavior. Building a mathematical model based on segmented regression is of considerable importance because:
  • Using segmented regression gives the possibility to obtain a model with greater accuracy.
  • Segmented regression more correctly describes the geometrical structure of time series.
  • The obtained segmented models have effective predictive properties.
The research gap in the field of mathematical model building is associated with the absence of a step-by-step procedure for determining the optimal segmented regression model in case of multiple breakpoints in a dataset structure. At the same time, to solve such problems, the method of simple enumeration of the possible options is often used. However, such an approach does not provide mathematical formulations and requires a long computing time.
Therefore, the goal of this article is: (1) to describe the technique of segmented regression building and (2) to obtain mathematical equations for a step-by-step procedure of accuracy increment based on optimal breakpoints abscissas calculations.
Let us state the research problem mathematically. Let the statistical dataset be given as two arrays $Y = \{y_i\}$ and $X = \{x_i\}$, each of sample size $n$. $Y$ is the dependent (response) variable, and $X$ is the independent (predictor) variable. The relationship between the variables is described by the function set $\phi_k(X, c_{m,k})$, where $k$ is the index of the model being fitted to the dataset and $c_{m,k}$ is a vector of $m$ parameters for the $k$-th regression model. In this case, the regression model is determined by the equation [65]
$$Y = \phi_k(X, c_{m,k}) + \Delta,$$
where $\Delta$ is an error, which can be described by a normal probability density function. Such an assumption allows the use of ordinary least squares (OLS). For example, in the case of linear regression, $\phi_0(X, c_{m,0}) = c_{0,0} + c_{1,0} X$, where $c_{0,0}$ and $c_{1,0}$ are coefficients to be estimated.
This paper focuses on increasing the accuracy of mathematical models based on segmented regression. In this case, the function set $\phi_k(X, c_{m,k}, x_{\mathrm{br}\,q,k})$ depends on the abscissas $x_{\mathrm{br}\,q,k}$ of the breakpoints, where $q$ is the number of breakpoints. The accuracy of the model using OLS is usually estimated by the standard deviation $\sigma$ between the model output and the observed data. The standard deviation depends on the values of the breakpoint abscissas $x_{\mathrm{br}\,q,k}$. Thus, this paper aims to solve the minimization problem that can be formulated as follows:
$$\{x_{\mathrm{br}\,1\,\mathrm{opt}},\ x_{\mathrm{br}\,2\,\mathrm{opt}},\ \ldots,\ x_{\mathrm{br}\,q\,\mathrm{opt}}\} = \arg\min\big(\sigma(x_{\mathrm{br}\,1},\ x_{\mathrm{br}\,2},\ \ldots,\ x_{\mathrm{br}\,q})\big).$$

3. Segmented Regression Models

This section presents the basic mathematical equations for different segmented regression models. Authors mostly employ piecewise linear, linear-quadratic, and quadratic models.
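1. Segmented linear regression (SLR)
This regression type is a sequential connection of $q + 1$ straight-line segments without discontinuities. The mathematical model of SLR is given as
$$\phi_1(X, c_{m,1}, x_{\mathrm{br}\,q,1}) = c_{0,1} + c_{1,1} X + \sum_{i=1}^{q} c_{i+1,1}\,(X - x_{\mathrm{br}\,i})\,h(X - x_{\mathrm{br}\,i}),$$
where $h(X - x_{\mathrm{br}\,i})$ is the Heaviside function. This function makes it possible to write the segmented model as a single mathematical equation.
An example of a mathematical model of three-segmented linear regression has the form
$$\phi_1(X, c_{m,1}, x_{\mathrm{br}\,q,1}) = c_{0,1} + c_{1,1} X + c_{2,1}(X - x_{\mathrm{br}\,1})\,h(X - x_{\mathrm{br}\,1}) + c_{3,1}(X - x_{\mathrm{br}\,2})\,h(X - x_{\mathrm{br}\,2}).$$
This model has two breakpoints, $x_{\mathrm{br}\,1}$ and $x_{\mathrm{br}\,2}$, and it requires the computation of four unknown coefficients: $c_{0,1}$, $c_{1,1}$, $c_{2,1}$, and $c_{3,1}$. These coefficients are estimated based on the OLS. The computation result can be presented in the form of matrix equations
$$C = \Omega^{-1}\Psi,\quad C = \begin{pmatrix} c_{0,1} \\ c_{1,1} \\ c_{2,1} \\ c_{3,1} \end{pmatrix},\quad \Psi = \begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} y_i x_i \\ \sum_{x_i > x_{\mathrm{br}\,1}} y_i (x_i - x_{\mathrm{br}\,1}) \\ \sum_{x_i > x_{\mathrm{br}\,2}} y_i (x_i - x_{\mathrm{br}\,2}) \end{pmatrix},$$
$$\Omega = \begin{pmatrix}
n & \sum_{i=1}^{n} x_i & \sum_{x_i > x_{\mathrm{br}\,1}} (x_i - x_{\mathrm{br}\,1}) & \sum_{x_i > x_{\mathrm{br}\,2}} (x_i - x_{\mathrm{br}\,2}) \\
\sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 & \sum_{x_i > x_{\mathrm{br}\,1}} x_i (x_i - x_{\mathrm{br}\,1}) & \sum_{x_i > x_{\mathrm{br}\,2}} x_i (x_i - x_{\mathrm{br}\,2}) \\
\sum_{x_i > x_{\mathrm{br}\,1}} (x_i - x_{\mathrm{br}\,1}) & \sum_{x_i > x_{\mathrm{br}\,1}} x_i (x_i - x_{\mathrm{br}\,1}) & \sum_{x_i > x_{\mathrm{br}\,1}} (x_i - x_{\mathrm{br}\,1})^2 & \sum_{x_i > x_{\mathrm{br}\,2}} (x_i - x_{\mathrm{br}\,1})(x_i - x_{\mathrm{br}\,2}) \\
\sum_{x_i > x_{\mathrm{br}\,2}} (x_i - x_{\mathrm{br}\,2}) & \sum_{x_i > x_{\mathrm{br}\,2}} x_i (x_i - x_{\mathrm{br}\,2}) & \sum_{x_i > x_{\mathrm{br}\,2}} (x_i - x_{\mathrm{br}\,1})(x_i - x_{\mathrm{br}\,2}) & \sum_{x_i > x_{\mathrm{br}\,2}} (x_i - x_{\mathrm{br}\,2})^2
\end{pmatrix},$$
where the subscript $x_i > x_{\mathrm{br}\,1(2)}$ denotes summation over all $x_i$ greater than $x_{\mathrm{br}\,1(2)}$.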
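The SLR normal equations can be assembled from a design matrix whose columns are $1$, $X$, and one hinge term $(X - x_{\mathrm{br}\,i})\,h(X - x_{\mathrm{br}\,i})$ per breakpoint. The sketch below is illustrative only: the helper names `slr_design` and `fit_slr` are hypothetical, and the noise-free test data are made up to check that known coefficients are recovered.

```python
import numpy as np

def slr_design(x, breakpoints):
    """Design matrix with columns 1, X and (X - x_br) h(X - x_br) per breakpoint."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x]
    for xb in breakpoints:
        # Heaviside step realized as a mask: the term is zero for x <= x_br
        cols.append(np.where(x > xb, x - xb, 0.0))
    return np.column_stack(cols)

def fit_slr(x, y, breakpoints):
    """Solve Omega C = Psi, where Omega = A^T A and Psi = A^T y."""
    A = slr_design(x, breakpoints)
    omega = A.T @ A
    psi = A.T @ np.asarray(y, dtype=float)
    return np.linalg.solve(omega, psi)

# Noise-free check with made-up coefficients and breakpoints at 25 and 70
x = np.arange(0.0, 101.0)
true_c = np.array([220.0, -3.0, 5.0, -4.0])
y = slr_design(x, [25.0, 70.0]) @ true_c
c = fit_slr(x, y, [25.0, 70.0])
```

Solving the normal equations directly mirrors the matrix form in the text; for ill-conditioned data, `np.linalg.lstsq` on the design matrix is usually the more robust choice.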
2. Segmented quadratic regression (SQR)
This regression type is a sequential connection of $q + 1$ quadratic parabola segments without discontinuities. The mathematical model of SQR is given as
$$\phi_2(X, c_{m,2}, x_{\mathrm{br}\,q,2}) = c_{0,2} + c_{1,2} X + c_{2,2} X^2 + \sum_{i=1}^{q} c_{i+2,2}\,(X - x_{\mathrm{br}\,i})^2\,h(X - x_{\mathrm{br}\,i}).$$
An example of a mathematical model of two-segmented quadratic regression has the form
$$\phi_2(X, c_{m,2}, x_{\mathrm{br}\,q,2}) = c_{0,2} + c_{1,2} X + c_{2,2} X^2 + c_{3,2}(X - x_{\mathrm{br}\,1})^2\,h(X - x_{\mathrm{br}\,1}).$$
This model has one breakpoint, $x_{\mathrm{br}\,1}$, and it requires the computation of four unknown coefficients: $c_{0,2}$, $c_{1,2}$, $c_{2,2}$, and $c_{3,2}$. These coefficients are estimated based on the OLS. The computation result can be presented in the form of matrix equations
$$C = \Omega^{-1}\Psi,\quad C = \begin{pmatrix} c_{0,2} \\ c_{1,2} \\ c_{2,2} \\ c_{3,2} \end{pmatrix},\quad \Psi = \begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} y_i x_i \\ \sum_{i=1}^{n} y_i x_i^2 \\ \sum_{x_i > x_{\mathrm{br}\,1}} y_i (x_i - x_{\mathrm{br}\,1})^2 \end{pmatrix},$$
$$\Omega = \begin{pmatrix}
n & \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 & \sum_{x_i > x_{\mathrm{br}\,1}} (x_i - x_{\mathrm{br}\,1})^2 \\
\sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i^3 & \sum_{x_i > x_{\mathrm{br}\,1}} x_i (x_i - x_{\mathrm{br}\,1})^2 \\
\sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i^3 & \sum_{i=1}^{n} x_i^4 & \sum_{x_i > x_{\mathrm{br}\,1}} x_i^2 (x_i - x_{\mathrm{br}\,1})^2 \\
\sum_{x_i > x_{\mathrm{br}\,1}} (x_i - x_{\mathrm{br}\,1})^2 & \sum_{x_i > x_{\mathrm{br}\,1}} x_i (x_i - x_{\mathrm{br}\,1})^2 & \sum_{x_i > x_{\mathrm{br}\,1}} x_i^2 (x_i - x_{\mathrm{br}\,1})^2 & \sum_{x_i > x_{\mathrm{br}\,1}} (x_i - x_{\mathrm{br}\,1})^4
\end{pmatrix}.$$
3. Segmented linear-quadratic regression (SLQR)
This regression type is a sequential connection of $q + 1$ straight-line and quadratic parabola segments without discontinuities. The mathematical model of SLQR is given as
$$\phi_3(X, c_{m,3}, x_{\mathrm{br}\,q,3}) = c_{0,3} + c_{1,3} X + c_{2,3} X^2 s(0) + \sum_{i=1}^{q} c_{i+2,3}\,(X - x_{\mathrm{br}\,i})^{s(i)+1}\,h(X - x_{\mathrm{br}\,i}),$$
where $s(i)$ is an indicator function. If the segment is a straight line, $s(i) = 0$. If the segment is a quadratic parabola, $s(i) = 1$.
An example of a mathematical model of two-segmented linear-quadratic regression has the form
$$f_3(X) = c_{0,3} + c_{1,3} X + c_{2,3} X^2 - c_{3,3}(X - x_{\mathrm{br}\,1})^2\,h(X - x_{\mathrm{br}\,1}).$$
This model has one breakpoint, $x_{\mathrm{br}\,1}$, and it requires the computation of three unknown coefficients: $c_{0,3}$, $c_{1,3}$, and $c_{2,3}$. The feature of this model is the equality of adjacent coefficients for the transition between the quadratic parabola segment and the straight-line segment. Thus, $c_{3,3} = c_{2,3}$. The coefficients are estimated based on the OLS. The computation result can be presented in the form of matrix equations
$$C = \Omega^{-1}\Psi,\quad C = \begin{pmatrix} c_{0,3} \\ c_{1,3} \\ c_{2,3} \end{pmatrix},\quad \Psi = \begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} y_i x_i \\ \sum_{i=1}^{n} y_i x_i^2 - \sum_{x_i > x_{\mathrm{br}\,1}} y_i (x_i - x_{\mathrm{br}\,1})^2 \end{pmatrix},$$
$$\Omega = \begin{pmatrix}
n & \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 - \sum_{x_i > x_{\mathrm{br}\,1}} (x_i - x_{\mathrm{br}\,1})^2 \\
\sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i^3 - \sum_{x_i > x_{\mathrm{br}\,1}} x_i (x_i - x_{\mathrm{br}\,1})^2 \\
\sum_{i=1}^{n} x_i^2 - \sum_{x_i > x_{\mathrm{br}\,1}} (x_i - x_{\mathrm{br}\,1})^2 & \sum_{i=1}^{n} x_i^3 - \sum_{x_i > x_{\mathrm{br}\,1}} x_i (x_i - x_{\mathrm{br}\,1})^2 & \sum_{i=1}^{n} x_i^4 + \sum_{x_i > x_{\mathrm{br}\,1}} \big( (x_i - x_{\mathrm{br}\,1})^4 - 2 x_i^2 (x_i - x_{\mathrm{br}\,1})^2 \big)
\end{pmatrix}.$$
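With the constraint $c_{3,3} = c_{2,3}$, the two-segment SLQR model reduces to three basis functions: $1$, $X$, and $X^2 - (X - x_{\mathrm{br}\,1})^2 h(X - x_{\mathrm{br}\,1})$. A minimal sketch under that reading follows; the helper names `slqr_design` and `fit_slqr` and the test values are assumptions for illustration.

```python
import numpy as np

def slqr_design(x, x_br):
    """Columns 1, X and X^2 - (X - x_br)^2 h(X - x_br) (parabola, then line)."""
    x = np.asarray(x, dtype=float)
    hinge_sq = np.where(x > x_br, (x - x_br) ** 2, 0.0)
    return np.column_stack([np.ones_like(x), x, x**2 - hinge_sq])

def fit_slqr(x, y, x_br):
    """OLS estimate of (c0, c1, c2) via the normal equations Omega C = Psi."""
    A = slqr_design(x, x_br)
    return np.linalg.solve(A.T @ A, A.T @ np.asarray(y, dtype=float))

# Noise-free check with made-up coefficients and a breakpoint at 4
x = np.linspace(0.0, 10.0, 41)
true_c = np.array([2.0, -1.0, 0.5])
y = slqr_design(x, 4.0) @ true_c
c = fit_slqr(x, y, 4.0)

# Past the breakpoint the quadratic terms cancel, so the model is a straight line
tail = y[x > 4.0]
```

The products of these three columns reproduce the entries of $\Omega$ and $\Psi$ given above, including the mixed term $\sum x_i^4 + \sum\big((x_i - x_{\mathrm{br}\,1})^4 - 2x_i^2(x_i - x_{\mathrm{br}\,1})^2\big)$.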

4. Step-by-Step Procedure for Accuracy Increment during Segmented Regression Usage

The method of accuracy increment during segmented regression usage is associated with the estimation of breakpoint abscissas. The breakpoint is the point of connection between two neighboring segments.
The step-by-step procedure contains the following operations:
1. Choosing the regression model and the number of segments. At this stage, the researcher analyzes the geometrical structure of the observed data presented graphically as the dependence of $Y$ on $X$. Then, based on experience, the researcher chooses one of the models SLR, SQR, or SLQR. To substantiate the decision to use segmented regression, the researcher can test the initial data for nonlinearity. The geometrical structure of the observed data also makes it possible to choose the number $q$ of breakpoints.
2. Determining the possible range of values of the breakpoint abscissas. At this stage, the researcher subjectively chooses the discrete range for all breakpoints. The minimal quantity of discrete values should be greater than five. The result of this step is a two-dimensional array $x_{\mathrm{br}}$ of size $q \times w$, where $w$ is the number of discrete values in the range of breakpoint abscissas.
3. Building a regression model. At this stage, based on the matrix equations presented in the previous section, the researcher calculates the unknown coefficients for the chosen regression model and all possible values in the array $x_{\mathrm{br}}$.
4. Calculating the standard deviations. In the case of OLS usage, the accuracy of the model is determined by the standard deviation between the model output and the observed data, which can be presented as follows:
$$\sigma = \sqrt{\frac{1}{n - l} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2},$$
where $l$ is the degree of freedom for the chosen regression model.
At this stage, it is necessary to determine the discrete multidimensional dependence $\sigma(x_{\mathrm{br}\,1},\ x_{\mathrm{br}\,2},\ \ldots,\ x_{\mathrm{br}\,q})$ for all possible values in the array $x_{\mathrm{br}}$.
Note that if an alternative regression method is used (for example, least absolute deviations regression), similar calculations should be carried out for the corresponding accuracy measure.
5. Approximating the dependence of the standard deviation on the breakpoint abscissas by a multidimensional paraboloid using OLS. The dimension of the paraboloid corresponds to the number $q$ of breakpoints. One of two types of paraboloid can be used:
(a) General:
$$\Sigma(x_{\mathrm{br}\,1},\ x_{\mathrm{br}\,2},\ \ldots,\ x_{\mathrm{br}\,q}) = \alpha_0 + \sum_{i=1}^{q} \alpha_i x_{\mathrm{br}\,i}^2 + \sum_{i=1}^{q} \beta_i x_{\mathrm{br}\,i} + \sum_{i<j} \gamma_{i,j}\, x_{\mathrm{br}\,i}\, x_{\mathrm{br}\,j}, \tag{4}$$
(b) Simplified:
$$\Sigma(x_{\mathrm{br}\,1},\ x_{\mathrm{br}\,2},\ \ldots,\ x_{\mathrm{br}\,q}) = \alpha_0 + \sum_{i=1}^{q} \alpha_i x_{\mathrm{br}\,i}^2 + \sum_{i=1}^{q} \beta_i x_{\mathrm{br}\,i}, \tag{5}$$
where $\alpha_0$, $\alpha_i$, $\beta_i$, and $\gamma_{i,j}$ are approximation coefficients. The simplified paraboloid (5) can be used under the assumption that $\gamma_{i,j} = 0$ in the general paraboloid (4).
The coefficients of Equations (4) and (5) are estimated based on OLS. Such a calculation is possible because all of the values of the possible breakpoints in the two-dimensional array $x_{\mathrm{br}}$ of size $q \times w$ are known, and the values of the function $\Sigma(x_{\mathrm{br}\,1},\ x_{\mathrm{br}\,2},\ \ldots,\ x_{\mathrm{br}\,q})$ correspond to the standard deviations $\sigma(x_{\mathrm{br}\,1},\ x_{\mathrm{br}\,2},\ \ldots,\ x_{\mathrm{br}\,q})$ obtained at the previous step.
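Consider the case of a simplified paraboloid. According to OLS, it is necessary to minimize the sum of squared deviations
$$S = \sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} \Big( \sigma(x_{\mathrm{br}\,1,i_1},\ x_{\mathrm{br}\,2,i_2},\ \ldots,\ x_{\mathrm{br}\,q,i_q}) - \Big( \alpha_0 + \sum_{j=1}^{q} \alpha_j x_{\mathrm{br}\,j,i_j}^2 + \sum_{j=1}^{q} \beta_j x_{\mathrm{br}\,j,i_j} \Big) \Big)^2,$$
that is, to solve the system of equations
$$\begin{cases}
\dfrac{\partial S}{\partial \alpha_0} = 0, \\
\dfrac{\partial S}{\partial \alpha_1} = 0, \\
\dfrac{\partial S}{\partial \beta_1} = 0, \\
\quad \vdots \\
\dfrac{\partial S}{\partial \alpha_q} = 0, \\
\dfrac{\partial S}{\partial \beta_q} = 0.
\end{cases}$$
Let us simplify the first equation in the system. After derivative calculation, it can be presented as follows:
$$-2 \sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} \Big( \sigma(x_{\mathrm{br}\,1,i_1},\ \ldots,\ x_{\mathrm{br}\,q,i_q}) - \Big( \alpha_0 + \sum_{j=1}^{q} \alpha_j x_{\mathrm{br}\,j,i_j}^2 + \sum_{j=1}^{q} \beta_j x_{\mathrm{br}\,j,i_j} \Big) \Big) = 0$$
or
$$\sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} \Big( \alpha_0 + \sum_{j=1}^{q} \alpha_j x_{\mathrm{br}\,j,i_j}^2 + \sum_{j=1}^{q} \beta_j x_{\mathrm{br}\,j,i_j} \Big) = \sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} \sigma(x_{\mathrm{br}\,1,i_1},\ \ldots,\ x_{\mathrm{br}\,q,i_q}).$$
Expanding the left side of the equation, we get
$$\sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} \alpha_0 + \sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} \sum_{j=1}^{q} \alpha_j x_{\mathrm{br}\,j,i_j}^2 + \sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} \sum_{j=1}^{q} \beta_j x_{\mathrm{br}\,j,i_j} = \sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} \sigma(x_{\mathrm{br}\,1,i_1},\ \ldots,\ x_{\mathrm{br}\,q,i_q}).$$
Taking into account that
$$\sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} \alpha_0 = \alpha_0 w^q,$$
$$\sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} \sum_{j=1}^{q} \alpha_j x_{\mathrm{br}\,j,i_j}^2 = w^{q-1} \sum_{j=1}^{q} \sum_{i_j=1}^{w} \alpha_j x_{\mathrm{br}\,j,i_j}^2,$$
$$\sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} \sum_{j=1}^{q} \beta_j x_{\mathrm{br}\,j,i_j} = w^{q-1} \sum_{j=1}^{q} \sum_{i_j=1}^{w} \beta_j x_{\mathrm{br}\,j,i_j},$$
the first equation can be presented as follows:
$$\alpha_0 w^q + \alpha_1 w^{q-1} \sum_{i_1=1}^{w} x_{\mathrm{br}\,1,i_1}^2 + \beta_1 w^{q-1} \sum_{i_1=1}^{w} x_{\mathrm{br}\,1,i_1} + \alpha_2 w^{q-1} \sum_{i_2=1}^{w} x_{\mathrm{br}\,2,i_2}^2 + \beta_2 w^{q-1} \sum_{i_2=1}^{w} x_{\mathrm{br}\,2,i_2} + \ldots + \alpha_q w^{q-1} \sum_{i_q=1}^{w} x_{\mathrm{br}\,q,i_q}^2 + \beta_q w^{q-1} \sum_{i_q=1}^{w} x_{\mathrm{br}\,q,i_q} = \sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} \sigma(x_{\mathrm{br}\,1,i_1},\ \ldots,\ x_{\mathrm{br}\,q,i_q}).$$
Similar simplifications can be made for the other equations in the system. Therefore, the computation result for paraboloid (5) can be presented in the form of matrix equations
$$C = \Omega^{-1}\Psi,\quad C = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \beta_1 \\ \vdots \\ \alpha_q \\ \beta_q \end{pmatrix},\quad \Psi = \begin{pmatrix}
\sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} \sigma(x_{\mathrm{br}\,1,i_1},\ \ldots,\ x_{\mathrm{br}\,q,i_q}) \\
\sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} x_{\mathrm{br}\,1,i_1}^2\, \sigma(x_{\mathrm{br}\,1,i_1},\ \ldots,\ x_{\mathrm{br}\,q,i_q}) \\
\sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} x_{\mathrm{br}\,1,i_1}\, \sigma(x_{\mathrm{br}\,1,i_1},\ \ldots,\ x_{\mathrm{br}\,q,i_q}) \\
\vdots \\
\sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} x_{\mathrm{br}\,q,i_q}^2\, \sigma(x_{\mathrm{br}\,1,i_1},\ \ldots,\ x_{\mathrm{br}\,q,i_q}) \\
\sum_{i_1=1}^{w} \cdots \sum_{i_q=1}^{w} x_{\mathrm{br}\,q,i_q}\, \sigma(x_{\mathrm{br}\,1,i_1},\ \ldots,\ x_{\mathrm{br}\,q,i_q})
\end{pmatrix},$$
$$\Omega = \begin{pmatrix}
w^q & w^{q-1} \sum_{i_1=1}^{w} x_{\mathrm{br}\,1,i_1}^2 & w^{q-1} \sum_{i_1=1}^{w} x_{\mathrm{br}\,1,i_1} & \cdots & w^{q-1} \sum_{i_q=1}^{w} x_{\mathrm{br}\,q,i_q}^2 & w^{q-1} \sum_{i_q=1}^{w} x_{\mathrm{br}\,q,i_q} \\
w^{q-1} \sum_{i_1=1}^{w} x_{\mathrm{br}\,1,i_1}^2 & w^{q-1} \sum_{i_1=1}^{w} x_{\mathrm{br}\,1,i_1}^4 & w^{q-1} \sum_{i_1=1}^{w} x_{\mathrm{br}\,1,i_1}^3 & \cdots & w^{q-2} \sum_{i_1=1}^{w} \sum_{i_q=1}^{w} x_{\mathrm{br}\,1,i_1}^2 x_{\mathrm{br}\,q,i_q}^2 & w^{q-2} \sum_{i_1=1}^{w} \sum_{i_q=1}^{w} x_{\mathrm{br}\,1,i_1}^2 x_{\mathrm{br}\,q,i_q} \\
w^{q-1} \sum_{i_1=1}^{w} x_{\mathrm{br}\,1,i_1} & w^{q-1} \sum_{i_1=1}^{w} x_{\mathrm{br}\,1,i_1}^3 & w^{q-1} \sum_{i_1=1}^{w} x_{\mathrm{br}\,1,i_1}^2 & \cdots & w^{q-2} \sum_{i_1=1}^{w} \sum_{i_q=1}^{w} x_{\mathrm{br}\,1,i_1} x_{\mathrm{br}\,q,i_q}^2 & w^{q-2} \sum_{i_1=1}^{w} \sum_{i_q=1}^{w} x_{\mathrm{br}\,1,i_1} x_{\mathrm{br}\,q,i_q} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
w^{q-1} \sum_{i_q=1}^{w} x_{\mathrm{br}\,q,i_q}^2 & w^{q-2} \sum_{i_1=1}^{w} \sum_{i_q=1}^{w} x_{\mathrm{br}\,1,i_1}^2 x_{\mathrm{br}\,q,i_q}^2 & w^{q-2} \sum_{i_1=1}^{w} \sum_{i_q=1}^{w} x_{\mathrm{br}\,1,i_1} x_{\mathrm{br}\,q,i_q}^2 & \cdots & w^{q-1} \sum_{i_q=1}^{w} x_{\mathrm{br}\,q,i_q}^4 & w^{q-1} \sum_{i_q=1}^{w} x_{\mathrm{br}\,q,i_q}^3 \\
w^{q-1} \sum_{i_q=1}^{w} x_{\mathrm{br}\,q,i_q} & w^{q-2} \sum_{i_1=1}^{w} \sum_{i_q=1}^{w} x_{\mathrm{br}\,1,i_1}^2 x_{\mathrm{br}\,q,i_q} & w^{q-2} \sum_{i_1=1}^{w} \sum_{i_q=1}^{w} x_{\mathrm{br}\,1,i_1} x_{\mathrm{br}\,q,i_q} & \cdots & w^{q-1} \sum_{i_q=1}^{w} x_{\mathrm{br}\,q,i_q}^3 & w^{q-1} \sum_{i_q=1}^{w} x_{\mathrm{br}\,q,i_q}^2
\end{pmatrix}.$$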
6. Calculating the coordinates of the paraboloid optimum. To obtain the minimum standard deviation, it is necessary to determine the coordinates of the minimum of the multidimensional paraboloid. To do this, the partial derivatives are calculated and equated to zero [77]:
$$\begin{cases}
\dfrac{\partial \Sigma(x_{\mathrm{br}\,1},\ x_{\mathrm{br}\,2},\ \ldots,\ x_{\mathrm{br}\,q})}{\partial x_{\mathrm{br}\,1}} = 0, \\
\dfrac{\partial \Sigma(x_{\mathrm{br}\,1},\ x_{\mathrm{br}\,2},\ \ldots,\ x_{\mathrm{br}\,q})}{\partial x_{\mathrm{br}\,2}} = 0, \\
\quad \vdots \\
\dfrac{\partial \Sigma(x_{\mathrm{br}\,1},\ x_{\mathrm{br}\,2},\ \ldots,\ x_{\mathrm{br}\,q})}{\partial x_{\mathrm{br}\,q}} = 0.
\end{cases}$$
For the general paraboloid (4), this is a system of $q$ linear equations. For paraboloid (5), the solution of the system is given as
$$x_{\mathrm{br}\,i\,\mathrm{opt}} = -\frac{\beta_i}{2 \alpha_i}.$$
7. Calculating the coefficients of the model for the optimal case. The coefficients of SLR, SQR, or SLQR are computed for the optimal location of the breakpoints using OLS. The final model can be used for the explanation and prediction of the response variable.
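The seven steps can be sketched end to end for an SLR model and the simplified paraboloid (5). This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, $l$ is taken to be the number of fitted coefficients, and the demo data are noise-free and made up so that the expected breakpoint (50) is known in advance.

```python
import itertools
import numpy as np

def slr_design(x, breakpoints):
    """Columns 1, X and one hinge (X - x_br) h(X - x_br) per breakpoint."""
    cols = [np.ones_like(x), x]
    cols += [np.where(x > xb, x - xb, 0.0) for xb in breakpoints]
    return np.column_stack(cols)

def fit_slr(x, y, breakpoints):
    """Steps 3-4: OLS coefficients and sigma (l = number of coefficients, assumed)."""
    A = slr_design(x, breakpoints)
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ c
    return c, np.sqrt(resid @ resid / (len(y) - A.shape[1]))

def optimal_breakpoints(x, y, grids):
    """Steps 2-6 for q breakpoints; grids holds the q rows of the array x_br."""
    combos = np.array(list(itertools.product(*grids)), dtype=float)  # (w^q, q)
    sigmas = np.array([fit_slr(x, y, bp)[1] for bp in combos])
    q = combos.shape[1]
    # Step 5: simplified paraboloid alpha_0 + sum alpha_j x_j^2 + sum beta_j x_j
    P = np.column_stack([np.ones(len(combos)), combos**2, combos])
    coef, *_ = np.linalg.lstsq(P, sigmas, rcond=None)
    alpha, beta = coef[1:1 + q], coef[1 + q:]
    # Step 6: vertex of the paraboloid in each coordinate
    return -beta / (2.0 * alpha)

# Step 1: SLR with q = 1; made-up noise-free data with a break at x = 50
x = np.arange(0.0, 101.0)
y = 10.0 - 0.2 * x + 0.3 * np.where(x > 50.0, x - 50.0, 0.0)
x_opt = optimal_breakpoints(x, y, [[40.0, 45.0, 50.0, 55.0, 60.0]])
# Step 7: refit the model at the estimated breakpoint
c_opt, sigma_opt = fit_slr(x, y, x_opt)
```

Only $w^q$ regressions are needed to build the $\sigma$ table, after which the optimum follows in closed form. For the general paraboloid (4), the cross-product columns would be appended to `P` and the vertex obtained from a $q \times q$ linear system.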
Consider a simple example of the proposed method. Let us use the dataset with a small sample size presented in [6]. These data describe the relationship between production lot size $x$ and the average production cost per unit $y$ (in dollars) and are given in Table 1.
Consider this step-by-step procedure.
1. To describe the presented data, the SLR model with $q = 1$ breakpoint is chosen.
2. The possible breakpoint abscissa values are $x_{\mathrm{br}} = \{160;\ 180;\ 200;\ 220;\ 240\}$. Therefore, in this case $x_{\mathrm{br}}$ is a two-dimensional array of size $1 \times 5$.
3. There are five alternative SLR models for all possible values in the array $x_{\mathrm{br}}$:
$$\phi_{1,1}(X) = 14.446 - 0.0619 X + 0.0408 (X - 160)\,h(X - 160),$$
$$\phi_{1,2}(X) = 15.825 - 0.05601 X + 0.0397 (X - 180)\,h(X - 180),$$
$$\phi_{1,3}(X) = 15.117 - 0.0502 X + 0.03885 (X - 200)\,h(X - 200),$$
$$\phi_{1,4}(X) = 14.324 - 0.0443 X + 0.0373 (X - 220)\,h(X - 220),$$
$$\phi_{1,5}(X) = 13.713 - 0.04002 X + 0.0391 (X - 240)\,h(X - 240).$$
4. The standard deviations for the obtained SLR models are $\sigma = \{0.5;\ 0.362;\ 0.316;\ 0.435;\ 0.53\}$.
5. Because there is only one breakpoint, the multidimensional paraboloid reduces to a simple parabola. The result of the calculation is
$$\Sigma(x_{\mathrm{br}\,1}) = 4.41715 - 0.04446\, x_{\mathrm{br}\,1} + 1.128 \cdot 10^{-4}\, x_{\mathrm{br}\,1}^2.$$
6. The optimal value of the breakpoint abscissa is
$$x_{\mathrm{br}\,1\,\mathrm{opt}} = -\frac{\beta_1}{2 \alpha_1} = 197.031.$$
7. The optimal SLR model is calculated for the obtained breakpoint abscissa. The final equation is
$$\phi_{1\,\mathrm{opt}}(X) = 15.256 - 0.0513 X + 0.03945 (X - 197.031)\,h(X - 197.031).$$
The standard deviation for the optimal SLR model is 0.313. The result of the model building using the SLR model is shown in Figure 1.
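Steps 5 and 6 of this example can be checked numerically from the five $(x_{\mathrm{br}}, \sigma)$ pairs quoted above. The sketch below uses `numpy.polyfit` for the parabola fit. Because the $\sigma$ values in the text are rounded, the fitted coefficients should be expected to agree with the reported ones only approximately.

```python
import numpy as np

# Candidate breakpoint abscissas and the standard deviations quoted in the text
x_br = np.array([160.0, 180.0, 200.0, 220.0, 240.0])
sigma = np.array([0.5, 0.362, 0.316, 0.435, 0.53])

# Step 5: OLS parabola; np.polyfit returns the highest power first
alpha_1, beta_1, alpha_0 = np.polyfit(x_br, sigma, deg=2)

# Step 6: vertex of the parabola, x_opt = -beta_1 / (2 * alpha_1)
x_opt = -beta_1 / (2.0 * alpha_1)
print(f"alpha_1 = {alpha_1:.4e}, beta_1 = {beta_1:.5f}, x_opt = {x_opt:.3f}")
```

The vertex lands within a few hundredths of the reported 197.031. The estimate is driven by the ratio $\beta_1 / \alpha_1$, so it is far less sensitive to rounding than the individual coefficients.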

5. Analysis of Proposed Method Based on Statistical Simulation

The analysis of the proposed method is performed using statistical simulation and real data examples. This section presents the statistical simulation results. During the simulation, a dataset with two breakpoints is generated using built-in software operators. The dataset is an additive mixture of a deterministic component and random noise.
Assume that the deterministic component corresponds to an SLR model
$$\phi_1(X, c_{m,1}, x_{\mathrm{br}\,q,1}) = c_{0,1} + c_{1,1} X + c_{2,1}(X - x_{\mathrm{br}\,1})\,h(X - x_{\mathrm{br}\,1}) + c_{3,1}(X - x_{\mathrm{br}\,2})\,h(X - x_{\mathrm{br}\,2}).$$
The random noise is distributed according to the Gaussian probability density function.
The initial data for the simulation are as follows:
(1) Sample size $n = 120$;
(2) Sampling time $\delta = 1$ (for discrete representation of the deterministic component);
(3) Predetermined parameters of the SLR model: $c_{0,1} = 220$, $c_{1,1} = -3$, $c_{2,1} = 5$, $c_{3,1} = -4$, $x_{\mathrm{br}\,1} = 25$, and $x_{\mathrm{br}\,2} = 70$ (such parameters correspond, for example, to the real process of deterioration occurrence when monitoring the values of voltage for the supply of electronic devices [63]);
(4) Predetermined parameters of Gaussian noise: the expected value is equal to zero and the standard deviation is equal to 20 (additionally, it is assumed that the noise values are independent random variables for any sampling time moment);
(5) Number of simulation repetitions $N = 1000$.
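A dataset with these parameters can be generated as sketched below. Several details are assumptions: the function name `simulate_slr` is hypothetical, the abscissas are taken as $x = 1, 2, \ldots, n$ (the text does not say whether sampling starts at 0 or 1), and the negative signs of $c_{1,1}$ and $c_{3,1}$ are inferred from the fitted models reported later in this section.

```python
import numpy as np

def simulate_slr(n=120, c=(220.0, -3.0, 5.0, -4.0), x_br=(25.0, 70.0),
                 noise_sd=20.0, rng=None):
    """Return (x, y, trend): an SLR trend plus independent Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.arange(1, n + 1, dtype=float)          # sampling time delta = 1 (assumed start at 1)
    trend = c[0] + c[1] * x
    for ck, xb in zip(c[2:], x_br):
        trend += ck * np.where(x > xb, x - xb, 0.0)   # hinge term per breakpoint
    y = trend + rng.normal(0.0, noise_sd, size=n)     # zero-mean noise, sd = 20
    return x, y, trend

# One realization; the seed is arbitrary and only makes the run repeatable
x, y, trend = simulate_slr(rng=np.random.default_rng(0))
```

Repeating the call $N = 1000$ times with a shared generator gives the ensemble of realizations used for the statistical analysis.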
Consider the calculation procedure of the proposed method for one of the generated datasets. Table 2 shows one of the generated datasets. Figure 2 presents three realizations of the generated datasets, and each realization is marked by circle, triangle, or diamond (the circles correspond to the data in Table 2).
To describe the obtained dataset, we choose the SLR model with three segments and $q = 2$ breakpoints. To simplify the calculations, we choose the quantity of discrete values within the range of possible breakpoints to be $w = 5$. According to the geometrical structure of the observed dataset (Figure 2), the ranges for the two breakpoints are as follows:
$$x_{\mathrm{br}\,1} = \{15;\ 20;\ 25;\ 30;\ 35\},$$
$$x_{\mathrm{br}\,2} = \{60;\ 65;\ 70;\ 75;\ 80\}.$$
The next step is to evaluate the unknown coefficients $c_{0,1}$, $c_{1,1}$, $c_{2,1}$, and $c_{3,1}$ for all possible values of the first and second breakpoints using OLS. As a result, 25 alternative SLR models are obtained.
After that, the standard deviations between the model output and the observed data for these SLR models are determined. Table 3 shows the computation results.
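The $5 \times 5$ table of standard deviations can be built with a double loop over the two breakpoint ranges. The sketch below regenerates one simulated realization so that it is self-contained. The seed, the sampling grid $x = 1, \ldots, 120$, and the choice $l = 4$ (the number of fitted coefficients) are assumptions, so the numbers will not reproduce Table 3.

```python
import numpy as np

def sigma_slr(x, y, breakpoints):
    """Fit an SLR model by OLS and return sigma with l = number of coefficients."""
    A = np.column_stack([np.ones_like(x), x] +
                        [np.where(x > xb, x - xb, 0.0) for xb in breakpoints])
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ c
    return np.sqrt(resid @ resid / (len(y) - A.shape[1]))

# One simulated realization (arbitrary seed; not the dataset of Table 2)
rng = np.random.default_rng(0)
x = np.arange(1.0, 121.0)
y = (220.0 - 3.0 * x
     + 5.0 * np.where(x > 25, x - 25, 0.0)
     - 4.0 * np.where(x > 70, x - 70, 0.0)
     + rng.normal(0.0, 20.0, x.size))

grid1 = [15, 20, 25, 30, 35]     # candidate abscissas of the first breakpoint
grid2 = [60, 65, 70, 75, 80]     # candidate abscissas of the second breakpoint

# Rows follow x_br1, columns follow x_br2
table = np.array([[sigma_slr(x, y, (b1, b2)) for b2 in grid2] for b1 in grid1])
```

Each entry comes from one of the 25 alternative models. The smallest entries are expected to lie near the noise standard deviation of 20.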
Even visual analysis of the data on the standard deviation (Table 3) indicates that the minimal standard deviation is located approximately near x br     1 = 25 and x br     2 = 70 . To estimate the exact values of breakpoint abscissas, paraboloids (4) and (5) are built using OLS.
After the calculations, the following mathematical equations were obtained:
Σ ( x br   1 ,     x br   2 ) = 148.8 1.1295 x br     1 3.2925 x br     2 + 0.01008 x br     1 2 + 0.02188 x br     2 2 + 8.534 10 3 x br     1 x br     2 ,
Σ ( x br   1 ,     x br   2 ) = 133.87 0.5321 x br     1 3.0791 x br     2 + 0.01008 x br     1 2 + 0.02188 x br     2 2 .
Figure 3 and Figure 4 show the visual presentation of paraboloids (4) and (5) for this numerical example, respectively.
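To determine the optimum coordinates for the three-dimensional general paraboloid (4), it is necessary to solve the following system of two linear equations:
$$\begin{cases} \dfrac{\partial \Sigma(x_{\mathrm{br}\,1}, x_{\mathrm{br}\,2})}{\partial x_{\mathrm{br}\,1}} = 0, \\[2ex] \dfrac{\partial \Sigma(x_{\mathrm{br}\,1}, x_{\mathrm{br}\,2})}{\partial x_{\mathrm{br}\,2}} = 0. \end{cases}$$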
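In this case, the calculation gives the following solution:
$$x_{\mathrm{br}\,1\,\mathrm{opt}} = \frac{\beta_{2}\gamma_{1,2} - 2\alpha_{2}\beta_{1}}{4\alpha_{1}\alpha_{2} - \gamma_{1,2}^{2}},$$
$$x_{\mathrm{br}\,2\,\mathrm{opt}} = -\frac{\beta_{1} + 2\alpha_{1}\,x_{\mathrm{br}\,1\,\mathrm{opt}}}{\gamma_{1,2}}.$$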
The general paraboloid (4) has a minimum standard deviation at the coordinates
$$x_{\mathrm{br}\,1\,\mathrm{opt}}^{(\mathrm{gen})} = 26.361,$$
$$x_{\mathrm{br}\,2\,\mathrm{opt}}^{(\mathrm{gen})} = 70.086.$$
The simplified paraboloid (5) has a minimum standard deviation at the coordinates
$$x_{\mathrm{br}\,1\,\mathrm{opt}}^{(\mathrm{sim})} = 26.397,$$
$$x_{\mathrm{br}\,2\,\mathrm{opt}}^{(\mathrm{sim})} = 70.351.$$
The results of the calculation for paraboloids (4) and (5) almost coincide. The relative errors of the first and second breakpoint abscissas are equal to 5.558% and 0.5014%, respectively.
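The optimum coordinates can be checked numerically from the rounded paraboloid coefficients. The sketch below is an illustration under assumptions: it takes $\beta_k$ as the linear coefficients, $\alpha_k$ as the quadratic coefficients, and $\gamma_{1,2}$ as the cross-term coefficient, and the function names are hypothetical. Because the inputs are rounded, the results agree with the quoted optima only to about two decimal places.

```python
import numpy as np

def paraboloid_optimum(beta, alpha, gamma=0.0):
    """Stationary point of b1*x1 + b2*x2 + a1*x1^2 + a2*x2^2 + g*x1*x2 (+ const)."""
    hessian = np.array([[2.0 * alpha[0], gamma], [gamma, 2.0 * alpha[1]]])
    # Setting the gradient to zero gives hessian @ x = -beta.
    return np.linalg.solve(hessian, -np.asarray(beta, dtype=float))

def closed_form(beta, alpha, gamma):
    """Same point through the closed-form expressions (requires gamma != 0)."""
    x1 = (beta[1] * gamma - 2 * alpha[1] * beta[0]) / (4 * alpha[0] * alpha[1] - gamma ** 2)
    x2 = -(beta[0] + 2 * alpha[0] * x1) / gamma
    return np.array([x1, x2])

alpha = (0.01008, 0.02188)
gen = paraboloid_optimum((-1.1295, -3.2925), alpha, 8.534e-3)  # general paraboloid
sim = paraboloid_optimum((-0.5321, -3.0791), alpha)            # simplified paraboloid
print(gen, closed_form((-1.1295, -3.2925), alpha, 8.534e-3))
print(sim)
```

For the simplified paraboloid the system decouples, so each coordinate reduces to $-\beta_k / (2\alpha_k)$.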
After the calculation of the model's coefficients for the optimal case, the optimal SLR models for paraboloids (4) and (5) are obtained:
$$\varphi_{1}(X, c_{m,1}, x_{\mathrm{br}\,q,1}) = 212.169 - 2.526X + 4.711(X - 26.361)\,h(X - 26.361) - 4.440(X - 70.086)\,h(X - 70.086),$$
$$\varphi_{1}(X, c_{m,1}, x_{\mathrm{br}\,q,1}) = 212.022 - 2.509X + 4.678(X - 26.397)\,h(X - 26.397) - 4.443(X - 70.351)\,h(X - 70.351).$$
The obtained SLR models give almost the same standard deviations equal to 18.429 and 18.424, respectively.
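To illustrate how such a model is evaluated, the sketch below implements the hinge form with the Heaviside step function $h(\cdot)$ and compares the two optimal models on the abscissas 1 to 120. The function name `slr` and its argument layout are assumptions made for this example.

```python
import numpy as np

def slr(x, c0, c1, hinge_coefs, breakpoints):
    """c0 + c1*X + sum_k c_k (X - x_br_k) h(X - x_br_k), h = Heaviside step."""
    x = np.asarray(x, dtype=float)
    y = c0 + c1 * x
    for c, b in zip(hinge_coefs, breakpoints):
        y = y + c * (x - b) * np.heaviside(x - b, 0.0)
    return y

x = np.arange(1, 121)
y_gen = slr(x, 212.169, -2.526, (4.711, -4.440), (26.361, 70.086))  # from paraboloid (4)
y_sim = slr(x, 212.022, -2.509, (4.678, -4.443), (26.397, 70.351))  # from paraboloid (5)
max_gap = np.max(np.abs(y_gen - y_sim))
print(max_gap)
```

The largest gap between the two curves is small compared with the standard deviation of about 18.4, which is consistent with the visual coincidence of the models.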
Figure 5 shows the generated dataset and final optimal SLR models. Visual analysis shows the coincidence of both SLR models.
We now consider the general simulation results for all iterations. Repeating the simulation provides an opportunity to perform a complete statistical analysis of the breakpoint estimation during mathematical model building. The analysis was performed by plotting histograms and evaluating the numerical characteristics of the random variables. Figure 6 shows the histograms for the estimates of the two breakpoint abscissas for the different optimization options (general and simplified paraboloids). The parameter λ in Figure 6 is the quantity of breakpoint abscissa estimates that are located in the corresponding grouping interval of the histogram.
Table 4 shows the numerical characteristics of the breakpoint abscissas estimates (mathematical expectation, standard deviation, range of change, and skewness).
To describe the obtained estimates of the breakpoint abscissas completely, it is necessary to fit a theoretical probability density function to the histograms. Approximate assumptions can be made based on the graphical view of the histograms in Figure 6: their shape can correspond to the Gaussian probability density function. Such an assumption can be tested using the chi-squared test with a high confidence probability.
The breakpoint estimation bias is smaller when the general paraboloid method is used. However, the benefit is negligible: it averages 0.337% compared with the simplified paraboloid method. The highest relative estimate bias is 3.012%. For the second (later) breakpoint, the simplified paraboloid method has, on average, a narrower range of breakpoint estimates (Table 4).
Let us compare the proposed method with the method of simple enumeration. To obtain a breakpoint abscissa estimate bias of approximately 3%, the method of simple enumeration requires at least 33 possible values for each breakpoint. Therefore, in the case of two breakpoints, it is necessary to repeat the computations for at least $33^{2} = 1089$ iterations. By contrast, the proposed method requires 25 iterations and the additional calculation of the paraboloid optimum. Therefore, the proposed method reduces the computing time by at least 30 times compared with the method of simple enumeration.
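The iteration counts can be reproduced with a few lines of arithmetic. In this sketch the helper name `model_fits` is hypothetical, and only the number of OLS fits is counted; the cost of fitting the paraboloid and locating its optimum is not modeled, which is one reason the stated reduction is more conservative than the raw ratio.

```python
def model_fits(values_per_breakpoint, breakpoints):
    """Number of OLS fits when every combination of candidate abscissas is tried."""
    return values_per_breakpoint ** breakpoints

enumeration = model_fits(33, 2)  # simple enumeration with 33 values per breakpoint
proposed = model_fits(5, 2)      # coarse grid used before the paraboloid fit
ratio = enumeration / proposed
print(enumeration, proposed, ratio)  # 1089 25 43.56
```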
A comparison of the simulation results for a range of initial data leads to the conclusion that SLR models based on the general and simplified paraboloids have approximately the same accuracy characteristics. Therefore, in practical cases, the simplified paraboloid method is more advantageous when creating a segmented regression model because it reduces the amount of computation and the calculation time.

6. Real Data Example

Consider the example of real data on the number of earthquakes with a magnitude of 7 or higher by year, according to the United States Geological Survey [78]. Table 5 presents the corresponding data observed from 1922 to 2021, where i is the observation number, X is the year, and Y is the quantity of earthquakes.
Figure 7 shows the graphical view of the dataset.
To simplify the presentation and calculations, the first year of observation (1922) is taken as the zero point of the abscissa axis in the subsequent computations. Thus, to return to the original data, it is necessary to add 1922 to the shifted abscissa.
According to the visual analysis of the dataset, let us assume that there are five breakpoints in this realization. The ranges for these breakpoints are as follows:
$$x_{\mathrm{br}\,1} = \{17;\ 18;\ 19;\ 20;\ 21\},$$
$$x_{\mathrm{br}\,2} = \{35;\ 36;\ 37;\ 38;\ 39\},$$
$$x_{\mathrm{br}\,3} = \{46;\ 47;\ 48;\ 49;\ 50\},$$
$$x_{\mathrm{br}\,4} = \{55;\ 56;\ 57;\ 58;\ 59\},$$
$$x_{\mathrm{br}\,5} = \{82;\ 83;\ 84;\ 85;\ 86\}.$$
With such ranges of variables, $5^{5} = 3125$ different SLR model options are available. The standard deviation is calculated for each case. As a result, a six-dimensional array $\sigma(x_{\mathrm{br}\,1}, x_{\mathrm{br}\,2}, x_{\mathrm{br}\,3}, x_{\mathrm{br}\,4}, x_{\mathrm{br}\,5})$ is generated. To approximate the obtained data, OLS is used to fit a six-dimensional optimization paraboloid. For simplicity, we used a simplified paraboloid as follows:
$$\Sigma(x_{\mathrm{br}\,1}, x_{\mathrm{br}\,2}, x_{\mathrm{br}\,3}, x_{\mathrm{br}\,4}, x_{\mathrm{br}\,5}) = -55431 - 3.085\,x_{\mathrm{br}\,1} + 4.832\,x_{\mathrm{br}\,2} + 79.227\,x_{\mathrm{br}\,3} + 33.495\,x_{\mathrm{br}\,4} + 1251\,x_{\mathrm{br}\,5} + 0.0811\,x_{\mathrm{br}\,1}^{2} - 0.0651\,x_{\mathrm{br}\,2}^{2} - 0.8251\,x_{\mathrm{br}\,3}^{2} - 0.2937\,x_{\mathrm{br}\,4}^{2} - 7.445\,x_{\mathrm{br}\,5}^{2}.$$
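This simplified paraboloid has an optimum standard deviation at the coordinates
$$x_{\mathrm{br}\,1\,\mathrm{opt}}^{(\mathrm{sim})} = 19.01,\quad x_{\mathrm{br}\,2\,\mathrm{opt}}^{(\mathrm{sim})} = 37.086,\quad x_{\mathrm{br}\,3\,\mathrm{opt}}^{(\mathrm{sim})} = 48.008,$$
$$x_{\mathrm{br}\,4\,\mathrm{opt}}^{(\mathrm{sim})} = 57.019,\quad x_{\mathrm{br}\,5\,\mathrm{opt}}^{(\mathrm{sim})} = 84.005.$$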
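Because the simplified paraboloid has no cross terms, its stationary point can be computed coordinate by coordinate. The sketch below is a check under stated assumptions: the linear and quadratic coefficients are the rounded values of the paraboloid above, with signs as reconstructed from the printed equation, so agreement with the quoted coordinates is expected only to about two decimal places.

```python
import numpy as np

# Linear (beta) and quadratic (alpha) coefficients of the simplified paraboloid
# for x_br1 ... x_br5 (rounded values; signs are an assumption of this sketch).
beta = np.array([-3.085, 4.832, 79.227, 33.495, 1251.0])
alpha = np.array([0.0811, -0.0651, -0.8251, -0.2937, -7.445])

# With no cross terms, d(Sigma)/d(x_k) = beta_k + 2*alpha_k*x_k = 0 decouples.
x_opt = -beta / (2.0 * alpha)
print(x_opt.round(3))

n_models = 5 ** 5  # SLR fits on the 5 x 5 x 5 x 5 x 5 grid of candidate abscissas
print(n_models)
```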
After the calculation of the model's coefficients for the optimal case of breakpoint locations using OLS, the final SLR model is obtained:
$$\varphi_{1}(X, c_{m,1}, x_{\mathrm{br}\,q,1}) = 10.197 + 0.1356X - 0.3553(X - 19.01)\,h(X - 19.01) + 0.8002(X - 37.086)\,h(X - 37.086) - 1.1212(X - 48.008)\,h(X - 48.008) + 0.7777(X - 57.019)\,h(X - 57.019) - 0.4252(X - 84.005)\,h(X - 84.005).$$
The standard deviation for the obtained SLR model is equal to 3.799. Figure 8 shows the observed dataset and the final optimal SLR model.
The method of simple enumeration for the given dataset gives approximately the same result as that shown in Figure 8. However, this method approximately doubles the computing time. Polynomial regression using a seventh-order polynomial is faster to compute; however, it gives unacceptable predictive properties.
The results of the mathematical model building can be used for solving prediction problems. Consider this problem for the observed dataset based on generally known results that have been extensively described in the literature (for example, [76,77]) and innovative methods that may be used in accordance with the properties of segmented regression models.
To predict the future trend, let us determine the range of the SLR model change. For this purpose, we fitted straight lines by OLS to the upper and lower ordinates of the breakpoints. The lower line is fitted to the zero point and the second and fourth breakpoints. The upper line is fitted to the first, third, and fifth breakpoints. The calculated equations are
$$Y^{(\mathrm{lower})}(X) = 9.871 - 3.606 \cdot 10^{-3}X,\quad Y^{(\mathrm{upper})}(X) = 11.881 + 0.0592X.$$
The last segment of the SLR model is continued to the intersection point with the lower straight line. Figure 9 shows the visual representation of the trend prediction.
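The construction of the two bounding lines and the extension of the last segment can be sketched as follows. This is an illustration rather than the authors' implementation: it assumes that the last hinge term of the final SLR model is $(X - 84.005)\,h(X - 84.005)$, uses `numpy.polyfit` for the OLS line fits, and works with the rounded published coefficients.

```python
import numpy as np

breaks = np.array([19.01, 37.086, 48.008, 57.019, 84.005])
hinges = np.array([-0.3553, 0.8002, -1.1212, 0.7777, -0.4252])

def slr(x):
    """Final SLR model; X is the number of years since 1922."""
    x = np.asarray(x, dtype=float)
    return 10.197 + 0.1356 * x + np.sum(
        hinges * np.clip(x[..., None] - breaks, 0.0, None), axis=-1)

knots = np.concatenate(([0.0], breaks))  # zero point followed by the breakpoints
ordinates = slr(knots)

# Lower line: zero point, 2nd and 4th breakpoints; upper line: 1st, 3rd, 5th.
lo_slope, lo_icpt = np.polyfit(knots[[0, 2, 4]], ordinates[[0, 2, 4]], 1)
up_slope, up_icpt = np.polyfit(knots[[1, 3, 5]], ordinates[[1, 3, 5]], 1)

# Extend the last segment until it meets the lower line.
last_slope = 0.1356 + hinges.sum()
x_meet = (ordinates[-1] - last_slope * breaks[-1] - lo_icpt) / (lo_slope - last_slope)
print(lo_icpt, lo_slope, up_icpt, up_slope, 1922 + x_meet)
```

The fitted slopes and intercepts can be compared with the lower and upper line equations given above; the last printed value is the calendar year at which the extended segment reaches the lower line.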
This method of prediction and the obtained SLR model allow us to anticipate that, through 2042, the average annual number of earthquakes with a magnitude of 7 or higher would decrease.
In general, the proposed method can be applied to different datasets to determine breakpoints by means of multidimensional optimization.

7. Conclusions

This article presents a method of accuracy increment when segmented regression is used. The main problem for segmented regression model building is the estimation of the coordinates of the breakpoint between adjacent segments. To solve this problem, two types of multidimensional optimization paraboloids are used. The paraboloids contain information on the standard deviations between the model output and the observed data for different sets of possible values of breakpoint abscissas. The minimum standard deviation of each paraboloid coincides with the optimal position of the breakpoints.
A step-by-step procedure for the proposed method was described by examples based on statistical simulation and real data observation.
Generally, the use of SLR, SQR, and SLQR models provides a mathematical model that has high accuracy, describes the geometrical structure of the analyzed dataset more accurately, and has good predictive properties.
The results of this research can be used during mathematical model building for statistical data obtained in various branches of human activity.

Author Contributions

Conceptualization, J.A.-A. and M.Z.; methodology, J.A.-A. and V.K.; software, R.O.; validation, J.A.-A., A.M. and M.Z.; formal analysis, R.O.; investigation, J.A.-A.; resources, A.M.; data curation, M.Z.; writing—original draft preparation, J.A.-A. and M.Z.; writing—review and editing, J.A.-A. and M.Z.; visualization, R.O.; supervision, V.K.; project administration, J.A.-A.; funding acquisition, J.A.-A. and A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in the study were obtained in two ways: (1) using a random number generator during statistical simulation; (2) using open data from the United States Geological Survey, with the corresponding link cited.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

OLS: Ordinary least squares
SLQR: Segmented linear-quadratic regression
SLR: Segmented linear regression
SQR: Segmented quadratic regression

References

  1. Williams, H.P. Model Building in Mathematical Programming, 5th ed.; John Wiley & Sons: New York, NY, USA, 2013.
  2. Banwarth-Kuhn, M.; Sindi, S. How and why to build a mathematical model: A case study using prion aggregation. J. Biol. Chem. 2020, 295, 5022–5035.
  3. Neumaier, A. Mathematical model building. In Modeling Languages in Mathematical Optimization; Applied Optimization; Kallrath, J., Ed.; Springer: Boston, MA, USA, 2004; Volume 88.
  4. Bodner, K.; Fortin, M.-J.; Molnár, P.K. Making predictive modelling ART: Accurate, reliable, and transparent. Ecosphere 2020, 11, e03160.
  5. Ostroumov, I.V.; Kuzmenko, N.S. Accuracy estimation of alternative positioning in navigation. In Proceedings of the 4th IEEE International Conference on Methods and Systems of Navigation and Motion Control, Kiev, Ukraine, 18–20 October 2016; pp. 291–294.
  6. Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis, 5th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2012.
  7. Zaliskyi, M.; Petrova, Y.; Asanov, M.; Bekirov, E. Statistical data processing during wind generators operation. Int. J. Electr. Electron. Eng. Telecommun. 2019, 8, 33–38.
  8. Giloni, A.; Padberg, M. Alternative methods of linear regression. Math. Comput. Model. 2002, 35, 361–374.
  9. Kaufman, R. Heteroskedasticity in Regression: Detection and Correction; Sage Publications: Los Angeles, CA, USA, 2013.
  10. Hassan, M.; Hossny, M.; Nahavandi, S.; Creighton, D. Heteroskedasticity variance index. In Proceedings of the 14th International Conference on Computer Modelling and Simulation, Cambridge, UK, 28–30 March 2012; Volume 2012, pp. 135–141.
  11. Cheng, F. Maximum deviation of error density estimators in censored linear regression. Stat. Probab. Lett. 2012, 82, 1657–1664.
  12. Ostroumov, I.; Kuzmenko, N.S. Accuracy assessment of aircraft positioning by multiple radio navigational aids. Telecommun. Radio Eng. 2018, 77, 705–715.
  13. Weisberg, S. Applied Linear Regression; John Wiley and Sons: New York, NY, USA, 2005.
  14. Prokopenko, I.; Omelchuk, I.; Maloyed, M. Synthesis of signal detection algorithms under conditions of aprioristic uncertainty. In Proceedings of the 2020 IEEE Ukrainian Microwave Week, Kharkiv, Ukraine, 21–25 September 2020; pp. 418–423.
  15. Prokopenko, I. Robust methods and algorithms of signal processing. In Proceedings of the IEEE Microwaves, Radar and Remote Sensing Symposium, Kyiv, Ukraine, 29–31 August 2017; Volume 2017, pp. 71–74.
  16. Walker, W.; Harremoës, P.; Rotmans, J.; van der Sluijs, J.P.; Van Asselt, M.; Janssen, P.; Von Krauss, M.K. Defining uncertainty: A conceptual basis for uncertainty management in model-based decision support. Integr. Assess. 2003, 4, 5–17.
  17. van Oijen, M. Bayesian methods for quantifying and reducing uncertainty and error in forest models. Curr. For. Rep. 2017, 3, 269–280.
  18. Rawlings, J.O.; Pantula, S.G.; Dickey, D.A. Applied Regression Analysis: A Research Tool; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2001.
  19. Nachtsheim, C.; Neter, J.; Kutner, M.; Wasserman, W. Applied Linear Statistical Models, 5th ed.; McGraw-Hill: New York, NY, USA, 2005.
  20. Atkinson, A.; Riani, M. Robust Diagnostic Regression Analysis; Springer: New York, NY, USA, 2000.
  21. Sano, Y.; Kandori, A.; Miyoshi, T.; Tsuji, T.; Shima, K.; Yokoe, M.; Sakoda, S. Severity estimation of finger-tapping caused by Parkinson’s disease by using linear discriminant regression analysis. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Diego, CA, USA, 28 August–1 September 2012; Volume 2012, pp. 4315–4318.
  22. Jin, R.; Si, L.; Srivastava, S.; Li, Z.; Chan, C. A knowledge driven regression model for gene expression and microarray analysis. In Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society, New York, NY, USA, 30 August–3 September 2006; Volume 2006, pp. 5326–5329.
  23. Jinyu, T.; Xin, Z. Apply multiple linear regression model to predict the audit opinion. In Proceedings of the ISECS International Colloquium on Computing, Communication, Control, and Management, Sanya, China, 8–9 August 2009; pp. 303–306.
  24. Yan, X.; Du, X.; Jiang, Y. Research on the relationship between environmental pollution and economy. In Proceedings of the International Conference on Mechanic Automation and Control Engineering, Wuhan, China, 26–28 June 2010; Volume 2010, pp. 1809–1812.
  25. Ding, Y.; Yuechao, D. A new method multi-factor trend regression and its application to economy forecast in Jiangxi. First Int. Workshop Knowl. Discov. Data Min. 2008, 2008, 63–67.
  26. Yun-Ning, Z.; Qian, Y. Regression analysis of the real estate investment and economy increased. In Proceedings of the Third International Conference on Intelligent System Design and Engineering Applications, Hong Kong, China, 16–18 January 2013; Volume 2013, pp. 1099–1101.
  27. Goncharenko, A. A multi-optional hybrid functions entropy as a tool for transportation means repair optimal periodicity determination. Aviation 2018, 22, 60–66.
  28. Goncharenko, A. Aircraft operation depending upon the uncertainty of maintenance alternatives. Aviation 2017, 21, 126–131.
  29. Dyvak, M.; Darmorost, I.; Shevchuk, R.; Manzhula, V.; Kasatkina, N. Correlation analysis traffic intensity of the motor vehicles and the air pollution by their harmful emissions. In Proceedings of the 14th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET), Lviv-Slavske, Ukraine, 20–24 February 2018; pp. 855–858.
  30. Li, Z.; Zhang, J.; Zhang, X. A dynamic model for aircraft route optimizing in airport surface management. In Proceedings of the 9th International Conference on Electronic Measurement & Instruments, Beijing, China, 16–19 August 2009; pp. 3-1068–3-1072.
  31. Ostroumov, I.; Marais, K.; Kuzmenko, N. Aircraft positioning using multiple distance measurements and spline prediction. Aviation 2022, 26, 1–10.
  32. Kuzmenko, N.S.; Kharchenko, V.P.; Ostroumov, I.V. Identification of unmanned aerial vehicle flight situation. In Proceedings of the IEEE 5th International Conference on Actual problems of Unmanned Aerial Vehicles Development, Kiev, Ukraine, 17–19 October 2017; pp. 116–120.
  33. Bezkorovainyi, Y.N.; Sushchenko, O.A. Improvement of UAV positioning by information of inertial sensors. In Proceedings of the IEEE 5th International Conference on Methods and Systems of Navigation and Motion Control, Kyiv, Ukraine, 16–18 October 2018; pp. 123–126.
  34. Shmelova, T.; Sikirda, Y.; Kasatkin, M. Modeling of the collaborative decision making by remote pilot and air traffic controller in flight emergencies. In Proceedings of the 5th International Conference on Actual Problems of Unmanned Aerial Vehicles Developments, Kyiv, Ukraine, 22–24 October 2019; Volume 2019, pp. 230–233.
  35. Shmelova, T.; Sechko, O. Application artificial intelligence for real-time monitoring, diagnostics, and correction human state. CEUR Workshop Proc. 2019, 2488, 185–194.
  36. Prokopenko, I.G.; Migel, S.V.; Prokopenko, K.I. Signal modeling for the efficient target detection tasks. Int. Radar Symp. 2013, 2, 976–982.
  37. Averyanova, Y.; Averyanov, A.; Yanovsky, F. Polarization signal components estimate in weather radar. In Proceedings of the 12th International Conference on Mathematical Methods in Electromagnetic Theory, Odessa, Ukraine, 29 June–2 July 2008; pp. 360–362.
  38. Averyanova, Y.; Averyanov, A.; Yanovsky, F. The approach to estimating critical wind speed in liquid precipitation using radar polarimetry. In Proceedings of the International Conference on Mathematical Methods in Electromagnetic Theory, Kharkiv, Ukraine, 28–30 August 2012; pp. 517–520.
  39. Yanovsky, F.J.; Prokopenko, I.G.; Prokopenko, K.I.; Russchenberg, H.W.J.; Ligthart, L.P. Radar estimation of turbulence eddy dissipation rate in rain. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Toronto, ON, Canada, 24–28 June 2002; pp. 63–65.
  40. Ostroumov, I.; Kuzmenko, N. Configuration analysis of European navigational aids network. In Proceedings of the 2021 Integrated Communications Navigation and Surveillance Conference, Virtual, 20 April–22 April 2021; pp. 1–9.
  41. Ostroumov, I.V.; Kuzmenko, N.S.; Marais, K. Optimal pair of navigational aids selection. In Proceedings of the IEEE 5th International Conference on Methods and Systems of Navigation and Motion Control, Kyiv, Ukraine, 16–18 October 2018; pp. 32–35.
  42. Ostroumov, I.; Kuzmenko, N.S. Accuracy improvement of VOR/VOR navigation with angle extrapolation by linear regression. Telecommun. Radio Eng. 2019, 78, 1399–1412.
  43. Solomentsev, O.; Zaliskyi, M. Correlated failures analysis in navigation system. In Proceedings of the IEEE 5th International Conference on Methods and Systems of Navigation and Motion Control, Kyiv, Ukraine, 16–18 October 2018; pp. 41–44.
  44. Ostroumov, I.; Kuzmenko, N. Compatibility analysis of multi signal processing in APNT with current navigation infrastructure. Telecommun. Radio Eng. 2018, 77, 211–223.
  45. Dyvak, M.; Melnyk, A.; Kovbasistyi, A.; Shevchuk, R.; Huhul, O.; Tymchyshyn, V. Mathematical modeling of the estimation process of functioning efficiency level of information web-resources. In Proceedings of the 10th International Conference on Advanced Computer Information Technologies, Deggendorf, Germany, 13–15 May 2020; pp. 492–496.
  46. Zaliskyi, M.; Odarchenko, R.; Gnatyuk, S.; Petrova, Y.; Chaplits, A. Method of traffic monitoring for DDoS attacks detection in e-health systems and networks. CEUR Workshop Proc. 2018, 2255, 193–204.
  47. Gnatyuk, S. Critical aviation information systems cybersecurity, meeting security challenges through data analytics and decision support. In NATO Science for Peace and Security Series, D: Information and Communication Security; IOS Press Ebooks: Amsterdam, The Netherlands, 2016; Volume 47, pp. 308–316.
  48. Hu, Z.; Odarchenko, R.; Gnatyuk, S.; Zaliskyi, M.; Chaplits, A.; Bondar, S.; Borovik, V. Statistical techniques for detecting cyberattacks on computer networks based on an analysis of abnormal traffic behavior. Int. J. Comput. Netw. Inf. Secur. 2021, 12, 1–13.
  49. Kalimoldayev, M.; Tynymbayev, S.; Gnatyuk, S.; Ibraimov, M.; Magzom, M. The device for multiplying polynomials modulo an irreducible polynomial. News of the National Academy of Sciences of the Republic of Kazakhstan, Ser. Geol. Tech. Sci. 2019, 2, 199–205.
  50. Gnatyuk, S.; Akhmetov, B.; Kozlovskyi, V.; Kinzeryavyy, V.; Aleksander, M.; Prysiazhnyi, D. New secure block cipher for critical applications: Design, implementation, speed and security analysis. Adv. Intell. Syst. Comput. 2020, 1126, 93–104.
  51. Volianskyi, R.; Sadovoi, O.; Volianska, N.; Sinkevych, O. Construction of parallel piecewise-linear interval models for nonlinear dynamical objects. In Proceedings of the 2019 9th International Conference on Advanced Computer Information Technologies (ACIT), Ceske Budejovice, Czech Republic, 5–7 June 2019; pp. 97–100.
  52. Chikovani, V.V.; Suschenko, O.A. Differential mode of operation for Coriolis vibratory gyro with ring-like resonator. In Proceedings of the IEEE 34th International Scientific Conference on Electronics and Nanotechnology, Kyiv, Ukraine, 15–18 April 2014; pp. 451–455.
  53. Solomentsev, O.; Zaliskyi, M. Method of sequential estimation of statistical distribution parameters in control systems design. In Proceedings of the IEEE 3rd International Conference Methods and Systems of Navigation and Motion Control, Kyiv, Ukraine, 14–17 October 2014; pp. 135–138.
  54. Sushchenko, O.; Bezkorovayniy, Y.; Golytsin, V. Processing of redundant information in airborne electronic systems by means of neural networks. In Proceedings of the IEEE 39th International Conference on Electronics and Nanotechnology, Kyiv, Ukraine, 16–18 April 2019; pp. 652–655.
  55. Chikovani, V.; Sushchenko, O.; Tsiruk, H. Redundant information processing techniques comparison for differential vibratory gyroscope. East. Eur. J. Enterp. Technol. 2016, 4, 45–52.
  56. Solomentsev, O.V.; Zaliskyi, M.U.; Zuiev, O.V.; Asanov, M.M. Data processing in exploitation system of unmanned aerial vehicles radioelectronic equipment. In Proceedings of the IEEE 2nd International Conference on Actual Problems of Unmanned Air Vehicles Developments, Kyiv, Ukraine, 15–17 October 2013; pp. 77–80.
  57. Goncharenko, A.V. Optimal UAV maintenance periodicity obtained on the multi-optional basis. In Proceedings of the IEEE 4th International Conference on Actual Problems of UAV Developments, Kyiv, Ukraine, 17–19 October 2017; pp. 65–68.
  58. Goncharenko, A. Development of a theoretical approach to the conditional optimization of aircraft maintenance preference uncertainty. Aviation 2018, 22, 40–44.
  59. Solomentsev, O.; Zaliskyi, M.; Zuiev, O. Radioelectronic equipment availability factor models. In Proceedings of the 2013 Signal Processing Symposium (SPS), Serock, Poland, 5–7 June 2013; pp. 1–3.
  60. Dhillon, B.S. Reliability, Quality, and Safety for Engineers; CRC Press: Boca Raton, FL, USA, 2005; p. 216.
  61. Nakagawa, T. Maintenance Theory of Reliability; Springer: London, UK, 2005; p. 270.
  62. Ulansky, V.; Terentyeva, I. Availability assessment of a telecommunications system with permanent and intermittent faults. In Proceedings of the IEEE First Ukraine Conference on Electrical and Computer Engineering, Kyiv, Ukraine, 29 May–2 June 2017; pp. 908–911.
  63. Taranenko, A.G.; Gabrousenko, Y.I.; Holubnychyi, A.G.; Slipukhina, I.A. Estimation of redundant radionavigation system reliability. In Proceedings of the IEEE 5th International Conference on Methods and Systems of Navigation and Motion Control, Kiev, Ukraine, 16–18 October 2018; pp. 28–31.
  64. Solomentsev, O.; Zaliskyi, M.; Nemyrovets, Y.; Asanov, M. Signal processing in case of radio equipment technical state deterioration. In Proceedings of the 2015 Signal Processing Symposium (SPSympo), Debe Village, Poland, 10–12 June 2015; pp. 1–5.
  65. Seber, G.A.F.; Wild, C.J. Nonlinear Regression; John Wiley and Sons: New York, NY, USA, 2003; p. 768.
  66. Bates, D.M.; Watts, D.G. Nonlinear Regression Analysis and Its Applications; John Wiley and Sons: New York, NY, USA, 1988; p. 366.
  67. Knuepling, F.; Allen, J. Testing between different types of switching regression models. J. Econ. Econom. Econ. Econom. Soc. 2015, 58, 30–63.
  68. Huet, S.; Bouvier, A.; Poursat, M.-A.; Jolivet, E. Statistical tools for nonlinear regression. In A Practical Guide with S-PLUS and R Examples; Springer: New York, NY, USA, 2004; p. 232.
  69. Tishler, A.; Zang, I. A new maximum likelihood algorithm for piecewise regression. J. Am. Stat. Assoc. 1981, 76, 980–987.
  70. Buteikis, A. Practical Econometrics and Data Science; Vilnius University: Vilnius, Lithuania, 2020.
  71. Carlin, B.P.; Gelfand, A.E.; Smith, A.F.M. Hierarchical Bayesian analysis of changepoint problems. J. R. Stat. Soc. Ser. C Applied Stat. 1992, 41, 389–405.
  72. Ferreira, P.E. A Bayesian analysis of a switching regression model: Known number of regimes. J. Am. Stat. Assoc. 1975, 70, 370–374.
  73. Toms, J.D.; Lesperance, M.L. Piecewise regression: A tool for identifying ecological thresholds. Ecology 2003, 84, 2034–2041.
  74. Solomentsev, O.; Zaliskyi, M.; Shcherbyna, O.; Kozhokhina, O. Sequential procedure of changepoint analysis during operational data processing. In Proceedings of the 2020 IEEE Microwave Theory and Techniques in Wireless Communications (MTTW), Riga, Latvia, 1–2 October 2020; Volume 1, pp. 168–171.
  75. Solomentsev, O.; Zaliskyi, M.; Herasymenko, T.; Petrova, Y. Data processing method for deterioration detection during radio equipment operation. In Proceedings of the 2019 IEEE Microwave Theory and Techniques in Wireless Communications (MTTW), Riga, Latvia, 1–2 October 2019; Volume 1, pp. 1–4.
  76. Nocedal, J.; Wright, S.J. Numerical Optimization, 2nd ed.; Springer: New York, NY, USA, 2006.
  77. Reklaitis, G.V.; Ravindran, A.; Ragsdell, K.M. Engineering Optimization, Methods & Applications; John Wiley and Sons: New York, NY, USA, 1983; p. 688.
  78. The United States Geological Survey. Available online: https://earthquake.usgs.gov (accessed on 1 July 2022).
Figure 1. Dataset and final optimal SLR model.
Figure 2. Examples of generated additive mixture of SLR model and Gaussian noise.
Figure 3. General paraboloid (4) for presented example.
Figure 4. Simplified paraboloid (5) for presented example.
Figure 5. Generated dataset and final optimal SLR models.
Figure 6. Histograms for breakpoint abscissas for different optimization options: (A)—estimates of the first breakpoint for general paraboloid; (B)—estimates of the second breakpoint for general paraboloid; (C)—estimates of the first breakpoint for simplified paraboloid; (D)—estimates of the second breakpoint for simplified paraboloid.
Figure 7. Data on quantity of earthquakes of magnitude 7 or higher by year.
Figure 8. Observed dataset and final optimal SLR model.
Figure 9. Prediction-based SLR model.
Table 1. Relationship between production lot size x and the average production cost per unit y.

| i | X | Y | i | X | Y | i | X | Y |
|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 9.73 | 5 | 180 | 5.87 | 9 | 260 | 4.02 |
| 2 | 120 | 9.61 | 6 | 200 | 4.98 | 10 | 280 | 4.46 |
| 3 | 140 | 8.15 | 7 | 220 | 5.09 | 11 | 300 | 3.82 |
| 4 | 160 | 6.98 | 8 | 240 | 4.79 | | | |
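Table 2. Example of obtained dataset.

| X | Y | X | Y | X | Y | X | Y |
|---|---|---|---|---|---|---|---|
| 1 | 198.903 | 31 | 130.969 | 61 | 225.826 | 91 | 196.18 |
| 2 | 194.804 | 32 | 173.305 | 62 | 235.963 | 92 | 199.174 |
| 3 | 230.13 | 33 | 177.699 | 63 | 225.268 | 93 | 186.254 |
| 4 | 241.929 | 34 | 135.858 | 64 | 235.382 | 94 | 159.026 |
| 5 | 205.046 | 35 | 146.671 | 65 | 240.457 | 95 | 180.743 |
| 6 | 207.058 | 36 | 166.772 | 66 | 226.665 | 96 | 186.322 |
| 7 | 221.116 | 37 | 191.347 | 67 | 264.917 | 97 | 198.164 |
| 8 | 196.142 | 38 | 151.133 | 68 | 208.282 | 98 | 155.179 |
| 9 | 185.149 | 39 | 180.641 | 69 | 238.465 | 99 | 172.515 |
| 10 | 168.836 | 40 | 176.754 | 70 | 247.729 | 100 | 177.046 |
| 11 | 140.462 | 41 | 224.396 | 71 | 220.332 | 101 | 148.359 |
| 12 | 164.657 | 42 | 141.499 | 72 | 256.255 | 102 | 212.448 |
| 13 | 181.903 | 43 | 196.572 | 73 | 250.635 | 103 | 170.645 |
| 14 | 189.763 | 44 | 179.661 | 74 | 244.736 | 104 | 171.808 |
| 15 | 193.573 | 45 | 166.336 | 75 | 216.803 | 105 | 139.09 |
| 16 | 153.473 | 46 | 175.007 | 76 | 225.626 | 106 | 162.471 |
| 17 | 173.226 | 47 | 192.467 | 77 | 241.812 | 107 | 178.876 |
| 18 | 173.601 | 48 | 162.748 | 78 | 248.445 | 108 | 147.776 |
| 19 | 164.416 | 49 | 175.224 | 79 | 228.271 | 109 | 168.604 |
| 20 | 144.779 | 50 | 208.317 | 80 | 193.772 | 110 | 177.17 |
| 21 | 165.477 | 51 | 179.932 | 81 | 215.457 | 111 | 151.077 |
| 22 | 156.022 | 52 | 202.743 | 82 | 209.727 | 112 | 153.421 |
| 23 | 191.786 | 53 | 183.889 | 83 | 223.962 | 113 | 125.35 |
| 24 | 124.953 | 54 | 182.191 | 84 | 202.548 | 114 | 135.484 |
| 25 | 144.006 | 55 | 204.996 | 85 | 206.732 | 115 | 152.601 |
| 26 | 181.289 | 56 | 212.034 | 86 | 238.368 | 116 | 111.133 |
| 27 | 131.828 | 57 | 192.96 | 87 | 214.105 | 117 | 131.803 |
| 28 | 148.114 | 58 | 240.106 | 88 | 204.29 | 118 | 142.927 |
| 29 | 189.118 | 59 | 230.511 | 89 | 185.714 | 119 | 151.265 |
| 30 | 159.11 | 60 | 188.666 | 90 | 184.075 | 120 | 145.401 |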
Table 3. Computation results for standard deviation. Rows give the abscissa of the first breakpoint (xbr1); columns give the abscissa of the second breakpoint (xbr2).
| Standard deviation | xbr2 = 60 | xbr2 = 65 | xbr2 = 70 | xbr2 = 75 | xbr2 = 80 |
|---|---|---|---|---|---|
| xbr1 = 15 | 22.947 | 21.042 | 19.928 | 19.911 | 20.948 |
| xbr1 = 20 | 21.692 | 19.813 | 18.879 | 19.166 | 20.534 |
| xbr1 = 25 | 20.912 | 19.145 | 18.454 | 19.041 | 20.667 |
| xbr1 = 30 | 20.637 | 19.053 | 18.622 | 19.454 | 21.229 |
| xbr1 = 35 | 20.818 | 19.443 | 19.249 | 20.239 | 22.051 |
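The grid of standard deviations in Table 3 can be approximated by a paraboloid whose vertex gives refined estimates of xbr1 and xbr2. The sketch below fits such a surface by ordinary least squares and solves for the point where its gradient vanishes. Several things here are assumptions rather than the article's stated formulas: the general form a0 + a1·u + a2·v + a11·u² + a22·v² + a12·u·v, the reading of the "simplified paraboloid" as the same surface without the cross term u·v, the centering of coordinates on the grid means, and the helper names `solve_linear` and `paraboloid_minimum`.

```python
# Standard deviations from Table 3: rows follow X_BR1, columns follow X_BR2.
X_BR1 = [15, 20, 25, 30, 35]
X_BR2 = [60, 65, 70, 75, 80]
SD = [
    [22.947, 21.042, 19.928, 19.911, 20.948],
    [21.692, 19.813, 18.879, 19.166, 20.534],
    [20.912, 19.145, 18.454, 19.041, 20.667],
    [20.637, 19.053, 18.622, 19.454, 21.229],
    [20.818, 19.443, 19.249, 20.239, 22.051],
]


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def paraboloid_minimum(x1_grid, x2_grid, sd, cross_term=True):
    """Fit a quadratic surface to sd[i][j] and return its stationary point.

    cross_term=False drops the u*v term (assumed meaning of "simplified").
    """
    # Center the coordinates to keep the normal equations well conditioned.
    c1 = sum(x1_grid) / len(x1_grid)
    c2 = sum(x2_grid) / len(x2_grid)
    rows, z = [], []
    for i, x1 in enumerate(x1_grid):
        for j, x2 in enumerate(x2_grid):
            u, v = x1 - c1, x2 - c2
            terms = [1.0, u, v, u * u, v * v]
            if cross_term:
                terms.append(u * v)
            rows.append(terms)
            z.append(sd[i][j])
    k = len(rows[0])
    # Least squares via the normal equations (A^T A) a = A^T z.
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    atz = [sum(r[p] * zi for r, zi in zip(rows, z)) for p in range(k)]
    coef = solve_linear(ata, atz)
    a1, a2, a11, a22 = coef[1:5]
    a12 = coef[5] if cross_term else 0.0
    # Zero gradient: 2*a11*u + a12*v = -a1 and a12*u + 2*a22*v = -a2.
    u, v = solve_linear([[2 * a11, a12], [a12, 2 * a22]], [-a1, -a2])
    return c1 + u, c2 + v


general = paraboloid_minimum(X_BR1, X_BR2, SD, cross_term=True)
simplified = paraboloid_minimum(X_BR1, X_BR2, SD, cross_term=False)
print("general paraboloid:    xbr1 = %.2f, xbr2 = %.2f" % general)
print("simplified paraboloid: xbr1 = %.2f, xbr2 = %.2f" % simplified)
```

For this grid, both variants place the vertex close to the cell with the smallest tabulated value (xbr1 = 25, xbr2 = 70). The stationary point is a minimum only when the fitted quadratic form is positive definite, which a production implementation would need to check.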
Table 4. Numerical characteristics of breakpoint abscissas estimates.
| Statistical Characteristic | General Paraboloid | Simplified Paraboloid |
|---|---|---|
| Mathematical expectation of xbr1 | 25.746 | 25.753 |
| Standard deviation for xbr1 | 2.218 | 2.13 |
| Minimum of xbr1 | 17.158 | 17.07 |
| Maximum of xbr1 | 36.143 | 36.306 |
| Skewness for xbr1 | -0.055 | 0.054 |
| Mathematical expectation of xbr2 | 70.113 | 70.35 |
| Standard deviation for xbr2 | 2.397 | 2.274 |
| Minimum of xbr2 | 62.138 | 63.486 |
| Maximum of xbr2 | 84.242 | 83.238 |
| Skewness for xbr2 | 0.583 | 0.536 |
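Table 5. Quantity of earthquakes of magnitude 7 or higher by year (X is the year, Y is the quantity of earthquakes).
| i | X | Y | i | X | Y | i | X | Y | i | X | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1922 | 6 | 26 | 1947 | 13 | 51 | 1972 | 16 | 76 | 1997 | 16 |
| 2 | 1923 | 16 | 27 | 1948 | 12 | 52 | 1973 | 9 | 77 | 1998 | 12 |
| 3 | 1924 | 8 | 28 | 1949 | 13 | 53 | 1974 | 11 | 78 | 1999 | 18 |
| 4 | 1925 | 10 | 29 | 1950 | 8 | 54 | 1975 | 13 | 79 | 2000 | 15 |
| 5 | 1926 | 9 | 30 | 1951 | 6 | 55 | 1976 | 14 | 80 | 2001 | 16 |
| 6 | 1927 | 12 | 31 | 1952 | 9 | 56 | 1977 | 11 | 81 | 2002 | 13 |
| 7 | 1928 | 18 | 32 | 1953 | 6 | 57 | 1978 | 12 | 82 | 2003 | 15 |
| 8 | 1929 | 14 | 33 | 1954 | 9 | 58 | 1979 | 8 | 83 | 2004 | 16 |
| 9 | 1930 | 4 | 34 | 1955 | 5 | 59 | 1980 | 6 | 84 | 2005 | 11 |
| 10 | 1931 | 17 | 35 | 1956 | 19 | 60 | 1981 | 10 | 85 | 2006 | 11 |
| 11 | 1932 | 7 | 36 | 1957 | 7 | 61 | 1982 | 8 | 86 | 2007 | 18 |
| 12 | 1933 | 8 | 37 | 1958 | 6 | 62 | 1983 | 14 | 87 | 2008 | 12 |
| 13 | 1934 | 12 | 38 | 1959 | 13 | 63 | 1984 | 14 | 88 | 2009 | 17 |
| 14 | 1935 | 13 | 39 | 1960 | 11 | 64 | 1985 | 15 | 89 | 2010 | 24 |
| 15 | 1936 | 9 | 40 | 1961 | 9 | 65 | 1986 | 11 | 90 | 2011 | 20 |
| 16 | 1937 | 9 | 41 | 1962 | 17 | 66 | 1987 | 13 | 91 | 2012 | 16 |
| 17 | 1938 | 23 | 42 | 1963 | 9 | 67 | 1988 | 11 | 92 | 2013 | 19 |
| 18 | 1939 | 14 | 43 | 1964 | 15 | 68 | 1989 | 8 | 93 | 2014 | 12 |
| 19 | 1940 | 8 | 44 | 1965 | 8 | 69 | 1990 | 18 | 94 | 2015 | 19 |
| 20 | 1941 | 11 | 45 | 1966 | 10 | 70 | 1991 | 17 | 95 | 2016 | 16 |
| 21 | 1942 | 13 | 46 | 1967 | 20 | 71 | 1992 | 13 | 96 | 2017 | 7 |
| 22 | 1943 | 17 | 47 | 1968 | 14 | 72 | 1993 | 12 | 97 | 2018 | 17 |
| 23 | 1944 | 12 | 48 | 1969 | 15 | 73 | 1994 | 13 | 98 | 2019 | 10 |
| 24 | 1945 | 7 | 49 | 1970 | 15 | 74 | 1995 | 20 | 99 | 2020 | 9 |
| 25 | 1946 | 12 | 50 | 1971 | 13 | 75 | 1996 | 15 | 100 | 2021 | 19 |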
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Al-Azzeh, J.; Mesleh, A.; Zaliskyi, M.; Odarchenko, R.; Kuzmin, V. A Method of Accuracy Increment Using Segmented Regression. Algorithms 2022, 15, 378. https://doi.org/10.3390/a15100378
