# A New Robust Regression Method Based on Minimization of Geodesic Distances on a Probabilistic Manifold: Application to Power Laws^{†}


## Abstract


## 1. Introduction

_{1}, x_{2}, …, x_{P}, on which it depends. Scaling laws are often expressed in terms of a power law:

_{i} by a constant a, the power law in Equation (1) essentially remains the same, being multiplied only by a constant ${a}^{{\mathrm{\beta}}_{i}}$.
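The rescaling property can be checked in one line. The block below is a sketch that assumes the generic multiplicative form for Equation (1); the response symbol $y$ and the prefactor $\beta_0$ are assumptions made here for illustration, since the equation itself is not reproduced in this text.

```latex
% Assumed generic power law in the predictors x_1, ..., x_P
y = \beta_0 \, x_1^{\beta_1} x_2^{\beta_2} \cdots x_P^{\beta_P}

% Rescaling a single predictor, x_i -> a x_i, changes only the prefactor:
\beta_0 \, x_1^{\beta_1} \cdots (a\,x_i)^{\beta_i} \cdots x_P^{\beta_P}
  = a^{\beta_i} \, \beta_0 \, x_1^{\beta_1} \cdots x_i^{\beta_i} \cdots x_P^{\beta_P}
  = a^{\beta_i} \, y
```

The exponents are therefore unaffected by a change of units in any predictor; only the multiplicative constant absorbs the factor $a^{\beta_i}$.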

## 2. Geodesic Least Squares Regression

#### 2.1. Distance in Information Geometry

_{m}}|{θ^{k}}) [9] describing a set {x_{m}} of M variables (m = 1, …, M), parameterized by a set {θ^{k}} of P parameters (k = 1, …, P), the entries g_{ij} of the Fisher information matrix are given by (no summation):

^{i} have been parameterized along the geodesic by t and Γ^{r}_{ij} are the Christoffel symbols of the second kind, defined through the metric as:

^{ij} denotes the components of the inverse metric. The boundary value problem Equation (2) needs to be solved assuming the known values of the coordinates at the boundary points of the geodesic.

_{g} of the geodesic curve between two distributions with parameter sets $\left\{{\mathrm{\theta}}_{1}^{i}\right\}$ and $\left\{{\mathrm{\theta}}_{2}^{i}\right\}$, i.e., the geodesic distance between these distributions, may be locally calculated as follows (assuming t runs from zero to one):
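For orientation, the quantities named in this subsection have the following standard textbook forms. This is a sketch, not a reproduction of the numbered equations of the paper: the symbol $\mathrm{GD}$ for the geodesic distance and the dummy index $k$ are assumptions made here, and the Einstein summation convention is used in the last three lines.

```latex
% Fisher information metric of the model p(x | theta)
g_{ij}(\theta) = -\,\mathbb{E}\!\left[
  \frac{\partial^{2} \ln p(x \,|\, \theta)}{\partial\theta^{i}\,\partial\theta^{j}}
\right]

% Geodesic equation for the coordinates theta^r(t)
\frac{\mathrm{d}^{2}\theta^{r}}{\mathrm{d}t^{2}}
  + \Gamma^{r}_{ij}\,
    \frac{\mathrm{d}\theta^{i}}{\mathrm{d}t}\,
    \frac{\mathrm{d}\theta^{j}}{\mathrm{d}t} = 0

% Christoffel symbols of the second kind, built from the metric and its inverse
\Gamma^{r}_{ij} = \tfrac{1}{2}\, g^{rk}
  \left( \partial_{i} g_{kj} + \partial_{j} g_{ik} - \partial_{k} g_{ij} \right)

% Length of the geodesic between theta_1 (t = 0) and theta_2 (t = 1)
\mathrm{GD} = \int_{0}^{1}
  \sqrt{ g_{ij}\,\frac{\mathrm{d}\theta^{i}}{\mathrm{d}t}\,
                 \frac{\mathrm{d}\theta^{j}}{\mathrm{d}t} }\;\mathrm{d}t
```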

#### 2.2. Geodesics for the Univariate Normal Distribution

#### 2.3. Geodesic Least Squares Methodology

_{n} (n = 1, …, N) are taken as mutually independent and so are the y_{n}. σ_{x} and σ_{y} are assumed to be known, and in this example, they are taken constant for all measurements, i.e., we have homoscedasticity. However, we will also consider heteroscedasticity later on. According to the regression model, conditionally on x_{n}, each measurement y_{n} is drawn from a normal distribution:
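A plausible form of this conditional distribution, pieced together from the mean and variance quoted later in this section, is sketched below. It assumes a linear model through the origin with slope $\beta$; the label $\sigma_{\text{mod}}$ follows the notation used further on.

```latex
% Modeled conditional distribution for a linear model y = beta * x
% with Gaussian noise on both x (sigma_x) and y (sigma_y)
y_n \mid x_n \;\sim\; \mathcal{N}\!\left( \beta x_n,\; \sigma_{\text{mod}}^{2} \right),
\qquad
\sigma_{\text{mod}}^{2} = \sigma_y^{2} + \beta^{2} \sigma_x^{2}
```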

_{n}. Nevertheless, the Gaussian error propagation laws may be used in the nonlinear case as well, to approximate the conditional distribution p(y|x_{n}) by a normal distribution, as will be shown in the experiments.

_{obs}(y|y_{n}). In this example, we take again the normal distribution, but centered on each data point: $\mathcal{N}({y}_{n},{\mathrm{\sigma}}_{\text{obs}}^{2})$, where σ_{obs} is to be estimated from the data. In the context of the GLM, this is known as the saturated model. The extra parameter σ_{obs} gives the method added flexibility, since it is not a priori required to equal σ_{mod}. As a result, GLS is less sensitive to incorrect model assumptions. Note that in this example, we have chosen the observed distribution from the same model (Gaussian) as the modeled distribution. Furthermore, σ_{mod} is taken as a fixed value for all measurements and so is σ_{obs}. These assumptions can of course be relaxed, leading to a more general method. However, the transition from OLS to GLS is best explained by means of a Gaussian observed distribution, which, in addition, offers computational advantages, since the expression for the GD has a closed form; see Equation (5).
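The idea described above can be sketched in a few lines of Python. The block assumes the well-known closed form of the Rao geodesic distance between two univariate normal distributions (taken here to be what the text calls Equation (5)), a zero-intercept model y = βx, and a single σ_{obs} shared by all points. The function names and the zooming grid search are illustrative stand-ins, not the optimizer used in the paper.

```python
import math


def rao_distance(mu1, sigma1, mu2, sigma2):
    """Closed-form Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    num = (mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2
    den = (mu1 - mu2) ** 2 + 2.0 * (sigma1 + sigma2) ** 2
    return 2.0 * math.sqrt(2.0) * math.atanh(math.sqrt(num / den))


def gls_objective(beta, sigma_obs, x, y, sigma_x, sigma_y):
    """Sum of squared geodesic distances between modeled and observed distributions."""
    # Modeled standard deviation from Gaussian error propagation.
    sigma_mod = math.sqrt(sigma_y ** 2 + beta ** 2 * sigma_x ** 2)
    return sum(
        rao_distance(beta * xn, sigma_mod, yn, sigma_obs) ** 2
        for xn, yn in zip(x, y)
    )


def gls_fit(x, y, sigma_x, sigma_y,
            beta_range=(0.0, 6.0), sigma_range=(0.5, 10.0),
            n_grid=41, n_zoom=6):
    """Hypothetical helper: minimize the objective by a repeatedly refined grid search."""
    b_lo, b_hi = beta_range
    s_lo, s_hi = sigma_range
    best = None
    for _ in range(n_zoom):
        betas = [b_lo + (b_hi - b_lo) * i / (n_grid - 1) for i in range(n_grid)]
        sigmas = [s_lo + (s_hi - s_lo) * j / (n_grid - 1) for j in range(n_grid)]
        best = min(
            (gls_objective(b, s, x, y, sigma_x, sigma_y), b, s)
            for b in betas for s in sigmas
        )
        _, b_best, s_best = best
        # Halve the search window around the current best grid point.
        b_half, s_half = (b_hi - b_lo) / 4.0, (s_hi - s_lo) / 4.0
        b_lo, b_hi = b_best - b_half, b_best + b_half
        s_lo, s_hi = max(s_best - s_half, 1e-6), s_best + s_half
    return best[1], best[2]


if __name__ == "__main__":
    x = [1.0, 4.0, 9.0, 15.0, 22.0, 28.0, 35.0, 41.0, 46.0, 50.0]
    y = [3.0 * v for v in x]
    y[-1] *= 2.0  # turn the last point into an outlier
    beta_hat, sigma_obs_hat = gls_fit(x, y, sigma_x=0.5, sigma_y=2.0)
    print(beta_hat, sigma_obs_hat)
```

Because σ_{obs} is a free parameter, the fit can trade a larger observed spread against a smaller pull on the slope, which is the flexibility discussed above. A gradient-based or simplex optimizer would normally replace the grid search; it is used here only to keep the sketch dependency-free.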

_{n} with βx_{n}, but also ${\mathrm{\sigma}}_{\text{obs}}^{2}$ with ${\mathrm{\sigma}}_{y}^{2}+{\mathrm{\beta}}^{2}{\mathrm{\sigma}}_{x}^{2}$. Note that the parameter β occurs both in the mean and the variance of the modeled distribution. Incidentally, forcing ${\mathrm{\sigma}}_{\text{obs}}^{2}\equiv {\mathrm{\sigma}}_{y}^{2}+{\mathrm{\beta}}^{2}{\mathrm{\sigma}}_{x}^{2}$ would take us back to standard maximum likelihood estimation, for the Rao GD between the two Gaussians p_{obs} and p_{mod} with means y_{n} and βx_{n}, respectively, but with identical standard deviations (fixed along the geodesic path), is precisely the Mahalanobis distance [20]:
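In the present one-dimensional setting this distance takes a simple form. The sketch below assumes that the standard deviation is held fixed along the path, so that only the mean varies and the Fisher metric reduces to $\mathrm{d}s^{2} = \mathrm{d}\mu^{2}/\sigma^{2}$; the symbol $\mathrm{GD}$ is used here as shorthand for the geodesic distance.

```latex
% Geodesic distance along a line of constant standard deviation sigma
\mathrm{GD}\!\left( p_{\text{obs}} \,\|\, p_{\text{mod}} \right)
  = \int_{\beta x_n}^{y_n} \frac{\mathrm{d}\mu}{\sigma}
  = \frac{\lvert y_n - \beta x_n \rvert}{\sigma},
\qquad
\sigma^{2} = \sigma_y^{2} + \beta^{2} \sigma_x^{2}
```

Squaring and summing over $n$ gives the familiar sum of squared standardized residuals.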

## 3. The L-H Power Threshold and Database

_{thr} for the heating power that is required for the plasma to make the transition into a desired regime of high energy confinement (H-mode) in the next-step fusion device ITER (International Thermonuclear Experimental Reactor) [1,23,24]. To a good approximation, this so-called L-H (or H-mode) power threshold depends on the electron density in the plasma ${\overline{n}}_{\mathrm{e}}$ (in 10^{20} m^{−3}), the main magnetic field B_{t} (in tesla (T)) and the total surface area S of the confined plasma (in m^{2}). This is usually expressed by means of the following scaling relation:

_{mod}. Unfortunately, the error estimates are not available in some cases, and if they are, the precise definition of the error bars is not always clear. Usually, an error bar in the database represents an estimate by the experimentalist of the typical range in which the “true” quantity can be expected to lie, where the uncertainty is assumed to be caused by both stochastic and systematic effects. Moreover, it is difficult to assess the probability that is covered by the stochastic component of the errors mentioned in the database. Since a detailed investigation of the uncertainty of the threshold data is beyond the scope of the present paper, we will assume that the error bars pertain to a stochastic uncertainty corresponding to a single standard deviation of a Gaussian distribution. For some derived quantities, the error bars had to be calculated from the uncertainty on more fundamental measurements. In those cases, we employed Gaussian error propagation rules to estimate the standard deviation on the derived quantities. For the case of the global H-mode confinement database, this strategy has been shown to provide reasonable information on the actual error bars [29].

_{t}, S and P_{thr}, we obtained 39%, 31%, 28% and 38%, respectively. These levels are clearly much larger than the relative uncertainties due to measurement error alone. Indeed, the typical measurement error bars quoted in the ITPA database, on average, over all devices, are estimated at 4% for ${\overline{n}}_{\mathrm{e}}$, 1% for B_{t}, 3% for S and 15% for P_{thr} [25,26].

## 4. Numerical Simulations

#### 4.1. Effect of Outliers

#### 4.1.1. Single Predictor Variable

_{n} and η_{n} (n = 1, …, 10), with the ξ_{n} chosen unevenly between zero and 50 and η_{n} = βξ_{n}, taking β = 3. Then, Gaussian noise was added to all coordinates according to Equation (6), with σ_{y} = 2.0 and σ_{x} = 0.5, resulting in values x_{n} for the predictor variable and y_{n} for the response variable. Finally, one outlier was created by multiplying the value of y_{k} by a factor distributed uniformly between 1.5 and 2.5, with k chosen uniformly among the indices 8, 9 and 10.
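The data-generating recipe just described can be sketched as follows. The function name, the seed handling and the choice to draw the "uneven" abscissae uniformly at random are assumptions made for illustration; the noise levels, the slope and the outlier rule are those quoted in the text.

```python
import random


def make_outlier_dataset(beta=3.0, sigma_x=0.5, sigma_y=2.0, seed=0):
    """Hypothetical generator for one synthetic dataset of 10 points with a single outlier."""
    rng = random.Random(seed)
    n = 10
    # Assumption: unevenly spaced true abscissae are uniform draws on [0, 50], sorted.
    xi = sorted(rng.uniform(0.0, 50.0) for _ in range(n))
    eta = [beta * v for v in xi]
    # Gaussian noise on both coordinates.
    x = [v + rng.gauss(0.0, sigma_x) for v in xi]
    y = [v + rng.gauss(0.0, sigma_y) for v in eta]
    # One outlier among points 8, 9 and 10 (zero-based indices 7, 8, 9).
    k = rng.choice([7, 8, 9])
    factor = rng.uniform(1.5, 2.5)
    y[k] *= factor
    return x, y, k, factor


if __name__ == "__main__":
    x, y, k, factor = make_outlier_dataset(seed=1)
    print("outlier at point", k + 1, "scaled by", round(factor, 2))
```

Calling the generator with 100 different seeds gives a set of replications comparable in spirit to the Monte Carlo study reported here.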

_{x} and σ_{y}. In order to get an idea of the variability of the estimates, Monte Carlo sampling of the data-generating distributions was performed, and the estimation was carried out 100 times.

_{obs} was 5.43 with a standard deviation of 0.24. On the other hand, the modeled value of the standard deviation in the conditional distribution for y was ${\mathrm{\sigma}}_{\mathrm{mod}}=\sqrt{{\mathrm{\sigma}}_{y}^{2}+9{\mathrm{\sigma}}_{x}^{2}}=2.5$. Hence, GLS succeeds in ignoring the outlier by increasing the estimated variability of the data. Put differently, the effect of the outlier is, in a sense, to increase the overall variability of the data, which GLS takes into account by increasing the observed standard deviation of the data (σ_{obs}) with respect to the standard deviation predicted by the model (σ_{mod}).

_{mod} = 2.5, using the average estimate $\widehat{\beta}=3.031$ obtained by GLS. These are the green points on the surface, and they lie on a parallel, since they all correspond to Gaussians with the same standard deviation σ_{mod}. In this particular dataset, the index of the outlier was k = 10, so the point $\widehat{\beta}{x}_{10}$ is indicated individually. Obviously, according to the model, no outlier is expected, so the modeled distribution corresponding to k = 10, which is the green point just mentioned, lies close to the other predicted points (distributions). Next, we plot the observed distributions with their means y_{n} and standard deviations σ_{obs} (for this dataset estimated at ${\widehat{\sigma}}_{\text{obs}}=5.43$). These are the blue points, lying at a constant standard deviation σ_{obs}, which is higher than σ_{mod} (5.43 > 2.5). The outlier y_{10} can clearly be observed, and being an outlier, it lies relatively far away from the rest of the blue points (observed distributions).

Now suppose that, like MAP, GLS would not be able to increase σ_{obs} relative to σ_{mod} in order to accommodate the outlier. Then, the observed distributions would have the same observed means (the measured values y_{n}), but they would have the standard deviation predicted by the model. Hence, they would lie on the parallel corresponding to σ_{mod}, just like the green points. We have plotted these fictitious distributions as the red points at the level of σ_{mod}, and they are labeled ỹ. Again, the outlier (labeled ỹ_{10}) can be seen, but it seems to lie further away from the other red points (the points ỹ_{n}) compared to the actually observed situation, i.e., the distance from y_{10} to the other y_{n} (blue points). At least this is the case when using (visually) the Euclidean distance in the embedding Euclidean space. We can verify that this is indeed so by using the proper geodesic distance on the surface: overall, the blue points lie closer together (including the outlier) than the red points.

Now, in fact, GLS aims at minimizing the distance between each green point (modeled distribution) and its corresponding blue point (observed distribution), so as far as the outlier is concerned, we should really be looking at the geodesic between the point $(\widehat{\beta}{x}_{10},{\mathrm{\sigma}}_{\mathrm{mod}})$ and the point (y_{10}, σ_{obs}). The geodesic (labeled “Geo_{1}”) between these points is also drawn on the surface, and again, we compare this to the fictitious situation, represented by the geodesic (labeled “Geo_{2}”) between $(\widehat{\beta}{x}_{10},{\mathrm{\sigma}}_{\mathrm{mod}})$ and (ỹ_{10}, σ_{mod}). Indeed, again, we see that the geodesic Geo_{1} is shorter than Geo_{2}. Therefore, by increasing σ_{obs} relative to σ_{mod}, the outlier is not so much an outlier anymore, as measured on the pseudosphere! When calculating the GD, one finds 2.4 for Geo_{1} and 2.8 for Geo_{2}. Therefore, GLS obtains a lower value of the objective function (sum of squared geodesic distances) if it increases σ_{obs} with respect to σ_{mod}. Of course, there is a limit to this: GLS cannot continue raising σ_{obs} indefinitely, trying to mitigate the distorting effect of the outlier, for then, the other points would get too high an observed standard deviation, which is not supported by the data. The image that we see in Figure 2 is the best compromise that GLS could find.

In fact, we note that, if we suspect that y_{10} could be an outlier, it may very well be worthwhile to introduce two parameters to describe the observed standard deviation: one for the nine points that seem to follow the model and one to take care of the outlier. This would be a very straightforward extension of the method, and we explore this to some extent when using data from the ITPA database below. There, we assign a separate parameter to describe the observed standard deviation of all data coming from a specific tokamak, hence defining an individual parameter for each machine.
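The shortening of the geodesic when σ_{obs} is raised can be reproduced numerically with the closed-form Rao distance for univariate normals. In the sketch below, the two means are hypothetical values chosen only for illustration (the actual coordinates of the outlier are not given in the text), while σ_{mod} = 2.5 and σ_{obs} = 5.43 are the values quoted above; the printed distances are therefore not expected to match the 2.4 and 2.8 reported for Figure 2.

```python
import math


def rao_distance(mu1, sigma1, mu2, sigma2):
    """Closed-form Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    num = (mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2
    den = (mu1 - mu2) ** 2 + 2.0 * (sigma1 + sigma2) ** 2
    return 2.0 * math.sqrt(2.0) * math.atanh(math.sqrt(num / den))


# Hypothetical means: model prediction and outlying measurement.
mu_model, y_outlier = 150.0, 158.0
sigma_mod, sigma_obs = 2.5, 5.43

# Geo_1: modeled point versus observed point with the larger, fitted sigma_obs.
geo1 = rao_distance(mu_model, sigma_mod, y_outlier, sigma_obs)
# Geo_2: fictitious case in which the observed point is kept at sigma_mod.
geo2 = rao_distance(mu_model, sigma_mod, y_outlier, sigma_mod)

print("Geo_1 shorter than Geo_2:", geo1 < geo2)
```

For equal means the same formula reduces to √2·|ln(σ_2/σ_1)|, which is the price paid by the well-behaved points when σ_{obs} is raised above σ_{mod}, and hence the origin of the compromise described above.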

#### 4.1.2. Multiple Predictor Variables

_{t} and S, which were introduced in the context of the power threshold scaling law in Section 3 [34]. In particular, we generated a number of realizations of the variable η from the following prescription:

_{0}, β_{1}, β_{2} and β_{3}:

_{0}, β_{1}, β_{2} and β_{3}, all 616 values of η were calculated according to Equation (10), based on the values of ${\overline{n}}_{\mathrm{e}}$, B_{t} and S from the ITPA database. The range of coefficient values in Equation (11) was chosen to be representative for the values that are typically obtained from a regression analysis on the true scaling law (see Section 5). The exception is β_{0}, for which the range was chosen of roughly the same order as η − β_{0} (much smaller values of β_{0} would not be estimable in comparison with η − β_{0}).

_{1}, 1% for B_{t} (variable x_{2}), 3% for S (variable x_{3}) and 15% for the dependent variable (variable y, which is P_{thr} in the real-world regression problem). It should be stressed that, in the light of our comments in Section 3 regarding the variability of the predictor variables, these are rather low noise levels. We further note that fixed relative noise levels lead to a different standard deviation for each measurement (heteroscedasticity).

_{i} (i = 0, …, 3) taken from Equation (11), 10 datasets were realized, each time performing the sampling of noise and outliers.

_{i}, the obtained estimates ${\widehat{\beta}}_{i}$ were defined as the average over the 10 data realizations. Next, histograms were created based on these averages for the estimated coefficients, specifically the normalized histograms of the relative difference $({\mathrm{\beta}}_{i}-{\widehat{\beta}}_{i})/{\mathrm{\beta}}_{i}\ (i=0,\dots ,3)$, expressed as a percentage, between the true value β_{i} and the estimated value ${\widehat{\beta}}_{i}$ of each regression parameter. The histograms of these percentage errors are shown in Figure 3. In order to avoid a cluttered figure, the results of OLS, MAP and GLS are plotted in one panel and those of TLS and ROB in another.

_{1}, which is associated with ${\overline{n}}_{\mathrm{e}}$. The robust estimation technique in MATLAB also delivers good results (in fact, not much worse than GLS), as it is designed to cope with outliers. However, we will see that in the next experiment ROB does not perform well at all.

#### 4.2. Effect of Logarithmic Transformation

#### 4.2.1. Single Predictor Variable

_{n} unevenly spread between zero and 60. A power law was proposed to relate the unobserved ξ_{n} and η_{n}:

_{n} and η_{n}, corresponding to a substantial relative error of 40%. We finally took the natural logarithm of all observed values x_{n} and y_{n}, enabling application of the same linear regression methods that were used in the previous experiment. In this particular experiment, we chose β_{0} = 0.8 and β_{1} = 1.4, but we found that other values yield similar conclusions. Again, 100 data replications were generated, allowing calculation of Monte Carlo averages.
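The log-transform step can be illustrated with a small sketch. It assumes a single-predictor power law η = β_{0} ξ^{β_1}, proportional Gaussian noise on both variables, and a redraw of any non-positive noisy value so that the logarithm exists; the redraw rule, the number of points and all function names are assumptions made here, not details taken from the paper. Ordinary least squares stands in for the linear regression methods compared in the text.

```python
import math
import random


def noisy_positive(value, rel_noise, rng):
    """Multiply by (1 + rel_noise * N(0, 1)); redraw until the result is positive."""
    while True:
        draw = value * (1.0 + rel_noise * rng.gauss(0.0, 1.0))
        if draw > 0.0:
            return draw


def make_power_law_data(beta0=0.8, beta1=1.4, rel_noise=0.4, n=10, seed=0):
    """Hypothetical generator: power-law data with proportional noise on x and y."""
    rng = random.Random(seed)
    # Strictly positive abscissae in (0, 60], unevenly spread.
    xi = sorted(60.0 * (1.0 - rng.random()) for _ in range(n))
    eta = [beta0 * v ** beta1 for v in xi]
    x = [noisy_positive(v, rel_noise, rng) for v in xi]
    y = [noisy_positive(v, rel_noise, rng) for v in eta]
    return x, y


def loglinear_ols(x, y):
    """Fit ln y = ln beta0 + beta1 * ln x by ordinary least squares."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    beta1 = sxy / sxx
    beta0 = math.exp(my - beta1 * mx)
    return beta0, beta1


if __name__ == "__main__":
    x, y = make_power_law_data(seed=3)
    print(loglinear_ols(x, y))
```

On noise-free data the fit returns the generating coefficients exactly; with 40% proportional noise the logarithm distorts the error distribution, which is the effect this experiment is designed to probe.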

#### 4.2.2. Multiple Predictor Variables

_{i} as given in Equation (11), but now according to a power law:

_{1}), 5% for B_{t} (variable x_{2}) and 15% for S (variable x_{3}). The level for P_{thr} was kept at 15%, as before. This is still well within the maximum variability range that can be expected for the predictor variables in the ITPA database, as discussed in Section 3.

_{0} and β_{1} are the largest, compared to those on β_{2} and β_{3}, but the majority is still below 20%. As for β_{0}, the slightly inferior performance of GLS relative to the results with outliers in Section 4.1.2 is simply due to the fact that log β_{0} for the lowest values of β_{0} is negligibly small compared to log η − log β_{0}.

## 5. Power Threshold Scaling

_{thr}. We start with log-linear regression and then apply nonlinear regression analysis. Next, we perform a simple analysis of the influence of the error bars on the estimation results, and we finally provide a discussion of the results in this section.

#### 5.1. Linear Scaling

_{0}, β_{1}, β_{2} and β_{3} in Equation (9) via linear regression. In the GLS method, we introduced additional parameters σ_{obs,α} (α = 1, …, N_{t}), one for each of the N_{t} = 8 tokamaks contributing data to the scaling. That is, if a certain data point with index n originated from tokamak α, then in term n of the objective function in Equation (8), an observed distribution was used, parameterized by means of the σ_{obs,α} corresponding to that machine. The σ_{obs,α} serve a similar purpose as the parameter σ_{obs} defined above, except that they describe the observed standard deviations of the logarithmic power threshold. This, of course, corresponds to the relative errors on the power threshold itself. To calculate σ_{mod} for each data point, we used the relative measurement error bars quoted in the database (typically 4% for ${\overline{n}}_{\mathrm{e}}$, 1% for B_{t}, 3% for S and 15% for P_{thr}). Considering the discussion in Section 3 regarding other sources of uncertainty, it is clear that the σ_{obs,α} will need to take into account other, “unexpected” uncertainty sources, hence increasing the flexibility of the method.
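To give a feeling for the size of σ_{mod} on the logarithmic scale, the sketch below applies first-order Gaussian error propagation, under which the standard deviation of ln x is approximately the relative error on x. This propagation formula is an assumption made here by analogy with the linear case of Section 2.3, not a quotation of the paper's expression. The relative errors are the typical values quoted above, and the exponents are the GLS averages from Table 3, used purely as illustrative inputs.

```python
import math


def sigma_mod_log(betas, rel_err_x, rel_err_y):
    """Approximate std. dev. of ln(y) given relative errors on y and on each predictor."""
    var = rel_err_y ** 2 + sum((b * r) ** 2 for b, r in zip(betas, rel_err_x))
    return math.sqrt(var)


# Exponents for density, magnetic field and surface area (GLS row of Table 3).
betas = [0.660, 0.795, 0.946]
# Typical relative measurement errors: 4% (density), 1% (B_t), 3% (S), 15% (P_thr).
value = sigma_mod_log(betas, rel_err_x=[0.04, 0.01, 0.03], rel_err_y=0.15)
print(round(value, 3))  # about 0.155, i.e., dominated by the 15% on P_thr
```

Under these assumptions the modeled spread is barely larger than the 15% measurement error on the power threshold, which leaves any additional scatter to be absorbed by the σ_{obs,α}.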

^{20} m^{−3}). All estimates are accompanied by their 95% credible intervals obtained from 100 bootstrap samples (artificial datasets). We stress that this notion of a credible interval corresponds to the standard Bayesian definition of an interval wherein the true value of a stochastic variable is assumed to lie with a certain probability (e.g., 0.95).

_{obs,α} (observed standard deviation on log P_{thr}), for each of the devices contributing to the IAEA02 data, were expressed as a relative error on the bootstrap-averaged P_{thr}. These relative errors and their credible intervals are given in Table 4. The relative error on the power threshold lies around 15% to 30% for the various machines, except for ASDEX, where the uncertainty reaches a higher level of about 40%. On average, this yields an estimated error of 24.2% for P_{thr}, which is somewhat higher than the average of 15% mentioned in the database, although still considerably lower than the upper bound of 38%, as calculated in Section 3. Again, this is an indication of additional sources of uncertainty, on top of mere measurement error, causing the data points to deviate from the proposed regression model, as discussed already in Section 3. That extra uncertainty is captured by the GLS method.

#### 5.2. Nonlinear Scaling

_{mod}, given by:

_{obs,α}, we introduced an approximation assuming constant error bars for all measurements from a single machine. This assumption may be relaxed in the future.

_{thr}are relatively similar to those using log-linear scaling, with an average over all devices of 22.7%, which is again higher than the 15% expected from measurement error only.

#### 5.3. Influence of Error Bars

_{t}, 5% on S and 32% on P_{thr}. Again, these are all below the maxima quoted in Section 3.

_{t}, S and P_{thr} to values computed from the average percentages mentioned earlier in Section 3: 4% for ${\overline{n}}_{\mathrm{e}}$, 1% for B_{t}, 3% for S and 15% for P_{thr}. These are averages over all machines, rendering the final absolute error bars (standard deviations), computed from the relative errors, less precise. The estimation results using power-law regression with MAP and GLS are shown in Table 9. The results of both methods are clearly affected by the averaging step, but again, MAP is seen to be more sensitive to the change in the error bars compared to GLS, which maintains estimates in a similar range as those given in Tables 3 and 5. The estimates of the observed standard deviations, given in Table 10, are adjusted accordingly by GLS.

#### 5.4. Discussion

## 6. Conclusions

## Acknowledgments

^{†}This paper is an extended version of our paper published in the 34th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2014), 21–26 September 2014, Amboise, France.

## Conflicts of Interest

## References and Notes

- Doyle, E.J.; Houlberg, W.A.; Kamada, Y.; Mukhovatov, V.; Osborne, T.H.; Polevoi, A.; Bateman, G.; Connor, J.W.; Cordey, J.G.; Fujita, T.; et al. Chapter 2: Plasma confinement and transport. Nucl. Fusion **2007**, 47, S18–S127.
- Xiao, X.; White, E.P.; Hooten, M.B.; Durham, S.L. On the use of log-transformations vs. nonlinear regression for analyzing biological power laws. Ecology **2011**, 92, 1887–1894.
- McDonald, D.; Meakins, A.J.; Svensson, J.; Kirk, A.; Cordey, J.G. ITPA H-mode Threshold Database WG. The impact of statistical models on scalings derived from multi-machine H-mode threshold experiments. Plasma Phys. Control. Fusion **2006**, 48, A439–A447.
- Verdoolaege, G. Geodesic least squares regression on information manifolds. AIP Conf. Proc. **2013**, 1636, 43–48.
- Verdoolaege, G. Geodesic least squares regression for scaling studies in magnetic confinement fusion. AIP Conf. Proc. **2014**, 1641, 564–571.
- Basu, A.; Shioya, H.; Park, C. Statistical Inference: The Minimum Distance Approach; Chapman & Hall/CRC: Boca Raton, FL, USA, 2011; Volume 120.
- McCullagh, P.; Nelder, J. Generalized Linear Models, 2nd ed.; Chapman & Hall/CRC: Boca Raton, FL, USA, 1989; Volume 37.
- Amari, S.; Nagaoka, H. Methods of Information Geometry; American Mathematical Society: New York, NY, USA, 2000.
- We follow standard notational practice from differential geometry with respect to index placement in the following definitions for the metric, Christoffel symbols and geodesic distance. However, in the remainder of the paper we will revert to subscript indices only, in order to avoid other notational problems.
- Oprea, J. Differential Geometry and Its Applications, 2nd ed.; The Mathematical Association of America: Washington, DC, USA, 2007.
- Verdoolaege, G.; Scheunders, P. On the geometry of multivariate generalized Gaussian models. J. Math. Imaging Vis. **2011**, 43, 180–193.
- Kass, R.; Vos, P. Geometrical Foundations of Asymptotic Inference; Wiley: New York, NY, USA, 1997.
- Verdoolaege, G.; Scheunders, P. Geodesics on the manifold of multivariate generalized Gaussian distributions with an application to multicomponent texture discrimination. Int. J. Comput. Vis. **2011**, 95, 265–286.
- Kullback, S. Information Theory and Statistics; Dover Publications: New York, NY, USA, 1968.
- Atkinson, C.; Mitchell, A. Rao’s distance measure. Indian J. Stat. **1981**, 48, 345–365.
- Burbea, J.; Rao, C. Entropy differential metric, distance and divergence measures in probability spaces: A unified approach. J. Multivar. Anal. **1982**, 12, 575–596.
- Nielsen, F.; Nock, R. Visualizing hyperbolic Voronoi diagrams. Proceedings of the 30th Annual Symposium on Computational Geometry (SOCG’14), Kyoto, Japan, 8–11 June 2014; p. 90.
- Beran, R. Minimum Hellinger distance estimates for parametric models. Ann. Stat. **1977**, 5, 445–463.
- Pak, R. Minimum Hellinger distance estimation in simple regression models; distribution and efficiency. Stat. Probab. Lett. **1996**, 26, 263–269.
- Rao, C. Differential metrics in probability spaces. In Differential Geometry in Statistical Inference; Institute of Mathematical Statistics: Hayward, CA, USA, 1987.
- Gill, P.; Murray, W.; Wright, M. Numerical Linear Algebra and Optimization; Addison Wesley: Boston, MA, USA, 1991; Volume 1.
- Casella, G.; Berger, R. Statistical Inference, 2nd ed.; Cengage Learning: Hampshire, UK, 2002.
- Snipes, J.A.; Greenwald, M.; Ryter, F.; Kardaun, O.J.W.F.; Stober, J.; Valovic, M.; Valovic, S.J.; Sykes, A.; Dnestrovskij, A.; Walsh, M.; et al. Multi-Machine global confinement and H-mode threshold analysis. Proceedings of the 19th IAEA Fusion Energy Conference, Lyon, France, 14–19 October 2002.
- Martin, Y.R.; Takizuka, T. The ITPA CDBM H-mode Threshold Database Working Group. Power requirements for accessing the H-mode in ITER. J. Phys. Conf. Ser. **2008**, 123, 012033.
- Ryter, F. The H-Mode Database Working Group. H Mode power threshold database for ITER. Nucl. Fusion **1996**, 36, 1217–1264.
- Ryter, F. The H-Mode Threshold Database Group. Progress of the international H-Mode power threshold database activity. Plasma Phys. Control. Fusion **2002**, 44, A415–A421.
- ITPA—Threshold database. Available online: http://efdasql.ipp.mpg.de/threshold (accessed on 30 June 2015).
- Whereas the most recent update of the database dates from 2008 [24], we used the earlier version from 2002, because it allows a better illustration of the advantages of GLS with respect to other methods. The reason is that the data in the most recent version is significantly better conditioned, in which case even a simple regression technique such as OLS turns out to be able to provide acceptable estimates of the regression parameters. This point is not relevant for the present discussion, as here our aim is to demonstrate the advantages of GLS in cases where the data are not in the best shape.
- Verdoolaege, G.; Karagounis, G.; Tendler, M.; van Oost, G. Pattern recognition in probability spaces for visualization and identification of plasma confinement regimes and confinement time scaling. Plasma Phys. Control. Fusion **2012**, 54, 124006.
- Preuss, R.; Dose, V. Errors in all variables. AIP Conf. Proc. **2005**, 803, 448–455.
- Markovsky, I.; van Huffel, S. Overview of total least-squares methods. Signal Process. **2007**, 87, 2283–2302.
- Maronna, R.; Martin, D.; Yohai, V. Robust Statistics: Theory and Methods; Wiley: New York, NY, USA, 2006.
- MATLAB and Statistics Toolbox Release 2015a; The Mathworks Inc: Natick, MA, USA, 2015.
- We use the notation η for the response variable instead of P_{thr} because in this experiment η is generated artificially and therefore it is not necessarily related to the actual power threshold in fusion devices.
- Von Toussaint, U.; Frey, M.; Gori, S. Fitting of functions with uncertainties in dependent and independent variables. AIP Conf. Proc. **2009**, 1193, 302–310.
- OLS is not repeated here because it does not depend on the error bars.
- Pennec, X. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. J. Math. Imaging Vis. **2006**, 25, 127–154.

**Figure 1.** (**a**) Illustration of the Poincaré half-plane with several half-circle geodesics in the background, together with the geodesic between the points p_{1} and p_{2} and between p_{3} and p_{4}, defined in the main text. (**b**) Probability densities corresponding to the points p_{1}, p_{2}, p_{3} and p_{4} indicated in (a). The densities associated with some intermediate points on the geodesics between p_{1} and p_{2} and between p_{3} and p_{4} are also drawn. (**c**) Rendering of one blade of the tractroid, again with the two geodesics superimposed. The parallels of the tractroid are lines of constant standard deviation σ, while the meridians (the tractrices) are lines of constant mean µ. This representation of the normal manifold is periodic in the µ-direction, and a rescaled version (longer period along µ) is shown in (**d**).

**Figure 2.** A portion of the pseudosphere together with the regression results on synthetic data with an outlier, as described in the main text.

**Figure 3.** (**a**) Histograms of the relative error in estimating the regression coefficients β_{i} by means of OLS, MAP and GLS for a linear regression problem with outliers. Horizontal axes represent the error in percent and vertical axes probability, normalized to one. (**b**) Similar, for TLS and ROB.

**Figure 4.** (**a**) Histograms of the relative error in estimating the regression coefficients β_{i} by means of OLS, MAP and GLS for a power-law regression problem after a logarithmic transformation. Horizontal axes represent the error in percent and vertical axes probability, normalized to one. (**b**) Similar, for TLS and ROB.

**Table 1.** Monte Carlo estimates of the mean and standard deviation for the slope parameter in linear regression with errors on both variables and one outlier. GLS, geodesic least squares regression; TLS, total least squares; ROB, robust method.

Original | GLS | OLS | MAP | TLS | ROB |
---|---|---|---|---|---|
β = 3.00 | 3.031 ± 0.035 | 3.68 ± 0.29 | 3.83 ± 0.36 | 4.6 ± 1.0 | 2.992 ± 0.041 |

**Table 2.** Monte Carlo estimates of the mean and standard deviation for the parameters in a log-linear regression experiment with proportional additive noise on both variables.

Parameter | Original | GLS | OLS | MAP | TLS | ROB |
---|---|---|---|---|---|---|
β_{0} | 0.80 | 0.94 ± 0.47 | 2.2 ± 2.3 | 3.0 ± 1.7 | 0.99 ± 0.70 | 2.72 ± 0.77 |
β_{1} | 1.40 | 1.39 ± 0.11 | 1.19 ± 0.16 | 1.08 ± 0.26 | 1.41 ± 0.14 | 1.17 ± 0.11 |

**Table 3.** Estimates of regression parameters and predictions for ITER in log-transformed linear scaling of the H-mode threshold power using the IAEA02 dataset. The bootstrap averages are given, as well as the 95% credible intervals (CI).

Method | Statistic | ${\widehat{\beta}}_{0}$ | ${\widehat{\beta}}_{1}$ | ${\widehat{\beta}}_{2}$ | ${\widehat{\beta}}_{3}$ | ${\widehat{P}}_{\text{thr},0.5}(\mathrm{MW})$ | ${\widehat{P}}_{\text{thr},1.0}(\mathrm{MW})$ |
---|---|---|---|---|---|---|---|
OLS | Average | 0.0507 | 0.485 | 0.873 | 0.843 | 38.0 | 53.2 |
OLS | CI | ±0.0060 | ±0.073 | ±0.061 | ±0.041 | ±4.4 | ±8.0 |
MAP | Average | 0.0449 | 0.567 | 0.867 | 0.901 | 45.6 | 67.6 |
MAP | CI | ±0.0051 | ±0.078 | ±0.069 | ±0.039 | ±5.0 | ±9.6 |
GLS | Average | 0.0426 | 0.660 | 0.795 | 0.946 | 48.3 | 76.4 |
GLS | CI | ±0.0042 | ±0.069 | ±0.059 | ±0.034 | ±4.7 | ±9.8 |

**Table 4.** Estimates of the observed standard deviations σ_{obs,α} of the logarithmic power threshold, expressed as percentage errors on P_{thr} itself, for the tokamaks contributing to the IAEA02 dataset, obtained using log-transformed linear scaling. The bootstrap averages are given, as well as the 95% credible intervals (CI).

Statistic | ASDEX | AUG | CMOD | DIII-D | JET | JFT-2M | JT-60U | PBXM |
---|---|---|---|---|---|---|---|---|
Average (%) | 41.8 | 23.0 | 22.0 | 15.7 | 24.6 | 15.9 | 22.8 | 27.6 |
CI (%) | ±5.3 | ±1.4 | ±1.1 | ±1.8 | ±2.0 | ±1.2 | ±2.3 | ±2.9 |

**Table 5.** Estimates of regression parameters and predictions for ITER in power-law scaling on the original scale of the H-mode threshold power using the IAEA02 dataset. The bootstrap averages are given, as well as the 95% credible intervals (CI).

Method | Statistic | ${\widehat{\mathrm{\beta}}}_{0}$ | ${\widehat{\mathrm{\beta}}}_{1}$ | ${\widehat{\mathrm{\beta}}}_{2}$ | ${\widehat{\mathrm{\beta}}}_{3}$ | ${\widehat{\mathrm{P}}}_{\text{thr},0.5}(\mathrm{MW})$ | ${\widehat{\mathrm{P}}}_{\text{thr},1.0}(\mathrm{MW})$ |
---|---|---|---|---|---|---|---|
OLS | Average | 0.0274 | 0.773 | 0.96 | 1.038 | 69 | 118 |
OLS | CI | ±0.0083 | ±0.090 | ±0.10 | ±0.071 | ±15 | ±32 |
MAP | Average | 0.0425 | 0.643 | 0.788 | 0.933 | 44.2 | 69.1 |
MAP | CI | ±0.0041 | ±0.074 | ±0.079 | ±0.034 | ±3.8 | ±8.2 |
GLS | Average | 0.0397 | 0.715 | 0.751 | 0.984 | 51.6 | 84.7 |
GLS | CI | ±0.0036 | ±0.071 | ±0.081 | ±0.031 | ±4.0 | ±8.8 |

**Table 6.** Estimates of the observed standard deviations σ_{obs,α} of the power threshold P_{thr}, expressed as percentage errors, for the machines contributing to the IAEA02 dataset, obtained using power-law scaling. The bootstrap averages are given, as well as the 95% credible intervals (CI).

Statistic | ASDEX | AUG | CMOD | DIII-D | JET | JFT-2M | JT-60U | PBXM |
---|---|---|---|---|---|---|---|---|
Average (%) | 35.8 | 21.2 | 20.4 | 15.9 | 22.4 | 15.7 | 22.3 | 27.7 |
CI (%) | ±9.1 | ±4.3 | ±3.4 | ±2.4 | ±3.8 | ±2.2 | ±4.6 | ±8.1 |

**Table 7.** Estimates of regression parameters and predictions for ITER in power-law scaling on the original scale of the H-mode threshold power using the IAEA02 dataset with all error bars (on the root quantities) doubled.

Method | ${\widehat{\mathrm{\beta}}}_{0}$ | ${\widehat{\mathrm{\beta}}}_{1}$ | ${\widehat{\mathrm{\beta}}}_{2}$ | ${\widehat{\mathrm{\beta}}}_{3}$ | ${\widehat{\mathrm{P}}}_{\text{thr},0.5}(\mathrm{MW})$ | ${\widehat{\mathrm{P}}}_{\text{thr},1.0}(\mathrm{MW})$ |
---|---|---|---|---|---|---|
MAP | 0.0436 | 0.581 | 0.828 | 0.900 | 41.0 | 61.3 |
GLS | 0.0393 | 0.725 | 0.742 | 0.990 | 52.1 | 86.2 |

**Table 8.** Estimates of the observed standard deviations σ_{obs,α} of the power threshold P_{thr}, expressed as percentage errors, for the machines contributing to the IAEA02 dataset with all error bars doubled, obtained using power-law scaling.

ASDEX | AUG | CMOD | DIII-D | JET | JFT-2M | JT-60U | PBXM |
---|---|---|---|---|---|---|---|
49.5 | 35.9 | 31.7 | 24.9 | 32.9 | 27.6 | 38.9 | 47.7 |

**Table 9.** Estimates of regression parameters and predictions for ITER in power-law scaling on the original scale of the H-mode threshold power using the IAEA02 dataset with averaged error bars.

Method | ${\widehat{\mathrm{\beta}}}_{0}$ | ${\widehat{\mathrm{\beta}}}_{1}$ | ${\widehat{\mathrm{\beta}}}_{2}$ | ${\widehat{\mathrm{\beta}}}_{3}$ | ${\widehat{\mathrm{P}}}_{\text{thr},0.5}(\mathrm{MW})$ | ${\widehat{\mathrm{P}}}_{\text{thr},1.0}(\mathrm{MW})$ |
---|---|---|---|---|---|---|
MAP | 0.0488 | 0.552 | 0.807 | 0.862 | 35.1 | 51.5 |
GLS | 0.0429 | 0.647 | 0.780 | 0.938 | 45.7 | 71.5 |

**Table 10.** Estimates of the observed standard deviations σ_{obs,α} of the power threshold P_{thr}, expressed as percentage errors, for the machines contributing to the IAEA02 dataset with averaged error bars, obtained using power-law scaling.

ASDEX | AUG | CMOD | DIII-D | JET | JFT-2M | JT-60U | PBXM |
---|---|---|---|---|---|---|---|
49.5 | 35.9 | 31.7 | 24.9 | 32.9 | 27.6 | 38.9 | 47.7 |

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Verdoolaege, G.
A New Robust Regression Method Based on Minimization of Geodesic Distances on a Probabilistic Manifold: Application to Power Laws. *Entropy* **2015**, *17*, 4602-4626.
https://doi.org/10.3390/e17074602
