Stats, Volume 3, Issue 2 (June 2020) – 6 articles

Cover Story: Restricted mean survival time (RMST) is experiencing a renaissance and is advocated as a model-free, easy-to-interpret alternative to proportional hazards regression. RMST and its variance are mainly estimated by numerical integration of Kaplan–Meier curves. Pseudo-observations and the flexible parametric survival method are the two main alternatives to the Kaplan–Meier method. Flexible parametric survival methods outperform both competitors in efficacy, but the differences are small to negligible. RMST and its variance can be estimated with any of the three methods, depending on the needs and aims of the researchers. The Kaplan–Meier method is the easiest to implement, flexible parametric survival models are the most powerful, and pseudo-observations allow adjustment for relevant covariates.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click the "PDF Full-text" link and open it with the free Adobe Reader.
8 pages, 269 KiB  
Article
Generalized Mutual Information
by Zhiyi Zhang
Stats 2020, 3(2), 158-165; https://doi.org/10.3390/stats3020013 - 10 Jun 2020
Cited by 5 | Viewed by 3103
Abstract
Mutual information is one of the essential building blocks of information theory. It is, however, only finitely defined for distributions in a subclass of the general class of all distributions on a joint alphabet. The unboundedness of mutual information prevents its potential utility from being extended to the general class. This is in fact a void in the foundation of information theory that needs to be filled. This article proposes a family of generalized mutual information whose members are indexed by a positive integer n, with the nth member being the mutual information of nth order. The mutual information of the first order coincides with Shannon's, which may or may not be finite. It is, however, established (a) that each mutual information of an order greater than 1 is finitely defined for all distributions of two random elements on a joint countable alphabet, and (b) that each and every member of the family enjoys all the utilities of a finite Shannon's mutual information.
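As a point of reference for the abstract above, the first-order member of the family is stated to coincide with Shannon's mutual information. The following is a minimal sketch of that first-order quantity only, assuming a finite joint alphabet and a joint probability table given as nested lists; the function name `mutual_information` is hypothetical, and the sketch does not implement the higher-order members proposed in the article.

```python
import math


def mutual_information(joint):
    """Shannon mutual information (in nats) of a finite joint pmf.

    `joint[i][j]` is assumed to hold P(X = i, Y = j); the entries are
    assumed to be non-negative and to sum to 1.
    """
    # Marginal distributions: row sums for X, column sums for Y.
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]

    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            # Cells with zero probability contribute nothing (0 * log 0 := 0).
            if p > 0:
                mi += p * math.log(p / (px[i] * py[j]))
    return mi


# Independent variables carry no mutual information.
independent = [[0.25, 0.25], [0.25, 0.25]]
# Perfectly dependent binary variables share log(2) nats.
dependent = [[0.5, 0.0], [0.0, 0.5]]
```

On a finite alphabet this sum is always finite; the unboundedness discussed in the abstract arises on countably infinite alphabets, which this sketch does not cover.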
21 pages, 4758 KiB  
Article
Modelling the Behaviour of Currency Exchange Rates with Singular Spectrum Analysis and Artificial Neural Networks
by Paulo Canas Rodrigues, Olushina Olawale Awe, Jonatha Sousa Pimentel and Rahim Mahmoudvand
Stats 2020, 3(2), 137-157; https://doi.org/10.3390/stats3020012 - 01 Jun 2020
Cited by 9 | Viewed by 2767
Abstract
A proper understanding and analysis of suitable models for forecasting currency exchange rate dynamics is essential to provide reliable information about the economy. This paper deals with model fit and model forecasting for eight time series of historical currency exchange rates, with the United States dollar as the reference. The following time series techniques are considered, and their performance is compared in terms of model fit and model forecasting: the classical autoregressive integrated moving average model, non-parametric univariate and multivariate singular spectrum analysis (SSA), artificial neural network (ANN) algorithms, and a recent prominent hybrid method that combines SSA and ANN. Moreover, specific methodological and computational adaptations were made to allow for these analyses and comparisons.
(This article belongs to the Special Issue Time Series Analysis and Forecasting)
17 pages, 2663 KiB  
Article
A-Spline Regression for Fitting a Nonparametric Regression Function with Censored Data
by Ersin Yılmaz, Syed Ejaz Ahmed and Dursun Aydın
Stats 2020, 3(2), 120-136; https://doi.org/10.3390/stats3020011 - 29 May 2020
Cited by 2 | Viewed by 1947
Abstract
This paper aims to solve the problem of fitting a nonparametric regression function with right-censored data. In the literature, censoring in the response variable is generally handled by a synthetic data transformation based on the Kaplan–Meier estimator. In the context of synthetic data, there have been different studies on the estimation of right-censored nonparametric regression models based on smoothing splines, regression splines, kernel smoothing, local polynomials, and so on. It should be emphasized that synthetic data transformation manipulates the observations because it assigns zero values to censored data points and increases the magnitude of the observations. Thus, an irregularly distributed dataset is obtained. We claim that adaptive spline (A-spline) regression has the potential to deal with this irregular dataset more easily than the smoothing techniques mentioned here, due to the freedom to determine the degree of the spline, as well as the number and location of the knots. The theoretical properties of A-splines with synthetic data are detailed in this paper. Additionally, we support our claim with numerical studies, including a simulation study and a real-world data example.
13 pages, 331 KiB  
Article
A Brief Overview of Restricted Mean Survival Time Estimators and Associated Variances
by Szilárd Nemes, Erik Bülow and Andreas Gustavsson
Stats 2020, 3(2), 107-119; https://doi.org/10.3390/stats3020010 - 26 May 2020
Cited by 10 | Viewed by 4303
Abstract
Restricted Mean Survival Time (RMST) is experiencing a renaissance and is advocated as a model-free, easy-to-interpret alternative to proportional hazards regression and hazard rates, with implications for causal inference. Estimation of RMST and its associated variance is mainly done by numerical integration of Kaplan–Meier curves. In this paper we briefly review the two main alternatives to the Kaplan–Meier method: analysis based on pseudo-observations, and the flexible parametric survival method. Using computer simulations, we assess the efficacy of the three methods compared to a fully parametric approach where the distribution of survival times is known. Thereafter, the three methods are directly compared without any distributional assumption for the survival data. Generally, flexible parametric survival methods outperform both competitors; however, the differences are small.
(This article belongs to the Section Statistical Methods)
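The Kaplan–Meier route mentioned in the abstract above can be illustrated with a short sketch: RMST up to a horizon tau is the area under the Kaplan–Meier step function on [0, tau]. The code below is a minimal illustration under stated assumptions, not the authors' implementation; the names `kaplan_meier` and `rmst` and the toy inputs are hypothetical, event indicators are assumed to be 1 for an observed event and 0 for right censoring, and variance estimation is not covered.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] for each distinct event time t, in increasing order.

    `events[i]` is assumed to be 1 if the event was observed at `times[i]`
    and 0 if the observation was right-censored there.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit update: multiply by the conditional survival at t.
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= removed
    return steps


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        # The curve is flat at prev_s between consecutive event times.
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Final rectangle from the last event time before tau up to tau.
    area += prev_s * (tau - prev_t)
    return area
```

Because the Kaplan–Meier curve is piecewise constant, the "numerical integration" reduces to an exact sum of rectangle areas. With no censoring and tau at the largest observed time, the result equals the sample mean of the survival times.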
13 pages, 382 KiB  
Article
Depth Induced Regression Medians and Uniqueness
by Yijun Zuo
Stats 2020, 3(2), 94-106; https://doi.org/10.3390/stats3020009 - 10 Apr 2020
Cited by 1 | Viewed by 1655
Abstract
The notion of median in one dimension is a foundational element in nonparametric statistics. It has been extended to multi-dimensional cases both in location and in regression via notions of data depth. Regression depth (RD) and projection regression depth (PRD) represent the two most promising notions in regression. Carrizosa depth D_C is another depth notion in regression. Depth-induced regression medians (maximum depth estimators) serve as robust alternatives to the classical least squares estimator. The uniqueness of regression medians is indispensable in the discussion of their properties and the asymptotics (consistency and limiting distribution) of sample regression medians. Are the regression medians induced from RD, PRD, and D_C unique? Answering this question is the main goal of this article. It is found that only the regression median induced from PRD possesses the desired uniqueness property. The conventional remedy for non-uniqueness, taking the average of all medians, might yield an estimator that no longer possesses the maximum depth in both the RD and D_C cases. These and other findings indicate that the PRD and its induced median are highly favorable among their leading competitors.
10 pages, 505 KiB  
Article
The Prediction of Batting Averages in Major League Baseball
by Sarah R. Bailey, Jason Loeppky and Tim B. Swartz
Stats 2020, 3(2), 84-93; https://doi.org/10.3390/stats3020008 - 03 Apr 2020
Cited by 7 | Viewed by 6551
Abstract
The prediction of yearly batting averages in Major League Baseball is a notoriously difficult problem where standard errors using the well-known PECOTA (Player Empirical Comparison and Optimization Test Algorithm) system are roughly 20 points. This paper considers the use of ball-by-ball data provided by the Statcast system in an attempt to predict batting averages. The publicly available Statcast data and resultant predictions supplement proprietary PECOTA forecasts. With detailed Statcast data, we attempt to account for a luck component involving batting averages. It is anticipated that the luck component will not be repeated in future seasons. The two predictions (Statcast and PECOTA) are combined via simple linear regression to provide improved forecasts of batting average.
(This article belongs to the Section Data Science)