Stats, Volume 6, Issue 2 (June 2023) – 16 articles

Cover Story: The AUC is routinely used to determine how strongly a given model discriminates between the levels of a binary outcome. This approach is straightforward to apply, visualize, and interpret and has remained a popular tool for decades. Standard inference with the AUC requires that outcomes be independent of each other, so underlying multi-level designs make the generation of confidence intervals (CIs) challenging. This manuscript presents an approach by which valid CIs for the AUC may be calculated in a three-level hierarchical setting. The performance of CIs around the AUC was assessed through simulation. Case studies are presented so that interested readers may better understand the application and flexibility of this approach and use it in their own research.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • Papers are published in both HTML and PDF formats; the PDF is the official version. To view a paper in PDF format, click the "PDF Full-text" link and open it with the free Adobe Reader.
28 pages, 737 KiB  
Article
The Gamma-Topp-Leone-Type II-Exponentiated Half Logistic-G Family of Distributions with Applications
by Broderick Oluyede and Thatayaone Moakofi
Stats 2023, 6(2), 706-733; https://doi.org/10.3390/stats6020045 - 19 Jun 2023
Viewed by 1046
Abstract
The new Ristić and Balakrishnan or Gamma-Topp-Leone-Type II-Exponentiated Half Logistic-G (RB-TL-TII-EHL-G) family of distributions is introduced and investigated in this paper. This work derives and studies some of the main statistical characteristics of this new family of distributions. The maximum likelihood estimation technique is used to estimate the model parameters, and a simulation study is used to assess the consistency of the estimators. Applications to three real-life datasets from various fields show the value and adaptability of the new RB-TL-TII-EHL-G family of distributions. Our results show that the newly proposed family is flexible enough to characterize datasets from different fields better than several other existing distributions in the literature. Full article
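Since the full RB-TL-TII-EHL-G density is not reproduced in this listing, here is a minimal Python sketch of the estimation workflow the abstract describes — numerical maximum likelihood followed by a small simulation study of estimator consistency — using an ordinary two-parameter Weibull as a hypothetical stand-in baseline:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

def neg_log_lik(params, data):
    """Negative log-likelihood; the real family would substitute its own pdf."""
    shape, scale = params
    if shape <= 0 or scale <= 0:
        return np.inf
    return -np.sum(weibull_min.logpdf(data, c=shape, scale=scale))

rng = np.random.default_rng(42)
true_shape, true_scale = 1.5, 2.0

# Simulation study: the average bias of the MLE should shrink as n grows.
for n in (50, 200, 1000):
    estimates = []
    for _ in range(100):
        sample = weibull_min.rvs(true_shape, scale=true_scale, size=n, random_state=rng)
        res = minimize(neg_log_lik, x0=[1.0, 1.0], args=(sample,), method="Nelder-Mead")
        estimates.append(res.x)
    bias = np.mean(estimates, axis=0) - [true_shape, true_scale]
    print(f"n={n:5d}  mean bias (shape, scale) = {bias.round(3)}")
```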
17 pages, 475 KiB  
Article
Modeling Model Misspecification in Structural Equation Models
by Alexander Robitzsch
Stats 2023, 6(2), 689-705; https://doi.org/10.3390/stats6020044 - 14 Jun 2023
Cited by 3 | Viewed by 1285
Abstract
Structural equation models constrain mean vectors and covariance matrices and are frequently applied in the social sciences. In practice, the structural equation model is misspecified to some extent. In many cases, researchers nevertheless intend to work with a misspecified target model of interest. In this article, simultaneous statistical inference for sampling errors and model misspecification errors is discussed. A modified formula for the variance matrix of the parameter estimates is obtained by imposing a stochastic model for model errors and applying M-estimation theory. The presence of model errors is quantified through increased standard errors of the parameter estimates. The proposed inference is illustrated with several analytical examples and an empirical application. Full article
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)
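The modified variance formula comes from M-estimation theory; its most familiar special case is the sandwich (robust) covariance estimator. A minimal sketch of how a sandwich variance inflates standard errors under a deliberately misspecified mean model, using statsmodels OLS as a stand-in (not the paper's exact SEM formula):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
# Deliberately misspecified mean structure: the fitted model is linear,
# but the data-generating process has a quadratic term (a "model error").
y = 1.0 + 0.5 * x + 0.4 * x**2 + rng.normal(scale=1.0, size=n)

X = sm.add_constant(x)
fit_naive = sm.OLS(y, X).fit()                    # assumes the model is correct
fit_robust = sm.OLS(y, X).fit(cov_type="HC1")     # sandwich (M-estimation) variance

print("naive SEs:   ", fit_naive.bse.round(4))
print("sandwich SEs:", fit_robust.bse.round(4))   # typically larger under misspecification
```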
15 pages, 1505 KiB  
Article
Area under the Curve as an Alternative to Latent Growth Curve Modeling When Assessing the Effects of Predictor Variables on Repeated Measures of a Continuous Dependent Variable
by Daniel Rodriguez
Stats 2023, 6(2), 674-688; https://doi.org/10.3390/stats6020043 - 25 May 2023
Viewed by 1664
Abstract
Researchers conducting longitudinal data analysis in psychology and the behavioral sciences have several statistical methods to choose from, most of which either require specialized software or advanced knowledge of statistical methods to inform the selection of the correct model options (e.g., correlation structure). One simple alternative to conventional longitudinal data analysis methods is to calculate the area under the curve (AUC) from repeated measures and then use this new variable in one’s model. The present study assessed the relative efficacy of two AUC measures: the AUC with respect to the ground (AUC-g) and the AUC with respect to the increase (AUC-i), in comparison to latent growth curve modeling (LGCM), a popular repeated measures data analysis method. Using data from the ongoing Panel Study of Income Dynamics (PSID), we assessed the effects of four predictor variables on repeated measures of social anxiety, using both the AUC and LGCM. We used the full information maximum likelihood (FIML) method to account for missing data in LGCM and multiple imputation to account for missing data in the calculation of both AUC measures. Extracting parameter estimates from these models, we next conducted Monte Carlo simulations to assess the parameter bias and power (two estimates of performance) of both methods in the same models, with sample sizes ranging from 741 to 50. The results using both AUC measures in the initial models paralleled those of LGCM, particularly with respect to the LGCM baseline. With respect to the simulations, both AUC measures performed as well as or better than LGCM at all sample sizes assessed. These results suggest that the AUC may be a viable alternative to LGCM, especially for researchers with less access to the specialized software necessary to conduct LGCM. Full article
(This article belongs to the Section Statistical Methods)
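The two AUC summaries have simple closed forms: the AUC with respect to the ground integrates the repeated measures over time, and the AUC with respect to the increase subtracts the rectangle defined by the baseline value. A minimal sketch using the trapezoid rule (variable names and values hypothetical):

```python
import numpy as np

def auc_ground(times, values):
    """AUC with respect to the ground (AUC-g): trapezoid-rule area
    under the repeated measures over time."""
    return np.sum((values[1:] + values[:-1]) / 2 * np.diff(times))

def auc_increase(times, values):
    """AUC with respect to the increase (AUC-i): AUC-g minus the
    rectangle defined by the baseline measurement."""
    return auc_ground(times, values) - values[0] * (times[-1] - times[0])

# Hypothetical subject measured at four waves
times = np.array([0.0, 1.0, 2.0, 3.0])
anxiety = np.array([2.0, 2.5, 3.0, 2.8])
print(auc_ground(times, anxiety))    # summary of overall level
print(auc_increase(times, anxiety))  # summary of change from baseline
```

Each subject's AUC then becomes a single outcome variable in an ordinary regression, which is what makes the approach accessible without specialized software.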
17 pages, 458 KiB  
Article
A New Extended Weibull Distribution with Application to Influenza and Hepatitis Data
by Gauss M. Cordeiro, Elisângela C. Biazatti and Luís H. de Santana
Stats 2023, 6(2), 657-673; https://doi.org/10.3390/stats6020042 - 19 May 2023
Cited by 1 | Viewed by 1607
Abstract
The Weibull is a popular distribution for modeling monotone failure rate data. In this work, we introduce the four-parameter Weibull extended Weibull distribution, which offers greater flexibility and can thus model data with bathtub-shaped and unimodal failure rates. Some of its mathematical properties such as quantile function, linear representation and moments are provided. Maximum likelihood estimation is adopted to estimate its parameters, and the log-Weibull extended Weibull regression model is presented. In addition, some simulations are carried out to show the consistency of the estimators. We demonstrate the greater flexibility and performance of this distribution and the regression model through applications to influenza and hepatitis data. The new models perform much better than some of their competitors. Full article
(This article belongs to the Section Regression Models)
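A sketch of the model-comparison step the abstract describes — fit candidate lifetime distributions by maximum likelihood and compare them — using two standard scipy distributions as stand-ins for the new four-parameter family:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = stats.weibull_min.rvs(1.8, scale=10.0, size=300, random_state=rng)

def aic(dist, data, **fit_kwargs):
    """AIC = 2k - 2*loglik from a maximum-likelihood fit."""
    params = dist.fit(data, **fit_kwargs)
    loglik = np.sum(dist.logpdf(data, *params))
    # len(params) counts the pinned location too; since both models below
    # fix loc=0, the constant offset does not change the ranking.
    return 2 * len(params) - 2 * loglik

print("Weibull AIC:    ", aic(stats.weibull_min, data, floc=0))
print("Exponential AIC:", aic(stats.expon, data, floc=0))   # nested, less flexible
```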
14 pages, 1517 KiB  
Article
Interval-Censored Regression with Non-Proportional Hazards with Applications
by Fábio Prataviera, Elizabeth M. Hashimoto, Edwin M. M. Ortega, Taciana V. Savian and Gauss M. Cordeiro
Stats 2023, 6(2), 643-656; https://doi.org/10.3390/stats6020041 - 17 May 2023
Viewed by 1052
Abstract
Proportional hazards models and, in some situations, accelerated failure time models are not suitable for analyzing data when the failure ratio between two individuals is not constant. We present a Weibull accelerated failure time model with covariables on both the location and scale parameters. By considering the effects of covariables not only on the location parameter but also on the scale, the regression should be able to adequately describe the difference between treatments. In addition, the deviance residuals, adapted to data with interval censoring and exact failure times, proved satisfactory for verifying the fit of the model. This information favors the Weibull regression as an alternative to proportional hazards models without masking the effect of the explanatory variables. Full article
(This article belongs to the Section Survival Analysis)
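The likelihood implied here combines exact failure times (density contributions) with interval-censored observations (survival differences S(L) − S(R)), with covariate effects on both the location and the scale of log-time. A minimal Weibull AFT sketch under those assumptions; the data, links, and starting values are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 300
x = rng.integers(0, 2, n).astype(float)                  # hypothetical binary treatment
w = -rng.gumbel(size=n)                                  # standard min-extreme-value errors
t = np.exp(1.0 + 0.5 * x + np.exp(-0.3 + 0.2 * x) * w)  # true Weibull AFT times
exact = rng.random(n) < 0.5                              # half the failures seen exactly
left = np.where(exact, t, 0.8 * t)                       # the rest known only within
right = np.where(exact, t, 1.3 * t)                      # an inspection interval

def neg_loglik(theta):
    b0, b1, g0, g1 = theta
    mu = b0 + b1 * x                     # covariates on the location of log-time
    sigma = np.exp(g0 + g1 * x)          # ...and on the scale, via a log link

    def log_surv(tt):                    # log S(t) = -exp((log t - mu) / sigma)
        return -np.exp((np.log(tt) - mu) / sigma)

    z = (np.log(t) - mu) / sigma
    ll_exact = z - np.exp(z) - np.log(sigma * t)          # log density at exact times
    ll_ic = np.log(np.exp(log_surv(left)) - np.exp(log_surv(right)))
    return -(ll_exact[exact].sum() + ll_ic[~exact].sum())

res = minimize(neg_loglik, x0=np.zeros(4), method="Nelder-Mead")
print("estimates (b0, b1, g0, g1):", res.x.round(3))      # truth: 1.0, 0.5, -0.3, 0.2
```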
17 pages, 3203 KiB  
Article
Climate Change: Linear and Nonlinear Causality Analysis
by Jiecheng Song and Merry Ma
Stats 2023, 6(2), 626-642; https://doi.org/10.3390/stats6020040 - 15 May 2023
Viewed by 1736
Abstract
The goal of this study is to detect linear and nonlinear causal pathways toward climate change as measured by changes in global mean surface temperature and global mean sea level over time using a data-based approach in contrast to the traditional physics-based models. Monthly data on potential climate change causal factors, including greenhouse gas concentrations, sunspot numbers, humidity, ice sheet mass, and sea ice coverage, from January 2003 to December 2021, have been utilized in the analysis. We first applied the vector autoregressive model (VAR) and Granger causality test to gauge the linear Granger causal relationships among climate factors. We then adopted the vector error correction model (VECM) as well as the autoregressive distributed lag model (ARDL) to quantify the linear long-run equilibrium and the linear short-term dynamics. Cointegration analysis has also been adopted to examine the bidirectional Granger causalities. Furthermore, in this work, we have presented a novel pipeline based on the artificial neural network (ANN) and the VAR and ARDL models to detect nonlinear causal relationships embedded in the data. The results in this study indicate that the global sea level rise is affected by changes in ice sheet mass (both linearly and nonlinearly), global mean temperature (nonlinearly), and the extent of sea ice coverage (nonlinearly and weakly); whereas the global mean temperature is affected by the global surface mean specific humidity (both linearly and nonlinearly), greenhouse gas concentration as measured by the global warming potential (both linearly and nonlinearly) and the sunspot number (only nonlinearly and weakly). Furthermore, the nonlinear neural network models tend to fit the data more closely than the linear models, as expected given the increased parameter dimension of the neural network models. Given that the information criteria are not generally applicable to the comparison of neural network models and statistical time series models, our next step is to examine the robustness and compare the forecast accuracy of these two models using the soon-available 2022 monthly data. Full article
(This article belongs to the Special Issue Modern Time Series Analysis II)
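The linear Granger step of this pipeline is directly reproducible with statsmodels. A minimal sketch on synthetic stationary series (the actual climate variables are not reproduced in this listing):

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(3)
n = 228                                   # e.g., monthly observations, 2003-2021
cause = np.zeros(n)
effect = np.zeros(n)
for t in range(1, n):                     # 'effect' responds to the lagged 'cause'
    cause[t] = 0.5 * cause[t - 1] + rng.normal()
    effect[t] = 0.7 * effect[t - 1] + 0.3 * cause[t - 1] + rng.normal(scale=0.5)

# Convention: the test asks whether the SECOND column Granger-causes the first
data = np.column_stack([effect, cause])
results = grangercausalitytests(data, maxlag=3)
for lag, (tests, _) in results.items():
    print(lag, "ssr F-test p-value:", round(tests["ssr_ftest"][1], 4))
```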
9 pages, 272 KiB  
Article
Combining Probability and Nonprobability Samples by Using Multivariate Mass Imputation Approaches with Application to Biomedical Research
by Sixia Chen, Alexandra May Woodruff, Janis Campbell, Sara Vesely, Zheng Xu and Cuyler Snider
Stats 2023, 6(2), 617-625; https://doi.org/10.3390/stats6020039 - 08 May 2023
Cited by 1 | Viewed by 1759
Abstract
Nonprobability samples have been used frequently in practice, including in public health studies, economics, education, and political polls. Naïve estimates based on nonprobability samples without any further adjustments may suffer from serious selection bias. Mass imputation has been shown to be effective in practice for improving the representativeness of nonprobability samples. It builds an imputation model based on the nonprobability sample and generates imputed values for all units in the probability sample. In this paper, we compare two mass imputation approaches for integrating multiple outcome variables simultaneously: latent joint multivariate normal model mass imputation (e.g., Generalized Efficient Regression-Based Imputation with Latent Processes (GERBIL)) and fully conditional specification (FCS) procedures. A Monte Carlo simulation study shows the benefits of GERBIL and of FCS with predictive mean matching in terms of balancing the Monte Carlo bias and variance. We further evaluate the proposed method by combining information from the Tribal Behavioral Risk Factor Surveillance System and Behavioral Risk Factor Surveillance System data files. Full article
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)
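A minimal sketch of the mass-imputation idea: fit an outcome model on the nonprobability sample, then transfer observed outcomes to the probability sample by predictive mean matching. A single nearest donor is used here for brevity (practical PMM typically draws from several near donors), and the data, model, and weights are all hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)

# Nonprobability sample: outcome y observed alongside covariates x
x_np = rng.normal(size=(800, 2))
y_np = 1.0 + x_np @ np.array([0.8, -0.5]) + rng.normal(scale=0.7, size=800)

# Probability sample: covariates and design weights only; outcome missing
x_p = rng.normal(size=(300, 2))
weights = rng.uniform(1.0, 3.0, size=300)

# Step 1: fit the imputation model on the nonprobability sample
model = LinearRegression().fit(x_np, y_np)

# Step 2: predictive mean matching -- each probability-sample unit borrows
# the observed y of the donor with the closest predicted mean, rather than
# using the model prediction itself
pred_np = model.predict(x_np)
pred_p = model.predict(x_p)
donors = np.abs(pred_p[:, None] - pred_np[None, :]).argmin(axis=1)
y_imputed = y_np[donors]

# Step 3: design-weighted estimate from the imputed values
print("mass-imputation estimate of E[y]:", np.average(y_imputed, weights=weights))
```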
21 pages, 2395 KiB  
Review
Big Data Analytics and Machine Learning in Supply Chain 4.0: A Literature Review
by Elena Barzizza, Nicolò Biasetton, Riccardo Ceccato and Luigi Salmaso
Stats 2023, 6(2), 596-616; https://doi.org/10.3390/stats6020038 - 05 May 2023
Cited by 1 | Viewed by 3039
Abstract
Owing to the development of the technologies of Industry 4.0, recent years have witnessed the emergence of a new concept of supply chain management, namely Supply Chain 4.0 (SC 4.0). Huge investments in information technology have enabled manufacturers to trace the intangible flow of information, but instruments are required to take advantage of the available data sources: big data analytics (BDA) and machine learning (ML) represent important tools for this task. The use of advanced technologies can improve supply chain performance and support reaching strategic goals, but their implementation is challenging in supply chain management. The aim of this study was to understand the main benefits, challenges, and areas of application of BDA and ML in SC 4.0, as well as the BDA and ML techniques most commonly used in the field, with a particular focus on nonparametric techniques. To this end, we carried out a literature review. From our analysis, we identified three main gaps: the need for appropriate analytical tools to manage challenging data configurations; the need for a more reliable link with practice; and the need for instruments to select the most suitable BDA or ML techniques. As a solution, we suggest and comment on two viable approaches: nonparametric statistics, and sentiment analysis combined with clustering. Full article
20 pages, 329 KiB  
Article
Game-Theoretic Models of Coopetition in Cournot Oligopoly
by Guennady Ougolnitsky and Alexey Korolev
Stats 2023, 6(2), 576-595; https://doi.org/10.3390/stats6020037 - 04 May 2023
Cited by 1 | Viewed by 1319
Abstract
Coopetition means that economic interactions feature both competition and cooperation at the same time. We built and investigated, analytically and numerically, game-theoretic models of coopetition in normal form and in characteristic-function form. The basic model in normal form reflects competition between firms in a Cournot oligopoly and their cooperation in mutually profitable activities such as marketing, R&D, and environmental protection. Each firm divides its resource between competition and cooperation. In the normal-form model we study Nash and Stackelberg settings and compare the results. In the cooperative setting we consider the Neumann–Morgenstern, Petrosyan–Zaccour, and Gromova–Petrosyan versions of the characteristic function and calculate the respective Shapley values. The payoffs in all cases are compared, and conclusions are drawn about the relative efficiency of different forms of organization for individual agents and for society as a whole. Full article
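For the competitive side of the model, the Cournot Nash equilibrium has a closed form under linear demand, and comparing it with the fully cooperative (cartel) benchmark shows the payoff gap that makes coopetition interesting. A minimal sketch for symmetric firms (not the paper's resource-splitting model):

```python
# Inverse demand P(Q) = a - b*Q, constant marginal cost c, n symmetric firms
a, b, c, n = 100.0, 1.0, 10.0, 3

def profit(q_i, q_others):
    Q = q_i + q_others
    return (a - b * Q - c) * q_i

# Nash equilibrium of the Cournot oligopoly (closed form, symmetric firms)
q_nash = (a - c) / (b * (n + 1))

# Fully cooperative benchmark: firms jointly produce the monopoly output
q_coop = (a - c) / (2 * b * n)        # each firm's equal share

print("Nash output per firm:  ", q_nash, " profit:", profit(q_nash, (n - 1) * q_nash))
print("Cartel output per firm:", q_coop, " profit:", profit(q_coop, (n - 1) * q_coop))
# Cooperation yields the higher per-firm profit, but each firm is tempted
# to deviate -- the tension the coopetition models formalize.
```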
24 pages, 732 KiB  
Article
Causal Inference in Threshold Regression and the Neural Network Extension (TRNN)
by Yiming Chen, Paul J. Smith and Mei-Ling Ting Lee
Stats 2023, 6(2), 552-575; https://doi.org/10.3390/stats6020036 - 28 Apr 2023
Viewed by 1581
Abstract
The first-hitting-time based model conceptualizes a random process for subjects’ latent health status. The time-to-event outcome is modeled as the first hitting time of the random process to a pre-specified threshold. Threshold regression with linear predictors has numerous benefits in causal survival analysis, such as the estimators’ collapsibility. We propose a neural network extension of the first-hitting-time based threshold regression model. With the flexibility of neural networks, the extended threshold regression model can efficiently capture complex relationships among predictors and underlying health processes while providing clinically meaningful interpretations, and also tackle the challenge of high-dimensional inputs. The proposed neural network extended threshold regression model can further be applied in causal survival analysis, such as performing as the Q-model in G-computation. More efficient causal estimations are expected given the algorithm’s robustness. Simulations were conducted to validate estimator collapsibility and threshold regression G-computation. The performance of the neural network extended threshold regression model is also illustrated by using simulated and real high-dimensional data from an observational study. Full article
(This article belongs to the Special Issue Statistical Learning for High-Dimensional Data)
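The first-hitting-time construction is easy to simulate: a latent Wiener health process drifts toward a threshold, and the crossing time is the event time. A sketch with a linear predictor standing in for the paper's neural network (all parameters hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)

def first_hitting_time(x0, drift, sigma, dt=0.01, t_max=100.0):
    """Simulate a latent Wiener health process starting at x0 > 0 and
    return the first time it crosses the threshold at zero."""
    x, t = x0, 0.0
    while t < t_max:
        x += drift * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
        if x <= 0.0:
            return t
    return np.inf                        # "cured": threshold never reached

# Subjects whose initial health x0 depends on a covariate; the paper's
# extension would replace this linear predictor with a neural network
z = rng.integers(0, 2, 300).astype(float)
x0 = 2.0 + 1.5 * z
times = np.array([first_hitting_time(h, drift=-0.5, sigma=1.0) for h in x0])

for g in (0.0, 1.0):                     # theory: E[T] = x0 / |drift|
    grp = times[z == g]
    print(f"z={g:.0f}: mean FHT {grp[np.isfinite(grp)].mean():.2f}, "
          f"theory {(2.0 + 1.5 * g) / 0.5:.2f}")
```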
13 pages, 354 KiB  
Article
Adaptations on the Use of p-Values for Statistical Inference: An Interpretation of Messages from Recent Public Discussions
by Eleni Verykouki and Christos T. Nakas
Stats 2023, 6(2), 539-551; https://doi.org/10.3390/stats6020035 - 25 Apr 2023
Cited by 1 | Viewed by 2351
Abstract
P-values have played a central role in the advancement of research in virtually all scientific fields; however, there has been significant controversy over their use. “The ASA president’s task force statement on statistical significance and replicability” has provided a solid basis for resolving the quarrel, but although the significance part is clearly dealt with, the replicability part raises further discussions. Given the clear statement regarding significance, in this article, we consider the validity of p-value use for statistical inference as de facto. We briefly review the bibliography regarding the relevant controversy in recent years and illustrate how already proposed approaches, or slight adaptations thereof, can be readily implemented to address both significance and reproducibility, adding credibility to empirical study findings. The definitions used for the notions of replicability and reproducibility are also clearly described. We argue that any p-value must be reported along with its corresponding s-value followed by (1 − α)% confidence intervals and the rejection replication index. Full article
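The s-value mentioned here is the Shannon surprisal of the p-value, s = −log2(p), read as bits of information against the null hypothesis. A one-line sketch:

```python
import math

def s_value(p):
    """Shannon surprisal: bits of information against the null conveyed by p."""
    return -math.log2(p)

for p in (0.05, 0.005, 0.25):
    print(f"p = {p:<6} ->  s = {s_value(p):.2f} bits")
# p = 0.05 carries only ~4.3 bits -- about as surprising as four heads
# in a row from a fair coin.
```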
13 pages, 519 KiB  
Article
Evaluation of Risk Prediction with Hierarchical Data: Dependency Adjusted Confidence Intervals for the AUC
by Camden Bay, Robert J. Glynn, Johanna M. Seddon, Mei-Ling Ting Lee and Bernard Rosner
Stats 2023, 6(2), 526-538; https://doi.org/10.3390/stats6020034 - 24 Apr 2023
Cited by 1 | Viewed by 1111
Abstract
The area under the true ROC curve (AUC) is routinely used to determine how strongly a given model discriminates between the levels of a binary outcome. Standard inference with the AUC requires that outcomes be independent of each other. To overcome this limitation, a method was developed for the estimation of the variance of the AUC in the setting of two-level hierarchical data using probit-transformed prediction scores generated from generalized estimating equation models, thereby allowing for the application of inferential methods. This manuscript presents an extension of this approach so that inference for the AUC may be performed in a three-level hierarchical data setting (e.g., eyes nested within persons and persons nested within families). A method that accounts for the effect of tied prediction scores on inference is also described. The performance of 95% confidence intervals around the AUC was assessed through the simulation of three-level clustered data in multiple settings, including ones with tied data and variable cluster sizes. Across all settings, the actual 95% confidence interval coverage varied from 0.943 to 0.958, and the ratio of the theoretical variance to the empirical variance of the AUC varied from 0.920 to 1.013. The results are better than those from existing methods. Two examples of applying the proposed methodology are presented. Full article
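The AUC point estimate itself is the (midrank-corrected) Mann-Whitney statistic; the paper's contribution is the clustered variance around it, which the naive estimate sketched below does not provide. A minimal sketch of the estimate, with tied prediction scores handled by midranks:

```python
import numpy as np
from scipy.stats import rankdata

def auc_mann_whitney(scores, labels):
    """AUC as the Mann-Whitney statistic; midranks handle tied scores."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    n1, n0 = labels.sum(), (1 - labels).sum()
    ranks = rankdata(scores)                      # average ranks for ties
    return (ranks[labels == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

scores = [0.2, 0.4, 0.4, 0.7, 0.8, 0.9]           # hypothetical prediction scores
labels = [0, 0, 1, 0, 1, 1]
print("AUC:", auc_mann_whitney(scores, labels))   # 0.833...
```

With hierarchical data (eyes within persons within families), the observations behind this statistic are correlated, which is exactly why the paper's dependency-adjusted variance is needed for valid confidence intervals.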
7 pages, 240 KiB  
Article
Recurring Errors in Studies of Gender Differences in Variability
by Theodore P. Hill and Rosalind Arden
Stats 2023, 6(2), 519-525; https://doi.org/10.3390/stats6020033 - 21 Apr 2023
Viewed by 3836
Abstract
The past quarter century has seen a resurgence of research on the controversial topic of gender differences in variability, in part because of its potential implications for the issue of under- and over-representation of various subpopulations of our society, with respect to different traits. Unfortunately, several basic statistical, inferential, and logical errors are being propagated in studies on this highly publicized topic. These errors include conflicting interpretations of the numerical significance of actual variance ratio values; a mistaken claim about variance ratios in mixtures of distributions; incorrect inferences from variance ratio values regarding the relative roles of sociocultural and biological factors; and faulty experimental designs. Most importantly, without knowledge of the underlying distributions, the standard variance ratio test statistic is shown to have no implications for tail ratios. The main aim of this note is to correct the scientific record and to illuminate several of these key errors in order to reduce their further propagation. For concreteness, the arguments will focus on one highly influential paper. Full article
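The paper's central statistical point — that a variance ratio alone implies nothing about tail ratios without knowledge of the underlying distributions — can be checked numerically: hold the variance ratio fixed and watch the tail ratio move with the distributional family. A small sketch:

```python
from scipy import stats

def tail_ratio(dist_a, dist_b, cut):
    """P(A > cut) / P(B > cut): relative representation in the upper tail."""
    return dist_a.sf(cut) / dist_b.sf(cut)

cut = 2.0                          # variance ratio fixed at 1.15**2 in both cases

# Case 1: both groups normal
a1, b1 = stats.norm(scale=1.15), stats.norm(scale=1.0)
# Case 2: both groups t(5), rescaled so the variance ratio is identical
s = (5 / 3) ** 0.5                 # standard deviation of t(5)
a2, b2 = stats.t(5, scale=1.15 / s), stats.t(5, scale=1.0 / s)

print("normal tail ratio:", round(tail_ratio(a1, b1, cut), 3))
print("t(5) tail ratio:  ", round(tail_ratio(a2, b2, cut), 3))
# Same variance ratio, different tail ratios: the VR test statistic alone
# cannot determine how over-represented the higher-variance group is in the tails.
```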
13 pages, 13506 KiB  
Article
Detecting Regional Differences in Italian Health Services during Five COVID-19 Waves
by Lucio Palazzo and Riccardo Ievoli
Stats 2023, 6(2), 506-518; https://doi.org/10.3390/stats6020032 - 15 Apr 2023
Cited by 1 | Viewed by 1217
Abstract
During the waves of the COVID-19 pandemic, both national and/or territorial healthcare systems have been severely stressed in many countries. The availability (and complexity) of data requires proper comparisons for understanding differences in the performance of health services. With this aim, we propose a methodological approach to compare the performance of the Italian healthcare system at the territorial level, i.e., considering NUTS 2 regions. Our approach consists of three steps: the choice of a distance measure between available time series, the application of weighted multidimensional scaling (wMDS) based on this distance, and, finally, a cluster analysis on the MDS coordinates. We separately consider daily time series regarding the deceased, intensive care units, and ordinary hospitalizations of patients affected by COVID-19. The proposed procedure identifies four clusters apart from two outlier regions. Changes between the waves at a regional level emerge from the main results, allowing the pressure on territorial health services to be mapped between 2020 and 2022. Full article
(This article belongs to the Special Issue Novel Semiparametric Methods)
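A compact sketch of the three-step pipeline on hypothetical data, with Euclidean distance and sklearn's unweighted MDS standing in for the paper's chosen distance measure and weighted MDS:

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)

# Hypothetical regional time series (20 regions x 700 daily values)
series = rng.normal(size=(20, 700)).cumsum(axis=1)

# Step 1: distance measure between the available time series
dist = np.linalg.norm(series[:, None, :] - series[None, :, :], axis=-1)

# Step 2: multidimensional scaling on the precomputed distances
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)

# Step 3: cluster analysis on the MDS coordinates
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(coords)
print(labels)          # cluster membership of each region
```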
11 pages, 303 KiB  
Article
Model Selection with Missing Data Embedded in Missing-at-Random Data
by Keiji Takai and Kenichi Hayashi
Stats 2023, 6(2), 495-505; https://doi.org/10.3390/stats6020031 - 11 Apr 2023
Viewed by 1156
Abstract
When models are built with missing data, an information criterion is needed to select the best model among the various candidates. Using a conventional information criterion for missing data may lead to the selection of the wrong model when data are not missing at random. Conventional information criteria implicitly assume that any subset of missing-at-random data is also missing at random, and thus the maximum likelihood estimator is assumed to be consistent; that is, it is assumed that the estimator will converge to the true value. However, this assumption may not be practical. In this paper, we develop an information criterion that works even for not-missing-at-random data, so long as the largest missing data set is missing at random. Simulations are performed to show the superiority of the proposed information criterion over conventional criteria. Full article
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)
13 pages, 635 KiB  
Article
The Network Bass Model with Behavioral Compartments
by Giovanni Modanese
Stats 2023, 6(2), 482-494; https://doi.org/10.3390/stats6020030 - 24 Mar 2023
Cited by 1 | Viewed by 1325
Abstract
A Bass diffusion model is defined on an arbitrary network, with the additional introduction of behavioral compartments, such that nodes can have different probabilities of receiving the information/innovation from the source and transmitting it to other nodes. The dynamics are described by a large system of non-linear ordinary differential equations, whose numerical solutions can be analyzed as functions of the diffusion parameters, the network parameters, and the relations between the compartments. For example, in a simple case with two compartments (Enthusiasts and Sceptics about the innovation), we consider cases in which the “publicity” and imitation terms act differently on the compartments, and individuals from one compartment do not imitate those of the other, thus increasing the polarization of the system and creating sectors of the population where adoption becomes very slow. For some categories of scale-free networks, we also investigate how the diffusion peak time and the time at which adoptions reach 90% of the population depend on the features of the networks. Full article
(This article belongs to the Section Econometric Modelling)
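In the mean-field (fully mixed) limit with no cross-compartment imitation, the compartmental Bass dynamics reduce to one ODE per compartment; the paper itself works on an arbitrary network, with one equation per node. A sketch with hypothetical Enthusiast/Sceptic parameters, including the 90%-adoption time the abstract mentions:

```python
import numpy as np
from scipy.integrate import solve_ivp

p = np.array([0.03, 0.005])   # publicity coefficients: Enthusiasts, Sceptics
q = np.array([0.40, 0.10])    # imitation coefficients, within compartment only
M = np.array([0.5, 0.5])      # compartment sizes as population fractions

def bass_rhs(t, A):
    # No cross terms: each compartment imitates only its own adopters --
    # the polarized case described in the abstract.
    return (p + q * A / M) * (M - A)

sol = solve_ivp(bass_rhs, (0.0, 80.0), y0=[0.0, 0.0], max_step=0.5)
total = sol.y.sum(axis=0)                     # cumulative adopters, both compartments
t90 = sol.t[np.searchsorted(total, 0.9)]      # approximate 90%-adoption time
print(f"adoption reaches 90% of the population at t ~ {t90:.1f}")
```

With these parameters the Sceptic compartment adopts far more slowly, which is the polarization effect the abstract describes.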