Stats, Volume 3, Issue 4 (December 2020) – 7 articles

Cover Story: The analysis of massive databases is a key issue for most applications today, and the use of parallel computing techniques is one of the suitable approaches for that. One way to perform statistical analyses over massive databases is to combine tools via the sparklyr package, which allows an R application to use Apache Spark as a framework. This paper presents an analysis of Brazilian public data from the Bolsa Família Programme (BFP—conditional cash transfer), comprising local processing of a large data set with 1.26 billion observations totalling more than 100 GB. Our goal was to understand how this social program acts in different cities, as well as to identify variables potentially important to the BFP utilization rate. The analysis was performed with random forests (RF) and indicated the high importance of variables such as family income, education, occupation, and density of people in the homes.
16 pages, 1949 KiB  
Article
A New Biased Estimator to Combat the Multicollinearity of the Gaussian Linear Regression Model
by Issam Dawoud and B. M. Golam Kibria
Stats 2020, 3(4), 526-541; https://doi.org/10.3390/stats3040033 - 06 Nov 2020
Cited by 20 | Viewed by 3249
Abstract
In a multiple linear regression model, the ordinary least squares estimator is inefficient when multicollinearity exists. Many authors have proposed different estimators to overcome the multicollinearity problem for linear regression models. This paper introduces a new regression estimator, called the Dawoud–Kibria estimator, as an alternative to the ordinary least squares estimator. Theory and simulation results show that this estimator performs better than other regression estimators under some conditions, according to the mean squared error criterion. Real-life datasets are used to illustrate the findings of the paper.
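The Dawoud–Kibria estimator itself is defined in the paper; as a generic illustration of how a biased estimator trades a little bias for a large variance reduction under multicollinearity, here is a minimal NumPy sketch using the classical ridge estimator (not the Dawoud–Kibria estimator; the simulated data and the shrinkage constant k are made up for illustration):

```python
import numpy as np

# Two nearly collinear predictors: X'X is close to singular, so the
# OLS solution (X'X)^{-1} X'y is numerically unstable.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)  # x2 is almost a copy of x1
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# A classical biased alternative (ridge): add k*I before inverting.
# The bias it introduces buys a large variance reduction, which is the
# mean-squared-error trade-off the Dawoud-Kibria estimator also targets.
k = 1.0
beta_ridge = np.linalg.solve(X.T @ X + k * np.eye(2), X.T @ y)

print("OLS:  ", beta_ols)
print("ridge:", beta_ridge)
```

With near-collinear columns, the OLS coefficients can split into a large positive/negative pair while their sum stays well identified; the ridge solution pulls both coefficients back toward comparable, stable values.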
16 pages, 979 KiB  
Article
On the Number of Independent Pieces of Information in a Functional Linear Model with a Scalar Response
by Eduardo L. Montoya
Stats 2020, 3(4), 510-525; https://doi.org/10.3390/stats3040032 - 05 Nov 2020
Viewed by 1660
Abstract
In a functional linear model (FLM) with a scalar response, the parameter curve quantifies the relationship between a functional explanatory variable and a scalar response. While these models can be ill-posed, a penalized regression spline approach may be used to obtain an estimate of the parameter curve. The penalized regression spline estimate depends on the value of a smoothing parameter. However, the ability to obtain a reasonable parameter curve estimate relies on how much information is present in the covariate functions for estimating the parameter curve. We propose to quantify the information present in the covariate functions for estimating the parameter curve. In addition, we examine the influence of this information on the stability of the parameter curve estimator and on the performance of smoothing parameter selection methods in an FLM with a scalar response.
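As background for the penalized-spline machinery the abstract refers to, the following is a minimal NumPy sketch of penalized basis regression with a second-difference penalty. The truncated-power basis, knot placement, and smoothing parameter value are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Toy data: a smooth signal plus noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)

# Truncated-power spline basis (a simple stand-in for the B-spline
# bases typically used in penalized regression splines).
knots = np.linspace(0.1, 0.9, 8)
B = np.column_stack([np.ones_like(x), x] + [np.maximum(x - t, 0.0) for t in knots])

# Second-difference penalty matrix D and the penalized solution
#   beta = (B'B + lam * D'D)^{-1} B'y,
# where lam plays the role of the smoothing parameter whose selection
# the paper studies.
m = B.shape[1]
D = np.diff(np.eye(m), n=2, axis=0)
lam = 0.1
beta = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
fit = B @ beta
print("training MSE:", np.mean((fit - y) ** 2))
```

Varying lam traces the fit from wiggly interpolation (lam near 0) to a heavily smoothed curve (large lam), which is why data-driven smoothing parameter selection matters.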
26 pages, 491 KiB  
Article
Model Free Inference on Multivariate Time Series with Conditional Correlations
by Dimitrios Thomakos, Johannes Klepsch and Dimitris N. Politis
Stats 2020, 3(4), 484-509; https://doi.org/10.3390/stats3040031 - 03 Nov 2020
Cited by 1 | Viewed by 2428
Abstract
New results on volatility modeling and forecasting are presented based on the NoVaS transformation approach. Our main contribution is that we extend the NoVaS methodology to modeling and forecasting conditional correlation, thus allowing NoVaS to work in a multivariate setting as well. We present exact results on the use of univariate transformations and on their combination for joint modeling of the conditional correlations: we show how the NoVaS transformed series can be combined and the likelihood function of the product can be expressed explicitly, thus allowing for optimization and correlation modeling. While this keeps the original “model-free” spirit of NoVaS, it also makes the new multivariate NoVaS approach for correlations “semi-parametric”, which is why we introduce an alternative using cross-validation. We also present a number of auxiliary results regarding the empirical implementation of NoVaS based on different criteria for distributional matching. We illustrate our findings using simulated and real-world data, and evaluate our methodology in the context of portfolio management.
(This article belongs to the Special Issue Time Series Analysis and Forecasting)
9 pages, 286 KiB  
Article
A Note on the Nonparametric Estimation of the Conditional Mode by Wavelet Methods
by Salim Bouzebda and Christophe Chesneau
Stats 2020, 3(4), 475-483; https://doi.org/10.3390/stats3040030 - 31 Oct 2020
Cited by 3 | Viewed by 1637
Abstract
The purpose of this note is to introduce and investigate the nonparametric estimation of the conditional mode using wavelet methods. We propose a new linear wavelet estimator for this problem. The estimator is constructed by combining a specific ratio technique and an established wavelet estimation method. We obtain rates of almost sure convergence over compact subsets of ℝ^d. A general estimator beyond the wavelet methodology is also proposed, discussing adaptivity within this statistical framework.
10 pages, 230 KiB  
Article
Psychometric Properties of the Adult Self-Report: Data from over 11,000 American Adults
by Michelle Guerrero, Matt Hoffmann and Laura Pulkki-Råback
Stats 2020, 3(4), 465-474; https://doi.org/10.3390/stats3040029 - 29 Oct 2020
Cited by 9 | Viewed by 3468
Abstract
The first purpose of this study was to examine the factor structure of the Adult Self-Report (ASR) via traditional confirmatory factor analysis (CFA) and contemporary exploratory structural equation modeling (ESEM). The second purpose was to examine the measurement invariance of the ASR subscales across age groups. We used baseline data from the Adolescent Brain Cognitive Development study. ASR data from 11,773 participants were used to conduct the CFA and ESEM analyses, and data from 11,678 participants were used to conduct measurement invariance testing. Fit indices supported both the CFA and ESEM solutions, with the ESEM solution yielding better fit indices. However, several items in the ESEM solution did not sufficiently load on their intended factors and/or cross-loaded on unintended factors. Results from the measurement invariance analysis suggested that the ASR subscales are robust and fully invariant across subgroups of adults formed on the basis of age (18–35 years vs. 36–59 years). Future research should use both CFA and ESEM to provide a more comprehensive assessment of the ASR.
(This article belongs to the Special Issue Statistics in Epidemiology)
21 pages, 889 KiB  
Article
Local Processing of Massive Databases with R: A National Analysis of a Brazilian Social Programme
by Hellen Paz, Mateus Maia, Fernando Moraes, Ricardo Lustosa, Lilia Costa, Samuel Macêdo, Marcos E. Barreto and Anderson Ara
Stats 2020, 3(4), 444-464; https://doi.org/10.3390/stats3040028 - 19 Oct 2020
Cited by 4 | Viewed by 3742
Abstract
The analysis of massive databases is a key issue for most applications today, and the use of parallel computing techniques is one of the suitable approaches for that. Apache Spark is a widely employed tool within this context, aiming at processing large amounts of data in a distributed way. For the Statistics community, R is one of the preferred tools. Despite its growth in recent years, it still has limitations for processing large volumes of data on single local machines. In general, the data analysis community has difficulty handling massive amounts of data on local machines, often requiring high-performance computing servers. One way to perform statistical analyses over massive databases is to combine both tools (Spark and R) via the sparklyr package, which allows an R application to use Spark. This paper presents an analysis of Brazilian public data from the Bolsa Família Programme (BFP—conditional cash transfer), comprising a large data set with 1.26 billion observations. Our goal was to understand how this social program acts in different cities, as well as to identify potentially important variables reflecting its utilization rate. Statistical modeling was performed using random forests to predict the utilization rate of BFP. Variable selection was performed through a recent method based on the importance and interpretation of variables in the random forest model. Among the 89 variables initially considered, the final model, with 17 selected variables, presented high predictive capacity and indicated the high importance of variables related to income, education, job informality, and inactive youth, namely: family income, education, occupation, and density of people in the homes. In this work, using a local machine, we highlight the potential of combining Spark and R for the analysis of a large (111.6 GB) database. This can serve as a proof of concept or reference for other similar work within the Statistics community, and our case study can provide important evidence for further analysis of this important social support programme.
(This article belongs to the Section Data Science)
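The paper's pipeline is written in R with sparklyr; as a language-neutral sketch of the underlying split/aggregate idea that makes larger-than-memory analysis possible, here is a toy streaming group mean in Python (the function name, city names, and rates are all hypothetical, not taken from the BFP data):

```python
from collections import defaultdict

def mean_by_group(records):
    """Single streaming pass keeping only per-group running sums and
    counts, so the full data set never has to fit in memory -- the
    split/aggregate pattern that Spark distributes across executors."""
    sums, counts = defaultdict(float), defaultdict(int)
    for group, value in records:
        sums[group] += value
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sums}

# Hypothetical stand-in for per-municipality BFP utilization records.
records = [("Salvador", 0.8), ("Recife", 0.5), ("Salvador", 0.6), ("Recife", 0.7)]
print(mean_by_group(records))
```

Because only one (sum, count) pair per group is retained, memory use grows with the number of cities, not with the 1.26 billion observations.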
17 pages, 454 KiB  
Article
Identification of Judicial Outcomes in Judgments: A Generalized Gini-PLS Approach
by Gildas Tagny-Ngompé, Stéphane Mussard, Guillaume Zambrano, Sébastien Harispe and Jacky Montmain
Stats 2020, 3(4), 427-443; https://doi.org/10.3390/stats3040027 - 27 Sep 2020
Cited by 2 | Viewed by 2344
Abstract
This paper presents and compares several text classification models that can be used to extract the outcome of a judgment from justice decisions, i.e., legal documents summarizing the different rulings made by a judge. Such models can be used to gather important statistics about cases, e.g., success rates based on specific characteristics of cases’ parties or jurisdiction, and are therefore important for the development of judicial prediction, not to mention the study of law enforcement in general. We propose in particular the generalized Gini-PLS, which better accounts for the information in the distribution tails while attenuating, as in the simple Gini-PLS, the influence exerted by outliers. Modeling the studied task as supervised binary classification, we also introduce the LOGIT-Gini-PLS, suited to the explanation of a binary target variable. In addition, various technical aspects of the evaluated text classification approaches, which consist of combinations of representations of judgments and classification algorithms, are studied using an annotated corpus of French justice decisions.
(This article belongs to the Special Issue Interdisciplinary Research on Predictive Justice)