Statistical Methods in Data Science and Applications

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Probability and Statistics".

Deadline for manuscript submissions: closed (30 November 2023) | Viewed by 22502

Printed Edition Available!
A printed edition of this Special Issue is available.

Special Issue Editors


Prof. Dr. Niansheng Tang
Guest Editor
School of Mathematics and Statistics, Yunnan University, Kunming 650091, China
Interests: Bayesian statistics; empirical likelihood; nonparametric inference; high-dimensional data analysis; longitudinal data analysis; quantile regression; structural equation models

Prof. Dr. Shen-Ming Lee
Guest Editor
Department of Statistics, Feng Chia University, No. 100, Wenhwa Road, Seatwen, Taichung 40724, Taiwan
Interests: missing data regression; statistical methodology for zero-inflated count data; randomized response design and analysis

Special Issue Information

Dear Colleagues,

With the development of big data, the importance of data science is growing rapidly, leading to a large number of studies in various disciplines, such as mathematics, statistics, computer science, and artificial intelligence. Data science requires modeling, computation, and learning to convert data into information, information into knowledge, and knowledge into decisions. Due to the complexity of big data (missing values, high and ultrahigh dimensionality, response-dependent sampling, time series structure, distributed storage, etc.), existing big data analysis theories, methods, and algorithms face huge challenges, particularly the basic statistical theory, methodology, and algorithms for estimation, hypothesis testing, confidence intervals, and variable selection, in both frequentist and Bayesian frameworks.

Prof. Dr. Niansheng Tang
Prof. Dr. Shen-Ming Lee
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data
  • missing data
  • high-dimensional data
  • ultrahigh-dimensional data
  • response-dependent data
  • survival data
  • longitudinal data
  • censored data
  • time series data
  • machine learning
  • meta learning
  • deep learning
  • online learning
  • reinforcement learning
  • knowledge graph
  • distributed inference
  • Bayesian inference
  • variational Bayesian learning
  • supervised learning
  • semi-supervised learning
  • resampling-based inference
  • estimation
  • variable selection
  • hypothesis test
  • confidence interval
  • computation
  • optimization algorithm
  • data clustering
  • data classification
  • data mining

Published Papers (15 papers)


Research


14 pages, 318 KiB  
Article
Generalized Linear Models with Covariate Measurement Error and Zero-Inflated Surrogates
by Ching-Yun Wang, Jean de Dieu Tapsoba, Catherine Duggan and Anne McTiernan
Mathematics 2024, 12(2), 309; https://doi.org/10.3390/math12020309 - 17 Jan 2024
Viewed by 618
Abstract
Epidemiological studies often encounter a challenge due to exposure measurement error when estimating an exposure–disease association. A surrogate variable may be available for the true unobserved exposure variable. However, zero-inflated data are encountered frequently in the surrogate variables. For example, many nutrient or physical activity measures may have a zero value (or a low detectable value) among a group of individuals. In this paper, we investigate regression analysis when the observed surrogates may have zero values among some individuals of the whole study cohort. A naive regression calibration that does not take into account the probability mass of the surrogate variable at 0 (or at a low detectable value) will be biased. We develop a regression calibration estimator which typically has smaller biases than the naive regression calibration estimator, and we propose an expected estimating equation estimator which is consistent under the zero-inflated surrogate regression model. Extensive simulations show that the proposed estimator performs well in terms of bias correction. These methods are applied to a physical activity intervention study. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)
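
To make the calibration idea concrete, the base-R sketch below contrasts a naive single-equation regression calibration with a simple two-part calibration that treats the zero surrogate values separately. The simulated data, the validation subsample, and all variable names are illustrative assumptions; the sketch is not the authors' expected estimating equation estimator.

    ## Hypothetical illustration: naive vs. two-part regression calibration with a
    ## zero-inflated surrogate, assuming a validation subsample with the true exposure.
    set.seed(1)
    n <- 2000
    x <- rgamma(n, shape = 2, rate = 2)                      # true exposure
    w <- ifelse(rbinom(n, 1, 0.3) == 1, 0,                   # zero-inflated surrogate
                pmax(x + rnorm(n, sd = 0.4), 0))
    y <- rbinom(n, 1, plogis(-1 + 0.8 * x))                  # binary disease outcome
    dat  <- data.frame(x = x, w = w, y = y)
    vdat <- dat[sample(n, 300), ]                            # validation subsample

    ## Naive calibration: one linear model E[X | W], ignoring the point mass at zero
    dat$x_naive <- predict(lm(x ~ w, data = vdat), newdata = dat)

    ## Two-part calibration: separate calibration for w == 0 and w > 0
    fit_pos <- lm(x ~ w, data = subset(vdat, w > 0))
    mu_zero <- mean(vdat$x[vdat$w == 0])
    dat$x_twopart <- ifelse(dat$w > 0, predict(fit_pos, newdata = dat), mu_zero)

    ## Outcome models based on the two calibrated exposures
    coef(glm(y ~ x_naive,   family = binomial, data = dat))
    coef(glm(y ~ x_twopart, family = binomial, data = dat))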

16 pages, 1634 KiB  
Article
DINA Model with Entropy Penalization
by Juntao Wang and Yuan Li
Mathematics 2023, 11(18), 3993; https://doi.org/10.3390/math11183993 - 20 Sep 2023
Viewed by 634
Abstract
The cognitive diagnosis model (CDM) is an effective statistical tool for extracting the discrete attributes of individuals based on their responses to diagnostic tests. When dealing with cases that involve small sample sizes or highly correlated attributes, not all attribute profiles may be present. The standard method, which accounts for all attribute profiles, not only increases the complexity of the model but also complicates the calculation. Thus, it is important to identify the empty attribute profiles. This paper proposes an entropy-penalized likelihood method to eliminate the empty attribute profiles. In addition, the relation between attribute profiles and the parameter space of item parameters is discussed, and two modified expectation–maximization (EM) algorithms are designed to estimate the model parameters. Simulations are conducted to demonstrate the performance of the proposed method, and a real data application based on the fraction–subtraction data is presented to showcase the practical implications of the proposed method. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)
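
As a rough sketch of how an entropy penalty can empty out unused attribute profiles, one plausible form of the penalized log-likelihood (the paper's exact penalty may differ) is

    \ell_\lambda(\boldsymbol{\theta}, \boldsymbol{\pi})
      = \sum_{i=1}^{N} \log\Big( \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J}
          P_j(\boldsymbol{\alpha}_c)^{y_{ij}} \big[1 - P_j(\boldsymbol{\alpha}_c)\big]^{1-y_{ij}} \Big)
        + \lambda \sum_{c=1}^{C} \pi_c \log \pi_c ,

where \pi_c is the mixing proportion of attribute profile \alpha_c and P_j(\alpha_c) is the DINA probability of a correct response to item j. The added term equals minus \lambda times the entropy of \pi, so maximizing \ell_\lambda favors sparse mixing proportions and pushes the weights of empty attribute profiles toward zero.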

23 pages, 668 KiB  
Article
Ordinal Time Series Analysis with the R Package otsfeatures
by Ángel López-Oriona and José A. Vilar
Mathematics 2023, 11(11), 2565; https://doi.org/10.3390/math11112565 - 03 Jun 2023
Cited by 2 | Viewed by 1272
Abstract
The 21st century has witnessed a growing interest in the analysis of time series data. While most of the literature on the topic deals with real-valued time series, ordinal time series have typically received much less attention. However, the development of specific analytical tools for the latter objects has substantially increased in recent years. The R package otsfeatures attempts to provide a set of simple functions for analyzing ordinal time series. In particular, several commands allowing the extraction of well-known statistical features and the execution of inferential tasks are available for the user. The output of several functions can be employed to perform traditional machine learning tasks including clustering, classification, or outlier detection. otsfeatures also incorporates two datasets of financial time series which were used in the literature for clustering purposes, as well as three interesting synthetic databases. The main properties of the package are described and its use is illustrated through several examples. Researchers from a broad variety of disciplines could benefit from the powerful tools provided by otsfeatures. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)
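
For a flavor of the kind of features such a package extracts, the base-R sketch below computes marginal and lag-1 cumulative proportions for a toy ordinal series. It deliberately avoids the otsfeatures API, whose actual function names and signatures may differ.

    ## Two simple ordinal time-series features computed from scratch (illustration only).
    set.seed(1)
    states <- 1:4                                  # ordered categories 1 < 2 < 3 < 4
    x <- sample(states, 500, replace = TRUE,       # toy ordinal series
                prob = c(0.1, 0.3, 0.4, 0.2))

    ## Marginal cumulative proportions  F(k) = P(X_t <= k)
    F_hat <- cumsum(table(factor(x, levels = states))) / length(x)

    ## Lag-1 cumulative joint proportions  F2(k, l) = P(X_t <= k, X_{t+1} <= l)
    F2_hat <- outer(states, states,
                    Vectorize(function(k, l) mean(x[-length(x)] <= k & x[-1] <= l)))
    F_hat
    F2_hat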

21 pages, 743 KiB  
Article
Optimal Model Averaging Estimation for the Varying-Coefficient Partially Linear Models with Missing Responses
by Jie Zeng, Weihu Cheng and Guozhi Hu
Mathematics 2023, 11(8), 1883; https://doi.org/10.3390/math11081883 - 16 Apr 2023
Viewed by 1180
Abstract
In this paper, we propose a model averaging estimation for the varying-coefficient partially linear models with missing responses. Within this context, we construct a HRCp weight choice criterion that exhibits asymptotic optimality under certain assumptions. Our model averaging procedure can simultaneously address the uncertainty on which covariates to include and the uncertainty on whether a covariate should enter the linear or nonlinear component of the model. The simulation results in comparison with some related strategies strongly favor our proposal. A real dataset is analyzed to illustrate the practical application as well. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)

12 pages, 251 KiB  
Article
Logistic Regression Based on Individual-Level Predictors and Aggregate-Level Responses
by Zheng Xu
Mathematics 2023, 11(3), 746; https://doi.org/10.3390/math11030746 - 02 Feb 2023
Cited by 2 | Viewed by 1541
Abstract
We propose estimation methods to conduct logistic regression based on individual-level predictors and aggregate-level responses. We derive the likelihood of logistic models in this situation and propose estimators with different optimization methods. Simulation studies have been conducted to evaluate and compare the performance of the different estimators. A real-data-based study has been conducted to illustrate the use of our estimators and to compare them. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)
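
A minimal sketch of the likelihood idea follows, under the assumption that each group's total number of successes follows the Poisson-binomial distribution induced by individual-level logistic probabilities. It is a generic maximum-likelihood illustration in base R, not necessarily the authors' specific estimators.

    ## Illustrative maximum likelihood for logistic regression when only group totals
    ## of successes are observed (assumed Poisson-binomial group-total likelihood).
    pois_binom_logpmf <- function(s, p) {
      f <- 1                                                  # pmf of the running total
      for (pr in p) f <- c(f * (1 - pr), 0) + c(0, f * pr)    # add one Bernoulli at a time
      log(f[s + 1])
    }

    neg_loglik <- function(beta, X_list, s_vec)
      -sum(mapply(function(X, s) pois_binom_logpmf(s, plogis(drop(X %*% beta))),
                  X_list, s_vec))

    ## Toy data: 40 groups of 25 individuals; covariates are individual-level,
    ## but only the per-group count of positive responses is recorded.
    set.seed(1)
    beta_true <- c(-0.5, 1.0)
    X_list <- replicate(40, cbind(1, rnorm(25)), simplify = FALSE)
    s_vec  <- sapply(X_list, function(X) sum(rbinom(25, 1, plogis(drop(X %*% beta_true)))))
    optim(c(0, 0), neg_loglik, X_list = X_list, s_vec = s_vec)$par   # roughly beta_true
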
18 pages, 336 KiB  
Article
A Flexible Method for Diagnostic Accuracy with Biomarker Measurement Error
by Ching-Yun Wang and Ziding Feng
Mathematics 2023, 11(3), 549; https://doi.org/10.3390/math11030549 - 19 Jan 2023
Cited by 1 | Viewed by 1170
Abstract
Diagnostic biomarkers are often measured with errors due to imperfect lab conditions or analytic variability of the assay. The ability of a diagnostic biomarker to discriminate between cases and controls is often measured by the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity, among others. Ignoring measurement error can cause biased estimation of a diagnostic accuracy measure, which results in misleading interpretation of the efficacy of a diagnostic biomarker. Existing assays are either research grade or clinical grade. Research assays are cost-effective and often multiplex, but they may be associated with moderate measurement errors, leading to poorer diagnostic performance. In comparison, clinical assays may provide better diagnostic ability, but with higher cost since they are usually developed by industry. Correction-for-attenuation methods are often valid when biomarkers are from a normal distribution, but may be biased with skewed biomarkers. In this paper, we develop a flexible method based on skew–normal biomarker distributions to correct for bias in estimating diagnostic performance measures including AUC, sensitivity, and specificity. Finite sample performance of the proposed method is examined via extensive simulation studies. The methods are applied to a pancreatic cancer biomarker study. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)
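
As a simple point of reference for the attenuation that the paper corrects, the normal-theory special case (classical measurement error added to normal biomarkers) has a closed form; the display below is that textbook case, not the paper's skew-normal correction.

    \mathrm{AUC}_{\mathrm{true}} = \Phi\!\left(\frac{\mu_1 - \mu_0}{\sqrt{\sigma_0^2 + \sigma_1^2}}\right),
    \qquad
    \mathrm{AUC}_{\mathrm{obs}} = \Phi\!\left(\frac{\mu_1 - \mu_0}{\sqrt{\sigma_0^2 + \sigma_1^2 + 2\sigma_u^2}}\right),

where \mu_0, \sigma_0^2 and \mu_1, \sigma_1^2 are the control and case biomarker means and variances and \sigma_u^2 is the measurement error variance in both groups. The observed AUC is attenuated toward 0.5, and the skew-normal model in the paper relaxes the normality assumption behind this formula.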

16 pages, 277 KiB  
Article
A Proposed Simulation Technique for Population Stability Testing in Credit Risk Scorecards
by Johan du Pisanie, James Samuel Allison and Jaco Visagie
Mathematics 2023, 11(2), 492; https://doi.org/10.3390/math11020492 - 16 Jan 2023
Cited by 2 | Viewed by 1564
Abstract
Credit risk scorecards are logistic regression models, fitted to large and complex data sets, employed by the financial industry to model the probability of default of potential customers. In order to ensure that a scorecard remains a representative model of the population, one tests the hypothesis of population stability, which specifies that the distribution of customers’ attributes remains constant over time. Simulating realistic data sets for this purpose is nontrivial, as these data sets are multivariate and contain intricate dependencies. The simulation of these data sets is of practical interest for both practitioners and researchers; practitioners may wish to consider the effect that a specified change in the properties of the data has on the scorecard and its usefulness from a business perspective, while researchers may wish to test a newly developed technique in credit scoring. We propose a simulation technique based on the specification of bad ratios, as explained below. Practitioners can generally not be expected to provide realistic parameter values for a scorecard; these models are simply too complex and contain too many parameters to make such a specification viable. However, practitioners can often confidently specify the bad ratio associated with two different levels of a specific attribute. That is, practitioners are often comfortable with making statements such as “on average a new customer is 1.5 times as likely to default as an existing customer with similar attributes”. We propose a method which can be used to obtain parameter values for a scorecard based on specified bad ratios. The proposed technique is demonstrated using a realistic example, and we show that the simulated data sets adhere closely to the specified bad ratios. The paper provides a link to a GitHub project with the R code used to generate the results. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)
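
As one hedged illustration of how a specified bad ratio can be turned into scorecard parameters, the base-R sketch below handles a single binary attribute, treating the bad ratio as a ratio of default probabilities and assuming a baseline default rate. The paper's procedure handles many attributes jointly and should be consulted for the exact construction.

    ## Hypothetical single-attribute example: from a specified bad ratio to logistic
    ## parameters, then simulation of default flags. The baseline rate is an assumption.
    set.seed(1)
    p_base    <- 0.04                      # assumed default rate for existing customers
    bad_ratio <- 1.5                       # "new customers are 1.5x as likely to default"
    p_new     <- bad_ratio * p_base

    beta0 <- qlogis(p_base)                          # intercept
    beta1 <- qlogis(p_new) - qlogis(p_base)          # coefficient of the "new customer" flag

    n   <- 1e5
    new <- rbinom(n, 1, 0.3)                         # 30% of applicants are new customers
    def <- rbinom(n, 1, plogis(beta0 + beta1 * new)) # simulated default indicator

    ## Check that the simulated data reproduce the specified bad ratio
    mean(def[new == 1]) / mean(def[new == 0])
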
21 pages, 361 KiB  
Article
Prediction of a Sensitive Feature under Indirect Questioning via Warner’s Randomized Response Technique and Latent Class Model
by Shen-Ming Lee, Phuoc-Loc Tran, Truong-Nhat Le and Chin-Shang Li
Mathematics 2023, 11(2), 345; https://doi.org/10.3390/math11020345 - 09 Jan 2023
Cited by 2 | Viewed by 1220
Abstract
We investigate the association of a sensitive characteristic or latent variable with observed binary random variables by the randomized response (RR) technique of Warner (Warner, S.L. J. Am. Stat. Assoc. 1965, 60, 63–69) and a latent class model. First, an expectation-maximization (EM) algorithm is provided to easily estimate the parameters of the null and alternative/full models for the association between a sensitive characteristic and an observed categorical random variable under Warner's RR design. The likelihood ratio test (LRT) is utilized to identify observed categorical random variables that are significantly related to the sensitive trait. Another EM algorithm is then presented to estimate the parameters of a latent class model constructed through the sensitive attribute and the observed binary random variables that are obtained by dichotomizing the observed categorical random variables selected by the above LRT. Finally, two classification criteria are used to classify an individual into the sensitive or non-sensitive group. The practicality of the proposed methodology is illustrated with an actual data set from a survey study of the sexuality of first-year students, excluding international students, at Feng Chia University in Taiwan in 2016. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)
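
For context, the classical Warner (1965) point estimator underlying the RR design fits in a few lines of R. This is the standard textbook estimator, not the EM-based latent class machinery developed in the paper.

    ## Warner's randomized-response estimator: p is the known probability that the
    ## randomizing device selects the sensitive statement; lambda_hat is the "yes" rate.
    warner_pi_hat <- function(n_yes, n, p) {
      stopifnot(p != 0.5)
      lambda_hat <- n_yes / n
      pi_hat  <- (lambda_hat - (1 - p)) / (2 * p - 1)         # estimated sensitive proportion
      var_hat <- lambda_hat * (1 - lambda_hat) / (n * (2 * p - 1)^2)
      c(pi_hat = pi_hat, se = sqrt(var_hat))
    }
    warner_pi_hat(n_yes = 180, n = 500, p = 0.7)
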
26 pages, 562 KiB  
Article
Non-Parametric Non-Inferiority Assessment in a Three-Arm Trial with Non-Ignorable Missing Data
by Wei Li, Yunqi Zhang and Niansheng Tang
Mathematics 2023, 11(1), 246; https://doi.org/10.3390/math11010246 - 03 Jan 2023
Viewed by 1383
Abstract
A three-arm non-inferiority trial including a placebo is usually utilized to assess the non-inferiority of an experimental treatment to a reference treatment. Existing methods for assessing non-inferiority mainly focus on fully observed endpoints. However, in some clinical trials, treatment endpoints may be subject to missingness for various reasons, such as the refusal of subjects or their migration. To address this issue, this paper aims to develop a non-parametric approach to assess the non-inferiority of an experimental treatment to a reference treatment in a three-arm trial with non-ignorable missing endpoints. A logistic regression is adopted to specify a non-ignorable missing data mechanism. A semi-parametric imputation method is proposed to estimate parameters in the considered logistic regression. Inverse probability weighting, augmented inverse probability weighting and non-parametric methods are developed to estimate treatment efficacy for known and unknown parameters in the considered logistic regression. Under some regularity conditions, we show asymptotic normality of the constructed estimators for treatment efficacy. A bootstrap resampling method is presented to estimate asymptotic variances of the estimated treatment efficacy. Three Wald-type statistics are constructed to test the non-inferiority based on the asymptotic properties of the estimated treatment efficacy. Empirical studies show that the proposed Wald-type test procedure is robust to misspecification of the missing data mechanism, and behaves better than the complete-case method in the sense that its type I error rates are closer to the pre-given significance level. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)
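
The inverse probability weighting building block can be illustrated with a toy R sketch. For brevity, the missingness below depends only on an observed covariate (i.e., it is ignorable), whereas the paper's estimators additionally handle non-ignorable missingness and augmentation.

    ## Illustrative IPW mean of a partially missing endpoint (ignorable case only).
    set.seed(1)
    n <- 1000
    z <- rnorm(n)                                   # baseline covariate
    y <- 1 + 0.5 * z + rnorm(n)                     # endpoint
    r <- rbinom(n, 1, plogis(0.5 + 0.8 * z))        # observation indicator
    y_obs <- ifelse(r == 1, y, NA)

    ps <- fitted(glm(r ~ z, family = binomial))     # estimated observation probabilities
    ipw_mean <- sum(r * y_obs / ps, na.rm = TRUE) / sum(r / ps)
    c(complete_case = mean(y_obs, na.rm = TRUE), ipw = ipw_mean, truth = mean(y))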

13 pages, 274 KiB  
Article
Estimation Curve of Mixed Spline Truncated and Fourier Series Estimator for Geographically Weighted Nonparametric Regression
by Lilis Laome, I Nyoman Budiantara and Vita Ratnasari
Mathematics 2023, 11(1), 152; https://doi.org/10.3390/math11010152 - 28 Dec 2022
Cited by 1 | Viewed by 1137
Abstract
Geographically Weighted Regression (GWR) is an extension of multiple linear regression models for spatial data. The assumption of spatial heterogeneity means that each location has different characteristics and allows the relationships between the response variable and each predictor variable to be unknown; hence, nonparametric regression becomes one of the alternatives that can be used. In addition, regression functions are not always the same across predictor variables. This study uses the Geographically Weighted Nonparametric Regression (GWNR) model with a mixed estimator of truncated spline and Fourier series. Both estimators are expected to overcome unknown data patterns in spatial data. The mixed GWNR model estimator is then determined using the Weighted Maximum Likelihood Estimator (WMLE) technique, and the estimator's properties are derived. The results show that the mixed GWNR model estimator is unbiased and linear in the response variable y. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)
25 pages, 5077 KiB  
Article
Application of an Empirical Best Linear Unbiased Prediction Fay–Herriot (EBLUP-FH) Multivariate Method with Cluster Information to Estimate Average Household Expenditure
by Armalia Desiyanti, Irlandia Ginanjar and Toni Toharudin
Mathematics 2023, 11(1), 135; https://doi.org/10.3390/math11010135 - 27 Dec 2022
Cited by 2 | Viewed by 1151
Abstract
Data at a smaller regional level has now become a necessity for local governments. The average data on household expenditure on food and non-food is designed for provincial and district/city estimation levels; subdistrict-level statistics are not currently available. Small area estimation (SAE) is one method to address the problem. The multivariate Empirical Best Linear Unbiased Prediction Fay-Herriot (EBLUP-FH) method is used to estimate the average household expenditure on food and non-food at the sub-district level in Central Java Province in 2020. Meanwhile, for the sub-districts that are not sampled, the average household expenditure is estimated by adding cluster information to the multivariate EBLUP-FH modeling. The k-medoids clustering method is used to classify sub-districts based on their characteristics. Small area estimation using the multivariate EBLUP-FH method can enhance the parameter estimates obtained using the direct estimation method because it results in a lower level of variation, as measured by the residual standard error (RSE). For sub-districts that are not sampled, the RSE value from the estimates obtained using the multivariate EBLUP-FH method with cluster information is lower than 25%, indicating that the estimates are accurate. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)
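
For readers unfamiliar with the framework, the univariate Fay-Herriot model and its EBLUP, of which the paper applies a multivariate extension with cluster information, can be written as

    \hat{\theta}_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i + e_i, \qquad
    v_i \sim N(0, \sigma_v^2), \quad e_i \sim N(0, \psi_i),

    \tilde{\theta}_i^{\mathrm{EBLUP}} = \hat{\gamma}_i \hat{\theta}_i
      + (1 - \hat{\gamma}_i)\,\mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}}, \qquad
    \hat{\gamma}_i = \frac{\hat{\sigma}_v^2}{\hat{\sigma}_v^2 + \psi_i},

where \hat{\theta}_i is the direct estimate for small area i, \psi_i its known sampling variance, and \mathbf{x}_i the auxiliary covariates. Areas with noisy direct estimates are shrunk more strongly toward the regression-synthetic component, which is what lowers the RSE of the small-area estimates.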

13 pages, 822 KiB  
Article
Privacy Protection Practice for Data Mining with Multiple Data Sources: An Example with Data Clustering
by Pauline O’Shaughnessy and Yan-Xia Lin
Mathematics 2022, 10(24), 4744; https://doi.org/10.3390/math10244744 - 14 Dec 2022
Cited by 2 | Viewed by 973
Abstract
In the age of data, data mining provides feasible tools with which to handle large datasets consisting of data from multiple sources. However, there is limited research on retrieving statistical information from data when data are confidential and cannot be shared directly. In this paper, we address this problem and propose a framework for performing data analysis using data from multiple sources without revealing true values for privacy purposes. The proposed framework includes three steps. First, data custodians individually mask data before publishing; then, the masked data collection is used to reconstruct the density function of the original dataset, from which resampled values are generated; last, existing data mining techniques are applied directly to the resampled data. This framework utilises the technique of reconstructing an original density function from noise-masked data using the moment-based density estimation method, which plays an essential role. Simulation studies show that the proposed framework performs well; analysis results from the resampled data are comparable to those of the original data when the density of the original data is estimated well. The proposed framework is demonstrated in data clustering analysis using the example of a real-life Australian soybean dataset. Results from the k-means algorithms with two and three fitted clusters are presented to show that cluster analysis using resampled data can well replicate that of the original data. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)
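
A toy R sketch of the mask / reconstruct / resample pipeline follows, under the strong simplifying assumption that the confidential variable is roughly normal, so that a two-moment reconstruction suffices. The paper's moment-based density estimator is far more general, and the clustering step is omitted here.

    ## Mask with known noise, reconstruct moments, resample, then analyze the resample.
    set.seed(1)
    x    <- rnorm(500, mean = 10, sd = 2)            # confidential values
    sd_u <- 1.5                                      # publicly known masking noise scale
    w    <- x + rnorm(500, sd = sd_u)                # released (masked) values

    ## Method-of-moments reconstruction assuming normal original data:
    ## E[W] = E[X],  Var(W) = Var(X) + sd_u^2
    mu_hat  <- mean(w)
    sd_hat  <- sqrt(max(var(w) - sd_u^2, 0))
    x_resim <- rnorm(500, mu_hat, sd_hat)            # resampled surrogate data set

    ## Downstream analyses run on the resample; compare a few quantiles as a sanity check
    rbind(original  = quantile(x,       c(0.1, 0.5, 0.9)),
          resampled = quantile(x_resim, c(0.1, 0.5, 0.9)))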

22 pages, 393 KiB  
Article
Model Selection for High Dimensional Nonparametric Additive Models via Ridge Estimation
by Haofeng Wang, Hongxia Jin, Xuejun Jiang and Jingzhi Li
Mathematics 2022, 10(23), 4551; https://doi.org/10.3390/math10234551 - 01 Dec 2022
Viewed by 1285
Abstract
In ultrahigh-dimensional data analysis, nonparametric additive models face increasing challenges in maintaining computational performance and good statistical properties. To overcome them, we introduce a model selection methodology for high-dimensional nonparametric additive models. Our approach proposes a novel group screening procedure via nonparametric smoothing ridge estimation (GRIE) to assess the importance of each covariate. The sure screening property of GRIE is then combined with the model selection property of the extended Bayesian information criterion (EBIC) to select suitable sub-models in nonparametric additive models. Theoretically, we establish the strong consistency of model selection for the proposed method. Extensive simulations and two real datasets illustrate the outstanding performance of the GRIE-EBIC method. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)
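
For orientation, the extended Bayesian information criterion of Chen and Chen (2008) for a candidate model s with \nu(s) selected terms takes the form below; the exact penalty used for grouped nonparametric components in the paper may differ in detail.

    \mathrm{EBIC}_{\gamma}(s) = -2\,\ell_n\big(\hat{\boldsymbol{\theta}}_s\big)
      + \nu(s)\log n + 2\gamma \log \binom{p}{\nu(s)}, \qquad 0 \le \gamma \le 1,

where \ell_n is the maximized log-likelihood, n the sample size, and p the number of candidate covariates. The combinatorial term keeps false selections under control when p grows much faster than n, which is the ultrahigh-dimensional regime targeted by GRIE.
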
23 pages, 2911 KiB  
Article
Intelligent Multi-Strategy Hybrid Fuzzy K-Nearest Neighbor Using Improved Hybrid Sine Cosine Algorithm
by Chengfeng Zheng, Mohd Shareduwan Mohd Kasihmuddin, Mohd. Asyraf Mansor, Ju Chen and Yueling Guo
Mathematics 2022, 10(18), 3368; https://doi.org/10.3390/math10183368 - 16 Sep 2022
Cited by 9 | Viewed by 1324
Abstract
The sine cosine algorithm is a simple and effective population-based optimization method proposed in recent years that has been studied in many works in the literature. Based on the basic principle of the sine cosine algorithm, this paper studies the main parameters affecting the performance of the algorithm, integrates the reverse learning algorithm, adds an elite opposition solution and forms the hybrid sine cosine algorithm (hybrid SCA). Combining the fuzzy k-nearest neighbor (FKNN) method with the hybrid SCA, this paper numerically simulates two-class and multi-class datasets, obtains a large number of numerical results and analyzes them. The hybrid SCA FKNN proposed in this paper achieves good classification and prediction accuracy on 10 different types of data sets. Compared with SCA FKNN, LSCA FKNN, BA FKNN, PSO FKNN and SSA FKNN, the prediction accuracy is significantly improved. In the Wilcoxon signed rank test against SCA FKNN and LSCA FKNN, the null hypothesis (significance level 0.05) is rejected, and the classifiers have significantly different accuracy. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)
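
For reference, the position update of the standard sine cosine algorithm (Mirjalili, 2016), which the hybrid method modifies with opposition-based components, is usually written as

    X_i^{t+1} =
    \begin{cases}
      X_i^{t} + r_1 \sin(r_2)\,\lvert r_3 P^{t} - X_i^{t} \rvert, & r_4 < 0.5,\\
      X_i^{t} + r_1 \cos(r_2)\,\lvert r_3 P^{t} - X_i^{t} \rvert, & r_4 \ge 0.5,
    \end{cases}
    \qquad r_1 = a\Big(1 - \frac{t}{T}\Big),

where P^{t} is the best solution found so far, r_2, r_3, r_4 are uniform random numbers and a controls the shrinking exploration range. The hybrid variant studied in the paper additionally injects elite opposition-based solutions into this update.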

Review


26 pages, 496 KiB  
Review
Randomized Response Techniques: A Systematic Review from the Pioneering Work of Warner (1965) to the Present
by Truong-Nhat Le, Shen-Ming Lee, Phuoc-Loc Tran and Chin-Shang Li
Mathematics 2023, 11(7), 1718; https://doi.org/10.3390/math11071718 - 03 Apr 2023
Cited by 1 | Viewed by 3881
Abstract
The randomized response (RR) technique is one of the most commonly used indirect questioning methods for collecting data on sensitive characteristics in survey research, with a wide variety of statistical applications in, e.g., behavioral science, socio-economic, psychological, epidemiological, biomedical, and public health research. After nearly six decades since the technique was invented, many improvements of randomized response techniques have appeared in the literature. This work reviews several different aspects in which the original randomized response work of Warner has been improved, as well as statistical methods used in RR problems. Full article
(This article belongs to the Special Issue Statistical Methods in Data Science and Applications)