AI in Statistical Data Analysis

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (25 December 2023) | Viewed by 5886

Special Issue Editors

Guest Editor
School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Interests: sensor; security; IoT; privacy

Guest Editor
Department of Computer Science, Lakehead University, 955 Oliver Road, Thunder Bay, ON P7B 5E1, Canada
Interests: thick data analytics; web mining; learning analytics; social networking; web services; interoperability; software agility development

Special Issue Information

Dear Colleagues,

A wide variety of tools and systems are now available to collect, store and share social management data, including public health data, social media data, education data, economic and business data, and healthcare and welfare data. Trillions of gigabytes of social management data can be analyzed every second. The collection of data in social management systems has become more streamlined in recent years. Not only do the data help improve day-to-day operations and the care people receive, but they can also now be put to better use in predictive modeling. Instead of looking only at historical or only at current information, we can use both to track trends and make predictions, and we are now able to take preventive measures and track the outcomes. However, data available on public portals and websites exist in different formats and carry different metadata when used for social management system surveillance. Data governance of social management systems can help to solve this problem: data management makes it possible to understand the intricacies of social system data, provides tools to accurately examine, analyze and test social datasets, and assists in both informing and advancing positive social management outcomes within our society. The analysis and use of social management data are not limited to specific countries or regions.

This Special Issue aims to consolidate recent advances in the theory and applications of data analytics for social management systems. Pilot studies in analytics-enabled social management are especially welcome. We invite submissions on all topics related to social management data analytics, in particular:

  • Artificial intelligence in data analytics;
  • Technical advances for social management data analytics;
  • AI and big data for social management data analytics;
  • Test and evaluation of social management data analytics;
  • Metaverse for social management data analytics;
  • Application of data analytics in public informatics;
  • Powerful data analytics and machine learning tools;
  • Methods and techniques in data collecting;
  • Assessment and evaluation of data analytics.

Prof. Dr. Tai-hoon Kim
Prof. Dr. Jinan Fiaidhi
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • machine learning
  • big data
  • data
  • social management data
  • management
  • statistical analysis

Published Papers (6 papers)

Research

16 pages, 864 KiB  
Article
Neural Multivariate Grey Model and Its Applications
by Qianyang Li and Xingjun Zhang
Appl. Sci. 2024, 14(3), 1219; https://doi.org/10.3390/app14031219 - 31 Jan 2024
Viewed by 529
Abstract
For time series forecasting, multivariate grey models are excellent at handling incomplete or vague information. The GM(1, N) model represents this group of models and has been widely used in various fields. However, constructing a meaningful GM(1, N) model is challenging due to its more complex structure compared to the construction of the univariate grey model GM(1, 1). Typically, fitting and prediction errors of GM(1, N) are not ideal in practical applications, which limits the application of the model. This study presents the neural ordinary differential equation multivariate grey model (NMGM), a new multivariate grey model that aims to enhance the precision of multivariate grey models. NMGM employs a novel whitening equation with neural ordinary differential equations, showcasing higher predictive accuracy and broader applicability than previous models. It can more effectively learn features from various data samples. In experimental validation, our novel model is first used to predict China’s per capita energy consumption, and it performed best in both the test and validation sets, with mean absolute percentage errors (MAPEs) of 0.2537% and 0.7381%, respectively. The optimal results for the compared models are 0.5298% and 1.106%. Then, our model predicts China’s total renewable energy with lower mean absolute percentage errors (MAPEs) of 0.9566% and 0.7896% for the test and validation sets, respectively. The leading outcomes for the competing models are 1.0188% and 1.1493%. The outcomes demonstrate that this novel model exhibits a higher performance than other models.
(This article belongs to the Special Issue AI in Statistical Data Analysis)
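The abstract above reports its results as mean absolute percentage errors (MAPEs). For readers unfamiliar with the metric, the following is a minimal sketch of the standard MAPE formula. It is not the authors' code; the function name `mape` and the sample values are illustrative assumptions, not data from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    Standard definition: 100 * mean(|(a - p) / a|).
    Undefined when any actual value is zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up values for illustration only (not taken from the paper).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
score = mape(actual, predicted)  # relative errors 0.10, 0.05, 0.00
print(f"MAPE = {score:.2f}%")
```

A lower MAPE means the predictions are closer to the observed values in relative terms, which is the sense in which the abstract compares models.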

19 pages, 2086 KiB  
Article
Enhancing Privacy in Large Language Model with Homomorphic Encryption and Sparse Attention
by Lexin Zhang, Changxiang Li, Qi Hu, Jingjing Lang, Sirui Huang, Linyue Hu, Jingwen Leng, Qiuhan Chen and Chunli Lv
Appl. Sci. 2023, 13(24), 13146; https://doi.org/10.3390/app132413146 - 11 Dec 2023
Viewed by 1523
Abstract
In response to the challenges of personal privacy protection in the dialogue models of the information era, this study introduces an innovative privacy-preserving dialogue model framework. This framework seamlessly incorporates Fully Homomorphic Encryption (FHE) technology with dynamic sparse attention (DSA) mechanisms, aiming to enhance the response efficiency and accuracy of dialogue systems without compromising user privacy. Experimental comparative analyses have confirmed the advantages of the proposed framework in terms of precision, recall, accuracy, and latency, with values of 0.92, 0.91, 0.92, and 15 ms, respectively. In particular, the newly proposed DSA module, while ensuring data security, significantly improves performance by up to 100 times compared to traditional multi-head attention mechanisms.

18 pages, 1250 KiB  
Article
A Simulation-Based Testing of Difference in the Means of Gamma-Distributed Positive Quantities
by Filip Tošenovský
Appl. Sci. 2023, 13(17), 9497; https://doi.org/10.3390/app13179497 - 22 Aug 2023
Viewed by 555
Abstract
This paper presents a simulation-based testing procedure that can be easily applied by practitioners who try to determine whether two gamma-distributed variables have the same expected values. From both theoretical and practical points of view, the gamma distribution and the testing in question have been of interest for some time given the many applications they can be used for, which include problems in the fields of economics, industrial statistics, life sciences, and others. The efforts to achieve the stated statistical objective have been focused throughout the years either on performing nontrivial, approximating mathematical steps or on simulations based on resampling techniques of various kinds. This text works with simulations that try to get closer to the true distributions of the quantities of interest so that a test can be designed rather than using samples generated out of samples, as the resampling techniques perform this by taking the initial samples for an approximation of the populations. The results presented in this text were validated, and they were also compared to other methods where possible. The resulting technique was looked upon as a complement to all the techniques that have been presented on this subject. The major advantage of the proposed procedure is seen in its simplicity. Since simulations are the basis for the presented conclusions, the results are unsurprisingly not as general as what could be achieved by exact mathematical deduction, but they do cover a reasonable range of situations that can serve as a basis on which to analogously build further research if desired.

13 pages, 1615 KiB  
Article
Identifying the Key Hazards behind Website Drop-Offs by Solving a Survival Problem
by Judah Soobramoney, Retius Chifurira, Knowledge Chinhamu and Temesgen Zewotir
Appl. Sci. 2023, 13(14), 8248; https://doi.org/10.3390/app13148248 - 16 Jul 2023
Viewed by 656
Abstract
Within the modern era, corporates are compelled to own an appealing and effective website to survive and thrive within the competitive global digital marketplace. Whilst there are several web metrics to focus on, a key focus area of web analytics is the level of drop-offs. The drop-off rate represents the proportion of visitors that prematurely drop off a website. Whilst the exact reason behind the drop-off may only be assumed (could be due to the loss of Internet connectivity or disinterest), this study attempted to identify the triggers behind website drop-offs through a survival problem. Each person entering the website, at a given instance, can view any number of web pages (such as home, contact us, about us, etc.). However, on the studied website, roughly one in five visitors have prematurely dropped off. The study was conducted on an engineering corporate website with the data collected via the Google Analytics tracking tool. The aim was to determine the key hazards that contributed to the observed drop-off rate through the use of a Cox proportional hazard model and a survival random forest model. On the studied website, based on empirical evidence, the online visitors were censored so that those who viewed three or more webpages within the visit were labelled as ‘survived’. Visitors who viewed two or fewer webpages before leaving the website were labelled as ‘did not survive’. Thereby, the ‘did not survive’ observations represented the visits that prematurely dropped off the website. Using the visitor’s physical and behavioral characteristics, as tracked by Google Analytics, the Cox proportional hazard and survival random forest models were employed to determine the hazards that influence survival. Visitor’s physical characteristics include the device used to access the website, geolocation at the time of the visit, number of previous visits, etc., whilst the behavioral characteristics include the landing page on the website, level of engagement, whether entry into the website originated through an organic search or not. Whilst both models have identified similar features as being key hazards, the survival random forest model has been shown to out-perform on the non-linear features relative to the Cox proportional hazard model and obtained a higher classification accuracy. During the validation process, the survival random forest model (63%) outperformed the Cox model (58%) on classification accuracy. The features that were identified as hazardous indicated that some webpages needed further attention, the visitor’s level of engagement with the website (the degree of scrolling and clicks), the distance between a visitor’s location and the studied corporate’s location, the historic frequency of visiting the website, and if the website entry point was through an organic search. Whilst the study of drop-offs has been a commonly researched problem, this study details the investigation of key hazards through the use of survival models and compares the outcomes of a regression-based model to a machine learning survival model.
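The labelling rule described in the abstract above (three or more webpages viewed counts as ‘survived’, two or fewer as ‘did not survive’) can be sketched in a few lines. This is only an illustration of that rule and of the resulting drop-off rate, not the authors' pipeline; the function names and the sample page counts are hypothetical.

```python
def label_visit(pages_viewed, threshold=3):
    """Label a visit using the rule stated in the abstract:
    at least `threshold` pages viewed means the visit 'survived'."""
    return "survived" if pages_viewed >= threshold else "did not survive"


def drop_off_rate(page_counts, threshold=3):
    """Proportion of visits labelled 'did not survive'."""
    if not page_counts:
        raise ValueError("page_counts must not be empty")
    labels = [label_visit(n, threshold) for n in page_counts]
    return labels.count("did not survive") / len(labels)


# Hypothetical page counts per visit (not data from the study).
visits = [1, 5, 3, 2, 7]
print([label_visit(n) for n in visits])
print(f"drop-off rate = {drop_off_rate(visits):.0%}")
```

In the study itself these labels serve as the outcome for the Cox and survival random forest models; fitting those models is outside the scope of this sketch.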

15 pages, 2000 KiB  
Article
Pilot Study on Exercise Performance Level and Physiological Response According to Rest Intervals between Sets during 65% 1RM Bench Press Exercise
by Chul Yoon and Byung-Min Kim
Appl. Sci. 2023, 13(13), 7850; https://doi.org/10.3390/app13137850 - 04 Jul 2023
Viewed by 1011
Abstract
The purpose of this pilot study was to determine the performance level and physiologic responses (heart rate and heart rate recovery (%)) of six different rest interval conditions during the performance of seven sets of a 65% 1RM bench press exercise. Eight healthy male university students who were 20 years of age and enrolled at University C were tested. The subjects’ bench press 1RM was measured before the experiment, and they performed bench press exercises with six different rest intervals (30 s, 1 min, 2 min, 3 min, 4 min, and 5 min), which were randomized and crossed over. The experimental measurements were performed once a week and repeated six times per rest interval condition (six intervals) to minimize the learning effect for the subjects. A two-way repeated measures ANOVA was used to verify the data, post-comparison (contrast: repeat) was used to establish statistical significance, and the following results were obtained. First, the level of exercise performance (reps) between sets across the six rest interval conditions showed significant differences (p < 0.000) and high effect sizes (ES ≥ 0.70) across the rest interval conditions. In addition, more reps (in terms of volume) were performed in the relatively longer rest interval conditions. The number of reps over the progression of the sets also showed a significant difference (p < 0.000) for the shorter rest interval condition, with a high effect size (ES ≥ 0.64). There was also an interaction effect (p < 0.000) between the rest interval condition and the set, with the number of repetitions at the beginning of the set decreasing significantly as the set progressed for the relatively short rest interval condition, with a high effect size (ES ≥ 0.60). Second, there was no statistically significant difference in after-exercise heart rate among the rest interval conditions between sets, but the longer rest interval conditions of 4 and 5 min showed a significant difference (p < 0.005) as the set progressed, with a high effect size (ES ≥ 0.41). In each of the six rest interval conditions, heart rate levels were similar in sets 1 and 2 but increased from set 3 to set 7. Immediately after each bout of exercise, the resting heart rate according to rest interval condition was statistically highest in the shorter rest intervals (30 s, 1 min) (p < 0.020), with a high effect size (ES ≥ 0.39). Heart rate was also higher in the 2, 3, 4, and 5-min rest intervals, and increased significantly (p < 0.000) as the sets progressed, with a high effect size. Third, heart rate recovery (%) according to the rest interval condition between sets was significantly higher in the longer rest interval conditions (1, 2, 3, 4, and 5 min) than in the 30 s rest interval condition (p < 0.039), with a high effect size (ES ≥ 0.37). In addition, heart rate recovery in all rest interval conditions significantly decreased as the sets progressed (p < 0.05), with a high effect size (ES ≥ 0.37). Taken together, there were significant differences in performance levels (reps), physiological responses, and recovery between rest interval conditions during the equal-intensity resistance exercises in this study. Furthermore, the performance levels between rest interval conditions during the 65% 1RM bench press exercise in this study suggest that rest intervals of 2–3 min may be effective for improving muscular endurance, while rest intervals of 4–5 min may be effective for improving muscle hypertrophy. This suggests that manipulating the rest intervals between sets during resistance training at the same intensity may lead to better training outcomes.

12 pages, 1289 KiB  
Article
Applying the DEMATEL−ANP Fuzzy Comprehensive Model to Evaluate Public Opinion Events
by Hua Wang, Ling Luo and Tao Liu
Appl. Sci. 2023, 13(9), 5737; https://doi.org/10.3390/app13095737 - 06 May 2023
Cited by 1 | Viewed by 940
Abstract
Network public opinion is a mirror reflecting people’s will, and evaluating its urgency can help to find hidden social crises. Research on public opinion in the field of machine learning usually focuses on micro-sentiment judgment, which is unable to offer support for the evaluation of public opinion events without additional data, and research from the perspective of artificial weighting has the disadvantage of the confusion of explanation. Judging the urgency of public opinion events is usually based on human perception, which is fuzzy and conforms to the attribute of fuzzy mathematics. Therefore, the index system in this paper was constructed in line with five principles, from which the weights were scientifically evaluated by integrating the DEMATEL and ANP model, and fuzzy mathematics was applied to determine the urgency level of public opinion. The result has three-fold significance. First, the index system constructed was more closely linked. Second, the integration of the DEMATEL and ANP weight calculating model took the interdependence of indicators fully into account. Third, fuzzy mathematics provided support for determining the public opinion crisis level, especially in the absence of immediate dissemination data.