Analytics

12 pages, 1510 KiB

Open Access Article

Upgraded Thoth: Software for Data Visualization and Statistics

by Russ R. Laher, Frank J. Masci, Luisa M. Rebull, Steven D. Schurr, Wendy Burt, Anastasia Laity, Melanie Swain, David L. Shupe, Steve Groom, Benjamin Rusholme, Mih-Seh Kong, John C. Good, Varoujan Gorjian, Rachel Akeson, Benjamin J. Fulton, David R. Ciardi and Sean Carey

Analytics 2023, 2(1), 284-295; https://doi.org/10.3390/analytics2010015 - 16 Mar 2023

Viewed by 1503

Abstract

Thoth is a free desktop/laptop software application with a friendly graphical user interface that facilitates routine data-visualization and statistical-calculation tasks for astronomy and astrophysical research (and other fields where numbers are visualized). This software has been upgraded with many significant improvements and new capabilities. The major upgrades consist of: (1) six new graph types, including 3D stacked-bar charts and 3D surface plots, made with the Orson 3D Charts library; (2) saving and loading of graph settings; (3) a new batch-mode or command-line operation; (4) new graph-data annotation functions; (5) new options for data-file importation; and (6) a new built-in FITS-image viewer. Thoth now requires Java 1.8 or higher. Many other minor upgrades and bug fixes have also been made. The newly implemented plotting options generally make it possible to construct and reuse graphs with relative ease, without writing computer code. The illustrative astronomy case study in this paper demonstrates one of the many ways the software can be utilized. These new software features and refinements help astronomers work more efficiently at elucidating their data.

(This article belongs to the Special Issue Feature Papers in Analytics)

19 pages, 331 KiB

Open Access Article

Application of Mixture Models for Doubly Inflated Count Data

by Monika Arora and N. Rao Chaganty

Analytics 2023, 2(1), 265-283; https://doi.org/10.3390/analytics2010014 - 11 Mar 2023

Viewed by 1196

Abstract

In health and social science and other fields where count data analysis is important, zero-inflated models have been employed when the frequency of zero counts is high (inflated). For various reasons, there are scenarios in which an additional count value k > 0 also occurs with high frequency. The zero- and k-inflated Poisson distribution model (ZkIP) is more appropriate for such situations. The ZkIP model is a mixture distribution with three components: degenerate distributions at counts 0 and k, and a Poisson distribution. In this article, we propose an alternative, computationally fast expectation–maximization (EM) algorithm to obtain the parameter estimates for grouped zero- and k-inflated count data. The asymptotic standard errors are derived using the complete-data approach. We compare the zero- and k-inflated Poisson model with its zero-inflated and non-inflated counterparts and select the best model based on commonly used criteria. The theoretical results are supplemented with the analysis of two real-life datasets from health sciences.
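
The mixture structure lends itself to a compact EM loop. The following is a minimal sketch, not the authors' algorithm: the function name `zkip_em`, the starting values, and the stopping rule are our own assumptions. For grouped data (a frequency table of counts), the E-step computes the posterior probability that an observed 0 (or k) came from its degenerate component rather than from the Poisson component, and the M-step updates the two inflation weights and the Poisson mean from those weighted counts.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability P(Y = y) for mean lam > 0, computed in log space."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def zkip_em(freq, k, p0=0.1, pk=0.1, lam=1.0, iters=1000, tol=1e-10):
    """EM for a zero- and k-inflated Poisson mixture on grouped data (sketch).

    freq maps each observed count value y to its frequency n_y.
    Model: P(Y=y) = p0*[y=0] + pk*[y=k] + (1 - p0 - pk) * Poisson(y; lam), k > 0.
    """
    n_total = sum(freq.values())
    n0, nk = freq.get(0, 0.0), freq.get(k, 0.0)
    total_y = sum(y * n for y, n in freq.items())
    for _ in range(iters):
        # E-step: probability that an observed 0 (or k) came from the
        # degenerate component rather than from the Poisson component.
        w = 1.0 - p0 - pk
        z0 = p0 / (p0 + w * poisson_pmf(0, lam))
        zk = pk / (pk + w * poisson_pmf(k, lam))
        # M-step: inflation weights are the expected shares of each degenerate
        # component; lam is the weighted mean of counts assigned to the Poisson part.
        new_p0 = z0 * n0 / n_total
        new_pk = zk * nk / n_total
        poisson_weight = n_total - z0 * n0 - zk * nk
        new_lam = (total_y - zk * nk * k) / poisson_weight
        change = max(abs(new_p0 - p0), abs(new_pk - pk), abs(new_lam - lam))
        p0, pk, lam = new_p0, new_pk, new_lam
        if change < tol:
            break
    return p0, pk, lam
```

Fed the exact expected frequencies of a ZkIP(p0 = 0.2, pk = 0.15, k = 5, λ = 3) population, the loop should recover those parameters; standard errors, which the paper derives separately, are not computed here.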

19 pages, 3332 KiB

Open Access Article

A Voronoi-Based Semantically Balanced Dummy Generation Framework for Location Privacy

by Aditya Tadakaluru and Xiao Qin

Analytics 2023, 2(1), 246-264; https://doi.org/10.3390/analytics2010013 - 03 Mar 2023

Viewed by 1471

Abstract

Location-based services (LBS) require users to provide their current location for service delivery and customization. Location privacy protection addresses concerns associated with the potential mishandling of location information submitted to the LBS provider. Location accuracy has a direct impact on the quality of service (QoS): higher location accuracy results in better QoS. In general, the main goal of any location privacy technique is to achieve maximum QoS while revealing minimal or no location information, and using dummy locations is one such technique. In this paper, we introduced a temporal constraint attack whereby an adversary can exploit the temporal constraints associated with the semantic category of locations to eliminate dummy locations and identify the true location. We demonstrated how an adversary can devise such an attack to breach the location privacy of a residential location. We addressed this major limitation of current dummy approaches with a novel Voronoi-based semantically balanced dummy generation framework (VSBDG) capable of generating dummy locations that can withstand a temporal constraint attack. Built on real-world geospatial datasets, the VSBDG framework leverages spatial relationships and operations. Our results show a high physical dispersion cosine similarity of 0.988 between the semantic categories, even with larger location set sizes. This indicates a strong and scalable semantic balance for each semantic category within the VSBDG's output location set. The VSBDG algorithm produces location sets with high average minimum dispersion distances of 5861.894 m for residential locations and 6258.046 m for point-of-interest (POI) locations. These findings show that the locations within each semantic category are scattered farther apart, which improves location privacy.
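
The temporal constraint attack can be illustrated with a toy filter. This is a hypothetical sketch, not the paper's attack model: the categories and the plausible-hour windows in `PLAUSIBLE_HOURS` are invented for illustration. The adversary simply discards every candidate whose semantic category is implausible at the query time; if the dummies are not semantically balanced, only the true location may survive.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    label: str
    category: str  # semantic category of the location

# Hypothetical hours (0-23) during which a user is plausibly present at each category.
PLAUSIBLE_HOURS = {
    "residential": set(range(24)),
    "school": set(range(8, 16)),
    "restaurant": set(range(11, 23)),
    "office": set(range(8, 19)),
}

def temporal_constraint_attack(candidates, query_hour):
    """Keep only candidates whose category is plausible at query_hour.

    Unknown categories are treated as always plausible, so the adversary
    never discards a location it has no temporal knowledge about.
    """
    return [c for c in candidates
            if query_hour in PLAUSIBLE_HOURS.get(c.category, set(range(24)))]
```

For a 3 a.m. query whose location set holds one home and two non-residential dummies, the filter leaves only the home. A set in which the dummies share the true location's category keeps all of its members, which is the intuition behind semantic balancing.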

(This article belongs to the Special Issue Feature Papers in Analytics)

21 pages, 370 KiB

Open Access Article

Survey of Distances between the Most Popular Distributions

by Mark Kelbert

Analytics 2023, 2(1), 225-245; https://doi.org/10.3390/analytics2010012 - 01 Mar 2023

Viewed by 1233

Abstract

We present a number of upper and lower bounds for the total variation distances between the most popular probability distributions. In particular, we give estimates of the total variation distances for multivariate Gaussian distributions, Poisson distributions, binomial distributions, between a binomial and a Poisson distribution, and for negative binomial distributions. Next, estimates of the Lévy–Prohorov distance in terms of Wasserstein metrics are discussed, and the Fréchet, Wasserstein and Hellinger distances for multivariate Gaussian distributions are evaluated. Some novel context-sensitive distances are introduced, and a number of bounds mimicking classical results from information theory are proved.
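
As a concrete instance of the binomial-versus-Poisson case, the sketch below computes the total variation distance numerically and checks it against Le Cam's classical bound TV(Bin(n, p), Poisson(np)) ≤ n·p². That bound is a well-known textbook result rather than one of the paper's new estimates, and the function names and the tail cut-off `extra_terms` are our own assumptions.

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def tv_binomial_poisson(n, p, extra_terms=200):
    """Total variation distance, computed as half the L1 distance between the pmfs.

    The Poisson support is truncated at n + extra_terms - 1; the neglected tail
    is negligible for moderate n*p.
    """
    lam = n * p
    total = 0.0
    for k in range(n + extra_terms):
        b = binom_pmf(k, n, p) if k <= n else 0.0  # binomial mass is 0 above n
        total += abs(b - poisson_pmf(k, lam))
    return 0.5 * total
```

For n = 100 and p = 0.03, the computed distance should fall below Le Cam's bound of 0.09, illustrating why the Poisson approximation is good when p is small.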

27 pages, 31835 KiB

Open Access Article

Smart Multimedia Information Retrieval

by Stefan Wagenpfeil, Paul Mc Kevitt and Matthias Hemmje

Analytics 2023, 2(1), 198-224; https://doi.org/10.3390/analytics2010011 - 20 Feb 2023

Cited by 1 | Viewed by 1398

Abstract

The area of multimedia information retrieval (MMIR) faces two major challenges: the enormously growing number of multimedia objects (i.e., images, videos, audio, and text files), and the fast-increasing level of detail of these objects (e.g., the number of pixels in images). Both challenges lead to a high demand for scalability, semantic representations, and explainability of MMIR processes. Smart MMIR addresses these challenges by employing graph codes as an indexing structure, attaching semantic annotations for explainability, and employing application profiling for scaling, which results in human-understandable, expressive, and interoperable MMIR. This paper presents the mathematical foundation, modeling, implementation details, and experimental results, which confirm that Smart MMIR improves MMIR in terms of efficiency, effectiveness, and human understandability.
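
The indexing idea can be illustrated with a toy version of a graph code. This is a hypothetical simplification, not the paper's formal definition: each media object is reduced to a set of feature terms plus typed relationships between term pairs, and the two similarity functions (`feature_similarity`, `relation_similarity`) are our own assumptions. Because every score is computed from explicit, named terms and relationships, the matched items can themselves be shown to a user as an explanation of a ranking.

```python
from dataclasses import dataclass, field

@dataclass
class GraphCode:
    """Hypothetical simplified graph code: feature terms plus typed relationships."""
    terms: set
    relations: dict = field(default_factory=dict)  # (term_a, term_b) -> relation type

def feature_similarity(query, candidate):
    """Share of the query's terms that also occur in the candidate (0..1)."""
    if not query.terms:
        return 0.0
    return len(query.terms & candidate.terms) / len(query.terms)

def relation_similarity(query, candidate):
    """Share of the query's relationships whose term pair and type both match."""
    if not query.relations:
        return 0.0
    hits = sum(1 for pair, rel in query.relations.items()
               if candidate.relations.get(pair) == rel)
    return hits / len(query.relations)

def rank(query, collection):
    """Sort (name, GraphCode) pairs by feature, then relation, similarity (best first)."""
    return sorted(collection,
                  key=lambda item: (feature_similarity(query, item[1]),
                                    relation_similarity(query, item[1])),
                  reverse=True)
```

Comparing small per-object structures like these, instead of whole feature graphs, is one intuition for why such an index can scale.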

35 pages, 4883 KiB

Open Access Article

The SP Theory of Intelligence, and Its Realisation in the SP Computer Model, as a Foundation for the Development of Artificial General Intelligence

by J. Gerard Wolff

Analytics 2023, 2(1), 163-197; https://doi.org/10.3390/analytics2010010 - 17 Feb 2023

Viewed by 1549

Abstract

The theme of this paper is that the SP Theory of Intelligence (SPTI), and its realisation in the SP Computer Model, is a promising foundation for the development of artificial intelligence at the level of people or higher, also known as ‘artificial general intelligence’ (AGI). The SPTI, and alternatives to the SPTI chosen to be representative of potential foundations for the development of AGI, are considered and compared. A key principle in the SPTI and its development is the importance of information compression (IC) in human learning, perception, and cognition. More specifically, IC in the SPTI is achieved via the powerful concept of SP-multiple-alignment, the key to the versatility of the SPTI in diverse aspects of intelligence, and thus a favourable combination of Simplicity with descriptive and explanatory Power. Since there are many uncertainties between where we are now and, far into the future, anything that might qualify as an AGI, a multi-pronged attack on the problem is needed. The SPTI qualifies as the basis for one of those prongs. Although it will take time to achieve AGI, there is potential along the road for many useful benefits and applications of the research.

17 pages, 542 KiB

Open Access Article

Dynamic Skyline Computation with LSD Trees

by Dominik Köppl

Analytics 2023, 2(1), 146-162; https://doi.org/10.3390/analytics2010009 - 09 Feb 2023

Viewed by 1488

Abstract

Given a set of high-dimensional feature vectors $S \subset \mathbb{R}^{n}$, the skyline or Pareto problem is to report the subset of vectors in S that are not dominated by any vector of S. Vectors closer to the origin are preferred: we say a vector x is dominated by another distinct vector y if x is equally or further away from the origin than y with respect to all its dimensions. The dynamic skyline problem allows us to shift the origin, which changes the answer set. This problem is crucial for dynamic recommender systems where users can shift the parameters and thus shift the origin. For each origin shift, a recomputation of the answer set from scratch is time intensive. To tackle this problem, we propose a parallel algorithm for dynamic skyline computation that uses multiple local split decision (LSD) trees concurrently. The geometric nature of the LSD trees allows us to reuse previous results. Experiments show that our proposed algorithm works well if the dimension is small in relation to the number of tuples to process.
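
To make the dominance relation and the effect of an origin shift concrete, here is a naive quadratic sketch. It is not the paper's LSD-tree algorithm: it recomputes the answer from scratch for every origin, which is exactly the cost the proposed method avoids. As an assumption, it uses the usual Pareto convention (no farther in every dimension and strictly closer in at least one), so points at identical per-dimension distances do not eliminate each other.

```python
def dominates(y, x, origin):
    """True if y dominates x with respect to origin: y is no farther from the
    origin than x in every dimension and strictly closer in at least one."""
    dy = [abs(a - o) for a, o in zip(y, origin)]
    dx = [abs(a - o) for a, o in zip(x, origin)]
    return all(a <= b for a, b in zip(dy, dx)) and any(a < b for a, b in zip(dy, dx))

def dynamic_skyline(points, origin):
    """Naive O(n^2) dynamic skyline: keep every point no other point dominates.

    A point never dominates itself (it is not strictly closer anywhere),
    so no explicit self-exclusion is needed.
    """
    return [x for x in points if not any(dominates(y, x, origin) for y in points)]
```

With points (1, 1), (2, 2), (0, 3), (3, 0) and origin (0, 0), the skyline is (1, 1), (0, 3), (3, 0). Shifting the origin to (2, 2) makes (2, 2) dominate everything else, so the answer set changes completely.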

(This article belongs to the Special Issue Feature Papers in Analytics)

14 pages, 5640 KiB

Open Access Technical Note

Untangling Energy Consumption Dynamics with Renewable Energy Using Recurrent Neural Network

by Munshi Md Shafwat Yazdan, Shah Saki and Raaghul Kumar

Analytics 2023, 2(1), 132-145; https://doi.org/10.3390/analytics2010008 - 01 Feb 2023

Viewed by 1758

Abstract

The environmental issues we currently face require long-term prospective efforts for sustainable growth. Renewable energy sources seem to be one of the most practical and efficient alternatives in this regard. Understanding a nation’s pattern of energy use and renewable energy production is crucial for developing strategic plans. No previous study has explored the dynamics of power consumption with the change in renewable energy production on a country-wide scale. Meanwhile, a number of deep learning algorithms have demonstrated acceptable performance in handling sequential data in the era of data-driven predictions. In this study, we developed a scheme to investigate and predict total power consumption and renewable energy production time series from eleven years of data using a recurrent neural network (RNN). The dynamics of the interaction between total annual power consumption and renewable energy production were investigated through extensive exploratory data analysis (EDA) and a feature engineering framework. The model's performance was found to be satisfactory based on a comparison of the predicted and observed data, the distribution of the errors, and the root mean squared error (RMSE) and R² values of 0.084 and 0.82, respectively. Higher performance was achieved by increasing the number of epochs and tuning the hyperparameters. The proposed framework could be used and transferred to investigate trends in renewable energy production and power consumption and to predict future scenarios for different communities. Incorporating a cloud-based platform into the proposed pipeline, from data acquisition to outcome generation, may enable real-time forecasting.
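
The two reported scores measure different things, and a short sketch makes the distinction explicit. Assuming the standard definitions (the paper's exact normalization of the series is not stated here), RMSE is expressed in the units of the target, while R² is the dimensionless share of variance explained relative to always predicting the mean.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error, in the same units as the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

For observations [1, 2, 3, 4] and predictions [1, 2, 3, 5], the RMSE is 0.5 and R² is 0.8. A small RMSE is therefore only meaningful relative to the scale of the data, which is why the two scores are usually reported together.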

27 pages, 7073 KiB

Open Access Article

Theory-Guided Analytics Process: Using Theories to Underpin an Analytics Process for New Banking Product Development Using Segmentation-Based Marketing Analytics Leveraging on Marketing Intelligence

by Tristan Lim, Tao Pan, Chin Sin Ong, Shuaiwei Chen and Jie Jun Jeremy Chia

Analytics 2023, 2(1), 105-131; https://doi.org/10.3390/analytics2010007 - 01 Feb 2023

Viewed by 2419

Abstract

Retail banking is facing considerable product competition and disruption. New product development is necessary to tackle such challenges and reinvigorate product lines. This study presents an instrumental real-life banking case study in which marketing analytics was used to drive a product differentiation strategy. In particular, the study applied the unsupervised machine learning techniques of link analysis, latent class analysis, and association analysis to undertake behavior-based market segmentation, with a view to attaining a profitable competitive advantage. To underpin the product development process with well-grounded theoretical framing, this study asked the research question: “How may we establish a theory-driven approach for an analytics-driven process?” The findings include a theoretical conceptual framework, backed by the empirical literature, that underpins the end-to-end segmentation-driven new product development (NPD) process. The study hopes to provide: (i) for managerial practitioners, the use of case-based reasoning for practice-oriented NPD design, planning, and diagnosis efforts, and (ii) for researchers, the potential to test the validity and robustness of an analytics-driven NPD process. The study also hopes to drive wider research interest in theory-driven approaches for analytics-driven processes.
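
Of the techniques named above, association analysis is the easiest to show in a few lines. The following is a minimal pairwise sketch, not the study's pipeline: the product names, the thresholds `min_support` and `min_confidence`, and the restriction to single-item rules are illustrative assumptions. Each rule "A → B" is scored by support (how often A and B are held together), confidence (how often B is held given A), and lift (confidence relative to B's base rate).

```python
from itertools import combinations

def support(itemset, baskets):
    """Fraction of customers whose product holdings contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def pair_rules(baskets, min_support=0.3, min_confidence=0.5):
    """Single-item rules A -> B as (A, B, support, confidence, lift) tuples."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b}, baskets)
        if s_ab < min_support:
            continue
        for ante, cons in ((a, b), (b, a)):
            confidence = s_ab / support({ante}, baskets)
            if confidence >= min_confidence:
                lift = confidence / support({cons}, baskets)
                rules.append((ante, cons, s_ab, confidence, lift))
    return rules
```

A lift below 1, as in the toy holdings used to check the sketch, signals that two products are held together less often than independence would predict. This kind of signal can inform segment-specific product design.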

28 pages, 194600 KiB

Open Access Article

MAFFN_YOLOv5: Multi-Scale Attention Feature Fusion Network on the YOLOv5 Model for the Health Detection of Coral-Reefs Using a Built-In Benchmark Dataset

by Sivamani Kalyana Sundara Rajan and Nedumaran Damodaran

Analytics 2023, 2(1), 77-104; https://doi.org/10.3390/analytics2010006 - 19 Jan 2023

Cited by 2 | Viewed by 2888

Abstract

Coral reefs are a significant part of marine life and are affected by multiple diseases caused by stress and heat variation in the ocean. The autonomous monitoring and detection of coral health are crucial for researchers to protect corals at an early stage. Detecting coral diseases is difficult because coral-reef datasets are inadequate. Therefore, we have developed a coral-reef benchmark dataset and proposed a Multi-scale Attention Feature Fusion Network (MAFFN) as the neck of the YOLOv5 network, called “MAFFN_YOLOv5”. The MAFFN_YOLOv5 model outperforms state-of-the-art object detectors such as YOLOv5, YOLOX, and YOLOR, improving detection accuracy by 8.64%, 3.78%, and 18.05%, respectively, based on the mean average precision (mAP@.5), and by 7.8%, 3.72%, and 17.87%, respectively, based on mAP@.5:.95. Additionally, we have tested a hardware-based deep neural network for the detection of coral-reef health.
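
The two metrics differ only in the intersection-over-union (IoU) thresholds at which a predicted box counts as a correct detection. Under the common COCO-style convention (an assumption here, since the paper's evaluation code is not shown), mAP@.5 uses a single threshold of 0.5, while mAP@.5:.95 averages over thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below covers only the IoU and threshold part; the precision-recall averaging that produces AP is omitted.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A detection matches a ground-truth box if their IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold

# IoU thresholds averaged by mAP@.5:.95 (COCO-style convention).
MAP_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]
```

Two 2×2 boxes offset by one unit in each direction overlap with IoU 1/7, so such a prediction would be a miss even at the lenient 0.5 threshold. This is why mAP@.5:.95, which also demands tight localization, is generally the harder score.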

2 pages, 143 KiB

Open Access Editorial

Acknowledgment to the Reviewers of Analytics in 2022

by Analytics Editorial Office

Analytics 2023, 2(1), 75-76; https://doi.org/10.3390/analytics2010005 - 18 Jan 2023

Viewed by 768

Abstract

High-quality academic publishing is built on rigorous peer review [...]

20 pages, 2518 KiB

Open Access Review

A Brief Survey of Methods for Analytics over RDF Knowledge Graphs

by Maria-Evangelia Papadaki, Yannis Tzitzikas and Michalis Mountantonakis

Analytics 2023, 2(1), 55-74; https://doi.org/10.3390/analytics2010004 - 17 Jan 2023

Cited by 2 | Viewed by 2340

Abstract

There are several knowledge graphs (KGs) expressed in RDF (Resource Description Framework) that aggregate and integrate data from various sources to provide unified access services and enable insightful analytics. We observe this trend in almost every domain of our lives. However, providing effective, efficient, and user-friendly analytic services and systems is quite challenging. In this paper, we survey the approaches, systems, and tools that enable the formulation of analytic queries over KGs expressed in RDF. We identify the main challenges, distinguish two main categories of analytic queries (domain-specific and quality-related), and identify five kinds of approaches for analytics over RDF. We then briefly describe the works in each category and related aspects, such as efficiency and visualization. We hope this collection will be useful to researchers and engineers in advancing the capabilities and user-friendliness of methods for analytics over knowledge graphs.
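
A typical domain-specific analytic query over an RDF graph is an aggregation such as "count the instances of each class" or "average a numeric property per class". The sketch below emulates two such SPARQL `GROUP BY` queries over an in-memory list of triples in plain Python; the `ex:` identifiers and the `ex:citations` property are hypothetical, and a real system would evaluate the SPARQL directly on a triple store.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def count_instances_per_class(triples):
    """Emulates: SELECT ?c (COUNT(?s) AS ?n) WHERE { ?s rdf:type ?c } GROUP BY ?c"""
    graph = set(triples)  # an RDF graph is a set, so duplicate triples collapse
    return Counter(o for s, p, o in graph if p == RDF_TYPE)

def average_property_by_class(triples, prop):
    """Emulates: SELECT ?c (AVG(?v) AS ?avg)
                 WHERE { ?s rdf:type ?c ; <prop> ?v } GROUP BY ?c"""
    graph = set(triples)
    types = {}
    for s, p, o in graph:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    sums, counts = {}, {}
    for s, p, o in graph:
        if p == prop:
            for c in types.get(s, ()):  # one solution per (subject, class, value)
                sums[c] = sums.get(c, 0) + o
                counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}
```

Even this toy version shows two issues a real engine must handle: joins between type and property patterns, and set semantics for duplicate triples. Both bear on the efficiency concerns the survey discusses.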

(This article belongs to the Special Issue Feature Papers in Analytics)

24 pages, 1185 KiB

Open Access Article

Theoretical Contributions to Three Generalized Versions of the Celebioglu–Cuadras Copula

by Christophe Chesneau

Analytics 2023, 2(1), 31-54; https://doi.org/10.3390/analytics2010003 - 10 Jan 2023

Cited by 8 | Viewed by 1199

Abstract

Copulas are probabilistic functions that are increasingly used to describe, examine, and model the interdependence of continuous random variables. Among the numerous proposed copulas, renewed interest has recently been shown in the so-called Celebioglu–Cuadras copula, mainly because of its simplicity, exploitable dependence properties, and potential for applicability. In this article, we contribute to the development of this copula by proposing three generalized versions of it, each involving three tuning parameters. The main results are theoretical: they consist of determining wide and manageable intervals of admissible values for the involved parameters. The proofs are mainly based on limit, differentiation, and factorization techniques, as well as mathematical inequalities. Some of the parameter configurations are new in the literature, and original phenomena are revealed. Subsequently, the basic properties of the proposed copulas are studied, such as symmetry, quadrant dependence, various expansions, concordance ordering, tail dependences, medial correlation, and Spearman correlation. Detailed examples, numerical tables, and graphics are used to support the theory.
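
Two of the tasks described above, finding admissible parameter values and computing Spearman correlation, can be checked numerically for any candidate copula. As a stand-in, the sketch below uses the well-known Farlie–Gumbel–Morgenstern (FGM) copula C(u, v) = uv[1 + θ(1 − u)(1 − v)], not the Celebioglu–Cuadras copula or its generalizations. It tests the 2-increasing property on a grid and evaluates Spearman's rho through the standard identity ρ_S = 12∫∫C(u, v) du dv − 3; the grid sizes are arbitrary choices.

```python
def fgm_copula(theta):
    """FGM copula C(u, v) = uv[1 + theta(1 - u)(1 - v)], valid for |theta| <= 1."""
    return lambda u, v: u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def is_2_increasing(C, n=50, eps=1e-12):
    """Grid check that every rectangle gets non-negative C-volume (a necessary condition)."""
    grid = [i / n for i in range(n + 1)]
    for i in range(n):
        for j in range(n):
            u1, u2, v1, v2 = grid[i], grid[i + 1], grid[j], grid[j + 1]
            volume = C(u2, v2) - C(u1, v2) - C(u2, v1) + C(u1, v1)
            if volume < -eps:
                return False
    return True

def spearman_rho(C, n=200):
    """Spearman's rho = 12 * double integral of C over [0,1]^2 - 3 (midpoint rule)."""
    h = 1.0 / n
    mids = [(i + 0.5) * h for i in range(n)]
    integral = sum(C(u, v) for u in mids for v in mids) * h * h
    return 12.0 * integral - 3.0
```

For the FGM family the grid check passes at θ = 0.5 and fails at θ = 2, consistent with the known admissible interval |θ| ≤ 1, and the numerical rho matches the closed form θ/3. Deriving such intervals analytically for three-parameter families is the kind of result the article reports.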

14 pages, 409 KiB

Open Access Article

A Parallel Implementation of the Differential Evolution Method

by Vasileios Charilogis and Ioannis G. Tsoulos

Analytics 2023, 2(1), 17-30; https://doi.org/10.3390/analytics2010002 - 09 Jan 2023

Cited by 2 | Viewed by 2501

Abstract

Global optimization is a widely used technique that finds application in many sciences, such as physics, economics, and medicine, with many extensions, for example, in the area of machine learning. However, global minimization techniques often require long computation times, and for this reason parallel computational approaches should be used. In this paper, a new parallel global optimization technique based on the differential evolution method is proposed. This technique uses a series of independent parallel computing units that periodically exchange the best solutions they have found. Additionally, a new termination rule is proposed that exploits parallelism to terminate the process in a timely and valid manner. The new method is applied to a number of problems from the established literature, and the results are quite promising.
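
The exchange of best solutions between independent units is the classic island model. The sketch below shows one way to combine it with differential evolution (DE/rand/1/bin) under our own assumptions: a ring migration topology, fixed values for F, CR, and the migration interval, and no reproduction of the paper's termination rule. For simplicity the islands run one after another in a single process; in a genuinely parallel version each island would run on its own computing unit.

```python
import random

def sphere(x):
    """Simple test objective with global minimum 0 at the origin."""
    return sum(xi * xi for xi in x)

def island_de(f, dim, bounds, n_islands=4, pop_size=20, generations=200,
              migrate_every=10, F=0.8, CR=0.9, seed=0):
    """Island-model DE/rand/1/bin with ring migration of each island's best."""
    rng = random.Random(seed)
    lo, hi = bounds
    islands = [[[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    fitness = [[f(x) for x in pop] for pop in islands]

    for gen in range(1, generations + 1):
        for pop, fit in zip(islands, fitness):  # islands evolve independently
            for i in range(pop_size):
                # Mutation: three distinct partners other than i.
                a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
                j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
                trial = []
                for d in range(dim):
                    if rng.random() < CR or d == j_rand:
                        v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                        trial.append(min(max(v, lo), hi))  # clip to the search box
                    else:
                        trial.append(pop[i][d])
                # Greedy selection: keep the trial if it is no worse.
                f_trial = f(trial)
                if f_trial <= fit[i]:
                    pop[i], fit[i] = trial, f_trial

        if gen % migrate_every == 0:
            # Snapshot every island's best before any replacement happens.
            best_idx = [min(range(pop_size), key=fit.__getitem__) for fit in fitness]
            migrants = [(list(islands[k][best_idx[k]]), fitness[k][best_idx[k]])
                        for k in range(n_islands)]
            # Each island's best replaces the worst member of the next island.
            for k, (x, fx) in enumerate(migrants):
                dest = (k + 1) % n_islands
                worst = max(range(pop_size), key=fitness[dest].__getitem__)
                islands[dest][worst], fitness[dest][worst] = list(x), fx

    fx, x = min(((fit[i], pop[i]) for pop, fit in zip(islands, fitness)
                 for i in range(pop_size)), key=lambda t: t[0])
    return x, fx
```

Calling `island_de(sphere, 3, (-5.0, 5.0))` should return a point close to the origin. Migration lets a good solution found on one island seed the others without forcing all units into lockstep.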

16 pages, 671 KiB

Open Access Review

A Review of the Vehicle Routing Problem and the Current Routing Services in Smart Cities

by Eleni Boumpa, Vasileios Tsoukas, Vasileios Chioktour, Maria Kalafati, Georgios Spathoulas, Athanasios Kakarountas, Panagiotis Trivellas, Panagiotis Reklitis and George Malindretos

Analytics 2023, 2(1), 1-16; https://doi.org/10.3390/analytics2010001 - 22 Dec 2022

Cited by 1 | Viewed by 3078

Abstract

In this survey, the issues of urban routing are analyzed, and critical considerations for smart and cost-effective delivery services are highlighted. Smart cities require intelligent services and solutions to address their routing issues. This article gives a brief description of current services that apply either classical methods or machine learning approaches. Furthermore, the most promising research options with regard to the vehicle routing problem (VRP) are compared. Finally, an initial design is presented for a holistic scheme that would optimally combine several tools and approaches to serve the needs of different users with regard to the VRP.
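
As a point of reference for the "classical methods" mentioned above, here is a minimal capacitated nearest-neighbour construction heuristic. It is a simple textbook-style baseline chosen for illustration, not a method taken from the review. The customer names, coordinates, and demands are hypothetical, and distances are straight-line rather than road-network distances.

```python
import math

def nearest_neighbour_routes(depot, customers, demand, capacity):
    """Greedy capacitated VRP construction.

    customers: dict name -> (x, y); demand: dict name -> load units.
    Each vehicle leaves the depot and repeatedly visits the closest unserved
    customer that still fits; when none fits, a new vehicle starts.
    """
    unserved = set(customers)
    routes = []
    while unserved:
        route, load, pos = [], 0, depot
        while True:
            feasible = [c for c in unserved if load + demand[c] <= capacity]
            if not feasible:
                break
            # Ties in distance are broken by name so the result is deterministic.
            nxt = min(feasible, key=lambda c: (math.dist(pos, customers[c]), c))
            route.append(nxt)
            load += demand[nxt]
            pos = customers[nxt]
            unserved.remove(nxt)
        if not route:
            raise ValueError("a customer's demand exceeds the vehicle capacity")
        routes.append(route)
    return routes

def route_length(depot, route, customers):
    """Length of the closed tour depot -> customers in order -> depot."""
    points = [depot] + [customers[c] for c in route] + [depot]
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))
```

Greedy construction like this is fast but myopic. Metaheuristics and learning-based approaches typically start from such a solution and then improve it.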

Analytics, Volume 2, Issue 1 (March 2023) – 15 articles
