Advances in Applications of Intelligently Mining Massive Data

A special issue of Technologies (ISSN 2227-7080). This special issue belongs to the section "Information and Communication Technologies".

Deadline for manuscript submissions: closed (31 July 2023)

Special Issue Editor


Dr. Sikha Bagui
Guest Editor
Department of Computer Science, University of West Florida, Pensacola, FL 32514, USA
Interests: database design; big data architecture; big data analytics; machine learning and data mining

Special Issue Information

Dear Colleagues,

Today’s highly digitized and interconnected world generates almost unmanageable amounts of data: scientific, medical, financial, and many other kinds. This explosive growth in data has created an urgent need for intelligent data mining techniques and tools that can help capture, store, and analyze these data, transforming them into useful information and knowledge. Data mining and machine learning techniques must be applied in a fresh light to analyze these massive datasets: to summarize them, classify them, and discover new trends and anomalies within them. This Special Issue aims to report on recent advances and new trends in intelligently mining massive datasets in any field, and welcomes original research, short communications, and review papers.

Topics of interest include, but are not limited to: 

  • Data Capture and Storage
  • Data Preprocessing
  • Dimensionality Reduction
  • Data Visualization
  • Mining Frequent Patterns
  • Mining Data Streams, Time Series, and Sequence Data
  • Mining Social Networking Graphs
  • Multirelational Data Mining
  • Large-Scale Machine Learning
  • Large-Scale Deep Learning
  • Big Data Technologies for Mining Massive Data
  • Architectures for Large-Scale Parallel Processing
  • Intelligent Data Mining Tools and Techniques
  • Data Mining, Privacy, and Data Security

Dr. Sikha Bagui
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Technologies is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data mining
  • data science
  • machine learning
  • deep learning
  • intelligent data mining tools
  • large-scale parallel processing

Published Papers (4 papers)


Research


15 pages, 403 KiB  
Article
Fast and Efficient Entropy Coding Architectures for Massive Data Compression
by Francesc Auli-Llinas
Technologies 2023, 11(5), 132; https://doi.org/10.3390/technologies11050132 - 26 Sep 2023
Abstract
The compression of data is fundamental to alleviating the costs of transmitting and storing massive datasets employed in myriad fields of our society. Most compression systems employ an entropy coder in their coding pipeline to remove the redundancy of coded symbols. The entropy-coding stage needs to be efficient, to yield high compression ratios, and fast, to process large amounts of data rapidly. Despite their widespread use, entropy coders are commonly assessed for some particular scenario or coding system. This work provides a general framework to assess and optimize different entropy coders. First, the paper describes three main families of entropy coders, namely those based on variable-to-variable length codes (V2VLC), arithmetic coding (AC), and tabled asymmetric numeral systems (tANS). Then, a low-complexity architecture for the most representative coder(s) of each family is presented—more precisely, a general version of V2VLC, the MQ, M, and a fixed-length version of AC, and two different implementations of tANS. These coders are evaluated under different coding conditions in terms of compression efficiency and computational throughput. The results obtained suggest that V2VLC and tANS achieve the highest compression ratios for most coding rates and that the AC coder that uses fixed-length codewords attains the highest throughput. The experimental evaluation discloses the advantages and shortcomings of each entropy-coding scheme, providing insights that may help to select this stage in forthcoming compression systems.
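
To ground these coder families, here is a minimal sketch (an illustrative toy, not one of the paper's benchmarked implementations): a classic Huffman prefix coder in Python, a simple relative of the variable-length-code family. Symbol frequencies drive the codeword lengths, so the compressed size approaches the entropy of the source; AC and tANS go further by shedding the one-whole-bit-per-symbol restriction of prefix codes.

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a Huffman prefix code for the symbols observed in `data`."""
    freq = Counter(data)
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate single-symbol input
        return {s: "0" for s in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every codeword below them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

data = "abracadabra"
code = huffman_code(data)
bits = sum(len(code[s]) for s in data)
print(code)
print(f"{bits} bits coded vs {8 * len(data)} bits raw")
```

For this toy input the code spends 23 bits against 88 raw; tANS reaches comparable per-symbol efficiency using table lookups, which is part of why it is attractive for massive data.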

20 pages, 2450 KiB  
Article
Knowledge Graph Construction for Social Customer Advocacy in Online Customer Engagement
by Bilal Abu-Salih and Salihah Alotaibi
Technologies 2023, 11(5), 123; https://doi.org/10.3390/technologies11050123 - 11 Sep 2023
Abstract
The rise of online social networks has revolutionized the way businesses and consumers interact, creating new opportunities for customer word-of-mouth (WoM) and brand advocacy. Understanding and managing customer advocacy in the online realm has become crucial for businesses aiming to cultivate a positive brand image and engage with their target audience effectively. In this study, we propose a framework that leverages a pre-trained XLNet-BiLSTM-CRF architecture (XLNet followed by a bidirectional long short-term memory layer and a conditional random field layer) to construct a Knowledge Graph (KG) for social customer advocacy in online customer engagement (CE). The XLNet-BiLSTM-CRF model combines the strengths of XLNet, a powerful language representation model, with BiLSTM-CRF, a sequence labeling model commonly used in natural language processing tasks. This architecture effectively captures contextual information and sequential dependencies in CE data. The XLNet-BiLSTM-CRF model is evaluated against several baseline architectures, including variations of BERT integrated with other models, to compare their performance in identifying brand advocates and capturing CE dynamics. Additionally, an ablation study is conducted to analyze the contributions of different components in the model. The evaluation metrics, including accuracy, precision, recall, and F1 score, demonstrate that the XLNet-BiLSTM-CRF model outperforms the baseline architectures, indicating its superior ability to accurately identify brand advocates and label customer advocacy entities. The findings highlight the significance of leveraging pre-trained contextual embeddings, sequential modeling, and sequence labeling techniques in constructing effective models for building a KG for customer advocacy in online engagement. The proposed framework contributes to the understanding and management of customer advocacy by facilitating meaningful customer-brand interactions and fostering brand loyalty.
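
To make the KG construction step concrete, here is a minimal sketch (the sentence, tag set, and relation schema are hypothetical illustrations, not the authors' data or code) of how BIO labels emitted by a sequence labeler such as XLNet-BiLSTM-CRF can be collapsed into entity spans and then linked into knowledge-graph triples:

```python
# Hypothetical output of a sequence labeler (e.g., an XLNet-BiLSTM-CRF model)
# on one customer-engagement post; the tag set is an illustrative assumption.
tokens = ["Alice", "keeps", "recommending", "her", "Acme", "phone"]
tags = ["B-ADVOCATE", "O", "O", "O", "B-BRAND", "I-BRAND"]

def bio_spans(tokens, tags):
    """Collapse BIO tags into (entity text, entity type) spans."""
    spans, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((" ".join(current), etype))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and etype == tag[2:]:
            current.append(tok)
        else:
            if current:
                spans.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        spans.append((" ".join(current), etype))
    return spans

# Link every advocate entity to every brand entity found in the same post.
spans = bio_spans(tokens, tags)
advocates = [e for e, t in spans if t == "ADVOCATE"]
brands = [e for e, t in spans if t == "BRAND"]
triples = [(a, "ADVOCATES_FOR", b) for a in advocates for b in brands]
print(triples)  # [('Alice', 'ADVOCATES_FOR', 'Acme phone')]
```

In the paper's full pipeline the labeler is learned and its triples populate the KG that models advocate-brand interactions; this sketch shows only the span-to-triple step.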

Review


24 pages, 5116 KiB  
Review
Cleaning Big Data Streams: A Systematic Literature Review
by Obaid Alotaibi, Eric Pardede and Sarath Tomy
Technologies 2023, 11(4), 101; https://doi.org/10.3390/technologies11040101 - 26 Jul 2023
Abstract
In today’s big data era, cleaning big data streams has become a challenging task because of the different formats of big data and the massive amounts being generated. Many studies have proposed different techniques to overcome these challenges, such as cleaning big data in real time. This systematic literature review presents recently developed techniques that have been used for the cleaning process and for each data cleaning issue. Following the PRISMA framework, four databases are searched, namely IEEE Xplore, ACM Digital Library, Scopus, and ScienceDirect, to select relevant studies. After selecting the relevant studies, we identify the techniques that have been utilized to clean big data streams and the evaluation methods that have been used to examine their efficiency. We also define the cleaning issues that may appear during the cleaning process, namely missing values, duplicated data, outliers, and irrelevant data. Based on our study, future directions for cleaning big data streams are identified.
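
As a hedged illustration of three of those issues, the sketch below (the window size, z-score threshold, and record format are illustrative assumptions, not taken from any surveyed technique) cleans a numeric stream in a single pass: it drops duplicates, imputes missing values from a sliding window, and discards outliers. Filtering irrelevant data would need domain-specific rules and is omitted.

```python
from collections import deque
from statistics import mean, stdev

def clean_stream(records, window=50, z_max=3.0):
    """Toy single-pass cleaner for a stream of (id, value) records."""
    seen_ids = set()               # duplicate detection
    recent = deque(maxlen=window)  # sliding window for imputation/outliers
    for rec_id, value in records:
        if rec_id in seen_ids:     # drop exact duplicates
            continue
        seen_ids.add(rec_id)
        if value is None:          # impute missing values with window mean
            if not recent:
                continue           # nothing to impute from yet
            value = mean(recent)
        if len(recent) >= 2:       # flag outliers with a z-score test
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) / sigma > z_max:
                continue           # discard as an outlier
        recent.append(value)
        yield rec_id, value

stream = [(1, 10.0), (2, 11.0), (1, 10.0), (3, None), (4, 1e6), (5, 9.5)]
print(list(clean_stream(stream)))
# [(1, 10.0), (2, 11.0), (3, 10.5), (5, 9.5)]
```

A real streaming cleaner would bound the duplicate-detection state (e.g., with a Bloom filter) rather than keep an ever-growing set of identifiers.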

Other


20 pages, 481 KiB  
Systematic Review
Tendency on the Application of Drill-Down Analysis in Scientific Studies: A Systematic Review
by Victor Hugo Silva-Blancas, José Manuel Álvarez-Alvarado, Ana Marcela Herrera-Navarro and Juvenal Rodríguez-Reséndiz
Technologies 2023, 11(4), 112; https://doi.org/10.3390/technologies11040112 - 13 Aug 2023
Abstract
As new server technologies come to market, it is necessary to update or create new methodologies for data analysis and exploitation. Applied methodologies range from decision tree categorization to the use of artificial neural networks (ANNs), which implement artificial intelligence (AI) for decision making. One of the least used strategies is drill-down analysis (DD), which belongs to the decision tree subcategory and, because it lacks AI resources, has lost interest among researchers. However, its easy implementation makes it a suitable tool for database processing systems. This research developed a systematic review of DD analysis in the scientific literature in order to establish a knowledge platform and to determine whether it is worth integrating DD with superior methodologies, such as those based on ANNs, to produce better diagnoses in future works. A total of 80 scientific articles from 1997 to 2023 were reviewed, showing a peak in 2021 and experimental as the predominant methodology. Of a total of 100 problems solved, 42% used the experimental methodology, 34% descriptive, 17% comparative, and just 7% post facto. We detected 14 unsolved problems, of which 50% fall in the experimental area. By study type, methodologies included correlation studies, processes, decision trees, plain queries, granularity, and labeling. We observed that just one work focuses on mathematics, which lowers expectations for new knowledge production, and just one work reported ANN usage.
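
For readers new to the technique, here is a minimal sketch of drill-down analysis (the sales records and dimension hierarchy are hypothetical): the same aggregate is recomputed at progressively finer granularity along a fixed hierarchy of dimensions, which is exactly the kind of plain-query, granularity-driven processing the review classifies under DD.

```python
from collections import defaultdict

# Hypothetical sales records: (year, region, product, amount).
sales = [
    (2022, "EU", "laptop", 5.0),
    (2022, "EU", "phone", 3.0),
    (2022, "US", "laptop", 4.0),
    (2023, "EU", "laptop", 6.0),
]

def rollup(records, depth):
    """Sum amounts grouped by the first `depth` dimensions of each record."""
    totals = defaultdict(float)
    for *dims, amount in records:
        totals[tuple(dims[:depth])] += amount
    return dict(totals)

# Drill down: same measure, progressively finer granularity.
for depth in (1, 2, 3):
    print(rollup(sales, depth))
# {(2022,): 12.0, (2023,): 6.0}
# {(2022, 'EU'): 8.0, (2022, 'US'): 4.0, (2023, 'EU'): 6.0}
# ... and so on at full (year, region, product) granularity.
```

Because each level is an ordinary grouped aggregation, DD needs no AI resources, which matches the review's observation about its easy implementation in database processing systems.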
