Privacy-Preserving Methods and Applications in Big Data Sharing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (12 December 2023) | Viewed by 10407

Special Issue Editors

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Interests: big data management; cloud computing; data privacy protection methods; data query processing technology
School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
Interests: big data privacy and security; artificial intelligence security; IoT security and computing; online learning; deep learning; industrial IoT; computer vision and its security; wireless network security; reinforcement learning and other cutting-edge artificial intelligence design and privacy protection

Special Issue Information

Dear Colleagues,

With the rapid development of modern technology, tremendous amounts of data are generated by social networking sites, sensor networks, the Internet, healthcare applications, and many other practical scenarios. Big data refers to the huge volumes of data generated from different sources, in multiple formats, at very high speed. As a cutting-edge technology, big data has become a very active research area over the past decades and has been closely combined with other emerging domains such as artificial intelligence, the Internet of Things, databases, and smart healthcare. Despite the extensive applications of big data, privacy is an unavoidable issue whenever big data is processed, analyzed, and deployed, and it poses a crucial challenge to the further development of the information society.

This Special Issue focuses on privacy-preserving methods and applications in big data sharing, including big data analysis, processing, and mining. The aim is to bring together researchers from various fields and backgrounds to address privacy concerns in big data. The Special Issue offers them an opportunity to present their latest work and achievements and to bring new perspectives to the future directions of privacy-preserving big data research.

Prof. Dr. Xiaofeng Ding
Prof. Dr. Pan Zhou
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can access the submission form. Manuscripts can be submitted until the deadline. All submissions that pass the pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as they are accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • privacy-preserving databases
  • secure outsourcing
  • cryptography tools for privacy
  • secure multiparty computing
  • artificial intelligence
  • internet of things
  • threat and vulnerability analysis
  • trustworthy machine learning
  • privacy-preserving data analysis
  • secure data mining
  • federated learning
  • data security
  • secure query
  • differential privacy
  • intelligent medical service
  • trust and forensics
  • blockchain systems
  • cybersecurity
  • adversarial attacks

Published Papers (8 papers)

Research

28 pages, 5058 KiB  
Article
A Parallel Privacy-Preserving k-Means Clustering Algorithm for Encrypted Databases in Cloud Computing
by Youngho Song, Hyeong-Jin Kim, Hyun-Jo Lee and Jae-Woo Chang
Appl. Sci. 2024, 14(2), 835; https://doi.org/10.3390/app14020835 - 18 Jan 2024
Viewed by 587
Abstract
With the development of cloud computing, interest in database outsourcing has recently increased. However, when a database is outsourced, the data owner's information is exposed to internal and external attackers. Therefore, in this paper, we propose decimal-based encryption operation protocols that support privacy preservation. The proposed protocols improve operational efficiency compared with binary-based encryption operation protocols by eliminating repetitive operations whose number depends on the bit length. In addition, we propose a privacy-preserving k-means clustering algorithm built on these efficient decimal-based protocols, which enhance the efficiency of the encryption operations. To provide high query processing performance, we also propose a parallel k-means clustering algorithm that supports thread-based parallel processing by using a random value pool. Meanwhile, a security analysis of both the proposed k-means clustering algorithm and the proposed parallel algorithm was performed to prove their data protection, query protection, and access pattern protection capabilities. Our performance analysis shows that the proposed k-means clustering algorithm performs about 10 to 13 times better than existing algorithms.
(This article belongs to the Special Issue Privacy-Preserving Methods and Applications in Big Data Sharing)
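
The abstract does not say which cryptosystem the decimal-based protocols are built on. Encrypted k-means schemes in this line of work often rely on an additively homomorphic scheme such as Paillier, so the minimal sketch below assumes Paillier with deliberately tiny, insecure primes. It shows only the core idea that a cloud server can add encrypted coordinates for a centroid update without seeing them. The names (`encrypt`, `encrypted_sum`, and so on) are illustrative, and unlike the paper's protocols, the key holder here decrypts the totals and performs the division.

```python
import math
import random
from functools import reduce

# Toy Paillier keys (additively homomorphic). These primes are far too small
# to be secure; they only keep the arithmetic easy to follow.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # modular inverse; valid because we fix g = n + 1


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N as (n + 1)^m * r^n mod n^2 with random r coprime to n."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Recover m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    return ((pow(c, LAM, N_SQ) - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQ


def scale_encrypted(c: int, k: int) -> int:
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k, N_SQ)


def encrypted_sum(ciphertexts):
    """Homomorphically add any number of ciphertexts."""
    return reduce(add_encrypted, ciphertexts, encrypt(0))


# A server holding only ciphertexts totals one cluster's coordinates;
# the key holder then decrypts the totals and divides by the member count.
cluster = [(2, 3), (4, 7), (6, 5)]
enc_cluster = [(encrypt(x), encrypt(y)) for x, y in cluster]
sum_x = decrypt(encrypted_sum(cx for cx, _ in enc_cluster))
sum_y = decrypt(encrypted_sum(cy for _, cy in enc_cluster))
centroid = (sum_x / len(cluster), sum_y / len(cluster))
print(centroid)  # (4.0, 5.0)
```

A real deployment would use a modulus of at least 2048 bits and keep distance comparison and division inside secure protocols between servers, which this toy omits.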

18 pages, 1175 KiB  
Article
QuoTa: An Online Quality-Aware Incentive Mechanism for Fast Federated Learning
by Hui Cai, Chao Bian, Biyun Sheng, Jian Zhou, Juan Li and Xin He
Appl. Sci. 2024, 14(2), 833; https://doi.org/10.3390/app14020833 - 18 Jan 2024
Viewed by 510
Abstract
In addition to federated optimization, a growing number of studies focus on designing incentive mechanisms for federated learning (FL) that encourage data owners to share their resources securely. Most existing works consider only data quantity and neglect other key factors such as data quality and training time prediction. Taking all of these factors into account, we propose QuoTa, an online quality-aware incentive mechanism based on a multi-dimensional reverse auction, for achieving fast FL. Specifically, QuoTa first applies model quality detection to eliminate malicious or dispensable devices based on their historical behaviors and marginal contributions. Because CPU frequency may fluctuate during realistic model training, it then predicts model training time using the upper confidence bound algorithm. By combining the two modules, QuoTa incentivizes data owners with high data quality, high computing capability, and low cost to participate in the FL process. Through rigorous theoretical proof and extensive experiments, we show that QuoTa satisfies all desired economic properties and achieves higher model accuracy and shorter convergence time than the state-of-the-art work.
(This article belongs to the Special Issue Privacy-Preserving Methods and Applications in Big Data Sharing)
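
QuoTa's abstract says training time is predicted with the upper confidence bound (UCB) algorithm but gives no formula. The sketch below is one plausible reading, assuming a UCB1-style bonus added to each device's empirical mean training time so that rarely observed devices are treated as possibly slow. The class name, the constant `c`, and the deadline filter are hypothetical and are not QuoTa's actual design.

```python
import math
from collections import defaultdict


class TrainingTimeUCB:
    """Hypothetical per-device training-time predictor with a UCB1-style bonus.

    The prediction is the empirical mean time plus a confidence term that
    shrinks as a device is observed more often, so rarely seen devices are
    predicted conservatively (as slow) rather than optimistically.
    """

    def __init__(self, c: float = 1.0):
        self.c = c
        self.count = defaultdict(int)
        self.total = defaultdict(float)
        self.rounds = 0

    def observe(self, device: str, seconds: float) -> None:
        """Record one measured local training time for a device."""
        self.count[device] += 1
        self.total[device] += seconds
        self.rounds += 1

    def upper_bound(self, device: str) -> float:
        n = self.count.get(device, 0)
        if n == 0:
            return math.inf  # never observed: no basis for a finite bound
        mean = self.total[device] / n
        bonus = self.c * math.sqrt(2 * math.log(self.rounds + 1) / n)
        return mean + bonus

    def fast_enough(self, devices, deadline: float):
        """Devices whose predicted (upper-bound) time fits the round deadline."""
        return [d for d in devices if self.upper_bound(d) <= deadline]


est = TrainingTimeUCB()
for device, seconds in [("a", 10.0), ("a", 12.0), ("b", 20.0)]:
    est.observe(device, seconds)
print(est.fast_enough(["a", "b", "c"], deadline=15.0))  # ['a']
```

Whether QuoTa uses the bound conservatively, as here, or optimistically to encourage exploration cannot be determined from the abstract.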

24 pages, 1244 KiB  
Article
FedNow: An Efficiency-Aware Incentive Mechanism Enables Privacy Protection and Efficient Resource Utilization in Federated Edge Learning
by Jianfeng Lu, Wenxuan Yuan, Shuqin Cao and Pan Zhou
Appl. Sci. 2024, 14(2), 494; https://doi.org/10.3390/app14020494 - 05 Jan 2024
Viewed by 740
Abstract
Federated edge learning (FEL) has recently attracted great interest due to its real-time response and energy efficiency. Most existing work focuses on designing algorithms to improve model performance, ignoring the malicious behavior and personal decision-making of self-interested edge servers. Although some efforts have been devoted to incentivizing honest edge server engagement by compensating training costs, they rarely consider resource efficiency and often assume that edge servers provide complete information to the platform, which risks leaking private attributes. Hence, we aim to design an incentive mechanism that promotes secure and efficient model training between the platform and edge servers. However, edge servers' multi-dimensional private attributes and training strategies make the optimization problem nonconvex, and incomplete information further increases the complexity of the analysis. To address these challenges, we integrate contract theory and the exponential mechanism to propose FedNow, an efficiency-aware incentive mechanism that enables edge servers to determine their own local training rounds while motivating their participation without revealing their true training strategies and private attributes. Specifically, we enable edge servers to add noise to their submitted training strategies to hide their true training rounds; then, we carefully design an efficiency score function to select honest and efficient edge servers without disclosing their private attributes. To demonstrate that FedNow strictly outperforms existing schemes in terms of total costs, we theoretically derive sufficient conditions under which FedNow's total costs are lower than those of existing schemes, and we design a greedy algorithm that uses the Monte Carlo method to find feasible near-optimal solutions in polynomial time. Our extensive experimental assessment using synthetic and real datasets shows the superiority of FedNow.
(This article belongs to the Special Issue Privacy-Preserving Methods and Applications in Big Data Sharing)
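
FedNow uses the exponential mechanism so that edge servers can hide their true training rounds. As a minimal sketch, assuming the reported value is drawn from a finite set of candidate round counts with utility `-|r - true_rounds|` and sensitivity 1, the exponential mechanism can be written as below. The function names and the candidate range are illustrative; the efficiency score function and the contract design are not modeled.

```python
import math
import random


def exponential_mechanism_probs(candidates, true_rounds, epsilon, sensitivity=1.0):
    """Selection probabilities of the exponential mechanism.

    Assumed utility u(r) = -|r - true_rounds|: candidates closer to the true
    round count are exponentially more likely, with
    P(r) proportional to exp(epsilon * u(r) / (2 * sensitivity)).
    """
    scores = [-abs(r - true_rounds) for r in candidates]
    top = max(scores)  # shift by the max score for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def report_rounds(candidates, true_rounds, epsilon, rng=random):
    """Sample the training-round count an edge server submits to the platform."""
    probs = exponential_mechanism_probs(candidates, true_rounds, epsilon)
    return rng.choices(candidates, weights=probs, k=1)[0]


candidates = list(range(1, 11))  # hypothetical allowed local-round counts
probs = exponential_mechanism_probs(candidates, true_rounds=5, epsilon=1.0)
reported = report_rounds(candidates, true_rounds=5, epsilon=1.0, rng=random.Random(0))
print(reported in candidates)  # True
```

A smaller `epsilon` flattens the distribution and hides the true round count better, at the cost of less accurate reports for the platform.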

15 pages, 1128 KiB  
Article
The Use of Blockchain Technology and OCR in E-Government for Document Management: Inbound Invoice Management as an Example
by Fatima Azzam, Mariam Jaber, Amany Saies, Tareq Kirresh, Ruba Awadallah, Abdallah Karakra, Hafez Barghouthi and Saleh Amarneh
Appl. Sci. 2023, 13(14), 8463; https://doi.org/10.3390/app13148463 - 21 Jul 2023
Viewed by 2083
Abstract
The field of electronic government (e-government) is gaining prominence in contemporary society, as it has a significant influence on the wider populace within the context of a technologically advanced world. E-government makes use of information and communication technologies (ICTs) at various levels and domains within government agencies and the public sector. ICT reduces manual labour, potential fraud points, errors, and process lapses. The Internet's quick accessibility and the widespread adoption of modern technologies and disciplines, such as big data, the Internet of Things, machine learning, and artificial intelligence, have accelerated the need for e-government. However, these developments raise a number of data reliability and precision concerns. Research adopting blockchain technology demonstrates its efficacy in addressing such issues. The present study proposes the SECHash system model, which integrates blockchain and Optical Character Recognition (OCR) technologies to regulate the processing of incoming documents by governmental agencies. To assess the proposed system model, the study uses documents containing incoming invoices as a case study. The proposal seeks to maintain the integrity of document data by prohibiting its modification after acceptance. Additionally, SECHash guarantees that accepted documents will not be destroyed or lost. The analysis demonstrates that using the SECHash system model will decrease fraudulent transactions by eliminating manual labour and storing documents on a blockchain network.
(This article belongs to the Special Issue Privacy-Preserving Methods and Applications in Big Data Sharing)
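
SECHash's internal design is not described in the abstract. The minimal sketch below assumes the simplest form of the idea it relies on: each accepted document's OCR text is fingerprinted with SHA-256 and appended to a hash-linked chain, so any later modification of a stored record is detectable. `DocumentLedger` and its methods are hypothetical; consensus, distribution across nodes, and the OCR step itself are out of scope.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def block_hash(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes identically.
    return sha256_hex(json.dumps(body, sort_keys=True))


class DocumentLedger:
    """Hypothetical append-only hash chain of accepted documents."""

    def __init__(self):
        self.blocks = []

    def append(self, doc_id: str, ocr_text: str) -> dict:
        body = {
            "index": len(self.blocks),
            "doc_id": doc_id,
            "content_hash": sha256_hex(ocr_text),  # fingerprint of OCR output
            "prev_hash": self.blocks[-1]["hash"] if self.blocks else GENESIS,
        }
        block = dict(body, hash=block_hash(body))
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Recompute every link; any edit to a stored block breaks the chain."""
        prev = GENESIS
        for i, block in enumerate(self.blocks):
            body = {k: v for k, v in block.items() if k != "hash"}
            if body["index"] != i or body["prev_hash"] != prev:
                return False
            if block_hash(body) != block["hash"]:
                return False
            prev = block["hash"]
        return True

    def matches(self, index: int, ocr_text: str) -> bool:
        """Check a presented document against the fingerprint recorded at acceptance."""
        return self.blocks[index]["content_hash"] == sha256_hex(ocr_text)


ledger = DocumentLedger()
ledger.append("INV-001", "Invoice 001 | Supplier A | Total 150.00")
ledger.append("INV-002", "Invoice 002 | Supplier B | Total 980.50")
print(ledger.verify())  # True
```

Because each block stores the hash of its predecessor, altering an earlier record would require recomputing every later hash, which a distributed blockchain network makes impractical.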

19 pages, 1268 KiB  
Article
A Node Differential Privacy-Based Method to Preserve Directed Graphs in Wireless Mobile Networks
by Jun Yan, Yihui Zhou and Laifeng Lu
Appl. Sci. 2023, 13(14), 8089; https://doi.org/10.3390/app13148089 - 11 Jul 2023
Viewed by 691
Abstract
With the widespread popularity of Wireless Mobile Networks (WMNs) in daily life, the substantial risk of disclosing personal privacy through the massive graph-structured data in WMNs is receiving increasing attention. In particular, the directed graph, a special type of graph data in WMNs, contains a large amount of sensitive personal information. To provide secure and reliable privacy preservation for directed graphs in WMNs, we develop a node differential privacy-based method that combines differential privacy with graph modification. In this method, the original directed graph is first transformed into a weighted graph and then divided into several sub-graphs. Then, in each sub-graph, the node degree sequences are obtained using an exponential mechanism, and micro-aggregation is adopted to obtain the noised node degree sequences, which are used to generate a synthetic directed sub-graph through edge modification. Finally, all synthetic sub-graphs are merged into a synthetic directed graph that can preserve the original directed graph. Theoretical analysis proves that the proposed method satisfies differential privacy, and experimental results demonstrate its effectiveness in privacy preservation and data utility.
(This article belongs to the Special Issue Privacy-Preserving Methods and Applications in Big Data Sharing)
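
The abstract names micro-aggregation as the step applied to node degree sequences but does not give its exact form. The sketch below assumes the textbook fixed-size univariate variant: sort the degrees, group k consecutive values, and replace each with its group's rounded mean. The function name and the example sequence are hypothetical; the exponential-mechanism step, the sub-graph split, and the edge modification that turns the sequence back into a directed graph are not shown.

```python
def microaggregate(degrees, k):
    """Fixed-size univariate micro-aggregation of a degree sequence.

    Nodes are sorted by degree and split into groups of k consecutive values
    (a short final group is merged into the previous one); every node in a
    group receives the group's rounded mean degree, so when there are at
    least k nodes, each reported degree is shared by at least k of them.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(degrees)), key=lambda i: degrees[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    result = [0] * len(degrees)
    for group in groups:
        mean = round(sum(degrees[i] for i in group) / len(group))
        for i in group:
            result[i] = mean
    return result


out_degrees = [1, 9, 2, 8, 3, 7, 4]  # hypothetical out-degree sequence
print(microaggregate(out_degrees, k=3))  # [2, 7, 2, 7, 2, 7, 7]
```

Micro-aggregation on its own is a k-anonymity-style generalization rather than a differential privacy guarantee; the abstract attributes differential privacy to the method as a whole, which also uses the exponential mechanism.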

23 pages, 11713 KiB  
Article
Shapley Values as a Strategy for Ensemble Weights Estimation
by Vaidotas Drungilas, Evaldas Vaičiukynas, Linas Ablonskis and Lina Čeponienė
Appl. Sci. 2023, 13(12), 7010; https://doi.org/10.3390/app13127010 - 10 Jun 2023
Viewed by 1129
Abstract
This study introduces a novel performance-based weighting scheme for ensemble learning using the Shapley value. The weighting uses the reciprocal of binary cross-entropy as a base learner's performance metric and estimates its Shapley value to measure the overall contribution of a learner to an equally weighted ensemble of various sizes. Two variants of this strategy were empirically compared with a single monolithic model and other static weighting strategies using two large banking-related datasets. A variant that discards learners with a negative Shapley value ranked first or at least second when constructing homogeneous ensembles, whereas for heterogeneous ensembles this strategy resulted in better or at least similar detection performance compared with the other weighting strategies tested. Although its main limitation is the computational complexity of Shapley value calculations, the explored weighting strategy could be considered a generalization of performance-based weighting.
(This article belongs to the Special Issue Privacy-Preserving Methods and Applications in Big Data Sharing)
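
The weighting described in this abstract can be sketched with exact Shapley values, assuming the value of a coalition of learners is the reciprocal of the binary cross-entropy of their equally weighted average prediction, the empty coalition is worth 0, and the final weights are the non-negative Shapley values normalized to sum to 1. Those last two choices, the function names, and the toy data are assumptions. The enumeration over all coalitions also illustrates the computational cost the abstract identifies as the main limitation.

```python
import math
from itertools import combinations


def bce(y_true, probs, eps=1e-12):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def coalition_value(members, preds, y_true):
    """1 / BCE of an equally weighted ensemble; the empty ensemble is worth 0 (assumed)."""
    if not members:
        return 0.0
    n_samples = len(y_true)
    avg = [sum(preds[m][i] for m in members) / len(members) for i in range(n_samples)]
    return 1.0 / bce(y_true, avg)


def shapley_values(preds, y_true):
    """Exact Shapley values by enumerating every coalition (exponential in learners)."""
    n = len(preds)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(subset + (i,), preds, y_true)
                without_i = coalition_value(subset, preds, y_true)
                phi[i] += weight * (with_i - without_i)
    return phi


def ensemble_weights(phi, drop_negative=True):
    """Normalize Shapley values into weights, optionally zeroing negative contributors."""
    kept = [max(v, 0.0) for v in phi] if drop_negative else list(phi)
    total = sum(kept)
    return [v / total for v in kept]


y = [1, 0, 1, 0]
preds = [
    [0.9, 0.1, 0.8, 0.2],  # strong learner
    [0.7, 0.4, 0.6, 0.3],  # weaker learner
    [0.2, 0.8, 0.3, 0.9],  # learner that hurts the ensemble
]
phi = shapley_values(preds, y)
weights = ensemble_weights(phi)
print(weights[2])  # 0.0
```

The loop visits 2^(n-1) coalitions per learner, so larger ensembles would need an approximation such as Monte Carlo sampling of permutations.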

21 pages, 5012 KiB  
Article
Blockchained Trustable Federated Learning Utilizing Voting Accountability for Malicious Actor Mitigation
by Brian Stanley, Sang-Gon Lee and Elizabeth Nathania Witanto
Appl. Sci. 2023, 13(11), 6707; https://doi.org/10.3390/app13116707 - 31 May 2023
Viewed by 852
Abstract
The federated learning (FL) approach in machine learning preserves user privacy during data collection. However, traditional FL schemes still rely on a centralized server, making them vulnerable to security risks such as data breaches and model tampering by malicious actors attempting to gain access by masquerading as trainers. To address these issues, which hamper the trustability of FL, the requirements for several of these problems were analyzed. The findings revealed that issues such as the lack of accountability management, malicious actor mitigation, and model leakage remained unaddressed in prior works. To fill this gap, a blockchain-based trustable FL scheme, MAM-FL, is proposed with a focus on providing accountability to trainers. MAM-FL establishes a group of voters responsible for evaluating and verifying the validity of submitted model updates. The effectiveness of MAM-FL was tested based on the reduction of malicious actors on both the trainers' and voters' sides and on its ability to handle colluding participants. Experiments show that MAM-FL succeeded in reducing the number of malicious actors, even in the test case involving initial collusion in the system.
(This article belongs to the Special Issue Privacy-Preserving Methods and Applications in Big Data Sharing)
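
MAM-FL's exact voting and accountability rules are not given in the abstract. The hypothetical sketch below shows one simple way a voter group could gate model updates: a strict majority decides acceptance, trainers and voters gain or lose reputation according to the outcome, and participants whose score falls to or below a threshold are excluded. All names, scores, and the threshold are illustrative, and the blockchain layer is omitted.

```python
def majority_accepts(votes: dict) -> bool:
    """A strict majority of voters must accept; ties reject the update."""
    return 2 * sum(votes.values()) > len(votes)


def settle_round(trainer, votes, reputation, reward=1.0, penalty=1.0):
    """Apply one voting round and update everyone's (hypothetical) reputation."""
    accepted = majority_accepts(votes)
    reputation[trainer] = reputation.get(trainer, 0.0) + (reward if accepted else -penalty)
    for voter, vote in votes.items():
        # Voters who sided with the outcome gain; dissenters are held accountable.
        agreed = vote == accepted
        reputation[voter] = reputation.get(voter, 0.0) + (reward if agreed else -penalty)
    return accepted


def still_eligible(reputation, threshold=-2.0):
    """Participants whose reputation stays strictly above the threshold."""
    return sorted(p for p, score in reputation.items() if score > threshold)


rep = {}
settle_round("t1", {"v1": True, "v2": True, "v3": False}, rep)   # honest update
settle_round("t2", {"v1": False, "v2": False, "v3": True}, rep)  # poisoned update
settle_round("t2", {"v1": False, "v2": False, "v3": True}, rep)  # poisoned again
print(still_eligible(rep))  # ['t1', 'v1', 'v2']
```

Penalizing voters who disagree with the majority only helps while honest voters are the majority; resisting initial collusion, as tested in the paper, needs mechanisms this sketch does not model.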

26 pages, 850 KiB  
Article
A Comprehensive Survey on Privacy-Preserving Techniques in Federated Recommendation Systems
by Muhammad Asad, Saima Shaukat, Ehsan Javanmardi, Jin Nakazato and Manabu Tsukada
Appl. Sci. 2023, 13(10), 6201; https://doi.org/10.3390/app13106201 - 18 May 2023
Cited by 5 | Viewed by 2992
Abstract
Big data is a rapidly growing field, and new developments are constantly emerging to address its various challenges. One such development is the use of federated learning for recommendation systems (FRSs). An FRS protects user privacy by training recommendation models on intermediate parameters instead of real user data. This approach allows data platforms to cooperate while still complying with privacy regulations. In this paper, we explore the current state of research on FRSs, highlighting existing research issues and possible solutions. Specifically, we look at how FRSs can protect user privacy while still allowing organizations to benefit from the data they share. Additionally, we examine potential applications of FRSs in the context of big data, exploring how these systems can facilitate secure data sharing and collaboration. Finally, we discuss the challenges of developing and deploying FRSs in the real world and how these challenges can be addressed.
(This article belongs to the Special Issue Privacy-Preserving Methods and Applications in Big Data Sharing)
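
The survey's core premise, that an FRS trains on shared intermediate parameters rather than raw user data, can be illustrated with federated averaging (FedAvg), one common aggregation rule among the several approaches such systems use. In the minimal sketch below, assumed for illustration, each client sends a locally updated parameter vector (for example, item-embedding values) and the server averages them weighted by each client's sample count, while raw ratings and user embeddings stay on the device. The function name is hypothetical.

```python
def fedavg(client_params, client_sizes):
    """Average parameter vectors, weighting each client by its number of samples."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[j] * size for params, size in zip(client_params, client_sizes)) / total
        for j in range(dim)
    ]


# Two clients share locally updated item-embedding values; raw ratings stay local.
global_items = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_items)  # [2.5, 3.5]
```

Shared parameters can still leak information about user data, which is why FRSs often layer privacy-preserving techniques such as differential privacy or secure aggregation on top of this basic averaging step.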