# Data Sharing Privacy Metrics Model Based on Information Entropy and Group Privacy Preference

## Abstract

## 1. Introduction

- In calculating the importance of privacy, we use information entropy to remodel the quantity of data privacy. After mathematical derivation, we use a new weight expression to replace part of the traditional entropy weight method.
- In quantifying privacy preference, we introduce the analytic hierarchy process (AHP) [5] into the data collection and release process and use it to correct the results based on information entropy, so that the metric results take the users’ personalized privacy preferences into account.
- We construct a complete data-sharing privacy metrics model, which provides a solution for evaluating privacy security in data-sharing scenarios. The experiments verify the validity of the model. Compared with the privacy metrics model based on the traditional entropy weight method, our model yields more reasonable weights and is more sensitive to changes in data privacy.

## 2. Related Work

## 3. Data Sharing Privacy Metric Model Based on Information Entropy and Group Privacy Preference

#### 3.1. Problem Modelling

#### 3.2. Weighted Algorithm for Privacy Attributes Based on Information Entropy

- The entropy weight method uses the formula for information entropy but leaves the physical meaning of that entropy unclear. It offers no explanation in terms of scenario modeling or probability, so its result cannot be directly equated with the quantity of privacy.
- Calculating privacy metrics requires first quantifying the privacy data and then computing on the quantized data. After privacy quantization and standardization, the data differ noticeably from the original data, and improper data processing distorts the entropy weight calculation.

- From the original data, construct for each attribute $j$ the random variable ${Y}_{j}=\left({y}_{1j},\dots ,{y}_{mj}\right)$ over its index values, where $m$ is the number of distinct values taken by the $j$th attribute, and calculate the probability distribution $P\left({Y}_{j}\right)=\left(p\left({y}_{1j}\right),\dots ,p\left({y}_{mj}\right)\right)$ of those values.
- For each ${Y}_{j}$, calculate the information entropy ${H}_{j}=H\left({Y}_{j}\right)=-{\displaystyle \sum}_{i=1}^{m}p\left({y}_{ij}\right)\mathrm{log}\left(p\left({y}_{ij}\right)\right)$.
- Since it has been proved in 3.1 that the amount of privacy leakage is directly proportional to the result of the current information entropy calculation, the final weight vector $w=\left({w}_{j}\right),j=1,2,\dots ,k$ is obtained directly by the normalization ${w}_{j}=\frac{{H}_{j}}{{{\displaystyle \sum}}_{l=1}^{k}{H}_{l}}$.
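
The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors’ implementation: the function names and the toy `columns` data are hypothetical, and base-2 logarithms are an assumption (the source does not fix the base, which cancels out in the normalized weights anyway).

```python
import math
from collections import Counter

def attribute_entropy(values):
    """Shannon entropy H(Y_j) of one attribute, estimated from value frequencies."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_weights(columns):
    """Normalize the per-attribute entropies into weights w_j = H_j / sum_l H_l."""
    entropies = [attribute_entropy(col) for col in columns]
    total = sum(entropies)
    if total == 0:
        raise ValueError("all attributes are constant; weights are undefined")
    return [h / total for h in entropies]

# Hypothetical toy data: three attributes observed over four records.
columns = [
    ["a", "a", "a", "a"],  # constant attribute, entropy 0
    ["x", "x", "y", "y"],  # two equally likely values, entropy 1 bit
    ["p", "q", "r", "s"],  # four equally likely values, entropy 2 bits
]
weights = entropy_weights(columns)
```

In this sketch an attribute that takes more distinct, evenly spread values receives a larger weight, which matches the proportional relationship between entropy and privacy leakage stated above.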

#### 3.3. Weight Correction Based on User Privacy Preferences

1. Split the set of data attributes to be published and build the hierarchical attribute architecture. A hierarchical structure not only yields more discriminative results when collecting user privacy preferences but also avoids computing the eigenvalues of large matrices, which improves the algorithm’s efficiency. The current attribute set is classified by privacy information category to construct the hierarchical model. For example, 17 personal data attributes can be split and built into an attribute hierarchy, as shown in Figure 3.

   - Identity information: natural attributes of the person, related only to the individual and independent of others.
   - Social attributes: attributes that describe the individual’s participation in social relations, attributes related to others, and regional attributes.
   - Job financial status: attributes that describe an individual’s occupation and financial status.

2. Based on the hierarchical model, the relative importance of users’ privacy preferences is analyzed, and the judgment matrix ${C}_{t\times t}={\left[{b}_{ij}\right]}_{t\times t}$ is constructed for each sub-level, where $t$ is the number of attributes in the sub-level. Table 1 gives a quantitative representation of subjective opinions: it turns abstract and fuzzy user opinions into a numerical matrix by comparing attributes two at a time. Each ${b}_{ij}$ in the matrix is assigned from the scale in Table 1 by pairwise comparison, giving a numerical expression of the user group’s preference among the attributes.
3. Single-level sorting. The square-root method or the sum-product method is used to calculate the maximum eigenvalue ${\lambda}_{\mathrm{max}}$ of matrix ${C}_{t\times t}$ and its corresponding eigenvector $p$.
4. Consistency test. Pairwise comparisons may conflict with one another, so consistency must be verified to ensure the validity of the statistics. Because the preferences are expressed as the judgment matrix ${C}_{t\times t}$, the problem becomes determining whether ${C}_{t\times t}$ is consistent, that is, whether its largest eigenvalue ${\lambda}_{\mathrm{max}}$ equals the order $t$ of the matrix. Absolute consistency is often hard to achieve, so an approximate measure of the degree of consistency is used instead. To avoid inconsistency introduced by collecting subjective privacy preferences, the consistency test is carried out on the calculated results. The consistency index $C.I.$ is obtained using Formula (11). The random consistency index $R.I.$ is selected according to Table 2 and the matrix order $t$.

- The matrix $C$ has complete consistency when $C.I.=0$.
- When $C.I.$ is close to zero, the matrix $C$ has satisfactory consistency.
- The greater the $C.I.$, the greater the inconsistency of $C$.
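
The single-level sorting and consistency test described above can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the square-root (geometric mean) method, and it assumes that Formula (11), which is not reproduced in this text, is the standard AHP definition $C.I.=({\lambda}_{\mathrm{max}}-t)/(t-1)$. The function names and the example matrix are hypothetical; the `RANDOM_INDEX` values are copied from Table 2.

```python
import math

# Random consistency index R.I. by matrix order t, as listed in Table 2.
RANDOM_INDEX = {
    1: 0.0, 2: 0.0, 3: 0.52, 4: 0.89, 5: 1.12, 6: 1.26, 7: 1.36, 8: 1.41,
    9: 1.46, 10: 1.49, 11: 1.52, 12: 1.54, 13: 1.56, 14: 1.58, 15: 1.59,
}

def ahp_priorities(matrix):
    """Square-root method: return (priority vector p, approximate lambda_max)."""
    t = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / t) for row in matrix]
    total = sum(geo_means)
    p = [g / total for g in geo_means]
    # lambda_max is approximated as the mean of (C p)_i / p_i over all rows.
    cp = [sum(matrix[i][j] * p[j] for j in range(t)) for i in range(t)]
    lambda_max = sum(cp[i] / p[i] for i in range(t)) / t
    return p, lambda_max

def consistency(matrix):
    """Return (C.I., C.R.), assuming C.I. = (lambda_max - t) / (t - 1)."""
    t = len(matrix)
    if t <= 2:
        return 0.0, 0.0  # reciprocal matrices of order 1 or 2 are always consistent
    _, lambda_max = ahp_priorities(matrix)
    ci = (lambda_max - t) / (t - 1)
    return ci, ci / RANDOM_INDEX[t]

# Hypothetical, perfectly consistent judgment matrix (b_ik = b_ij * b_jk).
C = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
p, lambda_max = ahp_priorities(C)
ci, cr = consistency(C)
```

For this consistent example, ${\lambda}_{\mathrm{max}}$ equals the order $t=3$ up to rounding, so $C.I.$ is approximately zero. The ratio $C.R.=C.I./R.I.$ returned here is the quantity usually compared against Saaty’s conventional threshold of 0.1; that threshold is not stated in this text and is mentioned only as common AHP practice.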

- By describing a problem that needs data support, the data consumer puts forward the demand for data usage and sends the demand to the data server.
- According to the requirement, the data server formulates the data attributes that need to be collected, divides the data attributes according to specific rules, and constructs the hierarchical structure model.
- The data server requests the actual data from the user group. Each user gives a privacy importance preference matrix over the data attributes and sends this preference opinion to the data server.
- The data server integrates each user’s data and preferences to obtain the actual original data to be published and the group privacy preference matrix.
- The data server iterates through the privacy metrics and protection model shown in Figure 3 to get the data that meets the privacy requirements.
- The data server publishes the final data and provides it to the data consumer for analysis and sharing.
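
The workflow above does not say how the data server combines individual preference matrices into the group privacy preference matrix. One common choice in group AHP, assumed here purely for illustration, is the element-wise geometric mean, which keeps the aggregated matrix reciprocal. The function name and the two user matrices below are hypothetical.

```python
import math

def aggregate_judgments(matrices):
    """Element-wise geometric mean of the users' pairwise judgment matrices."""
    n = len(matrices)
    t = len(matrices[0])
    return [
        [math.prod(m[i][j] for m in matrices) ** (1.0 / n) for j in range(t)]
        for i in range(t)
    ]

# Hypothetical preferences from two users over two attributes.
user_a = [[1.0, 4.0], [0.25, 1.0]]  # strongly prefers protecting attribute 1
user_b = [[1.0, 1.0], [1.0, 1.0]]   # treats both attributes as equally important
group = aggregate_judgments([user_a, user_b])
```

The resulting `group` matrix can then be passed through the same single-level sorting and consistency test as any individual judgment matrix.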

#### 3.4. Metric Results Analysis and Feedback

- External environmental factors. These include the business scenarios for data usage and the network environment for data transmission. They dynamically influence the security requirements for data sharing and circulation and constrain the choice and strength of data security and privacy protection measures.
- Data source privacy. The privacy attributes, information, and statistical characteristics of the original data source are mainly determined by the privacy metrics mentioned above.
- Data availability. Data that has been protected after processing should remain usable. Factors to consider include the destruction of crucial information in the data and the destruction of the original distribution. While protecting privacy and security, the impact of protection measures on data utility should be minimized as far as possible.

## 4. Experimental Results and Discussion

#### 4.1. Comparison of Weight Distribution of Data Privacy Attributes

#### 4.2. Measures of Privacy Protection Effectiveness

- Classification of attributes: providing classified protection according to the sensitivity of attributes and their ability to identify individuals. This guarantees data availability on low-sensitivity attributes and provides focused protection for data on high-sensitivity attributes through slicing, suppression, generalization, and so on.
- Classification of individuals: dividing all individuals in the relational data into high, medium, and low areas according to the average privacy amount and individual sensitivity. The release of highly sensitive individual data is limited through permutation, bucket splitting, and perturbation techniques, which reduces the overall privacy impact while keeping data availability high.

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

- Ouaftouh, S.; Zellou, A.; Idri, A. User Profile Model: A User Dimension Based Classification. In Proceedings of the 10th International Conference on Intelligent Systems: Theories and Applications (SITA), Taipei, China, 17 December 2015.
- Ganti, R.K.; Ye, F.; Lei, H. Mobile crowdsensing: Current state and future challenges. IEEE Commun. Mag. **2011**, 49, 32–39.
- Feng, D.G.; Zhang, M.; Li, H. Big data security and privacy protection. Chin. J. Comput. **2014**, 37, 246–258.
- Saaty, T.L. Decision making with the analytic hierarchy process. Int. J. Serv. Sci. **2008**, 1, 83–98.
- Zou, Z.H.; Yun, Y.; Sun, J.N. Entropy method for determination of weight of evaluating indicators in fuzzy synthetic evaluation for water quality assessment. J. Environ. Sci. **2006**, 18, 1020–1023.
- Zhou, S.G.; Li, F.; Tao, Y.F.; Xiao, X.K. Privacy preservation in database applications: A survey. Chin. J. Comput. **2009**, 32, 847–861.
- Sweeney, L. K-Anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. **2002**, 10, 557–570.
- Machanavajjhala, A.; Kifer, D.; Gehrke, J.; Venkitasubramaniam, M. L-Diversity: Privacy Beyond K-Anonymity. In Proceedings of the 22nd International Conference on Data Engineering (ICDE), Atlanta, GA, USA, 3 April 2006.
- Li, N.; Li, T.; Venkatasubramanian, S. T-Closeness: Privacy beyond k-Anonymity and l-Diversity. In Proceedings of the 23rd International Conference on Data Engineering (ICDE), Istanbul, Turkey, 14 May 2007.
- Zang, H.; Bolot, J. Anonymization of Location Data Does Not Work: A Large-Scale Measurement Study. In Proceedings of the 17th Annual International Conference on Mobile Computing and Networking, MOBICOM 2011, Las Vegas, NV, USA, 9 September 2011.
- Dwork, C. Calibrating noise to sensitivity in private data analysis. Lect. Notes Comput. Sci. **2006**, 3876, 265–284.
- Dwork, C.; Roth, A. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci. **2013**, 9, 211–407.
- Qu, L.; Yang, J.; Yan, X.; Ma, L.; Yang, Q.; Han, Y. Research on Privacy Protection Technology for Data Publishing. In Proceedings of the 2021 IEEE 21st International Conference on Software Quality, Reliability and Security Companion (QRS-C), Hainan, China, 6–10 December 2021; pp. 999–1005.
- Afifi, M.H.; Zhou, K.; Ren, J. Privacy Characterization and Quantification in Data Publishing. IEEE Trans. Knowl. Data Eng. **2018**, 30, 1756–1769.
- Abdelhameed, S.A.; Moussa, S.M.; Khalifa, M.E. Restricted Sensitive Attributes-based Sequential Anonymization (RSA-SA) approach for privacy-preserving data stream publishing. Knowl.-Based Syst. **2019**, 164, 1–20.
- Domingo-Ferrer, J.; Muralidhar, K.; Bras-Amoros, M. General Confidentiality and Utility Metrics for Privacy-Preserving Data Publishing Based on the Permutation Model. IEEE Trans. Dependable Secur. Comput. **2021**, 18, 2506–2517.
- Zhou, Z.; Wang, Y.; Yu, X.; Miao, J. A Targeted Privacy-Preserving Data Publishing Method Based on Bayesian Network. IEEE Access **2022**, 10, 89555–89567.
- Díaz, C.; Seys, S.; Claessens, J.; Preneel, B. Towards Measuring Anonymity. In Proceedings of the 2nd International Workshop on Privacy-Enhancing Technologies, San Francisco, CA, USA, 14 April 2002.
- Gao, F.; He, J.; Peng, S.; Wu, X. A Quantifying Metric for Privacy Protection Based on Information Theory. In Proceedings of the 3rd International Symposium on Intelligent Information Technology and Security Informatics, Jinggangshan, China, 4 February 2010.
- Peng, C.G.; Ding, H.F.; Zhu, Y.J.; Fu, Z.F. Information entropy models and privacy metrics methods for privacy protection. J. Softw. **2016**, 27, 1891–1903.
- Zhang, P.P.; Peng, C.G.; Hao, C.Y. Privacy protection model and privacy metric methods based on privacy preference. Comput. Sci. **2018**, 45, 130–134.
- Wang, M.N.; Peng, C.G.; He, W.Z.; Ding, X.; Ding, H.F. Privacy metric model of differential privacy via graph theory and mutual information. Comput. Sci. **2020**, 47, 270–277.
- Yu, Y.H.; Fu, Y.; Wu, X.P. Metric and classification model for privacy data based on Shannon information entropy and BP neural network. J. Commun. **2018**, 39, 10–17.
- Arca, S.; Hewett, R. Is Entropy Enough for Measuring Privacy? In Proceedings of the 2020 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 16 December 2020.
- Zhao, M.F.; Lei, C.; Zhong, Y.; Xiong, J.B. Dynamic privacy measurement model and evaluation system for mobile edge crowdsensing. Chin. J. Netw. Inf. Secur. **2021**, 7, 157–166.
- Kohavi, R.; Becker, B.; University of California. Adult Data Set. UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/ml/datasets/Adult (accessed on 25 July 2022).

**Figure 2.** Data sharing privacy metrics model based on improved information entropy and group privacy preference.

Intensity of Importance | Definition | Explanation |
---|---|---|
1 | Equal importance | The degree of contribution of the two elements is of equal importance |
3 | Moderate importance of one over another | Experience and judgment slightly prefer the former |
5 | Essential importance | Experience and judgment strongly prefer the former element |
7 | Extreme importance | Actually shows a very strong preference for the former element |
9 | Absolute importance | There is sufficient evidence to confirm an absolute preference for the former element |
2, 4, 6, 8 | Intermediate value of adjacent scale | Between two adjacent judgments |
Reciprocals | Relative unimportance | The degree of the latter factor preference is inversely proportional to the value, and the smaller the value, the higher the importance of the latter. |

t | R.I. | t | R.I. |
---|---|---|---|
1 | 0 | 9 | 1.46 |
2 | 0 | 10 | 1.49 |
3 | 0.52 | 11 | 1.52 |
4 | 0.89 | 12 | 1.54 |
5 | 1.12 | 13 | 1.56 |
6 | 1.26 | 14 | 1.58 |
7 | 1.36 | 15 | 1.59 |
8 | 1.41 | | |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Guo, Y.; Zuo, J.; Guo, Z.; Qi, J.; Lu, Y.
Data Sharing Privacy Metrics Model Based on Information Entropy and Group Privacy Preference. *Cryptography* **2023**, *7*, 11.
https://doi.org/10.3390/cryptography7010011
