# Entropy-Weighted Instance Matching Between Different Sourcing Points of Interest

## Abstract

## 1. Introduction

## 2. Related Work

## 3. The Entropy-Weighted Approach for Finding Matched POIs

#### 3.1. The Strategy of Attribute Selection

- For an attribute category $m$, if ($m\in P$ and $m\notin Q$) or ($m\notin P$ and $m\in Q$), define the similarity of this attribute as ${s}_{m}=0$ and exclude the attribute from the weighted multi-attributes model.
- If $m\in (P\cap Q)$, determine how ${s}_{m}$ is calculated according to the characteristics of the attribute values, and include the attribute in the weighted multi-attributes model.
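
The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each POI is represented as a dictionary of attribute values, and the names `select_attributes`, `similarity_fns` and `exact` are hypothetical.

```python
def select_attributes(p, q, similarity_fns):
    """Apply the attribute-selection rule to two POIs.

    p, q           -- dicts mapping attribute name -> value (assumed layout)
    similarity_fns -- dict mapping attribute name -> function(a, b) -> [0, 1]
    Returns (similarities, included): s_m for every attribute, and the
    attributes that take part in the weighted model.
    """
    similarities = {}
    included = []
    for m in sorted(set(p) | set(q)):
        if m in p and m in q:
            # Shared attribute: compute s_m and keep it in the model.
            similarities[m] = similarity_fns[m](p[m], q[m])
            included.append(m)
        else:
            # Attribute present in only one source: s_m = 0, excluded.
            similarities[m] = 0.0
    return similarities, included


# Hypothetical example: exact-match similarity for every attribute.
def exact(a, b):
    return 1.0 if a == b else 0.0

poi_p = {"name": "Central Park Cafe", "category": "cafe", "phone": "555-0100"}
poi_q = {"name": "Central Park Cafe", "category": "restaurant"}
fns = {"name": exact, "category": exact, "phone": exact}

similarities, included = select_attributes(poi_p, poi_q, fns)
```

Here `phone` exists only in `poi_p`, so it receives a similarity of 0 and is left out of `included`, while `name` and `category` are both scored and retained.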

#### 3.2. Spatial Similarity

#### 3.3. Name Similarity

#### 3.4. Category Similarity

(**a**) Conception vectors constructed from two minor categories; (**b**) Conception vectors constructed from a minor and an intermediate category.

#### 3.5. The Entropy-Weighted Multi-Attributes Method

- Set the probability distribution ${p}_{ij}$ of each similarity calculation, $i = 1, 2, \dots, n$, $j = 1, 2, \dots, m$, where $n$ is the number of discrete similarity intervals of uniform width and $m$ is the number of weights ${w}_{j}$ (for example, $m = 5$ in the Five-Methods Model).
- Compute the normalized information entropy ${E}_{j}$, where ${h}_{0}$ is the normalization constant, as follows [37]:$${E}_{j}=-{h}_{0}{\displaystyle \sum _{i=1}^{n}{p}_{ij}}\cdot {\mathrm{log}}_{2}({p}_{ij})$$
- The weights are calculated as follows:$${w}_{j}=\frac{1-{E}_{j}}{{\displaystyle {\sum}_{k=1}^{m}(1-{E}_{k})}}$$
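
The three steps above can be sketched in Python as follows. Two points are assumptions rather than statements from the text: the normalization constant is taken to be ${h}_{0} = 1/{\mathrm{log}}_{2}(n)$, so that a uniform distribution gives ${E}_{j} = 1$, and empty intervals (${p}_{ij} = 0$) are treated as contributing zero. The function names are hypothetical.

```python
import math

def normalized_entropy(p_col):
    """Normalized entropy E_j of one similarity distribution p_1j..p_nj."""
    n = len(p_col)
    h0 = 1.0 / math.log2(n)  # assumed normalization constant
    # Terms with p = 0 are skipped (convention: 0 * log2(0) = 0).
    return -h0 * sum(p * math.log2(p) for p in p_col if p > 0)

def entropy_weights(entropies):
    """Weights w_j = (1 - E_j) / sum_k (1 - E_k)."""
    total = sum(1.0 - e for e in entropies)
    # total is zero only if every E_j equals 1; not handled in this sketch.
    return [(1.0 - e) / total for e in entropies]

# Toy distributions over n = 4 intervals.
e_uniform = normalized_entropy([0.25, 0.25, 0.25, 0.25])  # maximally spread
e_peaked = normalized_entropy([0.97, 0.01, 0.01, 0.01])   # concentrated
weights = entropy_weights([e_peaked, 0.5])
```

Under these assumptions, a similarity measure whose values are concentrated in a few intervals has low entropy and therefore receives a larger weight than one whose values are spread evenly.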

## 4. Case Study and Discussion

#### 4.1. Experimental Dataset

#### 4.2. The Spatial Attribute

#### 4.3. The Name Attribute

#### 4.4. The Category Attribute

#### 4.5. The Entropy-Weighted Multi-Attributes Model Analysis

The entropies ${E}_{j}$ calculated from the probability distributions ${p}_{ij}$ using Equation (5) are presented in Table 1.

Similarity | P(1) | P(2) | P(3) | P(4) | P(5) |
---|---|---|---|---|---|
0 ≤ s < 0.05 | 0 | 0 | 0 | 0 | 0.0830 |
0.05 ≤ s < 0.1 | 0.0040 | 0.0040 | 0.0040 | 0 | 0 |
0.1 ≤ s < 0.15 | 0.0079 | 0.0079 | 0.0040 | 0 | 0 |
0.15 ≤ s < 0.2 | 0.0356 | 0.0040 | 0 | 0 | 0.0040 |
0.2 ≤ s < 0.25 | 0.0395 | 0.0237 | 0.0198 | 0 | 0 |
0.25 ≤ s < 0.3 | 0.0237 | 0.0316 | 0.0356 | 0.0040 | 0 |
0.3 ≤ s < 0.35 | 0.0514 | 0.0316 | 0.0277 | 0 | 0 |
0.35 ≤ s < 0.4 | 0.0277 | 0.0435 | 0.0356 | 0.0040 | 0 |
0.4 ≤ s < 0.45 | 0.0198 | 0.0988 | 0.0791 | 0.0356 | 0.0079 |
0.45 ≤ s < 0.5 | 0.0395 | 0.0356 | 0.0514 | 0 | 0.0119 |
0.5 ≤ s < 0.55 | 0.0237 | 0.1067 | 0.0949 | 0.0514 | 0.0791 |
0.55 ≤ s < 0.6 | 0.0079 | 0.0632 | 0.0830 | 0.0277 | 0.0079 |
0.6 ≤ s < 0.65 | 0.0316 | 0.1067 | 0.0988 | 0.0395 | 0.0119 |
0.65 ≤ s < 0.7 | 0.0316 | 0.0870 | 0.0791 | 0.0909 | 0.0158 |
0.7 ≤ s < 0.75 | 0.0277 | 0.0909 | 0.0870 | 0.1067 | 0.0593 |
0.75 ≤ s < 0.8 | 0.0356 | 0.0909 | 0.0949 | 0.1581 | 0.1146 |
0.8 ≤ s < 0.85 | 0.0474 | 0.1107 | 0.1225 | 0.2055 | 0.1818 |
0.85 ≤ s < 0.9 | 0.0514 | 0.0237 | 0.0277 | 0.1502 | 0.1146 |
0.9 ≤ s < 0.95 | 0.1146 | 0.0119 | 0.0119 | 0.0632 | 0.1067 |
0.95 ≤ s < 1 | 0.3794 | 0 | 0 | 0 | 0.1462 |
s = 1 | 0 | 0.0277 | 0.0435 | 0.0632 | 0.0553 |
${E}_{j}$ | 0.765 | 0.873 | 0.872 | 0.739 | 0.766 |
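
As a worked check of the entropy row, the sketch below recomputes ${E}_{4}$ from the P(4) column of Table 1. It assumes a normalization constant of ${h}_{0} = 1/{\mathrm{log}}_{2}(n)$ with $n = 21$ intervals and treats empty intervals as contributing zero. Neither choice is stated explicitly here, but together they give the tabulated value for this column. The variable names are hypothetical.

```python
import math

# P(4) column of Table 1, from 0 <= s < 0.05 down to s = 1 (21 intervals).
p4 = [0, 0, 0, 0, 0, 0.0040, 0, 0.0040, 0.0356, 0, 0.0514,
      0.0277, 0.0395, 0.0909, 0.1067, 0.1581, 0.2055, 0.1502,
      0.0632, 0, 0.0632]

n = len(p4)
h0 = 1.0 / math.log2(n)  # assumed normalization constant

# Skip empty intervals (convention: 0 * log2(0) = 0).
e4 = -h0 * sum(p * math.log2(p) for p in p4 if p > 0)

print(round(e4, 3))
```

The result rounds to the 0.739 listed for P(4). Only this column is checked here.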

Model | Abbreviation | Spatial | Text | Phonetic | WordSeg | Category |
---|---|---|---|---|---|---|
Five-Methods Model | STPWC | 0.2386 | 0.1289 | 0.1299 | 0.2650 | 0.2376 |
Four-Methods Model | STPC | 0.3246 | 0.1754 | 0.1768 | – | 0.3232 |
Four-Methods Model | STWC | 0.2742 | 0.1482 | – | 0.3046 | 0.2730 |
Four-Methods Model | SPWC | 0.2739 | – | 0.1492 | 0.3042 | 0.2727 |
Three-Methods Model | STC | 0.3943 | 0.2131 | – | – | 0.3926 |
Three-Methods Model | SPC | 0.3936 | – | 0.2144 | – | 0.3920 |
Three-Methods Model | SWC | 0.3219 | – | – | 0.3575 | 0.3205 |
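
The weights in this table follow from the entropies in Table 1 by applying ${w}_{j}=(1-{E}_{j})/\sum_{k}(1-{E}_{k})$ over only the methods each model includes. The sketch below assumes that columns P(1) to P(5) of Table 1 correspond, in order, to Spatial, Text, Phonetic, WordSeg and Category, and that each letter of a model abbreviation names one of these methods. The function and variable names are hypothetical.

```python
# Entropies E_j from Table 1, keyed by the letter used in the model
# abbreviations (assumed mapping: S=Spatial, T=Text, P=Phonetic,
# W=WordSeg, C=Category).
ENTROPY = {"S": 0.765, "T": 0.873, "P": 0.872, "W": 0.739, "C": 0.766}

def model_weights(abbrev):
    """Entropy weights renormalized over the methods in one model."""
    total = sum(1.0 - ENTROPY[k] for k in abbrev)
    return {k: (1.0 - ENTROPY[k]) / total for k in abbrev}

models = ["STPWC", "STPC", "STWC", "SPWC", "STC", "SPC", "SWC"]
table = {name: model_weights(name) for name in models}

for name in models:
    row = ", ".join(f"{k}={w:.4f}" for k, w in table[name].items())
    print(f"{name}: {row}")
```

Dropping a method does not change the remaining entropies. It only removes that term from the denominator, which is why the surviving weights grow proportionally in the Four-Methods and Three-Methods models.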

**Figure 8.** (**a**) Precision of STC, SPC and SWC; (**b**) Precision of STPC, STWC and SPWC; (**c**) Precision of STPWC, STPC and SPC; (**d**) Recall of STC, SPC and SWC; (**e**) Recall of STPC, STWC and SPWC; (**f**) Recall of STPWC, SPWC and SWC; (**g**) F1 of STC, SPC and SWC; (**h**) F1 of STPC, STWC and SPWC; (**i**) F1 of STPWC, SPWC and SWC.

## 5. Conclusions and Future Work

## References

