# Distant Supervision for Relation Extraction with Ranking-Based Methods

## Abstract

The proposed ranking-based methods improve the F_{1} on the official testing set, with an optimal enhancement of F_{1} from 27.3% to 29.98%.

## 1. Introduction

Relation extraction aims to identify semantic relations between entity pairs in text, e.g., the `govern_of` relation between a PERSON and an ORGANIZATION, or the `date_of_birth` relation between a DATE and a PERSON [1,2,3,4,5]. It is widely applied in knowledge bases (KB), search engines and question answering systems. Traditionally, supervised learning is widely used in this research field because the training samples annotated by human experts are high-confidence and can be directly imported into machine learning algorithms [6,7,8,9]. In recent years, with the explosive growth of information on the web, it is no longer reasonable for human experts to do extensive data annotation, since new information increases exponentially and annotation would require a great deal of time and effort. In this situation, distant supervision (DS) alleviates the problem by aligning existing knowledge bases with batches of free text. Concretely, DS assumes that a sentence conveys the relation r if it contains the corresponding entities e1 and e2 [10,11,12]. Through mathematical modeling, machine learning algorithms can learn from this weak supervision (the assumption above) and build reasonably reliable classifiers. However, DS still faces two main challenges: (1) we do not know in advance the exact mappings between sentences and relations; and (2) we cannot guarantee that both the KB and the free text are complete.

For example, the entity pair <Barack Obama, US> has two associated relations, `president_of` and `born_in`, according to a KB, and for each of them, DS assumes that at least one (or all, in early studies) of the three sentences S1–S3 conveys the relation (Table 1). However, we are not able to decide beforehand the actual mappings between sentences and relations (i.e., S1 with `president_of`, and S2 with `born_in`). Also, due to the incompleteness problem, some sentences may express none of the relations (i.e., the incompleteness of the KB leads to the conclusion that S3 conveys neither of the relations), or the sentences may not be enough to support all the relations (the incompleteness of the free text).

Previous studies alleviate these issues with multi-instance learning (MIL) or multi-instance multi-label learning (MIML), which group the sentences sharing an entity pair and attach all associated relations as group labels (i.e., `president_of` and `born_in` for S1–S3). MIL/MIML addresses the exact-mapping issue through the at-least-one assumption, bypassing the hard hypothesis that all sentences express the associated relation(s), and training the instance-level classifier with the help of this assumption. MIL/MIML with at-least-one (which weakens the incompleteness issue) and special add-ins (i.e., adding penalty factors) can also partially deal with the incompleteness problem. Most previous MIL/MIML-based studies learn from all of the weakly labeled training data. In contrast, we train our models with a subset of the training data selected according to a series of ranking-based constraints, in order to tackle the incompleteness problem by filtering noise from the training data.
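As a concrete illustration of the DS grouping step described above, the following sketch builds weakly labeled groups from a KB and free text; the data structures and example data are hypothetical, not the authors' implementation:

```python
from collections import defaultdict

# Toy KB: entity pair -> set of relation labels (hypothetical example data).
kb = {("Barack Obama", "US"): {"president_of", "born_in"}}

# Free text: sentence ids paired with the entity pair they mention.
sentences = [
    ("S1", ("Barack Obama", "US")),
    ("S2", ("Barack Obama", "US")),
    ("S3", ("Barack Obama", "US")),
]

def build_groups(kb, sentences):
    """Group sentences by entity pair. Each group inherits ALL KB relations
    for its pair; the at-least-one assumption says at least one sentence in
    the group expresses each of those positive labels."""
    groups = defaultdict(lambda: {"instances": [], "positive_labels": set()})
    for sent_id, pair in sentences:
        if pair in kb:  # only pairs present in the KB receive weak labels
            groups[pair]["instances"].append(sent_id)
            groups[pair]["positive_labels"] = set(kb[pair])
    return dict(groups)

groups = build_groups(kb, sentences)
```

Note that the group carries both relations even though S3 may express neither; this is exactly the noise the ranking-based selection later tries to filter.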

We report the F_{1} value on the official testing set (Final F_{1}). Compared with DNMAR: (1) DNMAR explicitly models the missing information with hard constraints, while we do not; (2) it uses all of the groups for training, whereas we select only the effective ones; and (3) we employ hard EM for parameter estimation, while DNMAR performs exact inference.

Experimental results show that our methods achieve better F_{1} than the baseline methods. Further, MIML-sort-e achieves the best Final F_{1}, with an optimal enhancement from 27.3% to 29.98%.

- We are the first to make use of the group-level information to select effective training data.
- Three ranking-based methods and the ensemble are proposed and validated as effective.
- We achieve state-of-the-art results both on the average and optimal performance.
- We analyze why the data selection methods are beneficial through examples and statistical figures.

## 2. Related Work

#### 2.1. Distant Supervision for Relation Extraction

#### 2.2. Noise Reduction for Distant Supervision

## 3. Methods

#### 3.1. MIML-Sort

We define the positive labels (P_{i}) as all of the possible relation labels from the KB, while the negative labels (N_{i}) are defined as Ƚ\P_{i}, where Ƚ stands for the possible relation labels for the key entity (the first entity in the pair). For instance, `president_of` and `born_in` are included in P_{i} for the entity Obama in the example shown in Table 1, while some other relations such as `founder_of` can be components of N_{i}. The joint probability of the whole dataset D is defined as:

The model consists of the instance-level classifier **w**_{z} and the group-level classifier **w**_{y}. **w**_{z} is a multi-class classifier which maps each instance to one of the relation labels, and **w**_{y} is a series of binary classifiers which utilize the information from the labels predicted by **w**_{z}. In Equations (1) and (2), x_{i} and z_{i} stand for the instances (sentences) in Group i and their corresponding predicted labels, while y_{i} represents the group-level labels for the group. We use ${z}_{i}^{m}$ to denote the m-th sentence in Group i, ${y}_{i}^{r}$ the r-th label, and ${\mathit{w}}_{y}^{r}$ the binary group-level classifier for label r.

Training alternately estimates **w**_{y} and **w**_{z} through several training epochs. In the E-step, the algorithm traverses each instance in each group, and uses the **w**_{z} and **w**_{y} from the last training epoch for prediction. The predicted label for each instance is generated by maximizing the term on the right-hand side of Equation (3).

In Equation (3), z_{i}′ denotes the group labels inferred previously, in which the m-th label z_{m} is replaced by the current candidate label z, and ${x}_{i}^{m}$ stands for the m-th instance in the i-th group. Rewriting in log form:

We use f(x_{i}) to denote the score of each group, which is further defined based on different sorting strategies, and the training dataset is updated by selecting the top θ groups according to f(x_{i}) (see Equations (5) and (6)). In Equations (5) and (6), D_{e+1}, the training dataset for the (e + 1)-th epoch, is generated from X_{e+1}, the universal set of training groups in the (e + 1)-th epoch.

**Algorithm 1** MIML-Sort Training

Input: training bags {x_{i}}, positive/negative label sets {P_{i}/N_{i}}, label set Ɍ, proportion parameter θ
Output: instance-level and group-level classifiers w_{z} and w_{y}

1: foreach ${x}_{i}^{m}$ in each bag x_{i}:
2:   ${z}_{i}^{m}$ ← each r in P_{i}
3: end for
4: foreach iteration t in T:
5:   foreach bag x_{i}:
6:     foreach ${x}_{i}^{m}$ in bag x_{i}:
7:       ${z}_{m}=\underset{z}{\mathrm{arg\,max}}\;p(z|{x}_{i}^{m},{\mathit{w}}_{y},{\mathit{w}}_{z})=\underset{z}{\mathrm{arg\,max}}\;p(z|{x}_{i}^{m},{\mathit{w}}_{z})\times \prod _{r\in {P}_{i}\cup {N}_{i}}p({y}_{i}=r|{z}_{i}^{\prime},{\mathit{w}}_{y}^{r})$
8:     end for
9:   end for
10:  foreach bag x_{i}:
11:    ${z}_{i}^{*}=\underset{z}{\mathrm{arg\,max}}\;p(z|{x}_{i},{\mathit{w}}_{z})$
12:    foreach r in Ɍ:
13:      ${y}_{i}^{r*}=\underset{y\in \{0,1\}}{\mathrm{arg\,max}}\;p(y|{z}_{i}^{*},{\mathit{w}}_{y}^{r})$
14:    end for
15:  end for
16:  ${D}_{e+1}\leftarrow \delta ({D}_{e},\theta )$, given $\{f({x}_{i})\}$
17:  ${\mathit{w}}_{z}^{*}=\underset{\mathit{w}}{\mathrm{arg\,max}}\sum _{i=1}^{n}\sum _{m\in {M}_{i}}\mathrm{log}\,p({z}_{i}^{m*}|{x}_{i}^{m},\mathit{w},{D}_{e+1})$
18:  foreach r in Ɍ:
19:    ${\mathit{w}}_{y}^{r*}=\underset{\mathit{w}}{\mathrm{arg\,max}}\sum _{i}\mathrm{log}\,p({y}_{i}^{r}|{z}_{i}^{*},\mathit{w},{D}_{e+1})$
20:  end for
21: end for

In Lines 1–3, each label in P_{i} is assigned to each instance, and a set of instances of size |M_{i}| × |P_{i}| is generated. The remaining lines are similar to MIML-re, except for the updated dataset D_{e+1} computed in Lines 16–19. The model parameters **w**_{y} and **w**_{z} are thus estimated through several iterations.
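The data-selection step in Line 16, i.e., keeping the top θ proportion of groups ranked by f(x_{i}), can be sketched as follows; the function name and the rounding behavior are illustrative assumptions, not the authors' code:

```python
def select_top_groups(group_ids, scores, theta):
    """Keep the top-theta fraction of groups for the next epoch, ranked
    by the group score f(x_i). `scores` maps group id -> f(x_i);
    theta is a proportion in (0, 1]."""
    ranked = sorted(group_ids, key=lambda g: scores[g], reverse=True)
    keep = max(1, int(round(theta * len(ranked))))  # at least one group
    return set(ranked[:keep])

scores = {"g1": 0.9, "g2": 0.2, "g3": 0.7, "g4": 0.5}
kept = select_top_groups(list(scores), scores, theta=0.5)
# With theta = 0.5, the two highest-scoring groups (g1 and g3) are retained.
```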

#### 3.2. Sorting by Conformance of Group-Level Labels

Each group consists of a set of instances (x_{i}), a set of positive candidate relation labels that all instances share (P_{i}), and a set of negative relation labels that this entity pair does not express (N_{i}). In this method, we assign a score to each group by summing up the sub-scores for all positive labels and negative labels of the group.

We define the key instance for relation l as the instance that has the maximum predicted score for that label within the group. For each label l in P_{i}, the sub-score is the predicted confidence for this label, while for each l′ in N_{i}, the sub-score is the confidence in not predicting the label. For each label in P_{i} and N_{i}, the sub-scores are taken from the corresponding key instances. The final score for group i is defined as:

Z_{P} and Z_{N} are normalization factors, which are set to |P_{i}| and |N_{i}|, respectively. This sorting strategy is inspired by the original MIML-re model, in which the label assignment for an instance is partly determined by the group-level generative probabilities.

If a positive label cannot be confidently mapped to any instance, its sub-score will be low, which lowers f(**x**_{i}). This means the method can implicitly model the missingness of the text.
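One plausible reading of this conformance score can be sketched as follows; the function name and the exact form of the negative sub-scores (taken as one minus the key-instance confidence) are assumptions, with the precise normalized sum defined by the paper's equation:

```python
def conformance_score(instance_probs, positive, negative):
    """instance_probs: one dict per instance mapping label -> predicted
    probability. Each label's sub-score comes from its key instance, i.e.,
    the instance with the maximum predicted probability for that label."""
    def key_confidence(label):
        return max(p.get(label, 0.0) for p in instance_probs)
    pos = sum(key_confidence(l) for l in positive) / max(len(positive), 1)
    neg = sum(1.0 - key_confidence(l) for l in negative) / max(len(negative), 1)
    return pos + neg

probs = [
    {"born_in": 0.9, "founder_of": 0.1},
    {"born_in": 0.2, "founder_of": 0.3},
]
score = conformance_score(probs, {"born_in"}, {"founder_of"})
# born_in's key instance contributes 0.9; founder_of's key instance has
# confidence 0.3, contributing 1 - 0.3 = 0.7, so the group scores 1.6.
```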

#### 3.3. Sorting by Precision of Labels

In this method, each group is scored by the precision of its instance-level predictions, normalized by the group size |M_{i}|. The motivation is that ambiguous sentences introduce noise: for example, the sentence `Obama was born and grew up in the US.` may be classified into either `born_in` or `resident_in`, which adds confusion for classifiers. However, retaining only the high-quality instances may cause a local-minimum problem (addressed by the N-fold bagging strategy mentioned later), since the proportion of the sentence-level probability can become too large compared with the group-level probability.
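A minimal sketch of one plausible instantiation of this precision-based score, namely the fraction of instances whose predicted label falls in the group's positive set, is shown below; the exact definition is given by the paper's equation, so the function here is an assumption:

```python
def precision_score(predicted_labels, positive):
    """Fraction of instances whose predicted label is in the group's
    positive label set P_i, normalized by the group size |M_i|.
    One plausible reading of 'precision of labels'; illustrative only."""
    if not predicted_labels:
        return 0.0
    hits = sum(1 for z in predicted_labels if z in positive)
    return hits / len(predicted_labels)
```

For instance, a group whose three sentences are predicted as `born_in`, `resident_in`, `born_in` with P_{i} = {`born_in`} would score 2/3.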

#### 3.4. Sorting by Ranking of Group-Level Labels

- R_{l}: the number of labels that have a higher predicted score than label l within a group.
- L_{l}: the ranking loss for a certain label l in a group, which is computed from R_{l}.

Z_{1} and Z_{2} are normalization factors, which are finally set to |P_{i}| and |U_{i}|, respectively. Here, |U_{i}| denotes the number of non-positive labels that have supportive instances (instances classified into that label) in the group. The series ${\sum }_{t=1}^{{R}_{l}}1/t$ is increasing, so any label l with a larger R_{l} results in a larger loss, indicating that the label is inclined to be negative in this group. This sorting strategy is inspired by [32,33,34], which define a similar ranking loss to train multiclass classifiers based on pair-wise structures.

If the KB is incomplete and an instance actually expresses a relation outside P_{i}, the corresponding non-positive label will rank high and lower f(**x**_{i}). If the text is incomplete and one positive label can be mapped to none of the instances, the score may also be very low. Consequently, this sorting method can implicitly reflect the missingness of either the KB or the text. Taking Table 1 as an example, if S3 expresses a relation absent from the KB, the positive label `president_of` would be ranked after some non-positive labels such as `talk_up` (S3); and if `president_of` were missing from the KB, S1, which might then be correlated with a non-positive label, would be likely to rank higher than some positive labels.
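The per-label ranking loss described above follows directly from its definition, L_{l} = Σ_{t=1..R_{l}} 1/t; the dictionary-based representation of label scores below is an illustrative assumption:

```python
def rank_loss(label_scores, label):
    """Ranking loss for `label`: R_l is the number of labels with a
    strictly higher predicted score, and the loss is the harmonic
    partial sum sum_{t=1..R_l} 1/t, which grows with the label's rank."""
    r = sum(1 for other, s in label_scores.items()
            if other != label and s > label_scores[label])
    return sum(1.0 / t for t in range(1, r + 1))

label_scores = {"president_of": 0.7, "talk_up": 0.9, "born_in": 0.4}
# talk_up is ranked first, so its loss is 0; president_of has one
# higher-scored label (loss 1.0); born_in has two (loss 1 + 1/2 = 1.5).
```

A positive label with a large loss signals a group whose annotation is likely noisy, which is exactly what the selection step filters out.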

#### 3.5. Sorting by Ensemble

We combine the basic sorting strategies and use f_{e} to denote the score used in the ensemble.

## 4. Experiments and Analysis

#### 4.1. Dataset

#### 4.2. Implementation

#### 4.2.1. Implementation Details

#### 4.2.2. Baseline Methods

#### 4.3. Evaluation Metrics

- P/R curve

- Precision, Recall, Final F_{1}

Final F_{1} is usually the main performance measure. We tuned the parameters on the development set to maximize Final F_{1}, analogous to MIML-re [13].

- Max F_{1} & Avg F_{1}

Max F_{1} is the maximum F_{1} point on the P-R curve. The P-R curve shows the average performance of an algorithm, which can also be reflected by the average (Avg) P/R/F_{1} values.
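F_{1} is the standard harmonic mean of precision and recall, which can be checked against the reported figures:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (values in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Sanity check against Table 2: MIML-re reports P = 30.56 and R = 24.68,
# which gives a Final F1 of about 27.3.
```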

#### 4.4. Results

#### 4.4.1. Compared with MIML-Semi

The Final F_{1} of MIML-semi is far from satisfactory, due to its very low precision. This is probably because the objective of MIML-semi is to tune the P-R curve rather than the Final F_{1}. In contrast, our methods gain not only a good Avg F_{1} (the Avg F_{1} of MIML-sort-r is 3.32% higher than that of MIML-semi) but also a good Final F_{1} (far better than MIML-semi). We also notice that the Max F_{1} values of the MIML-sort(s) are all higher than that of MIML-semi, with MIML-sort-r the best of all, 3.1% better than MIML-semi.

All of the MIML-sort(s) outperform the baselines in Final F_{1}, which shows the effectiveness of the selection strategies. The best Final F_{1}, naturally achieved by MIML-sort-e, shows that the ensemble absorbs the advantages of the base methods.

#### 4.4.2. Compared with Other Baselines

We report Max F_{1} and Avg F_{1} in Table 2. The MIML-sort(s) have better Max F_{1} than all other work. Avg F_{1} directly reflects the level of the P-R curve, and MIML-sort-r's figure is prominent: 4.12% higher than MIML-re and 5.26% higher than Mintz++. The official test results of the KBP slot filling evaluation are reported in precision, recall and Final F_{1}. We notice that MIML-sort-e gains the best Final F_{1}, with an enhancement of 2.68% over MIML-re and 9.59% over MIML-semi. The other MIML-sort(s) also outperform the baselines. In addition, our methods mostly benefit from recall (the exception is MIML-semi, which has a very imbalanced precision and recall). We attribute this mainly to the improvement of data quality brought by the ranking-based methods, which reduces ambiguity for classification.

## 5. Discussion and Future Work

From the P-R curves, we see that the Final F_{1} can still be improved. Comparing the three basic methods we propose, MIML-sort-r performs best on the curve, in terms of both the maximum and the average performance. The ranking strategy used in this method is validated as very effective for distantly supervised relation extraction, and we believe it can also be integrated into other learning frameworks as an element of the loss function. Compared with MIML-sort-r, MIML-sort-l and MIML-sort-p both have drawbacks: MIML-sort-l does not consider the imbalanced distributions over instances (i.e., a sentence with a probability of 0.8 might be strongly classified to relation l but only weakly to relation l′), and MIML-sort-p lacks the information of the group-level labels. By computing rankings between relations, MIML-sort-r tends to retain the groups that are most likely to be correctly annotated.

#### 5.1. Analysis of the Removed Groups

#### 5.2. Parameter Settings

#### 5.3. Analysis on Relation Types

We also compare the per-relation results (Final F_{1}) on the official testing set in order to see whether the sorting method boosts the baseline only on some particular relations. The results show that MIML-sort-e not only has a significant enhancement on certain relations such as `per:title` (correctly tagged instances rise from 14 to 37, and the proportion of this relation is also large), but also has considerable improvements on other relations such as `per:parents`, `org:member_of` and `per:origin`. Therefore, we can say that the sorting strategy is useful across different relation types.

## 6. Conclusions

The proposed methods improve the Final F_{1} value on the official testing set significantly, by 2.68%, and the other MIML-sort(s) also produce considerable improvements over the baselines. We believe that the proposed ranking strategies can be integrated into other learning frameworks. The results also indicate that there is still plenty of room for improvement, demanding more efficient and robust methods.

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

- Miller, S.; Fox, H.; Ramshaw, L.; Weischedel, R. A Novel Use of Statistical Parsing to Extract Information from Text. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference (NAACL), San Diego, CA, USA, 29 April 2000; pp. 226–233.
- Collins, M.; Duffy, N. Convolution Kernels for Natural Language. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Vancouver, BC, Canada, 3–8 December 2001; pp. 625–632.
- Zelenko, D.; Aone, C.; Richardella, A. Kernel Methods for Relation Extraction. J. Mach. Learn. Res. **2003**, 3, 1083–1106.
- Kambhatla, N. Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), Barcelona, Spain, 21–26 July 2004.
- Culotta, A.; Sorensen, J. Dependency Tree Kernels for Relation Extraction. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), Barcelona, Spain, 21–26 July 2004.
- Bunescu, R.C.; Mooney, R.J. A Shortest Path Dependency Kernel for Relation Extraction. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT/EMNLP), Vancouver, BC, Canada, 6–8 October 2005; pp. 724–731.
- Zhao, S.; Grishman, R. Extracting Relations with Integrated Information Using Kernel Methods. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), Ann Arbor, MI, USA, 25–30 June 2005; pp. 419–426.
- Zhou, G.; Su, J.; Zhang, J.; Zhang, M. Exploring Various Knowledge in Relation Extraction. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), Ann Arbor, MI, USA, 25–30 June 2005; pp. 427–434.
- Bach, N.; Badaskar, S. A Review of Relation Extraction. Available online: orb.essex.ac.uk/CE/CE807/Readings/A-survey-on-Relation-Extraction.pdf (accessed on 20 May 2016).
- Mintz, M.; Bills, S.; Snow, R.; Jurafsky, D. Distant Supervision for Relation Extraction without Labeled Data. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Singapore, 2–7 August 2009; pp. 1003–1011.
- Riedel, S.; Yao, L.; McCallum, A. Modeling Relations and Their Mentions without Labeled Text. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, Barcelona, Spain, 20–24 September 2010; pp. 148–163.
- Hoffmann, R.; Zhang, C.; Ling, X.; Zettlemoyer, L.; Weld, D.S. Knowledge Based Weak Supervision for Information Extraction of Overlapping Relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), Portland, OR, USA, 19–24 June 2011; pp. 541–550.
- Surdeanu, M.; Tibshirani, J.; Nallapati, R.; Manning, C.D. Multi-Instance Multi-Label Learning for Relation Extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP), Jeju Island, Korea, 12–14 July 2012; pp. 455–465.
- Min, B.; Grishman, R.; Wan, L.; Wang, C.; Gondek, D. Distant Supervision for Relation Extraction with an Incomplete Knowledge Base. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Atlanta, GA, USA, 9–15 June 2013; pp. 777–782.
- Ritter, A.; Zettlemoyer, L.; Etzioni, O. Modeling Missing Data in Distant Supervision for Information Extraction. Trans. Assoc. Comput. Linguist. **2013**, 1, 367–378.
- Craven, M.; Kumlien, J. Constructing Biological Knowledge Bases by Extracting Information from Text Sources. In Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB), Heidelberg, Germany, 6–10 August 1999; pp. 77–86.
- Bunescu, R.C.; Mooney, R.J. Learning to Extract Relations from the Web Using Minimal Supervision. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), Prague, Czech Republic, 23–30 June 2007; pp. 576–583.
- Bellare, K.; McCallum, A. Learning Extractors from Unlabeled Text Using Relevant Databases. Available online: http://www.aaai.org/Papers/Workshops/2007/WS-07-14/WS07-14-002.pdf (accessed on 20 May 2016).
- Wu, F.; Weld, D. Autonomously Semantifying Wikipedia. In Proceedings of the 16th International Conference on Information and Knowledge Management (CIKM), Lisbon, Portugal, 6–10 November 2007; pp. 41–50.
- Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Proceedings of the 26th Advances in Neural Information Processing Systems (NIPS), South Lake Tahoe, NV, USA, 5–10 December 2013; pp. 2787–2795.
- Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, Quebec City, QC, Canada, 27–31 July 2014; pp. 1112–1119.
- Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 2181–2187.
- Fan, M.; Zhao, D.; Zhou, Q.; Liu, Z.; Zheng, T.F.; Chang, E.Y. Distant Supervision for Relation Extraction with Matrix Completion. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), Baltimore, MD, USA, 22–27 June 2014; pp. 839–849.
- Angeli, G.; Tibshirani, J.; Wu, J.Y.; Manning, C.D. Combining Distant and Partial Supervision for Relation Extraction. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1556–1567.
- Nagesh, A.; Haffari, G.; Ramakrishna, G. Noisy-or Based Model for Relation Extraction Using Distant Supervision. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1937–1941.
- Zeng, D.; Liu, K.; Chen, Y.; Zhao, J. Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal, 17–21 September 2015; pp. 1753–1762.
- Intxaurrondo, A.; Surdeanu, M.; de Lacalle, O.L.; Agirre, E. Removing Noisy Mentions for Distant Supervision. Proces. Leng. Nat. **2013**, 51, 41–48.
- Xu, W.; Hoffmann, R.; Zhao, L.; Grishman, R. Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), Sofia, Bulgaria, 4–9 August 2013; pp. 665–670.
- Takamatsu, S.; Sato, I.; Nakagawa, H. Reducing Wrong Labels in Distant Supervision for Relation Extraction. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), Jeju Island, Korea, 8–14 July 2012; pp. 721–729.
- Xiang, Y.; Zhang, Y.; Wang, X.; Qin, Y.; Han, W. Bias Modeling for Distantly Supervised Relation Extraction. Math. Probl. Eng. **2015**, 2015, 969053.
- Xiang, Y.; Wang, X.; Zhang, Y.; Qin, Y.; Fan, S. Distant Supervision for Relation Extraction via Group Selection. In Proceedings of the 22nd International Conference on Neural Information Processing (ICONIP), Istanbul, Turkey, 9–12 November 2015; pp. 250–258.
- Usunier, N.; Buffoni, D.; Gallinari, P. Ranking with Ordered Weighted Pairwise Classification. In Proceedings of the 26th International Conference on Machine Learning (ICML), Montreal, QC, Canada, 14–18 June 2009; pp. 1057–1064.
- Weston, J.; Bengio, S.; Usunier, N. Wsabie: Scaling up to Large Vocabulary Image Annotation. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 16–22 July 2011; pp. 2764–2770.
- Huang, S.; Gao, W.; Zhou, Z.H. Fast Multi-Instance Multi-Label Learning. In Proceedings of the 2014 AAAI Conference on Artificial Intelligence (AAAI), Quebec City, QC, Canada, 27–31 July 2014; pp. 1868–1874.
- Ji, H.; Grishman, R.; Dang, H.T.; Griffitt, K.; Ellis, J. Overview of the TAC 2010 Knowledge Base Population Track. In Proceedings of the Third Text Analysis Conference (TAC 2010), Gaithersburg, MD, USA, 15–16 November 2010.
- Ji, H.; Grishman, R.; Dang, H.T. Overview of the TAC 2011 Knowledge Base Population Track. In Proceedings of the Forth Text Analytics Conference (TAC 2011), Gaithersburg, MD, USA, 14–15 November 2011.

**Figure 1.** The plate model of MIML-re. (The smaller rectangle denotes the sentence-level classifier, repeated |M_{i}| times, the number of sentences in group i. The larger rectangle is the group-level classifier, repeated |N| times, the number of training groups.)

**Table 1.** An example entity pair with its relations from the KB and the sentences annotated by distant supervision.

| | <Obama, US> |
|---|---|
| Relations from KB | president_of <Obama, US>; born_in <Obama, US> |
| DS annotated sentences | S1. Obama is the 44th President of US. S2. Born in Honolulu, Hawaii, US, Obama is a graduate of Columbia University and Harvard Law School. S3. Obama talks up US recovery and urges Republicans to back higher wages. |

**Table 2.** Results of the proposed methods and the baselines (%).

| | Precision | Recall | Final F_{1} | Max F_{1} | Avg F_{1} | Parameters |
|---|---|---|---|---|---|---|
| Hoffmann | **30.65** ^{1} | 19.79 | 23.97 | 24.05 | 15.40 | - |
| Mintz++ | 26.24 | 24.83 | 24.97 | 25.51 | 21.61 | - |
| MIML-re | 30.56 | 24.68 | 27.30 | 28.25 | 22.75 | T ^{2} = 8 |
| MIML-semi | 13.38 | **42.88** | 20.39 | 28.28 | 23.27 | T = 8 |
| MIML-sort-l | 27.00 | 32.29 | 29.41 | 29.55 | 23.33 | T = 6, θ = 98% |
| MIML-sort-p | 27.50 | 29.17 | 28.31 | 28.33 | 22.05 | T = 8, θ = 99% |
| MIML-sort-r | 20.95 | 39.93 | 27.48 | **31.29** | **26.87** | T = 2, θ = 98% |
| MIML-sort-e | 26.09 | 35.24 | **29.98** | 30.32 | 24.34 | T = 7, θ = 98% |

^{1} The optimal result for a column is marked in bold; ^{2} T includes the 1st epoch for initialization.

**Table 3.**The statistics for the removed data. The figures show the percentage of the removed groups within certain columns.

| | S = 1 (%) | S = 2 (%) | S = 3 (%) | S = 4 (%) | S ≥ 5 (%) |
|---|---|---|---|---|---|
| MIML-sort-l | 46.10 | 25.99 | 11.94 | 6.73 | 9.24 |
| (sum) | (46.10) | (72.09) | (84.03) | (90.76) | (100) |
| MIML-sort-p | 94.63 | 4.54 | 0.65 | 0.11 | ≈0 |
| (sum) | (94.63) | (99.16) | (99.81) | (99.92) | (100) |
| MIML-sort-r | 75.53 | 12.48 | 5.07 | 2.44 | 4.48 |
| (sum) | (75.53) | (88.01) | (93.08) | (95.52) | (100) |
| MIML-sort-e | 86.32 | 11.32 | 1.48 | 0.50 | 0.38 |
| (sum) | (86.32) | (97.64) | (99.11) | (99.62) | (100) |

| | S = 1 (%) | S = 2 (%) | S = 3 (%) | S = 4 (%) | S ≥ 5 (%) |
|---|---|---|---|---|---|
| Percentage | 73.15 | 12.69 | 5.66 | 2.85 | 6.25 |
| (sum) | (73.15) | (85.84) | (90.91) | (93.76) | (100) |

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Xiang, Y.; Chen, Q.; Wang, X.; Qin, Y. Distant Supervision for Relation Extraction with Ranking-Based Methods. *Entropy* **2016**, *18*, 204. https://doi.org/10.3390/e18060204