Article

FREDA: Few-Shot Relation Extraction Based on Data Augmentation

1 College of Information Science and Engineering, Xinjiang University, Urumqi 830049, China
2 Xinjiang Signal Detection and Processing Key Laboratory, Urumqi 830049, China
3 Xinjiang Uygur Autonomous Region Product Quality Supervision and Inspection Institute, Urumqi 830049, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(14), 8312; https://doi.org/10.3390/app13148312
Submission received: 6 June 2023 / Revised: 14 July 2023 / Accepted: 17 July 2023 / Published: 18 July 2023
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)

Abstract

The primary task of few-shot relation extraction is to quickly learn the features of relation classes from a few labelled instances and predict the semantic relations between entity pairs in new instances. Most existing few-shot relation extraction methods do not fully utilize the relation information features in sentences, making it difficult to improve relation classification performance. Some researchers have attempted to incorporate external information, but the results have been unsatisfactory when applied to different domains. In this paper, we propose a method that utilizes triple information for data augmentation, which alleviates the issue of insufficient instances and possesses strong domain adaptation capability. Firstly, we extract relations and entity pairs from the instances in the support set to form relation triple information. Next, the sentence information and relation triple information are encoded by the same sentence encoder. Then, we construct an interactive attention module that lets the query set instances interact separately with the support set instances and the relation triple instances; the module pays greater attention to the highly interactive parts between instances and assigns them higher weights. Finally, we merge the interacted support set representation and relation triple representation. To our knowledge, we are the first to propose a method that utilizes triple information for data augmentation in relation extraction. In experiments on the standard datasets FewRel1.0 and FewRel2.0 (domain adaptation), we observed substantial improvements without introducing external information.

1. Introduction

Relation extraction (RE) is one of the fundamental tasks in building knowledge graphs (KGs) in natural language processing [1]. Its primary purpose is to identify the types of relations between pairs of entities in a sentence. Since Zeng et al. [2] used CNN-based models for RE and proposed methods to automatically capture relevant tokens and sentence-level features, neural-network-based RE methods have largely replaced traditional algorithms and have been widely used in tasks such as information retrieval [3], dialogue generation [4], and reading comprehension [5].
Most researchers use supervised learning methods for RE [6,7], training models on large amounts of labelled data. However, labelling datasets is time-consuming and laborious in practice. To address this problem, some researchers have proposed distantly supervised relation extraction (DSRE) methods. Mintz et al. [8] constructed an algorithm that does not rely on manually labelled data, reducing the dependence on annotation, but it does not focus on the information within the bag; Rathore et al. [9] exchange information among the sentences within a bag to make fuller use of the available data. Distant supervision aligns a known knowledge base with a large text corpus to generate labelled datasets, which removes the need for large-scale manual annotation but suffers from a relatively severe noise problem, limiting RE performance. Therefore, researchers have proposed few-shot relation extraction (FSRE) methods to address the data scarcity problem [10].
Training models with only a small number of samples has attracted increasing attention [11,12]. In the field of computer vision, few-shot learning (FSL) [13,14] has shown promising results. Inspired by this, various FSL-based methods have been introduced for the FSRE task. Prototype networks are among the most popular algorithms: Snell et al. [15] introduced prototype networks into FSRE based on a meta-learning framework [16], effectively improving accuracy.
The FSRE task initially used only sentence information for model training. Gao et al. [17] and Ye et al. [18] demonstrated its usefulness by comparison with global-level models but did not focus on information across relations in the dataset; Wang et al. [19] alleviated the relation confusion problem using two specific mechanisms. All of these works [17,18,19] try to make full use of the valuable information in the dataset to build a more informative prototype network. Recent FSRE efforts introduce external information such as relation information (textual descriptions of relations) [20,21,22] and entity concepts [23], effectively improving performance on FSRE tasks. However, two problems remain: (1) current work is overly dependent on external information and does not pay sufficient attention to the information already available in the dataset [24]; (2) approaches using prototype networks tend to construct prototypes by averaging the instances of each class, ignoring the interaction information between support set and query set instances [25].
In this paper, we propose a data augmentation approach to address the above issues. On the one hand, introducing relation triple information augments the information of the instances in the support set; on the other hand, incorporating an interactive attention module enhances the interaction information between the support set and query set instances, as well as between the relation triple and query set instances. As shown in Figure 1, the relation triple information contains two parts, the relation's name and the entity pair, both of which we can obtain from the support set instances. Our method proceeds as follows. First, we pass the relation triples and the sentence information through the same encoder, mapping them into the same semantic space. Next, using Euclidean distance, the interactive attention module assigns higher coefficients to the parts where the support set instances and the query set instances, as well as the relation triple instances and the query set instances, interact intensively with each other. Finally, we directly add the support set and relation triple representations after they pass through the interactive attention module.
The difference between our work and existing results is that our approach only uses information already present in the instances and does not incorporate external information. Moreover, our ideas are well suited to FSRE for two reasons: (1) When training models, common data augmentation approaches either use unlabeled datasets [26] or make fuller use of labelled datasets [27]. Our approach makes full use of the dataset and improves the accuracy of the class prototypes by adding the relation triple, which improves the results effectively and provides better domain adaptation capability. (2) To improve performance on the test set, we incorporate the interactive attention module during model training. This module calculates the similarity between query set instances, support set instances, and relation triple instances, prioritizing the highly interactive parts between instances to fully exploit the inherent information of the dataset.
We conducted experiments on FewRel1.0 [28] and FewRel2.0 (domain adaptation) [29] under four few-shot settings, and our method yields clear improvements in both, which illustrates its effectiveness. Our contributions are as follows:
(1) For the first time, we introduce relation triple information, through a data augmentation approach, to enhance and correlate the relation and entity-pair features of the support set.
(2) We address the issue of limited instances by incorporating an interactive attention module that focuses on the highly interactive portions among instances, making better use of the scarce data.
(3) Our model shows improvements on both the FewRel1.0 and FewRel2.0 (domain adaptation) datasets, with particularly significant gains on FewRel2.0 (a maximum improvement of 9.79 points), demonstrating its strong domain adaptation capability.

2. Related Work

In this section, we will introduce and explain RE and FSRE.

2.1. Relation Extraction

RE is one of the fundamental tasks of information extraction; its purpose is to provide basic support for the construction of structured knowledge by identifying the semantic relations between known entity pairs in an instance. Traditional RE mainly used kernel-based approaches [30] and feature-based approaches [31] and achieved good results, but it relies excessively on manual annotation and on the ability to understand context. With the introduction of deep learning into natural language processing, several neural-network-based models were applied to RE, such as convolutional neural networks (CNNs) [2], long short-term memory (LSTM) [32], and the transformer [33], steadily improving the effectiveness of supervised RE. Because these models are driven by large amounts of data, their generalization ability is heavily dependent on high-quality datasets; supervised learning thus requires labelling a large amount of high-quality data, and labelling an RE dataset is time-consuming and laborious. To address the difficulty of labelling data, Mintz et al. [8] introduced the idea of distant supervision to RE, which automatically generates labelled data by aligning an existing knowledge base with text. Riedel et al. [34] refer to Mintz et al.'s idea as the "distant supervision hypothesis" and propose the at-least-one hypothesis, introducing multi-instance learning into distantly supervised data, but ignore the possibility of multiple relations per entity pair. Zeng et al. [35] propose an algorithm combining piecewise convolutional neural networks (PCNNs) with multi-instance learning to classify and extract the generated data, but this approach still tends to introduce excessive noise. The main task of existing DSRE is to denoise and improve the correctness of the labelled dataset as much as possible [36]. However, this method still cannot solve the long-tail problem in the dataset. To improve the performance of RE models more effectively, the FSRE method has been proposed to address the long-tail and data scarcity problems.

2.2. Few-Shot Relation Extraction

FSRE aims to train a model with limited instances, using reasonable algorithms and technical means, and to accurately identify the semantic relations of entity pairs in unlabeled text. By utilizing FSL techniques, FSRE addresses the issues of limited dataset instances and long-tail distributions. This avoids heavy reliance on a large amount of annotated data, significantly reducing the cost of building RE models. FSRE has gained considerable attention in recent years due to its strong feasibility and applicability in real-life scenarios. The core of the task is the concept of FSL [37]. Computer vision was the first field to use FSL, and it already has many excellent algorithms for tasks such as image segmentation. Zhang et al. [38] proposed representing images as distributions by learning parameterized Gaussian noise regularization; Finn et al. [39] proposed a model-agnostic meta-learning algorithm. In the FSRE field, metric-based and optimization-based algorithms are the most common.
Metric-based approaches classify the semantic relations of entity pairs by encoding instances into embeddings and comparing distances under a similarity measure [40]. Optimization-based approaches treat the relations between entities as individual tasks and train a multitask classifier to handle all relations [41]. Current optimization-based approaches are inferior to metric-based ones, so most researchers focus on the metric-based approach. Koch et al. [42] propose a Siamese network based on metric learning, which feeds two samples into two identical networks, computes the distance between their representations, and uses that distance to determine whether the two instances belong to the same class. Gao et al. [26] propose a neural snowball approach based on the Siamese network to classify specific new relations from an open-domain perspective using large amounts of unlabeled text. However, because Siamese networks compare instances pair by pair, they are less efficient in multiclass classification tasks. Garcia et al. [14] first use graph convolutional networks (GCNs) in FSRE to predict the edge information of the query set nodes from a known subgraph of the support set. Li et al. [43] propose a graph-based model generation module that enables the support sets, query sets, and relation descriptions to generate different classification models for different FSRE tasks. Graph neural networks can enrich the semantic relations of text when the relation class is uncertain, but the computation of the adjacency matrix makes them less efficient than the prototype network.
Snell et al. [15] first propose the prototype network, an efficient classification model based on the idea that each class has a prototype obtained by averaging the embedding vectors of all support set instances of that class; a query instance is then assigned to the class whose prototype is closest to it. Prototype networks are the dominant direction for FSRE tasks due to their efficiency. The following are some representative FSRE models built on the prototype network: Proto-HATT [17] improves model robustness and accelerates convergence by designing instance-level and feature-level attention schemes; MLMAN [18] encodes the instances in the query set and support set and leverages the interaction of local and instance-level information between them to obtain richer semantic representations; REGRAB [44] incorporates global relation graphs with Bayesian meta-learning into the model; TD-proto [45] introduces relation and entity descriptions to enhance prototype networks; ConceptFERE [23] incorporates an entity multiconcept selection module to enhance key entity concept features; CTEG [46] fine-tunes by assigning pseudolabels to unlabelled in-domain data; HCRP [20] introduces three modules for hybrid prototype learning, contrastive learning, and adaptive focal loss to improve the model; SimpleFSRE [21] simplifies the model by directly adding relation information; PRM [22] proposes a parameter-free prototype rectification method; and MapRE [47] adds a framework that takes into account both label-agnostic and label-aware semantic mapping information. However, these models have inferior domain adaptation capabilities.
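For reference, the vanilla prototypical-network step described above can be sketched in a few lines of PyTorch; the tensor shapes and the squared-Euclidean scoring are illustrative assumptions on our part rather than code from any of the cited works.

```python
import torch

def prototypical_classify(support_emb: torch.Tensor, query_emb: torch.Tensor) -> torch.Tensor:
    """Vanilla prototypical-network classification (illustrative sketch).

    support_emb: [N, K, D] embeddings of K support instances for each of N relations.
    query_emb:   [G, D]    embeddings of G query instances.
    Returns:     [G]       index of the predicted relation for each query instance.
    """
    prototypes = support_emb.mean(dim=1)                   # [N, D]: class prototype = mean of support embeddings
    dists = torch.cdist(query_emb, prototypes, p=2) ** 2   # [G, N]: squared Euclidean distance to each prototype
    return dists.argmin(dim=-1)                            # nearest prototype wins
```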
Moreover, the above prototype-network studies usually do not focus on the interaction between support set and query set instances; they simply average the embeddings of the support set instances of each relation class to obtain the class prototype, failing to exploit the interaction information between instances. Some models that achieve good results also improve FSRE performance by incorporating external information and do not take full advantage of the information in the dataset.
We therefore propose FREDA, a data-augmentation-based model for the FSRE task. Firstly, we extract the relation triple information from the support set to make full use of the triple information and augment the support set instances. Secondly, we reduce the deviation between the obtained and expected relation prototypes by designing an interactive attention module that computes interactions among the support set, query set, and relation triple instances, focusing on the highly interactive parts between instances. Our method efficiently captures similar features between the query set and the support set to obtain more discriminative relation prototypes, and it also gives excellent results on datasets from different domains, showing strong domain adaptation capability.

3. Methodology

In this section, we introduce the components of our model and illustrate its overall framework. Figure 2 shows the complete framework, where the relation triple information is obtained from the support set. Firstly, the support set, query set, and relation triple information pass through the same sentence encoder and are mapped into the same semantic space. Secondly, in the interactive attention module, we compute Euclidean-distance similarities between the query set representation and both the support set and relation triple representations to obtain similarity weights, thus focusing on highly interactive instances. Then, we fuse the interacted support set and relation triple information in the prototype fusion module. Finally, the classifier predicts the semantic relation of each entity pair with an unknown relation in the query set.

3.1. Task Definition

We follow the standard FSRE task setup. Under the N-way-K-shot setting, there are four configurations: 5-way-1-shot, 10-way-1-shot, 5-way-5-shot, and 10-way-5-shot. The dataset is divided into a training set, a validation set, and a test set, and each of them contains a support set S and a query set Q. The support set is used for model training, the query set is used for evaluation, and the support set and query set share the same N relation classes. For example, the 2-way-2-shot setup in Table 1 draws two relations, extracts two instances of each relation for the support set, and the relation of the entity pair in each query set instance is then determined by the model trained on the support set instances.
In FSRE, each instance consists of three parts: (x, e, r). Here, x is the sentence of the instance, e = (e_h, e_t) is the entity pair, where e_h is the head entity and e_t is the tail entity, and r is the relation's name. In addition, we extract the relation triple information T = (r, e_h, e_t) directly from the instance. Clearly, the relation triple information we extract is already present in the instances and can be obtained directly.
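To make the N-way-K-shot episode construction and the triple extraction concrete, the sketch below builds one hypothetical episode and pulls the relation triple T = (r, e_h, e_t) out of each support instance; the field names sentence, head, and tail are assumed for illustration and do not reflect the actual FewRel schema.

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, q_per_class=1):
    """Sample one N-way-K-shot episode (illustrative sketch).

    dataset: dict mapping a relation name r to a list of instances, where each
             instance is a dict with 'sentence', 'head', and 'tail' keys (assumed schema).
    Returns the support set, the query set, and the relation triples T = (r, e_h, e_t).
    """
    relations = random.sample(list(dataset), n_way)
    support, query, triples = [], [], []
    for label, rel in enumerate(relations):
        instances = random.sample(dataset[rel], k_shot + q_per_class)
        for inst in instances[:k_shot]:
            support.append((inst, label))
            triples.append((rel, inst["head"], inst["tail"]))  # triple taken directly from the instance
        for inst in instances[k_shot:]:
            query.append((inst, label))
    return support, query, triples
```

For example, sample_episode(dataset, n_way=2, k_shot=2) reproduces the 2-way-2-shot situation of Table 1: four labelled support instances, their relation triples, and the query instances whose relations must be predicted.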

3.2. Sentence Encoder

We choose BERT as the encoder, as shown in Figure 3. Given an instance x = (w_1, ..., w_c) containing c tokens, encoding it with BERT yields the corresponding embeddings X = (W_1, ..., W_c). As the output of the encoder, we take the hidden state corresponding to the start token of the head entity mention and the hidden state corresponding to the start token of the tail entity mention:
S_h = \{ S_{hk}^{i} \in \mathbb{R}^{d};\ i = 1, \dots, N,\ k = 1, \dots, K \}    (1)
Q_h = \{ Q_{h}^{j} \in \mathbb{R}^{d};\ j = 1, \dots, G \}    (2)
S_t = \{ S_{tk}^{i} \in \mathbb{R}^{d};\ i = 1, \dots, N,\ k = 1, \dots, K \}    (3)
Q_t = \{ Q_{t}^{j} \in \mathbb{R}^{d};\ j = 1, \dots, G \}    (4)
Here, S_h and S_t denote the hidden states corresponding to the start tokens of the head and tail entity mentions in the support set, respectively, and Q_h and Q_t denote the analogous hidden states for the query set. S_{hk}^{i} and S_{tk}^{i} are the hidden states of the head and tail entity mention start tokens in the kth instance of the ith relation in the support set, and Q_h^j and Q_t^j are those of the jth instance of the query set.
We only use the hidden states corresponding to the start tokens of the two entity mentions. Concatenating these two pieces of information yields the required representations of the support set, denoted S, and the query set, denoted Q, which avoids introducing excessive and unnecessary parameters:
S = \mathrm{cat}(S_h, S_t)    (5)
Q = \mathrm{cat}(Q_h, Q_t)    (6)
where S \in \mathbb{R}^{2d} and Q \in \mathbb{R}^{2d}, and d is the size of the contextual representation of the sentence encoder.
The relation triple information representation is obtained by encoding the relation triples, from which we take two types of embeddings: the [CLS] token embedding T_c and the average of all token embeddings \tilde{T}_p. These are concatenated to form the desired relation triple information representation:
T = \mathrm{cat}(T_c, \tilde{T}_p)    (7)
where T \in \mathbb{R}^{2d} is the concatenated relation triple information representation. The average of all token embeddings \tilde{T}_p \in \mathbb{R}^{d} is computed as:
\tilde{T}_p = \frac{1}{C} \sum_{j=1}^{C} T_P(j)    (8)
where T_P(j) denotes the embedding of the jth token of a relation triple instance, and each instance has a fixed token length of C.
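The two encoding paths of this subsection can be sketched as follows with a Hugging Face BERT model; the entity-start lookup and the verbalization of the triple as plain text are our own assumptions for illustration, not the authors' released implementation.

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def encode_instance(tokens, head_start, tail_start):
    """Sentence path: concatenate the hidden states at the start tokens of the two entity mentions.

    tokens: list of word tokens of the instance; head_start / tail_start are the
    word-level indices of the first token of each entity mention (assumed to be given).
    """
    enc = tokenizer(tokens, is_split_into_words=True, return_tensors="pt",
                    truncation=True, max_length=128)
    hidden = bert(**enc).last_hidden_state.squeeze(0)          # [seq_len, d]
    word_ids = enc.word_ids()                                  # maps sub-tokens back to word indices
    h_pos = word_ids.index(head_start)                         # first sub-token of the head entity
    t_pos = word_ids.index(tail_start)                         # first sub-token of the tail entity
    return torch.cat([hidden[h_pos], hidden[t_pos]], dim=-1)   # S or Q, a vector in R^{2d}

def encode_triple(relation, head, tail):
    """Triple path: concatenate the [CLS] embedding T_c with the token average."""
    text = f"{relation} : {head} , {tail}"                     # assumed verbalization of (r, e_h, e_t)
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    hidden = bert(**enc).last_hidden_state.squeeze(0)          # [C, d]
    return torch.cat([hidden[0], hidden.mean(dim=0)], dim=-1)  # T = cat(T_c, T_p) in R^{2d}
```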

3.3. Interactive Attention Module

Because the number of instances in the support set is limited and previous works ignore the class information hidden in the query set, inter-instance features are not extracted well. Inspired by Zhang et al. [48], we let the query set instances interact with the relation triple instances and with the support set instances. This focuses on the highly interactive parts between instances so as to extract relation features better. Firstly, we use the Euclidean distance to calculate the similarity α_k^i between the support set and query set representations and the similarity β_k^i between the relation triple and query set representations, as follows:
\alpha_k^i = \frac{\sum_{j=1}^{G} \exp\!\left( -\left\| S_k^i - Q_j \right\|^2 \right)}{\sum_{k=1}^{K} \sum_{j=1}^{G} \exp\!\left( -\left\| S_k^i - Q_j \right\|^2 \right)}    (9)
\beta_k^i = \frac{\sum_{j=1}^{G} \exp\!\left( -\left\| T_k^i - Q_j \right\|^2 \right)}{\sum_{k=1}^{K} \sum_{j=1}^{G} \exp\!\left( -\left\| T_k^i - Q_j \right\|^2 \right)}    (10)
where α_k^i denotes the similarity of the kth instance of the ith relation in the support set to the query set, S_k^i denotes the embedding of that support set instance, β_k^i denotes the similarity of the kth instance of the ith relation in the relation triple to the query set, T_k^i denotes the embedding of that relation triple instance, and Q_j denotes the embedding of the jth instance of the query set. As shown in Figure 4, in the 2-way-2-shot setup, the support set and the query set interact with each other, and their similarity is obtained from the (negated) distances between instances; the similarity of the query set to the support set instances is then calculated using Equation (9).
Then, we multiply the similarities directly with the corresponding support set and relation triple representations. For example, a support set instance with high similarity contributes more to a correct prediction, so it is given a higher coefficient. The weighted sums yield the fused relation prototype S′ of the support set and the relation prototype T′ of the relation triple:
S' = \sum_{k=1}^{K} \alpha_k^i \times S_k^i    (11)
T' = \sum_{k=1}^{K} \beta_k^i \times T_k^i    (12)
where S_k^i and T_k^i are the encoder outputs of the corresponding support set and relation triple instances. We have thus obtained the relation prototype representation of the support set and that of the relation triple after the information interaction. To present the interactive attention module more clearly, we summarize its steps in the pseudocode of Algorithm 1.
Algorithm 1 Dataset transformation steps in the interactive attention module
Input: support set embeddings S ∈ R^{2d}, query set embeddings Q ∈ R^{2d}, and relation triple information embeddings T ∈ R^{2d}
Output: support set prototype S′ and relation triple prototype T′ after the interactive attention module
1: for episode n = 1 to 4 do
2:   for each S_k^i ∈ S, Q_j ∈ Q, T_k^i ∈ T do
3:     use Equation (9) to calculate the similarity α_k^i between each instance of the support set and the query set; use Equation (10) to calculate the similarity β_k^i between each instance of the relation triple and the query set
4:     focus on the instances in the support set and relation triple that are highly similar to the query set via Equations (11) and (12), obtaining S′ and T′
5:   end for
6: end for
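The following PyTorch sketch mirrors Equations (9) to (12): softmax-style weights from negative squared Euclidean distances, normalized over the K instances of each relation, followed by the weighted sums that give S′ and T′; the tensor shapes are assumed for illustration.

```python
import torch

def interactive_attention(support: torch.Tensor, triple: torch.Tensor, query: torch.Tensor):
    """Interactive attention module (illustrative sketch).

    support: [N, K, 2d] support set representations S.
    triple:  [N, K, 2d] relation triple representations T.
    query:   [G, 2d]    query set representations Q.
    Returns the interacted prototypes S' and T', each of shape [N, 2d].
    """
    def weights(reps):
        # exp(-||x_k^i - Q_j||^2), summed over the G query instances ...
        dist = torch.cdist(reps.flatten(0, 1), query, p=2) ** 2      # [N*K, G]
        scores = torch.exp(-dist).sum(dim=-1).view(reps.shape[:2])   # [N, K]
        # ... and normalized over the K instances of each relation (alpha or beta).
        return scores / scores.sum(dim=1, keepdim=True)

    alpha, beta = weights(support), weights(triple)
    s_proto = (alpha.unsqueeze(-1) * support).sum(dim=1)             # [N, 2d]: S'
    t_proto = (beta.unsqueeze(-1) * triple).sum(dim=1)               # [N, 2d]: T'
    return s_proto, t_proto
```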

3.4. Prototype Fusion

Inspired by the fusion method of relation information and instance prototypes proposed by Liu et al. [23], we directly add the relation prototype representation S′ of the support set instances and the relation prototype representation T′ of the relation triple instances to obtain the final prototype, avoiding the influence of harmful parameters on the model:
P = S' + T'    (13)
where P \in \mathbb{R}^{2d} denotes the final prototype representation.
Our task is multiclass classification over N relation classes. After experimenting with different losses, we chose the cross-entropy loss as our loss function:
L = - \sum_{i=1}^{N} y_i \times \log \hat{y}_i    (14)
where y_i denotes the value of the ith element of the true label vector, and \hat{y}_i denotes the value of the ith element of the model's predicted probability vector.
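To close the pipeline, the sketch below fuses the two interacted prototypes by element-wise addition and computes the cross-entropy loss; scoring each query by its negative squared Euclidean distance to the fused prototypes is an assumption consistent with the prototypical-network setup rather than a detail stated explicitly above.

```python
import torch
import torch.nn.functional as F

def fuse_and_classify(s_proto, t_proto, query, labels):
    """Prototype fusion and cross-entropy loss (illustrative sketch).

    s_proto, t_proto: [N, 2d] interacted support / relation triple prototypes.
    query:            [G, 2d] query representations.
    labels:           [G]     gold relation indices of the query instances.
    """
    prototypes = s_proto + t_proto                        # P = S' + T' (Equation (13))
    # Negative squared Euclidean distance as the logit of each (query, class) pair (assumed scoring rule).
    logits = -torch.cdist(query, prototypes, p=2) ** 2    # [G, N]
    loss = F.cross_entropy(logits, labels)                # L = -sum_i y_i log(y_hat_i) (Equation (14))
    return loss, logits.argmax(dim=-1)
```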

4. Experiment

This section describes our datasets and experimental setup and provides an in-depth analysis of our experimental results and ablation experiments.

4.1. Dataset and Setup

In this work, we evaluate our model on two datasets, FewRel1.0 and FewRel2.0, as shown in Table 2. The FewRel1.0 dataset is derived from Wikipedia; the data were first generated by distant supervision, and the resulting noise was then corrected by crowdsourcing. It contains 100 relations, each with 700 instances, for a total of 70,000 instances. In addition, each sentence is annotated with the relation name, head entity, and tail entity; 64 of the 100 relations are used for training, 16 for validation, and 20 for testing. In the FewRel2.0 (domain adaptation) dataset, the training set remains unchanged, while the validation set and the test set are extracted from the medical database PubMed. The validation set consists of 10 relations and the test set of 15 relations, with 700 instances per relation. Taking the validation and test examples from a different domain allows the domain adaptation capability of the model to be tested. The training, validation, and test sets are further divided into a support set for training the model and a query set for prediction.
We experiment with four few-shot configurations, with N ∈ {5, 10} and K ∈ {1, 5}. For the encoder, we choose the BERT-base-uncased and CP modules with a feature dimensionality of 768. The maximum length of each sentence is set to 128, and learning rates of 1 × 10^-5 and 5 × 10^-6 are used depending on the task. All experiments were performed on a machine with an Intel Core i9-13900K/F CPU @ 5.8 GHz and two GeForce RTX 3090 GPU cards with 24 GB of video memory each, and the models were implemented in the PyTorch 1.12.0 framework. Because of limited hardware, the batch size was set to 2 for the 10-way-5-shot configuration and 4 for all others. We trained for 30,000 iterations with the AdamW optimizer and the cross-entropy loss.
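The settings above can be collected into a short configuration sketch; the variable names are our own, and the checkpoint name stands in for either the BERT-base-uncased or the CP-pretrained encoder.

```python
import torch
from transformers import BertModel

# Hyperparameters reported in Section 4.1 (variable names are our own).
N_WAY, K_SHOT = 10, 5
BATCH_SIZE = 2 if (N_WAY, K_SHOT) == (10, 5) else 4    # smaller batch for 10-way-5-shot due to GPU memory
MAX_LENGTH = 128                                        # maximum sentence length
TRAIN_ITERS = 30_000                                    # number of training iterations
LEARNING_RATE = 5e-6                                    # 1e-5 or 5e-6 depending on the task

encoder = BertModel.from_pretrained("bert-base-uncased")   # or a CP-pretrained checkpoint
optimizer = torch.optim.AdamW(encoder.parameters(), lr=LEARNING_RATE)
loss_fn = torch.nn.CrossEntropyLoss()
```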

4.2. Comparative Models

We compare against 14 baselines on FewRel. Depending on the encoder, we divide the baselines into three groups: CNN-based models, BERT-based models, and BERT-based models with additional pretraining. The characteristics of each baseline are as follows:
  • Proto-HATT (Gao et al.; AAAI, 2019) [17] introduces hybrid attention, focusing on feature extraction.
  • MLMAN (Ye et al.; ACL, 2019) [18] improves the model using a multilevel matching and aggregation prototype network.
  • BERT-PAIR (Gao et al.; EMNLP, 2019) [29] pairs each query sentence with each support sentence and classifies the pairs with BERT.
  • Proto-BERT (Garcia et al.; ICLR, 2020) [14] preserves important contextual information by constructing multiprototype word embeddings.
  • REGRAB (Qu et al.; ICML, 2020) [44] adds Bayesian learning methods with external global relation graphs.
  • TD-proto (Yang et al.; CIKM, 2020) [45] considers that general content words are also essential.
  • CTEG (Wang et al.; COLING, 2020) [19] adds two types of external information to enhance the information content.
  • ConceptFERE (Yang et al.; ACL, 2021) [23] optimizes entity concept information through the selection of entity concept words.
  • HCRP (Han et al.; EMNLP, 2021) [20] introduces a hybrid prototype network with a task-adaptive focal loss.
  • SimpleFSRE (Liu et al.; ACL, 2022) [21] simplifies the model by making direct use of relation information.
  • MTB (Soares et al.; ACL, 2019) [33] constructs task-agnostic relation representations from entity-linked text.
  • CP (Peng et al.; EMNLP, 2020) [49] proposes a pretraining framework for entity masking contrast for RE to deepen the understanding of textual context and type information.
  • MapRE (Dong et al.; EMNLP, 2021) [47] incorporates label-agnostic and label-aware semantic mapping information in a framework for pretraining and fine-tuning.
  • Proto-CNN (Snell et al.; NIPS, 2017) [15] first introduces the prototype network, used as an FSRE model with a CNN encoder.

4.3. Experimental Results

Our model uses the same embedding and encoding layer settings as the baseline models against which we compare.
The results on the FewRel1.0 dataset are shown in Table 3, which covers both CNN-based and BERT-based encoders. Among the BERT-based approaches, some use the original BERT, while others add further pretraining on Wikipedia data. The last row of each of the BERT-based and CP-based groups shows the results of our model with the identical encoder. As the table shows, our results improve significantly on the test set. Moreover, our model shows a significant improvement over the basic models Proto-BERT and CP.
We also conduct experiments on the FewRel2.0 (domain adaptation) dataset to test the domain adaptation capability of our model, as shown in Figure 5. Our model obtains a substantial improvement on FewRel2.0 (domain adaptation), which shows that it has a strong domain adaptation capability and can be applied effectively to datasets from different domains.

4.4. Analysis of Results

To show more clearly that our model yields a larger boost with the CP-based encoder, we plot the results in the line graph in Figure 6. The figure shows that our model FREDA (CP) is significantly better than the other models in all four few-shot settings, demonstrating its superiority. However, Table 3 also reveals a shortcoming: with the BERT-based encoder, our model scores lower than SimpleFSRE in the 10-way-5-shot setting, although the overall improvement is higher. Our analysis suggests that our model is better suited to settings with smaller K, and the smaller improvement in the 5-way-5-shot setting supports this. Moreover, because of limited hardware, the batch size was reduced to 2 for 10-way-5-shot, and this minor change of parameters could also have an influence. Even so, our test results show a larger boost than our baseline.
On the other hand, to demonstrate that our model yields better relation prototypes across relation classes and clusters instances more tightly when distinguishing between similar relations, we visualize relation instances on the validation set of the 5-way-1-shot setup. As shown in Figure 7, we conducted two experimental comparisons. Images (a) and (c) use five randomly selected relations, with the same color corresponding to the same relation in both images: image (a) visualizes the clustering of these relations with the SimpleFSRE model, while image (c) shows the clustering with our model, allowing us to compare the aggregation capabilities of the two models across different relations. Images (b) and (d) use five deliberately similar relations, again with matching colors: image (b) visualizes the clustering of these similar relations with the SimpleFSRE model, and image (d) shows the clustering with our model. We observe the following: among the five randomly selected relation classes, our model has a stronger aggregation ability than the baseline model; among the five similar relations, both (b) and (d) show strong aggregation, but our plot is somewhat denser within each relation class, indicating that our model is better at distinguishing similar relations.
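The cluster plots described here can be reproduced with a standard two-dimensional projection such as t-SNE (our assumption; the specific projection technique is not named above); the sketch below is a generic recipe over assumed inputs, not the plotting code behind Figure 7.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_relation_clusters(embeddings, labels, out_path="clusters.png"):
    """Project instance embeddings to 2D and color them by relation class (illustrative sketch).

    embeddings: [n_instances, 2d] array of encoded validation instances (assumed input).
    labels:     [n_instances]     integer relation ids.
    """
    embeddings, labels = np.asarray(embeddings), np.asarray(labels)
    points = TSNE(n_components=2, init="pca", random_state=0).fit_transform(embeddings)
    for rel in np.unique(labels):
        mask = labels == rel
        plt.scatter(points[mask, 0], points[mask, 1], s=8, label=f"relation {rel}")
    plt.legend()
    plt.savefig(out_path, dpi=200)
```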

4.5. Ablation Experiments

In this section, to analyze the contribution of each component of the model, we conducted ablation experiments on the FewRel1.0 dataset with a BERT encoder, as shown in Table 4. SimpleFSRE was the baseline of our model. We then removed the relation information descriptions and added the relation triple information drawn from the support set samples. Finally, we added the interactive attention module, which focuses on the parts of the query set that are highly interactive with each instance in the support set and in the relation triple information, with highly interactive instances receiving higher coefficients. The test set results were submitted to the CodaLab website for evaluation, ensuring the fairness and credibility of the results.
Table 4 shows that when only the relation triple information is added, the test results improve significantly in the N-way-1-shot settings but improve less in the 5-way-5-shot setting and drop slightly in the 10-way-5-shot setting. This suggests that our model is more suitable for low-sample settings. When the interactive attention module is added, our model improves in all four settings compared with adding only the relation triple information, but still relatively more in the K = 1 settings. This ablation experiment shows that both the relation triple information and the interactive attention module help our model.

5. Conclusions

The metric-learning-based FSRE task aims to obtain a better projection function from the embedding information of a small number of support set instances through continuous iteration of the model and to better predict the semantic relations of entity pairs in the query set instances. This relies heavily on the quality of the support set instances in the embedding space, but existing FSRE methods underutilize those instances. Therefore, we introduce relation triple information through data augmentation and an interactive attention module based on Euclidean distance, both of which are designed to improve the quality of the support set's embedding space. The model in this paper does not use external information; as previous experiments on models with external information show, adding such information can help improve performance, and we will try to incorporate external information into our model in future experiments. We will also try to apply our model to other domains, such as healthcare and food, to realize its practical value, given its strong domain adaptation capability.

Author Contributions

Conceptualization and methodology, J.L.; software, J.L.; validation, J.L., X.Q. and X.M.; formal analysis, J.L.; investigation, J.L.; resources, J.L.; data curation, X.Q.; writing—original draft preparation, J.L.; writing—review and editing, X.Q. and W.R.; visualization, X.M.; supervision, W.R.; project administration, X.Q.; funding acquisition, W.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Major Science and Technology Special Projects of the Xinjiang Uygur Autonomous Region (2020A03001) and its subprogram, Key Technology Development and Application Demonstration of an Integrated Food Data Supervision Platform in the Xinjiang Region (2020A03001-2).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 29. [Google Scholar]
  2. Zeng, D.; Liu, K.; Lai, S.; Zhou, G.; Zhao, J. Relation classification via convolutional deep neural network. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers; Dublin City University and Association for Computational Linguistics: Dublin, Ireland, 2014; pp. 2335–2344. [Google Scholar]
  3. Xiao, S.; Liu, Z.; Han, W.; Zhang, J.; Shao, Y.; Lian, D.; Li, C.; Sun, H.; Deng, D.; Zhang, L.; et al. Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval. In Proceedings of the ACM Web Conference, Virtual Event, Lyon, France, 25–29 April 2022; pp. 286–296. [Google Scholar]
  4. Chen, X.; Xu, J.; Xu, B. A working memory model for task-oriented dialog response generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 2687–2693. [Google Scholar]
  5. Yasunaga, M.; Ren, H.; Bosselut, A.; Liang, P.; Leskovec, J. QA-GNN: Reasoning with language models and knowledge graphs for question answering. arXiv 2021, arXiv:2104.06378. [Google Scholar]
  6. Gupta, P.; Rajaram, S.; Schütze, H.; Runkler, T. Neural relation extraction within and across sentence boundaries. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 6513–6520. [Google Scholar]
  7. Wu, S.; He, Y. Enriching pre-trained language model with entity information for relation classification. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 2361–2364. [Google Scholar]
  8. Mintz, M.; Bills, S.; Snow, R.; Jurafsky, D. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Singapore, 2–7 August 2009; pp. 1003–1011. [Google Scholar]
  9. Rathore, V.; Badola, K.; Singla, P. PARE: A Simple and Strong Baseline for Monolingual and Multilingual Distantly Supervised Relation Extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; Short Papers. Volume 2, pp. 340–354. [Google Scholar]
  10. Xiao, Y.; Jin, Y.; Hao, K. Adaptive prototypical networks with label words and joint representation learning for few-shot relation classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 1406–1417. [Google Scholar] [CrossRef] [PubMed]
  11. He, K.; Huang, Y.; Mao, R.; Gong, T.; Li, C.; Cambria, E. Virtual prompt pre-training for prototype-based few-shot relation extraction. Expert Syst. Appl. 2023, 213, 118927. [Google Scholar] [CrossRef]
  12. Cong, X.; Sheng, J.; Cui, S.; Yu, B.; Liu, T.; Wang, B. Relation-guided few-shot relational triple extraction. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 2206–2213. [Google Scholar]
  13. Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.; Hospedales, T.M. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1199–1208. [Google Scholar]
  14. Garcia, V.; Bruna, J. Few-shot learning with graph neural networks. arXiv 2017, arXiv:1711.04043. [Google Scholar]
  15. Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. In Proceedings of the Advances in neural information processing systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  16. Vanschoren, J. Meta-learning: A survey. arXiv 2018, arXiv:1810.03548. [Google Scholar]
  17. Gao, T.; Han, X.; Liu, Z.; Sun, M. Hybrid attention-based prototypical networks for noisy few-shot relation classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 6407–6414. [Google Scholar]
  18. Ye, Z.X.; Ling, Z.H. Multi-level matching and aggregation network for few-shot relation classification. arXiv 2019, arXiv:1906.06678. [Google Scholar]
  19. Wang, Y.; Bao, J.; Liu, G.; Wu, Y.; He, X.; Zhou, B.; Zhao, T. Learning to decouple relations: Few-shot relation classification with entity-guided attention and confusion-aware training. arXiv 2020, arXiv:2010.10894. [Google Scholar]
  20. Han, J.; Cheng, B.; Lu, W. Exploring task difficulty for few-shot relation extraction. arXiv 2021, arXiv:2109.05473. [Google Scholar]
  21. Liu, Y.; Hu, J.; Wan, X.; Chang, T.H. A simple yet effective relation information guided approach for few-shot relation extraction. arXiv 2022, arXiv:2205.09536. [Google Scholar]
  22. Liu, Y.; Hu, J.; Wan, X.; Chang, T.H. Learn from relation information: Towards prototype representation rectification for few-shot relation extraction. In Findings of the Association for Computational Linguistics; NAACL: Seattle, WA, USA, 2022; pp. 1822–1831. [Google Scholar]
  23. Yang, S.; Zhang, Y.; Niu, G.; Zhao, Q.; Pu, S. Entity concept-enhanced few-shot relation extraction. arXiv 2021, arXiv:2106.02401. [Google Scholar]
  24. Yang, S.; Zhang, Y.; Niu, G.; Zhao, Q.; Pu, S. Knowledge-enhanced domain adaptation in few-shot relation classification. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Singapore, 14–18 August 2021; pp. 2183–2191. [Google Scholar]
  25. Zhenzhen, L.; Zhang, Y.; Nie, J.Y.; Li, D. Improving Few-Shot Relation Classification by Prototypical Representation Learning with Definition Text. In Findings of the Association for Computational Linguistics; NAACL: Seattle, WA, USA, 2022; pp. 454–464. [Google Scholar]
  26. Gao, T.; Han, X.; Xie, R.; Liu, Z.; Lin, F.; Lin, L.; Sun, M. Neural snowball for few-shot relation learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 7772–7779. [Google Scholar]
  27. Dong, B.; Yao, Y.; Xie, R.; Gao, T.; Han, X.; Liu, Z.; Lin, F.; Lin, L.; Sun, M. Meta-information guided meta-learning for few-shot relation classification. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 1594–1605. [Google Scholar]
  28. Han, X.; Zhu, H.; Yu, P.; Wang, Z.; Yao, Y.; Liu, Z.; Sun, M. Fewrel: A large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation. arXiv 2018, arXiv:1810.10147. [Google Scholar]
  29. Gao, T.; Han, X.; Zhu, H.; Liu, Z.; Li, P.; Sun, M.; Zhou, J. FewRel 2.0: Towards more challenging few-shot relation classification. arXiv 2019, arXiv:1910.07124. [Google Scholar]
  30. Zelenko, D.; Aone, C.; Richardella, A. Kernel methods for relation extraction. J. Mach. Learn. Res. 2003, 3, 1083–1106. [Google Scholar]
  31. Kambhatla, N. Combining lexical, syntactic, and semantic features with maximum entropy models for information extraction. In Proceedings of the ACL Interactive Poster and Demonstration Sessions, Barcelona, Spain, 21–26 July 2004; pp. 178–181. [Google Scholar]
  32. Geng, Z.; Chen, G.; Han, Y.; Lu, G.; Li, F. Semantic relation extraction using sequential and tree-structured LSTM with attention. Inf. Sci. 2020, 509, 183–192. [Google Scholar] [CrossRef]
  33. Soares, L.B.; FitzGerald, N.; Ling, J.; Kwiatkowski, T. Matching the blanks: Distributional similarity for relation learning. arXiv 2019, arXiv:1906.03158. [Google Scholar]
  34. Riedel, S.; Yao, L.; McCallum, A. Modeling relations and their mentions without labeled text. In Lecture Notes in Computer Science, Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2010, Barcelona, Spain, 20–24 September 2010; Proceedings, Part III 21; Springer: Berlin/Heidelberg, Germany, 2010; pp. 148–163. [Google Scholar]
  35. Zeng, D.; Liu, K.; Chen, Y.; Zhao, J. Distant supervision for relation extraction via piecewise convolutional neural networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1753–1762. [Google Scholar]
  36. Bhartiya, A.; Badola, K. Dis-rex: A multilingual dataset for distantly supervised relation extraction. arXiv 2021, arXiv:2104.08655. [Google Scholar]
  37. Jankowski, N.; Duch, W.; Grabczewski, K. (Eds.) Meta-Learning in Computational Intelligence; Springer: Berlin/Heidelberg, Germany, 2011; pp. 97–115. [Google Scholar]
  38. Zhang, X.; Qiang, Y.; Sung, F.; Yang, Y.; Hospedales, T.M. Deep comparison: Relation columns for few-shot learning. arXiv 2018, arXiv:1811.07100. [Google Scholar]
  39. Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1126–1135. [Google Scholar]
  40. Fan, M.; Bai, Y.; Sun, M.; Li, P. Large margin prototypical network for few-shot relation classification with fine-grained features. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 2353–2356. [Google Scholar]
  41. Obamuyide, A.; Vlachos, A. Model-agnostic meta-learning for relation classification with limited supervision. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019. [Google Scholar]
  42. Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese Neural Networks for One-Shot Image Recognition. In Proceedings of the ICML deep learning workshop, Lille, France, 6–11 July 2015; Volume 2. [Google Scholar]
  43. Li, W.; Qian, T. Graph-based Model Generation for Few-Shot Relation Extraction. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 62–71. [Google Scholar]
  44. Qu, M.; Gao, T.; Xhonneux, L.P.; Tang, J. Few-shot relation extraction via bayesian meta-learning on relation graphs. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 12–19 July 2020; pp. 7867–7876. [Google Scholar]
  45. Yang, K.; Zheng, N.; Dai, X.; He, L.; Huang, S.; Chen, J. Enhance prototypical network with text descriptions for few-shot relation classification. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual Event, Ireland, 19–23 October 2020; pp. 2273–2276. [Google Scholar]
  46. Wang, Y.; Verspoor, K.; Baldwin, T. Learning from unlabelled data for clinical semantic textual similarity. In Proceedings of the 3rd Clinical Natural Language Processing Workshop, Online, 19 November 2020; pp. 227–233. [Google Scholar]
  47. Dong, M.; Pan, C.; Luo, Z. Mapre: An effective semantic mapping approach for low-resource relation extraction. arXiv 2021, arXiv:2109.04108. [Google Scholar]
  48. Zhang, Y.; Cen, M.; Wu, T.; Zhang, H. RAPS: A Novel Few-Shot Relation Extraction Pipeline with Query-Information Guided Attention and Adaptive Prototype Fusion. arXiv 2022, arXiv:2210.08242. [Google Scholar]
  49. Peng, H.; Gao, T.; Han, X.; Lin, Y.; Li, P.; Liu, Z.; Sun, M.; Zhou, J. Learning from context or names? an empirical study on neural relation extraction. arXiv 2020, arXiv:2010.01923. [Google Scholar]
Figure 1. Extraction of the relation triple information.
Figure 2. Overall framework of the FREDA model.
Figure 3. BERT model. An example from the support set; red represents the head entity and blue represents the tail entity.
Figure 4. Interaction between the support set and the query set in the 2-way-2-shot setup to obtain the similarity of the query set to the support set.
Figure 5. Experimental results on the accuracy of the FewRel2.0 test set.
Figure 6. Comparison plot of the BERT (CP) models with additional pretraining, where FREDA (CP) is our model. The values in the graph are the accuracy of each model on the test set.
Figure 7. We use the validation set of the 5-way-1-shot setup, with SimpleFSRE as the comparison baseline and CP as the pretraining module. (a) Visualization of five relation classes randomly selected from the validation set with the SimpleFSRE model; (b) visualization of five similar relation classes from the validation set with the SimpleFSRE model; (c) visualization by our model with the same settings as (a); (d) visualization by our model with the same settings as (b).
Table 1. 2-way-2-shot example. Head entity in red font and tail entity in blue font.

Support Set
R1: country of citizenship
  Instance 1: Charles Gniette was a Belgian field hockey player who competed in the 1920 Summer Olympics.
  Instance 2: Catherine Loyola (born 1986) is a fashion model and beauty queen from the Philippines.
R2: mother
  Instance 1: Ariston had three other children by Perictione: Glaucon, Adeimantus, and Potone.
  Instance 2: Dylan and Caitlin brought up their three children, Aeronwy, Llewellyn and Colm.
Query Set
R1 or R2: Rugby League Live 2 followed in 2012, again developed by Big Ant Studios.
Table 2. FewRel dataset.

Dataset   | Source | Apply | Relation Number | Instance Number
FewRel1.0 | Wiki   | Train | 64 | 44,800
FewRel1.0 | Wiki   | Val   | 16 | 11,200
FewRel1.0 | Wiki   | Test  | 20 | 14,000
FewRel2.0 | Wiki   | Train | 64 | 44,800
FewRel2.0 | PubMed | Val   | 10 | 7,000
FewRel2.0 | PubMed | Test  | 15 | 10,500
Table 3. The results of FSRE accuracy experiments on the FewRel1.0 validation/test set. From top to bottom, the encoders are a CNN, the original BERT, and BERT with additional pretraining. FREDA denotes the results of our implementation; the other results are taken from the corresponding papers or from CodaLab.

Encoder | Model | 5-Way-1-Shot | 5-Way-5-Shot | 10-Way-1-Shot | 10-Way-5-Shot
CNN | Proto-HATT | 72.65/74.52 | 86.15/88.40 | 60.13/62.38 | 76.20/80.45
CNN | MLMAN | 75.01/- | 87.09/90.12 | 62.48/- | 77.50/83.05
BERT | BERT-PAIR | 85.66/88.32 | 89.48/93.22 | 76.84/80.63 | 81.76/87.02
BERT | Proto-BERT | 84.77/89.33 | 89.54/94.13 | 76.85/83.41 | 83.42/90.25
BERT | REGRAB | 87.95/90.30 | 92.54/94.25 | 80.26/84.09 | 86.72/89.93
BERT | TD-proto | -/84.76 | -/92.38 | -/74.32 | -/85.92
BERT | CTEG | 84.72/88.11 | 92.52/95.25 | 76.01/81.29 | 84.89/91.33
BERT | ConceptFERE | -/89.21 | -/90.34 | -/75.72 | -/81.82
BERT | HCRP | 90.90/93.76 | 93.22/95.66 | 84.11/89.95 | 87.79/92.10
BERT | SimpleFSRE | 91.29/94.42 | 94.05/96.37 | 86.09/90.73 | 89.68/93.47
BERT | FREDA | 92.23/94.86 | 94.59/96.52 | 85.99/91.66 | 90.40/93.18
Pretrained BERT | MTB | -/91.10 | -/95.40 | -/84.30 | -/91.80
Pretrained BERT | CP | -/95.10 | -/97.10 | -/91.20 | -/94.70
Pretrained BERT | MapRE | -/95.73 | -/97.84 | -/93.18 | -/95.64
Pretrained BERT | HCRP (CP) | 94.10/96.42 | 96.05/97.96 | 89.13/93.97 | 93.10/96.46
Pretrained BERT | SimpleFSRE (CP) | 96.21/96.63 | 97.07/97.93 | 93.38/94.94 | 95.11/96.39
Pretrained BERT | FREDA (CP) | 95.66/97.26 | 97.31/98.30 | 91.81/95.32 | 95.26/96.68
 | Improvement | +5.09 | +2.39 | +8.25 | +2.93
 | Improvement (CP) | +2.16 | +1.20 | +4.12 | +1.98
Table 4. The results of ablation experiments.

Model | 5-Way-1-Shot | 5-Way-5-Shot | 10-Way-1-Shot | 10-Way-5-Shot | Average
SimpleFSRE | 91.29/94.42 | 94.05/96.37 | 86.09/90.73 | 89.68/93.47 | 93.74
+Triple | 92.30/94.73 | 94.65/96.41 | 85.38/91.39 | 90.47/92.94 | 93.86
FREDA | 92.23/94.86 | 94.59/96.52 | 85.99/91.66 | 90.50/93.23 | 94.06
