Article

Embedding Uncertain Temporal Knowledge Graphs

Tongxin Li, Weiping Wang, Xiaobo Li, Tao Wang, Xin Zhou and Meigen Huang
School of Systems Engineering, National University of Defense Technology, Changsha 410000, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(3), 775; https://doi.org/10.3390/math11030775
Submission received: 30 December 2022 / Revised: 31 January 2023 / Accepted: 1 February 2023 / Published: 3 February 2023

Abstract

Knowledge graph (KG) embedding for predicting missing relation facts in incomplete knowledge graphs (KGs) has been widely explored. In addition to the basic triple structure of head entities, tail entities, and the relations between them, KGs carry a large amount of uncertain and temporal information that is difficult to exploit in KG embeddings, and some embedding models have been designed specifically for uncertain KGs or temporal KGs. However, these models utilize either only uncertain information or only temporal information, without integrating both kinds of information into an underlying model that exploits the triple structure. In this paper, we propose an embedding model for uncertain temporal KGs called the confidence score, time, and ranking information embedded jointly model (CTRIEJ), which aims to preserve the uncertainty, temporal, and structural information of relation facts in the embedding space. To further enhance the precision of the CTRIEJ model, we also introduce a self-adversarial negative sampling technique to generate negative samples. We use the embedding vectors obtained from our model to complete missing relation facts and predict their corresponding confidence scores. Experiments are conducted on an uncertain temporal KG extracted from Wikidata via three tasks, i.e., confidence prediction, link prediction, and relation fact classification. The CTRIEJ model shows effectiveness in capturing uncertain and temporal knowledge and consistently outperforms the baselines on all three downstream tasks.

1. Introduction

KGs, which store various relation facts from the real world, are extensively applied in downstream tasks such as natural language processing [1], information retrieval [2], and knowledge question answering [3]. A relation fact (or triple) is composed of two entities (as nodes) and the relation that connects them (as the edge), and can be written as (h, r, t) or (s, p, o) [4]. Although KGs contain millions of such triples, they are known to suffer from incompleteness. This issue gives rise to the task of KG completion, which entails predicting the information missing from KGs. KG embedding, also known as knowledge representation learning, has become the mainstream method for KG completion by building distributed representations (or vector embeddings) of entities and relations [5].
Specifically, KG embedding represents a symbolic triple (h, r, t) as low-dimensional, dense, real-valued vectors (h, r, t), corresponding to the head entity, relation, and tail entity, respectively. Various embedding methods have emerged, mainly including translation-distance-based and semantic-matching-based models. TransE [6] is the original translation-distance-based model and is known for its effectiveness and simplicity. In the TransE model, the sum of the head entity vector h and the relation vector r is close to the tail entity vector t for each relation fact, i.e., h + r ≈ t. TransE effectively captures the structural and semantic information of the KG, but it cannot handle complex relations. To solve this problem, researchers have proposed multifarious models [7,8,9]. In addition, there are many embedding models based on semantic matching [10,11,12], which have achieved high accuracy in link prediction tasks.
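As a concrete illustration of the translation intuition, the following is a minimal sketch (in PyTorch, with placeholder entities and untrained random embeddings, both our own assumptions) of how a TransE-style plausibility score can be computed:

```python
import torch

# A minimal sketch of the TransE idea: a triple (h, r, t) is plausible when
# h + r lies close to t in the embedding space. The entities, the relation,
# and the (random, untrained) embeddings are placeholders for illustration.
dim = 50
entities = {"Paris": 0, "France": 1}
relations = {"capital_of": 0}

entity_emb = torch.nn.Embedding(len(entities), dim)
relation_emb = torch.nn.Embedding(len(relations), dim)

def transe_distance(h_id: int, r_id: int, t_id: int) -> torch.Tensor:
    h = entity_emb(torch.tensor(h_id))
    r = relation_emb(torch.tensor(r_id))
    t = entity_emb(torch.tensor(t_id))
    # Smaller ||h + r - t|| means a more plausible triple.
    return torch.norm(h + r - t, p=2)

d = transe_distance(entities["Paris"], relations["capital_of"], entities["France"])
```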
The above methods all reason over deterministic, static KGs without considering the uncertain and temporal information of triples, which raises some key issues. The first is how to embed uncertain KGs. Uncertain KGs, such as ConceptNet [13] and NELL [14], associate each relation fact with a confidence score representing the likelihood of that fact being true. During the construction of a KG, many automated methods introduce noise and conflicts, resulting in a certain degree of uncertainty for each triple. Embedding such uncertain knowledge captures the uncertain nature of reality and enables more precise reasoning. The second is how to learn the temporal dynamics of the relation facts in KGs. Most relation facts in KGs change over time; for example, the fact (Claudio Raineri, coach, Chelsea) is only true from 2000 to 2004, and ignoring such temporal information may lead to ambiguity and misunderstanding. The temporal information of relation facts also carries essential causal patterns that can assist link prediction. In sum, embedding the uncertain and temporal characteristics of relation facts can help KGs support better reasoning.
For the uncertainty of triples, uncertain KG embedding (UKGE) [15] calculates a score function based on the DistMult model and uses probabilistic soft logic to generate confidence scores for unseen relation facts, but it does not fully exploit the structural information in the KG. Structural and uncertain knowledge embedding (SUKE) [16] employs an evaluator and a confidence generator to embed the confidence scores and structural information simultaneously, but the two components are not combined into a unified framework, which means that the entity and relation vectors they generate are not shared. Chen et al. [17] abandoned probabilistic soft logic for generating extra training samples and instead leveraged a pool-based semisupervised learning model, PASSLEAF, to generate confidence scores for unseen relation facts. This model can partially solve the false-negative problem caused by random negative sampling, but it only considers the confidence of knowledge and ignores the rich information contained in the graph structure. For embedding temporal information in KGs, a significant number of temporal KG representation learning models have recently emerged. TTransE [18] and HyTE [19] learn distinct representations on each snapshot, and ATiSE [20] models the evolution of a temporal KG through diachronic entity representations. Lately, most models apply neural networks to characterize the structural information and temporal evolution of KGs [21,22,23]. However, none of the aforementioned studies exploit both uncertainty and temporal information. Chekol et al. [24] explored Markov logic networks and probabilistic soft logic for reasoning over uncertain temporal KGs without using embedding-based approaches, which incurred high computational complexity and low efficiency.
In response to these issues, we propose the confidence score, time, and ranking information embedded jointly model (CTRIEJ) for uncertain temporal KG embedding, which integrates uncertainty, temporal information, and structural information into a unified framework. The CTRIEJ model first utilizes a sequence model to incorporate temporal information into the embedding of relations and then applies the sum of two loss functions as the training objective: a squared loss for confidence prediction and a pairwise ranking loss for structural information. When evaluating the model on downstream tasks, we employ the semantic-matching-based score function for confidence prediction and relation fact classification, and we design a score function combining translation distance and semantic matching to predict missing relation facts in the uncertain temporal KG. In addition, we adopt a self-adversarial negative sampling technique to train the model.
The main contributions of this paper can be summarized as follows:
  • We leverage a GRU-based sequence model to incorporate temporal information into the embedding of the relation sequence and combine two score functions, based on semantic matching and translation distance, to characterize the confidence information and structural information of the uncertain temporal KG in a unified framework.
  • We exploit multiple score functions to simultaneously infer the existence of relation facts and the confidence scores of existing facts. We further adopt a self-adversarial negative sampling technique, which utilizes the current entity and relation embeddings to generate negative samples.
  • We evaluate our model on the Wikidata dataset wikidata_5k on three typical tasks: confidence prediction, link prediction, and relation fact classification. The results demonstrate that the CTRIEJ model performs better than the other benchmarks.
The rest of the paper is organized as follows. We introduce the definition of uncertain temporal KGs and review related work in Section 2. In the following two sections, we present our CTRIEJ model and conduct the related experiments. Finally, we draw conclusions in Section 5.

2. Related Work

To the best of our knowledge, there is currently no embedding learning method for uncertain temporal KGs, so we introduce related work from three directions: deterministic KG embedding models, temporal KG embedding models, and uncertain KG embedding models. For ease of understanding, we first define the relevant problems of the uncertain temporal KG.

2.1. Problem Definition

The relevant definitions of the uncertain temporal KG are given as follows.
Definition 1. 
Temporal knowledge graph: A temporal KG can be denoted by G = (E, R, Q), where E and R represent the sets of entities and relations, respectively, and Q represents the set of temporal relation facts. Each relation fact ⟨h, r, t⟩ in the graph has a valid time [T_s, T_e], which denotes the closed interval from T_s to T_e, with T_s ≤ T_e and T_s, T_e ∈ T, i.e., f = ⟨h, r, t, [T_s, T_e]⟩. We refer to f as a temporal fact.
For a temporal KG G, its snapshot at time T is the (nontemporal) KG G_T = {⟨h, r, t⟩ | ⟨h, r, t, [T, T]⟩ ∈ G}.
Definition 2. 
Uncertain temporal knowledge graph: An uncertain temporal KG consists of temporal relation facts with confidence scores that model the inherent uncertainty. We can represent a fact as u = ⟨f, s_f⟩, where f = ⟨h, r, t, [T_s, T_e]⟩ is a temporal relation fact and s_f ∈ [0, 1] is a real-valued weight assigned to f.
Example 1. 
Uncertain temporal knowledge graph: the following uncertain temporal KG represents the career of the sports personality Claudio Raineri [24] (a storage sketch in code follows the list):
1. (Claudio Raineri, bdate, 1951) 1.0;
2. (Claudio Raineri, playsFor, Palermo, [1984, 1986]) 0.5;
3. (Claudio Raineri, coach, Napoli, [2001, 2003]) 0.6;
4. (Claudio Raineri, coach, Chelsea, [2000, 2004]) 0.9;
5. (Claudio Raineri, coach, Leicester, [2015, 2016]) 0.7.
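For illustration, such facts can be stored in a small data structure along the lines of the following sketch; the class and field names are our own, not from the paper or ref. [24]:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# A sketch of one way to represent uncertain temporal facts u = <f, s_f>;
# the class and field names are illustrative, not from the paper.
@dataclass
class UncertainTemporalFact:
    head: str
    relation: str
    tail: str
    valid_time: Optional[Tuple[int, int]]  # [T_s, T_e]; None if atemporal
    confidence: float                      # s_f in [0, 1]

facts = [
    UncertainTemporalFact("Claudio Raineri", "bdate", "1951", None, 1.0),
    UncertainTemporalFact("Claudio Raineri", "playsFor", "Palermo", (1984, 1986), 0.5),
    UncertainTemporalFact("Claudio Raineri", "coach", "Chelsea", (2000, 2004), 0.9),
]
```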
Definition 3. 
Uncertain temporal knowledge graph embedding: Given an uncertain temporal KG, the embedding can be expressed as a mapping function f: h ↦ h ∈ R^{d_E}, r ↦ r ∈ R^{d_R}, t ↦ t ∈ R^{d_E}, T_token ↦ T_token ∈ R^{d_T}, where h, r, and t are the vector representations of the head entity, relation, and tail entity, respectively; T_token is the vector representation of a temporal token, described in detail in Section 3.2; and d_E, d_R, and d_T represent the dimensions of the entity, relation, and temporal token vectors, respectively. In this model, we set d_E = d_R = d_T = d.

2.2. Deterministic Knowledge Graph Embeddings

A deterministic KG contains a series of triples (h, r, t), where h, t ∈ E and r ∈ R. A deterministic KG can be regarded as an uncertain KG whose triples all have a confidence score of one. At present, deterministic KG embedding models can be divided into three main categories: tensor-decomposition-based models, translation-distance-based models, and neural-network-based models.
Structured embedding (SE) [25] is one of the earlier knowledge representation methods. For a relation fact, SE projects the head and tail entity vectors into a relation-specific space through two relation matrices and then calculates the distance between the two projected vectors in this space. This distance reflects the semantic relevance of the two entities under the relation: the smaller the distance, the more likely the triple holds. In addition, the semantic matching energy model (SME) [26] defines several projection matrices and utilizes bilinear functions to describe the internal relationship between entities and relations. Bilinear functions are also used in the latent factor model (LFM) [27], which employs a relation-based bilinear transformation to characterize the second-order relationship between entities and relations. The DistMult model [11] explores a simplified form of the latent factor model by restricting the relation matrix to a diagonal matrix. Building on the LFM, the neural tensor network (NTN) [28] further employs a relation-specific bilinear tensor to characterize the relationship between entities and relations. Some researchers have also proposed applying matrix factorization to knowledge representation learning, with RESCAL [10] as the representative method. The basic idea of RESCAL is similar to the aforementioned LFM; the difference is that RESCAL optimizes all positions in the tensor, including those with a value of zero, whereas the LFM only optimizes the triples that exist in the KG.
Bordes et al., inspired by the translation invariance of semantic and syntactic relationships in the word vector space, proposed the TransE model [6], which treats the relation in a KG as a translation vector between the head and tail entities. Compared with previous models, TransE has fewer parameters and low computational complexity, and it can directly establish complex semantic connections between entities and relations. Bordes et al. conducted evaluation tasks such as link prediction on the WordNet and Freebase datasets, and the experimental results showed that the performance of TransE was significantly improved, especially on large-scale sparse KGs. However, TransE has difficulty handling one-to-many, many-to-one, and many-to-many relations. To overcome this shortcoming, TransH [7] introduces relation-specific hyperplanes, based on the idea of allowing an entity to have different vector representations in different relation triples. By projecting onto a relation-specific hyperplane, the TransH model distinguishes the different roles of the same entity in different triples. The TransR [8] model goes further by allowing entities and relations to live in representation spaces of different dimensions and mapping entities into the relation space through a relation-specific transformation. There are many other variants of TransE, including TransM [29], TransF [30], and TransA [9], most of which were introduced to remedy the defects of TransE and improve the expressive ability of the model. Beyond translations, the representation space also admits rotations: the RotatE [31] model represents each relation as a rotation in complex space based on Euler's formula. Through this design, RotatE can simultaneously express symmetric and antisymmetric relations, inverse relations, and compositional relations, which was not possible in previous models.
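As a brief illustration of the rotation idea (our own sketch, not the authors' code), RotatE's scoring can be written as an elementwise complex multiplication with unit-modulus relation embeddings:

```python
import torch

# Sketch of RotatE's scoring: each relation is a vector of unit-modulus
# complex numbers r = e^{i*theta}, and a triple is plausible when rotating
# h by r lands near t, i.e., ||h o r - t|| is small. Values are random
# placeholders for illustration.
dim = 50
h = torch.randn(dim, dtype=torch.cfloat)
t = torch.randn(dim, dtype=torch.cfloat)
theta = torch.rand(dim) * 2 * torch.pi
r = torch.polar(torch.ones(dim), theta)  # e^{i*theta}, so |r_i| = 1

distance = torch.norm(h * r - t)  # smaller -> more plausible
```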
According to the variety of neural networks used, neural-network-based knowledge embedding models can generally be divided into five categories: linear/bilinear neural networks, convolutional neural networks (CNNs) [32,33,34], recurrent neural networks (RNNs) [35,36,37], graph neural networks (GNNs) [38,39,40,41], and generative adversarial networks (GANs) [42].

2.3. Temporal Knowledge Graph Embeddings

Current research in KG embedding focuses on static KGs, where relation facts do not change over time, as in the TransE, TransH, and RESCAL models mentioned above. However, KGs are usually dynamic in practical applications, where facts evolve over time and are only valid for a specific period. Static KG embedding models completely ignore temporal information, which prevents them from working in such practical scenarios. Therefore, a significant number of temporal KG embedding models have emerged.
Know-Evolve [21] updates entity representations subject to temporal changes by building an RNN on top of a static KG representation. TTransE [18] utilizes time information to constrain triples and models the time-predicate sequence for inference. TA-TransE and TA-DistMult [22] use temporal information to constrain relation representations, constructing a time-aware relation representation for each knowledge instance with a digit-level long short-term memory (LSTM) model. ATiSE [20] fully mines the impact of time on the evolution of entities, capturing not only the impact of past time but also that of future time through the trend, seasonal, and random components of time series. RE-NET [23] converts time into a sequence of events with temporal information, builds an RNN-based encoding of the entities in the sequence to capture the influence of their historical information, and finally leverages a relation-aware GCN to aggregate information about entities within the same time. change2vec [43] splits a temporal KG into multiple static snapshots and employs metapath encoding on each snapshot to recompute the representations of nodes that have changed and update their embeddings. CyGNet [44] exploits the historical information in KGs through a dedicated copy module, while its generation module predicts knowledge that appears for the first time. xERTE [45] combines low-dimensional static vectors and temporal functions in its entity representations, capturing both the long-term, time-invariant properties of entities and their time-dependent changes, and it can also visualize the inference paths interpretably. RE-GCN [46] learns evolutional representations of entities and relations at each timestamp by modeling the KG sequence recurrently and also incorporates static properties of entities (such as entity types) via a static graph constraint component to obtain better entity representations.
Most of the above approaches make use of the temporal and structural information in the KG, but they all assume that the triples are deterministic, and none of them considers the confidence score of each relation fact.

2.4. Uncertain Knowledge Graph Embeddings

Some open KGs with uncertain information, such as NELL and ConceptNet, attach a confidence score to each triple to describe the uncertainty of the relation fact. Different KGs have different strategies for calculating confidence scores: in ConceptNet [13], the confidence level is derived from the frequency of crowdsourced annotations, while NELL [14] calculates confidence values with probabilistic semantics via the EM algorithm.
Compared with a deterministic KG, an uncertain KG carries additional triple confidence information. Recently, research has approached the representation and inference of uncertain KGs from different perspectives. GTransE [47] aims to improve the robustness of the representation model when learning from noisy data. Specifically, it uses the confidence scores of triples to dynamically adjust the margins in the pairwise ranking loss, so that higher-confidence triples have larger margins between positive and negative examples, making the model focus more on learning high-confidence triples.
UKGE [15] first proposed the task of learning representations of uncertain KGs that embed structural information and confidence information at the same time. Specifically, it minimizes a mean squared error (MSE) loss to fit the confidence scores of triples based on the energy function of DistMult. In this way, the confidence information is embedded into the geometry of entities and relations, and the energy function of a triple can be used to predict its confidence score. In addition, UKGE introduces logic rules as prior knowledge, employs probabilistic soft logic (PSL) to reason about unseen facts, and uses them as extra training data, thereby preserving the constraints of the rules in the embedding representation.
SUKE [16] also applies the DistMult model as an energy function and explores different logistic functions to transform the energy score into a structural plausibility function and a confidence prediction function. The model consists of two parts: an evaluator and a confidence generator. For unseen triples, the evaluator learns structural and uncertain information to evaluate their plausibility and obtain a candidate set; the confidence generator then predicts the corresponding confidence scores by learning the uncertain information of the triples in the candidate set. However, the embedding vectors of entities and relations generated by the two components are independent of each other, which means that twice as much storage and computational space must be allocated.
PASSLEAF [17] argued that setting the confidence scores of all unseen triples to zero causes a false-negative problem: in an uncertain KG, in addition to the visible triples with confidence scores, there are many more unseen triples that may also carry a variety of confidence scores. The model leverages semi-supervised learning and a sample pool to generate training samples that account for the confidence scores of unseen triples. Moreover, multiple types of score functions were compared in its experiments.

3. Confidence, Time, and Ranking Information Embedded Jointly

3.1. The Framework Overview

In this section, we propose the CTRIEJ model, which can simultaneously infer missing relation facts and predict their confidence scores. The overall framework of the model is shown in Figure 1. It consists of three main components: a time-aware embedding model that incorporates time embeddings into the relation embedding, a confidence prediction model that characterizes the uncertain information, and a pairwise ranking loss model that represents the structural information. In Section 3.2, a gated recurrent unit (GRU) is employed to process the sequence of the relation and time to obtain a relation embedding that incorporates time. In Section 3.3, we describe in detail the two functions, based on semantic matching and translation distance, that characterize the uncertain information and structural information in the uncertain temporal KG, respectively. Finally, we combine the loss functions of the two components into a joint embedding model and adopt a self-adversarial negative sampling technique that samples negative triples according to the current embedding vectors. The details are in Section 3.4.

3.2. GRU for Time-Aware Embedding Sequences

In contrast to previous approaches, we encode sequences of temporal tokens with a GRU, a neural network architecture particularly suited to modeling sequential data. Given an uncertain temporal KG where some triples are augmented with temporal information, we can decompose a given timestamp into a sequence consisting of some of the following temporal tokens.
As shown in Figure 2, the month and day are represented by the digits 0 to 9. In addition to these digits, the year has an extra "-", used at the beginning to indicate BC. The year usually consists of 4 digits, the month of 2 digits (characterizing January to December), and the day of 2 digits (representing one day in a month). Hence, the temporal tokens have a vocabulary size of 31. A complete timestamp contains a start time T_s and an end time T_e, which we concatenate into the sequence of temporal tokens. Moreover, for each triple, we refer to the concatenation of the relation and its sequence of temporal tokens as the relation sequence r_seq = [r, T_s1y, T_s2y, T_s3y, T_s4y, T_s1m, T_s2m, T_s1d, T_s2d, T_e1y, T_e2y, T_e3y, T_e4y, T_e1m, T_e2m, T_e1d, T_e2d], of length 17, where the suffixes y, m, and d indicate whether a digit carries year, month, or day information. An uncertain temporal KG can now be represented as a set of quadruples of the form ⟨h, r_seq, t, s⟩, where the relation sequence r_seq includes the temporal information. These relation token sequences are used as input to a GRU, whose defining equations are:
$$\Gamma_u = \sigma\left(W_u \cdot [c_{n-1}, x_n] + b_u\right)$$
$$\Gamma_r = \sigma\left(W_r \cdot [c_{n-1}, x_n] + b_r\right)$$
$$c_n = \Gamma_u \circ \tanh\left(W_c \cdot [\Gamma_r \circ c_{n-1}, x_n] + b_c\right) + (1 - \Gamma_u) \circ c_{n-1}$$
where n = 1, 2, …, 17; Γ_u and Γ_r are the update and reset gates, respectively; c is the hidden state; σ(·) is an activation function; and x_n ∈ R^d is the embedding of the nth element of the relation token sequence r_seq.
Each token of the input sequence r_seq first obtains its corresponding d-dimensional embedding through random initialization, and the resulting embedding sequence is fed into the GRU. The relation sequence embedding is the last hidden state of the GRU, that is, r_seq = c_17. Having obtained the relation sequence embedding, which carries the temporal information, we combine it with the head and tail entity embeddings in the loss functions of the next section.
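The tokenization and encoding step described above could be implemented along the following lines; this is a sketch in PyTorch, and the helper names, the vocabulary layout, and the "YYYY-MM-DD" input format are our assumptions (the paper specifies only the 31-token vocabulary and the length-17 relation sequence):

```python
import torch
import torch.nn as nn

# Sketch of the time-aware relation-sequence encoder. The vocabulary layout
# and helper names are our assumptions; the paper specifies only that digits
# are suffixed with y/m/d (30 tokens) plus "-" for BC years, i.e., 31 tokens.
TOKENS = [f"{d}{s}" for s in "ymd" for d in "0123456789"] + ["-"]
TOK2ID = {tok: i for i, tok in enumerate(TOKENS)}

def timestamp_tokens(ts: str) -> list:
    """Decompose 'YYYY-MM-DD' into suffixed digit tokens, e.g., '2000-01-01'
    -> ['2y','0y','0y','0y','0m','1m','0d','1d']."""
    year, month, day = ts.split("-")
    return ([c + "y" for c in year] + [c + "m" for c in month]
            + [c + "d" for c in day])

class RelationSequenceEncoder(nn.Module):
    def __init__(self, n_relations: int, dim: int):
        super().__init__()
        self.rel_emb = nn.Embedding(n_relations, dim)
        self.tok_emb = nn.Embedding(len(TOKENS), dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, r_id: torch.Tensor, start: str, end: str) -> torch.Tensor:
        ids = [TOK2ID[t] for t in timestamp_tokens(start) + timestamp_tokens(end)]
        toks = self.tok_emb(torch.tensor(ids)).unsqueeze(0)  # (1, 16, d)
        rel = self.rel_emb(r_id).view(1, 1, -1)              # (1, 1, d)
        seq = torch.cat([rel, toks], dim=1)                  # length-17 sequence
        _, last_hidden = self.gru(seq)
        return last_hidden.squeeze()  # r_seq = c_17, the final hidden state

encoder = RelationSequenceEncoder(n_relations=6, dim=128)
r_seq = encoder(torch.tensor(0), "2000-07-01", "2004-05-31")
```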

3.3. Incorporating Uncertain Information and Structural Information

We leverage two score functions based on semantic matching and translation distance, namely S^unce_(h, r_seq, t) and S^rank_(h, r_seq, t), and the corresponding loss function consists of two parts, L_unce and L_rank, where L_unce characterizes the confidence prediction and L_rank models the graph structure information. The first score function, S^unce_(h, r_seq, t), is employed to predict the confidence scores of triples, and the second, S^rank_(h, r_seq, t), is mainly designed to complete missing relation facts. The MSE loss in UKGE treats the semantic-matching-based DistMult model as its energy function and shows satisfactory performance; our CTRIEJ model therefore also preserves the uncertainty information through an MSE loss. Specifically, we first define the DistMult-based energy function:
$$f = \mathbf{r}_{seq} \cdot (\mathbf{h} \circ \mathbf{t})$$
where h and t are the head and tail entity embeddings of the triple, r_seq denotes the relation sequence embedding obtained from the GRU in the previous step, ∘ is the elementwise product, and · is the inner product. We then apply two different conversion functions [15] to transform energy scores into confidence scores in the range [0, 1]:
$$S^{unce\text{-}logi}_{(h, r_{seq}, t)} = \frac{1}{1 + e^{-(w f + b)}}$$
$$S^{unce\text{-}rect}_{(h, r_{seq}, t)} = \min\left(\max\left(w f + b,\ 0\right),\ 1\right)$$
where w is a weight, b is a bias, and r_seq is the relation token sequence with time described in the previous section. S^{unce-logi} denotes the confidence score function transformed with the logistic function, and S^{unce-rect} denotes the one transformed with the bounded rectifier.
The MSE loss function over the positive samples D^pos and negative samples D^neg is as follows:

$$\mathcal{L}_{unce} = \sum_{\langle h, r_{seq}, t \rangle \in D^{pos}} \left( S^{unce}_{(h, r_{seq}, t)} - s \right)^2 + \sum_{\langle h', r_{seq}, t' \rangle \in D^{neg}} \left( S^{unce}_{(h', r_{seq}, t')} \right)^2$$
where ⟨h, r_seq, t⟩ ∈ D^pos is an observed fact in the dataset, s is its confidence score, ⟨h′, r_seq, t′⟩ ∈ D^neg is a corresponding negative sample obtained by random negative sampling, and the function S^unce can be either S^{unce-logi} or S^{unce-rect}.
Then, the structural loss of the KG is calculated with a TransE-based energy function:

$$S^{rank}_{(h, r_{seq}, t)} = d(h, r_{seq}, t) = \left\| \mathbf{h} + \mathbf{r}_{seq} - \mathbf{t} \right\|_{l_1 / l_2}$$
where ‖·‖_{l_1/l_2} denotes the l_1 or l_2 norm. The smaller the value of the distance function d(h, r_seq, t), the more likely the triple is to exist.
Following the TransE model, we adopt a margin-based pairwise ranking loss. Since the confidence of each triple varies, we employ the confidence score as the weight of the ranking loss for each sample, obtaining the following loss function:

$$\mathcal{L}_{rank} = \sum_{\langle h, r_{seq}, t \rangle \in D^{pos}} s \cdot \max\left( \gamma + S^{rank}_{(h, r_{seq}, t)} - S^{rank}_{(h', r_{seq}, t')},\ 0 \right)$$
where γ > 0 is a margin hyperparameter. This lets the model focus more on learning triples with higher confidence scores and reduces the contribution of triples with lower confidence scores. To validate the generality of the proposed framework, relatively primitive score functions are employed in both parts above; higher-performing score functions based on semantic matching and translation distance could be integrated into the framework in future work.
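Putting the pieces of this section together, the two score functions and the per-sample loss terms can be sketched as follows (PyTorch; the function and variable names are ours, and a single negative sample is shown for simplicity):

```python
import torch

# Sketch of the two score functions, given embeddings h, t and the GRU
# output r_seq (each of shape (d,)) plus learnable scalars w and b. The
# function and variable names are ours; one negative sample is shown.
def s_unce_logi(h, r_seq, t, w, b):
    f = torch.dot(r_seq, h * t)            # DistMult energy r_seq . (h o t)
    return torch.sigmoid(w * f + b)        # logistic transform into [0, 1]

def s_unce_rect(h, r_seq, t, w, b):
    f = torch.dot(r_seq, h * t)
    return torch.clamp(w * f + b, 0.0, 1.0)  # bounded rectifier

def s_rank(h, r_seq, t):
    return torch.norm(h + r_seq - t, p=2)    # TransE-style distance

def unce_loss(pos, neg, s, w, b):
    # Squared error on the positive's confidence plus the negative's score.
    return (s_unce_logi(*pos, w, b) - s) ** 2 + s_unce_logi(*neg, w, b) ** 2

def rank_loss(pos, neg, s, gamma=2.0):
    # Margin ranking loss, weighted by the confidence score s.
    return s * torch.clamp(gamma + s_rank(*pos) - s_rank(*neg), min=0.0)
```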

3.4. Joint Loss Function

Negative sampling has been shown to be quite effective for learning KG embeddings. The commonly used uniform negative sampling produces poor-quality negative samples that contribute little to training. Utilizing a GAN to generate negative samples can improve the efficiency of negative sampling, but it also increases the complexity of the model. To improve the quality of negative sampling without introducing additional model parameters, we apply the self-adversarial negative sampling technique proposed in the RotatE model [31], computing the scores of negative samples from the current entity and relation embeddings. The higher the score, the higher the weight of the negative sample, so that high-quality negative samples contribute more to the model.
In calculating the MSE loss, we first use uniform negative sampling to randomly generate n negative samples for a visible triple ⟨h, r_seq, t⟩ and then assign varied weights to the negative samples based on the score function under the current entity and relation embeddings:

$$w^{unce}\left( \langle h_i', r_{seq_i}', t_i' \rangle \,\middle|\, \langle h, r_{seq}, t \rangle \right) = \frac{\exp\left( S^{unce}_{(h_i', r_{seq_i}', t_i')} \right)}{\sum_{j=1}^{n} \exp\left( S^{unce}_{(h_j', r_{seq_j}', t_j')} \right)}$$
where i = 1, 2, …, n, and w^unce(⟨h_i′, r_seq_i′, t_i′⟩ | ⟨h, r_seq, t⟩) represents the weight of the ith negative sample when computing the MSE loss of the triple ⟨h, r_seq, t⟩. In this way, we obtain the MSE loss function with the self-adversarial negative sampling technique:
$$\mathcal{L}_{unce} = \sum_{\langle h, r_{seq}, t \rangle \in D^{pos}} \left[ \left( S^{unce}_{(h, r_{seq}, t)} - s \right)^2 + \sum_{i=1}^{n} w^{unce}\left( \langle h_i', r_{seq_i}', t_i' \rangle \,\middle|\, \langle h, r_{seq}, t \rangle \right) \cdot \left( S^{unce}_{(h_i', r_{seq_i}', t_i')} \right)^2 \right]$$
Similarly, when computing the pairwise ranking loss function, we also employ this technique to assign different weights to negative samples and obtain the final ranking loss function.
$$w^{rank}\left( \langle h_i', r_{seq_i}', t_i' \rangle \,\middle|\, \langle h, r_{seq}, t \rangle \right) = \frac{\exp\left( -S^{rank}_{(h_i', r_{seq_i}', t_i')} \right)}{\sum_{j=1}^{n} \exp\left( -S^{rank}_{(h_j', r_{seq_j}', t_j')} \right)} = \frac{\exp\left( -d(h_i', r_{seq_i}', t_i') \right)}{\sum_{j=1}^{n} \exp\left( -d(h_j', r_{seq_j}', t_j') \right)}$$
$$\mathcal{L}_{rank} = \sum_{\langle h, r_{seq}, t \rangle \in D^{pos}} s \cdot \max\left( \gamma + S^{rank}_{(h, r_{seq}, t)} - \sum_{i=1}^{n} w^{rank}\left( \langle h_i', r_{seq_i}', t_i' \rangle \,\middle|\, \langle h, r_{seq}, t \rangle \right) \cdot S^{rank}_{(h_i', r_{seq_i}', t_i')},\ 0 \right)$$
Combining Equations (9) and (11), we obtain the final joint loss function with self-adversarial negative sampling:

$$\mathcal{L}_{joint} = \mathcal{L}_{unce} + \mathcal{L}_{rank}, \quad \text{s.t.}\ \|\mathbf{h}\|_2 \leq 1,\ \|\mathbf{r}\|_2 \leq 1,\ \|\mathbf{t}\|_2 \leq 1$$
We employ two different models for computing the score S^unce_(h, r_seq, t), referring to the variant using Equation (3) as CTRIEJ_logi and the variant using Equation (4) as CTRIEJ_rect.
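The self-adversarial weighting and the joint loss can be sketched as follows (our reading of the formulas above; tensor names are illustrative):

```python
import torch

# Sketch of self-adversarial weights over n negative samples. neg_unce holds
# the confidence scores S_unce of the negatives; neg_dist holds their TransE
# distances S_rank. detach() keeps gradients from flowing through the
# weights, as in RotatE's self-adversarial sampling.
def adversarial_weights_unce(neg_unce: torch.Tensor) -> torch.Tensor:
    return torch.softmax(neg_unce.detach(), dim=0)   # higher score -> harder

def adversarial_weights_rank(neg_dist: torch.Tensor) -> torch.Tensor:
    return torch.softmax(-neg_dist.detach(), dim=0)  # smaller distance -> harder

def joint_loss(pos_unce, s, neg_unce, pos_dist, neg_dist, gamma=2.0):
    w_u = adversarial_weights_unce(neg_unce)
    w_r = adversarial_weights_rank(neg_dist)
    l_unce = (pos_unce - s) ** 2 + torch.sum(w_u * neg_unce ** 2)
    l_rank = s * torch.clamp(gamma + pos_dist - torch.sum(w_r * neg_dist), min=0.0)
    return l_unce + l_rank
```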

4. Experiments

Our proposed model was evaluated on three tasks: confidence prediction, link prediction, and relation fact classification. The goal of confidence prediction is to obtain the confidence scores of existing facts; that is, for a given relation fact with its head and tail entities, relation, and time, the corresponding confidence score should be predicted. The link prediction task aims to forecast missing relation facts; e.g., given the head entity, relation, and corresponding time, the missing tail entity should be predicted. Relation fact classification is a binary classification problem: we classified the relation facts in wikidata_5k into strong and weak relation facts according to a given threshold τ, where facts with confidence scores above the threshold were considered strong relation facts and the rest weak.

4.1. Datasets

Universal uncertain temporal datasets are not currently available, so we used the datasets extracted from Wikidata in [24]. Wikidata contains structured temporal information obtained from various sources using open information extraction (OIE). Ref. [24] obtained over 6.3 million temporal facts with confidence scores from Wikidata for various relations, including plays for (>4 million facts), educated at (>6K), member of (>23K), occupation (>4.5K), spouse (>20K), and so on. Several of the extracted datasets are similar in composition, so we chose only one of them, named wikidata_5k.
Data preprocessing. We first preprocessed this dataset. The initial confidence scores in wikidata_5k range from 1 to 10, and 96.4% of them are less than or equal to 5.0. For normalization, we first bounded the confidence scores to s ∈ [1.0, 5.0] and then applied min-max normalization on s to map the scores into [0.0, 1.0]. After preprocessing, the wikidata_5k dataset contained 2233 entities, 6 relations, and 4818 uncertain temporal relation facts, with a mean confidence score of 0.269 and a variance of 0.225.
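The normalization step can be reproduced with a few lines of code (a sketch; the function name is ours):

```python
import numpy as np

# A sketch of the preprocessing described above: clip raw scores to
# [1.0, 5.0], then min-max normalize into [0.0, 1.0].
def normalize_confidence(raw_scores: np.ndarray) -> np.ndarray:
    s = np.clip(raw_scores, 1.0, 5.0)
    return (s - 1.0) / (5.0 - 1.0)

scores = normalize_confidence(np.array([1.0, 2.5, 7.0]))
# 1.0 -> 0.0; 2.5 -> 0.375; 7.0 is clipped to 5.0 and maps to 1.0
```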

4.2. Experimental Setup

We divided the dataset into 85% for training, 7% for validation, and 8% for testing. To test whether our model could correctly interpret negative links, we added the same number of negative links as existing relation facts into the test set. We used the Adam optimizer for training and grid search to select the optimal parameters from the following sets: the embedding dimension d ∈ {64, 128, 256, 512} of entities, relations, and time; the training batch size b ∈ {128, 256, 512, 1024}; the learning rate lr ∈ {0.001, 0.005, 0.01}; and the margin γ ∈ {1, 2, 10} in the ranking loss. We used the L2-norm when computing the translation distance. Experimentally, on the wikidata_5k dataset, the best parameters for CTRIEJ_logi were d = 512, b = 256, lr = 0.001, γ = 2, and the best parameters for CTRIEJ_rect were d = 128, b = 256, lr = 0.001, γ = 2. All models were evaluated with the best parameters for each experiment.
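The grid search itself is straightforward; the following sketch shows the idea, where train_and_validate is a hypothetical placeholder for one training/validation run of the model:

```python
import random
from itertools import product

# Sketch of the grid search over the hyperparameter sets listed above.
# train_and_validate is a hypothetical placeholder: a real run would train
# CTRIEJ with these settings and return the validation error.
def train_and_validate(d, b, lr, gamma):
    return random.random()  # placeholder metric, lower is better

grid = {
    "d": [64, 128, 256, 512],
    "b": [128, 256, 512, 1024],
    "lr": [0.001, 0.005, 0.01],
    "gamma": [1, 2, 10],
}

best_params, best_err = None, float("inf")
for d, b, lr, gamma in product(*grid.values()):
    err = train_and_validate(d, b, lr, gamma)
    if err < best_err:
        best_params, best_err = {"d": d, "b": b, "lr": lr, "gamma": gamma}, err
```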

4.3. Baselines

We considered three types of baselines in our comparison: the deterministic KG embedding models TransE [6] and DistMult [11], the uncertain KG embedding models UKGE_rect and UKGE_logi [15], and the temporal KG embedding models TA-TransE and TA-DistMult [22].
  • The deterministic KG embedding models: We chose TransE and DistMult as deterministic KG embedding baselines because they have demonstrated high performance. In wikidata_5k, we selected the high-confidence temporal relation facts for training, using a confidence score threshold τ to separate the high-confidence temporal relation facts from the low-confidence ones. These models were only applied to the link prediction and relation fact classification tasks, since they cannot predict confidence scores. We used the same grid search method to choose the best hyperparameters and the same optimizer for training. The best parameters of TransE were d = 128, b = 256, lr = 0.001, γ = 2, and the best parameters of DistMult were d = 128, b = 256, lr = 0.001.
  • The uncertain KG embedding models: UKGE was the first model for embedding uncertain KGs, and it has two variants, UKGE_logi and UKGE_rect. The hyperparameter search method and optimizer were the same as above. In wikidata_5k, the best parameters of UKGE_logi were d = 128, b = 512, lr = 0.001, γ = 2, and the best parameters of UKGE_rect were d = 64, b = 256, lr = 0.001, γ = 2.
  • The temporal KG embedding models: To incorporate temporal information, TA-TransE and TA-DistMult utilize an LSTM to learn time-aware representations of relation types, which can be used in conjunction with existing deterministic KG embedding methods. Likewise, these models are only suitable for the link prediction and relation fact classification tasks. The best parameters of TA-TransE were d = 128, b = 256, lr = 0.001, γ = 2, and the best parameters of TA-DistMult were d = 128, b = 256, lr = 0.001.

4.4. Confidence Prediction

Evaluation metrics: The goal of confidence prediction is to obtain the confidence scores of existing relation facts. We obtained the confidence score for each relation fact through Equation (3) or Equation (4) and used the MSE and mean absolute error (MAE) as evaluation metrics. The smaller the MSE and MAE, the more accurate the prediction and the better the model performance.
Experimental results: The confidence prediction results are shown in Table 1. Deterministic KG representation learning models cannot predict confidence scores, so we only used the uncertain KG embedding model UKGE as the baseline. Overall, on the wikidata_5k dataset, both of our variants outperformed the corresponding UKGE variants, and CTRIEJ_rect performed best on both MSE and MAE. Compared with the best-performing baseline, UKGE_rect, CTRIEJ_rect reduced the MSE by approximately 13.8% and the MAE by approximately 19.6%. These results show that incorporating temporal and structural information into the model helps predict the confidence scores of relation facts more accurately.

4.5. Link Prediction

Evaluation metrics: Link prediction is a typical KG embedding evaluation task: predicting the missing head or tail entity from the known entity and relation, or sometimes predicting the relation from known head and tail entities. In our experiments, we predicted the missing tail entity from the known head entity, relation, corresponding temporal information, and uncertainty information. We ranked each candidate tail entity by plausibility using the score function and then computed the evaluation metrics Hit@K and the mean rank, where Hit@K denotes the proportion of test cases in which the correct tail entity is ranked in the top K, and the mean rank is the average of the ranking positions of the correct tail entities. Since the confidence score of each triple varies, we followed the PASSLEAF model [17] in linearly weighting Hit@K and the mean rank to obtain WH@K and WMR as follows:
$$WH@K = \frac{\sum_{\langle h, r_{seq}, t, s \rangle \in T_K} s}{\sum_{\langle h, r_{seq}, t, s \rangle \in T} s}$$
$$WMR = \frac{\sum_{\langle h, r_{seq}, t, s \rangle \in T} s \cdot rank(h, r_{seq}, t)}{\sum_{\langle h, r_{seq}, t, s \rangle \in T} s}$$
where T represents the test set, T_K represents the test cases whose correct tail entity ranks in the top K, and rank(h, r_seq, t) represents the ranking position of the triple ⟨h, r_seq, t⟩. We used the sum of the translation-distance-based energy function and the semantic-matching-based confidence prediction function as the score function to rank the candidate tail entities. When computing WH@K and WMR on the test set, candidate tail entities may also occur in the training or validation set and cannot be considered wrong; hence, we removed such candidates to obtain the filtered WH@K and WMR. The larger the WH@K and the smaller the WMR, the better the model performance. For WH@K, we conducted experiments with K = 2 and K = 10.
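The weighted metrics are simple to compute once the (filtered) rank of each correct tail entity is known; the following sketch illustrates the formulas above with illustrative data:

```python
# Sketch of the confidence-weighted metrics WMR and WH@K. Each test case is
# a pair (rank, s): the filtered rank of the correct tail entity and the
# fact's confidence score. The data below is illustrative.
def weighted_metrics(cases, k=10):
    total_s = sum(s for _, s in cases)
    wmr = sum(s * rank for rank, s in cases) / total_s
    wh_at_k = sum(s for rank, s in cases if rank <= k) / total_s
    return wmr, wh_at_k

cases = [(1, 0.9), (4, 0.5), (20, 0.2)]
wmr, wh10 = weighted_metrics(cases, k=10)  # wmr = 4.3125, wh10 = 0.875
```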
Experimental results: The WMR, WH@2, and WH@10 results are reported in Table 2. The CTRIEJ models generally outperformed the baselines: CTRIEJ_logi performed best on WMR, and CTRIEJ_rect performed best on WH@2 and WH@10. The deterministic KG embedding models TransE and DistMult did not perform as well as our model because they consider neither temporal information nor confidence scores. UKGE also performed poorly because it only considers confidence scores and exploits neither temporal nor structural information. TA-TransE and TA-DistMult embed only temporal information, so their performance also fell short of our model's. Overall, for link prediction, our model performed best, followed by the deterministic and temporal KG embedding models, and finally the UKGE model, which again shows the importance of temporal and structural information. In this paper, we used the sum of the translation-distance-based energy function and the confidence prediction function as the evaluation function; better ways of fusing the two functions for ranking triples can be explored in the future.

4.6. Relation Fact Classification

Evaluation metrics: We set the confidence score threshold τ = 0.3 to separate strong and weak relations among the uncertain temporal relation facts. Under this setting, 36.03% of the relation facts in wikidata_5k were considered strong relations. By fitting a function between the predicted confidence scores on the training set and their relation categories, we obtained a binary classifier that we applied to the relation facts in the test set. We used the F-1 score and accuracy to evaluate classification quality.
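The paper does not specify the fitted function; as one plausible reading, a one-dimensional logistic regression from predicted confidence scores to strong/weak labels could look like this sketch (synthetic data and scikit-learn, both our assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch with synthetic data: fit a 1-D logistic regression from predicted
# confidence scores to strong/weak labels (gold score > tau = 0.3), then
# classify test facts. A real run would use the model's predicted scores.
rng = np.random.default_rng(0)
train_pred = rng.uniform(0.0, 1.0, size=(200, 1))  # predicted confidences
train_gold = rng.uniform(0.0, 1.0, size=200)       # gold confidences
train_label = (train_gold > 0.3).astype(int)       # 1 = strong, 0 = weak

clf = LogisticRegression().fit(train_pred, train_label)
test_pred = rng.uniform(0.0, 1.0, size=(50, 1))
test_label = clf.predict(test_pred)
```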
Experimental results: The results are shown in Table 3. Overall, our two variants outperformed the baseline models. In terms of F-1 scores, the baseline models did not differ much, while our two variants improved the results considerably: CTRIEJ_rect had the best result, nearly 29.6% higher than the best-performing baseline, TA-TransE. In terms of accuracy, our model slightly outperformed the baselines, with CTRIEJ_logi performing best, exceeding the best-performing baseline, DistMult, by 2.1%. In conclusion, since our model embeds confidence scores, temporal information, and structural information simultaneously, it performs better than the deterministic KG embedding models, the UKGE model, and the temporal KG embedding models.

4.7. Ablation Study

To verify the effect of incorporating temporal and structural information and of adopting the self-adversarial negative sampling method, we took the variant CTRIEJ_logi as an example and built three simplified versions, called CTRIEJ_t, CTRIEJ_s, and CTRIEJ_n. In CTRIEJ_t, we kept only the head entity, tail entity, relation, and corresponding confidence score of each relation fact and removed the time information. In CTRIEJ_s, we retained the MSE loss for confidence prediction and removed the ranking loss characterizing structural information. In CTRIEJ_n, we used uniform negative sampling to obtain negative samples.
We evaluated these three models on the four metrics MSE, MAE, F-1 score, and accuracy; the results are shown in Table 4. The three simplified versions all performed worse than the full model CTRIEJ_logi, verifying the effectiveness of each component of our proposed model.

5. Conclusions and Future Work

In this paper, we proposed an embedding model, the CTRIEJ model, for uncertain temporal KGs. The model leverages a GRU-based sequence model to incorporate temporal information into the embedding of relation sequences and then combines semantic-matching-based and translation-distance-based energy functions to integrate the confidence scores and structural information of KGs into a unified framework. Moreover, a self-adversarial negative sampling technique was adopted to generate negative samples for training. The CTRIEJ model outperformed other benchmarks on three downstream tasks: confidence prediction, link prediction, and relation fact classification. In future work, we will investigate how to integrate better-performing embedding models into our framework and how to better utilize these score functions in downstream evaluations. In addition, predicting the relation facts, and their corresponding confidence scores, that hold at future moments in uncertain temporal KGs is another topic worth investigating.

Author Contributions

Conceptualization, T.L. and W.W.; methodology, T.L.; software, T.L.; validation, T.L., T.W. and X.L.; writing—original draft preparation, X.Z.; writing—review and editing, M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 72101263.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

National University of Defense Technology.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Petroni, F.; Rocktäschel, T.; Lewis, P.; Bakhtin, A.; Wu, Y.; Miller, A.H.; Riedel, S. Language models as knowledge bases? arXiv 2019, arXiv:1909.01066. [Google Scholar]
  2. Wang, Q.; Mao, Z.; Wang, B.; Guo, L. Knowledge graph embedding: A survey of approaches and applications. IEEE Trans. Knowl. Data Eng. 2017, 29, 2724–2743. [Google Scholar] [CrossRef]
  3. Wang, R.; Wang, M.; Liu, J.; Chen, W.; Cochez, M.; Decker, S. Leveraging knowledge graph embeddings for natural language question answering. In Proceedings of the International Conference on Database Systems for Advanced Applications, Chiang Mai, Thailand, 22–25 April 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 659–675. [Google Scholar]
  4. Ji, S.; Pan, S.; Cambria, E.; Marttinen, P.; Philip, S.Y. A survey on knowledge graphs: Representation, acquisition, and applications. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 494–514. [Google Scholar] [CrossRef] [PubMed]
  5. Lin, Y.; Han, X.; Xie, R.; Liu, Z.; Sun, M. Knowledge representation learning: A quantitative review. arXiv 2018, arXiv:1812.10901. [Google Scholar]
  6. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. Adv. Neural Inf. Process. Syst. 2013, 26, 2787–2795. [Google Scholar]
  7. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, Quebec City, QC, Canada, 27–31 July 2014; Volume 28. [Google Scholar]
  8. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015. [Google Scholar]
  9. Xiao, H.; Huang, M.; Hao, Y.; Zhu, X. TransA: An adaptive approach for knowledge graph embedding. arXiv 2015, arXiv:1509.05490. [Google Scholar]
  10. Nickel, M.; Tresp, V.; Kriegel, H.P. A three-way model for collective learning on multi-relational data. In Proceedings of the ICML, Bellevue, WA, USA, 28 June–2 July 2011. [Google Scholar]
  11. Yang, B.; Yih, W.t.; He, X.; Gao, J.; Deng, L. Embedding entities and relations for learning and inference in knowledge bases. arXiv 2014, arXiv:1412.6575. [Google Scholar]
  12. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex embeddings for simple link prediction. In Proceedings of the International Conference on Machine Learning, PMLR, New York City, NY, USA, 19–24 June 2016; pp. 2071–2080. [Google Scholar]
  13. Speer, R.; Havasi, C. ConceptNet 5: A large semantic network for relational knowledge. In The People’s Web Meets NLP; Springer: Berlin/Heidelberg, Germany, 2013; pp. 161–176. [Google Scholar]
  14. Mitchell, T.; Cohen, W.; Hruschka, E.; Talukdar, P.; Yang, B.; Betteridge, J.; Carlson, A.; Dalvi, B.; Gardner, M.; Kisiel, B.; et al. Never-ending learning. Commun. ACM 2018, 61, 103–115. [Google Scholar] [CrossRef]
  15. Chen, X.; Chen, M.; Shi, W.; Sun, Y.; Zaniolo, C. Embedding uncertain knowledge graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 3363–3370. [Google Scholar]
  16. Wang, J.; Nie, K.; Chen, X.; Lei, J. SUKE: Embedding model for prediction in uncertain knowledge graph. IEEE Access 2020, 9, 3871–3879. [Google Scholar] [CrossRef]
  17. Chen, Z.M.; Yeh, M.Y.; Kuo, T.W. PASSLEAF: A Pool-bAsed Semi-Supervised LEArning Framework for Uncertain Knowledge Graph Embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 4019–4026. [Google Scholar]
  18. Leblay, J.; Chekol, M.W. Deriving validity time in knowledge graph. In Proceedings of the Companion Proceedings of the Web Conference 2018, Lyon, France, 23–27 April 2018; pp. 1771–1776. [Google Scholar]
  19. Dasgupta, S.S.; Ray, S.N.; Talukdar, P. Hyte: Hyperplane-based temporally aware knowledge graph embedding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 2001–2011. [Google Scholar]
  20. Xu, C.; Nayyeri, M.; Alkhoury, F.; Yazdi, H.S.; Lehmann, J. Temporal knowledge graph embedding model based on additive time series decomposition. arXiv 2019, arXiv:1911.07893. [Google Scholar]
  21. Trivedi, R.; Farajtabar, M.; Biswal, P.; Zha, H. Dyrep: Learning representations over dynamic graphs. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  22. García-Durán, A.; Dumancic, S.; Niepert, M. Learning Sequence Encoders for Temporal Knowledge Graph Completion. In Proceedings of the EMNLP, Brussels, Belgium, 31 October–4 November 2018. [Google Scholar]
  23. Jin, W.; Jiang, H.; Qu, M.; Chen, T.; Zhang, C.; Szekely, P.; Ren, X. Recurrent Event Network: Global Structure Inference over Temporal Knowledge Graph. 2019. Available online: https://www.semanticscholar.org/paper/Recurrent-Event-Network-%3A-Global-Structure-Over-Jin-Jiang/2474b36db67907dca830e2e4ddea6512e4dd2f5e (accessed on 20 December 2022).
  24. Chekol, M.; Pirrò, G.; Schoenfisch, J.; Stuckenschmidt, H. Marrying uncertainty and time in knowledge graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
  25. Bordes, A.; Weston, J.; Collobert, R.; Bengio, Y. Learning structured embeddings of knowledge bases. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 7–11 August 2011. [Google Scholar]
  26. Bordes, A.; Glorot, X.; Weston, J.; Bengio, Y. A semantic matching energy function for learning with multi-relational data. Mach. Learn. 2014, 94, 233–259. [Google Scholar] [CrossRef] [Green Version]
  27. Jenatton, R.; Roux, N.; Bordes, A.; Obozinski, G.R. A latent factor model for highly multi-relational data. Adv. Neural Inf. Process. Syst. 2012, 25, 3167–3175. [Google Scholar]
  28. Socher, R.; Chen, D.; Manning, C.D.; Ng, A. Reasoning with neural tensor networks for knowledge base completion. Adv. Neural Inf. Process. Syst. 2013, 26, 926–934. [Google Scholar]
  29. Fan, M.; Zhou, Q.; Chang, E.; Zheng, F. Transition-based knowledge graph embedding with relational mapping properties. In Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing, Phuket, Thailand, 12–14 December 2014; pp. 328–337. [Google Scholar]
  30. Feng, J.; Huang, M.; Wang, M.; Zhou, M.; Hao, Y.; Zhu, X. Knowledge graph embedding by flexible translation. In Proceedings of the Fifteenth International Conference on the Principles of Knowledge Representation and Reasoning, Cape Town, South Africa, 25–29 April 2016. [Google Scholar]
  31. Sun, Z.; Deng, Z.H.; Nie, J.Y.; Tang, J. Rotate: Knowledge graph embedding by relational rotation in complex space. arXiv 2019, arXiv:1902.10197. [Google Scholar]
  32. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2d knowledge graph embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
  33. Nguyen, D.Q.; Nguyen, T.D.; Nguyen, D.Q.; Phung, D. A novel embedding model for knowledge base completion based on convolutional neural network. arXiv 2017, arXiv:1712.02121. [Google Scholar]
  34. Balažević, I.; Allen, C.; Hospedales, T.M. Hypernetwork knowledge graph embeddings. In Proceedings of the International Conference on Artificial Neural Networks, Munich, Germany, 17–19 September 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 553–565. [Google Scholar]
  35. Gardner, M.; Talukdar, P.; Krishnamurthy, J.; Mitchell, T. Incorporating vector space similarity in random walk inference over knowledge bases. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 397–406. [Google Scholar]
  36. Neelakantan, A.; Roth, B.; McCallum, A. Compositional vector space models for knowledge base completion. arXiv 2015, arXiv:1504.06662. [Google Scholar]
  37. Guo, L.; Sun, Z.; Hu, W. Learning to exploit long-term relational dependencies in knowledge graphs. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 2505–2514. [Google Scholar]
  38. Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; Berg, R.v.d.; Titov, I.; Welling, M. Modeling relational data with graph convolutional networks. In Proceedings of the European Semantic Web Conference, Anissaras, Greece, 3–7 June 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 593–607. [Google Scholar]
  39. Welling, M.; Kipf, T.N. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  40. Shang, C.; Tang, Y.; Huang, J.; Bi, J.; He, X.; Zhou, B. End-to-end structure-aware convolutional networks for knowledge base completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 3060–3067. [Google Scholar]
  41. Nathani, D.; Chauhan, J.; Sharma, C.; Kaul, M. Learning attention-based embeddings for relation prediction in knowledge graphs. arXiv 2019, arXiv:1906.01195. [Google Scholar]
  42. Cai, L.; Wang, W.Y. KBGAN: Adversarial Learning for Knowledge Graph Embeddings. In Proceedings of the NAACL-HLT, New Orleans, LA, USA, 1–6 June 2018. [Google Scholar]
  43. Bian, R.; Koh, Y.S.; Dobbie, G.; Divoli, A. Network embedding and change modeling in dynamic heterogeneous networks. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 861–864. [Google Scholar]
  44. Zhu, C.; Chen, M.; Fan, C.; Cheng, G.; Zhang, Y. Learning from history: Modeling temporal knowledge graphs with sequential copy-generation networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 4732–4740. [Google Scholar]
  45. Han, Z.; Chen, P.; Ma, Y.; Tresp, V. xerte: Explainable reasoning on temporal knowledge graphs for forecasting future links. arXiv 2020, arXiv:2012.15537. [Google Scholar]
  46. Li, Z.; Jin, X.; Li, W.; Guan, S.; Guo, J.; Shen, H.; Wang, Y.; Cheng, X. Temporal knowledge graph reasoning based on evolutional representation learning. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; pp. 408–417. [Google Scholar]
  47. Kertkeidkachorn, N.; Liu, X.; Ichise, R. GTransE: Generalizing translation-based model on uncertain knowledge graph embedding. In Proceedings of the Annual Conference of the Japanese Society for Artificial Intelligence, Kumamoto-ken, Japan, 9–12 June 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 170–178. [Google Scholar]
Figure 1. The overall framework of the CTRIEJ model.
Figure 2. The temporal tokens.
Table 1. MSE and MAE of relation fact confidence prediction (×10⁻²) on wikidata_5k.

| Model | MSE | MAE |
|---|---|---|
| UKGE_logi | 5.39 | 17.54 |
| UKGE_rect | 4.63 | 15.28 |
| CTRIEJ_logi | 4.38 | 12.35 |
| CTRIEJ_rect | 3.99 | 12.28 |
Table 2. Tail entity prediction on wikidata_5k.

| Model | WMR | WH@2 | WH@10 |
|---|---|---|---|
| TransE | 23.46 | 41.47% | 78.32% |
| DistMult | 25.82 | 47.68% | 90.65% |
| UKGE_logi | 177.69 | 15.42% | 17.90% |
| UKGE_rect | 36.76 | 48.22% | 85.51% |
| TA-TransE | 21.37 | 49.64% | 79.19% |
| TA-DistMult | 18.41 | 41.70% | 85.68% |
| CTRIEJ_logi | 13.51 | 42.74% | 88.87% |
| CTRIEJ_rect | 15.41 | 51.57% | 92.77% |
Table 3. F-1 scores (%) and accuracies (%) of relation fact classification on wikidata_5k.

| Model | F-1 | Accuracy |
|---|---|---|
| TransE | 18.1 | 74.2 |
| DistMult | 20.8 | 77.8 |
| UKGE_logi | 19.6 | 75.6 |
| UKGE_rect | 20.1 | 77.5 |
| TA-TransE | 27.4 | 75.6 |
| TA-DistMult | 25.6 | 75.4 |
| CTRIEJ_logi | 31.7 | 79.4 |
| CTRIEJ_rect | 35.5 | 75.9 |
Table 4. Ablation results on wikidata_5k: MSE (×10⁻²), MAE (×10⁻²), F-1 score (%), and accuracy (%).

| Model | MSE | MAE | F-1 | Accuracy |
|---|---|---|---|---|
| CTRIEJ_logi | 4.38 | 12.35 | 31.7 | 79.4 |
| CTRIEJ_t | 4.59 | 13.49 | 20.8 | 78.4 |
| CTRIEJ_s | 4.42 | 13.19 | 18.4 | 78.2 |
| CTRIEJ_n | 4.51 | 13.43 | 23.5 | 78.0 |

