Article

Householder Transformation-Based Temporal Knowledge Graph Reasoning

1 Information School, Hunan University of Humanities, Science and Technology, Loudi 417000, China
2 College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(9), 2001; https://doi.org/10.3390/electronics12092001
Submission received: 28 March 2023 / Revised: 18 April 2023 / Accepted: 21 April 2023 / Published: 26 April 2023
(This article belongs to the Special Issue Knowledge Engineering and Data Mining Volume II)

Abstract

Reasoning over knowledge graphs, and over temporal knowledge graphs in particular, is of great significance for the further development of artificial intelligence and information retrieval. Rotation-based methods have been shown to be effective at modeling entities and relations in a knowledge graph. However, because they lack the capability to represent temporal information, existing approaches can only model partial relational patterns and cannot handle temporal combination reasoning. In this regard, we propose HTTR: Householder Transformation-based Temporal knowledge graph Reasoning, which focuses on the characteristics of relations that evolve over time. HTTR first fuses the relation and temporal information in the knowledge graph, then uses the Householder transformation to obtain an orthogonal matrix from the fused information, and finally defines the orthogonal matrix as the rotation of the head entity to the tail entity and calculates the similarity between the rotated vector and the vector representation of the tail entity. In addition, we compare three methods for fusing relational and temporal information; any other fusion method may replace the current one as long as its dimensionality satisfies the requirements. We show that HTTR outperforms state-of-the-art methods on temporal knowledge graph reasoning tasks and can learn and infer all four relational patterns over time: symmetric reasoning, antisymmetric reasoning, inversion reasoning, and temporal combination reasoning.

1. Introduction

Knowledge graphs (KGs) organize entities and relations in the form of fact triples denoted as $(s, r, o)$, where $s$ and $o$ represent the head and tail entity, respectively, and $r$ is the relation between them, e.g., (Washington, IsCapitalOf, USA). When the facts are associated with a time point or time interval, e.g., (Barack Obama, President of, USA, 2008–2016), they form temporal knowledge graphs (TKGs). TKGs use quadruples $(s, r, o, t)$ to represent valid facts; their entities and relations evolve over time. KG reasoning is a very important problem and has been studied extensively. It infers new facts from the existing facts in a KG and thereby completes the KG, a task also known as KG completion (KGC). Combination reasoning is the most complex and most promising case. Figure 1 gives two examples of combination reasoning. Figure 1a is an example of static KG combination reasoning: given two facts, (David, Wife, Marry) and (Marry, Father, William), we expect to find the fact (David, Father-in-law, William) through reasoning. Figure 1b is an example of TKG combination reasoning: given two facts, (David, Fall-in-love, Marry, 2013-05) and (Marry, Immigrant, France, 2014-09), for the question (David, Make-a-visit, ?, 2014-10), we expect (David, Make-a-visit, France, 2014-10) to have the highest score.
Most reasoning models are designed for static KGs; reasoning over TKGs is not yet well understood. RotatE [1] is a typical rotation-based model that can infer and model various relational patterns. TeRo [2] extends RotatE and applies it to TKGs, but TeRo cannot model temporal combination relations. TeRo defines the temporal evolution of an entity embedding as a rotation in the complex vector space, but does not consider that relation embeddings also evolve over time. We believe that, in TKGs, an entity exhibits the characteristics of temporal evolution because the relations connected to it evolve over time; when those relations are absent, the entity only presents its inherent static characteristics. Therefore, we focus on fusing the temporal information and relations in KGs. OTE [3] and HousE [4] extend the rotation to high dimensions while maintaining the ability to model various relations. However, OTE and HousE cannot model TKGs; moreover, OTE cannot guarantee that the rotation matrix remains orthogonal after a gradient update, and HousE rotates each dimension separately, losing the interactive information between the embedding dimensions. We believe that, in a high-dimensional space, rotating the embedded representation as a whole maximizes the retention of interactive information between dimensions.
Inspired by these models, we propose a reasoning model based on the Householder transformation for TKGs. First, we fuse the relation and temporal information in the KG, then use the Householder transformation to obtain an orthogonal matrix from the fused information, and finally define the orthogonal matrix as the rotation of the head entity to the tail entity. We calculate the similarity between the rotated vector and the vector representation of the tail entity. Observing the geometric meaning of matrix multiplication, we can decompose it into two parts: scaling and rotation. An orthogonal matrix changes direction mainly through rotation, so it can model the symmetric/antisymmetric, inverse, and combination relations. Naturally, we want to construct an orthogonal matrix that supports gradient updates and contains temporal and relational information. In this way, we can not only ensure that the rotation matrix is always orthogonal, but also fuse the temporal information in the TKG.
Our model overcomes the limitations of existing reasoning models and is able to learn and reason about various temporal relational patterns. Experiments show that our model achieves state-of-the-art performance. In addition, we compare three methods for fusing relational and temporal information, which demonstrates that our model has strong generalization ability. The rest of the paper is organized as follows: Section 2 presents related work on reasoning for KGs and TKGs; Section 3 presents preliminaries on Householder transformations; Section 4 presents the HTTR model; Section 5 discusses the details of the experiments and the main conclusions; and Section 6 concludes our work.
The main contributions of this paper are as follows:
  • We design a Householder transformation-based reasoning model for TKGs. In contrast to the state-of-the-art static KGE model OTE, which also uses an orthogonal transformation, the orthogonal matrix obtained by the Householder transformation preserves its orthogonality after each gradient update. Moreover, our approach rotates with the orthogonal matrix as a whole, which better preserves the effective information. In contrast to existing TKG embedding models, HTTR focuses on the features of relations that evolve over time, and thus HTTR can model temporal combination relation reasoning. The experiments show that our model significantly outperforms the available state-of-the-art methods;
  • We define four temporal relational patterns, the temporal symmetric relation, temporal asymmetric relation, temporal inverse relation, and temporal combination relation, and prove theoretically that HTTR can reason over all four of them;
  • We propose three methods, Concatenation, Adding, and Multiplication, to fuse relational and temporal information, and compare the performance of these fusion methods in detail. At the same time, as long as the dimensions meet the requirements, any other fusion method may replace the current ones, which shows that the model has strong generalization ability.

2. Related Work

2.1. Static KG Embedding and Completion

Inspired by the translation-invariance phenomenon of word vectors, Bordes et al. [5] define the relation as a translation from the head entity to the tail entity and propose the first translation-based model, TransE. Based on TransE, a series of translation-based methods were later proposed, such as TransH [6], TransR [7], and TransD [8]. KG embedding models based on tensor decomposition include DistMult [9], ComplEx [10], etc., which take advantage of tensor factorization to learn the embeddings of entities and relations in KGs. The former restricts the relation matrix to a diagonal matrix, while the latter further introduces complex numbers into the model. Rotation-based KG embeddings have been shown to be an effective way to model entities and relations, e.g., RotatE [1], Rotate3D [11], QuatE [12], OTE [3], and HousE [4]. RotatE defines each relation as a rotation from the source entity to the target entity in the complex vector space, and proves that rotation is simple and effective for KGC. It can model the combination relation, but it is a static model. Rotate3D and QuatE utilize the quaternion number system, an extension of complex numbers, to embed entities as well as relations and to simulate 3D rotations in quaternion space. OTE and HousE extend the rotation to a high-dimensional space through orthogonal transformations, and the orthogonal transformation of relation embeddings maintains the ability to model various relations. However, OTE cannot guarantee that its matrix remains orthogonal after a gradient update, and none of these methods incorporate temporal information into the rotation.

2.2. TKG Reasoning and Link Prediction

Based on previous static methods, many TKG reasoning and link prediction methods have recently been proposed. TTransE [13] is the extended version of TransE on TKGs, which learns an independent embedding representation for each entity or relation. HyTE [14] explicitly combines the time information with the entity and relation spaces by associating each timestamp with a corresponding hyperplane. TA-TransE and TA-DistMult [15] regard the combination of $r$ and $t$ in each quadruple $(s, r, o, t)$ as a sequence, which is then fed into an LSTM network to acquire the representation of the combination. DE-SimplE [16] leverages diachronic embedding to extend static methods to TKGs by integrating temporal information into the embedding. ATISE [17] decomposes a time series into a trend component, a seasonal component, and a random component to simulate the temporal evolution in TKGs. TNTComplEx [18] extends static tensor factorization methods to TKGs and expresses the temporal KGE problem as a fourth-order tensor completion problem. TuckERT [19] utilizes the Tucker decomposition of a fourth-order tensor for temporal KGE and performs well on several temporal data sets. TeLM [20] uses a linear temporal regularizer and multivector embeddings to decompose TKGs as a fourth-order tensor. TeRo [2] regards the temporal evolution of entity embeddings as a rotation in a complex space. TeRo only considers the fusion of the entity embedding representation with time information, but not the relation. We believe that, in TKGs, an entity exhibits the characteristics of temporal evolution because the relations connected to it evolve over time; when those relations are absent, the entity only presents its inherent static characteristics. RotateQVS [21] represents temporal entities as rotations in the quaternion vector space and relations as complex vectors in Hamilton's quaternion space. The authors prove that this method can model key relational patterns in TKGs, such as symmetry, asymmetry, and inversion, and can theoretically capture time-evolved relations.

3. Preliminaries

3.1. Householder Transformation

The Householder transformation performs an elementary reflection of any vector $v \in \mathbb{R}^n$ across a hyperplane $S$, mapping $v$ to a vector $v'$; as shown in Figure 2, there is a Householder transformation matrix $H$ such that $Hv = v'$, where $\omega$ is the normal vector of $S$. The Householder transformation can map any vector to a multiple of a coordinate vector with the same modulus length, thereby setting every position of the vector to zero except the coordinate position. For example, for $v = (a_1, a_2, a_3)^T$, if we select $a_1$ as the transformed principal axis, the coordinate vector is $e_1 = (1, 0, 0)^T$ and $v$ can be transformed into the form $(b_1, 0, 0)^T$ through the Householder transformation; that is, there is a Householder transformation matrix $H$ such that $Hv = (b_1, 0, 0)^T = b_1 (1, 0, 0)^T$.
Furthermore, given a unit vector $\omega \in \mathbb{R}^n$ expressed in component form as $\omega = (\omega_1, \omega_2, \ldots, \omega_n)^T$, the Householder transformation matrix (or elementary reflection matrix) can be defined as:

$$H(\omega) = I - 2\omega\omega^T = \begin{pmatrix} 1-2\omega_1^2 & -2\omega_1\omega_2 & \cdots & -2\omega_1\omega_n \\ -2\omega_2\omega_1 & 1-2\omega_2^2 & \cdots & -2\omega_2\omega_n \\ \vdots & \vdots & \ddots & \vdots \\ -2\omega_n\omega_1 & -2\omega_n\omega_2 & \cdots & 1-2\omega_n^2 \end{pmatrix} \tag{1}$$

where $\omega^T \omega = 1$ and $I$ is an identity matrix.
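As a quick illustration, the following NumPy sketch (our own, with hypothetical variable names) builds $H(\omega)$ exactly as in Formula (1) and checks that it is a symmetric, orthogonal, length-preserving reflection:

```python
import numpy as np

def householder_matrix(w: np.ndarray) -> np.ndarray:
    """Elementary reflection H = I - 2 w w^T for a unit vector w."""
    w = w / np.linalg.norm(w)                      # enforce w^T w = 1
    return np.eye(len(w)) - 2.0 * np.outer(w, w)

w = np.random.randn(3)
H = householder_matrix(w)
v = np.array([3.0, 4.0, 0.0])

assert np.allclose(H, H.T)                         # symmetric
assert np.allclose(H @ H.T, np.eye(3))             # orthogonal
assert np.isclose(np.linalg.norm(H @ v), np.linalg.norm(v))  # norm-preserving
```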

3.2. QR Decomposition Based on Householder Transformation

The Householder transformation matrix $H$ satisfies the following properties:
(1) $H$ is a symmetric orthogonal matrix, that is, $H = H^T = H^{-1}$.
(2) For any $v \in \mathbb{R}^n$, if a vector $v'$ satisfies $v' = Hv$, the two-norm of $v$ equals that of $v'$, i.e., $\|v\|_2 = \|v'\|_2$.
For any matrix $A \in \mathbb{R}^{m \times n}$, we perform a Householder transformation on each column of the matrix in turn and obtain the Householder transformation-based QR decomposition $A = QR$, where $Q$ is an orthogonal matrix and $R \in \mathbb{R}^{n \times n}$ is upper triangular. See Appendix A for a detailed description of the procedure. Each column $r_j$ of $R$ can be regarded as a linear combination of the basis vectors $(e_1, \ldots, e_j)$, namely $r_j = \lambda_1 e_1 + \cdots + \lambda_j e_j$, $1 \le j \le n$. The $Q$ obtained by the QR decomposition is an orthogonal matrix.
The second property of the Householder transformation matrix corresponds to the following theorem:
Theorem 1.
Suppose $v, v' \in \mathbb{R}^n$, $v \ne v'$, and $\|v\|_2 = \|v'\|_2$; then there is a Householder reflection matrix $H$ such that $Hv = v'$.
From Theorem 1, we can easily deduce that, for any $v = (v_1, v_2, \ldots, v_n)^T \ne 0$, we can always find a Householder reflection matrix $H$ such that $Hv = \sigma e_1$, where $\sigma = \mathrm{sign}(v_1)\|v\|_2$ and $e_1 = (1, 0, \ldots, 0)^T$. The basic idea of QR decomposition is to decompose the original target matrix into an orthonormal matrix $Q$ and an upper triangular matrix $R$, which simplifies matrix calculations. There are three common methods for QR decomposition: Gram-Schmidt orthogonalization, the Householder transformation, and Givens rotation (see Appendix B for more details). Chen et al. [22] pointed out that the classical Gram-Schmidt method is extraordinarily sensitive to round-off errors.
John R. Rice's [23] experimental results and Åke Björck's [24] theoretical analysis indicate that if the matrix $A$ to be decomposed orthogonally is ill-conditioned, the columns of $Q$ computed by the classical Gram-Schmidt method soon lose their orthogonality and need to be re-orthogonalized. The Householder transformation requires less computer memory because $R$ is stored by overwriting part of $A$, and Householder transformation algorithms are numerically more accurate and faster than the classical Gram-Schmidt algorithm [22]. Given that Givens rotation requires more computation than the Householder transformation and that a multi-step Givens rotation can be replaced by a single Householder transformation, the Householder transformation has the highest efficiency.
Householder transformations can be applied left-looking (QR-left) as well as right-looking (QR-right). In the QR-right algorithm, once each Householder reflection matrix is constructed, it is applied to all of $A$. The QR-left algorithm applies the Householder reflection matrices only to the current $k$-th column, computing one column at a time, which is more easily applied to sparse matrices [25]. The QR-left algorithm forms the basis of Householder-based sparse QR decomposition.
We use the torch-householder repository (https://github.com/toshas/torch-householder, accessed on 27 March 2023) to implement the Householder transformation algorithm for calculating orthogonal matrices and orthonormal frames. torch-householder is faster, more precise, and uses less memory than any other solution for orthogonal transformations bundled in PyTorch (verified at least for PyTorch 1.9 on an NVIDIA P100 16 GB GPU) [26].
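To make the construction concrete, here is a minimal differentiable PyTorch sketch of the same idea (our own illustrative code, not the optimized torch-householder kernel): an orthogonal matrix is assembled as a product of Householder reflections parameterized by learnable vectors, so it remains orthogonal by construction after every gradient update.

```python
import torch

def orthogonal_from_householder(V: torch.Tensor) -> torch.Tensor:
    """Compose the Householder reflections defined by the rows of V into a
    single n x n orthogonal matrix Q = H_1 H_2 ... H_k. Gradients flow into
    V, and Q stays orthogonal after every parameter update."""
    n = V.shape[1]
    Q = torch.eye(n, dtype=V.dtype, device=V.device)
    for v in V:                                   # one reflection per row
        v = v / v.norm()
        Q = Q - 2.0 * torch.outer(Q @ v, v)       # Q <- Q (I - 2 v v^T)
    return Q

V = torch.randn(20, 20, requires_grad=True)       # learnable reflection vectors
Q = orthogonal_from_householder(V)
assert torch.allclose(Q @ Q.T, torch.eye(20), atol=1e-5)
```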

4. Our Method

$E$ denotes the set of entities and $R$ denotes the set of relations. Usually, we use millions of static triples $(s, r, o)$ to represent KGs, where $r \in R$ denotes the relation and $s, o \in E$ are the head entity and the tail entity, respectively. Adding a separate time dimension to these triples gives the KG temporal characteristics, so a TKG is represented by millions of quadruples $(s, r, o, t) \in G^+$, where $G^+$ is the set of all positive quadruples and $t \in T$ is the time at which a fact holds. We define $e_s, e_r, e_o, e_t \in \mathbb{R}^{n \times k}$ as the initial embeddings of the head entity, tail entity, relation, and time, respectively.

4.1. OTE

HTTR is inspired by OTE [3]. We first briefly review OTE and then describe the HTTR model in detail. OTE defines the projection from s and r to o as below:
$$\tilde{e}_o = f(s, r) = \phi(e_r)\, e_s, \tag{2}$$
where $\phi$ is the Gram-Schmidt process applied to the square matrix $e_r$. Then, the distance scoring function in OTE is defined as:
$$d((s, r), o) = \|\tilde{e}_o - e_o\|. \tag{3}$$
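For reference, a small sketch of the classical Gram-Schmidt process that OTE uses as $\phi$ (our own illustrative implementation):

```python
import torch

def gram_schmidt(M: torch.Tensor) -> torch.Tensor:
    """Orthonormalize the columns of a square matrix via classical
    Gram-Schmidt, i.e., the phi(.) projection step used by OTE."""
    cols = []
    for j in range(M.shape[1]):
        v = M[:, j].clone()
        for u in cols:
            v = v - (u @ M[:, j]) * u   # remove components along earlier bases
        cols.append(v / v.norm())
    return torch.stack(cols, dim=1)

Q = gram_schmidt(torch.randn(20, 20))
assert torch.allclose(Q.T @ Q, torch.eye(20), atol=1e-4)
```

As noted in Section 3.2, this classical procedure is sensitive to round-off, which is one motivation for our Householder-based construction.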

4.2. HTTR Model

Unlike static KGs, TKGs contain temporal information; thus, the representations of entities and relations in TKG embedding (TKGE) change with time. Naturally, we consider fusing the temporal information in TKGs into the embeddings of entities and relations.
We fuse the embedding of the relation $r$ and the time step $t$ in each quadruple using a specific method (the fusion methods are discussed in Section 5.4; as long as the dimensions meet the requirements, any other fusion method may replace the current ones) and obtain a fused embedding representation $A_{rt}$. Then, we perform a QR decomposition of $A_{rt}$ based on the Householder transformation to obtain an orthogonal matrix, as in Formula (4). During the rotation by the orthogonal matrix, the length (norm) of a vector remains unchanged and only its direction can change.
$$A_{rt} = Q_{rt} R, \tag{4}$$
where $Q_{rt} \in \mathbb{R}^{n \times n}$ is an orthogonal matrix that fuses the relation and temporal information and satisfies $Q_{rt} Q_{rt}^T = Q_{rt}^T Q_{rt} = I$.
Given a quadruple $(s, r, o, t)$, similar to OTE [3], we define the orthogonal matrix $Q_{rt}$ as the rotation of the head entity $s$ to the tail entity $o$, as shown in Formula (5):
$$\hat{e}_o = e_s Q_{rt}. \tag{5}$$
  • Scoring Function
We use the function $\mathrm{sim}(\cdot, \cdot)$ to measure the similarity between the rotated vector $\hat{e}_o$ and the vector representation of the candidate tail entity $e_o$. In our model, we use both cosine similarity (see Appendix C for more details) and the inner product to calculate similarity, so the scoring function is:
$$f_{QR}(s, r, o, t) = \mathrm{sim}(\hat{e}_o, e_o^T) = \mathrm{sim}(e_s Q_{rt}, e_o^T). \tag{6}$$
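The following is a minimal sketch of the full scoring pipeline in PyTorch, under the experimental setting of Section 5.2 (concatenation fusion, 200-dimensional $e_r$ and $e_t$, rank 20). Function and variable names are ours, and we use torch.linalg.qr as a stand-in for the Householder-based QR described above:

```python
import torch

def httr_score(e_s, e_r, e_t, E_o, rank=20):
    """Fuse relation and time, orthogonalize via QR, rotate the head
    entity, and score every candidate tail entity by inner product."""
    A_rt = torch.cat([e_r, e_t]).reshape(rank, rank)      # Matrix(e_r || e_t)
    Q_rt, _ = torch.linalg.qr(A_rt)                       # orthogonal rotation Q_rt
    e_o_hat = (e_s.reshape(rank, rank) @ Q_rt).flatten()  # rotated head entity
    return E_o @ e_o_hat                                  # one score per candidate

e_r, e_t = torch.randn(200), torch.randn(200)
e_s = torch.randn(400)
E_o = torch.randn(7128, 400)              # e.g., all ICEWS14 entity embeddings
scores = httr_score(e_s, e_r, e_t, E_o)   # shape: (7128,)
```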
  • Optimization
Similar to the method used in [27], for each quadruple in the training process, we minimize the following instantaneous multi-class log-loss, expressed as a cross-entropy loss function:
$$\mathcal{L}(f_{QR}(i, j, k, l)) = -f_{QR}(i, j, k, l) + \log\Big(\textstyle\sum_{k'} \exp\big(f_{QR}(i, j, k', l)\big)\Big), \tag{7}$$
where $(i, j, k, l)$ is a positive sample and $(i, j, k', l)$ is the $k'$-th negative sample. The negative samples are obtained by randomly replacing the correct tail entity in a positive sample with a wrong tail entity. This loss function is only used to train queries of the form $(s, r, ?, \tau)$. A query of the form $(?, r, o, \tau)$ can be answered by reasoning over the corresponding inverse sample $(o, r^{-1}, ?, \tau)$.
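In PyTorch, this multi-class log-loss over the scores of all candidate tails reduces to a standard cross-entropy; a sketch under the 1-vs-all scoring layout of [27], with hypothetical shapes:

```python
import torch
import torch.nn.functional as F

def multiclass_log_loss(scores: torch.Tensor, gold: torch.Tensor) -> torch.Tensor:
    """Formula (7): -f(positive) + log(sum_k' exp f(candidate k')).
    scores: (batch, num_entities); gold: (batch,) correct tail-entity ids."""
    return F.cross_entropy(scores, gold)

scores = torch.randn(1000, 7128)          # batch of 1000 queries, ICEWS14 entities
gold = torch.randint(0, 7128, (1000,))
loss = multiclass_log_loss(scores, gold)
```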
  • Regularization
Regularization includes the embedding regularization commonly used in knowledge graph completion and temporal regularization. In this work, we follow the setting of TeLM [20] and use N3 regularization, which has been shown to help improve the performance of (T)KGE models based on tensor factorization [18,27,28,29]. In TKGs, temporal regularization is often used to impose smoothness constraints on the temporal representation; that is, the representations of two adjacent timestamps should be close. There are also other approaches. For example, Singer et al. [30] add a rotation projection when aligning the representations of adjacent timestamps. Yu et al. [31] propose an autoregressive temporal regularization. Xu et al. [20] propose a linear temporal regularization method that adds a bias component between adjacent temporal representations. Our framework allows any of these temporal regularization methods to be substituted, but we only use the most common temporal smoothness constraint, given by Formula (8) [20]:
$$\mathcal{L}_t = \sum_{i=1}^{n_t - 1} \big\|e_{t_{i+1}} - e_{t_i}\big\|_p^p, \tag{8}$$
where $n_t$ is the number of time steps and $p = 3$ since we use N3 regularization. We discuss the effect of regularization on model performance in detail in Section 5.5, taking ICEWS14 as an example.
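A sketch of Formula (8) as a PyTorch penalty term (variable names ours):

```python
import torch

def temporal_smoothness(E_t: torch.Tensor, p: int = 3) -> torch.Tensor:
    """Formula (8): sum of |.|^p over differences between embeddings of
    adjacent time steps; p = 3 to match N3 regularization.
    E_t: (n_t, dim) matrix of time-step embeddings."""
    diffs = E_t[1:] - E_t[:-1]
    return diffs.abs().pow(p).sum()

E_t = torch.randn(365, 400, requires_grad=True)  # e.g., ICEWS14 time steps
reg = temporal_smoothness(E_t)                   # added to the training loss
```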

4.3. Reasoning across Multi-Relational Patterns

In this section, we demonstrate that our HTTR can reason across multi-relational patterns. Based on previous work [1,11], four kinds of temporal-relational patterns can be defined as follows:
Definition 1.
Temporal symmetric relation: A relation $r$ is symmetric if $\forall s, o \in E$ and $\forall t \in T$, $r(s, o, t) \Rightarrow r(o, s, t)$.
Definition 2.
Temporal asymmetric relation: A relation $r$ is asymmetric if $\forall s, o \in E$ and $\forall t \in T$, $r(s, o, t) \Rightarrow \neg r(o, s, t)$.
Definition 3.
Temporal inverse relation: A relation $r_1$ is the inverse of $r_2$ if $\forall s, o \in E$ and $\forall t \in T$, $r_1(s, o, t) \Leftrightarrow r_2(o, s, t)$.
Definition 4.
Temporal combination relation: If $\forall s, o, f \in E$ and $t_1, t_2, t_3 \in T$, $r_1(s, o, t_1) \wedge r_2(o, f, t_2) \Rightarrow r_3(s, f, t_3)$, then we define $r_3$ at $t_3$ as the temporal combination of $r_1$ at $t_1$ and $r_2$ at $t_2$.
Theorem 2.
HTTR can model all of the temporal-relation symmetric reasoning, asymmetric reasoning, inverse reasoning, and combination reasoning patterns for TKGs.
Proof of Theorem 2.
Since we have Formula (5), the aim of our HTTR is to make $e_o = e_s Q_{rt}$.
For temporal symmetric relation reasoning, we express Definition 1 as:
$$e_o = e_s Q_{rt} \ \wedge\ e_s = e_o Q_{rt} \ \Rightarrow\ Q_{rt}^2 = I, \tag{9}$$
where I is the identity matrix.
For temporal asymmetric relation reasoning, we express Definition 2 as:
$$e_o = e_s Q_{rt} \ \wedge\ e_s \ne e_o Q_{rt} \ \Rightarrow\ Q_{rt}^2 \ne I. \tag{10}$$
For temporal inverse relation reasoning, we express Definition 3 as:
$$e_o = e_s Q_{r_1 t} \ \wedge\ e_s = e_o Q_{r_2 t} \ \Rightarrow\ Q_{r_1 t} Q_{r_2 t} = I. \tag{11}$$
For temporal combination relation reasoning, we express the basic condition in Definition 4 as:
$$e_o = e_s Q_{r_1 t_1} \ \wedge\ e_f = e_o Q_{r_2 t_2} \ \Rightarrow\ e_f = e_s Q_{r_1 t_1} Q_{r_2 t_2}. \tag{12}$$
Then, if we have $e_f = e_s Q_{r_3 t_3}$, we obtain $Q_{r_3 t_3} = Q_{r_1 t_1} Q_{r_2 t_2}$. We present a case-based analysis of this pattern in Appendix D.
Then, we extend this pattern to multi-hop combination reasoning; e.g., for three-hop combination reasoning we have $Q_{r_4 t_4} = Q_{r_1 t_1} Q_{r_2 t_2} Q_{r_3 t_3}$, which demonstrates that our HTTR can reason in more complex scenarios. □
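A toy numeric check of the combination pattern (random orthogonal matrices rather than trained embeddings, so a sanity check rather than a proof): the product of two orthogonal rotations is itself orthogonal, making it a valid candidate for $Q_{r_3 t_3}$.

```python
import torch

Q1, _ = torch.linalg.qr(torch.randn(20, 20))   # stand-in for Q_{r1 t1}
Q2, _ = torch.linalg.qr(torch.randn(20, 20))   # stand-in for Q_{r2 t2}
Q3 = Q1 @ Q2                                   # candidate Q_{r3 t3}
assert torch.allclose(Q3 @ Q3.T, torch.eye(20), atol=1e-5)  # still orthogonal

e_s = torch.randn(1, 20)
e_o = e_s @ Q1                 # r1 holds between s and o at t1
e_f = e_o @ Q2                 # r2 holds between o and f at t2
assert torch.allclose(e_f, e_s @ Q3, atol=1e-5)  # r3 combines r1 and r2
```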

5. Experiments

5.1. Data Sets

Four TKGs are commonly used in previous works: ICEWS [32], GDELT [33], YAGO [34], and WIKI [35]. The time annotations of the first two data sets are time points, such as (Barack Obama, Accuse, North Korea, 20 December 2014), while those of the latter two are time intervals, such as (Barack Obama, President of, USA, 2008–2016). Some time intervals are missing the start or end time, and some time spans are long while others are short. Prior works such as [2,14,17,18] proposed methods to discretize the time intervals, which we tried separately during model verification; different discretization methods lead to different results. Therefore, we only evaluate HTTR on ICEWS and GDELT, without considering YAGO and WIKI; we will specifically study time-interval data sets in future work, which is not covered here. We use two subsets of ICEWS, ICEWS14 and ICEWS05-15 [15], whose facts correspond to 2014 and 2005–2015, respectively. GDELT (Global Database of Events, Language, and Tone) is the largest, most comprehensive, and highest-resolution open database of human society ever created. We use a subset of GDELT [16], which corresponds to the facts from 1 April 2015 to 31 March 2016.
We adopt the same method as DE-SimplE [16] to preprocess the training, validation, and test sets for the TKG completion problem, i.e., the interpolation problem rather than the extrapolation problem. In other words, the time of the fact to be inferred lies within the known time span of the data set, not outside it. Table 1 shows the statistics of the three data sets. (GDELT is available at https://github.com/BorealisAI/de-simple/tree/master/datasets/gdelt, accessed on 27 March 2023, and ICEWS can be downloaded from https://github.com/soledad921/ATISE, accessed on 27 March 2023.)

5.2. Experimental Setup

  • Hyperparameter
We ran our model in PyTorch on a single GPU. To find proper hyperparameters for HTTR, we performed an empirical grid search over the following ranges: learning rate in {0.001, 0.05, 0.01, 0.1}, dropout rate in {0.0, 0.1, 0.2, 0.3, 0.4, 0.5}, and embedding dimension in {36, 100, 196, 324, 400}. We used $Matrix(\cdot)$ to reshape the fused time-relation vector into a matrix; the rank, which we mention in the following sections, refers to the order of this matrix, e.g., $Matrix(\cdot)$ turns a 400-dimensional vector into a $20 \times 20$ matrix, in which case we say the rank is 20. Our model was optimized with Adagrad, and the batch size was set to 1000 for all three data sets.
  • Evaluation Protocol
Our reasoning task is similar to link prediction; its goal is to complete a time-wise fact with a missing entity under the time-wise filtered settings [2,16]. We use four standard evaluation metrics: Hits@1, Hits@3, and Hits@10 (the proportion of correct triples ranked in the top 1, 3, and 10) and Mean Reciprocal Rank (MRR). For all metrics, higher is better. We report the average results of five runs for all experiments and omit the variance, which was usually very low.
  • Baselines
We compared our model with several TKG reasoning methods, including TTransE [13], HyTE [14], TA-DistMult [15], DE-SimplE [16], ATiSE [17], TNTComplEx [18], TuckERT [19], TeRo [2], TeLM [20], and RotateQVS [21]. For ICEWS14 and ICEWS05-15, we report the results of ATISE, TeRo, TuckERT, TNTComplEx, and RotateQVS from the original papers, which used the same evaluation protocol as ours; the other baseline results are taken from [17]. For GDELT, the baseline results come from TuckERT. ChronoR [36] was also inspired by the success of rotation-based models in static KG completion [1,12]; it considers general linear transformations on the k-dimensional real space composed of rotation and scaling, and parametrizes the transformation through relation and time. However, no source code of ChronoR is openly available, so we could not obtain the experimental details needed for a detailed comparison, and ChronoR was therefore not chosen as a baseline. TeMP [37] is another recent work on TKGs. The model performs well on ICEWS, a data set with sparse temporal information, but has no advantage on GDELT, a data set with rich temporal information. We attribute this to RGCN's ability to better learn static KG representations; moreover, the TeMP paper does not provide the corresponding ablation experiment, i.e., replacing the embeddings input to the temporal processing module with randomly initialized embeddings, which would match the setting of our selected baselines.

5.3. Main Results

In this section, we analyze and quantitatively compare our model with previous state-of-the-art models. Table 2 shows the results of KG reasoning on the three data sets.
For our HTTR, we find that it consistently outperforms all baselines on all metrics over all data sets, except Hits@10 on the ICEWS14 data set. In particular, as shown in Table 2, compared with the recent TuckERT, the performance improvement of HTTR on GDELT is very significant, with increases of 14.6 points, 11.5 points, 5.4 points, and 12.0 points on Hits@1, Hits@3, Hits@10, and MRR, respectively. The improvement on ICEWS05-15 is also more significant than on ICEWS14. Our model fuses the relation with temporal information so that the characteristic representation of the relations can be better learned and the performance of the model can be effectively improved. We believe there are two reasons for the above results. First, the ratios of the number of time steps $|T|$ to the number of relations $|R|$ for GDELT and ICEWS05-15 are 18.3 and 16.0, respectively, while the ratio for ICEWS14 is 1.6; the higher the ratio, the greater the impact of temporal information on the relation. Second, the time steps in GDELT are very dense. GDELT contains a large number of global news facts in two years, which shows that the facts in GDELT are updated very quickly, so GDELT has a stronger temporal property than ICEWS14 and ICEWS05-15. Comparing the two ICEWS data sets, their numbers of relations are very similar, but ICEWS05-15 has more entities than ICEWS14. The most noteworthy difference is that ICEWS05-15 has 4017 time steps, while ICEWS14 has only 365. This further demonstrates that our model handles TKG reasoning with rich temporal information well.
TeLM and RotateQVS are two recent works on TKGs. RotateQVS's quaternion representation ($a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}$) doubles the embedding parameters of TeRo, which uses the complex representation ($a + b\mathbf{i}$), and TeLM uses two-grade multivectors, which can be written as $M = a + b e_1 + c e_2 + d e_1 e_2$; more details are in [20]. For a fairer comparison, the following models are added: (i) TeLM-Small, TeLM with the same embedding dimension as our HTTR; (ii) RotateQVS-Small, RotateQVS with the same embedding dimension as our HTTR. We obtained the code of TeLM through open channels (https://github.com/soledad921/TeLM, accessed on 27 March 2023), changed the rank from 2000 to 200, and obtained the TeLM-Small results. The results of RotateQVS-Small are from the original paper. The TeRo, TeLM-Small, and RotateQVS-Small rows in Table 2 correspond to initial embedding dimension settings similar to ours; comparing HTTR with these results, our model has obvious advantages in all metrics on all three data sets. Comparing TeRo with TeRo-Large, when the dimension was simply increased from 200 to 2000, the model performance did not improve but decreased; this shows that adding dimensions does not necessarily lead to a performance improvement. For TeLM-Small and RotateQVS-Small, the performance decreased significantly after reducing the initial embedding dimension. This matches the conclusion of our own experiments, detailed in Section 5.6: increasing the initial embedding dimension within a certain range can effectively enhance model performance.

5.4. Comparative Analysis of the Fusion Methods of Time and Relation

This section discusses the three methods for fusing $e_r$ and $e_t$ to obtain $A_{rt}$, as mentioned in Section 4.2. We first represent time, entities, and relations in the same embedding dimension and then define $\Phi(t)$ to encode the time steps, considering the importance of temporal information to the model. $\Phi(\cdot)$ captures the temporal dependencies between entities based on Bochner's theorem and Mercer's theorem [38]; for more details, refer to [38,39,40]. The time and relation embeddings can be fused through Concatenation (Formula (13)), Adding (Formula (14)), and Multiplication (Formula (15)); a combined code sketch follows the three definitions below. Here, we only explore these three fusion modes, but other options can be explored further, such as the fusion methods used in TA-DistMult [15].
  • Concatenation
In the concatenation fusion mode, for each relation $r \in R$ and each time step $t \in T$, we integrate them as $r \| t$. For example, the temporal fact (Barack Obama, Make a visit, New Zealand, 2014-06-20) is encoded as (Barack Obama, Make a visit $\|$ 2014-06-20, New Zealand). When the initial embedding dimensions of $e_r$ and $e_t$ are both set to 200, concatenation yields a 400-dimensional vector, which $Matrix(\cdot)$ then turns into a $20 \times 20$ matrix. Suppose $e_r = [v_{r1}, v_{r2}, \ldots, v_{rn}]^T$ and $e_t = [v_{t1}, v_{t2}, \ldots, v_{tn}]^T$; by concatenation, we obtain $e_r \| e_t = [v_{r1}, v_{r2}, \ldots, v_{rn}, v_{t1}, v_{t2}, \ldots, v_{tn}]^T$.
$$A_{rt} = Matrix(e_r \| e_t) \tag{13}$$
  • Adding
Adding is point-wise addition. For this fusion method, in order to obtain a $20 \times 20$ matrix, the initial embedding dimension of $e_r$ and $e_t$ is set to 400. For each relation $r \in R$ and each time step $t \in T$, suppose $e_r = [v_{r1}, v_{r2}, \ldots, v_{rn}]^T$ and $e_t = [v_{t1}, v_{t2}, \ldots, v_{tn}]^T$; by adding fusion, we obtain $e_r + e_t = [(v_{r1} + v_{t1}), (v_{r2} + v_{t2}), \ldots, (v_{rn} + v_{tn})]^T$.
$$A_{rt} = Matrix(e_r + e_t) \tag{14}$$
  • Multiplication
For the multiplication fusion method, the time step is used as a coefficient acting on the relation embedding, and the initial embedding dimension of $e_r$ and $e_t$ is also set to 400 to obtain a $20 \times 20$ matrix. For each relation $r \in R$ and each time step $t \in T$, suppose $e_r = [v_{r1}, v_{r2}, \ldots, v_{rn}]^T$ and $e_t = [v_{t1}, v_{t2}, \ldots, v_{tn}]^T$; by element-wise multiplication fusion, we obtain $e_r \circ e_t = [(v_{r1} v_{t1}), (v_{r2} v_{t2}), \ldots, (v_{rn} v_{tn})]^T$.
$$A_{rt} = Matrix(e_r \circ e_t) \tag{15}$$
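The following sketch implements the three fusion operators together with the $Matrix(\cdot)$ reshape (dimension choices follow Section 5.2; names are ours):

```python
import torch

def fuse(e_r: torch.Tensor, e_t: torch.Tensor, mode: str, rank: int = 20) -> torch.Tensor:
    """Fuse relation and time embeddings and reshape into A_rt.
    Concatenation assumes 200-dim inputs; Adding and Multiplication assume
    400-dim inputs, so every mode yields a rank x rank (20 x 20) matrix."""
    if mode == "concatenation":
        fused = torch.cat([e_r, e_t])    # Formula (13)
    elif mode == "adding":
        fused = e_r + e_t                # Formula (14)
    elif mode == "multiplication":
        fused = e_r * e_t                # Formula (15)
    else:
        raise ValueError(f"unknown fusion mode: {mode}")
    return fused.reshape(rank, rank)     # Matrix(.)

A_rt = fuse(torch.randn(200), torch.randn(200), "concatenation")  # 20 x 20
```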

Comparative Experiment Results

Table 3 shows the results of the three methods for fusing relational and temporal information on the three data sets. On all three data sets, Concatenation is significantly better than Adding and Multiplication in HTTR. In fact, Concatenation merely connects features; it does not itself fuse them. It lets the network learn how to fuse the features without losing information in the process: the real feature fusion happens in the Householder transformation that immediately follows. Adding and Multiplication produce new features that reflect some characteristics of the original features, but some information about the original features is lost in the process. Point-wise addition can be seen as a special form of concatenation, but its computation is much cheaper, saving parameters and compute. When computing resources are limited, Adding is the second-best choice.

5.5. Ablation Study

  • w. HT vs. w/o HT
To verify the effect of the Householder transformation, we designed a variant of HTTR that removes the Householder transformation module from the model and conducted further experiments on ICEWS14, evaluating on the validation set every 10 epochs. The model tends to converge at 150 epochs. To present the comparison more intuitively and clearly, Figure 3 only shows the results for epochs 60–150. The results show that the Householder transformation module significantly improves the performance of the model and that the improvement is essentially consistent across all four metrics, indicating that HTTR models temporal information well and is stable.
  • Regularization
N3 regularization has been shown to help improve the performance of tensor factorization-based (temporal) knowledge graph embedding models, and is often referred to as embedding regularization [18,27,28]. In general, it is assumed that the embedding representations of adjacent time steps should be as close as possible; therefore, in work on TKGs it is common practice to use time as a regularization term imposing smoothness constraints on the temporal embedding, commonly referred to as temporal regularization [18,20,29]. Lacroix et al. [18] studied the impact of regularization on ICEWS05-15, finding that adding embedding regularization and temporal regularization to TNTComplEx brings an improvement of 5% in MRR. Similar to TNTComplEx and TeLM, we carefully added N3 regularization and temporal regularization to HTTR and verified the effect of regularization on HTTR on ICEWS14 and ICEWS05-15. We also obtained the source code of TNTComplEx and TeLM from GitHub, set the regularization parameter to 0, and obtained the results without regularization (w/o reg.). For TNTComplEx on ICEWS05-15, the reproduced MRR results were consistent with those reported in the original paper. From Table 4, there are two main conclusions. First, adding regularization improves model performance. Second, our method significantly outperforms the other two methods when regularization is removed. The effect of regularization on HTTR is almost negligible, while removing regularization greatly reduces the performance of the other two methods: TNTComplEx drops by 4–9% and TeLM by 7–11%, indicating that those models are prone to overfitting. This further shows that our model can model temporal information well.

5.6. Effect of Rotation Dimension

To verify the effect of the rotation dimension on model performance, we conducted a detailed analysis on ICEWS14 (Figure 4). We set the training epochs to 300, evaluated every 60 epochs, and kept the other parameters the same as in the main experimental settings. We then set the rank to 6, 10, 16, 18, and 20, respectively, for verification. On ICEWS14, HTTR tended to converge after 100 epochs of training, so the lines in the chart are relatively stable. When the rank was set to 16 and 18, the two lines were very close, so we omitted the rank-18 case from the figure. On the whole, model performance improved as the rank increased, and the improvement was similar across the four metrics. When the rank increased from 6 to 10, performance improved; when it continued to increase to 16, performance could still be greatly improved. However, when the rank further increased from 16 to 18 and 20, the training time grew greatly while the improvement was not obvious. This indicates that, within a certain range, model performance can be improved by increasing the rotation dimension.

6. Conclusions

In this work, we propose HTTR, a novel TKG reasoning model. We introduce three methods to fuse relational and temporal information, then use QR decomposition based on the Householder transformation to obtain an orthogonal matrix containing the relation and temporal information. We define the orthogonal matrix as the rotation of the head entity to the tail entity, and calculate the similarity between the rotated vector and the representation of the tail entity to discover new facts. The experimental results show that our model outperforms all baselines on all three commonly used benchmark data sets. In addition, HTTR can effectively model temporal combination reasoning over TKGs. Our model improves reasoning performance mainly by obtaining better knowledge representations, so our method can be extended to more complex temporal reasoning tasks, which we leave for future work.

Author Contributions

Conceptualization, X.Z. and A.L.; methodology, X.Z. and R.J.; software, K.C.; validation, X.Z. and K.C.; formal analysis, Z.P.; investigation, R.J.; resources, K.C.; data curation, K.C.; writing—original draft preparation, X.Z. and R.J.; writing—review and editing, X.Z. and R.J.; visualization, Z.P.; supervision, A.L. and X.Z.; project administration, A.L.; funding acquisition, A.L. and Z.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (No. 2022YFB3104103), the National Natural Science Foundation of China (No. 62072131), and the Hunan Provincial Natural Science Foundation of China (2021JJ30379).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Householder QR Decomposition

The QR decomposition based on the Householder transformation is equivalent to performing a Householder transformation on each column of the matrix. Let $A \in \mathbb{R}^{n \times n}$; we now show how to compute the QR decomposition $A = QR$ as a sequence of Householder transformations applied to $A$, which eventually zeroes out all elements of the matrix below the diagonal. In the first iteration, we write $A$ in column-wise chunks as $A = (\alpha_1, \alpha_2, \ldots, \alpha_n)$. If $\alpha_1 \ne 0$, the Householder transform computed from the first column of $A$ satisfies $H_1 \alpha_1 = a_1 e_1$, where $|a_1| = \|\alpha_1\|$ and $e_1 \in \mathbb{R}^n$. Applying this Householder transform to $A$ yields:
$$H_1 A = (H_1\alpha_1, H_1\alpha_2, \ldots, H_1\alpha_n) = \begin{pmatrix} a_1 & * \\ 0 & A_{n-1} \end{pmatrix}.$$
If $\alpha_1 = 0$, we proceed directly to the next step, which is equivalent to taking $H_1 = I$, $a_1 = 0$, $A_2 = H_1 A = A$. Next, we write the matrix $A_{n-1} \in \mathbb{R}^{(n-1) \times (n-1)}$ in column-wise chunks as $A_{n-1} = (\beta_1, \beta_2, \ldots, \beta_{n-1})$; if $\beta_1 \ne 0$, there is a Householder matrix $\tilde{H}_2$ of order $n-1$ such that $\tilde{H}_2 \beta_1 = a_2 e_1$, where $|a_2| = \|\beta_1\|$ and $e_1 \in \mathbb{R}^{n-1}$, so:
$$\tilde{H}_2 A_{n-1} = (\tilde{H}_2\beta_1, \tilde{H}_2\beta_2, \ldots, \tilde{H}_2\beta_{n-1}) = \begin{pmatrix} a_2 & * \\ 0 & A_{n-2} \end{pmatrix}.$$
Let
$$H_2 = \begin{pmatrix} 1 & 0 \\ 0 & \tilde{H}_2 \end{pmatrix},$$
where $\tilde{H}_2$ is a Householder matrix of order $n-1$; then
$$H_2 H_1 A = \begin{pmatrix} a_1 & * & * \\ 0 & a_2 & * \\ 0 & 0 & A_{n-2} \end{pmatrix}.$$
If $\beta_1 = 0$, we proceed directly to the next step.
For the order-$(n-2)$ matrix $A_{n-2}$, we continue performing similar transformations, obtaining Householder matrices $H_1, H_2, \ldots, H_{n-1}$ such that:
$$H_{n-1} \cdots H_2 H_1 A = \begin{pmatrix} a_1 & * & \cdots & * \\ 0 & a_2 & \cdots & * \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{pmatrix} = R.$$
We know that each $H$ is a symmetric orthogonal matrix with $H = H^T = H^{-1}$. If $A \in \mathbb{C}^{n \times n}$, then $H$ is an $n$-order unitary matrix and, correspondingly, $H = H^H = H^{-1}$. Rearranging, we find that:
$$A = H_1 H_2 \cdots H_{n-1} R,$$
which shows that if $Q = H_1 H_2 \cdots H_{n-1}$, then $A = QR$.
Among them, each column $r_j$ of $R$ can be regarded as a linear combination of the basis vectors $(e_1, \ldots, e_j)$, namely:
$$r_j = \lambda_1 e_1 + \cdots + \lambda_j e_j, \quad 1 \le j \le n.$$
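For completeness, a compact NumPy sketch of the procedure above (our own implementation, processing one column per step and zeroing its sub-diagonal entries):

```python
import numpy as np

def householder_qr(A: np.ndarray):
    """QR decomposition by successive Householder reflections, as in
    Appendix A: step k zeroes column k below the diagonal."""
    n = A.shape[0]
    R = A.astype(float).copy()
    Q = np.eye(n)
    for k in range(n - 1):
        x = R[k:, k]
        if np.allclose(x, 0.0):
            continue                    # alpha_k = 0: take H_k = I
        v = x.copy()
        s = np.sign(x[0]) if x[0] != 0 else 1.0
        v[0] += s * np.linalg.norm(x)   # reflect x onto the coordinate axis
        v /= np.linalg.norm(v)
        # Apply H_k = I - 2 v v^T to the trailing block of R,
        # and accumulate Q = H_1 H_2 ... H_{n-1}.
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, R

A = np.random.randn(5, 5)
Q, R = householder_qr(A)
assert np.allclose(Q @ R, A) and np.allclose(Q.T @ Q, np.eye(5))
assert np.allclose(R, np.triu(R))
```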

Appendix B. Givens Transformation and Householder Transformation

The Givens transformation, also known as Givens rotation, converts a matrix into an upper triangle after multiple Givens steps; it is a commonly used QR decomposition method. Given a vector $x = [x_1, \ldots, x_i, \ldots, x_j, \ldots, x_n]^T \in \mathbb{R}^n$ with $x_i$ and $x_j$ not both 0, there is a Givens matrix (elementary rotation matrix) $G(i, j, \theta)$ that rotates $x$ in the coordinate plane determined by the $(i, j)$-th dimensions, turning one of its components to 0; that is, $G(i, j, \theta)\, x = [x_1, \ldots, x_i', \ldots, 0, \ldots, x_n]^T$, where $x_i' = \sqrt{x_i^2 + x_j^2}$ and $\theta = \arctan(x_j / x_i)$. It is therefore also called a plane rotation transformation. Each Givens step turns one component of the vector into 0, whereas one Householder transformation can turn several components into 0 at once, so a multi-step Givens transformation can be replaced by a single Householder transformation.
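A small NumPy sketch of a single Givens step (helper names are ours):

```python
import numpy as np

def givens_zero(x: np.ndarray, i: int, j: int) -> np.ndarray:
    """Rotate x in the (i, j) coordinate plane so that component j
    becomes 0 and component i becomes sqrt(x_i^2 + x_j^2)."""
    r = np.hypot(x[i], x[j])
    c, s = x[i] / r, x[j] / r
    G = np.eye(len(x))
    G[i, i], G[i, j] = c, s
    G[j, i], G[j, j] = -s, c
    return G @ x

print(givens_zero(np.array([3.0, 0.0, 4.0]), 0, 2))  # -> [5. 0. 0.]
```

Zeroing all sub-diagonal entries of an $n \times n$ matrix this way takes one rotation per entry, whereas a single Householder reflection zeroes a whole column at once.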

Appendix C. Similarity

Cosine similarity, also called cosine distance, measures the similarity between two vectors by the cosine of the angle between them. In Euclidean space, the inner (or dot) product, also known as the scalar product, can be intuitively defined as:
$$a \cdot b = |a|\,|b| \cos\theta,$$
where $|x|$ represents the modulus of the vector $x$ and $\theta$ represents the angle between the two vectors. If the vectors are L2-normalized before calculating the similarity, their dot product is exactly the cosine of the included angle.
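A short PyTorch check of this equivalence:

```python
import torch
import torch.nn.functional as F

a, b = torch.randn(400), torch.randn(400)
cos = F.cosine_similarity(a, b, dim=0)               # cos(theta)
dot = F.normalize(a, dim=0) @ F.normalize(b, dim=0)  # dot product after L2 norm
assert torch.isclose(cos, dot, atol=1e-6)
```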

Appendix D. Case Based Analysis

We conduct a case-based study to further demonstrate the capability of our model to reason about temporal combination relation patterns, by visualizing and quantitatively analyzing an intuitive example from ICEWS14. Figure A1 illustrates a case of temporal combination reasoning in which we focus on three facts. Fact 1: (North Atlantic Treaty Organization, Host a visit, Defense/Security Ministry (Ukraine), 2014-03-18). Fact 2: (Defense/Security Ministry (Ukraine), Accuse, Military (Russia), 2014-03-21). Fact 3: (North Atlantic Treaty Organization, Use conventional military force, Military (Russia), 2014-04-04). Intuitively, this case of combination reasoning can be formulated as follows: given Fact 1 and Fact 2, we try to infer Fact 3. Based on Theorem 2, if our model can model combination reasoning, we have $Q_{r_3 t_3} = Q_{r_1 t_1} Q_{r_2 t_2}$. Figure A2 visualizes the absolute difference matrix between $Q_{r_1 t_1} Q_{r_2 t_2}$ and $Q_{r_3 t_3}$ in this case. The absolute difference between the two matrices is very small, which demonstrates that our model is effective at modeling combination reasoning.
Figure A1. A case of temporal combination reasoning.
Figure A2. Visualization of the absolute difference matrix between $Q_{r_1 t_1} Q_{r_2 t_2}$ and $Q_{r_3 t_3}$. In our case, the darker the color, the smaller the difference.

References

  1. Sun, Z.; Deng, Z.; Nie, J.; Tang, J. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In Proceedings of the ICLR 2019, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  2. Xu, C.; Nayyeri, M.; Alkhoury, F.; Yazdi, H.S.; Lehmann, J. TeRo: A Time-aware Knowledge Graph Embedding via Temporal Rotation. In Proceedings of the COLING 2020, Barcelona, Spain, 8–13 December 2020; pp. 1583–1593. [Google Scholar]
  3. Tang, Y.; Huang, J.; Wang, G.; He, X.; Zhou, B. Orthogonal Relation Transforms with Graph Context Modeling for Knowledge Graph Embedding. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 2713–2722. [Google Scholar]
  4. Li, R.; Zhao, J.; Li, C.; He, D.; Wang, Y.; Liu, Y.; Sun, H.; Wang, S.; Deng, W.; Shen, Y.; et al. HousE: Knowledge Graph Embedding with Householder Parameterization. In Proceedings of the ICML, Baltimore, MD, USA, 17–23 July 2022; Volume 162, pp. 13209–13224. [Google Scholar]
  5. Bordes, A.; Usunier, N.; Garciaduran, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. Adv. Neural Inf. Process. Syst. 2013, 26, 2787–2795. [Google Scholar]
  6. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of the AAAI, Quebec City, QC, Canada, 27–31 July 2014; pp. 1112–1119. [Google Scholar]
  7. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In Proceedings of the AAAI 2015, Austin, TX, USA, 25–30 January 2015; pp. 2181–2187. [Google Scholar]
  8. Ji, G.; He, S.; Xu, L.; Liu, K.; Zhao, J. Knowledge Graph Embedding via Dynamic Mapping Matrix. In Proceedings of the ACL 2015, Beijing, China, 26–31 July 2015; pp. 687–696. [Google Scholar]
  9. Yang, B.; Yih, W.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. arXiv 2015, arXiv:1412.6575. [Google Scholar]
  10. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, E.; Bouchard, G. Complex embeddings for simple link prediction. In Proceedings of the International Conference on Machine Learning (ICML 2016), New York, NY, USA, 19–24 June 2016; pp. 2071–2080. [Google Scholar]
  11. Gao, C.; Sun, C.; Shan, L.; Lin, L.; Wang, M. Rotate3D: Representing Relations as Rotations in Three-Dimensional Space for Knowledge Graph Embedding. In Proceedings of the CIKM, Online, 19–23 October 2020; pp. 385–394. [Google Scholar]
  12. Zhang, S.; Tay, Y.; Yao, L.; Liu, Q. Quaternion Knowledge Graph Embeddings. In Proceedings of the NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019; pp. 2731–2741. [Google Scholar]
  13. Leblay, J.; Chekol, M.W. Deriving Validity Time in Knowledge Graph. In Proceedings of the WWW ’18: The Web Conference 2018, Lyon, France, 23–27 April 2018; pp. 1771–1776. [Google Scholar]
  14. Dasgupta, S.S.; Ray, S.N.; Talukdar, P.P. HyTE: Hyperplane-Based Temporally Aware Knowledge Graph Embedding. In Proceedings of the EMNLP 2018, Brussels, Belgium, 31 October–4 November 2018; pp. 2001–2011. [Google Scholar]
  15. García-Durán, A.; Dumancic, S.; Niepert, M. Learning Sequence Encoders for Temporal Knowledge Graph Completion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4816–4821. [Google Scholar]
  16. Goel, R.; Kazemi, S.M.; Brubaker, M.; Poupart, P. Diachronic Embedding for Temporal Knowledge Graph Completion. In Proceedings of the AAAI 2020, New York, NY, USA, 7–12 February 2020; pp. 3988–3995. [Google Scholar]
  17. Xu, C.; Nayyeri, M.; Alkhoury, F.; Yazdi, H.S.; Lehmann, J. Temporal Knowledge Graph Completion Based on Time Series Gaussian Embedding. In Proceedings of the ISWC 2020, Online, 14–17 September 2020; pp. 654–671. [Google Scholar]
  18. Lacroix, T.; Obozinski, G.; Usunier, N. Tensor Decompositions for Temporal Knowledge Base Completion. In Proceedings of the ICLR, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
  19. Shao, P.; Yang, G.; Zhang, D.; Tao, J.; Che, F.; Liu, T. Tucker decomposition-based Temporal Knowledge Graph Completion. Knowl.-Based Syst. 2022, 238, 107841. [Google Scholar] [CrossRef]
  20. Xu, C.; Chen, Y.; Nayyeri, M.; Lehmann, J. Temporal Knowledge Graph Completion using a Linear Temporal Regularizer and Multivector Embeddings. In Proceedings of the NAACL-HLT 2021, Online, 6–11 June 2021; pp. 2569–2578. [Google Scholar]
  21. Chen, K.; Wang, Y.; Li, Y.; Li, A. RotateQVS: Representing Temporal Information as Rotations in Quaternion Vector Space for Temporal Knowledge Graph Completion. arXiv 2022, arXiv:2203.07993. [Google Scholar]
  22. Chen, S.; Billings, S.A.; Luo, W. Orthogonal least squares methods and their application to non-linear system identification. Int. J. Control. 1989, 50, 1873–1896. [Google Scholar] [CrossRef]
  23. Rice, J. Experiments on Gram-Schmidt orthogonalization. Math. Comput. 1966, 20, 325–328. [Google Scholar] [CrossRef]
  24. Björck, Å. Solving least squares problems by Gram–Schmidt orthogonalization. BIT Numer. Math. 1967, 7, 1–21. [Google Scholar] [CrossRef]
  25. Davis, T.A. Direct Methods for Sparse Linear Systems; Chapter 5: Orthogonal Methods; 2006; pp. 69–82. Available online: https://epubs.siam.org/doi/abs/10.1137/1.9780898718881.ch5 (accessed on 27 March 2023).
  26. Obukhov, A. Efficient Householder Transformation in PyTorch. 2021. Available online: https://zenodo.org/record/5068733#.ZEX6085BxPY (accessed on 27 March 2023).
  27. Lacroix, T.; Usunier, N.; Obozinski, G. Canonical Tensor Decomposition for Knowledge Base Completion. PMLR 2018, 80, 2869–2878. [Google Scholar]
  28. Xu, C.; Nayyeri, M.; Chen, Y.; Lehmann, J. Knowledge Graph Embeddings in Geometric Algebras. In Proceedings of the COLING 2020, Online, 8–13 December 2020; pp. 530–544. [Google Scholar]
  29. Jain, P.; Rathi, S.; Mausam; Chakrabarti, S. Temporal Knowledge Base Completion: New Algorithms and Evaluation Protocols. In Proceedings of the EMNLP 2020, Virtual, 16–20 November 2020; pp. 3733–3747. [Google Scholar]
  30. Singer, U.; Guy, I.; Radinsky, K. Node Embedding over Temporal Graphs. In Proceedings of the IJCAI 2019, Macao, China, 10–16 August 2019; pp. 4605–4612. [Google Scholar]
  31. Yu, H.; Rao, N.; Dhillon, I.S. Temporal Regularized Matrix Factorization for High-dimensional Time Series Prediction. In Proceedings of the NIPS 2016, Barcelona, Spain, 5–10 December 2016; pp. 847–855. [Google Scholar]
  32. Boschee, E.; Lautenschlager, J.; O'Brien, S.; Shellman, S.; Starz, J.; Ward, M. ICEWS Coded Event Data; Harvard Dataverse: Cambridge, MA, USA, 2015. Available online: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/28075 (accessed on 27 March 2023).
  33. Leetaru, K.; Schrodt, P.A. GDELT: Global data on events, location, and tone. In Proceedings of the ISA Annual Convention, San Francisco, CA, USA, 3–6 April 2013. [Google Scholar]
  34. Suchanek, F.M.; Kasneci, G.; Weikum, G. Yago: A core of semantic knowledge. In Proceedings of the WWW’07: 16th International World Wide Web Conference, Banff, AB, Canada, 8–12 May 2007; pp. 697–706. [Google Scholar]
  35. Lehmann, J.; Isele, R.; Jakob, M.; Jentzsch, A.; Kontokostas, D.; Mendes, P.N.; Hellmann, S.; Morsey, M.; van Kleef, P.; Auer, S.; et al. DBpedia—A large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web 2015, 6, 167–195. [Google Scholar]
  36. Sadeghian, A.; Armandpour, M.; Colas, A.; Wang, D.Z. ChronoR: Rotation Based Temporal Knowledge Graph Embedding. In Proceedings of the AAAI 2021, Virtual, 2–9 February 2021; pp. 6471–6479. [Google Scholar]
  37. Wu, J.; Cao, M.; Cheung, J.C.K.; Hamilton, W.L. TeMP: Temporal Message Passing for Temporal Knowledge Graph Completion. In Proceedings of the EMNLP 2020, Online, 16–20 November 2020; pp. 5730–5746. [Google Scholar]
  38. Xu, D.; Ruan, C.; Körpeoglu, E.; Kumar, S.; Achan, K. Self-attention with Functional Time Representation Learning. In Proceedings of the NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019; pp. 15889–15899. [Google Scholar]
  39. Xu, D.; Ruan, C.; Körpeoglu, E.; Kumar, S.; Achan, K. Inductive representation learning on temporal graphs. In Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
  40. Han, Z.; Chen, P.; Ma, Y.; Tresp, V. Explainable Subgraph Reasoning for Forecasting on Temporal Knowledge Graphs. In Proceedings of the 9th International Conference on Learning Representations, Virtual, 3–7 May 2021. [Google Scholar]
Figure 1. Examples of combination reasoning.
Figure 2. Schematic diagram of the Householder transformation. $S$ is a hyperplane, $\omega$ is the normal vector of $S$, $v$ is the vector to be transformed, $v'$ is the image of $v$ after a Householder transformation, and the vector $y$ is parallel to $\omega$. $v$ can be regarded as the sum of $x$ and $y$, and $v'$ as the sum of $x$ and $-y$.
Figure 3. Results on ICEWS14 (w. HT: with Householder transformation; w/o HT: without Householder transformation).
Figure 4. Effect of rotation dimension (taking ICEWS14 as an example).
Table 1. Data set statistics.

| Data Set | Entities | Relations | Time Steps | Train | Valid | Test | Total Triples |
|---|---|---|---|---|---|---|---|
| ICEWS14 | 7128 | 230 | 365 | 72,826 | 8941 | 8963 | 90,730 |
| ICEWS05-15 | 10,488 | 251 | 4017 | 368,962 | 46,275 | 46,092 | 461,329 |
| GDELT | 500 | 20 | 366 | 2,735,685 | 341,961 | 341,961 | 3,419,607 |
Table 2. Results of the experiments. H@N means Hits@N, and all results are presented as percentages. Columns are grouped by data set (ICEWS14, ICEWS05-15, GDELT). The best results are written in bold. Dashes: results are not reported in the corresponding literature.

| Model | MRR | H@10 | H@3 | H@1 | MRR | H@10 | H@3 | H@1 | MRR | H@10 | H@3 | H@1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TTransE | 25.5 | 60.1 | - | 7.4 | 27.1 | 61.6 | - | 8.4 | 11.5 | 31.8 | 16.0 | 0.0 |
| HyTE | 29.7 | 65.5 | 41.6 | 10.8 | 31.6 | 68.1 | 44.5 | 11.6 | 11.8 | 32.6 | 16.5 | 0.0 |
| TA-DistMult | 47.7 | 68.6 | - | 36.3 | 47.4 | 72.8 | - | 34.6 | 20.6 | 36.5 | 21.9 | 12.4 |
| DE-SimplE | 52.6 | 72.5 | 59.2 | 41.8 | 51.3 | 74.8 | 57.8 | 39.2 | 23.0 | 40.3 | 24.8 | 14.1 |
| ATiSE | 55.0 | 75.0 | 62.9 | 43.6 | 51.9 | 79.4 | 60.6 | 37.8 | - | - | - | - |
| TNTComplEx | 56.0 | 74.0 | 61.0 | 46.0 | 60.0 | 78.0 | 65.0 | 50.0 | 22.4 | 38.1 | 23.9 | 14.4 |
| TuckERT | 59.4 | 73.1 | 64.0 | 51.8 | 62.7 | 76.9 | 67.4 | 55.0 | 41.1 | 61.4 | 45.3 | 31.0 |
| TeRo | 56.2 | 73.2 | 62.1 | 46.8 | 58.6 | 79.5 | 66.8 | 46.9 | 24.5 | 42.0 | 26.4 | 15.4 |
| TeRo-Large | 53.4 | 72.2 | 59.6 | 43.2 | 53.4 | 80.0 | 62.7 | 39.5 | 25.6 | 43.7 | 27.8 | 16.3 |
| TeLM | **62.5** | **77.4** | **67.3** | **54.5** | **67.8** | **82.3** | **72.8** | **59.9** | 38.2 | 54.6 | 41.0 | 29.8 |
| TeLM-Small | 59.1 | 74.5 | 63.9 | 50.7 | 62.6 | 77.4 | 67.1 | 54.7 | 28.5 | 45.2 | 30.7 | 19.8 |
| RotateQVS | 59.1 | 75.4 | 64.2 | 50.7 | 63.3 | 81.3 | 70.9 | 52.9 | 27.0 | 45.8 | 29.3 | 17.5 |
| RotateQVS-Small | 57.5 | 73.7 | 62.5 | 48.9 | 59.1 | 80.2 | 68.5 | 47.3 | 25.9 | 42.8 | 27.0 | 16.5 |
| HTTR | 61.6 | 77.0 | 66.6 | 54.4 | 66.3 | 81.5 | 71.7 | 58.1 | **53.1** | **66.8** | **56.8** | **45.6** |
Table 3. Comparative results of three methods for fusing relational and temporal information. The metrics are expressed as percentages. The best results are shown in bold.

| Data Set | $A_{rt}$ | MRR | Hits@10 | Hits@3 | Hits@1 |
|---|---|---|---|---|---|
| ICEWS14 | $Matrix(e_r \Vert e_t)$ | **61.6** | **77.0** | **66.6** | **54.4** |
| ICEWS14 | $Matrix(e_r + e_t)$ | 59.4 | 74.8 | 65.3 | 51.2 |
| ICEWS14 | $Matrix(e_r \circ e_t)$ | 58.0 | 73.2 | 63.6 | 49.2 |
| ICEWS05-15 | $Matrix(e_r \Vert e_t)$ | **66.3** | **81.5** | **71.7** | **58.1** |
| ICEWS05-15 | $Matrix(e_r + e_t)$ | 64.6 | 81.3 | 71.0 | 55.3 |
| ICEWS05-15 | $Matrix(e_r \circ e_t)$ | 63.0 | 79.6 | 70.7 | 53.5 |
| GDELT | $Matrix(e_r \Vert e_t)$ | **53.1** | **66.8** | **56.8** | **45.6** |
| GDELT | $Matrix(e_r + e_t)$ | 52.1 | 64.4 | 55.2 | 45.3 |
| GDELT | $Matrix(e_r \circ e_t)$ | 51.0 | 64.0 | 54.6 | 43.7 |
Table 4. Effect of regularization. Columns are grouped by data set (ICEWS14, ICEWS05-15); the "improvement" rows give the change when regularization is removed.

| Model | MRR | H@10 | H@3 | H@1 | MRR | H@10 | H@3 | H@1 |
|---|---|---|---|---|---|---|---|---|
| TNTComplEx | 56.0 | 74.0 | 61.0 | 46.0 | 60.0 | 78.0 | 65.0 | 50.0 |
| TNTComplEx (w/o reg.) | 49.6 | 65.4 | 53.4 | 41.1 | 54.9 | 73.7 | 60.0 | 45.9 |
| improvement | −6.4 | −8.6 | −7.6 | −4.9 | −5.1 | −4.3 | −5.0 | −4.1 |
| TeLM | 62.5 | 77.4 | 67.3 | 54.5 | 67.8 | 82.3 | 72.8 | 59.9 |
| TeLM (w/o reg.) | 54.3 | 67.6 | 57.6 | 47.2 | 58.8 | 72.6 | 62.4 | 51.5 |
| improvement | −8.2 | −9.8 | −9.7 | −7.3 | −9.0 | −9.7 | −10.4 | −8.4 |
| HTTR | 61.6 | 77.0 | 66.6 | 54.4 | 66.3 | 81.5 | 71.7 | 58.1 |
| HTTR (w/o reg.) | 61.8 | 76.4 | 67.0 | 53.6 | 66.1 | 81.8 | 72.0 | 58.1 |
| improvement | +0.2 | −0.6 | +0.4 | −0.8 | −0.2 | +0.3 | +0.3 | 0.0 |
