JKRL: Joint Knowledge Representation Learning of Text Description and Knowledge Graph

Xu, Guoyan; Zhang, Qirui; Yu, Du; Lu, Sijun; Lu, Yuwei

doi:10.3390/sym15051056

by Guoyan Xu, Qirui Zhang *, Du Yu, Sijun Lu and Yuwei Lu

College of Computer and Information, Hohai University, Nanjing 211100, China

* Author to whom correspondence should be addressed.

Symmetry 2023, 15(5), 1056; https://doi.org/10.3390/sym15051056

Submission received: 10 April 2023 / Revised: 28 April 2023 / Accepted: 9 May 2023 / Published: 10 May 2023

(This article belongs to the Special Issue Machine Learning and Data Analysis)

Abstract:

The purpose of knowledge representation learning is to learn the vector representation of research objects projected by a matrix in low-dimensional vector space and explore the relationship between embedded objects in low-dimensional space. However, most methods only consider the triple structure in the knowledge graph and ignore the additional information related to the triple, especially the text description information. In this paper, we propose a knowledge graph representation model with a symmetric architecture called Joint Knowledge Representation Learning of Text Description and Knowledge Graph (JKRL), which models the entity description and relationship description of the triple structure for joint representation learning of knowledge and balances the contribution of the triple structure and text description in the process of vector learning. First, we adopt the TransE model to learn the structural vector representations of entities and relations, and then use a CNN model to encode the entity description to obtain the text representation of the entity. To semantically encode the relation descriptions, we designed an Attention-Bi-LSTM text encoder, which introduces an attention mechanism into the Bi-LSTM model to calculate the semantic relevance between each word in the sentence and different relations. In addition, we also introduce position features into word features in order to better encode word order information. Finally, we define a joint evaluation function to learn the joint representation of structural and textual representations. The experiments show that compared with the baseline methods, our model achieves the best performance on both Mean Rank and Hits@10 metrics. The accuracy of the triple classification task on the FB15K dataset reached 93.2%.

Keywords:

knowledge graph; representation learning; structure embedding; text description

1. Introduction

Knowledge graphs (KGs) adopt intuitive triplets to represent entities in the objective world and the relationships between them, storing a large number of facts in the form of triplets (head entity, relationship and tail entity). KGs were first applied to improve the capabilities of search engines. Subsequently, they have shown great application value in assisting intelligent question answering, natural language processing, big data analysis, recommendation computing and interpretable artificial intelligence [1,2,3]. Among them, knowledge representation is the foundation of these applications. However, due to the continuous accumulation of knowledge, the scale of KGs has expanded very quickly, and the forms of knowledge have become more and more diverse. The disadvantages of knowledge representation in traditional forms are becoming more and more obvious, such as the difficulty of reasoning about the semantic relationship between entities, the serious problem of data sparsity, the high complexity of calculation and the difficulty of applying it to large KGs. Knowledge representation learning [4,5,6] aims to solve the sparsity problem of KGs and improve their internal integrity.

Knowledge representation learning studies the vector representation of the research object in the low-dimensional vector space through matrix projection and explores the relationship between the embedded objects in the low-dimensional space. Representation learning for KGs aims to learn the representation of entities and relationships in KGs in low-dimensional vector space, which retains the semantic information of entities and relationships from various dimensions [7]. By calculating the semantic similarity between vectors, the internal structural association of KGs is further mined, and the overall quality of KGs is improved on this basis [8]. By representing entities and relationships, representation learning can solve problems such as data sparsity and knowledge reasoning difficulties faced by traditional representation methods, and is widely applied to downstream tasks related to KGs, such as knowledge acquisition, entity disambiguation, relation extraction and KG completion [9,10].

Similar to the idea of unsupervised semantic-aware-based graph representation learning [11], text information can provide rich semantic resources for graph representation and play an important auxiliary role in the representation learning optimization of a graph representation model. Therefore, recent research on KG representation learning focuses on how to use the internal information of a knowledge graph to optimize the representation of knowledge. Most KG embedding methods only consider the triple structure in KGs and ignore the additional information related to triples, especially text description information. Text description information is generally divided into entity description and relationship description. Entity description is defined as a text that describes a topic and an entity, often as a supplement to describe an entity from different aspects. The relation description is defined as a sentence containing the entity pair $(h, t)$ of a triple, which can replace the direct relation $r$ in the triple $(h, r, t)$ to express the relationship between the entity pair $(h, t)$. The distant supervision hypothesis [12] holds that if two entities have a certain relationship in a KG, then the sentences with these two entities usually contain this relationship, which indicates the correlation between the relationship description and the specific triple.
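The distant supervision hypothesis can be illustrated with a minimal sketch: given an entity pair, collect every sentence that mentions both entities and treat it as a candidate relation description. The function name `find_relation_descriptions`, the substring matching and the toy corpus are assumptions made for this example; they are not the alignment procedure used in the paper.

```python
def find_relation_descriptions(sentences, head, tail):
    """Return the sentences that mention both entities of the pair (head, tail).

    Case-insensitive substring matching is an illustrative assumption;
    a real pipeline would rely on entity linking instead.
    """
    head_l, tail_l = head.lower(), tail.lower()
    return [s for s in sentences if head_l in s.lower() and tail_l in s.lower()]


# Toy corpus, invented for this example only.
corpus = [
    "Barack Obama was born in Honolulu, Hawaii.",
    "Honolulu is the capital of Hawaii.",
    "Barack Obama served two terms as president.",
]

# Candidate relation descriptions for the entity pair (Barack Obama, Honolulu).
matches = find_relation_descriptions(corpus, "Barack Obama", "Honolulu")
print(matches)
```

Only the first sentence mentions both entities, so it is the only candidate. Sentences selected this way can still be noisy, which is why the relation description encoder later weights words by their relevance to the relation.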

The combination of triplet structure and text description for KG embedding has become a hot topic in current research. Based on the triple structure, the entities and relationships in the KG are represented as vectors in the low-dimensional semantic space. The semantic information of entities and relationships is mined based on text description and mapped to the same semantic space. The Description-embodied Knowledge Representation (DKRL) [13] method integrates entity description information into triple embedding, and uses the Continuous Bag-of-Words [14] (CBOW) and a convolutional neural network (CNN) to encode entity descriptions. However, DKRL only takes some high-frequency words in the entity description as input when encoding the text representation of the entity, which is likely to cause semantic loss. Joint (Joint Representation Learning of Text and Knowledge) [15] proposed a joint representation learning model based on text and knowledge. It constructs a relationship text description by complex alignment of entities and relationships and uses a CNN to encode the relationship text to obtain the text representation of the relationship. However, when encoding the relationship description, Joint neither considers the word order information nor distinguishes how strongly each word in the sentence influences the relationship. In addition, the DKRL model only fuses entity description information, and the Joint model only fuses relationship description information, so each uses a text description from a single perspective to enrich the representation of entities or relationships.

In addition, some works [13,16,17,18,19,20] introduced text description to improve the representation of KGs, but these methods still have shortcomings: (1) lack of joint representation learning framework based on triple structure and text description information, which effectively enhances the semantic representation of entities and relationships at the same time; (2) when encoding the text description information, the word order dependency cannot be effectively encoded; (3) the semantic importance of different words in text description sentences is not distinguished.

In order to solve the above problems, this paper proposes a KG representation model with a symmetric architecture called Joint Knowledge Representation Learning of Text Description and Knowledge Graph (JKRL). It models the entity description and relationship description on a triple structure for joint representation learning of knowledge and balances the contribution of triple structure and text description in the vector learning process. First, we adopt the TransE model to learn the structural vector representations of entities and relations, and then use a CNN model to encode the entity description to obtain the text representation of the entity. In particular, due to the relationship description statement having a strong correlation with a specific triple, we design an Attention-Bi-LSTM text encoder when we encode the relationship description. The attention mechanism [21] is introduced into the Bi-LSTM [22] model to calculate the semantic relevance of each word and different relations in the sentence, so as to dynamically select the most relevant information from the relationship description to obtain the accurate relationship text representation. In addition, we consider the relative distance between the word and the head–tail entity in the relational description sentence and introduce the position feature into the word feature to better encode the word order information. According to the characteristics of triple structure, entity description and relationship description, the JKRL model uses different coding models to extract their respective deep semantics, and the overall model architecture is symmetrical.

The main contributions of this work can be summarized as follows:

A KG representation method called Joint Knowledge Representation Learning of Text Description and Knowledge Graph is proposed, which jointly learns the representation of entities and relations based on both structural and textual information, effectively improving the representation of entities and relations.
An Attention-Bi-LSTM text encoder is designed to encode textual descriptions, which extracts word order features better by introducing word position features and attention mechanisms, and dynamically selects the most relevant information in the textual descriptions by calculating the semantic relevance of words based on different relationships.
Experimental results on the FB15k dataset demonstrate that the JKRL model has strong competitiveness compared with the baseline model in KG completion tasks and can improve the quality of the vector representation of entities and relationships.

2. Related Work

Judging from the existing KG representation models, these models mainly use two types of information to assist knowledge embedding. The first is the existing triplet information in the KG: most of these models directly model a single triplet, and only the triple itself is used for embedding. The second is additional information related to the triple, such as structural information, text description and entity type. Because additional information can enhance the semantic information in the vector representation of entities and relations in triples, such models can often improve the quality of vector representation of entities and relations.

2.1. Knowledge Representation Model Based on Single Triples

The knowledge representation model based on a single triplet only uses the internal information of the triplet itself, including the translation distance model, semantic matching model and new neural network model.

2.1.1. Translational Distance Models

Translation distance models usually calculate the semantic similarity between vectors based on the distance score function through the relational translation operation and then use the distance between two entities to measure the rationality of the triplet facts. TransE [23] regards relation as a transformation from the head entity to tail entity and measures the distance between the sum of the head entity vector and relation vector to the tail entity vector based on the distance scoring function. TransH [24] improved this process by projecting entities and relations onto the hyperplane of relations. TransR [25] introduced independent vector space for entities and relations, and used the specific vector space of relations to process different relations and mine more semantic information. TransD [26] is a simplification of TransR, which decomposes the projection matrix in TransR and introduces the respective projection vectors of the head and tail entities. TransMS [27] uses nonlinear functions to translate semantics and obtains tail entity vectors by translating head entity vectors and relational vectors.
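For reference, the scoring functions of these models can be written side by side. The forms below are the ones commonly given in the literature, up to the choice of L1 or L2 norm and of squaring; they are a summary sketch and are not taken from this article. The symbols $w_{r}$ (unit normal of the relation hyperplane), $d_{r}$ (translation within that hyperplane), $M_{r}$ (relation-specific projection matrix) and $r_{p}$, $h_{p}$, $t_{p}$ (projection vectors) are notation introduced here for illustration.

```latex
\begin{aligned}
\text{TransE:}\quad & f_r(h, t) = \| h + r - t \|_{2} \\
\text{TransH:}\quad & f_r(h, t) = \| (h - w_r^{\top} h \, w_r) + d_r - (t - w_r^{\top} t \, w_r) \|_{2} \\
\text{TransR:}\quad & f_r(h, t) = \| M_r h + r - M_r t \|_{2} \\
\text{TransD:}\quad & f_r(h, t) = \| (r_p h_p^{\top} + I) h + r - (r_p t_p^{\top} + I) t \|_{2}
\end{aligned}
```

In every case a lower score means a more plausible triple; the models differ only in how the entity vectors are projected before the translation by the relation vector.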

2.1.2. Semantic Matching Models

Semantic matching models usually rely on similarity functions that match the latent semantics of entity and relation embeddings to measure the credibility of triples, and such models often have high computational complexity. DistMult [28] restricts the relation matrix $M_{r}$ to be a diagonal matrix and proposes a simplified bilinear formulation $f_{r}(h, t) = h^{\top} \mathrm{diag}(M_{r}) t$ to capture the interaction of head and tail entity vectors in the same dimension. HolE [29] learns representations of entities and relations by introducing products of compressed tensors. ANALOGY [30] focuses on multi-relational reasoning to model the analogy structure of relational data, and its scoring function is $f_{r}(h, t) = h^{\top} M_{r} t$, where the relational matrix $M_{r}$ is defined as a normal matrix for analogy reasoning. The ComplEx [31] model extends DistMult by introducing holographic embedding and complex embedding, embedding entities and relations into complex space.
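The effect of the diagonal restriction in DistMult can be checked with a small sketch. The function names and the toy vectors are assumptions for illustration. The example shows that the diagonal score equals the full bilinear score with a diagonal matrix, and that it is unchanged when head and tail are swapped, which is why DistMult cannot distinguish the direction of a relation.

```python
def distmult_score(h, r_diag, t):
    """Bilinear score with a diagonal relation matrix: sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r_diag, t))


def bilinear_score(h, M, t):
    """General bilinear score h^T M t with a full relation matrix M."""
    return sum(h[i] * M[i][j] * t[j] for i in range(len(h)) for j in range(len(t)))


# Toy vectors, invented for illustration.
h = [1.0, 2.0]
t = [5.0, 6.0]
r_diag = [3.0, 4.0]
M = [[3.0, 0.0], [0.0, 4.0]]  # the same relation written as a full matrix

print(distmult_score(h, r_diag, t))   # 1*3*5 + 2*4*6
print(bilinear_score(h, M, t))        # same value
print(distmult_score(t, r_diag, h))   # swapping head and tail changes nothing
```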

2.1.3. Neural Network Models

The neural network model based on deep learning is still based on the triple fact itself and optimizes the representation of entities and relationships by introducing neural network modules. ConvE [32] is the first model to use a CNN to complete KG embedding, and it is the simplest multi-layer convolutional network structure among all neural network models. ConvKB [33] uses 1D convolutions to preserve the translational properties of TransE and is able to capture the global relationship between entities. Experiments prove that ConvKB exhibits good link prediction performance and can be extended to other network data for application.

2.2. Knowledge Representation Model Fused with Additional Information

Representation learning based on a single triplet does not incorporate additional information about triplets to improve the representation ability of the model. The additional information is multi-source and can be divided into two categories. The first category is information derived from the triples themselves and contained in the KG, such as structural information and entity types. The second category is information that is related to the triples but whose source lies outside them, such as entity descriptions and images associated with entities.

2.2.1. Graph Structure Information

The graph structure information of a knowledge graph can be regarded as being composed of the entity neighborhood structure and the relationship paths between entity pairs. Entity neighborhood structure information includes neighborhood node information and neighborhood edge information. Relational path information refers to the sequence of entity nodes and relationship edges that connects an entity pair without a direct relationship. Lin et al. [34] integrated the relationship path information into TransE, so that it can handle complex relationships and improve the effect of TransE on knowledge representation. The Graph Aware Knowledge Embedding (GAKE) model [35] extracts entity features from contextual information based on the word vector algorithm and applies them to downstream tasks, showing that the neighborhood structure information of entities can enhance the semantic representation of target entities.

2.2.2. Entity Description Information

Entity description information refers to text or images related to entities, and this article mainly refers to text descriptions related to entities. Text-Enhanced Representation Learning for Knowledge Graph (TEKE) [36] adopted a network structure in which entities and entity description information are combined, and combined the vector representation of entities based on text descriptions with the translation distance model to improve the representation of the entity. DKRL [13] used two models, CBOW and CNN, to encode the entity description to obtain the text vector representation of the entity, and improved the scoring function of the TransE model. Through the interaction between the triple structure and the entity description, the translation of the text vector of the entity in the relational space is realized. Joint [15] adopted joint representation learning based on text and knowledge. Through a complex alignment of entity and relationship, the relational text description was constructed. Jointly [20] is a joint representation framework based on the gating mechanism, which utilizes both structural information and textual information of entities and integrates the representation of structure and text into a unified architecture through the gating mechanism. An et al. [37] proposed an accurate text-enhanced KG representation learning method that enhances representation by utilizing entity description and relational description. In addition, an interactive attention mechanism based on relational description and entity description was proposed to learn more accurate text representation, so as to further improve the representation ability of KGs. Yao et al. [38] took the entity description and relation description of triples as input, introduced the pre-trained language model BERT [39] to learn rich language information from a large number of texts and highlighted the most important words associated with triples. Wang et al. [40] proposed a structure-augmented text representation model to tackle the link prediction task for KG completion. This model first applied a Siamese-style textual encoder to a triple for two contextualized representations. Then, based on the two representations, a scoring module was proposed where two parallel scoring strategies are used to learn both contextualized and structured knowledge. Shen et al. [41] proposed to jointly embed the semantics in the natural language description of the knowledge triplets with their structure information. This method embeds KGs via fine-tuning pre-trained language models with respect to a probabilistic structured loss, where the forward pass of the language models captures semantics and the loss reconstructs structures. In addition, a large number of studies [42,43,44,45,46] have improved the representation of KGs by introducing text description and achieved good results.

2.2.3. Other Information

In addition to the above two mainstream types of triple-related additional information, researchers have tried to integrate other information from different perspectives to improve the knowledge representation ability of the model. Xie et al. [47] combined the entity type and the triple structure, introduced the projection matrix of the entity type to map the head and tail entities to the relationship vector space, and then performed translation operations with the relationship vector in the relationship vector space, effectively addressing the multi-type and hierarchy issues of entities. Esteban et al. [48] proposed to integrate the timeliness of quadruple facts to express KGs, which has a high application prospect for representing dynamic KGs. Guan et al. [49] incorporated n-ary relational facts and represented each of them as a set of role–value (key–value) pairs. In addition, there is a Hyper-Relational KG Embedding model (HINGE) [50] that incorporates hyper-relational facts, where a hyper-relational fact contains a basic triple $(h, r, t)$ and a set of associated key–value pairs $(k_{i}, v_{i})$.
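A hyper-relational fact of this shape can be held in a small data structure. The class name `HyperRelationalFact`, its field names and the example fact are assumptions chosen for illustration; they do not reproduce the data format of HINGE.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HyperRelationalFact:
    """A basic triple (h, r, t) plus a set of associated key-value pairs (k_i, v_i)."""
    head: str
    relation: str
    tail: str
    qualifiers: Tuple[Tuple[str, str], ...] = ()

    def base_triple(self):
        """Drop the key-value pairs and return only the (h, r, t) triple."""
        return (self.head, self.relation, self.tail)


# Toy fact, invented for illustration.
fact = HyperRelationalFact(
    head="Marie Curie",
    relation="educated_at",
    tail="University of Paris",
    qualifiers=(("academic_degree", "Master of Science"), ("academic_major", "Physics")),
)
print(fact.base_triple())
print(dict(fact.qualifiers))
```

A model that only embeds `base_triple()` discards the qualifiers, which is the information hyper-relational approaches aim to keep.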

3. Joint Knowledge Representation Learning of Text Description and Knowledge Graph

3.1. Symbols and Definitions

The KG is represented as $G = (E, R, T)$, where $E$ denotes the entity set and $R$ denotes the relation set. Every triple $(h, r, t) \in T$ in the KG satisfies $h, t \in E$ and $r \in R$. In the JKRL model, for a triplet $(h, r, t)$, the vector representations based on the triplet structure of the knowledge graph are called structural vectors, denoted as $h_{s}$, $r_{s}$, $t_{s}$; the vector representations based on the text description are called text vectors, denoted as $h_{d}$, $t_{d}$, $r_{d}$. A text description sentence is denoted as $s$, and the text description sentence set is $T = \{s_{1}, s_{2}, \dots, s_{n}\}$. Each sentence is a sequence of words, denoted as $s = \{x_{1}, x_{2}, \dots, x_{n}\}$, where $x_{i}$ represents the embedding vector of the $i$-th word. Entities and relations are embedded in a continuous vector space $\mathbb{R}^{k}$. The meanings of the specific symbols are shown in Table 1.

3.2. The Overall Architecture of the Model

The JKRL model is a joint representation learning framework that integrates text description information of entities and relationships with KGs. The overall architecture is shown in Figure 1. The model consists of three knowledge representation learning parts, which are the representation learning of triple structure, representation learning of entity description and representation learning of relation description.

Representation learning of triple structure (blue area): Learns the structural representation of entities and relationships from the perspective of KG. Based on the universality of the representation learning framework, a simple and efficient TransE model is used to embed the structure vector of the head and tail entities and the structure vector of the relationship, which are denoted as $h_{s}$, $t_{s}$ and $r_{s}$.
Representation learning of entity description (green area): A CNN model is used to encode the entity description of the triple $(h, r, t)$ to obtain the text representation of the head entity and tail entity, which are denoted as $h_{d}$ and $t_{d}$.
Representation learning of relation description (yellow area): The relation description of the triple $(h, r, t)$ refers to the text containing the entity pair $(h, t)$. The Attention-Bi-LSTM text encoder is designed to mine the semantic information in the relation description to obtain the text representation of the relation, which is denoted as $r_{d}$.

The JKRL model performs joint representation learning based on structural representation and text representation. We define a joint evaluation function and learn the vector representations of entities and relationships by minimizing the training loss. The loss function of the JKRL model is defined as follows:

$L_{\Theta}(G, D) = L_{\Theta_{E_{S}} \Theta_{R_{S}}}(G) + \alpha L_{\Theta_{E_{D}}}(G, D) + \beta L_{\Theta_{R_{D}}}(G, D) + \lambda \|\Theta\|_{2}$

(1)

In the above formula, $\|\Theta\|_{2}$ is the L2 regularization of the parameter set $\Theta = \{\Theta_{E_{S}}, \Theta_{E_{D}}, \Theta_{R_{S}}, \Theta_{R_{D}}\}$, where $\Theta_{E_{S}}$ represents the parameter set for learning the entity structure representation, $\Theta_{E_{D}}$ represents the parameter set for learning the entity text representation, $\Theta_{R_{S}}$ represents the parameter set for learning the relationship structure representation and $\Theta_{R_{D}}$ represents the parameter set for learning the relationship text representation. $L_{\Theta_{E_{S}} \Theta_{R_{S}}}(G)$ is the loss function of representation learning based on triple structure, $L_{\Theta_{E_{D}}}(G, D)$ is the loss function of representation learning based on entity description and $L_{\Theta_{R_{D}}}(G, D)$ is the loss function of representation learning based on relation description. The weights $\alpha$ and $\beta$ balance the role of triple structure and text description information in vector joint representation learning. In this way, Formula (1) balances the contribution of the two kinds of vector representation in the learning process.
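As a minimal sketch of how Formula (1) combines its terms, the function below adds the three partial losses with their weights and the regularization term. The function name `joint_loss`, the flat parameter list and all numeric values are assumptions for illustration; the three partial losses are passed in as plain numbers here.

```python
import math


def joint_loss(loss_struct, loss_entity_desc, loss_relation_desc,
               params, alpha, beta, lam):
    """Weighted sum of Formula (1): structure loss + alpha * entity-description loss
    + beta * relation-description loss + lam * L2 norm of all parameters."""
    l2_norm = math.sqrt(sum(p * p for p in params))
    return (loss_struct
            + alpha * loss_entity_desc
            + beta * loss_relation_desc
            + lam * l2_norm)


# Toy values, invented for illustration.
total = joint_loss(
    loss_struct=1.0,
    loss_entity_desc=2.0,
    loss_relation_desc=3.0,
    params=[3.0, 4.0],      # stands in for the flattened parameter set
    alpha=0.5,
    beta=0.5,
    lam=0.1,
)
print(total)  # 1.0 + 0.5*2.0 + 0.5*3.0 + 0.1*5.0
```

Setting `alpha` and `beta` to 0 reduces the objective to the structure-only loss, which makes the role of the two weights as balance factors explicit.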

3.3. Representation Learning of Triple Structure

Traditional KG representation uses representation learning to generate vectors in a low-dimensional continuous space based on the triple structure and constructs scoring functions to evaluate the reliability of triple facts. TransE is a classic KG representation learning model, which has been widely used and verified in many studies combined with text descriptions, such as DKRL [13], Joint [15], Jointly [20], etc. Since the JKRL model needs to jointly learn the representation of triplets and text descriptions and uses deep neural networks such as CNN and Attention-Bi-LSTM, the model has many trainable parameters and high space complexity. Therefore, in the representation learning of triple structure, we employ TransE, a simple but effective model, to learn structured vector representations of entities and relations. In practice, more complex models, such as TransH, TransR and DistMult [24,25,28], may overfit or be too complex to train.

TransE assumes that the vector space dimensions of entities and relations are consistent and initializes the triple $(h, r, t)$ to obtain $(h_{s}, r_{s}, t_{s})$. Then, $r_{ht}$ is set to be the translation from the head entity vector $h_{s}$ to the tail entity vector $t_{s}$:

$r_{ht} = t_{s} - h_{s}$

(2)

Since the triple embedding $(h_{s}, r_{s}, t_{s})$ contains an explicit relation vector $r_{s}$ between $h_{s}$ and $t_{s}$, the triple scoring function is defined as follows:

$\varphi_{r}(h, t) = \|r_{ht} - r_{s}\|_{2} = \|(t_{s} - h_{s}) - r_{s}\|_{2}$

(3)

The above formula shows that each triple $(h, r, t)$ existing in the KG should satisfy $h_{s} + r_{s} = t_{s}$ in the ideal state, whereas for a triple that does not exist or is incorrect in the KG, $h_{s} + r_{s}$ should be far away from $t_{s}$. The structural vectors $h_{s}$, $t_{s}$ and $r_{s}$ of entities and relationships are obtained by training the model. Therefore, the loss function over all triples is defined as

$L_{\Theta_{E_{S}} \Theta_{R_{S}}}(G) = \sum_{(h, r, t) \in S} \sum_{(h', r', t') \in S'} [\varphi_{r}(h, t) - \varphi_{r'}(h', t') + \mu]_{+}$

(4)

where $[x]_{+} = \max(0, x)$, $\mu$ is an interval parameter greater than 0, $S$ represents the set of positive triples in the KG and $S'$ represents the set of negative triples after negative sampling, specifically $S' = \{(h', r, t)\} \cup \{(h, r', t)\} \cup \{(h, r, t')\}$. Here, $h' \in E$ represents the head entity obtained by sampling, $r' \in R$ represents the relationship obtained by sampling and $t' \in E$ represents the tail entity obtained by sampling.
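The scoring function of Formula (3), the negative sampling and one summand of the loss in Formula (4) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the toy embeddings and the uniform choice of which slot to corrupt are not taken from the paper, and the sketch does not filter out corrupted triples that happen to be true facts.

```python
import math
import random


def transe_score(h_s, r_s, t_s):
    """Formula (3): L2 norm of (t_s - h_s) - r_s; lower means more plausible."""
    return math.sqrt(sum((t - h - r) ** 2 for h, r, t in zip(h_s, r_s, t_s)))


def corrupt(triple, entities, relations, rng):
    """Build one negative triple by replacing exactly one of head, relation or tail."""
    h, r, t = triple
    slot = rng.choice(("head", "relation", "tail"))
    if slot == "head":
        return (rng.choice([e for e in entities if e != h]), r, t)
    if slot == "relation":
        return (h, rng.choice([x for x in relations if x != r]), t)
    return (h, r, rng.choice([e for e in entities if e != t]))


def margin_loss(pos_score, neg_score, margin):
    """One summand of Formula (4): [pos - neg + margin]_+."""
    return max(0.0, pos_score - neg_score + margin)


# Toy embeddings, invented for illustration.
entity_vecs = {"paris": [0.0, 0.0], "france": [1.0, 1.0], "berlin": [5.0, 5.0]}
relation_vecs = {"capital_of": [1.0, 1.0], "born_in": [-3.0, 2.0]}


def score(triple):
    h, r, t = triple
    return transe_score(entity_vecs[h], relation_vecs[r], entity_vecs[t])


positive = ("paris", "capital_of", "france")
negative = corrupt(positive, list(entity_vecs), list(relation_vecs), random.Random(0))
loss = margin_loss(score(positive), score(negative), margin=1.0)
print(positive, score(positive))
print(negative, score(negative), loss)
```

In this toy setup the positive triple satisfies $h_{s} + r_{s} = t_{s}$ exactly, so its score is 0, and every possible corruption scores above the margin, so the hinge term vanishes. During training, non-zero hinge terms are what push positive and negative triples apart.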

3.4. Representation Learning of Entity Descriptions

The representation learning of entity description aims to extract the semantics related to the topic entity in the text by encoding the entity description statement to model the vector representation of the entity. After obtaining the text vector $h_{d}$ of the head entity and the text vector $t_{d}$ of the tail entity, the scoring function of each triple is defined as follows:

$\sigma_{r}(h, t) = \|t_{d} - h_{d} - r_{s}\|_{2}$

(5)

For all triples in the KG, the loss function of the module is defined as

$L_{\Theta_{E_{D}}}(G, D) = \sum_{(h, r, t) \in S} \sum_{(h', r, t') \in S'} [\sigma_{r}(h, t) - \sigma_{r}(h', t') + \eta]_{+}$

(6)

where $\eta$ is an interval parameter greater than 0, $S$ is the positive triple set, $S'$ is the negative triple set and $S' = \{(h', r, t)\} \cup \{(h, r, t')\}$.

The entity description text encoding process is shown in Figure 2. DKRL only encodes some high-frequency words in the entity description, which easily causes semantic information loss. Since the entity description text may present an entity from various aspects and is not sensitive to the sequence information of the text, a CNN can effectively capture local features in the entity description text by sliding on the text sequence through the convolutional kernel and has the advantage of parallel computing with high computational efficiency, which is suitable for processing large-scale text data. Therefore, we adopt a CNN model to encode the text representation of the entity, and mine the deep semantic information hidden in the word order inside the entity description. The specific steps are as follows:

Input: Firstly, all stop words and punctuation marks in the entity text description are deleted to construct the expected vocabulary for training. Then, the entity annotation tool is used to automatically annotate the entities in $G$ to obtain the annotated text, forming the entity text description $s = \{x_{1}, x_{2}, \dots, x_{n}\}$ related to a specific triple. Finally, the word2vec tool is used to train the word vector corresponding to each word in the text description to form the input vector $X = (x_{1}, x_{2}, \dots, x_{n})$.
Convolution layer: The input and output of the convolution layer are represented by $X$ and $C$, respectively. Because text description statements differ in length, the length of the input vector $X = (x_{1}, x_{2}, \dots, x_{n})$ is inconsistent. We therefore set a maximum sentence length $L$ for the convolution operation: if the sentence length is less than $L$, the end of the embedding vector of the sentence is padded with 0, and if the sentence length is greater than $L$, the redundant words are removed. Then, the input vector $X$ is processed based on the sliding window mechanism. We define the size of the window as $k$ and set the step size of the sliding window as 1, so the content of the sliding window is $x_{i : i + k - 1} = [x_{i}, x_{i + 1}, \dots, x_{i + k - 1}]$. The corresponding output vector $c_{i}$ is extracted by a convolution operation. The specific formula of this process is defined as follows:

$c_{i} = \sigma_{c}(w_{c} \cdot x_{i : i + k - 1} + b_{i})$

(7)

where $w_{c} \in \mathbb{R}^{m \times kn}$ represents the convolution kernel, $m$ is the dimension of the output vector, $n$ represents the dimension of the input vector, $b_{i}$ represents the bias term, and $\sigma_{c}$ represents the activation function.
Pooling layer: In order to reduce the number of parameters of the CNN encoder and the influence of noise, the feature vectors output by the convolutional layer need to be pooled. We use the maximum pooling layer to optimize the semantic features in the feature vectors. The output is divided into multiple non-overlapping windows, and the maximum value is selected in each window to form a new output vector $p$ in order to obtain the most significant features in each dimension:

$p = \max(c_{i}, c_{i + 1}, \dots, c_{i + m - 1})$

(8)
Output layer: The text vector of the entity is obtained by applying a linear transformation to the pooled vector $p$. The text vector form of the entity is defined as

$e_{d} = w_{0} \cdot p + b_{0}$

(9)

where $w_{0}$ is the parameter matrix and $b_{0}$ is the bias term, and finally the head and tail entity text vectors $h_{d}$ and $t_{d}$ are obtained.
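The four steps above can be sketched end to end in plain Python. The sketch makes several assumptions that the text leaves open: the activation $σ_{c}$ is taken to be tanh, the bias is one vector shared by all window positions, and a single pooling window spanning all positions is used so that $p$ is one vector. All function names and the toy weights are hypothetical.

```python
import math


def pad_or_truncate(X, max_len):
    """Zero-pad the sentence to max_len word vectors, or drop the surplus words."""
    dim = len(X[0])
    X = X[:max_len]
    return X + [[0.0] * dim for _ in range(max_len - len(X))]


def matvec(W, v):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def conv_layer(X, W_c, b_c, k):
    """Formula (7): slide a window of k words with stride 1 over the sentence."""
    C = []
    for i in range(len(X) - k + 1):
        window = [x for vec in X[i:i + k] for x in vec]  # concatenation, length k*n
        C.append([math.tanh(z + b) for z, b in zip(matvec(W_c, window), b_c)])
    return C


def max_pool(C):
    """Formula (8) with one window over all positions: element-wise maximum."""
    return [max(column) for column in zip(*C)]


def encode_entity(X, W_c, b_c, W_0, b_0, k, max_len):
    """Padding, convolution, max pooling and the linear output layer of Formula (9)."""
    C = conv_layer(pad_or_truncate(X, max_len), W_c, b_c, k)
    p = max_pool(C)
    return [z + b for z, b in zip(matvec(W_0, p), b_0)]


# Toy input: three words with 2-dimensional embeddings (n = 2), window size k = 2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_c = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # shape m x (k*n) = 2 x 4
b_c = [0.0, 0.0]
W_0 = [[1.0, 1.0]]                                   # maps p to a 1-dimensional e_d
b_0 = [0.5]
e_d = encode_entity(X, W_c, b_c, W_0, b_0, k=2, max_len=4)
print(e_d)
```

The same encoder, with shared weights, would be applied to the head and tail descriptions to produce $h_{d}$ and $t_{d}$.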

3.5. Representation Learning of Relation Description

Relation descriptions are sentences containing the entity pair $(h, t)$ of a triple. The representation learning of relation description aims to reduce the noise in relation description sentences and obtain the vector representation of the relation based on the text description. For each triple, the scoring function is defined as the distance between $r_{s}$ and $r_{d}$, which training seeks to minimize:

$\psi_{r}(s) = \|r_{s} - r_{d}\|_{2}$

(10)

The loss function for all relation description statement sets is defined as follows:

$L_{\Theta_{R_{D}}}(G, D) = \sum_{s \in T} \sum_{r' \neq r} [\psi_{r}(s) - \psi_{r'}(s) + \gamma]_{+}$

(11)

where $\gamma$ is an interval parameter greater than 0, and $T$ represents the set of text description statements.

Since each relationship description is only one sentence, the text is short and carries limited information. After comprehensive consideration of data characteristics, computing resources, dataset size and time sensitivity, we chose Bi-LSTM, which has moderate computational complexity and a moderate number of parameters, as the basic model for the text encoder among commonly used text modeling models such as CNN, RNN and Transformer [22,51,52]. When the sentence is short, the Bi-LSTM model can retain the semantic information of the whole sentence through its gating mechanism without suffering from forgetting.

Because the textual relation hidden in a relationship description is strongly correlated with a specific triple, it is necessary to retain the word order features and the dependencies between words when modeling the relationship description, so as to extract its deep semantic features. When modeling text descriptions, most methods encode the word order features of the relation description poorly and also ignore the semantic relevance between the words and the relation in the triple, even though a text description may present an entity from various aspects and the relation may be closely related to only part of the statement. In the JKRL model, we designed an Attention-Bi-LSTM text encoder, as shown in Figure 3, which adds an attention layer on top of Bi-LSTM. According to the relation in the corresponding triple, the most relevant word semantics are dynamically selected from the relationship description for encoding. The Attention-Bi-LSTM models the relationship description as follows:

The relationship description statement is preprocessed and represented by word embedding. The feature vector of each word is obtained by word2vec training. In addition, the relative distance between each word and the head and tail entities in the sentence is calculated.
In the Attention-Bi-LSTM encoder, the Bi-LSTM hidden layer performs forward and backward LSTM encoding on the input vector.
In the attention layer of Attention-Bi-LSTM, the corresponding attention weight is calculated according to the correlation degree between each word and the triplet relationship, and finally the text vector $r_{d}$ of the relationship is obtained in the output layer.

1.: Word embedding representation

The input vector of the relation description $s = \{x_{1}, x_{2}, \dots, x_{n}\}$ is composed of all word vectors in the sentence. For each word, the word feature (WF) and the position feature (PF) together form the word vector.

Word2vec [53] is used for embedding learning. Word2vec encodes the context information of words in the corpus, and the resulting word embeddings can be used directly as the representation of word features:

$WF = \{x_{1}, x_{2}, \dots, x_{n}\}$

(12)

Relational description sentences contain head and tail entities, whose semantic information can enhance the expression of relational semantics in related triples. The position feature is defined as the relative distance between each word in the sentence and the head and tail entities. To measure the distance between a word and the head and tail entities, the position feature [53] $PF$ is defined as follows:

$PF = \{[d_{1 h}, d_{1 t}], [d_{2 h}, d_{2 t}], \dots, [d_{n h}, d_{n t}]\}$

(13)

where $d_{i h}$ denotes the relative distance between the $i$th word and the head entity $h$, and $d_{i t}$ denotes the relative distance between the $i$th word and the tail entity $t$. For example, for the relational description statement “Putin is the president of Russia”, the relative distance between the word “president” and the head entity “Putin” is 3, and the relative distance between the word “president” and the tail entity “Russia” is −2. The relative distance between a word and an entity to its left is recorded as a positive distance, and the distance between a word and an entity to its right is recorded as a negative distance. Finally, we concatenate all the word features in the sentence with their corresponding position features to obtain the vector representation $s$ of the sentence as the input vector of the Bi-LSTM encoder:

$s = \{[x_{1}, d_{1 h}, d_{1 t}], [x_{2}, d_{2 h}, d_{2 t}], \dots, [x_{n}, d_{n h}, d_{n t}]\}$

(14)
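The position feature and the concatenation in Equation (14) can be illustrated with a short Python sketch. It assumes, for simplicity, that each entity is a single token and that the relative distance is the word index minus the entity index; the function names are hypothetical.

```python
def position_features(tokens, head, tail):
    """Return [d_ih, d_it] for every token: token index minus entity index.
    Assumes each entity appears as exactly one token (simplification)."""
    h_idx = tokens.index(head)
    t_idx = tokens.index(tail)
    return [[i - h_idx, i - t_idx] for i in range(len(tokens))]

def build_input(word_vectors, pf):
    """Eq. (14): concatenate each word vector with its two position features."""
    return [list(x) + d for x, d in zip(word_vectors, pf)]

tokens = "Putin is the president of Russia".split()
pf = position_features(tokens, "Putin", "Russia")
print(pf[tokens.index("president")])  # [3, -2]
```

The printed pair reproduces the example in the text: “president” is three positions to the right of “Putin” and two positions to the left of “Russia”.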

2.: Bi-LSTM coding

A recurrent neural network (RNN) is used to process sequence information. Because it must handle both current and historical information when dealing with long-term dependencies, an RNN suffers from vanishing and exploding gradient problems [54]. LSTM is an enhanced neural network based on the RNN, which realizes the interaction between short-term and long-term memory through a gating mechanism and solves the problem that an RNN cannot remember long-term historical input information. The LSTM hidden layer contains the memory cell, input gate $i_{t}$, forget gate $f_{t}$ and output gate $o_{t}$; $s_{t}$ is the input at time $t$, $c_{t - 1}$ is the output of the memory cell at time $t - 1$ and $h_{t - 1}$ is the output of the hidden layer at time $t - 1$. For the hidden state $h_{t}$ at time $t$, the calculation process of the LSTM hidden layer is as follows:

$i_{t} = σ (W_{s i} s_{t} + W_{h i} h_{t - 1} + W_{c i} c_{t - 1} + b_{i})$

(15)

$f_{t} = σ (W_{s f} s_{t} + W_{h f} h_{t - 1} + W_{c f} c_{t - 1} + b_{f})$

(16)

$o_{t} = σ (W_{s o} s_{t} + W_{h o} h_{t - 1} + W_{c o} c_{t} + b_{o})$

(17)

$\tilde{c}_{t} = \tanh (W_{s c} s_{t} + W_{h c} h_{t - 1} + b_{c})$

(18)

$c_{t} = f_{t} c_{t - 1} + i_{t} \tilde{c}_{t}$

(19)

$h_{t} = o_{t} \tanh (c_{t})$

(20)

where $W_{s i}$, $W_{h i}$, $W_{c i}$, $W_{s f}$, $W_{h f}$, $W_{c f}$, $W_{s o}$, $W_{h o}$, $W_{c o}$, $W_{s c}$ and $W_{h c}$ are weight matrices, $b_{i}$, $b_{f}$, $b_{o}$ and $b_{c}$ are bias vectors and $σ$ is the Sigmoid function.
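As an illustration, one time step of Equations (15) to (20) can be written in NumPy as follows. This is a sketch under stated assumptions: the products $f_{t} c_{t - 1}$, $i_{t} \tilde{c}_{t}$ and $o_{t} \tanh (c_{t})$ are taken element-wise, the memory-cell weights $W_{c i}$, $W_{c f}$ and $W_{c o}$ are full matrices, and the dictionary layout and the `init_params` helper are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(s_t, h_prev, c_prev, W, b):
    """One LSTM step with memory-cell connections to the gates."""
    i_t = sigmoid(W["si"] @ s_t + W["hi"] @ h_prev + W["ci"] @ c_prev + b["i"])  # Eq. (15)
    f_t = sigmoid(W["sf"] @ s_t + W["hf"] @ h_prev + W["cf"] @ c_prev + b["f"])  # Eq. (16)
    c_tilde = np.tanh(W["sc"] @ s_t + W["hc"] @ h_prev + b["c"])                 # Eq. (18)
    c_t = f_t * c_prev + i_t * c_tilde                                           # Eq. (19)
    # The output gate reads the updated cell state c_t, as in Eq. (17).
    o_t = sigmoid(W["so"] @ s_t + W["ho"] @ h_prev + W["co"] @ c_t + b["o"])
    h_t = o_t * np.tanh(c_t)                                                     # Eq. (20)
    return h_t, c_t

def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical helper: small random weights and zero biases."""
    rng = np.random.default_rng(seed)
    W = {}
    for gate in "ifoc":
        W["s" + gate] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        W["h" + gate] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
    for gate in "ifo":
        W["c" + gate] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
    b = {gate: np.zeros(hidden_dim) for gate in "ifoc"}
    return W, b
```

Note that the output gate is computed after the cell update because Equation (17) depends on $c_{t}$ rather than $c_{t - 1}$.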

Since a single LSTM cannot pass information from back to front, the Bi-LSTM neural network contains two LSTM hidden layers: one is responsible for the forward transmission of information, and the other is responsible for the reverse transmission. The two parts are then combined to generate the final output of the Bi-LSTM. Each word vector is input into both LSTM hidden layers: $s_{t}$ is the embedding vector of the word input at time $t$, $h_{t - 1}^{f}$ is the output of the previous moment of the forward LSTM and $h_{t + 1}^{b}$ is the output of the previous moment of the backward LSTM. The outputs of the forward and backward LSTM at time $t$ are as follows:

$h_{t}^{f} = H (s_{t}, h_{t - 1}^{f}, c_{t - 1}, b_{t - 1})$

(21)

$h_{t}^{b} = H (s_{t}, h_{t + 1}^{b}, c_{t - 1}, b_{t - 1})$

(22)

where $H (\dots)$ represents the encoding process of LSTM. Each output unit of the Bi-LSTM is connected to the forward and backward LSTM hidden layers at the same time step. The output vector at time $t$ is as follows:

$z_{t} = σ (W_{h z}^{f} h_{t}^{f} + W_{h z}^{b} h_{t}^{b} + b_{z})$

(23)

where $W_{h z}^{f}$ and $W_{h z}^{b}$ are weight matrices and $b_{z}$ is the bias vector.
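The bidirectional pass and the combination in Equation (23) can be sketched as below. To keep the example short, the recurrent update $H (\dots)$ is passed in as a function of the current input and the previous hidden state only; the `make_tanh_step` helper is a plain tanh recurrent cell used as a stand-in for the LSTM, and all names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_tanh_step(W_s, W_h):
    """Stand-in for H(...): a plain tanh recurrent cell, not a full LSTM."""
    return lambda s_t, h_prev: np.tanh(W_s @ s_t + W_h @ h_prev)

def run_direction(inputs, step, hidden_dim):
    """Apply `step` along the sequence, starting from a zero hidden state."""
    h = np.zeros(hidden_dim)
    states = []
    for s_t in inputs:
        h = step(s_t, h)
        states.append(h)
    return states

def bilstm_outputs(inputs, step_f, step_b, W_f, W_b, b_z, hidden_dim):
    """Eq. (23): combine forward and backward states at every position."""
    h_f = run_direction(inputs, step_f, hidden_dim)
    # Run the backward layer on the reversed sequence, then re-align it
    # so that h_b[t] corresponds to the same word as h_f[t].
    h_b = run_direction(inputs[::-1], step_b, hidden_dim)[::-1]
    return [sigmoid(W_f @ hf + W_b @ hb + b_z) for hf, hb in zip(h_f, h_b)]
```

The re-alignment step matters: without reversing the backward states again, position $t$ would pair the forward state of word $t$ with the backward state of word $n - t + 1$.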

3.: Attention mechanism

Relational description sentences contain head and tail entities whose semantic information can enhance the expression of relational semantics in related triples. However, a relationship description contains multiple words, and not all of them contribute to the semantics being modeled. Because simply aggregating all words would introduce word-level noise, it is necessary to calculate the correlation between each word in the relationship description and its corresponding triple. We add an attention layer to the Bi-LSTM encoder, calculate the weight of each word in the sentence with respect to the relationship based on the attention mechanism and obtain the text vector $r_{d}$ of the relationship according to these weights.

For the word at position $i$ in the relation description, we define its attention weight to the given relation $r$ as $α_{i} (r)$:

$e_{i} (r) = V_{a}^{T} \tanh (W_{a} z_{i} + U_{a} r)$

(24)

$α_{i} (r) = \mathrm{softmax} (e_{i} (r)) = \frac{\exp (e_{i} (r))}{\sum_{j = 1}^{n} \exp (e_{j} (r))}$

(25)

where $z_{i}$ represents the output of the Bi-LSTM hidden layer at position $i$ and $W_{a}$, $U_{a}$ and $V_{a}$ are parameter matrices. The output state $c$ of the attention mechanism layer is calculated as follows:

$c = \sum_{i = 1}^{n} α_{i} (r) \cdot z_{i}$

(26)

Finally, the output vector $c$ of the attention mechanism layer is nonlinearly transformed through the activation function to obtain the relational text vector $r_{d}$:

$r_{d} = σ (c)$

(27)
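Equations (24) to (27) amount to a relation-conditioned weighted sum of the Bi-LSTM outputs. The NumPy sketch below illustrates this under two assumptions: $V_{a}$ is treated as a vector so that each $e_{i} (r)$ is a scalar, and $σ$ in Equation (27) is taken to be the Sigmoid function used in the LSTM equations. The function name and shapes are hypothetical.

```python
import numpy as np

def attention_pool(Z, r, W_a, U_a, v_a):
    """Z: (n, d_z) Bi-LSTM outputs, r: (d_r,) relation vector.
    W_a: (d_a, d_z), U_a: (d_a, d_r), v_a: (d_a,)."""
    scores = np.tanh(Z @ W_a.T + U_a @ r) @ v_a    # Eq. (24), one score per word
    scores = scores - scores.max()                 # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # Eq. (25), softmax over words
    c = alpha @ Z                                  # Eq. (26), weighted sum
    r_d = 1.0 / (1.0 + np.exp(-c))                 # Eq. (27), sigmoid assumed
    return r_d, alpha
```

Subtracting the maximum score before exponentiating does not change the softmax result; it only avoids overflow for large scores.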

4. Experiment and Result Analysis

4.1. Experimental Environment

The hardware environment used in this experiment is Intel Core i5-9300HF, 16 GB memory, GTX1660Ti.

4.2. Dataset

To evaluate the effectiveness of our proposed method, we used FB15K [23], the most common dataset in the KG completion task. FB15K is a subset of Freebase [55], and the text descriptions corresponding to its entities are publicly available. We chose FB15K as the experimental data because it is one of the most commonly used datasets in the field of KG representation learning and has been extensively researched and validated. The dataset contains multiple types of entities and multiple complex relationships, which helps us evaluate the performance of the model on different types of triples. In particular, the attention mechanism in A-Bi-LSTM assigns weights dynamically according to the correlation between words and different relations, and the variety of complex relations in FB15K allows the role of the A-Bi-LSTM module to be verified more thoroughly. The statistical information of the dataset is shown in Table 2.

Entity description refers to the statement that contains the subject entity. We checked whether the entity in the dataset had a corresponding description, filtered a total of 47 entities that were not sufficiently described and entities with a text description length of less than 3, and deleted all triples in the dataset containing the relevant entity. After preprocessing, the average word count of entity descriptions is 69, the shortest description contains 3 words, and the longest description contains 343 words.

Relational description refers to sentences containing entity pairs. We use the existing NYT corpus [15], in which sentences were selected from New York Times articles and aligned with FB15K for joint learning. To ensure alignment accuracy, we only considered sentences with anchor text linking to the entities in FB15K. We extracted 876,227 sentences containing both the head and tail entities of FB15K triples and annotated them with the corresponding relations in the triples. The sentences were labeled with 29,252 FB15K triples, including 629 relations and 5244 entities. Figure 4 shows an instance of a triple in the dataset that contains both the relationship description and the entity description.

4.3. Evaluation Indicators

The KG completion task can be divided into two parts: the link prediction task and triple classification task. The link prediction task can be further divided into the entity prediction task and relationship prediction task. Link prediction tasks and triplet classification tasks have been widely used to evaluate the quality of knowledge representation models.

1.: Link prediction task

The link prediction task is divided into the entity prediction task and the relationship prediction task. The purpose of the entity prediction task is to predict the missing head or tail entity of a triple, that is, to predict $t$ given $(h, r)$ or to predict $h$ given $(r, t)$; the purpose of the relationship prediction task is to predict the missing relationship of a triple, that is, to predict $r$ for a given $(h, t)$.

To measure the performance of the model, for each test triple with a missing entity or relationship, every entity or relationship in the KG $G$ is used to fill the missing position to obtain a candidate triple. Because a larger score means lower credibility of the candidate triple, the evaluation function $‖h_{s} + r_{s} - t_{s}‖$ is used to calculate the score of each candidate triple, and the candidates are arranged in ascending order. Finally, the true triples are ranked and the corresponding Mean Rank (MR) and Hits@N values are calculated.

Mean Rank refers to the average ranking position of the correct entity or relationship over the test set. For the head entity prediction task, the trained model replaces the head entity $h$ of each test triple $(h, r, t)$ with every candidate entity, calculates the score of each generated triple, sorts the scores in ascending order and records the rank of the correct answer. Mean Rank is then calculated as follows:

$MR = \frac{1}{|Q|} \sum_{i = 1}^{|Q|} \mathrm{rank}_{i}$

(28)

where $|Q|$ denotes the number of triples in the test set, and $\mathrm{rank}_{i}$ denotes the ranking position of the $i$th test triple.

Hits@N refers to the proportion of test triples whose correct answer is ranked in the top N. The smaller the MR value, the higher the ranking position of the correct triples; the larger the Hits@N value, the higher the proportion of correct triples ranked in the top N positions. Lower MR and higher Hits@N therefore indicate that the model predicts correct answers better and achieves better knowledge graph completion performance. We call this evaluation setting “Raw”. However, a candidate triple generated by replacement may itself already exist in the knowledge graph, in which case it should be regarded as a valid triple rather than a wrong prediction. Hence, we remove such candidate triples that appear in the training, validation or test sets before ranking. We call this evaluation setting “Filter”. The evaluation results are reported under both settings.
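The evaluation protocol can be made concrete with a small Python sketch. It is an illustrative sketch rather than the evaluation code used in the experiments: candidates are identified by their index in a score list, ties are broken optimistically, and the function names are hypothetical.

```python
import numpy as np

def transe_score(h, r, t):
    """Distance-style score: smaller means a more plausible triple."""
    return float(np.linalg.norm(h + r - t))

def rank_of_correct(scores, correct, known=()):
    """1-based rank of candidate `correct` when scores are sorted ascending.
    `known` lists other candidates that already form true triples; they are
    skipped, which corresponds to the "Filter" setting. Leave it empty for "Raw"."""
    skip = set(known) - {correct}
    better = sum(1 for j, s in enumerate(scores)
                 if j not in skip and s < scores[correct])
    return better + 1

def mean_rank(ranks):
    """Eq. (28)."""
    return sum(ranks) / len(ranks)

def hits_at_n(ranks, n):
    """Proportion of test triples whose correct answer is ranked in the top n."""
    return sum(1 for rank in ranks if rank <= n) / len(ranks)

scores = [0.9, 0.2, 0.5, 0.1]
print(rank_of_correct(scores, correct=2))             # 3 ("Raw")
print(rank_of_correct(scores, correct=2, known=[3]))  # 2 ("Filter")
```

In the example, candidate 3 scores better than the correct candidate 2, but because it is itself a known true triple it no longer counts against the rank in the “Filter” setting.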

2.: Triple classification task

The triple classification task is to judge whether a triple $(h, r, t)$ is correct, and is usually treated as a binary classification problem. Each triple $(h, r, t)$ in the test set is scored by the model. Since a larger score indicates lower credibility, a triple is considered correct if its score is lower than the threshold $δ_{r}$; otherwise it is considered invalid. The threshold $δ_{r}$ is specific to the relationship in the triple and is determined by maximizing the classification accuracy for that relationship on the validation set.

The triplet classification task usually uses accuracy as the evaluation index, which is defined as follows:

$\mathrm{Accuracy} = \frac{\text{Number of correctly classified samples}}{\text{Number of all samples}}$

(29)
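Choosing a relation-specific threshold on the validation set can be sketched as follows. The sketch assumes a distance-style score such as $‖h_{s} + r_{s} - t_{s}‖$, where a smaller score means a more plausible triple, so a triple is predicted correct when its score is below the threshold; with a similarity-style score the comparison would flip. The candidate thresholds (midpoints between sorted scores) and the function names are illustrative choices.

```python
def accuracy(scores, labels, delta):
    """Eq. (29) for one relation: predict 'correct' when score < delta."""
    predictions = [s < delta for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def best_threshold(scores, labels):
    """Pick the threshold that maximizes accuracy on validation triples.
    Candidates: one value below all scores, the midpoints between
    neighbouring sorted scores, and one value above all scores."""
    ordered = sorted(scores)
    candidates = [ordered[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    candidates.append(ordered[-1] + 1.0)
    return max(candidates, key=lambda d: accuracy(scores, labels, d))

val_scores = [0.1, 0.2, 0.8, 0.9]
val_labels = [True, True, False, False]
delta_r = best_threshold(val_scores, val_labels)
```

In practice this selection would be repeated once per relation, using only the validation triples of that relation.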

4.4. Baseline Model

To demonstrate the effectiveness of our model, we compare JKRL with the following two groups of models: (1) translation distance models, including TransE [23], TransH [24], TransR [25] and TransD [26]; (2) joint representation learning models that integrate text description and structure, including DKRL [13], Joint [15], Jointly (Bi-LSTM) [20] and Jointly (A-Bi-LSTM) [20]. To analyze the effect of the model more comprehensively, we also designed three JKRL variants: JKRL(CNN+Bi-LSTM), JKRL(A-Bi-LSTM) and JKRL(CNN+A-Bi-LSTM).

TransE [23]: TransE is the first method proposed for training KG embeddings based on a translation distance function, and it is also one of the most classic and widely used KG embedding methods.

TransH [24]: TransH relates each relationship to a relationship-specific hyperplane to alleviate the modeling problem of complex relationships.

TransR [25]: TransR holds that entities should exhibit different attributes under different relationships, and introduces relationship-specific vector spaces to handle different relationships and mine more semantic information.

TransD [26]: The TransD model decomposes the projection matrix in TransR and introduces the projection vectors of head and tail entities to effectively distinguish the differences between head and tail entities.

DKRL [13]: DKRL introduces entity descriptions into KG representation learning and explores two encoders, the continuous bag-of-words model and a convolutional neural network, to extract the semantic information of entity descriptions. However, DKRL only considers the fusion of entity description information to enrich the semantic information of entity embedding.

Joint [15]: Joint is a joint learning framework based on knowledge graph and relational representation that uses a CNN to encode the relation description. However, Joint only improves the representation of the relationship based on the relationship description.

Jointly (Bi-LSTM) [20]: Jointly (Bi-LSTM) is a joint representation learning framework that integrates text descriptions. This model uses TransE to learn the structure representation of triples and uses Bi-LSTM to encode the entity description information to obtain the text representation of entities. Finally, the gating mechanism is applied to integrate the structure and text representation of entities as the final joint representation.

Jointly (A-Bi-LSTM) [20]: Based on Jointly (Bi-LSTM), the attention mechanism is introduced to capture the most relevant content of sentence level and relationship to model the text representation of entities.

JKRL(CNN+Bi-LSTM): When the JKRL model learns the representation of the relation description, the position feature is removed from the word representation, and a Bi-LSTM without the attention layer is used for encoding.

JKRL(A-Bi-LSTM): A-Bi-LSTM is used to encode the representation learning of entity description and relationship description in the JKRL model. Specifically, in the representation learning of entity description, the relative distance between each word and the head or tail entity in the sentence is added to the word feature representation as a positional feature, and the attention weight between each word and the head or tail entity is calculated at the attention level to obtain the text vector of the entity.

JKRL(CNN+A-Bi-LSTM): This is the JKRL model proposed in this paper. A CNN is used for representation learning for entity description and A-Bi-LSTM is used for representation learning for relational description.

4.5. Experimental Settings

We limited the hyperparameter range of the JKRL model and determined the optimal hyperparameter setting by minimizing the MR value on the validation set. The ranges of the hyperparameters in the experiment were set as follows: the learning rate of the KG structure vector $λ_{G} \in \{0.01, 0.001, 0.0005\}$ and the learning rate of the text vector $λ_{D} \in \{0.01, 0.025, 0.05\}$; $μ$, $η$, $γ$ and the learning rate $λ$ were set to 1. The maximum sentence length used when applying the CNN convolution to entity description text was $L \in \{80, 90, 100\}$. The value range of the balance parameters $α$ and $β$ was $\{0.01, 0.001, 0.0001\}$, and the dimension of all vectors was $k \in \{50, 100, 150, 200\}$. After a grid search over the combinations of these parameters, the optimal configuration of the model was as follows: $λ_{G} = 0.001$, $λ_{D} = 0.025$, $L = 90$, $α = 0.001$, $β = 0.0001$, $k = 100$.
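The grid search over these ranges can be sketched with the Python standard library. The search space below copies the value ranges listed above; `validation_mr` is a hypothetical callback that would train a model with the given configuration and return its Mean Rank on the validation set, and is not part of the original work.

```python
from itertools import product

search_space = {
    "lambda_G": [0.01, 0.001, 0.0005],
    "lambda_D": [0.01, 0.025, 0.05],
    "L": [80, 90, 100],
    "alpha": [0.01, 0.001, 0.0001],
    "beta": [0.01, 0.001, 0.0001],
    "k": [50, 100, 150, 200],
}

def grid(space):
    """Yield every combination of hyperparameter values as a dict."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))

def grid_search(space, validation_mr):
    """Return the configuration with the lowest validation Mean Rank.
    `validation_mr` is a hypothetical train-and-evaluate callback."""
    return min(grid(space), key=validation_mr)

print(len(list(grid(search_space))))  # 972
```

The full grid contains 3 × 3 × 3 × 3 × 3 × 4 = 972 configurations, each of which requires a complete training run, which is why the value ranges are kept small.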

To explore the influence of the balance parameters $α$ and $β$ and the vector dimension $k$ on the final performance of the model, we visualized the changes in the MR value on the validation set for different values of these hyperparameters, with the other hyperparameters fixed at their optimal configuration, as shown in Figure 5.

As can be seen from the figure, the model performance is optimal when the balance parameter $α$ is 0.001 and $β$ is 0.0001. These parameters balance the contribution of the representation learning based on entity descriptions and relation descriptions against the triple structure learning in the joint evaluation function, so as to prevent the loss on textual descriptions from overwhelming the all-important representation of triples. The model performance is optimal when the vector dimension $k$ is 100, which is a small dimension, indicating that the JKRL model can effectively alleviate the sparsity of knowledge graph data.

4.6. Analysis of Experimental Results

1.: Entity prediction task

Table 3 shows the experimental results of the JKRL and baseline models in the entity prediction task on the FB15K dataset. Through the observation and analysis of the data in the table, we can draw the following conclusions:

Joint representation learning models combining structure and text description are better than the Trans series models based on structural representation in Mean Rank and Hits@10, indicating that effectively integrating the text description can provide rich semantic resources for knowledge graph representation, thus improving entity and relationship representation. It plays an important auxiliary role in the optimization of knowledge graph representation learning.
JKRL outperforms other baseline models that combine structure and text description for joint representation in Mean Rank and Hits@10, indicating that the JKRL model architecture can extract features from the entity description and relationship description, so as to effectively conduct joint representation learning based on structure and text description. Since DKRL and Jointly only consider the fusion of entity description information, and Joint only considers the fusion of relationship description information, they only use text descriptions to enrich the representation of entities or relationships from one perspective, resulting in limited improvement in representation. JKRL combines entity description and relation description to carry out joint representation learning of knowledge graphs. In addition, according to the characteristics of and differences in the two pieces of descriptive information, the corresponding coding model is designed to fully extract the deep semantic features in the text description, thus effectively improving the representation learning ability of knowledge graph.
In particular, JKRL(A-Bi-LSTM) achieved better experimental results than Jointly(A-Bi-LSTM), which indicates that JKRL can better extract word order features by introducing word position features and using A-Bi-LSTM to encode relationship descriptions. It also shows that combining entity descriptions and relationship descriptions to improve knowledge graph representation learning is better than using only one kind of text description.
The metrics of JKRL(CNN+A-Bi-LSTM) are significantly better than those of JKRL(CNN+Bi-LSTM). The attention mechanism of JKRL(CNN+A-Bi-LSTM) dynamically allocates weights according to the correlation between words and different relationships. Because the FB15K dataset contains a variety of complex relationships, the attention mechanism can play a greater role, which shows the effectiveness of the Attention-Bi-LSTM encoder of the model.
JKRL(A-Bi-LSTM) adopts the A-Bi-LSTM encoder for both entity description and relation description, and introduces word position features in both. However, compared with JKRL(CNN+A-Bi-LSTM), the performance did not improve and even decreased. This shows that A-Bi-LSTM is less effective than CNN at extracting text features from entity descriptions. Since the entity description text may present an entity from various aspects, and the position of the head and tail entities in the text has no direct influence on the semantic expression of the text, the attention mechanism and position features in the A-Bi-LSTM encoder do not play an effective role in representing the entity description. They may even introduce noise that degrades the entity representation, which lowers the overall joint learning performance.

The relationship types in the KG can be divided into four categories: 1-1 relationship, 1-N relationship, N-1 relationship and N-N relationship. We further compared the prediction performance of the model under different relationship types. Table 4 details the average Hits@10 values of the predicted head entity and the predicted tail entity under different relationship types.

Through the observation and analysis of the data in the table, we can draw the following conclusions:

The joint representation model that combines structure and text description is superior to the Trans series model in overall entity prediction results. TransD enables the interaction between entities and relationships and also achieved good entity prediction performance. This shows that in addition to optimizing the knowledge graph representation based on the triple structure, the semantic representation of entities or relationships can be effectively enhanced by fusing the triple structure and text description, and the representation ability of the knowledge graph can be improved.
Compared with the structure-based Trans series representation methods, JKRL outperforms the Trans series models in all indicators. Compared with joint representation models that fuse structure and textual descriptions, JKRL shows the best predictive performance for most relation types. It shows that JKRL not only effectively encodes the representation of the triplet structure, entity description and relationship description text, but also effectively balances the contribution of each vector representation in the learning process through the joint evaluation function, and jointly promotes the representation ability of the KG, thus improving the performance of entity prediction.
Joint has the best entity prediction performance among all baseline models. JKRL has similar indexes to Joint in N-1, N-N and 1-N relations and is significantly better than Joint in other multiple indexes, which indicates the importance of introducing entity description and relationship description in the JKRL model. Entity description encoding based on CNN and relationship description encoding based on Attention-Bi-LSTM can effectively extract text features and improve the model’s ability of knowledge representation.
By comparing JKRL and its variant models, we can rank their entity prediction capabilities as follows: JKRL(CNN+A-Bi-LSTM) > JKRL(A-Bi-LSTM) > JKRL(CNN+Bi-LSTM). This shows that the attention mechanism in A-Bi-LSTM plays an extremely important role in encoding the relational description. The attention mechanism calculates the semantic relevance of each word and different relations in the relational description, so as to dynamically select the most relevant information from the relational description and obtain the accurate relational text representation. However, due to the high correlation between the entity description text and the entity itself, the relative distance between the word and the entity has little connection with the expression of the word semantics. If A-Bi-LSTM is used to encode entity description, the attention mechanism has limited improvement in the extraction of semantic features, and the introduction of position features will bring noise and cause the overall model performance to decline. However, CNN can effectively capture local features in the entity description text by sliding on the text sequence through the convolution kernel and has the advantage of parallel computation with higher computational efficiency.

In addition, we visually compared the prediction performance of Joint, Jointly(Bi-LSTM), Jointly(A-Bi-LSTM), JKRL(CNN+Bi-LSTM), JKRL(A-Bi-LSTM) and JKRL(CNN+A-Bi-LSTM) under different relationship types. The specific comparison results are shown in Figure 6, where “PH” and “PT” represent head entity prediction and tail entity prediction, respectively.

2.: Triple classification task

Table 5 shows the experimental results of the triple classification tasks of all models on the FB15K dataset.

Through the observation and analysis of the data in the table, we can draw the following conclusions:

Compared with all baseline models, JKRL has the best performance, with an accuracy rate of 93.2% on the triplet classification task, which greatly exceeds the structure-based Trans series representation models. Compared with DKRL and Joint, it increased by 3.1% and 4.6%, respectively, and the accuracy was significantly improved. Compared to Jointly, the accuracy was improved by 1.7%. This proves the effectiveness of JKRL, which can effectively extract the semantic features of text description to improve the representation of entities and relations.
The triple classification accuracy of the JKRL(CNN+Bi-LSTM) model is 91.8%, which is 1.4% lower than that of the JKRL(CNN+A-Bi-LSTM) model. This shows the effectiveness of the Attention-Bi-LSTM in encoding the relational description. The attention mechanism is introduced to calculate the weight of each word, which reduces word-level noise in the text description and dynamically selects the most relevant content from the relationship description for modeling, so as to obtain a more accurate text representation of the relationship.
JKRL(A-Bi-LSTM) uses A-Bi-LSTM to encode both entity description and relationship description. In the case of increased model complexity, the accuracy rate is slightly decreased, decreasing by 0.6% compared with JKRL(CNN+A-Bi-LSTM). This once again verifies the validity of entity description encoding using CNN, which can not only effectively capture the deep semantic features in the entity description text, but also has higher computational efficiency.

3.: Runtime comparison

Different model architectures have a great impact on the computational workload. We calculated the training time consumption of JKRL and its variant models for triplet classification tasks, as shown in Table 6.

As can be seen from the table, JKRL(CNN+A-Bi-LSTM) took 16.5% longer to train than JKRL(CNN+Bi-LSTM). On the basis of Bi-LSTM, the attention mechanism calculates the semantic relevance between each word in the relationship description and the different relations. Because the text of a relationship description is short and contains relatively few words, only a small number of word weights need to be calculated for each description, so A-Bi-LSTM effectively improves the representation of relational descriptions with only a small increase in training time. JKRL(CNN+A-Bi-LSTM) takes 60.4% less training time than JKRL(A-Bi-LSTM). Since the entity description text is long and large in volume, using a CNN to encode it effectively captures the local features in the entity description text and benefits from parallel computation. Therefore, it not only captures the deep semantic features of the relational description more effectively, but also greatly improves the operating efficiency of the whole model.

5. Conclusions

In view of the fact that most knowledge representation models based on text description cannot fully mine the semantic features of text description, this paper proposes a KG representation method, JKRL, based on structure and text description, which jointly learns the structure-based representation and text-based representation of entities and relationships and effectively balances the contribution of triple structure and text description in the process of vector joint representation learning. The method is based on training the structured representation of entities and relations using TransE and using a convolutional neural network to encode the entity descriptions for obtaining textual representations of entities. Specifically, to extract the semantic features of the hidden text relations in the relationship descriptions, an Attention-Bi-LSTM text encoder is designed to encode the relationship descriptions. Firstly, position features are introduced into the word features, then Bi-LSTM is used to encode the words, and finally, an attention layer is used to compute the weights of each word based on the attention mechanism and dynamically select the most relevant content based on the weights to obtain the textual representation of the relationship. This method performs joint representation learning based on structured and textual representations, effectively balancing the contribution of structured and textual representations in the vector learning process. The experimental results show that the JKRL model proposed in this paper has strong competitiveness compared with the most advanced methods and can improve the quality of the vector representation of entities and relationships in the KG.

The method proposed in this paper takes the perspective of additional information: it introduces text descriptions and improves how they are modeled in order to mine richer and more accurate semantic features and thereby strengthen knowledge representation. However, this study does not consider other sources of information, such as relationship paths and entity types. Integrating multiple sources of information with the triple structure to further improve the accuracy of knowledge representation is a direction for future research.

Author Contributions

Conceptualization, G.X. and Q.Z.; methodology, G.X. and Q.Z.; investigation, Q.Z.; software, G.X.; validation, D.Y. and Y.L.; formal analysis, Q.Z. and D.Y.; resources, G.X.; data curation, G.X.; writing—original draft preparation, G.X.; writing—review and editing, Q.Z. and S.L.; supervision, S.L. and Y.L.; project administration, D.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Water Resources Science and Technology Projects in Jiangsu Province under grant number 2017065 and the National Key R&D Program of China under grant number 2018YFC0407106.

Data Availability Statement

Not applicable.

Acknowledgments

The authors are grateful to the editor and anonymous reviewers for their suggestions for improving the quality of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

Li, M.; Sun, Z.; Zhang, S.; Zhang, W. Enhancing knowledge graph embedding with relational constraints. Neurocomputing 2021, 429, 77–88.
Li, Z.; Liu, H.; Zhang, Z.; Liu, T.; Shu, J. Recalibration convolutional networks for learning interaction knowledge graph embedding. Neurocomputing 2021, 427, 118–130.
Gong, F.; Wang, M.; Wang, H.; Wang, S.; Liu, M. SMR: Medical knowledge graph embedding for safe medicine recommendation. Big Data Res. 2021, 23, 100174.
Gesese, G.A.; Biswas, R.; Sack, H. A Comprehensive Survey of Knowledge Graph Embeddings with Literals: Techniques and Applications. DL4KG@ESWC 2019, 2377, 31–40.
Wang, M.; Qiu, L.; Wang, X. A survey on knowledge graph embeddings for link prediction. Symmetry 2021, 13, 485.
Ferrari, I.; Frisoni, G.; Italiani, P.; Moro, G.; Sartori, C. Comprehensive Analysis of Knowledge Graph Embedding Techniques Benchmarked on Link Prediction. Electronics 2022, 11, 3866.
Xu, Z.; Sheng, Y.; He, L.; Wang, Y. Review on knowledge graph techniques. J. Univ. Electron. Sci. Technol. China 2016, 45, 589–606.
Shu, S.; Li, S.; Hao, X.; Zhang, L. Knowledge graph embedding technology: A review. J. Front. Comput. Sci. Technol. 2021, 15, 2048.
Xie, Q.; Ma, X.; Dai, Z.; Hovy, E. An interpretable knowledge transfer model for knowledge base completion. arXiv 2017, arXiv:1704.05908.
Shi, B.; Weninger, T. ProjE: Embedding projection for knowledge graph completion. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
Frisoni, G.; Moro, G.; Carlassare, G.; Carbonaro, A. Unsupervised event graph representation and similarity learning on biomedical literature. Sensors 2021, 22, 3.
Mintz, M.; Bills, S.; Snow, R.; Jurafsky, D. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Singapore, 2–7 August 2009; pp. 1003–1011.
Xie, R.; Liu, Z.; Jia, J.; Luan, H.; Sun, M. Representation learning of knowledge graphs with entity descriptions. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016.
Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
Han, X.; Liu, Z.; Sun, M. Joint representation learning of text and knowledge for knowledge graph completion. arXiv 2016, arXiv:1611.04125.
Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph and text jointly embedding. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1591–1601.
Zhong, H.; Zhang, J.; Wang, Z.; Wan, H.; Chen, Z. Aligning knowledge and text embeddings by entity descriptions. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 267–272.
Zhang, D.; Yuan, B.; Wang, D.; Liu, R. Joint semantic relevance learning with text data and graph knowledge. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and Their Compositionality, Beijing, China, 26–31 July 2015; pp. 32–40.
He, M.; Du, X.; Wang, B. Representation learning of Knowledge Graphs via fine-grained relation description combinations. IEEE Access 2019, 7, 26466–26473.
Xu, J.; Chen, K.; Qiu, X.; Huang, X. Knowledge graph representation with jointly structural and textual encoding. arXiv 2016, arXiv:1611.08661.
Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. In Proceedings of the Advances in Neural Information Processing Systems 26 (NIPS 2013), Lake Tahoe, NV, USA, 5–10 December 2013.
Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014.
Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015.
Ji, G.; He, S.; Xu, L.; Liu, K.; Zhao, J. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26–31 July 2015; Volume 1, pp. 687–696.
Yang, S.; Tian, J.; Zhang, H.; Yan, J.; He, H.; Jin, Y. TransMS: Knowledge Graph Embedding for Complex Relations by Multidirectional Semantics. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019; pp. 1935–1942.
Yang, B.; Yih, W.-T.; He, X.; Gao, J.; Deng, L. Embedding entities and relations for learning and inference in knowledge bases. arXiv 2014, arXiv:1412.6575.
Nickel, M.; Rosasco, L.; Poggio, T. Holographic embeddings of knowledge graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016.
Liu, H.; Wu, Y.; Yang, Y. Analogical inference for multi-relational embeddings. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 2168–2178.
Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex embeddings for simple link prediction. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 2071–2080.
Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2d knowledge graph embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
Nguyen, D.Q.; Nguyen, T.D.; Nguyen, D.Q.; Phung, D. A novel embedding model for knowledge base completion based on convolutional neural network. arXiv 2017, arXiv:1712.02121.
Lin, Y.; Liu, Z.; Luan, H.; Sun, M.; Rao, S.; Liu, S. Modeling relation paths for representation learning of knowledge bases. arXiv 2015, arXiv:1506.00379.
Feng, J.; Huang, M.; Yang, Y.; Zhu, X. GAKE: Graph aware knowledge embedding. In Proceedings of the COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 11–16 December 2016; pp. 641–651.
Wang, Z.; Li, J.; Liu, Z.; Tang, J. Text-enhanced representation learning for knowledge graph. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), New York, NY, USA, 9–15 July 2016; pp. 4–17.
An, B.; Chen, B.; Han, X.; Sun, L. Accurate text-enhanced knowledge graph representation learning. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; Volume 1, pp. 745–755.
Yao, L.; Mao, C.; Luo, Y. KG-BERT: BERT for knowledge graph completion. arXiv 2019, arXiv:1909.03193.
Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
Wang, B.; Shen, T.; Long, G.; Zhou, T.; Wang, Y.; Chang, Y. Structure-augmented text representation learning for efficient knowledge graph completion. In Proceedings of the Web Conference 2021, Virtual, 19–23 April 2021; pp. 1737–1748.
Shen, J.; Wang, C.; Gong, L.; Song, D. Joint language semantic and structure embedding for knowledge graph completion. arXiv 2022, arXiv:2209.08721.
Chen, M.; Tian, Y.; Chang, K.-W.; Skiena, S.; Zaniolo, C. Co-training embeddings of knowledge graphs and entity descriptions for cross-lingual entity alignment. arXiv 2018, arXiv:1806.06478.
Cochez, M.; Garofalo, M.; Lenßen, J.; Pellegrino, M.A. A first experiment on including text literals in KGloVe. arXiv 2018, arXiv:1807.11761.
Wu, Y.; Wang, Z. Knowledge graph embedding with numeric attributes of entities. In Proceedings of the Third Workshop on Representation Learning for NLP, Melbourne, Australia, 20 July 2018; pp. 132–136.
Trisedya, B.D.; Qi, J.; Zhang, R. Entity alignment between knowledge graphs using attribute embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 297–304.
Pezeshkpour, P.; Chen, L.; Singh, S. Embedding multimodal relational data for knowledge base completion. arXiv 2018, arXiv:1809.01341.
Xie, R.; Liu, Z.; Sun, M. Representation learning of knowledge graphs with hierarchical types. In Proceedings of the IJCAI, New York, NY, USA, 9–15 July 2016; pp. 2965–2971.
Esteban, C.; Tresp, V.; Yang, Y.; Baier, S.; Krompaß, D. Predicting the co-evolution of event and knowledge graphs. In Proceedings of the 2016 19th International Conference on Information Fusion (FUSION), Heidelberg, Germany, 5–8 July 2016; pp. 98–105.
Guan, S.; Jin, X.; Wang, Y.; Cheng, X. Link prediction on n-ary relational data. In Proceedings of The World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 583–593.
Rosso, P.; Yang, D.; Cudré-Mauroux, P. Beyond triplets: Hyper-relational knowledge graph embedding for link prediction. In Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 1885–1896.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
Chen, Y. Convolutional Neural Network for Sentence Classification; University of Waterloo: Waterloo, ON, Canada, 2015.
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the Advances in Neural Information Processing Systems 26 (NIPS 2013), Lake Tahoe, NV, USA, 5–10 December 2013.
Wang, J.Q.; Du, Y.; Wang, J. LSTM based long-term energy consumption prediction with periodicity. Energy 2020, 197, 117197.
Bollacker, K.; Cook, R.; Tufts, P. Freebase: A shared database of structured general human knowledge. In Proceedings of the AAAI, Vancouver, BC, Canada, 22–26 July 2007; pp. 1962–1963.

Figure 1. Overall architecture of JKRL model.

Figure 2. Entity description text encoding process diagram.

Figure 3. Attention-Bi-LSTM encoder.

Figure 4. Example of a triplet in the dataset that contains text description.

Figure 5. (a) Variation trend of MR under different values of balance parameters $α$ and $β$; (b) variation trend of MR under different values of the vector dimension $k$.

Figure 6. Comparison of entity prediction results of the model under different relationship types.

Table 1. Symbols used in the JKRL model.

Symbol	Description
$h, r, t$	Head entity, relation and tail entity
$h_{s}, r_{s}, t_{s}$	Structure vectors of head entity, relation and tail entity
$h_{d}, r_{d}, t_{d}$	Text vectors of head entity, relation and tail entity
$x_{i}$	Word vector
$α_{i} (r)$	The attention weight of word $i$ under relation $r$
$s$	Text description statement
$T$	Text description statement set

Table 2. Statistics of FB15k dataset.

Dataset	Number of Entities	Number of Relations	Training Set	Validation Set	Test Set
FB15k	14,904	1341	472,860	48,991	57,803

Table 3. Entity prediction results.

Model	Mean Rank (Raw)	Mean Rank (Filter)	Hits@10 (Raw)	Hits@10 (Filter)
TransE	243	125	0.349	0.471
TransH	212	87	0.457	0.644
TransR	198	77	0.482	0.687
TransD	194	81	0.534	0.773
DKRL	181	91	0.496	0.674
Jointly(Bi-LSTM)	179	90	0.493	0.697
Jointly(A-Bi-LSTM)	167	73	0.529	0.755
Joint	—	—	—	0.787
JKRL(CNN+Bi-LSTM)	162	78	0.538	0.786
JKRL(A-Bi-LSTM)	154	71	0.559	0.802
JKRL(CNN+A-Bi-LSTM)	148	67	0.574	0.813

Table 4. Entity prediction results under different relationship types.

Model	Head 1-1	Head 1-N	Head N-1	Head N-N	Tail 1-1	Tail 1-N	Tail N-1	Tail N-N	Overall
TransE	0.437	0.657	0.182	0.472	0.437	0.197	0.667	0.500	0.471
TransH	0.668	0.876	0.287	0.645	0.655	0.398	0.833	0.672	0.644
TransR	0.788	0.892	0.341	0.692	0.792	0.374	0.904	0.721	0.687
TransD	0.861	0.955	0.398	0.785	0.854	0.506	0.944	0.812	0.773
DKRL	—	—	—	—	—	—	—	—	0.674
Jointly(Bi-LSTM)	0.813	0.889	0.188	0.452	0.801	0.254	0.896	0.524	0.697
Jointly(A-Bi-LSTM)	0.838	0.951	0.211	0.479	0.830	0.308	0.947	0.531	0.755
Joint	0.827	0.891	0.450	0.807	0.817	0.577	0.874	0.828	0.787
JKRL(CNN+Bi-LSTM)	0.852	0.919	0.402	0.765	0.838	0.512	0.917	0.803	0.786
JKRL(A-Bi-LSTM)	0.867	0.935	0.411	0.776	0.848	0.519	0.938	0.818	0.802
JKRL(CNN+A-Bi-LSTM)	0.872	0.958	0.424	0.780	0.865	0.527	0.955	0.832	0.813

Table 5. Experimental results of triplet classification.

Model	Accuracy
TransE	0.798
TransH	0.877
TransR	0.839
TransD	0.880
DKRL	0.901
Jointly(Bi-LSTM)	0.905
Jointly(A-Bi-LSTM)	0.915
Joint	0.886
JKRL(CNN+Bi-LSTM)	0.918
JKRL(A-Bi-LSTM)	0.926
JKRL(CNN+A-Bi-LSTM)	0.932

Table 6. Training time results of JKRL and its variant models on triplet classification.

Model	Training Time (h)	Triple Classification Accuracy
JKRL(CNN+Bi-LSTM)	8.7834	0.918
JKRL(A-Bi-LSTM)	25.8745	0.926
JKRL(CNN+A-Bi-LSTM)	10.2359	0.932

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
