
Enhancing Knowledge of Propagation-Perception-Based Attention Recommender Systems

1 College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
2 Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan 266590, China
3 Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 266590, China
* Author to whom correspondence should be addressed.
Electronics 2022, 11(4), 547; https://doi.org/10.3390/electronics11040547
Submission received: 7 January 2022 / Revised: 6 February 2022 / Accepted: 8 February 2022 / Published: 11 February 2022
(This article belongs to the Special Issue Recommender Systems: Approaches, Challenges and Applications)

Abstract: Researchers have introduced side information such as social networks or knowledge graphs to alleviate the data sparsity and cold-start problems in recommender systems. However, most methods ignore feature differentiation in the knowledge propagation process. To solve this problem, we propose a new attention recommendation method based on enhanced knowledge propagation perception. Specifically, to capture user preferences in the knowledge graph in a fine-grained manner, an asymmetric semantic attention mechanism is adopted; it identifies the influence of propagation neighbors on user preferences through a more precise representation of the preference semantics of head and tail entities. Furthermore, to account for the memorization and generalization of features at different propagation depths and to adaptively adjust the propagation weights, a new propagation feature exploration framework is designed. The performance of the proposed model is validated on two real-world datasets: it improves on the baseline models by an average of 9.65% and 9.15% in the Area Under the Curve (AUC) and Accuracy (ACC) indicators, which proves the effectiveness of the model.

1. Introduction

With the rapid growth of data, it has become difficult to satisfy users' personalized information needs. To help users find items of interest, researchers proposed recommender systems (RS) [1], which have been applied in e-commerce [2], movies [3,4], books and other fields. In recent years, driven by their growing adoption and strong performance on commercial websites such as Amazon and YouTube, RS have become a research hotspot.
The collaborative filtering (CF) algorithm is a successful method in recommender systems. It uses users' interaction records to measure the similarity of users or items and thereby model user preferences. On the assumption that similar users have similar preferences, it recommends the most similar items. CF is mainly divided into two categories [5]: user-based recommendation and item-based recommendation. However, in Internet application scenarios, on the one hand, the number of users is much larger than the number of items, and computing user similarity consumes considerable resources. On the other hand, a single user has only a few records in the historical interaction matrix, so the matrix is highly sparse, which harms the recommendation result. Although item-based recommendation alleviates the first issue, it is still affected by data sparsity. To address these problems and improve the recommendation effect, RS introduce auxiliary side information, such as social networks [6], text and video content [7], item reviews [8], etc. Among them, knowledge graphs (KG) are another kind of efficient side information.
A KG [9] is a large heterogeneous graph composed of triples (head entity, relationship, tail entity), which clearly expresses the relationships among entities in the data. In recent years, many large-scale KGs have appeared, such as DBpedia [10], extracted from Wikipedia, and YAGO [11], a knowledge base developed by the Max Planck Institute in Germany. Because the semantic relationships in a KG alleviate the cold-start and interpretability problems of recommender systems [12] and enhance the capture of user interest, KG-based recommendation has shown strong potential and attracted the attention of researchers. The recommendation process based on a KG is illustrated in Figure 1. Finding recommended items through the relationships provided by the KG improves the diversity and interpretability of recommendations.
Recent research has produced a large number of algorithms that combine KGs to address sparse interaction data. For example, DKN [13] is a news recommendation method that integrates relevant KG information into the news semantics and then uses a convolutional neural network (CNN) to make the final recommendation. KGCN, proposed by Wang et al. [14], introduced graph neural networks to learn accurate user and item embeddings through knowledge graph aggregation. RippleNet [15] locates the entities corresponding to a user's historical items in the KG and then obtains, through relations, sets of entities at different propagation hops; user preferences are obtained by aggregating the entity features of each hop. It provides a new idea for recommender systems based on KG propagation. Later, AKUPM [16] introduced a different, tree-like approach that explores the relationships of entities at each level using a self-attention mechanism. Recently, Wang et al. [17] combined multi-task learning with RippleNet to improve the recommendation effectiveness of outward propagation over knowledge graphs.
However, most existing outward propagation methods ignore how preferences are expressed in the semantic features of entities during propagation. This may limit the accuracy of the propagation model when predicting items for users, which in turn affects the recommendation gain in the scenario. First, the preference semantics that a head entity conveys to a user differ under different relationships. Second, using only the head entity and the relationship to weight the tail entity cannot clearly express the one-to-many relationships of triples. For example, the two triples (Interstellar, genre, Adventure) and (Interstellar, genre, Fantasy) shown in Figure 1 share the same head entity and relationship, so it is inappropriate for the two tail entities to express the same preference semantics. In terms of feature aggregation at different depths, existing KG propagation methods simply sum the results of each propagation to obtain deep-level features as the user preference representation. Although the summed higher-order propagated features contain lower-order features, the lower-order features are continuously diluted during the propagation process. The lower-order features are closer to the user's original interaction entities and have a deeper impact on user preferences. Therefore, such aggregation greatly restricts the expression of preferences.
To solve the above problems, we design a new recommendation method, Enhanced Knowledge Propagation Perception Net (EKPNet). Its purpose is click-through rate prediction from implicit feedback. First, we initialize the user's click history on items and propagate the clicked historical items through the KG to enhance the model's expression of user interest. Then, we use an asymmetric semantic attention mechanism: when sampling in the KG, the head and tail entities are mapped to their corresponding preference semantic spaces, thus better expressing the influence of different entities on user preferences. Next, to address the negative impact of propagation depth on the model, we use a new propagation feature exploration architecture that preserves features at different propagation depths and extends the model with a nonlinear neural network to account for feature interactions, while taking into account the memorization and generalization of features at different depths. Finally, we use numerical simulation [18] to discuss the effectiveness of EKPNet. In summary, the contributions of this article are as follows:
  • We design an attention mechanism with asymmetric semantics in EKPNet for KG propagation. It enhances the mining of user preferences by mapping the semantics of head and tail entities into different preference spaces.
  • A new propagation feature exploration framework is proposed. A deep learning network is used to explore the entity features aggregated at different propagation depths in the KG. It balances both the memorization and generalization of features and adaptively adjusts the weights of different depths.
  • We test our model through extensive experiments on two real-world public datasets. Compared with several state-of-the-art baselines on multiple indicators, EKPNet achieves a substantial improvement.
The structure of this article is as follows: Section 2 reviews related work. Section 3 describes the basic concepts used in the paper and the problem to be solved. Section 4 presents the model architecture and each of its modules, as well as the loss function, optimization process and time complexity. Section 5 presents the experimental configuration, followed by numerical simulations for comparison and a discussion of the results. Finally, conclusions and future work are presented in Section 6.

2. Related Work

2.1. Knowledge Graph-Based Recommender

The KG is a new research area for recommender systems and has developed rapidly in both industry and academia. At this stage, recommendation methods based on knowledge graphs are divided into three categories: path-based methods, embedding-based methods, and unified methods.
The first type is path-based methods, first developed in [19]. Generally, the similarity of meta-paths and user–item paths is used as the recommendation indicator to enrich the representation of users or items. For example, Yu et al. [20] first designed multiple meta-paths in the HETE-MF algorithm, then used them to calculate similarities among items to obtain a similarity matrix, and finally achieved efficient recommendation through matrix factorization. Zhao et al. [21] introduced the concept of the meta-graph to break through the limitation of a single path, and then used Factorization Machines (FM) and matrix factorization to further mine the path information. However, these methods rely on domain experts to manually design meta-paths, so many feature combinations are missed; moreover, the countless entity relationships present in massive data make manual design difficult to implement.
The second type is embedding-based methods. Usually, this kind of method uses Knowledge Graph Embedding (KGE) [22] to map entities and relationships into low-dimensional embedding vectors and then integrates the embeddings into the recommendation framework. For example, Zhang et al. [23] combined three types of data simultaneously in the CKE framework: the knowledge graph is encoded with the distance model TransR [24], text and visual content are extracted through auto-encoders, and the resulting embeddings are merged into the framework. However, it requires a large amount of heterogeneous auxiliary information, which is unrealistic for real-world recommendations. Similarly, in [13], the DKN architecture was proposed, which utilizes a CNN to learn title content and an attention mechanism to learn from historical clicks. Recently, some researchers have applied Graph Neural Networks (GNN) to recommender systems. For example, the BEM framework [25] uses a GNN to learn behavior graphs and the distance model TransE [26] to learn knowledge association graphs, and finally feeds the separately obtained embedding vectors into a Bayesian framework. All in all, although embedding-based methods are widely used in recommendation and are flexible across multiple data types, they are more suited to link prediction problems than to recommendation problems.
The third type is unified methods, which combine the advantages of the first two. They not only use the entity semantics of users and items but also explore the semantic connectivity of the knowledge graph. The overall idea is to use the embeddings of historical interaction items together with the embeddings of their multi-hop neighbors to make recommendations. For example, RippleNet, proposed by Wang et al. [15], provides an efficient end-to-end framework that propagates the user's historical interaction items through the knowledge graph to enhance the user embedding; this combination has been proven more effective. Later, Tang et al. [16] proposed a different tree-based method, AKUPM, which uses the self-attention mechanism to explore the relationships among entities at each level to improve recommendation quality. Recently, Wang et al. [17] combined multi-task learning with RippleNet to improve recommendation effectiveness. However, existing methods do not focus on the influence of users' preferences on the semantic features of entities in knowledge propagation, so we design a new architecture for exploring entity preference features during knowledge propagation.

2.2. Attention Mechanism

The attention mechanism simulates how humans naturally observe things, focusing weight learning on the most relevant parts to improve task performance [27]. It has achieved good results in computer vision [28], waveform analysis [29] and natural language processing [30], and has gradually appeared in recommender systems as well. For example, in [31], the attention mechanism was combined with FM to distinguish the importance of different feature interactions. Zhou et al. [32] proposed DIN in the e-commerce context, using the attention mechanism to explore the importance of users' different historical interaction behaviors. Wang et al. [13] presented DKN, which introduces an attention network to calculate the weight of each historical item with respect to the target item and thus explore the expression of user interest. In [33], a self-attentive integration network framework was presented to capture the interactions among features; its computational focus shifts from between different sequences to within a sequence itself, and side information is adaptively integrated through an integration layer. AKUPM introduces a self-attention mechanism into KG propagation and pays more attention to the relationships among tail entities within each layer of the propagation. Tu et al. [34] then proposed a graph attention method for knowledge graph aggregation; it combined knowledge distilling and refining with a graph attention network for the first time and obtained better results in knowledge graph aggregation research.
In our work, an attention mechanism with asymmetric semantics is designed for outward knowledge graph propagation. It uses the preference semantics of both head and tail entities in the attention mechanism and therefore efficiently filters out irrelevant entities.

3. Preliminaries

Before introducing the EKPNet model, it is necessary to clarify the basic concepts and objectives of the task.

3.1. Implicit Feedback

Implicit feedback refers to historical data that do not directly express users' preferences, such as browsing, watching, clicking and other behaviors that indirectly reflect the users' point of view [35]. Compared with explicit feedback (such as ratings), implicit feedback is less sparse and contains richer information for mining user preferences. In the classic recommendation scenario, the user set is expressed as $U = \{u_1, u_2, \ldots\}$ and the item set as $O = \{o_1, o_2, \ldots\}$. User–item pairs are represented by the interaction matrix $Y = \{y_{uo} \mid u \in U, o \in O\}$, where

$$y_{uo} = \begin{cases} 1, & \text{user } u \text{ interacts with item } o;\\ 0, & \text{otherwise}, \end{cases}$$

indicates whether there is an interaction between the user and the item; it is the record of implicit feedback. In particular, an element with value 1 in Y only means that the user and the item have interacted; it does not mean that the user likes the item. Similarly, an element of 0 only means that there was no interaction; it does not imply that the user dislikes the item. In this article, we regard the samples with interaction (i.e., $y_{uo} = 1$) as positive instances and the samples without interaction (i.e., $y_{uo} = 0$) as negative instances.
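As a concrete illustration (a minimal sketch of our own; the toy data and variable names are not from the paper), the interaction matrix Y of Formula (1) can be built as a binary array:

```python
import numpy as np

# Hypothetical toy data: 3 users, 4 items, observed (user, item) interactions.
num_users, num_items = 3, 4
clicks = [(0, 1), (0, 3), (1, 0), (2, 2)]

# Interaction matrix Y: y_uo = 1 if user u interacted with item o, else 0.
Y = np.zeros((num_users, num_items), dtype=np.int8)
for u, o in clicks:
    Y[u, o] = 1

# A 1 only records that an interaction happened, not that the user liked the
# item; a 0 only records the absence of an interaction.
```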

3.2. Knowledge Graph

The other part of the data is the knowledge graph G. It is composed of head entity–relationship–tail entity triples (h, r, t), which represent the relationships among entities, and is an efficient data source for recommender systems. Following previous work, we assume that each item $o \in O$ in the interaction matrix Y has a corresponding entity in G.

3.3. Problem Formulation

In this article, our task is to use the interaction matrix Y and the knowledge graph G to predict the probability that a user will adopt a target item with which they have not yet interacted. More specifically, we have to learn a function

$$\hat{y}_{uo} = F(u, o \mid \theta, G)$$

where $\hat{y}_{uo}$ is the predicted probability of interaction between user u and target item o, $\theta$ denotes the parameters of the prediction model F, and G is the knowledge graph used for prediction. According to the predicted probabilities, we can also provide users with a top-N recommendation list.

4. The Proposed Method

The architecture of the EKPNet model is shown in Figure 2. It is composed of three main parts. The first is the entity propagation layer, which obtains sets of entity representations at different propagation hops from the user's history set and the knowledge graph. The second is the asymmetric attention mechanism layer, used for high-quality learning of user preferences from triples. The third is the propagation feature exploration prediction layer, whose function is to explore the per-hop propagation features returned from the previous layer at different levels and finally produce the prediction result.

4.1. Knowledge Graph Propagation

The knowledge graph expresses the relationship information among entities. Connected entities are strongly related, so they have great potential for exploring user interests. However, not all entities are useful for expressing a given user's preferences; introducing entities unrelated to the user into the preference expression adds considerable noise, which harms the recommendation result. Therefore, to filter the noise of irrelevant entities, we use the knowledge graph propagation method: starting from the user's interaction entities, we follow relational connections to explore user-related entities. Assume that the set of all entities of knowledge graph G is E and the relation set is R.
First, for a certain user u, we retrieve the entities corresponding to the user's historical interaction items in the KG and obtain the entity set of user u at the 0-th hop (before propagation starts):

$$S_u^0 = \{s_{u,0}^0, s_{u,1}^0, \ldots, s_{u,i}^0, \ldots, s_{u,n_0}^0\}$$

where $s_{u,i}^0$ is the i-th entity of user u in the 0-th hop set, and $s_{u,i}^0 \in E = \{e_1, e_2, \ldots\}$. The superscript indicates the current propagation count, and $n_0$ is the number of entities at the 0-th hop.
Then, as shown in Figure 3, we use the relationships of the triples as links to obtain the multi-hop recursive entity sets of user u:

$$S_u^p = \{t_u^p \mid (h_u^p, r_u^p, t_u^p) \in G,\ h_u^p \in S_u^{p-1}\}$$

where $p = 1, \ldots, P$ is the propagation count, $h_u^p, t_u^p \in E$ and $r_u^p \in R = \{r_1, r_2, \ldots\}$. If an entity reappears during propagation, it is kept only in the set where it first appeared. Through the constant iteration of Formula (4), we finally obtain the P + 1 hop sets of the user:

$$\varepsilon = \{S_u^0, S_u^1, \ldots, S_u^P\}$$
The above knowledge graph propagation process is shown in Figure 3, which illustrates how the user's history propagates through the relationships of the knowledge graph. In the figure, the user's three viewing history records are propagated for two hops: the three entities corresponding to the historical records are located in the KG, and the relationships in the graph are used to find the associated entities. The entities in ripples of different color depths represent the sets of different propagation hops. Using the adjacent entities of the KG to enrich the user representation is meaningful because each such entity has a strong relationship with the user, and the adjacent entities can be regarded as enriching the user preferences and item features. Therefore, the propagation process is also a process of exploring the high-level features of entities.
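The hop-wise sets of Formulas (3)–(5) can be computed by a simple breadth-first expansion over the triple set. The following Python sketch is our own illustration (the `kg` adjacency structure and function names are assumptions, not the authors' code):

```python
def build_ripple_sets(history, kg, num_hops):
    """Compute the hop-wise entity sets S_u^0 ... S_u^P for one user.

    history  : entity ids of the user's clicked items (the 0-th hop set)
    kg       : dict mapping a head entity id -> list of (relation, tail) pairs
    num_hops : P, the number of propagation steps
    """
    seen = set(history)
    hops = [list(history)]                  # S_u^0
    for _ in range(num_hops):
        frontier = []
        for head in hops[-1]:
            for _relation, tail in kg.get(head, []):
                if tail not in seen:        # a repeated entity stays in the
                    seen.add(tail)          # set where it first appeared
                    frontier.append(tail)
        hops.append(frontier)               # S_u^p: tails reachable from S_u^{p-1}
    return hops                             # the collection of Formula (5)
```

In the full model, each hop would also keep the (h, r, t) triples themselves, since the attention mechanism of Section 4.2 needs the head and relation embeddings as well as the tail.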

4.2. Asymmetric Semantic Attention Mechanism

In the (h, r, t) triples of the KG, the same head entity has different effects on user preferences under different relationships. For example, for the entity of the movie Forrest Gump, if the user cares more about the genre than the actors, then Forrest Gump should carry more weight when propagating along the genre relationship. Moreover, using only head entities and relationships to identify user preferences cannot distinguish one-to-many relationships; the actor relationship of Forrest Gump, for instance, corresponds to multiple entities, so it is not appropriate to give these tail entities the same preference semantics. Based on this, we design an asymmetric attention mechanism that maps the head and tail entities to their corresponding preference semantic spaces, where each semantic space is jointly determined by the entity and the relationship.
In addition, propagation is based on the KG triples. If only a self-attention mechanism over the tail entities is used, without the head entity and relationship information, information is lost. Therefore, in this unit's attention mechanism, we consider the semantics of both the preceding and succeeding entities. For a knowledge graph triple $(h_i^p, r_i^p, t_i^p)$, the structure of our asymmetric attention mechanism is as follows:

$$f_{i,o}^p = \varphi\big(\psi(v_{h_i^p}), v_{r_i^p}, v_o^p\big)\,\psi(v_{t_i^p})$$

where $f_{i,o}^p$, the output of the asymmetric attention mechanism, is the feature vector of the i-th entity in the p-th propagation of user u with respect to the target item o. $\varphi(\cdot,\cdot,\cdot)$ is the similarity calculation function, which measures the attention weight of the propagated entity. $\psi(\cdot)$ is the asymmetric preference semantic transformation function, which enables entities to express preference semantic features in the attention mechanism. $v_{h_i^p}, v_{r_i^p}, v_{t_i^p} \in \mathbb{R}^d$ are the embeddings of the head entity $h_i^p$, the relation $r_i^p$ and the tail entity $t_i^p$ of the i-th triple in the user's p-th propagation, and $v_o^p \in \mathbb{R}^d$ is the embedding of the target item o at the p-th hop. Note that $v_o^p$ is updated at every hop; its update rule is given at the end of this section. We now introduce the functions in the above formula one by one, starting with the asymmetric semantic transformation function $\psi(\cdot)$. For the triple $(h_i^p, r_i^p, t_i^p)$, the semantic transformations of the head and tail entities are:

$$\psi(v_{h_i^p}) = T_{h_i^p, r_i^p}\, v_{h_i^p}, \qquad \psi(v_{t_i^p}) = T_{r_i^p, t_i^p}\, v_{t_i^p}$$

where $T_{h_i^p, r_i^p} \in \mathbb{R}^{d \times d}$ and $T_{r_i^p, t_i^p} \in \mathbb{R}^{d \times d}$ are the semantic matrices of the head and tail entities. The head and tail entities are mapped to different semantic spaces through their respective semantic matrices, realizing the expression of their respective preference semantics. However, although different semantic matrices enhance the expression of semantic features, two problems arise. On the one hand, the number of entities in the KG is large, and each distinct combination of head entity and relationship, or of tail entity and relationship, would require its own semantic matrix, causing a sharp increase in the number of parameters. On the other hand, if a certain head–relation or tail–relation combination does not appear in the training set, its matrix is never trained, resulting in a sharp drop in recommendation performance.
Therefore, to address this problem, we adopt a method similar to the translation model TransD [36] to optimize the transformation function. Unlike TransD, however, our focus is on optimizing perception in the attention mechanism rather than on translating entities. Specifically, we first construct a semantic vector $\gamma \in \mathbb{R}^d$ for each entity and relationship, and then take the three semantic vectors $\gamma_h, \gamma_r, \gamma_t$ corresponding to the head entity, relationship and tail entity of the triple. Finally, each entity semantic vector is combined with the relational semantic vector via an outer product to obtain the semantic matrix of the head or tail entity. Formula (7) then becomes:

$$\psi(v_{h_i^p}) = \big(\gamma_{h_i^p}\gamma_{r_i^p}^{T}\big) v_{h_i^p}, \qquad \psi(v_{t_i^p}) = \big(\gamma_{t_i^p}\gamma_{r_i^p}^{T}\big) v_{t_i^p}$$

where $\gamma_{h_i^p}$ and $\gamma_{t_i^p}$ are the semantic vectors of $h_i^p$ and $t_i^p$, $\gamma_{r_i^p}$ is the semantic vector of the corresponding relationship, and $\gamma_h \gamma_r^T$ denotes the vector outer product.
Before the similarity calculation function $\varphi(\cdot,\cdot,\cdot)$, we first introduce the function $\phi(\cdot,\cdot)$, which scores the association between a triple and the target item. It measures the degree of matching between the triple and the target item as the dot-product similarity of $\psi(v_h) + v_r$ with the target item. The association weight is therefore expressed as:

$$\phi\big(\psi(v_{h_i^p}), v_{r_i^p}\big) = \big(\psi(v_{h_i^p}) + v_{r_i^p}\big) \cdot v_o^p$$

Finally, we use the softmax function to normalize the results of Formula (9) and obtain the importance weight of each triple within a single propagation:

$$\varphi\big(\psi(v_{h_i^p}), v_{r_i^p}, v_o^p\big) = \frac{\exp\big(\phi(\psi(v_{h_i^p}), v_{r_i^p})\big)}{\sum_{j=1}^{n}\exp\big(\phi(\psi(v_{h_j^p}), v_{r_j^p})\big)}$$
Using the above equation, we obtain the attention weight of each triple, which tells us which triples deserve more attention. Since the tail entities are the subjects after propagation, the weighted feature vector $f_{i,o}^p$ of a single tail entity is obtained by multiplying the tail entity by its attention weight via Equation (6). Finally, the weighted results of all entities of a whole hop are summed to obtain the feature embedding of the p-th hop:

$$\rho_{u,o}^p = \sum_{i=1}^{n_p} f_{i,o}^p$$

where $n_p$ is the number of tail entities at the p-th hop. After the asymmetric semantic attention mechanism, we obtain the feature of a whole hop. Because each hop is associated with the entities of the previous hop, the item representation used for the association comparison must be updated after each propagation. A single-layer neural network is used to update the representation vector, in the following form:

$$v_o^p = \big(v_o^{p-1} + \rho_{u,o}^p\big) W_e + b_e$$

where $W_e \in \mathbb{R}^{d \times d}$ and $b_e \in \mathbb{R}^d$ are the weight matrix and bias vector of the transformation. Finally, we collect the features of all hops and obtain the set of all P hop features of user u for item o in G:

$$\tau_{u,o} = \{\rho_{u,o}^1, \rho_{u,o}^2, \ldots, \rho_{u,o}^P\}$$
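A compact PyTorch sketch of one hop of this mechanism follows (our own illustration, not the authors' released code). It exploits the identity $(\gamma_h \gamma_r^T)v_h = \gamma_h \langle \gamma_r, v_h \rangle$, so the $d \times d$ semantic matrices of Formula (8) never have to be materialized:

```python
import torch
import torch.nn.functional as F

def asymmetric_attention_hop(v_h, v_r, v_t, g_h, g_r, g_t, v_o):
    """One hop of the asymmetric semantic attention, Equations (6)-(11).

    v_h, v_r, v_t : [n, d] embeddings of the heads, relations, tails of the hop
    g_h, g_r, g_t : [n, d] semantic vectors of the same heads/relations/tails
    v_o           : [d]    current target-item embedding v_o^p
    Returns rho_{u,o}^p, the feature embedding of the hop, of shape [d].
    """
    # psi(v_h) = (g_h g_r^T) v_h reduces to g_h scaled by <g_r, v_h>.
    psi_h = g_h * (g_r * v_h).sum(dim=1, keepdim=True)
    psi_t = g_t * (g_r * v_t).sum(dim=1, keepdim=True)

    # phi: dot-product match of psi(v_h) + v_r against the target item (Eq. (9)).
    logits = ((psi_h + v_r) * v_o).sum(dim=1)           # [n]
    weights = F.softmax(logits, dim=0)                  # Eq. (10)

    # Weighted sum of the transformed tail entities (Eqs. (6) and (11)).
    return (weights.unsqueeze(1) * psi_t).sum(dim=0)    # [d]
```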

4.3. Propagation Feature Exploration Architecture

Through the asymmetric attention mechanism, we obtain the feature representation vector of each propagation hop, which can be understood as the preference feature at that propagation depth. Based on past experience, previous propagation models use simple additive aggregation: the user representation vector is obtained by summing the feature vectors of the user's p propagations, a dot product is taken with the representation vector of item o, and the prediction is output through the sigmoid function:

$$v_u = \xi_{u,o}^p = \rho_{u,o}^1 + \rho_{u,o}^2 + \cdots + \rho_{u,o}^p$$

$$\hat{y}_{uo} = \sigma\big(v_u^{T} v_o\big)$$

where $v_u$ is the embedding of user u, $v_o$ is the original feature embedding of item o, and $\sigma(x) = \frac{1}{1+\exp(-x)}$ is the sigmoid function. This direct addition exploits the user's historical interaction items and the item propagation features to represent the user. Its advantages are that it expresses and constrains the user's interest features well and converges quickly. Its shortcomings, however, are also obvious: each addition further dilutes the low-order propagation features, which deserve higher weights because they are closer to the user's original features, and it ignores the interactive exploration between features. Therefore, we propose a new propagation feature exploration architecture that keeps these advantages while overcoming the shortcomings.
First, we pre-train through Formulas (14) and (15) to obtain the embedding vector and attention semantic vector of each entity and relationship in the model. On the one hand, during propagation, features with fewer propagation hops are closer to the user's original features and are more important for expressing them, so the proportion of low-hop features should be increased. On the other hand, the aggregated user vectors at different propagation counts all express user characteristics at different propagation depths. Therefore, following the idea of Wide&Deep [37], we keep the low-order and high-order feature vectors at the same time to ensure both the memorization and the generalization of features. The model collects P user feature vectors of different depths $\xi_{u,o}^p$ through Formula (14); each order's feature includes the features of the previous propagation orders, which enhances the weight of the original features. These features, together with the feature vector of the target item o, give P + 1 feature vectors, which are concatenated. Then, to explore the deep interactions among features, an L-layer neural network processes the concatenated vector and finally produces the prediction result. The details are as follows:

$$z_0 = \xi_{u,o}^1 \,\|\, \xi_{u,o}^2 \,\|\, \cdots \,\|\, \xi_{u,o}^P \,\|\, v_o^P$$

$$z_l = a\big(z_{l-1} W_l + b_l\big), \quad l = 1, 2, \ldots, L - 1$$

$$\hat{y}_{uo} = \sigma\big(z_{L-1} W_L + b_L\big)$$

where $W_l$ and $b_l$ are the weights and biases of the neural network and $\|$ denotes vector concatenation. $l$ indexes the layer of the neural network and L is the total number of layers; $z_0$ is the network input and $z_l$ the output of the l-th layer. $a(\cdot)$ is the activation function; in the experiments, we compare different nonlinear activation functions for the hidden layers. Finally, we obtain the final probability through the sigmoid.
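A hedged PyTorch sketch of this exploration layer follows (our own reading of Formulas (16)–(18); the layer width and defaults are illustrative choices, with four layers and tanh anticipating the findings of Section 5.3):

```python
import torch
import torch.nn as nn

class PropagationExplorer(nn.Module):
    """Concatenate the P depth-wise user features with the item vector
    and score the pair with an L-layer MLP (Formulas (16)-(18))."""

    def __init__(self, dim, num_hops, num_layers=4):
        super().__init__()
        width = dim * (num_hops + 1)        # z_0: P user vectors plus the item
        layers = []
        for _ in range(num_layers - 1):     # hidden layers with activation a(.)
            layers += [nn.Linear(width, width), nn.Tanh()]
        layers.append(nn.Linear(width, 1))  # final scoring layer W_L, b_L
        self.mlp = nn.Sequential(*layers)

    def forward(self, xis, v_o):
        # xis: list of P cumulative features xi_{u,o}^p (each [batch, d]);
        # xi^p already sums rho^1..rho^p, so low-order features stay heavy.
        z0 = torch.cat(xis + [v_o], dim=-1)
        return torch.sigmoid(self.mlp(z0)).squeeze(-1)
```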

4.4. Learning Algorithm

In this article, we use the given knowledge graph G and the user–item interaction matrix Y to find the maximum a posteriori estimate of the model parameters Θ. Θ contains all parameters of the model, namely the representation vectors $v_h, v_t, v_r$ of head entities, tail entities and relationships, the corresponding semantic vectors $\gamma_h, \gamma_t, \gamma_r$ of the attention mechanism, the item transformation parameters $W_e$ and $b_e$ of the propagation process, and the neural network parameters $W_l$ and $b_l$ of the deep exploration framework. Maximizing the posterior probability of Θ takes the form:

$$\arg\max_{\Theta}\, p(\Theta \mid G, Y)$$

This maximization is equivalent to maximizing the following expression, obtained through Bayes' theorem:

$$p(\Theta \mid G, Y) = \frac{p(\Theta, G, Y)}{p(G, Y)} \propto p(\Theta, G, Y)$$

$$p(\Theta, G, Y) = p(\Theta)\, p(G \mid \Theta)\, p(Y \mid \Theta, G) = p(\Theta)\, p(G \mid \Theta)\, p(Y \mid \Theta)$$

where $p(\Theta)$ is the prior probability of the model parameters Θ, $p(G \mid \Theta)$ is the likelihood of the knowledge graph G given the parameters Θ, and $p(Y \mid \Theta)$ is the likelihood of the observed interactions Y given Θ. For this maximization problem, following the practice of Maximum Likelihood Estimation (MLE), we take the logarithm of the original expression, transforming the product into a sum. Taking the negative then turns the objective into a minimization problem and yields the objective function E shown below:

$$E = -\log p(\Theta \mid G, Y) = -\log p(G \mid \Theta) - \log p(Y \mid \Theta) - \log p(\Theta)$$
The first term of Formula (22) measures how accurately the model encodes the entity relationships. We use TransE's knowledge graph encoding, assume a Gaussian distribution, and define the following likelihood function:

$$\log p(G \mid \Theta) = \log \prod_{(h,r,t) \in G} \mathcal{N}\big(I_{h,r,t} - \sigma(v_h + v_r - v_t),\ \lambda_1^{-1}\big) = -\frac{\lambda_1}{2} \sum_{(h,r,t) \in G} \big\| I_{h,r,t} - \sigma(v_h + v_r - v_t) \big\|_2^2$$

where $I_{h,r,t}$ indicates whether the triple exists in graph G: it is 1 if the triple exists and 0 otherwise. For the second term of Formula (22), for each interaction $y_{uo} \in Y$, the likelihood function can be constructed from the Bernoulli distribution:

$$\log p(Y \mid \Theta) = \log \prod_{y_{uo} \in Y} \hat{y}_{uo}^{\,y_{uo}} \big(1 - \hat{y}_{uo}\big)^{1 - y_{uo}} = \sum_{y_{uo} \in Y} y_{uo} \log \hat{y}_{uo} + \big(1 - y_{uo}\big) \log\big(1 - \hat{y}_{uo}\big)$$

The last term p(Θ) of (22) is a prior probability that can be ignored in the extremum problem. Finally, substituting Formulas (23) and (24) into (22) and adding the L2 regularization of the parameters, we obtain the final loss function:

$$\begin{aligned} \mathcal{L} = {}& -\sum_{y_{uo} \in Y} \Big[ y_{uo} \log \hat{y}_{uo} + \big(1 - y_{uo}\big) \log\big(1 - \hat{y}_{uo}\big) \Big] + \frac{\lambda_1}{2} \sum_{(h,r,t) \in G} \big\| I_{h,r,t} - \sigma(v_h + v_r - v_t) \big\|_2^2 \\ & + \frac{\lambda_2}{2}\Big( \|W\|_2^2 + \|b\|_2^2 \Big) + \frac{\lambda_3}{2}\Big( \|v_h\|_2^2 + \|v_r\|_2^2 + \|v_t\|_2^2 + \|\gamma_h\|_2^2 + \|\gamma_r\|_2^2 + \|\gamma_t\|_2^2 \Big) \end{aligned}$$
For the above loss function, we use the Adam [38] optimization method to mitigate the problem of poor local optima. The parameter pre-training and parameter optimization processes of model training are given in Algorithm 1 and Algorithm 2, respectively.
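The loss of Formula (25) maps directly onto standard PyTorch primitives. The sketch below is a simplified illustration under our own assumptions (in particular, `kg_scores` stands for the triple score $\sigma(v_h + v_r - v_t)$ already reduced to a scalar per triple, and a single λ2 covers all network weights):

```python
import torch
import torch.nn.functional as F

def ekpnet_loss(y_true, y_pred, kg_labels, kg_scores, reg_params,
                lam1=1e-2, lam2=1e-5):
    """Sketch of Formula (25): cross-entropy on interactions, a squared
    KG-reconstruction term, and L2 regularization of the parameters.

    y_true, y_pred : [batch] interaction labels and predicted probabilities
    kg_labels      : [m] indicator I_{h,r,t} for sampled true/false triples
    kg_scores      : [m] scalar scores of the same triples
    reg_params     : iterable of parameter tensors to regularize
    """
    bce = F.binary_cross_entropy(y_pred, y_true)
    kg_term = (lam1 / 2) * ((kg_labels - kg_scores) ** 2).sum()
    reg = (lam2 / 2) * sum(p.pow(2).sum() for p in reg_params)
    return bce + kg_term + reg
```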
Time complexity: Algorithm 1 is the pre-trained propagation model, with time complexity $O\big(\sum_{p=0}^{P} n k^2\big)$, where n is the size of the fixed triple sample in each propagation and k is the dimension of the transformation matrix. Algorithm 2 adds the propagation feature exploration layer on top of Algorithm 1, so its time complexity is $O\big(\sum_{p=0}^{P} n k^2 + \sum_{l=1}^{L} k_{l-1} k_l\big)$, where $k_l$ and $k_{l-1}$ are the sizes of the current and previous layer transformations.
Algorithm 1 Learning in the pre-training for EKPNet
Input: implicit feedback matrix Y, knowledge graph G
Set: batch size b, learning rate α, dimensionality k
  • Initialize $v_e, v_r$ according to the TransE algorithm, and randomly initialize $\gamma_e, \gamma_r, W_e, b_e$; set $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\eta = 0.002$
  • Retrieve the set $S_u^p$ for each user u from the knowledge graph G
  • for number of training samples do
  •   Sample a mini-batch of true and false triples from G
  •   Sample a mini-batch of positive and negative interactions from Y
  •   Compute the prediction $\hat{y}_{uo}$ via Equations (6)–(15)
  •   Calculate the loss and the gradients $\nabla_{\Theta}\mathcal{L}$ of all parameters Θ via Equations (19)–(24)
  •   Initialize $v_0$, $s_0$, $t \leftarrow 0$
  •   while $\Theta_t$ not converged do
  •     $v_t \leftarrow \beta_1 v_{t-1} + (1 - \beta_1)\nabla_{\Theta}\mathcal{L}$
  •     $s_t \leftarrow \beta_2 s_{t-1} + (1 - \beta_2)\,\nabla_{\Theta}\mathcal{L} \circ \nabla_{\Theta}\mathcal{L}$
  •     $\Theta_{t+1} \leftarrow \Theta_t - \eta\, v_t / \sqrt{s_t}$ (update parameters)
  •   end while
  • end for
  • Return $v_e, v_r, \gamma_e, \gamma_r, W_e, b_e$
Output: parameters of the model $v_e, v_r, \gamma_e, \gamma_r, W_e, b_e$
Algorithm 2 EKPNet Learning Algorithm
Input: implicit feedback matrix Y, knowledge graph G, the parameters returned by Algorithm 1
Set: batch size b, learning rate α, dimensionality k
  • Initialize $v_e, v_r, \gamma_e, \gamma_r, W_e, b_e$ with the parameters from Algorithm 1, and randomly initialize $W_l, b_l$; set $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\eta = 0.0001$
  • Retrieve the set $S_u^p$ for each user u from the knowledge graph G
  • repeat
  • for number of training samples do
  •   Sample a mini-batch of true and false triples from G
  •   Sample a mini-batch of positive and negative interactions from Y
  •   Compute the prediction $\hat{y}_{uo}$ via Equations (6)–(13) and (16)–(18)
  •   Calculate the loss and the gradients $\nabla_{\Theta}\mathcal{L}$ of all parameters Θ via Equations (19)–(24)
  •   Update the parameters in the same way as in Algorithm 1
  • end for
  • Return the predictions $\hat{y}_{uo}$
Output: prediction results
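In practice, the manual moment updates written out in Algorithm 1 are what a stock Adam optimizer performs, so the inner loop of both algorithms reduces to a few lines of PyTorch. The sketch below is illustrative wiring only: `model`, its `score_triples` method, `loss_fn` and the two data loaders are placeholders for the components described above, not published code:

```python
import torch

# The beta values and learning rate follow the settings of Algorithm 1.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-3, betas=(0.9, 0.999))

for epoch in range(num_epochs):
    for (users, items, labels), (kg_labels, kg_triples) in zip(
            interaction_loader, kg_loader):
        y_pred = model(users, items)                  # Equations (6)-(18)
        kg_scores = model.score_triples(kg_triples)   # sigma(v_h + v_r - v_t)
        loss = loss_fn(labels.float(), y_pred, kg_labels, kg_scores,
                       reg_params=list(model.parameters()))
        optimizer.zero_grad()
        loss.backward()                               # gradients of all parameters
        optimizer.step()                              # Adam moment updates
```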

5. Experimental Results and Discussion

5.1. Experiment Preparation

5.1.1. Dataset

We choose two real-world datasets from completely different domains to verify the effectiveness of the model in different scenarios. The datasets are as follows:
MovieLens-1M [39] is a movie rating dataset widely used in the field of movie recommendation. It is a product of the GroupLens research project at the University of Minnesota. It includes 1,000,209 rating records, with values from 1 to 5, from 6040 users on 3900 movies. It also contains basic movie and user attributes, such as age, gender, occupation, and movie category.
Book-Crossing [40] is a book rating dataset widely used in the field of book recommendation. The data were collected from the Book-Crossing community. It contains 1,149,780 rating records, with values from 1 to 10, from 278,858 users on 271,379 books. It also contains basic demographic data and basic information about the books.
Since the goal of our experiment is to predict the probability of interaction between the target user and an item from implicit feedback, the rating data must be converted into implicit data. Specifically, we set a threshold to define whether a user has interacted: for each record, a score not lower than the threshold is treated as positive feedback (label 1), and we randomly sample non-interacted user–item pairs as negative feedback (label 0), with the sample size equal to the number of positive samples. In addition, we delete users who have no positive implicit feedback, because positive implicit feedback is needed to verify the results during model testing.
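A minimal sketch of this conversion (our own illustration; the column names and helper are assumptions, not the authors' preprocessing script):

```python
import numpy as np
import pandas as pd

def to_implicit(ratings: pd.DataFrame, threshold: int, seed: int = 0):
    """Scores >= threshold become positives (label 1); an equal number of
    unobserved user-item pairs are sampled as negatives (label 0).
    Assumes each user has enough unobserved items to sample from."""
    rng = np.random.default_rng(seed)
    pos = ratings.loc[ratings["rating"] >= threshold, ["user", "item"]].copy()
    pos["label"] = 1

    observed = set(zip(ratings["user"], ratings["item"]))
    items = ratings["item"].unique()
    neg_rows = []
    for user, count in pos.groupby("user").size().items():
        while count > 0:
            item = rng.choice(items)
            if (user, item) not in observed:
                neg_rows.append((user, item, 0))
                count -= 1
    neg = pd.DataFrame(neg_rows, columns=["user", "item", "label"])
    return pd.concat([pos, neg], ignore_index=True)
```

Users left without any positive sample after thresholding would then be dropped, as described above.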
The knowledge graph used in this article was published in [15]. The construction method is the same for the movie and book datasets. First, a subset is constructed by matching the movie and book titles in the datasets against the items in the Microsoft Satori knowledge base. Items with no match or with multiple matches are then deleted, and finally, given the matched entities, the entity set is expanded to four hops. Through this process, MovieLens-1M yields 1,241,995 triples over 182,011 entities and 12 kinds of relationships, and Book-Crossing yields 198,771 triples over 77,903 entities and 25 kinds of relationships. For clarity, the statistics of the two datasets are shown in Table 1, where inter-avg denotes the average number of interactions per user, link-avg the average number of links per entity, and sparsity the sparsity of the dataset.

5.1.2. Baseline

Six advanced recommendation models are included in the experiment to verify the effectiveness of EKPNet. After reproducing the following models, we set the baseline parameters to the best values reported in the original papers.
CKE [23] is a classic embedding-based recommendation model that integrates knowledge graph, visual, and text information into a Bayesian recommendation framework. Since our datasets contain no image or text information, we simplify the model and use the TransR encoding, with the resulting embedding used as the model input.
DKN [13] is an embedding-based news recommendation framework. It uses word and entity embeddings as model input to construct the feature vector of each news item, and then explores the user's interest through a CNN-based attention mechanism. In this experiment, we use movie titles and book titles as the text input of the model.
Wide&Deep [37] is one of the most classic general recommendation frameworks. The input features are processed by the linear wide part and the nonlinear deep part, considering shallow and deep feature exploration at the same time. The item ID and the embedding obtained from TransR on the knowledge graph are used as the model input.
RippleNet [15] is a state-of-the-art unified model that combines the advantages of path-based and embedding-based methods. It propagates potential preferences from the user's click history to obtain the user representation, and finally combines it with the item representation to predict the interaction probability.
KGCN [14] is another state-of-the-art unified approach. It applies non-spectral graph convolutional network methods to the knowledge graph to efficiently learn knowledge graph information and mine users' potential features.
Ripp-MKR [17] is an improved method based on RippleNet that integrates multi-task learning into the propagation framework, learning from the knowledge graph and the historical interaction matrix at the same time.

5.1.3. Experimental Settings

In the EKPNet experiments, for both datasets, the ratio of the training, validation and test sets is 6:2:2. We ran each experiment three times and report the mean. To demonstrate the validity of the model, evaluation indicators were selected for two tasks: CTR prediction and top-K recommendation. For the CTR task, we chose the AUC and ACC indicators. For top-K recommendation, we chose the Precision@K, Recall@K and F1@K indicators, where K is the number of recommended items; to show the recommendation effect at different K, K was set to 1, 2, 5, 10, 20, 50 and 100.
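For reference, the top-K indicators reduce to a few lines per user; the sketch below is our own illustration of the standard definitions, not tied to the paper's evaluation code:

```python
def precision_recall_f1_at_k(ranked_items, relevant, k):
    """ranked_items: recommendation list sorted by predicted probability;
    relevant: set of items the user actually interacted with in the test set."""
    hits = len(set(ranked_items[:k]) & relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Example: K = 5 and two of the five recommended items are relevant.
p, r, f = precision_recall_f1_at_k([3, 7, 1, 9, 4], {7, 4, 8}, 5)  # 0.4, 0.667, 0.5
```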
The specific settings of the pre-training and training parameters are shown in Table 2, where d is the feature embedding dimension, P the maximum number of propagations, N the number of entities selected per propagation, $\lambda_1, \lambda_2, \lambda_3$ the regularization coefficients, and L the number of neural network layers.
The hardware environment includes a 64-core Intel(R) Xeon(R) CPU, an A100-PCIE-40GB GPU, 48 GB of memory, and the Ubuntu operating system. The software stack uses the Python programming language, the PyTorch framework and the NumPy scientific computing library.

5.2. Experimental Results

Table 3 and Figures 4–6 show the performance of the models on the CTR prediction and top-K recommendation tasks, respectively. From them, we can observe the following:
  • Compared with these state-of-the-art baselines, EKPNet performs better on datasets from two different fields. More specifically, the asymmetric semantic attention mechanism and the multi-level propagation feature exploration architecture play an active role in mining user preferences and providing recommended items. Compared with the baselines on the MovieLens-1M and Book-Crossing datasets, EKPNet increases AUC by 10.8% and 8.5% and ACC by 11.2% and 7.1% on average.
  • For the top-K task, our model has a greater advantage on the MovieLens-1M dataset. Its relatively low KG sparsity may allow the semantic vectors we introduce to be trained better. Another reason why EKPNet improves less than Ripp-MKR on the Book-Crossing dataset is that Ripp-MKR introduces neural collaborative filtering, and the Book-Crossing dataset has more interaction information to assist joint learning. In addition, our model targets the CTR task and uses a cross-entropy loss rather than a ranking loss, so there are some limitations for the ranking task.
  • Comparing the two datasets, the results of all models on the Book-Crossing dataset are lower than on the MovieLens-1M dataset. The main reason is shown in Table 1 in Section 5.1. On the one hand, in the user–item interaction matrix, the Book-Crossing dataset averages 117 fewer interactions per user than the MovieLens-1M dataset, with a difference of 5.05% in sparsity. On the other hand, in the knowledge graphs of the two datasets, Book-Crossing averages four fewer links per entity than MovieLens-1M. The sparse data leave every model with insufficient information to explore the features of users and items, so their performance is not satisfactory.
  • Compared with the other baselines, DKN is outperformed by most models because, compared with news titles, the names of movies and books are shorter, contain very limited information, and hardly reflect the characteristics of movies or books.
  • The poor performance of the CKE model in our setting may be due to the lack of image and text information in the data, which limits the model's expression. Moreover, unlike RippleNet and EKPNet, the model only uses related entities directly to represent features and does not use them to enrich the feature representations.
  • Compared with the above two baselines, Wide&Deep achieves relatively good results, indicating that deep learning is effective at exploring the intersections of entity features. However, its results are poor compared with the three unified methods, which shows that exploring the knowledge graph is crucial for the prediction results.
  • RippleNet, KGCN and Ripp-MKR exhibit the best performance among all the baselines, which proves that exploring structural information in the knowledge graph can effectively improve the model.
  • EKPNet has a clear advantage over RippleNet and Ripp-MKR on both tasks. This demonstrates that, in outward propagation, distinguishing the semantics of head and tail entities in the attention mechanism and exploring the aggregation features at different propagation levels both have a positive impact on the final results. EKPNet also outperforms KGCN, proving that our model has advantages over graph convolutional networks as well.
  • For the inward-aggregation attention model KCAN, we obtained an AUC of 0.907 on the MovieLens-1M dataset versus 0.928 for EKPNet, an improvement of 2.31% over KCAN. Moreover, KCAN treats users as entity vectors in the knowledge graph, so every time a new user interaction occurs, the graph needs to be reconstructed and the entire model retrained. In contrast, our work represents the user through their historical interaction items during outward knowledge graph propagation, which handles the appearance of new users. Therefore, EKPNet exhibits better performance.

5.3. The Influence of Activation Function and MLP Structure

In the deep exploration framework, the structure of the deep model and the activation function of each layer directly affect the model's performance. To verify the relationship between the model and the MLP, we designed a comparative experiment on the two datasets above, observing the performance of different numbers of layers and different activation functions (the relu, sigmoid and tanh functions).
The results are shown in Figure 7. Regarding the activation function, since the previous analysis has shown that the two datasets differ considerably in both the knowledge graph and the interaction matrix, the model behaves quite differently on them. On the MovieLens-1M dataset, relu and tanh show similar performance, but on the Book-Crossing dataset, tanh is significantly better than relu, possibly because the normalization property of tanh fits the feature distribution of the data. In both cases, the sigmoid function always performs worst.
For the MLP structure, which relates to the model's capacity, we used the same width as the input and chose six different depths to explore the effect of depth. On both datasets, relu and sigmoid are more sensitive to depth, while tanh is relatively stable. The experimental results show the best performance with a four-layer structure: a shallower model cannot fully fit the data features, while the vanishing gradient problem makes a deeper structure perform worse. Therefore, a deeper MLP does not ensure a better result.
Finally, we conclude that the model performs best overall with a four-layer MLP and the tanh activation function.

5.4. Module Ablation Experiment

To explore the necessity of the two proposed parts for the model's effectiveness, we compare the original model with three variants. EKP_no denotes the simplest propagation model, EKP_att the variant with only the asymmetric semantic attention mechanism, and EKP_frame the variant with only the propagation feature exploration architecture; EKPNet is the complete model using both modules. The results are shown in Table 4. On the MovieLens-1M dataset, using the attention mechanism and the propagation feature exploration framework alone improves the effect by 0.45% and 0.51%, respectively, while using both modules together improves it by 1.22%. On the Book-Crossing dataset, the two parts alone improve it by 1.41% and 3.50%, respectively, and both together improve it by 5.73%. Both EKP_att and EKP_frame show a clear improvement over the plain propagation model, indicating the effectiveness of the two parts. The two parts have similar effects on MovieLens-1M, while EKP_frame yields better results on Book-Crossing, probably because the Book-Crossing data are sparse and the vectors of the attention mechanism are not fully trained. The results also show that combining the two parts yields the best effect.

5.5. Cost of the Model

To discuss the training cost of the model, we compare the time consumed against the propagation models most relevant to ours, RippleNet and Ripp-MKR. As shown in Table 5, our model has a higher time cost than the simply structured RippleNet because it introduces the attention mechanism and the propagation exploration framework. Ripp-MKR consumes more time because of the simultaneous multi-task learning during propagation. Since graph-neural-network aggregation is very time consuming for graph training, KGCN takes the longest.

6. Conclusions

In this article, we use knowledge graph propagation to assist recommendation and design a brand-new EKPNet architecture. Compared with existing propagation models, it has two main advantages. First, a new and effective asymmetric semantic attention mechanism is proposed to sample the knowledge graph; it improves the attention effect by distinguishing the preference semantics of head and tail entities. Second, a propagation feature exploration architecture retains the propagation features from shallow to deep: features at different propagation depths enhance the memorization and generalization capabilities during propagation, and a deep neural network then explores feature intersections. The two structures together enhance the exploration of entity features in propagation. We conducted extensive experiments on two real-world datasets and demonstrated the superiority of the EKPNet model through comparison with the baselines.
For future work, we will verify the effect of our model in more application scenarios. Our experiments mainly use offline recommendation with a stationary knowledge graph. To enhance the value of industrial applications, studying the propagation of recommender systems over dynamic knowledge graphs is a promising research direction that can further lead to efficient online recommendation methods. In addition, our approach provides a new way of thinking for research on other heterogeneous networks such as social networks.

Author Contributions

H.Z.: Conceptualization, Methodology, Software, and Writing—original draft. Y.W.: Supervision and Project administration. C.C. and R.L.: Investigation, Writing—review and editing. S.Z.: Data curation and Resources. T.G.: Supervision and Project administration. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Key R&D Program (Demonstration of R&D and Application of Integrated Science and Technology Service Platform for Central Plains Urban Agglomeration), grant number 2018YFB1404500.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

We used open-source datasets and reference them in this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Resnick, P.; Varian, H.R. Recommender systems. Commun. ACM 1997, 40, 56–58.
2. Zhang, Y.; Abbas, H.; Sun, Y. Smart e-commerce integration with recommender systems. Electron. Mark. 2019, 29, 219–220.
3. Zhang, H.; Ji, Y.; Li, J.; Ye, Y. A triple wing harmonium model for movie recommendation. IEEE Trans. Ind. Inform. 2015, 12, 231–239.
4. Hu, Y.; Xiong, F.; Lu, D.; Wang, X.; Xiong, X.; Chen, H. Movie collaborative filtering with multiplex implicit feedbacks. Neurocomputing 2020, 398, 485–494.
5. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, 1–5 May 2001; pp. 285–295.
6. Deng, S.; Huang, L.; Xu, G.; Wu, X.; Wu, Z. On deep learning for trust-aware recommendations in social networks. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 1164–1177.
7. Aslanian, E.; Radmanesh, M.; Jalili, M. Hybrid recommender systems based on content feature relationship. IEEE Trans. Ind. Inform. 2016, 99, 1.
8. Xu, Y.; Yang, Y.; Han, J.; Wang, E.; Zhuang, F.; Xiong, H. Exploiting the sentimental bias between ratings and reviews for enhancing recommendation. In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018.
9. Guo, Q.; Zhuang, F.; Qin, C.; Zhu, H.; Xie, X.; Xiong, H.; He, Q. A survey on knowledge graph-based recommender systems. IEEE Trans. Knowl. Data Eng. 2020, 50, 937.
10. Lehmann, J.; Isele, R.; Jakob, M.; Jentzsch, A.; Kontokostas, D.; Mendes, P.N.; Hellmann, S.; Morsey, M.; Van Kleef, P.; Auer, S.; et al. DBpedia: A large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web 2015, 6, 167–195.
11. Suchanek, F.M.; Kasneci, G.; Weikum, G. Yago: A core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, Banff, AB, Canada, 8–12 May 2007; pp. 697–706.
12. Xu, W.; Gao, X.; Sheng, Y.; Chen, G. Recommendation system with reasoning path based on DQN and knowledge graph. In Proceedings of the 2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM), Seoul, Korea, 4–6 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–8.
13. Wang, H.; Zhang, F.; Xie, X.; Guo, M. DKN: Deep knowledge-aware network for news recommendation. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 1835–1844.
14. Wang, H.; Zhao, M.; Xie, X.; Li, W.; Guo, M. Knowledge graph convolutional networks for recommender systems. In Proceedings of the 2019 World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 3307–3313.
15. Wang, H.; Zhang, F.; Wang, J.; Zhao, M.; Li, W.; Xie, X.; Guo, M. RippleNet: Propagating user preferences on the knowledge graph for recommender systems. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Turin, Italy, 22–26 October 2018; pp. 417–426.
16. Tang, X.; Wang, T.; Yang, H.; Song, H. AKUPM: Attention-enhanced knowledge-aware user preference model for recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 1891–1899.
17. Wang, Y.; Dong, L.; Li, Y.; Zhang, H. Multitask feature learning approach for knowledge graph enhanced recommendations with RippleNet. PLoS ONE 2021, 16, e0251162.
18. Mahdy, A.M. Numerical solutions for solving model time-fractional Fokker–Planck equation. Numer. Methods Partial Differ. Equ. 2021, 37, 1120–1135.
19. Sun, Y.; Han, J.; Yan, X.; Yu, P.S.; Wu, T. PathSim: Meta path-based top-k similarity search in heterogeneous information networks. Proc. VLDB Endow. 2011, 4, 992–1003.
20. Yu, X.; Ren, X.; Gu, Q.; Sun, Y.; Han, J. Collaborative filtering with entity similarity regularization in heterogeneous information networks. IJCAI HINA 2013, 27. Available online: http://hanj.cs.illinois.edu/pdf/hina13_xyu.pdf (accessed on 5 January 2022).
21. Zhao, H.; Yao, Q.; Li, J.; Song, Y.; Lee, D.L. Meta-graph based recommendation fusion over heterogeneous information networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 635–644.
22. Wang, Q.; Mao, Z.; Wang, B.; Guo, L. Knowledge graph embedding: A survey of approaches and applications. IEEE Trans. Knowl. Data Eng. 2017, 29, 2724–2743.
23. Zhang, F.; Yuan, N.J.; Lian, D.; Xie, X.; Ma, W.Y. Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 353–362.
24. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015.
25. Ye, Y.; Wang, X.; Yao, J.; Jia, K.; Zhou, J.; Xiao, Y.; Yang, H. Bayes EMbedding (BEM): Refining representation by integrating knowledge graphs and behavior-specific networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 679–688.
26. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; Volume 26.
27. Cheng, Z.; Ding, Y.; He, X.; Zhu, L.; Song, X.; Kankanhalli, M.S. A3NCF: An adaptive aspect attention model for rating prediction. In Proceedings of the IJCAI, Stockholm, Sweden, 13–19 July 2018; pp. 3748–3754.
28. Mnih, V.; Heess, N.; Graves, A. Recurrent models of visual attention. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2204–2212.
29. Han, K.J.; Prieto, R.; Ma, T. State-of-the-art speech recognition using multi-stream self-attention with dilated 1D convolutions. In Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Singapore, 14–18 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 54–61.
30. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
31. Xiao, J.; Ye, H.; He, X.; Zhang, H.; Wu, F.; Chua, T.S. Attentional factorization machines: Learning the weight of feature interactions via attention networks. arXiv 2017, arXiv:1708.04617.
32. Zhou, G.; Zhu, X.; Song, C.; Fan, Y.; Zhu, H.; Ma, X.; Yan, Y.; Jin, J.; Li, H.; Gai, K. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1059–1068.
33. Yun, S.; Kim, R.; Ko, M.; Kang, J. SAIN: Self-attentive integration network for recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 1205–1208.
34. Tu, K.; Cui, P.; Wang, D.; Zhang, Z.; Zhou, J.; Qi, Y.; Zhu, W. Conditional graph attention networks for distilling and refining knowledge graphs in recommendation. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Queensland, Australia, 1–5 November 2021; pp. 1834–1843.
35. Hu, Y.; Koren, Y.; Volinsky, C. Collaborative filtering for implicit feedback datasets. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 263–272.
36. Ji, G.; He, S.; Xu, L.; Liu, K.; Zhao, J. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26–31 July 2015; Volume 1, pp. 687–696.
37. Cheng, H.T.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M.; et al. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 7–10.
38. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
39. Harper, F.M.; Konstan, J.A. The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst. 2015, 5, 1–19.
40. Ziegler, C.N.; McNee, S.M.; Konstan, J.A.; Lausen, G. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, Chiba, Japan, 10–14 May 2005; pp. 22–32.
Figure 1. Knowledge graph-based recommendation process.

Figure 2. The overall framework of EKPNet.

Figure 3. Entity propagation process in the context of movie recommendation.

Figure 4. Precision@K of different baselines on the two datasets in the top-K task.

Figure 5. Recall@K of different baselines on the two datasets in the top-K task.

Figure 6. F1@K of different baselines on the two datasets in the top-K task.

Figure 7. Model performance under different activation functions and different quantity levels.
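
Figures 4–6 report Precision@K, Recall@K, and F1@K for the top-K recommendation task. As a reference for how these per-user metrics are conventionally computed, a minimal sketch follows; the function and variable names are illustrative, not taken from the authors' code.

```python
def topk_metrics(ranked_items, relevant_items, k):
    """Precision@K, Recall@K, and F1@K for a single user.

    ranked_items:   items ordered by predicted score (best first)
    relevant_items: set of items the user actually interacted with
    """
    top_k = ranked_items[:k]
    hits = len(set(top_k) & relevant_items)       # relevant items recommended
    precision = hits / k
    recall = hits / len(relevant_items) if relevant_items else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)     # harmonic mean
    return precision, recall, f1

# Example: 2 of the top-5 recommendations are relevant.
p, r, f = topk_metrics(["a", "b", "c", "d", "e"], {"b", "e", "x"}, k=5)
print(p, r, f)  # 0.4, 0.666..., 0.5
```

The dataset-level curves in the figures would then be obtained by averaging these per-user values over all test users for each K.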
Table 1. Statistics of the two datasets: MovieLens-1M and Book-Crossing.

| | | MovieLens-1M | Book-Crossing |
| --- | --- | --- | --- |
| User–Item Interaction | #users | 6036 | 17,860 |
| | #items | 2445 | 14,967 |
| | #interactions | 753,772 | 139,746 |
| | #inter-avg | 125 | 8 |
| | #sparsity | 94.89% | 99.94% |
| Knowledge Graph | #entities | 182,011 | 77,903 |
| | #relations | 12 | 25 |
| | #KG triples | 1,241,995 | 198,771 |
| | #link-avg | 7 | 3 |
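
The derived columns of Table 1 follow directly from the raw counts; as a sanity check, the sketch below (names are ours) recomputes the average interactions per user and the sparsity of the interaction matrix.

```python
# Recompute #inter-avg and #sparsity from the raw counts in Table 1.
datasets = {
    "MovieLens-1M":  {"users": 6036,   "items": 2445,   "interactions": 753_772},
    "Book-Crossing": {"users": 17_860, "items": 14_967, "interactions": 139_746},
}

for name, d in datasets.items():
    inter_avg = d["interactions"] / d["users"]                   # interactions per user
    sparsity = 1 - d["interactions"] / (d["users"] * d["items"])  # empty fraction of matrix
    print(f"{name}: inter-avg = {inter_avg:.0f}, sparsity = {sparsity:.2%}")

# MovieLens-1M:  inter-avg = 125, sparsity = 94.89%
# Book-Crossing: inter-avg = 8,   sparsity = 99.95% (Table 1 lists 99.94%, a rounding difference)
```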
Table 2. The detailed parameters of the EKPNet experiment.

| | d | P | N | λ₁ | λ₂ | λ₃ | lr | L |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MovieLens-pre | 16 | 2 | 32 | 0.01 | 5 × 10⁻³ | - | 1 × 10⁻² | - |
| MovieLens | 16 | 2 | 32 | 0.001 | 1 × 10⁻⁷ | 1 × 10⁻⁷ | 5 × 10⁻³ | 3 |
| Book-Crossing-pre | 4 | 3 | 32 | 0.01 | 1 × 10⁻⁷ | - | 1 × 10⁻² | - |
| Book-Crossing | 4 | 3 | 32 | 0.001 | 1 × 10⁻⁴ | 1 × 10⁻⁴ | 5 × 10⁻³ | 4 |
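
For readers reproducing the experiments, Table 2 can be collected into a single configuration mapping. The key names below are our own shorthand, not identifiers from the authors' code, and any gloss of the symbols (e.g., d as embedding dimension, P as propagation depth) is an assumption based on common usage in RippleNet-style models.

```python
# Table 2 as a config dict; "-pre" rows are the separate pretraining stage.
# Key names and symbol meanings are illustrative assumptions, not the paper's code.
EKPNET_PARAMS = {
    "MovieLens-pre":     dict(d=16, P=2, N=32, l1=0.01,  l2=5e-3, l3=None, lr=1e-2, L=None),
    "MovieLens":         dict(d=16, P=2, N=32, l1=0.001, l2=1e-7, l3=1e-7, lr=5e-3, L=3),
    "Book-Crossing-pre": dict(d=4,  P=3, N=32, l1=0.01,  l2=1e-7, l3=None, lr=1e-2, L=None),
    "Book-Crossing":     dict(d=4,  P=3, N=32, l1=0.001, l2=1e-4, l3=1e-4, lr=5e-3, L=4),
}
```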
Table 3. The performance of different baselines on the two datasets in the CTR prediction task.

| Model | MovieLens-1M AUC | imp | MovieLens-1M ACC | imp | Book-Crossing AUC | imp | Book-Crossing ACC | imp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CKE | 0.765 | 21.31% | 0.719 | 19.05% | 0.654 | 13.30% | 0.625 | 8.32% |
| DKN | 0.683 | 35.87% | 0.612 | 39.87% | 0.631 | 17.43% | 0.604 | 12.09% |
| Wide&Deep | 0.896 | 3.57% | 0.823 | 4.01% | 0.693 | 6.93% | 0.621 | 9.02% |
| RippleNet | 0.917 | 1.20% | 0.844 | 1.42% | 0.698 | 6.16% | 0.640 | 5.78% |
| KGCN | 0.907 | 2.1% | 0.838 | 1.8% | 0.687 | 5.4% | 0.631 | 4.6% |
| Ripp-MKR | 0.920 | 0.8% | 0.845 | 1.1% | 0.720 | 2.1% | 0.650 | 2.7% |
| EKPNet | 0.928 | - | 0.856 | - | 0.741 | - | 0.677 | - |
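
The imp columns of Table 3 are consistent with the relative gain of EKPNet over each baseline, i.e., imp = (metric_EKPNet − metric_baseline) / metric_baseline. The sketch below (variable names ours) reproduces the MovieLens-1M AUC column under that reading.

```python
# Reproduce the "imp" column for MovieLens-1M AUC in Table 3.
ekpnet_auc = 0.928  # EKPNet AUC on MovieLens-1M
baselines = {"CKE": 0.765, "DKN": 0.683, "Wide&Deep": 0.896,
             "RippleNet": 0.917, "KGCN": 0.907, "Ripp-MKR": 0.920}

for model, auc in baselines.items():
    imp = (ekpnet_auc - auc) / auc  # relative improvement of EKPNet over the baseline
    print(f"{model}: {imp:.2%}")

# CKE 21.31%, DKN 35.87%, Wide&Deep 3.57%, RippleNet 1.20% match Table 3 exactly;
# KGCN gives 2.32% and Ripp-MKR 0.87%, close to the coarser 2.1% and 0.8% in the table.
```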
Table 4. The performance of different variants in the CTR prediction scenario.

| ACC | EKP_no | EKP_att | EKP_frame | EKPNet |
| --- | --- | --- | --- | --- |
| MovieLens-1M | 0.9172 | 0.9213 | 0.9219 | 0.9284 |
| Book-Crossing | 0.7006 | 0.7105 | 0.7251 | 0.7408 |
Table 5. The time cost of the models.

| Time | RippleNet | EKPNet | Ripp-MKR | KGCN |
| --- | --- | --- | --- | --- |
| MovieLens-1M | 352.8 s | 386.3 s + 252.1 s | 924.1 s | 1016.2 s |
| Book-Crossing | 136.3 s | 152.2 s + 122.4 s | 524.2 s | 734.1 s |
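
Given the separate "-pre" configurations in Table 2, the two terms reported for EKPNet in Table 5 presumably correspond to the main training stage and the pretraining stage; under that reading, even the combined cost (638.4 s on MovieLens-1M and 274.6 s on Book-Crossing) remains well below that of Ripp-MKR and KGCN.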
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

