A Knowledge Graph Embedding Based Service Recommendation Method for Service-Based System Development

Xie, Fang; Zhang, Yiming; Przystupa, Krzysztof; Kochan, Orest

doi:10.3390/electronics12132935

¹ School of Computer Science, Hubei University of Technology, Wuhan 430068, China

² Detroit Green Technology Institute, Hubei University of Technology, Wuhan 430068, China

³ Department of Automation, Lublin University of Technology, Nadbystrzycka 38D, 20-618 Lublin, Poland

⁴ Department of Measuring-Information Technologies, Lviv Polytechnic National University, Bandery 12, 79013 Lviv, Ukraine

* Authors to whom correspondence should be addressed.

Electronics 2023, 12(13), 2935; https://doi.org/10.3390/electronics12132935

Submission received: 19 April 2023 / Revised: 17 June 2023 / Accepted: 27 June 2023 / Published: 4 July 2023

Abstract

Web APIs are an efficient means of developing service-based software (SBS), and mashup is a key technology that merges several web services to cope with the increasing complexity of software requirements and to expedite service-based system development. An efficient service recommendation method is therefore vital for software development. However, existing methods often suffer from data sparsity or cold-start issues, which degrade recommendation quality. This paper starts from SBS development and proposes a service recommendation method based on knowledge graph embedding and collaborative filtering (CF). In our model, we first construct a refined knowledge graph from SBS-service co-invocation records and SBS- and service-related information to mine the potential semantic relationships between SBSs and services. We then learn representations of the SBS and service entities in the knowledge graph. These heterogeneous entities (SBS, service, etc.) are embedded into a low-dimensional space through the representation learning algorithms Word2Vec and TransR, and the distances between SBS and service vectors are calculated. The input of the recommendation model is an SBS requirement (the target SBS); a set of functionally similar SBSs is extracted from the knowledge graph, which can relieve the cold-start problem. The recommendation model then uses CF to recommend services to the target SBS. Finally, this paper verifies the effectiveness of the method on a real-world dataset. Compared with several state-of-the-art methods, our method achieves the best service hit rate and ranking quality.

Keywords: service recommendation; service-based system; knowledge graph; graph embedding; representation learning

1. Introduction

With the rapid development of web applications, the number of web APIs has increased sharply, which creates considerable difficulty for developers: they have to choose suitable web APIs for a mashup from an overwhelming amount of web information. Mashup technology is becoming more and more popular, as it can effectively merge web APIs to construct a service-based system on the basis of existing service resources. Additionally, a mashup can integrate various service functions, which brings the advantages of rapid system development and strong system scalability. Service-based system development is a knowledge-intensive and often complex, fuzzy, and iterative process throughout life-cycle management [1].

The idea of the recommendation system was put forward by Resnick et al. [2] in 1994, and it has gradually come to be regarded as a significant method for coping with information overload. In recent years, the application scenarios of recommendation systems have become more and more abundant; their core function is to recommend the most suitable service to a user by measuring the similarity between the user’s needs and the existing services. In general, the mainstream service recommendation methods are based on CF, a successful technology in recommendation system research. The core of CF is to predict potentially preferred services for a user by employing rating data collected from similar users [3].

To recommend services more efficiently, multiple aspects of the historical information generated during past service usage are used [4], including item profiles, user feedback and reviews on services, user preferences for services, etc. However, the existing recommendation methods for SBS development incorporate only limited interaction records and little contextual knowledge. These various types of data can cover a wide range of latent information, and there are many fundamental logical relationships between them [5,6]. The motivation of this paper is that if we can adequately merge diversified system and service information into a knowledge model in terms of these potential relations, a more effective service recommendation result should be achievable.

The great challenge lies in incorporating the data and depicting the logical relations that originate from such intricate data. We find that the knowledge graph (KG) is well suited to this purpose. The concept of the KG was proposed by Google in 2012 [7] with the aim of constructing a new intelligent search engine. Essentially, a KG is a kind of semantic network that describes heterogeneous entities and the relations between them. The nodes and edges in the graph represent “entities” and “relationships”, respectively. At present, researchers have built a series of knowledge graphs, such as Microsoft Satori, DBpedia KG, AceKG, and BaiduKG. These cover various fields and provide data support for applications in different scenarios. To integrate the various system and service data into the KG, several tasks are required, including determining the entity types and relation types, extracting entities, and identifying relations.

The contributions of this paper are summarized as follows:

We present a method for constructing a collaborative service knowledge graph, entitled HSSG, which incorporates multiple types of SBS and service data and considers the co-invocation records of such data.
We put forward knowledge graph embedding methods for the service recommendation problem, using multi-source SBS and service data from the public service registry library to alleviate data sparsity in the collaboration process. The embedding approaches include Word2Vec and TransR, each of which has specific strengths.
We conduct a wide range of experiments on the PWeb dataset to validate the feasibility of our method. The experimental results show that our method obtains a considerable improvement in service recommendation hit rate and ranking quality.

This paper is organized as follows. Section 2 reviews the related work. Section 3 formulates the service recommendation problem and presents our motivation. Section 4 proposes a knowledge graph embedding approach to achieve scalable service recommendation from multi-source data. Section 5 reports the experiments conducted to validate the feasibility of the proposed approach. Finally, we conclude this paper and indicate the research direction of our future work.

2. Materials and Methods

Most service recommendation methods are based on collaborative filtering technology, which produces recommendations according to user and service similarity. However, these methods are limited by the data sparsity problem, and they have poor predictive ability for a new user. Considering that the sparsity of the SBS-service co-invocation matrix is over 98% [8], it is hard to obtain robust service recommendation results. Incorporating valid semantic information can therefore improve the recommendation process.

2.1. CF-Based Service Recommendation

CF-based service recommendation methods make recommendations by using the similarity of users or services. Yu et al. [9] proposed a CF method for web service recommendation that addresses the data sparsity problem by adopting a regularized matrix factorization approach. Location-aware CF approaches for quality of service (QoS) prediction in service recommendation were proposed in [10,11]. Gang et al. [12] proposed a time-aware CF approach based on implicit feedback on real-world web services. Hybrid CF approaches for predicting missing QoS values were proposed in [13,14].

As deep learning has achieved tremendous success in many fields, researchers have tried to use deep neural networks for CF. Deng et al. [15] proposed the Deep CF model, which combines representation learning-based CF and matching function learning-based CF. The CML approach proposed in [16] utilizes auxiliary knowledge from text, images, and tags to enhance CF performance.

2.2. Knowledge Graph Embedding

KG embedding is a graph representation technique that is widely adopted for its simplicity of construction and processing. A KG is defined as a set of triples $S$ of the form $(h, r, t)$, where $h$ represents the head entity, $t$ represents the tail entity, and $r$ represents the relationship between them. Because a KG is usually incomplete, much research has been devoted to predicting the missing links in a KG. Knowledge graph embedding is one important method for such a task. It is designed to learn low-dimensional representations of heterogeneous entities and relationships, which can model the relation patterns in a KG in order to infer missing links with similar patterns.

Let $\varepsilon$ denote the set of entities and $\Re$ denote the set of relations of a knowledge graph; then, for each triple $(h, r, t) \in S$, we have $h, t \in \varepsilon$ and $r \in \Re$. The entities in the KG are usually embedded as vectors, and the score function takes the form $f_{r}(h, t)$, where $h$ and $t$ represent the embeddings of the head entity and the tail entity, respectively. The score $f_{r}(h, t)$ measures how plausible it is that a triple $(h, r, t)$ is an instance of relation $r$.

With the development of the KG, KG embedding-based service recommendation has become crucial and has received wide attention [17,18,19,20,21,22]. Liu et al. [23] proposed an event recommendation scheme based on random walking and historical preference re-ranking. Xie et al. [24] proposed a network GAN-based recommendation method for mashup development and constructed the mashup-API similarity matrix.

Recently, many studies have applied knowledge graph embedding-based methods to service recommendation for SBS development. Mikolov et al. [25] presented the Word2Vec word representation learning model in 2013, after which representation learning technology attracted wide attention. The major knowledge representation learning methods include neural network models [26,27,28], matrix decomposition [29], and translation models [30,31]. TransR [32] is a translation model for knowledge graphs. Rather than embedding heterogeneous entities and relations within the same vector space, it represents entities and relations in distinct semantic spaces bridged by relation-specific matrices.

3. Problem Formulation and Framework

We propose a Collaborative Information Embedding (CIE) framework for supporting service recommendation in service-based system development. This framework mainly contains two parts: (1) graph representation learning for embedding; and (2) collaborative learning for recommendation.

3.1. Problem Formulation

The system developer provides the SBS requirement $M_{req}$ as the target SBS for the recommendation model; it mainly contains a detailed description of the system function, the system type, the existing service components, and so on. Then, $M_{req}$ is sent to the service recommendation process, whose result is a service list. Finally, the developer selects the needed services from this list according to his or her own judgment to complete the SBS development.

3.2. Research Framework

The whole service recommendation process includes two components: (1) offline processing stage and (2) online recommendation stage. Figure 1 shows the framework.

In the offline processing stage, the SBS set $M$ and the service set $S$ in the public registry library are first preprocessed. The steps of this stage are then as follows:

Service knowledge graph construction: we construct a new knowledge graph Heterogeneous SBS-service Graph (HSSG) with SBS and service attributes and SBS-service co-invocation matrix.
Recommendation model training: according to the HSSG, we extract the text feature and structure feature of service, and complete the representation learning based on Word2Vec and TransR technologies respectively. Through the co-invocation record between SBSs and services, we get the collaborative vector representation of SBSs and services.

In the online recommendation stage, when the target SBS $M_{req}$ is sent to the service recommendation model, the $K$ nearest SBSs of the target SBS are selected. Finally, we estimate the relevance of the target SBS to the other SBSs and obtain the service recommendation list by the CF method.
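The neighbour selection in the online stage can be illustrated with a minimal sketch. It assumes that every SBS is already represented by a dense vector and that cosine similarity is the nearness measure; the text does not name a specific measure, so this choice, the function names, and the toy vectors are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def k_nearest_sbs(target_vec, sbs_vectors, k):
    """Return the ids of the k SBSs most similar to the target vector."""
    scored = [(cosine_similarity(target_vec, vec), sbs_id)
              for sbs_id, vec in sbs_vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sbs_id for _, sbs_id in scored[:k]]

# Toy embeddings (illustrative values only, not taken from the paper).
sbs_vectors = {"m1": [1.0, 0.0], "m2": [0.9, 0.1], "m3": [0.0, 1.0]}
neighbours = k_nearest_sbs([1.0, 0.05], sbs_vectors, k=2)
print(neighbours)
```

In this toy case the two SBSs pointing in nearly the same direction as the target are returned, and the orthogonal one is left out.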

4. Knowledge Graph Embedding

In order to uncover the potential relations between SBSs and services, their related information is used to construct the service knowledge graph. Then, the Word2Vec and TransR algorithms are used to embed the services into a low-dimensional space. The $K$ neighbor SBSs of the target SBS are selected, and the CF method is used to obtain the recommendation list by calculating the similarities between SBSs and services.

4.1. Knowledge Graph Construction

In the public service registry library, the historical data of mashups and APIs are always sparse, and more than 90% of mashups use fewer than 5 API components. A knowledge graph is a directed graph that connects entity-related knowledge through complex relations. Therefore, we construct a service knowledge graph to mine the potential relations between the target SBS and services.

Considering the number of entities and the data quality, we use PWeb as the original data source for building the service knowledge graph. We obtain over 12,926 APIs and 5657 mashups as entities, and the meta relation structure between entities in PWeb is shown in Figure 2.

Definition 1.

HSSG. According to the relevant knowledge in the service registry library, the knowledge graph is defined as $HSSG = \langle V, E \rangle$, where $V$ represents the set of heterogeneous entities and $E$ represents the set of attributes and relations of the entities.

Definition 2.

SBS Entity. An SBS that takes part in the recommendation system is an SBS entity. It has some attributes, e.g., category and description. A concrete SBS is extracted from a mashup in PWeb as an individual SBS.

Definition 3.

Service Entity. A service involved in the recommendation system is a service entity. It has some attributes, e.g., category, description, and provider. A concrete service is extracted from an API in PWeb as an individual service.

Definition 4.

SBS-service Co-invocation. The co-invocation information between SBSs and services is represented by the matrix $Y \in \mathbb{R}^{|M| \times |N|}$, where $M$ represents the SBS set and $N$ represents the service set. The matrix element $y_{m,n}$ indicates the invocation information: $y_{m,n} = 1$ means that SBS $m$ has invoked service $n$, and $y_{m,n} = 0$ means that SBS $m$ has not invoked service $n$.
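As an illustration of Definition 4, the following sketch builds the binary matrix $Y$ from a list of invocation records and measures its sparsity. The record format, identifiers, and helper names are assumptions made for the example and are not part of the original method.

```python
def build_co_invocation(sbs_ids, service_ids, records):
    """Build the |M| x |N| binary matrix Y from (sbs, service) records."""
    row = {m: i for i, m in enumerate(sbs_ids)}
    col = {n: j for j, n in enumerate(service_ids)}
    Y = [[0] * len(service_ids) for _ in sbs_ids]
    for m, n in records:
        Y[row[m]][col[n]] = 1  # y_{m,n} = 1: SBS m has invoked service n
    return Y

def sparsity(Y):
    """Fraction of zero entries in Y."""
    total = sum(len(r) for r in Y)
    nonzero = sum(sum(r) for r in Y)
    return 1.0 - nonzero / total

# Hypothetical toy records.
records = [("m1", "s1"), ("m1", "s3"), ("m2", "s2")]
Y = build_co_invocation(["m1", "m2"], ["s1", "s2", "s3"], records)
print(Y, sparsity(Y))
```

A dense list of lists is used only for readability; at the scale of a real registry, a sparse representation would be the more natural choice.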

The knowledge graph HSSG is constructed according to the meta relation structure in Figure 2, and examples of SBSs and services in HSSG are shown in Figure 3. From this graph, we can obtain many SBSs and services, each with its own attributes and interrelationships. There are three kinds of attributes, namely category, description, and provider.

4.2. Embedding Service Entity into Low-Dimension Space

In this section, we present the steps of how to extract a service entity representation from textual knowledge and structural knowledge, respectively.

Definition 5.

Structural Feature. The structural feature of service in the HSSG refers to the affiliation relations (ARs) of service. This model considers the attributes of category and provider as structural feature.

Definition 6.

Textual Feature. The textual feature of service in the HSSG refers to the natural language description of service. This model considers the attribute of description as textual feature.

4.2.1. Textual Embedding

In this section, the word embedding tool Word2Vec is used to vectorize the descriptions of services in HSSG. According to Definition 6, a service’s description is regarded as its textual feature. A word $w$ can be represented by a vector $e(w)$. In our method, the service’s textual feature is calculated by taking the mean of the vectors of all feature words in the description. That is, the feature word set is $W = \{w_{1}, w_{2}, \dots, w_{|W|}\}$, where $|W|$ represents the number of words in the service description. The vector representations of the feature word set, $E = \{e(w_{1}), e(w_{2}), \dots, e(w_{|W|})\}$, are obtained by the Word2Vec algorithm, and the textual feature vector is defined as

$$X = \frac{\sum_{i=1}^{|W|} e(w_{i})}{|W|}. \tag{1}$$

Word2Vec is a method based on a neural probabilistic network; it learns each word from its context window to obtain a word vector representation that expresses the similarity between words. For each service $j$, the textual embedding of $j$ is given by the vector $X_{j}$. Figure 4 shows the steps for obtaining a service’s textual vector representation.
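The averaging in Equation (1) can be sketched as follows. The word vectors here are a hand-written toy dictionary standing in for a trained Word2Vec model, and skipping words that have no vector is an assumption of this sketch rather than something stated in the text.

```python
def textual_feature(words, word_vectors):
    """Mean of the word vectors e(w_i) of a service description, as in Eq. (1)."""
    # Assumption: words missing from the vocabulary are ignored.
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    if not vectors:
        raise ValueError("no feature word has a vector")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

# Toy 2-dimensional word vectors (illustrative values only).
word_vectors = {"map": [1.0, 0.0], "search": [0.0, 1.0], "photo": [1.0, 1.0]}
X_j = textual_feature(["map", "search"], word_vectors)
print(X_j)
```

With a real model, `word_vectors` would be replaced by a lookup into the learned embedding table, and the vector dimensionality would match the trained model.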

4.2.2. Structural Embedding

By learning the local information between services, we can obtain the latent structural features of these entities in the knowledge graph. In this paper, the TransR algorithm is introduced to transform each service entity into a low-dimensional vector that expresses its structural feature.

To fully mine the structural features of services, the AR (category and provider) content of each service is taken into account during processing. Following the approach proposed in [31], a Bayesian version of TransR is used to process this information.

The TransR algorithm completes the graph embedding by learning the relation triples $(v_{h}, r, v_{t}) \in S$, where $S$ represents the set of service relation triples, $v_{h}$ and $v_{t}$ represent the head entity and the tail entity, and $r$ represents the relation between them. Each triple represents a specific relation between an entity pair in the graph. The formal expression of the embedding process is as follows:

For a given service embedding set $E$, we assume that the relation triples are independent of each other. The objective is to maximize $p(S \mid E)$, the joint probability of the existence of the relation triples, which is defined in Equation (2).

$$p(S \mid E) = \prod_{(v_{h}, r, v_{t}) \in S} p((v_{h}, r, v_{t}) \mid E) \tag{2}$$

The essence of this method is to map services into different relation spaces. In each relation space, two vertices with a link should be close to each other, and two vertices without a link should be far away from each other. This embedding method can effectively mine the structural features of service entities under multiple relations by projecting the entities into the different relation spaces to train on the associated triples. We use the Bayesian version of TransR to construct the objective function $p(S \mid E)$. For a service relation triple $(v_{h}, r, v_{t}) \in S$, training is expected to bring $v_{h}$ and $v_{t}$ relatively close in the space of relation $r$. Conversely, if the triple $(v_{h}, r, v_{t^{'}}) \notin S$, then $v_{h}$ and $v_{t^{'}}$ should be far away from each other.

This paper mainly studies two relations: category and provider. From the relation triples in HSSG, we can obtain many triples in the “r-Category” relation, such as (Facebook, Category, Social), (Twitter, Category, Social), (Facebook, Category, Webhooks), and (Twitter, Category, Blogging). Based on the explicit links, we can easily observe that Facebook and Twitter share the attribute Social. Meanwhile, because Webhooks and Blogging are categories of Facebook and Twitter, respectively, we can find a potential relation between them through triple learning. Although there is no direct relation between these entities, the implicit association between them can be reflected through knowledge embedding. Similarly, in the provider relation, service entities with the same provider are close to each other in the corresponding space. Through knowledge transfer across the different relation spaces, there may also be some association among the providers of the Webhooks and Blogging services. In this way, the graph embedding method can alleviate data sparsity in service recommendation.

The TransR algorithm represents entities and relations in distinct semantic spaces bridged by relation-specific matrices. The basic idea of representation learning for a service’s structural feature is as follows. To represent the structural knowledge, we use the relation triples $(v_{h}, r, v_{t})$ in HSSG, where the entities are embedded into vectors $v_{h}, v_{t} \in \mathbb{R}^{k}$ and the relation is embedded into $r \in \mathbb{R}^{d}$. For each relation $r$, we set a projection matrix $M_{r} \in \mathbb{R}^{k \times d}$, which projects entities from the entity space into the relation space.

We use maximum a posteriori estimation to train on the triple set. We assume that, in each relation $r$, the attribute value $v_{t}$ is independent of the service $v_{h}$, and that the projections of a service into the different relation spaces are also independent of each other. Then, the existence probability $p(S \mid E)$ is expressed as

$$p(S \mid E) = \prod_{(v_{h}, r, v_{t}) \in S} \prod_{v_{t} \in V_{1}, v_{t^{'}} \in V_{2}} p(v_{t} \succ_{v_{h}} v_{t^{'}}). \tag{3}$$

where $p(v_{t} \succ_{v_{h}} v_{t^{'}})$ represents the probability that the attribute value $v_{t}$ has a higher expectation than $v_{t^{'}}$ in the specific relation space, and $\succ_{v_{h}}$ represents the latent preference expectation of service $v_{h}$. This probability is given in Equation (4), and our goal is to maximize it during training.

$$p(v_{t} \succ_{v_{h}} v_{t^{'}}) = \sigma(g_{r}(v_{h}, v_{t}) - g_{r}(v_{h}, v_{t^{'}})) \tag{4}$$

where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the logistic sigmoid function, and $g_{r}(\cdot)$ is the score function that represents the relevance between $v_{h}$ and $v_{t}$ in a specific relation.

The score function $g_{r}(v_{h}, v_{t})$ is defined in Equation (5), where the projection matrix $M_{r}$ projects a service entity from the entity space into the corresponding relation space:

$$g_{r}(v_{h}, v_{t}) = -\| v_{h} M_{r} + r - v_{t} M_{r} \|_{2}^{2}. \tag{5}$$
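To make Equations (4) and (5) concrete, the sketch below evaluates the score function and the resulting preference probability for small hand-picked vectors. The embeddings and the identity projection matrix are toy assumptions; in the model these quantities are learned.

```python
import math

def project(v, M):
    """Row vector v (length k) times matrix M (k x d), giving a length-d vector."""
    k, d = len(M), len(M[0])
    return [sum(v[i] * M[i][j] for i in range(k)) for j in range(d)]

def transr_score(v_h, r, v_t, M_r):
    """g_r(v_h, v_t) = -|| v_h M_r + r - v_t M_r ||_2^2, as in Eq. (5)."""
    h_r = project(v_h, M_r)
    t_r = project(v_t, M_r)
    return -sum((h + rel - t) ** 2 for h, rel, t in zip(h_r, r, t_r))

def preference_probability(v_h, r, v_t, v_t_neg, M_r):
    """sigma(g_r(v_h, v_t) - g_r(v_h, v_t')), as in Eq. (4)."""
    diff = transr_score(v_h, r, v_t, M_r) - transr_score(v_h, r, v_t_neg, M_r)
    return 1.0 / (1.0 + math.exp(-diff))

# Toy values with k = d = 2 and an identity projection (illustrative only).
M_r = [[1.0, 0.0], [0.0, 1.0]]
v_h, r = [1.0, 0.0], [0.0, 1.0]
v_t, v_t_neg = [1.0, 1.0], [0.0, 0.0]
pos_score = transr_score(v_h, r, v_t, M_r)
neg_score = transr_score(v_h, r, v_t_neg, M_r)
prob = preference_probability(v_h, r, v_t, v_t_neg, M_r)
print(pos_score, neg_score, prob)
```

Here the correct tail satisfies $v_{h} M_{r} + r = v_{t} M_{r}$ exactly, so its score is the maximum value of zero and the preference probability is above 0.5.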

According to the characteristics of HSSG, we extend the TransR algorithm to a Bayesian version, whose generative process is as follows.

For each service $v_{i} \in V$, draw $v_{i} \sim \mathcal{N}(0, \lambda_{v}^{-1} I)$.
For each relation $r \in R$, draw $r \sim \mathcal{N}(0, \lambda_{r}^{-1} I)$ and $M_{r} \sim \mathcal{N}(0, \lambda_{M_{r}}^{-1} I)$.
For each extracted quadruple $(v_{h}, r, v_{t}, v_{t^{'}}) \in Q$, draw it with probability $\sigma(g_{r}(v_{h}, v_{t}) - g_{r}(v_{h}, v_{t^{'}}))$, where $Q$ is the training set of quadruples such that $(v_{h}, r, v_{t})$ is a positive sample and $(v_{h}, r, v_{t^{'}})$ is a negative sample.

It is routine to corrupt a correct triple $(v_{h}, r, v_{t})$ by replacing the tail entity with a wrong entity of the same type, thereby constructing an incorrect triple $(v_{h}, r, v_{t^{'}})$. The third step of the generative process shows that the larger the score of the correct triple is relative to that of the incorrect one, the more likely the quadruple is to be drawn.
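The tail-corruption step can be sketched as below, using the category triples mentioned earlier as toy data. The helper name, the per-relation candidate table, and the fixed random seed are assumptions of this illustration.

```python
import random

def corrupt_tail(triple, positives, tails_by_relation, rng):
    """Build a quadruple (v_h, r, v_t, v_t') by swapping in a wrong tail.

    The wrong tail is drawn from tails seen with the same relation (same type)
    and must not form a known positive triple with v_h.
    """
    v_h, r, v_t = triple
    candidates = [t for t in tails_by_relation[r] if (v_h, r, t) not in positives]
    if not candidates:
        return None  # every tail of this type is a positive for v_h
    return (v_h, r, v_t, rng.choice(candidates))

positives = {
    ("Facebook", "Category", "Social"),
    ("Twitter", "Category", "Social"),
    ("Facebook", "Category", "Webhooks"),
    ("Twitter", "Category", "Blogging"),
}
tails_by_relation = {
    "Category": sorted({t for _, rel, t in positives if rel == "Category"})
}
rng = random.Random(0)
quad = corrupt_tail(("Facebook", "Category", "Social"), positives, tails_by_relation, rng)
print(quad)
```

For Facebook, the only category that is not already a positive tail is Blogging, so the negative sample in this toy case is fully determined.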

After structural embedding with the Bayesian version of TransR, we use the embedding vector $V_{j}$ to represent service $j$.

4.3. Collaborative Learning

According to the HSSG, we consider the pairwise ranking between entities for learning. Specifically, when $R_{ij} = 1$ and $R_{ij^{'}} = 0$, we say that entity $i$ prefers entity $j$ over $j^{'}$, and we use the pairwise preference probability $p(j > j^{'}; i \mid \Theta)$ to denote this. Here, $\Theta$ represents the parameters of the model.

During collaborative learning, we use a latent vector $\eta_{j}$ as the representation of service $j$. To combine the service’s latent representation from collaborative learning with its representations from the HSSG, the latent vector of the service entity is expressed as follows:

$$e_{j} = \eta_{j} + V_{j} + X_{j}. \tag{6}$$

where $V_{j}$ represents the latent structural vector and $X_{j}$ represents the latent textual vector.

The pairwise preference probability function is the objective function, which is given in Equation (7), where $u_{i}$ denotes the latent vector of SBS $i$.

$$p(j > j^{'}; i \mid \Theta) = \sigma(u_{i}^{T} e_{j} - u_{i}^{T} e_{j^{'}}) \tag{7}$$
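A small numerical sketch of Equations (6) and (7) follows. It assumes that the offset, structural, and textual vectors share one dimensionality so that they can be summed; all values are toy numbers chosen for illustration.

```python
import math

def service_latent_vector(eta_j, V_j, X_j):
    """e_j = eta_j + V_j + X_j, as in Eq. (6); all three must have equal length."""
    return [a + b + c for a, b, c in zip(eta_j, V_j, X_j)]

def pairwise_preference(u_i, e_j, e_j_neg):
    """sigma(u_i^T e_j - u_i^T e_j'), as in Eq. (7)."""
    margin = sum(u * (p - n) for u, p, n in zip(u_i, e_j, e_j_neg))
    return 1.0 / (1.0 + math.exp(-margin))

# Toy vectors (illustrative values only).
e_j = service_latent_vector([0.1, 0.0], [0.5, 0.2], [0.4, 0.3])
e_j_neg = [0.0, 0.5]
u_i = [1.0, 0.0]
p = pairwise_preference(u_i, e_j, e_j_neg)
print(e_j, p)
```

The probability exceeds 0.5 whenever the invoked service scores higher than the non-invoked one under the SBS vector, which is the ordering that training tries to enforce.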

The generative process of our CIE framework with collaborative learning is given in Table 1. Each entity is initialized from a general prior density, namely a normal distribution with zero mean and a scaled identity covariance matrix.

Table 1. The generative process of CIE framework.

Input: Information from HSSG, service description set $t^{S}$, SBS-service co-invocation matrix $M_{r}$, triple relation of service set $D$, epochs number $n$
Output: Weight parameters $\lambda, r, M_{r}, t^{S}$
Initialize $v_{i} \sim \mathcal{N}(0, \lambda_{v}^{-1} I)$ for each service entity $v_{i} \in V$.
Initialize $r \sim \mathcal{N}(0, \lambda_{r}^{-1} I)$ and $M_{r} \sim \mathcal{N}(0, \lambda_{M_{r}}^{-1} I)$, respectively.
Compute $t^{S}$ according to Equations (5)–(7).
Initialize a latent service offset vector $\eta_{j} \sim \mathcal{N}(0, \lambda_{I}^{-1} I)$ for service $j$.
Set the latent vector of service $j$ as $e_{j} = \eta_{j} + V_{j} + X_{j}$.
Set the latent vector of SBS $i$ as $u_{i} \sim \mathcal{N}(0, \lambda_{U}^{-1} I)$.
For epochs $= 1, 2, \dots, n$ Do
  According to each quadruple $(v_{h}, r, v_{t}, v_{t^{'}}) \in Q$, set the probability $\sigma(g_{r}(v_{h}, v_{t}) - g_{r}(v_{h}, v_{t^{'}}))$.
  According to each triple $(i, j, j^{'}) \in D$, set the probability $\sigma(u_{i}^{T} e_{j} - u_{i}^{T} e_{j^{'}})$, and optimize the parameters.
End For

where $Q$ is the set of training quadruples, in which each $(v_{h}, r, v_{t})$ is a correct sample and each $(v_{h}, r, v_{t^{'}})$ is an incorrect sample, and $D$ is the set of training triples, in which $(i, j, j^{'})$ satisfies the condition that $R_{ij} = 1$ and $R_{ij^{'}} = 0$ ($j^{'}$ is randomly sampled from the services not invoked by SBS $i$).

Parameters Learning. Computing the full posterior probability of the parameters is extremely hard. We therefore train the model on the embedded information by maximizing the posterior probability of $u, e, r, M_{r}, \lambda$, which is equivalent to maximizing the log-likelihood in Equation (8). In the second sum, the two norms are ordered so that the argument of $\sigma$ equals $g_{r}(v_{h}, v_{t}) - g_{r}(v_{h}, v_{t^{'}})$ from Equations (4) and (5).

$$\begin{aligned} L ={} & \sum_{(i, j, j^{'}) \in D} \ln \sigma(u_{i}^{T} e_{j} - u_{i}^{T} e_{j^{'}}) \\ & + \sum_{(v_{h}, r, v_{t}, v_{t^{'}}) \in Q} \ln \sigma\left(\| v_{h} M_{r} + r - v_{t^{'}} M_{r} \|_{2}^{2} - \| v_{h} M_{r} + r - v_{t} M_{r} \|_{2}^{2}\right) \\ & - \frac{\lambda_{I}}{2} \sum_{j} \| u_{j} - W_{j} - Y_{j} \|_{2}^{2} - \frac{\lambda_{I}}{2} \sum_{j} \| e_{j} - V_{j} - X_{j} \|_{2}^{2} \\ & - \frac{\lambda_{x}}{2} \sum_{j} \| u_{j} \|_{2}^{2} - \frac{\lambda_{v}}{2} \sum_{v} \| v \|_{2}^{2} - \frac{\lambda_{r}}{2} \sum_{r} \| r \|_{2}^{2} - \frac{\lambda_{M}}{2} \sum_{r} \| M_{r} \|_{2}^{2} \end{aligned} \tag{8}$$

The aim is to maximize the object function in Equation (8). We use the stochastic gradient descent (SGD) algorithm to iterate.
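A single stochastic gradient step can be sketched for a strongly simplified objective: only the first sum of Equation (8) and an L2 penalty on the SBS vector are kept, and only $u_{i}$ is updated. The learning rate, regularization weight, and vectors are toy assumptions, so this is an illustration of the update rule rather than the training procedure used in the experiments.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def objective(u_i, e_j, e_neg, reg):
    """ln sigma(u^T e_j - u^T e_j') minus an L2 penalty on u_i."""
    margin = dot(u_i, e_j) - dot(u_i, e_neg)
    return math.log(sigmoid(margin)) - 0.5 * reg * dot(u_i, u_i)

def sgd_step(u_i, e_j, e_neg, reg, lr):
    """One gradient ascent step on the objective with respect to u_i only."""
    margin = dot(u_i, e_j) - dot(u_i, e_neg)
    coeff = 1.0 - sigmoid(margin)  # derivative of ln sigma(x) is 1 - sigma(x)
    return [u + lr * (coeff * (a - b) - reg * u)
            for u, a, b in zip(u_i, e_j, e_neg)]

# Toy vectors and hyper-parameters (illustrative values only).
u0 = [0.0, 0.0]
e_pos, e_neg = [1.0, 0.0], [0.0, 1.0]
u1 = sgd_step(u0, e_pos, e_neg, reg=0.001, lr=0.1)
before = objective(u0, e_pos, e_neg, reg=0.001)
after = objective(u1, e_pos, e_neg, reg=0.001)
print(u1, before, after)
```

Because the objective is maximized, the step moves along the gradient; the same rule is often written as gradient descent on the negated objective.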

Service Recommendation. Service recommendation for SBS development follows a ranking criterion. We use $r(m_{i}, s_{j})$ to represent the relevance of SBS $m_{i}$ and service $s_{j}$, which is calculated as the inner product of their latent vectors, as shown in Equation (9).

$$r(m_{i}, s_{j}) = u_{i}^{T} e_{j} \tag{9}$$

If $u_{i}^{T} e_{j_{1}} > u_{i}^{T} e_{j_{2}} > \dots > u_{i}^{T} e_{j_{n}}$ holds for SBS $m_{i}$, the service recommendation ranking is $j_{1} > j_{2} > \dots > j_{n}$.
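The ranking rule can be sketched in a few lines: each service is scored with the inner product of Equation (9) and the services are sorted in descending order. The function name and the toy vectors are hypothetical.

```python
def recommend(u_i, service_vectors, top_n):
    """Rank services by r(m_i, s_j) = u_i^T e_j and return the top_n ids."""
    scores = {
        j: sum(u * e for u, e in zip(u_i, e_j))
        for j, e_j in service_vectors.items()
    }
    return sorted(scores, key=lambda j: scores[j], reverse=True)[:top_n]

# Toy latent vectors (illustrative values only).
service_vectors = {"s1": [1.0, 0.0], "s2": [0.0, 1.0], "s3": [1.0, 1.0]}
ranking = recommend([1.0, 2.0], service_vectors, top_n=2)
print(ranking)
```

In practice, services that the target SBS already contains would usually be filtered out before the list is returned; that filter is omitted here for brevity.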

5. Experiment and Evaluation

We use the real-world data from PWeb to compare our method with several traditional recommendation methods. To demonstrate the advantage of our proposal, we compare CIE with four methods: CF, SVD, TF-IDF, PaSRec. Furthermore, we use two evaluation measures to examine the intrinsic nature of CIE framework.

5.1. Experimental Setting

We obtained 12,962 real APIs (regarded as services) and 5657 real mashups (regarded as SBSs) from PWeb. The statistical parameters of the experimental dataset are summarized in Table 2.

Dataset Preparation. In the dataset, the functionality of SBSs and services is embodied in their textual descriptions and categories. The category attribute is manually added by the PWeb administrators. We perform the following four steps to preprocess the extracted data: spelling correction, tokenization, stopword removal, and lemmatization. After preparing the dataset, we use Word2Vec to train the textual embedding, adopting Skip-gram as the network architecture and hierarchical softmax as the optimization method. The window width and the vector dimensionality are set to 5 and 80, respectively.

Two service recommendation scenarios were selected for the experiment, and the specific descriptions are summarized as follows:

(1): SRec-1: The given SBS requirements include a detailed description of system functions and possible type information of the system.
(2): SRec-2: The given SBS requirements include a detailed description of system functions, possible type information of the system, and some existing service components.

The main difference between the two recommendation scenarios is that SRec-1 mainly recommends services for new SBS; while SRec-2 mainly extends or updates service components for the existing SBS.

In order to evaluate the effectiveness of the recommendation methods in the two recommendation scenarios, three groups of training and test sets were created for the experiment. Among them, one group is used for SRec-1, and two groups are used for SRec-2. The specific descriptions are summarized as follows:

(1): For SRec-1, randomly select 80% of SBS as the training set and the remaining 20% as the test set.
(2): For SRec-2, the two groups of experimental data differ in the number of service components included in the SBSs of the test set. In SRec-2 (1), 30% of the SBSs with more than 1 service component are randomly selected as the test set, while the rest are used as the training set. In SRec-2 (2), 30% of the SBSs with more than 2 service components are randomly selected as the test set, while the rest are used as the training set. “SRec-2 (n)” means randomly selecting n services from each SBS of the test set as components with known requirements, while the remaining ones are used for evaluation.
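The data splits described above can be sketched as follows. The function names, the fixed seeds, and the toy identifiers are assumptions of this illustration; the 80/20 ratio and the idea of keeping n known components per test SBS come from the text.

```python
import random

def split_srec1(sbs_ids, test_ratio=0.2, seed=0):
    """Randomly split SBS ids into training and test sets (SRec-1 style)."""
    rng = random.Random(seed)
    ids = list(sbs_ids)
    rng.shuffle(ids)
    n_test = int(round(len(ids) * test_ratio))
    return ids[n_test:], ids[:n_test]

def hold_out_components(components, n_known, seed=0):
    """Keep n_known random components as known and hold out the rest (SRec-2 (n) style)."""
    rng = random.Random(seed)
    known = rng.sample(sorted(components), n_known)
    held_out = [s for s in components if s not in known]
    return known, held_out

train, test = split_srec1([f"m{i}" for i in range(10)])
known, held_out = hold_out_components(["s1", "s2", "s3", "s4"], n_known=2)
print(len(train), len(test), known, held_out)
```

For SRec-2, the candidate pool passed to the split would first be restricted to SBSs with more than n components, so that at least one component always remains for evaluation.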

Parameters Setting. The hyper-parameters of Equation (8) are set to the values at which our method achieves its best performance, and the model parameters are learned by SGD with 1000 iterations. We then complete the latent low-dimensional embedding of SBSs and services. In the structural embedding process, $\lambda_{x}, \lambda_{v}, \lambda_{r}$ are set to 0.001 and $\lambda_{M}$ is set to 0.01. In the collaborative learning process, $\lambda_{I}$ is set to 0.0025.

5.2. Evaluation Metrics

We adopt two metrics to evaluate the service recommendation performance of the CIE method.

Mean Average Precision (MAP): MAP at the top $N$ services in the ranking list is defined as

$$MAP@N = \frac{1}{|CS_{mreq}|} \sum_{i=1}^{N} \left(\frac{N_{i}}{i} \cdot I(i)\right), \tag{10}$$

where $|CS_{mreq}|$ represents the number of component services of the target SBS, $N_{i}$ represents the number of component services of the target SBS among the top $i$ services in the ranking list, and $I(i)$ indicates whether the service at position $i$ of the ranking list is a component service of the target SBS.

Normalized Discounted Cumulative Gain (NDCG): NDCG at the top N services in the ranking list is defined by the following equation:

$$\mathrm{NDCG}@N = \frac{1}{IDCG_{N}} \sum_{i=1}^{N} \frac{2^{I(i)} - 1}{\log_{2}(1 + i)} \tag{11}$$

where $IDCG_{N}$ represents the ideal (maximum) $DCG$ score that can be obtained for the target SBS.
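Equation (11) can be illustrated in the same way. The sketch below assumes binary relevance (a service either is or is not a component of the target SBS) and takes the ideal DCG to be the score of a ranking that places all component services first; the function and argument names are hypothetical.

```python
import math

def ndcg_at_n(ranked_services, component_services, n):
    """NDCG@N for one target SBS, following Equation (11) with binary gains."""
    relevant = set(component_services)
    # DCG: gain 2^I(i) - 1 (1 for a hit, 0 otherwise), discounted by log2(1 + i).
    dcg = sum(
        (2 ** (1 if service in relevant else 0) - 1) / math.log2(1 + i)
        for i, service in enumerate(ranked_services[:n], start=1)
    )
    # Ideal DCG: all component services ranked first (at most n of them).
    ideal_hits = min(len(relevant), n)
    idcg = sum(1.0 / math.log2(1 + i) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

A ranking that lists every component service before any other service scores 1; moving a component service further down the list lowers the score through the logarithmic discount.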

The structural information embedding component aims to extract richer semantic features of the service itself using the network embedding method TransR. This section explores whether embedding more relational triplet information can improve recommendation performance. Two sets of embedded words were prepared, and their different combinations were used as inputs to the model. Structured knowledge was manually extracted from the services of a cloud service supermarket (CloudCRM), at a scale similar to the amount of experimental data obtained from PWeb.

The SaaS service management platform CloudCRM was designed by our research group and is built on the open-source system SugarCRM. It registers a large number of publicly available, heterogeneous web services and, based on multi-tenant technology, provides on-demand service recommendations to users. The following focuses on the application of the recommendation method on this platform.

Before recommending services on the platform, we use the RGPS-based service clustering method to organize the SBSs and services in CloudCRM. Based on the clustering results of the registered services, the platform determines each service’s domain (classification) information and the corresponding topic cluster information, which facilitates service management and promotes effective service recommendation.

During service registration on the platform, the registrant must enter the basic information of the service. If the registered service is newly published, the platform allocates topics to the service based on the entered information. Using the service description, the other tags filled in by the registrant, and the selected domain information, the platform presents the registrant with the five topic clusters most relevant to the service description to choose from. Unlike domains, topic clusters are not officially named. To help registrants choose a topic, the platform provides the 10 most frequently occurring words in each topic as a reference. Registrants also need to provide information about the users, goals, and processes of the service, which helps to collect service characteristics more comprehensively. On the cloud supermarket platform, the features of domain services can also be obtained from the four dimensions of RGPS. This information is useful for expanding the features of semantically sparse services.

In the CloudCRM platform, topic clustering is completed during service registration, which helps the platform organize and manage service resources and improves the effectiveness of service recommendation.

In the service recommendation process, the platform provides two requirement acquisition modes that correspond to the two recommendation scenarios. Recommendation scenario 1 (SRec-1) uses a “natural language description” approach to enter the requirement information. Recommendation scenario 2 (SRec-2) uses a “progressive selection” approach to enter the requirement information together with the service components already included. The platform then analyzes the system requirements entered by the developers and generates a service recommendation list using the method proposed in this work. Finally, the platform returns the “recommended service list” to the user for selection.

5.3. Performance Comparison

We choose several state-of-the-art recommendation methods for comparison. Because our method incorporates knowledge graph embedding and collaborative filtering, the selected methods cover the CF method, the matrix factorization method, the content-based method, and the hybrid method.

CF [33]: a classical recommendation method that has been widely used in many recommendation scenarios.
SVD [34]: a classical matrix factorization method used in recommender systems.
TF-IDF [35]: a content-based recommendation method, which recommends services according to their cosine similarity with the SBS requirements.
PaSRec [36]: a recent hybrid service recommendation method that integrates content with CF and exploits the knowledge graph composed of mashups and services as well as implicit feedback.
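To make the content-based baseline concrete, the sketch below ranks services by the cosine similarity between TF-IDF vectors of the SBS requirement and of each service description. It is a minimal illustration, not the configuration used in [35]: whitespace tokenization, the smoothed IDF formula, and all function names are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict) per whitespace-tokenized document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Assumed smoothed IDF: log((1 + n) / (1 + df)) + 1
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_services(requirement, service_descriptions, top_n=5):
    """Rank services (name -> description) by similarity to the SBS requirement."""
    names = list(service_descriptions)
    vectors = tfidf_vectors([requirement] + [service_descriptions[n] for n in names])
    query, service_vectors = vectors[0], vectors[1:]
    scored = sorted(
        zip(names, service_vectors),
        key=lambda pair: cosine(query, pair[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:top_n]]
```

Because such a baseline relies only on text, it can score a new SBS that has no invocation history, but it ignores the SBS-service co-invocation records.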

In this section, we conduct a series of experiments to evaluate our proposed method on a real-world dataset. The performance results of different methods are given in Figure 5a,b.

As Figure 5 shows, our method improves on all the compared methods. We make the following observations: (1) Among the CF-based methods (CF, SVD, PaSRec, and CIE), CIE and PaSRec perform better on all evaluation metrics. The limitation of CF and SVD is that they cannot efficiently exploit the co-invocation information between SBSs and services. (2) As a content-based method, TF-IDF uses the textual content of SBSs and services and recommends services similar to the SBS. TF-IDF performs better than CF and SVD, which indicates that the textual information of SBSs and services provides substantial support for service recommendation. (3) PaSRec is better than CF, SVD, and TF-IDF on all metrics. It is a hybrid method that merges the textual and structural information of SBSs and services with CF, and its advantage shows the effectiveness of hybrid service recommendation. Even though PaSRec considers more comprehensive information about SBSs and services, our model still achieves better performance by using knowledge graph embedding to represent the heterogeneous entities.

6. Conclusions

This method performs representation learning on the heterogeneous objects (SBS and service) in the network, fully capturing the implicit associations between objects and thereby improving the service recommendation effect. The method is divided into two stages. In the feature information extraction stage, the heterogeneous network embedding technology TransR and the word embedding technology Word2vec are combined to extract and vectorize the structural and content features of the services in the network. In the collaborative joint learning stage, unified representation learning is performed on the SBSs and services in the heterogeneous information network based on the SBS-service co-invocation information. Service recommendation is then implemented based on the distance between SBS and service in the vector space.
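The final recommendation step, ranking services by their distance to the target SBS in the embedding space, can be sketched as follows. The Euclidean distance, the function name, and the toy two-dimensional vectors in the usage note are assumptions for illustration; the learned embeddings would come from the two stages described above.

```python
import math

def recommend_by_distance(sbs_vector, service_vectors, top_n=3):
    """Return the top_n service names whose embeddings lie closest to the SBS.

    sbs_vector: embedding of the target SBS (list of floats).
    service_vectors: mapping of service name -> embedding of the same dimension.
    """
    def distance(u, v):
        # Euclidean distance; an assumed choice of metric for this sketch.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    ranked = sorted(
        service_vectors,
        key=lambda name: distance(sbs_vector, service_vectors[name]),
    )
    return ranked[:top_n]
```

With the toy vectors `{"s1": [0.0, 0.0], "s2": [1.0, 1.0], "s3": [0.2, 0.1]}` and an SBS at `[0.1, 0.1]`, the two nearest services are `s3` followed by `s1`.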

We proposed a knowledge graph embedding based service recommendation method for service-based system development (CIE), which captures the potential relations between SBS and service by using knowledge graph embedding algorithms. Because the textual and structural content of services and the SBS-service co-invocation records are also crucial in the recommendation process, our method further combines collaborative filtering with representation learning. The experiments performed on the real-world PWeb dataset demonstrated that the proposed method outperforms the compared recommendation methods (CF, SVD, TF-IDF, and PaSRec) on the evaluation metrics. Besides, our research sheds light on the heterogeneous information in the KG, which can be used in more application scenarios. In the future, we will try to extend our proposed method with a larger knowledge base and integrate more heterogeneous information to improve service recommendation.

Author Contributions

All authors contributed to the study conception and design. Conceptualization, F.X. and K.P.; methodology, Y.Z. and O.K.; software, O.K.; validation, K.P.; formal analysis, F.X. and Y.Z.; investigation, F.X.; data curation, K.P.; writing – original draft preparation, F.X., K.P. and Y.Z.; writing – review and editing, F.X., Y.Z. and O.K.; visualization, Y.Z. and K.P.; supervision, O.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Key Project of Hubei Education Department under Grant No. D20201402; the Natural Science Foundation of Hubei Province under Grant No. 2020CFB807; the Science Start-up Foundation for High-level Talents of HBUT under Grant No. 430100391.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wortmann, H.; Alblas, A. Product Platform Life Cycles: A Multiple Case Study. Int. J. Technol. Manag. 2009, 48, 188.
2. Resnick, P.; Iacovou, N.; Suchak, M.; Bergstrom, P.; Riedl, J. Grouplens: An open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative work, Chapel Hill, NC, USA, 22–26 October 1994.
3. Guo, Q.; Zhuang, F.; Qin, C.; Zhu, H.; Xie, X.; Xiong, H.; He, Q. A survey on knowledge graph-based recommender systems. IEEE Trans. Knowl. Data Eng. 2020, 34, 3549–3568.
4. Adomavicius, G.; Manouselis, N.; Kwon, Y. Multi-criteria recommender systems. In Recommender Systems Handbook; Springer: Boston, MA, USA, 2011; pp. 769–803.
5. Chen, J.-L.; Hembara, N.O.; Hvozdyuk, M.M. Nonstationary Temperature Problem for a Cylindrical Shell with Multilayer Thin Coatings. Mater. Sci. 2018, 54, 339–349.
6. Yu, X.; Ren, X.; Sun, Y.; Gu, Q.; Sturt, B.; Khandelwal, U.; Norick, B.; Han, J. Personalized entity recommendation: A heterogeneous information network approach. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, New York, NY, USA, 24–28 February 2014; pp. 283–292.
7. Dong, X.; Gabrilovich, E.; Heitz, G.; Horn, W.; Lao, N.; Murphy, K.; Strohmann, T.; Sun, S.; Zhang, W. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014.
8. Xiong, R.; Wang, J.; Zhang, N.; Ma, Y. Deep hybrid collaborative filtering for web service recommendation. Expert Syst. Appl. 2018, 110, 191–205.
9. Yu, Q.; Zheng, Z.; Wang, H. Trace norm regularized matrix factorization for service recommendation. In Proceedings of the 2013 IEEE 20th International Conference on Web Services, Santa Clara, CA, USA, 28 June–3 July 2013.
10. Liu, J.; Tang, M.; Zheng, Z.; Liu, X.; Lyu, S. Location-aware and personalized collaborative filtering for web service recommendation. IEEE Trans. Serv. Comput. 2015, 9, 686–699.
11. Beshley, M.; Kryvinska, N.; Beshley, H.; Kochan, O.; Barolli, L. Measuring end-to-end delay in low energy SDN IoT Platform. Comput. Mater. Contin. 2021, 70, 19–41.
12. Tian, G.; Wang, J.; He, K.; Sun, C.; Tian, Y. Integrating implicit feedbacks for time-aware web service recommendations. Inf. Syst. Front. 2017, 19, 75–89.
13. Zheng, Z.; Ma, H.; Lyu, M.R.; King, I. Wsrec: A collaborative filtering based web service recommender system. In Proceedings of the 2009 IEEE International Conference on Web Services, Los Angeles, CA, USA, 6–10 July 2009.
14. Su, J.; Beshley, M.; Przystupa, K.; Kochan, O.; Rusyn, B.; Stanisławski, R.; Yaremko, O.; Majka, M.; Beshley, H.; Demydov, I.; et al. 5G multi-tier radio access network planning based on voronoi diagram. Measurement 2022, 192, 110814.
15. Deng, Z.H.; Huang, L.; Wang, C.D.; Lai, J.H.; Philip, S.Y. DeepCF: A unified framework of representation learning and matching function learning in recommender system. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 61–68.
16. Hsieh, C.K.; Yang, L.; Cui, Y.; Lin, T.Y.; Belongie, S.; Estrin, D. Collaborative metric learning. In Proceedings of the International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 193–201.
17. Wang, X.; Liu, X.; Liu, J.; Chen, X.; Wu, H. A novel knowledge graph embedding based API recommendation method for Mashup development. World Wide Web 2021, 24, 869–894.
18. Zhang, Y.; Wang, J.; Luo, J. Knowledge graph embedding based collaborative filtering. IEEE Access 2020, 8, 134553–134562.
19. Wang, H.; Wang, Z.; Hu, S.; Xu, X.; Chen, S.; Tu, Z. DUSKG: A fine-grained knowledge graph for effective personalized service recommendation. Future Gener. Comput. Syst. 2019, 100, 600–617.
20. Grad-Gyenge, L.; Filzmoser, P.; Werthner, H. Recommendations on a knowledge graph. In Proceedings of the 1st International Workshop on Machine Learning Methods for Recommender Systems, Vancouver, BC, Canada, 30 April–2 May 2015.
21. Yin, Y.; Yu, F.; Xu, Y.; Yu, L.; Mu, J. Network location-aware service recommendation with random walk in cyber-physical systems. Sensors 2017, 17, 2059.
22. Jiang, Z.; Liu, H.; Fu, B.; Wu, Z.; Zhang, T. Recommendation in heterogeneous information networks based on generalized random walk model and bayesian personalized ranking. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, 5–9 February 2018.
23. Chen, J.; Su, J.; Kochan, O.; Levkiv, M. Metrological software test for simulating the method of determining the thermocouple error in situ during operation. Meas. Sci. Rev. 2018, 18, 52–58.
24. Xie, F.; Chen, L.; Ye, Y.; Zheng, Z.; Lin, X. Factorization machine based service recommendation on heterogeneous information networks. In Proceedings of the 2018 IEEE International Conference on Web Services (ICWS), San Francisco, CA, USA, 2–7 July 2018.
25. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
26. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Philip, S.Y. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24.
27. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 2017, 30.
28. Jacyna, M.; Semenov, I. Models of vehicle service system supply under information uncertainty. Eksploat. I Niezawodn. Maint. Reliab. 2020, 22, 694–704.
29. Huang, X.; Fang, Q.; Qian, S.; Sang, J.; Li, Y.; Xu, C. Explainable interaction-driven user modeling over knowledge graph for sequential recommendation. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019.
30. Lyu, Y.; Zhang, Q.; Chen, A.; Wen, Z. Interval Prediction of Remaining Useful Life based on Convolutional Auto-Encode and Lower Upper Bound Estimation. Eksploat. I Niezawodn. Maint. Reliab. 2023, 25, 165811.
31. Zhang, F.; Yuan, N.J.; Lian, D.; Xie, X.; Ma, W.Y. Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
32. Chen, J.; Yatskiv, V.; Sachenko, A.; Su, J. Wireless sensor networks based on modular arithmetic. Radioelectron. Commun. Syst. 2017, 60, 215–224.
33. Xu, W.; Cao, J.; Hu, L.; Wang, J.; Li, M. A social-aware service recommendation approach for mashup creation. In Proceedings of the 2013 IEEE 20th International Conference on WEB Services, Santa Clara, CA, USA, 28 June–3 July 2013.
34. Paterek, A. Improving regularized singular value decomposition for collaborative filtering. In Proceedings of the KDD Cup and Workshop, San Jose, CA, USA, 15 August 2007; pp. 5–8.
35. Xia, B.; Fan, Y.; Tan, W.; Huang, K.; Zhang, J.; Wu, C. Category-aware API clustering and distributed recommendation for automatic mashup creation. IEEE Trans. Serv. Comput. 2015, 8, 674–687.
36. Liang, T.; Chen, L.; Wu, J.; Dong, H.; Bouguettaya, A. Meta-path based service recommendation in heterogeneous information networks. In Proceedings of the International Conference on Service-Oriented Computing, Banff, AB, Canada, 10–13 October 2016; Springer: Cham, Switzerland, 2016; pp. 371–386.

Figure 1. The flowchart of Collaborative Information Embedding (CIE) framework.

Figure 2. The meta relation structure in PWeb.

Figure 3. The examples of SBS and service in HSSG.

Figure 4. An example of service’s textual vector representation using Word2Vec.

Figure 5. (a) The MAP@N results and (b) The NDCG@N results.

Table 2. Detailed statistics of the dataset.

Statistics	Value
number of services	12,926
number of SBSs	5657
number of SBS-service co-invocations	22,639
number of categories of SBS	6598
number of categories of service	19,360
number of service providers	305
average number of words in an SBS	40.56
average number of words in a service	24.78

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
