In this section, we first outline the experimental objectives and the dataset used in our experiments. Then, we conduct four sets of experiments to verify the proposed hybrid method based on Epinions, a well-known real dataset widely used in social network research. Afterwards, we make detailed analyses of the experimental results, employing two accuracy metrics common in recommendation systems.
5.1. Experimental Objectives and Dataset Description
To demonstrate the feasibility, effectiveness and advantages of our proposed hybrid recommendation method, we design and conduct four sets of experiments. Experiment 1 is designed to verify the feasibility of the proposed hybrid method. Experiment 2 makes comparisons between the traditional CF method and the improved CF method employing PCC based on the social trust network. Experiment 3 demonstrates the advantage of introducing the LSI factor into the proposed method. Experiment 4 shows the advantage of employing the two-step way of determining the reference users of target users.
In the following experiments, we select the real social-network dataset Epinions, downloaded from the website (http://www.trustlet.org/downloaded_epinions.html), as the data source of our experiments. The dataset was collected by Paolo Massa in a 5-week crawl (November/December 2003) of the Epinions.com website. It consists of two data files: ratings_data.txt and trust_data.txt. The former is made up of triples (user_id, item_id, rating_value), in which user_id is in the range [1, 49,290], item_id is in the range [1, 139,738], and rating_value is in the range [1, 5]. The latter is composed of trust relationship triples of the form (source_user_id (i.e., trustor_id), target_user_id (i.e., trustee_id), trust_statement_value), in which trust_statement_value only takes the value 1 to represent trust.
5.2. Preliminary Work
In this section, we first outline how to construct users’ social trust networks based on an Epinions dataset, then describe how to determine reference users in later experiments, next depict our general experimental idea, and finally make necessary preparations for the following experiments.
To facilitate the descriptions of the next experiments, we rename the two files from the Epinions dataset to ratings.txt and trust.txt, respectively. Meanwhile, we delete the extra explanatory information in these two files and adapt the data format so that it can be read in a Python programming environment. In addition, for the ease of constructing users’ social trust network, we use 0 to denote no trust relationship between two users, as mentioned in
Section 4.1. This is equivalent to assigning a zero trust_statement_value to every pair not included in the trust.txt file, which does not affect the experimental results.
Since the trust.txt file derived from Epinions already provides the existing trust relationships, it is easy to construct users’ social trust network according to Algorithm 1. In fact, the set U contains all users whose IDs range in [1, 49290]. T and W can also be acquired from the trust.txt file. Considering that the LSI factor included in W is only used by a small number of randomly selected target users, we take a simplified approach in the later experiments: we calculate LSI only when we are ready to use it in a later prediction.
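The construction step can be sketched as follows. This is a minimal in-memory sketch, not the authors' Algorithm 1 itself; the sample triples and the function name are hypothetical, and an absent user pair is implicitly treated as trust value 0, as described in Section 4.1.

```python
from collections import defaultdict

def build_trust_network(trust_triples):
    """Return (users, trust): `users` is the set of user IDs seen in the
    triples, and `trust` maps each trustor to the set of trustees he trusts.
    An absent (trustor, trustee) pair is implicitly 0 (no trust)."""
    users = set()
    trust = defaultdict(set)
    for trustor, trustee, value in trust_triples:
        users.update((trustor, trustee))
        if value == 1:  # trust.txt only records explicit trust statements (value 1)
            trust[trustor].add(trustee)
    return users, trust

# Hypothetical sample triples (not real Epinions data):
sample = [(1, 2, 1), (1, 3, 1), (2, 3, 1)]
users, trust = build_trust_network(sample)
# trust[1] == {2, 3}; the pair (3, 1) is absent, i.e., trust value 0
```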
Based on the social trust network, for a target user who needs a rating prediction for an unknown item, the preliminary job is to select appropriate reference users for him. Considering that the Epinions dataset provides rich trust relationships, in the following experiments, we adopt a simplified way to search for reference users: determining the reference users of a target user is based only on the direct trust relationship. Meanwhile, considering the Leave One Out method mentioned later, discovering the reference users of a target user is equivalent to searching for the users who satisfy all of the following conditions at the same time:
- (1) who have been trusted by the target user, i.e., direct trust users of the target user in their social trust network;
- (2) who have rated the target item that is unknown to the target user;
- (3) who have rated at least 3 items;
- (4) whose similarities to the target user are not smaller than the designated threshold of PCC.
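The four conditions above can be combined into one predicate. The sketch below is illustrative only: the data structures and the `pcc_sim` helper are hypothetical placeholders, not the paper's actual implementation.

```python
def is_reference_user(candidate, target_user, target_item,
                      trusted, ratings, pcc_sim, threshold):
    """Check conditions (1)-(4) for one candidate reference user."""
    return (candidate in trusted.get(target_user, set())       # (1) directly trusted
            and target_item in ratings.get(candidate, {})      # (2) rated the target item
            and len(ratings.get(candidate, {})) >= 3           # (3) rated at least 3 items
            and pcc_sim(target_user, candidate) >= threshold)  # (4) PCC above threshold

# Hypothetical toy data:
trusted = {10: {20, 30}}
ratings = {20: {5: 4, 6: 3, 7: 5}, 30: {6: 2}}
sim = lambda u, v: 0.8  # stand-in for the PCC similarity function
# is_reference_user(20, 10, 5, trusted, ratings, sim, 0.5) -> True
# is_reference_user(30, 10, 5, trusted, ratings, sim, 0.5) -> False (fails (2) and (3))
```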
In order to achieve our experimental objectives, we utilize the Leave One Out evaluation method, as described by Liu et al. in their work [
53], to perform all the experiments. The idea of the Leave One Out method is to hide one or more known ratings of a target user on purpose, attempt to predict them, and then compare the prediction results with the real ratings. In doing so, we first randomly determine one or more target users who have rated at least 3 items, then randomly select one item among those rated by each target user as the unknown item to be predicted later. Next, we describe how to realize this process in Python.
For ease of use later, we first count the rated items of each user in ratings.txt and keep the result in a list of counts for all users. Then, we randomly generate an integer in the range [1, 49290], which serves as a user ID. We use this user ID as a position index into the list of counts to check whether the corresponding count is equal to or greater than 3. If it is, we select this user as a target user; if not, we repeat the process until we find one. After determining the user ID of a target user, we generate another random number, not bigger than the number of items the target user has rated, to determine the position of the rated item to be selected. In this way, we acquire the user ID of the selected target user and the item ID that is regarded as the unknown item to be predicted later.
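The selection loop above can be sketched as follows; the function name and toy data are hypothetical stand-ins for the structures parsed from ratings.txt.

```python
import random

def pick_target(rating_counts, rated_items, min_items=3, max_tries=100000):
    """Randomly pick a user with at least `min_items` rated items, then pick
    one of his rated items to hide as the unknown item to be predicted."""
    user_ids = list(rating_counts)
    for _ in range(max_tries):
        user_id = random.choice(user_ids)
        if rating_counts[user_id] >= min_items:
            # second random draw: which of the user's rated items to hide
            hidden_item = random.choice(rated_items[user_id])
            return user_id, hidden_item
    raise RuntimeError("no user with enough ratings found")

# Hypothetical toy data: only user 2 has rated at least 3 items
counts = {1: 2, 2: 4}
items = {1: [7, 8], 2: [7, 8, 9, 10]}
# pick_target(counts, items) always returns user 2 with one of his items hidden
```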
In addition, we select all the users who have rated at least three items and save their related rating information to a new list; each element in the list is a triple of user ID, rated item ID, and item rating. In fact, the total number of users who have rated at least three items is 28,487.
In order to facilitate the following experiments, all related algorithms needed in experiments are realized in Python 3.7.0. The operating system environment is Microsoft Windows 7 (64 bit), Service Pack 1 (Microsoft Corporation, Seattle, WA, USA). The CPU is Intel (R) Core (TM) i5-4300U @ 1.90GHz 2.50GHz (Intel Corporation, Santa Clara, CA, USA). RAM is 8.00 GB (7.73 GB available).
5.3. Experiment 1: Verifying the Feasibility of the Proposed Method by an Example
In this experiment, we verify the proposed method by a simple example and perform the experiment in detail according to the procedures in
Section 4. To attain this goal, we utilize the Leave One Out evaluation method mentioned above.
Now, let us work through an example in detail. We randomly generate a target user whose ID is 11155, and randomly select one of his rated items, whose ID is 3588, as the unknown item to be predicted later; we denote them as the target user and the target item, respectively. In fact, the target user has rated 27 items, and the detailed rating information is shown in
Table 1. From the table, we can see that the real rating of the target item is 4. In this experiment, we set the threshold of PCC to a designated value. Next, we employ our proposed hybrid method to predict the rating of the target item for the target user.
To search for the reference users of the target user, the first thing we need to do is to determine the initial reference users trusted by the target user according to the trust relationships in the social trust network. Meanwhile, these users should have rated the target item. In terms of the above conditions and Step 1 in Algorithm 2 of
Section 4.2, we get the initial reference users of the target user. Based on Step 2 in Algorithm 2, we obtain their common rated items with the target user and their similarities to him, which are listed in
Table 2. Here, we can find that the minimum number of common rated items should be 2 in order to be able to calculate the PCC similarity later. Given the designated threshold of PCC similarity, it is easy to see that two final reference users of the target user remain.
Next, we need to calculate the LSI value of each of the two reference users with respect to the target user. According to Equation (
5), we can realize an algorithm that finds all trustors of each reference user by traversing each line of the trust.txt file. Here, we obtain the trustor count of each reference user, which is 12 and 25, respectively, and then the two LSI values. In terms of Equation (
4) in
Section 4.3, we can further simplify the prediction rating formula of the target item for the target user as the following Equation (
7):
where the average ratings of the target user and the two reference users are calculated according to their known ratings (note: the hidden rating of the target item is not included), and the known ratings of the two reference users on the target item are substituted in; thus, we can get the final rating prediction for the target user, as shown in the following Equation (
8):
Compared with the real rating of the target item, i.e., 4, it is easy to compute the Mean Absolute Error (MAE) of this prediction.
If we take the traditional CF method, i.e., Equation (
3), to predict the rating for this example, we get a different prediction value. Similarly, we can get the rating prediction using the improved CF method. The comparison results of the three methods, in which we take one step and two steps in Algorithm 2 separately to select reference users, are shown in
Table 3. For ease of description, from now on, we abbreviate the traditional CF method as t-CF, the improved CF method as i-CF, and the hybrid method combining the LSI factor with the improved CF method as h-CF.
From the results in
Table 3, we can find that the proposed hybrid method achieves a great improvement in rating prediction accuracy in this example. However, an experiment with only one sample is far from enough. MAE and RMSE (Root Mean Squared Error) coincide and are trivial in this example because we randomly take only one item as the target item whose rating is to be predicted. In later sections, we will conduct experiments with large numbers of random samples and define MAE and RMSE as used in this study to further verify the effectiveness of the proposed method.
After successfully predicting multiple unknown ratings by repeating the same process, we can make recommendations by offering the Top-k items with the highest predicted ratings to the target user. We omit the details here because the step is straightforward.
So far, the example in this section has demonstrated the feasibility of the proposed method in rating prediction. In addition, it offers subsequent help in making recommendations for target users.
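The trustor-counting step used for LSI in the example above can be sketched as a single pass over the trust triples. This is an illustrative sketch only; the sample triples are hypothetical, not the real trust.txt contents, and Equation (5) itself is not reproduced here.

```python
from collections import Counter

def count_trustors(trust_triples, reference_users):
    """For each reference user, count how many users trust him (his trustors),
    by traversing the trust triples once."""
    counts = Counter({u: 0 for u in reference_users})
    for trustor, trustee, value in trust_triples:
        if value == 1 and trustee in reference_users:
            counts[trustee] += 1
    return counts

# Hypothetical sample: users 1 and 2 trust user 9; user 3 trusts user 8
sample = [(1, 9, 1), (2, 9, 1), (3, 8, 1), (4, 7, 1)]
# count_trustors(sample, {8, 9}) -> Counter({9: 2, 8: 1})
```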
5.4. Preparing Data for Experiment 2 and Experiment 3
Considering that chance effects may occur when verifying experimental conclusions on a small number of samples, we base our experiments on a certain amount of random samples. To achieve this goal, we first produce 100 groups of random target users. In each group, there is a common random item that has been rated by every target user in the group. Afterwards, we determine whether all these groups meet the needs of the following experiments; those that do not are omitted. The data preparation for the later experiments is described as follows.
5.4.1. Determining Target Users in Each Group
In the previous section, our experiment was based on a single random target user and one of his random rated items. In this experiment, we determine a large number of target users divided into 100 groups. For each group, we first randomly select one user from the list of those who have rated at least three items as the first target user, and one of his rated items as the target item to be predicted, in the same way as in
Section 5.3. Then, we search the whole trust network for all users who have rated the same item as the first target user, and save them together with the related rating information, in order to make rating predictions for all target users in the group about the same target item at a time.
For example, when we randomly determine the first target user and the target item, we can search the ratings.txt file for all the users who have rated that item and save all the related information in a file. Each line in this result file has the same structure as a line of ratings.txt. We repeat the same process until we generate 100 files, i.e., 100 groups of target users, accompanied by 100 corresponding target items and the respective ratings from each target user in the 100 groups. This means that each file may consist of multiple user IDs, the same target item ID, and different ratings of the same item.
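The group-building step above can be sketched as a filter over the rating triples. The data and function name below are hypothetical stand-ins for the real ratings.txt processing.

```python
def build_group(rating_triples, target_item):
    """Collect every (user_id, item_id, rating) triple for the target item,
    i.e., the in-memory equivalent of one group file."""
    return [(u, i, r) for (u, i, r) in rating_triples if i == target_item]

# Hypothetical rating triples standing in for ratings.txt:
ratings = [(1, 10, 4), (2, 10, 5), (3, 11, 2), (4, 10, 3)]
group = build_group(ratings, 10)
# group == [(1, 10, 4), (2, 10, 5), (4, 10, 3)]
```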
After that, we get the 100 random item IDs from the 100 group files, as listed in
Table 4. Then, we check the 100 corresponding group files and filter out 20 groups that contain only 1 to 4 reference users, in order to avoid later PCC calculation failures due to the lack of common rated items. The 20 deleted target item IDs are shown in
Table 5. The remaining 80 groups are kept for the next job.
5.4.2. Determining Reference Users for Target Users in Each Group
According to our proposed method, determining the reference users for each target user in a group needs two steps, as described in
Section 4.2. In Step 1, we select the initial reference users from the trust network
G for each target user in each group. In Step 2, we calculate the PCC similarity between each reference user and the current target user and filter the final reference users according to the predefined PCC similarity threshold.
In Step 1, when we search for the initial reference users of each target user in a group, we traverse each line of trust.txt and meanwhile traverse the group file to check whether the trustor in the current line is a target user in the current group. If so, we take the trustee in the current line of trust.txt and further check whether the trustee has rated the same target item (i.e., whether the trustee is also on the list of target users in the current group file). If so, we save his user ID to the list of initial reference users of the current target user; if not, we continue traversing in the same way until the end of either file (group file or trust.txt). All the lists of initial reference users of the target users in the same group form a new list, which is saved in a new file.
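Step 1 can be sketched in memory as follows. The sample data are hypothetical; in particular, membership in the group stands in for "has rated the same target item", since every user in a group file rated that item.

```python
def initial_reference_users(trust_triples, group_user_ids):
    """Step 1: for each target user in the group, collect the trustees he
    trusts who also appear in the same group (they rated the same target item)."""
    refs = {uid: [] for uid in group_user_ids}
    for trustor, trustee, value in trust_triples:
        if value == 1 and trustor in refs and trustee in group_user_ids:
            refs[trustor].append(trustee)
    return refs

# Hypothetical data: users 1, 2, 3 rated the same target item
trust = [(1, 2, 1), (1, 5, 1), (2, 3, 1)]
refs = initial_reference_users(trust, {1, 2, 3})
# refs == {1: [2], 2: [3], 3: []}  (user 5 is not in the group, so he is skipped)
```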
Based on the above operation, we get 80 files of initial reference users matching the 80 group files one by one. In order to guarantee the normal execution of the next operation, we then check each of them to pick out the files that include at least one target user whose list of initial reference users is not empty. As a result, eight files have no initial reference users for any target user, and they are deleted after this check; the corresponding target item IDs are listed in
Table 6.
In Step 2, we need to further filter the reference users of all the target users according to the given threshold of PCC similarity. Before this, we should calculate the PCC similarity between each target user and each of his reference users. In fact, before computing PCC similarities, as mentioned in Experiment 1, we should check whether each target user and each of his initial reference users have at least 2 common rated items, including the target item. If not, the PCC similarity cannot be calculated, i.e., it becomes meaningless. Based on this idea, we further filter out one invalid file, which includes the target item ID 71858. Now, we have 71 files for the subsequent calculation of PCC similarities and the later rating prediction. For the ease of the later experiments, we take a simplified way to carry them out. In Experiment 2, we in essence adopt the default threshold to determine the reference users. In Experiment 3, we just verify the role of the two-step way of determining reference users in rating prediction. Both experiments reveal the feasibility and effectiveness of the proposed method.
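The similarity computation in Step 2 can be sketched with the standard Pearson form over common rated items (Equation (1) of the paper is not reproduced here; this standard form is an assumption). The sketch also encodes the check described above: with fewer than 2 common items, PCC is meaningless.

```python
from math import sqrt

def pcc(ratings_a, ratings_b):
    """Pearson correlation over the common rated items of two users; returns
    None when fewer than 2 common items exist (the validity check above)."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None  # PCC cannot be calculated without at least 2 common items
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_a, mean_b = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(xs, ys))
    den = (sqrt(sum((x - mean_a) ** 2 for x in xs))
           * sqrt(sum((y - mean_b) ** 2 for y in ys)))
    return num / den if den else 0.0

# pcc({1: 1, 2: 2, 3: 3}, {1: 2, 2: 4, 3: 6}) -> 1.0 (perfectly correlated)
# pcc({1: 4}, {1: 5}) -> None (only one common item)
```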
5.5. Experiment 2: Comparing Five Methods Based on PCC Similarity
In this section, we aim to make comparisons among five different methods based on PCC: the traditional CF method (t-CF), RTCF, f-PCC, the improved CF method (i-CF), and the hybrid recommendation method (h-CF), wherein the RTCF method was proposed by Parham Moradi et al. in Ref. [
1] and f-PCC by Shuai Ding et al. in Ref. [
33]. To achieve this goal, we conduct five sets of experiments based on the 71 files obtained in the previous section, separately employing the five PCC-based methods, i.e., Equations (3), (9), (10), (4), and (6). According to the descriptions in
Section 5.4, we obviously do not set a PCC similarity threshold for the reference users in any group. This means that the values of the PCC similarities vary in the range [−1, 1] in this section.
According to the RTCF method, when predicting an unknown target item for a target user who satisfies both conditions proposed in this study, the calculation formula is shown in Equation (
9):
where the weighting term is defined as in the original RTCF formulation. In fact, the trust value between users i and j is always regarded as 1 when we only consider the direct trust relationship. The PCC similarity can be calculated according to Equation (
1).
Similarly, we can derive the prediction formula of the f-PCC adapted from Ref. [
33], which can be used to predict an unknown rating for the target user, as shown in Equation (
10):
where the weighting factor depends on the number of common items rated by users i and j in this study. The PCC similarity can also be calculated according to Equation (
1).
After performing the five sets of experiments, we correspondingly obtain five sets of results from the 71 group files. To evaluate the prediction performance of these methods, we select the MAE and RMSE metrics to compare the final prediction results. In doing so, we first compute MAE and RMSE for each group file, denoted as MAE_k and RMSE_k, where k indexes the group files. Suppose the number of valid target users in the k-th group file is n_k and the target item in the k-th file is i_k; then, we can get the calculation formulas for MAE_k and RMSE_k shown in Equation (
11):
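A sketch of Equation (11) in the standard form, where $r_{u,i_k}$ and $\hat{r}_{u,i_k}$ denote the real and predicted ratings of target user $u$ on target item $i_k$ (this notation is assumed here, not taken from the original equation):

```latex
MAE_k = \frac{1}{n_k}\sum_{u=1}^{n_k}\left| r_{u,i_k} - \hat{r}_{u,i_k} \right|,
\qquad
RMSE_k = \sqrt{\frac{1}{n_k}\sum_{u=1}^{n_k}\left( r_{u,i_k} - \hat{r}_{u,i_k} \right)^{2}}
```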
We then separately compute the average values of MAE_k and RMSE_k over all group files, which are defined according to the following Equation (
12), where N represents the number of valid files, which is equal to the number of target items to be predicted:
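A sketch of Equation (12) under the same assumed notation, averaging the per-group metrics over the $N$ valid files:

```latex
\overline{MAE} = \frac{1}{N}\sum_{k=1}^{N} MAE_k,
\qquad
\overline{RMSE} = \frac{1}{N}\sum_{k=1}^{N} RMSE_k
```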
As mentioned in
Section 4.3, the traditional CF method employing PCC similarity may generate extreme values in the process of rating prediction under some circumstances. In the first set of experiments, employing the traditional CF method, we find that the prediction results include three groups with extremely large values (the target item IDs are 16903, 30804, and 393) and eight groups with the next-largest values (the target item IDs are 12945, 11054, 8560, 10367, 39524, 13710, 3424, and 615). For ease of description, we first make analyses based on the remaining 60 group results, and then analyze the 11 group results with extreme values separately.
Excluding another two groups (the corresponding target item IDs are 42424 and 13491) for which the RTCF method obtains invalid results, the MAE and RMSE results computed by Equation (11) for the five methods based on the same 58 group files are shown in
Figure 4 and
Figure 5, respectively. The average values of MAE and RMSE in terms of Equation (12) for the 58 groups are shown in
Figure 6. We can see that the hybrid method acquires the best prediction accuracy in most cases among the five methods, and that the improved CF method generates smaller errors and maintains a steadier tendency than the traditional CF method in most cases, especially in RMSE. The comparisons of the average MAE and RMSE in
Figure 6 verify the conclusions drawn above from the perspective of the average level.
For the remaining 11 files, we respectively calculate MAE and RMSE according to Equation (11). Because these 11 group results contain extreme values caused by the traditional CF method employing PCC, it is difficult to plot them, so we use a table to make the comparisons. The comparison results of MAE and RMSE for the five methods based on the remaining 11 groups are presented in
Table 7. Apparently, the hybrid method achieves the best prediction accuracy on the whole compared with the other four methods, especially in the RMSE metric. We can also see that the improved CF method overcomes the weakness of the traditional CF method and greatly reduces the errors to a limited and acceptable range.
In terms of Equation (12), we can acquire the average values of MAE and RMSE for the 11 groups, which are shown in
Table 8. Based on these results, we can also draw the conclusion that the proposed hybrid method attains better accuracy than the other four PCC-based CF methods in extreme cases. It is worth noting that the proposed hybrid method outperforms the improved CF method mainly owing to the introduction of the LSI factor.
In addition, from
Table 7 and
Table 8, we can also conclude that h-CF is able to gain higher item coverage.
5.6. Experiment 3: Demonstrating the Advantage of Determining Reference Users Employing the Proposed Two-Step Way
In this set of experiments, we attempt to demonstrate the advantage of the way of determining the reference users of a target user described in
Section 4.2. To achieve this goal, we conduct our experiments based on the 71 groups of random target users, with the same random target item in each group. Next, we depict the experimental process and analyze the results.
In order to identify the effects of different thresholds of PCC similarity on the prediction results, we perform several sets of experiments based on the traditional CF method employing PCC, setting the threshold of PCC similarity to four increasing values in turn. Afterwards, we make comparisons according to the average values of MAE and RMSE between each set of these results and the set of results obtained in Experiment 2 with the traditional CF method employing PCC without any threshold. The experimental results are shown in
Table 9.
From the results in
Table 9, we can see that the prediction accuracy varies with the PCC similarity threshold. At first, as the threshold increases, the average MAE and RMSE errors decrease greatly. At a certain intermediate threshold, the average errors reach their smallest values. However, as the PCC threshold continues to increase, the average errors begin to increase. This fluctuation implies that there exists an optimal PCC threshold value for a set of the same samples. It also reveals that the two-step way of determining reference users can help improve the prediction accuracy to some extent. In addition, we can also observe that the number of valid groups decreases as the PCC threshold increases. This reveals that the number of reference users of the target users decreases under the strengthened PCC similarity condition, which affects the prediction accuracy to some extent.
Since, as observed in Experiment 2, the hybrid method builds on the traditional CF method and performs better than it, we can conclude that the hybrid method obeys the same rule regarding the influence of the threshold in the two-step way of determining reference users. However, the appropriate threshold depends on the real application scenario.