Improving Collaborative Filtering Recommendations with Tag and Time Integration in Virtual Online Communities

Jo, Hyeon; Hong, Jong-hyun; Choeh, Joon Yeon

doi:10.3390/app131810528

Improving Collaborative Filtering Recommendations with Tag and Time Integration in Virtual Online Communities †

by Hyeon Jo ¹, Jong-hyun Hong ² and Joon Yeon Choeh ³,*

¹ Headquarters, HJ Institute of Technology and Management, Bucheon 14721, Republic of Korea

² Korea Institute for Defense Analyses, Seoul 02455, Republic of Korea

³ Intelligent Contents Laboratory, Department of Software, Sejong University, Seoul 05006, Republic of Korea

* Author to whom correspondence should be addressed.

† This paper is a revised and expanded version of a paper entitled ‘A Recommendation System Based on Big Data: Separation of Preference and Similarity’ presented at Intelligent Systems Conference 2022, Amsterdam, The Netherlands, 1–2 September 2022.

Appl. Sci. 2023, 13(18), 10528; https://doi.org/10.3390/app131810528

Submission received: 27 August 2023 / Revised: 16 September 2023 / Accepted: 20 September 2023 / Published: 21 September 2023

(This article belongs to the Special Issue High-Performance Computing, Networking and Artificial Intelligence)

Abstract:

In recent years, virtual online communities have experienced rapid growth. These communities enable individuals to share and manage images or websites by employing tags. A collaborative tagging system (CTS) facilitates the process by which internet users collectively organize resources. CTS offers a plethora of useful information, including tags and timestamps, which can be utilized for recommendations. A tag represents an implicit evaluation of the user’s preference for a particular resource, while timestamps indicate changes in the user’s interests over time. As the amount of information increases, it is feasible to integrate more detailed data, such as tags and timestamps, to improve the quality of personalized recommendations. The current study employs collaborative filtering (CF), which incorporates both tag and time information to enhance recommendation precision. A computational recommender system is established to generate weights and calculate similarities by incorporating tag data and time. The effectiveness of our recommendation model was evaluated by linearly merging tag and time data. In addition, the proposed CF method was validated by applying it to big data sets in the real world. To assess its performance, the size of the neighborhood was adjusted in accordance with the standard CF procedure. The experimental results indicate that our proposed method significantly improves the quality of recommendations compared to the basic CF approach.

Keywords:

collaborative filtering; recommender system; preference; social bookmarking; tag; time

1. Introduction

The number of online communities has grown rapidly, producing a wealth of information [1,2]. Among these communities, the collaborative tagging system (CTS) has empowered users to voluntarily form various communities to exchange their interests and materials [3,4]. The primary role of CTS is to provide a straightforward and collaborative way to assign labels to internet resources through tags. Sites representative of CTS include social bookmarking (e.g., Margarin), image sharing (e.g., Instagram), picture sharing (e.g., Flickr), and citation sharing (e.g., CiteULike). This study focuses on social bookmarking sites that have been in use for a relatively long time. Users bookmark websites (resources) by assigning tags. From the data obtained from CTS, various collaborative schemes can be enhanced to offer advanced personalized services.

Collaborative filtering (CF) has emerged as a leading algorithm in the domain of recommendation systems [5]. CF provides a way to deliver personalized recommendations and possesses a benefit over content-based filtering, as it can sift through various types of items ranging from text and music to videos and photos [6].

In this study, we chose two factors, tag and time, to identify users’ preferences and calculate the similarity between users. Tags offer semantic insight, encapsulating users’ perceptions and categorizations of content and giving depth to our understanding of user preferences. The temporal dynamics provided by “time” complement this by capturing the evolving nature of user interactions, adding breadth by revealing the patterns of content consumption. While individual studies have explored these factors independently [7,8,9], their combination is relatively uncharted, representing a promising frontier in personalizing recommendations. By integrating the semantic richness of tags with the dynamic insights of time, our study endeavors to craft a more holistic, nuanced, and responsive recommendation model. Generally, a user assigns one or more tags to bookmark a resource in social bookmarking sites. Consequently, these sites collect far more tag assignments than resources. If a user uses “tag 1” once and “tag 2” ten times, it can be inferred that the user’s preference for “tag 2” is greater than for “tag 1”. Moreover, time information is critical for deciphering someone’s interests. If a user used “tag 1” one year ago and “tag 2” yesterday, it can be deduced that the user’s current interest lies with “tag 2” rather than “tag 1”. Leveraging the data from CTS is essential in order to discern a user’s prevailing interests and enhance the precision of tailored recommendations.

This research introduces a CF model structured in three phases: weight formulation, calculation of user similarity, and resource suggestion. Three methods were devised for weight determination utilizing tag and time data. First, tag-specific weight is allocated to every resource grounded on tag details. Next, time-oriented weighting determines the weight for each resource, reflecting the moment at which a user marked the resource. Finally, the hybrid weight approach assigns a weight to every resource using a combined measure of both tag and time data.

This study contributes to the literature on personalized recommendations for CTS in several specific ways. Firstly, it considers tag and time information to identify users’ preferences and designs a computational method to provide customized recommendations for CTS users. Secondly, time is reflected on a cardinal scale to measure users’ interests more accurately. Thirdly, it calculates users’ similarities by using each preference vector of the user, which varies according to the type of weight. Finally, it offers detailed experimental evaluations with actual big data. The findings from this research clearly indicate that the introduced methodology offers a significant improvement in the quality of recommendations. This is a testament to the efficacy and potential real-world application of the approach taken in this study.

As for the structure and layout of this paper, it is organized in a manner that facilitates ease of understanding and coherence. The subsequent section, Section 2, provides an exhaustive review of the prevailing literature and studies concerning CTS and CF. This aids in contextualizing the current research landscape. In Section 3, the research’s design and computational methods are thoroughly detailed, emphasizing transparency and result replicability. Section 4 delves deeply into the experimental processes, highlighting the dataset, applied methods, evaluation metrics, and derived results. To conclude, Section 5 encapsulates the research’s main findings and their significance, suggesting potential paths for future studies within recommendation systems and collaborative tagging.

2. Literature Review

The evolving digital landscape has resulted in an exponential growth of online content, necessitating the development of sophisticated methods for information filtering and recommendation. Our literature review delves into two major methodologies: CTS and CF, highlighting their significance, advancements, and respective applications.

2.1. CTS

In the online environment, a substantial user community has united to arrange and classify resources collaboratively. They assign tags to resources such as images, videos, and websites. The tags are the keywords which they think of. In the process of classifying and processing objects in cyberspace, a CTS was created [10]. CTS was developed to provide network users with a simple way to collaboratively assign labels to Internet resources by tags. Research on CTS has been conducted frequently during the last two decades. Former studies have focused on the concept of CTS, characteristics of tags, social tag trends, and distribution of tags [3,11,12,13,14,15,16]. Tagging systems have turned out to be useful for classifying items and providing recommendations [17,18,19]. CTS also helps with educational effectiveness [20,21,22].

With the spread of CTS, there has been a growing fascination with information retrieval that relies on tags [10,23,24,25]. A number of researchers have studied weight measurement by using tags [2,3,26,27,28]. As the online community evolved, CTS was introduced into social network sites (SNS) and social commerce [29,30]. Several works on recommender systems have utilized tag information to enhance performance quality [27,31,32,33]. Bateman et al. [34] designed the method that utilizes tag and hierarchical clustering. They improved the performance of the annotating system. Klašnja-Milićević et al. [20] presented a novel method that integrates collaborative tagging practices into online lecture systems. By synergizing social tagging with sequential pattern mining, they developed a unique approach tailored for recommendations. Multiple experiments were carried out to validate the effectiveness and applicability of this hybrid technique, specifically in the context of e-learning platforms. Wu and Zhang [24] investigated social tagging networks and enhanced personalized recommendations. Jo [35] examined user preferences and similarities by analyzing the frequency and timing of tag usage, positioning the tagging time on a cardinal scale to ensure accuracy. The author designed a combined methodology that integrated preferences and similarities independently, without relying on one another.

In summary, CTS has developed with the growth of online communities. At the beginning of the spread of the CTS, basic studies on concepts and functions were mainly conducted. As tags were found to be semantically important, a great body of studies has used tags for information extraction. Afterward, CTS was expanded to SNS and e-commerce. Some studies on recommendation systems using tags have been actively carried out.

2.2. CF

CF recommender systems have evolved significantly over the years, with numerous studies proposing techniques to enhance their accuracy and efficiency. This literature review provides an overview of recent research findings in this domain, highlighting significant developments and trends. CF recommender systems are fundamentally driven by the belief that the opinions of others significantly influence buyer decisions [36]. Such systems typically utilize user–item matrices to make recommendations based on numerical weights [37]. In recent years, however, these systems have begun to leverage additional forms of user feedback, including click behaviors, purchase history, and weblogs, for preference indication [38].

Recent research has shown a growing interest in integrating contextual data into CF recommender systems. In this vein, Chen [39] proposed a system that anticipates user preferences by examining the actions of users with similar inclinations, especially when operating in analogous contexts, thereby improving recommendation quality. Adomavicius et al. [40] integrated contextual factors, such as time and place, into their recommendation process, while Palmisano et al. [41] demonstrated that the granularity of contextual information impacts the prediction of customer behavior.

An emerging trend in recent studies is the utilization of user-generated tags to improve CF recommender systems. Kim et al. [31] employed collaborative tagging to filter user preferences, offering enhanced recommendation quality. Similarly, Shang et al. [32] introduced a personalized recommendation framework that harnesses the triadic relationships between users, objects, and tags. Deep learning has also been integrated into recommender systems, providing robust representations of users and items. Wu et al. [42] noted that these developments steer recommendation algorithms away from conventional approaches based solely on user–item interactions and towards models incorporating auxiliary information. By offering a holistic understanding of user behavior and preferences, such models can potentially improve recommendation accuracy. Srifi et al. [43] emphasized the value of user-generated content, such as reviews, in enhancing recommendation quality.

Notably, Wang et al. [44] introduced the Neural Graph Collaborative Filtering (NGCF) recommendation system, a novel framework that seamlessly blends user–item interactions into the embedding phase. This design enhances the modeling of higher-order connectivity within user–item networks. However, concerns have been raised about the computational cost of using a multi-layer perceptron (MLP) for embedding combination [45]. Addressing data sparsity, a recurring challenge in CF, has been a focal point in recent research. Duan et al. [46] suggested a Review-Based Matrix Factorization method that utilizes review texts as side information to improve recommendation accuracy. In parallel, Lin et al. [47] addressed data sparsity by introducing neighborhood-enriched contrastive learning (NCL), which enhances performance by considering neighboring relationships among users and items. Moreover, Xia et al. [48] presented hypergraph contrastive collaborative filtering (HCCF), a cutting-edge self-supervised recommendation framework designed to combat the over-smoothing effect and scarcity of supervision signals. Fkih [49] further investigated the roles of different similarity measures, such as ITR, IPWR, and AMI, in CF, noting that their performance varied depending on whether they were applied in user-based or item-based recommender systems. Studies have also demonstrated the successful application of CF in big data analytics. Zarzour et al. [50] proposed CF algorithms using improved k-means clustering and principal component analysis, demonstrating superior performance over traditional CF algorithms. Shen et al. [51] developed a MapReduce-based recommendation algorithm, indicating enhanced efficiency and accuracy when applied to big data.

Recent studies have also explored the application of CF in health and wellness, such as in personalized health recommendations. Sahoo et al. [52] devised an intelligent health recommender system using a deep learning approach that integrated a restricted Boltzmann machine (RBM) with a convolutional neural network (CNN). The efficacy of this model was gauged through performance metrics like mean absolute error and root mean square error. Their results indicated that the RBM-CNN model was more effective, with fewer errors compared to other approaches. More sophisticated techniques are also being employed to improve CF. Najafabadi and Mahrin [53] designed a CF model utilizing a graph-based framework to capture the relationship between users and items, as well as to model user priorities. Their distinctive method enabled the formation of user profiles that encompassed both historical and present interests. Moreover, their model showcased enhanced results over standard CF techniques in various metrics.

The integration of contextual information into CF has been a rising trend to enhance its efficacy. A notable advancement in this domain was made by Zhang et al. [25]. They refined a hybrid collaborative filtering mechanism by harnessing both tags and temporal factors. In their method, tag information was exhaustively utilized to represent users and items. Moreover, while determining similarities between users or items, both rating and tag data were considered. Notably, they infused a time-rating aspect to capture user interests more accurately. Their findings indicate that their approach can effectively address data sparsity issues and consistently deliver accurate predictions.

Big data analytics has further revolutionized the field of CF. Zarzour et al. [50], and Shen et al. [51] demonstrated this by applying their algorithms to big data, thereby validating the methodologies. The latter utilized a MapReduce framework in combination with an item-based recommendation algorithm, resulting in improved efficiency and accuracy. Similarly, Nilashi et al. [54] proposed a multi-criteria CF for hotel recommendations, incorporating dimensionality reduction with predictive machine learning techniques to boost scalability. The model was validated using a big dataset and demonstrated robust performance. In another groundbreaking study, Wang et al. [55] presented a CF model that incorporated a network representation learning framework for citation recommendation. This approach was tested on scholarly big data and was shown to outperform other techniques according to several evaluation metrics.

In conclusion, the landscape of CF-based recommendation systems is continually evolving, spurred by advancements in deep learning, representation learning, the integration of auxiliary information, and techniques for addressing challenges such as data sparsity and computational costs. The implementation of these sophisticated methods, when coupled with big data analytics, has the potential to further revolutionize this domain. Despite the impressive progress already made, there remain numerous opportunities for innovation and refinement, promising a future of increasingly personalized, efficient, and accurate recommendation systems.

3. Computational Approach

The computational approach is structured as a clear progression of steps: initially, the generation of weights, followed by calculating similarities and, finally, recommending resources. At the outset, a user–item preference matrix is created using both tag and time data. This is followed by the derivation of three distinctive user similarities based on the weight type. The culmination of this process is the recommendation of resources, evaluated through specific metrics.

Digging deeper into the resource recommendation phase, there are two meticulous sub-stages. Initially, neighbors are identified. The CF system recognizes users who exhibit preferences akin to the primary user by employing an array of similarity calculation techniques. The information retrieval domain has yielded various similarity measurement paradigms, with three standing out prominently: Cosine, Jaccard, and Pearson [56,57,58]. Based on the chosen similarity measure’s outcomes, certain users showcasing paramount similarity are designated as neighbors. It is worth noting that the number of these neighbors profoundly influences the CF’s resultant quality [37]. It is imperative for recommendation systems to discern an optimal neighborhood size for effective predictive outcomes. A scanty neighborhood size can jeopardize accuracy. Conversely, an extensive neighborhood, while potentially elevating accuracy, can complicate computations [59]. In the subsequent phase, an array of methodologies is available to amalgamate neighbor ratings to forecast a score for items that the primary user has not rated. Each neighbor’s preference rating is typically modulated by the similarity value determined earlier. Essentially, the closer the similarity between a neighbor and the target user, the more profound their impact on the prediction. Post prediction, items that garner the top N predictive scores are put forth as recommendations. The efficacy of the recommendation systems is then gauged by juxtaposing the recommended items against the target users’ testing sets.

3.1. Weight Calculation

The weight calculation phase examines how tag and time details from the CTS can be incorporated into the weights of a user–resource matrix. Three schemes are proposed. The first is the tag weight, denoted as W_tag. The weight of every element in the user–resource matrix is derived from the tags that a user attaches when bookmarking, combined with the frequency with which the user uses those tags. The weight therefore reflects both which tags describe the resource and how often they recur in the user’s activity. The second scheme, the time weight or W_time, employs an adaptive exponential forgetting function, which mirrors the shifts in a user’s interests over time so that recent bookmarks count more than old ones. Lastly, the study integrates the tag-based and time-based weights into a unified weight, termed the hybrid weight or W_hybrid. This hybrid scheme balances the contributions of tag-specific information and temporal dynamics, resulting in a more holistic measure for the modified ratings.

3.1.1. Tag-Based Weight

This study is predicated on the belief that the regularity with which a user employs a particular tag serves as a reliable indicator of their level of interest or preference for that specific tag. Essentially, the more often a user utilizes a tag, the greater their inclination or fascination towards the subject or concept signified by that tag. Hence, this study defines the tag-based weight of a user (u) to a resource (r) as follows [27]:

W_{tag}(u, r) = \frac{\sum_{t_r \in tag(u, r)} fr(u, t_r)}{\sum_{\text{all } t_i} fr(u, t_i)}

(1)

In this equation, fr(u, t_r) denotes the frequency with which user u has used tag t_r across all of their bookmarks, and tag(u, r) is the set of tags that user u attached to resource r when bookmarking it. The denominator sums the usage frequencies of all tags t_i employed by the user. The formula thus captures the tagging behavior of the user towards a particular resource, helping to infer user interests or preferences from their tagging patterns. By definition, the tag weight is a real number between 0 and 1. A higher tag weight suggests a greater level of interest from the user towards a particular resource.
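As a rough illustration of Equation (1), the following Python sketch computes a tag-based weight from a single user's bookmarks. The dictionary layout, the function name `tag_weight`, and the sample bookmarks are assumptions made for this example; they are not taken from the paper's data or code.

```python
from collections import Counter


def tag_weight(user_bookmarks, resource):
    """Eq. (1): share of the user's total tag usage covered by the tags on `resource`.

    `user_bookmarks` maps a resource id to the list of tags the user attached
    to it (a hypothetical layout chosen for this sketch).
    """
    # fr(u, t): how often the user used each tag across all bookmarks
    freq = Counter(tag for tags in user_bookmarks.values() for tag in tags)
    total = sum(freq.values())
    if total == 0:
        return 0.0
    # Sum the frequencies of the distinct tags attached to this resource
    tags_on_resource = set(user_bookmarks.get(resource, []))
    return sum(freq[t] for t in tags_on_resource) / total


# Illustrative bookmarks (assumed): Tag1 x3, Tag2 x1, Tag3 x1, Tag4 x2, Tag5 x1
u1_bookmarks = {
    "R1": ["Tag1", "Tag2"],
    "R2": ["Tag1", "Tag3"],
    "R3": ["Tag1", "Tag4", "Tag5"],
    "R4": ["Tag4"],
}
print(tag_weight(u1_bookmarks, "R3"))  # (3 + 2 + 1) / 8 = 0.75
```

A resource the user never bookmarked receives a weight of 0 in this sketch, which keeps the user–resource matrix sparse.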

3.1.2. Time-Based Weight

Every user bookmarks various resources at different times. As Lathia et al. [60] point out, human interests evolve over time. Several research endeavors have delved into the evolution of users’ interests, typically harnessing either forgetting functions or designated time windows to discern and monitor fluctuations in user behaviors as time progresses. A notable limitation of the majority of time-window techniques is their predisposition to sideline or overlook information from the distant past [61]. On the other hand, exponential forgetting functions have garnered traction in time-sensitive applications, primarily due to their capacity to subtly downplay the significance of bygone behavioral patterns while still retaining them [62]. CF needs to account for both recent and old bookmarks to comprehend users’ preferences and similarities accurately. In this regard, Zheng and Li [27] introduced an innovative adaptive exponential forgetting function tailored to cater to this nuanced need. According to this forgetting function, a person’s memory gradually fades over time. The function treats bookmarking time as an ordinal scale. However, the time-based weight (W_time) should account for cardinal time information, as the duration is crucial for understanding a user’s interest. Therefore, this paper defines a modified exponential forgetting function as follows:

W_{time}(u, r) = \exp\left(-\ln 2 \times \frac{t}{hl_t}\right)

(2)

t = \begin{cases} 0, & t_i = t_l \\ \frac{t_l - t_i + 1}{t_l - t_f + 1}, & t_i \neq t_l \end{cases}

Here, W_time(u,r) is the time weight, which corresponds to the degree of interest user u has in resource r. The variable t is a relative time point, and t_i denotes the day on which the tag was added. The variables t_l and t_f indicate the user’s most recent and earliest tagging days, respectively. For the latest tagging day, t_l, t takes the value 0, and for the earliest tagging day, t_f, t equals 1. Bookmarks made on the same day share the same value of t. The term hl_t is the half-life assigned to each user, set here to 0.5. Thus, when t equals hl_t, the time weight W_time(u,r) falls to half of its maximum value of 1, that is, to 0.5. Because t is normalized to the user’s total tagging period, a half-life of 0.5 corresponds to half of that period. For a given user, a smaller t, corresponding to a more recent bookmark, yields a greater time weight, whereas older bookmarks receive a smaller time weight, signifying reduced relevance or interest.
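The forgetting function in Equation (2) can be sketched in Python as follows. The function names, the use of `datetime.date` for tagging days, and the calendar year in the usage lines are assumptions for illustration only.

```python
import math
from datetime import date


def relative_time(t_i, t_f, t_l):
    """Relative time t of Eq. (2): 0 on the latest tagging day, 1 on the earliest."""
    if t_i == t_l:
        return 0.0
    return ((t_l - t_i).days + 1) / ((t_l - t_f).days + 1)


def time_weight(t_i, t_f, t_l, half_life=0.5):
    """Exponential forgetting: the weight halves each time t grows by `half_life`."""
    t = relative_time(t_i, t_f, t_l)
    return math.exp(-math.log(2) * t / half_life)


# Assumed tagging period of 6 May to 15 May (the year is arbitrary)
first_day, last_day = date(2023, 5, 6), date(2023, 5, 15)
print(time_weight(last_day, first_day, last_day))   # most recent bookmark: weight 1
print(time_weight(first_day, first_day, last_day))  # earliest bookmark: about 0.25
```

With hl_t = 0.5, every weight in this sketch falls between roughly 0.25 and 1, so old bookmarks are discounted but never discarded.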

3.1.3. Hybrid Weight

This paper sets out to merge both tagging and temporal data to bolster the efficacy of CF. Several prior studies have carried out exploratory analyses using a linear combination when multiple factors are incorporated into weights [27,32]. Drawing from these works, this study constructs a hybrid weight as shown below:

W_{hybrid}(u, r) = \lambda \cdot W_{tag}(u, r) + (1 - \lambda) \cdot W_{time}(u, r)

(3)

In this equation, the parameter λ serves as a pivotal factor, modulating the relative importance or emphasis between the tag-centric and time-centric weights. The proposed method integrates tag and time information to form hybrid weights. By merging these two types of weights, this study can simultaneously account for the preferences and evolving interests of CTS users. As a result, the hybrid weight can identify users’ primary current interests. Using the weight matrix generated, user similarity is calculated to provide personalized recommendations.
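The linear combination in Equation (3) is small enough to sketch directly. In the Python snippet below, the function name `hybrid_weight`, the range check on λ, and the sample weights 0.75 and 0.25 are illustrative assumptions.

```python
def hybrid_weight(w_tag, w_time, lam=0.5):
    """Eq. (3): convex combination of the tag-based and time-based weights."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * w_tag + (1.0 - lam) * w_time


# Sweep lambda from 0 (time only) to 1 (tag only) in steps of 0.1
for step in range(11):
    lam = step / 10
    print(lam, hybrid_weight(0.75, 0.25, lam))
```

Setting λ = 1 recovers the pure tag-based weight and λ = 0 the pure time-based weight, so a sweep of this kind covers both single-factor baselines as end points.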

3.1.4. Specific Example

Table 1 displays the tagging information for different resources (R1–R5) by three users (U1–U3). Each cell under a resource column for a user row represents a tagging activity by that user for that resource. Every tagging action comprises one or several tags, with the date of the action specified in parentheses.

Tag-Based Weight

In the case of U1 for resource R3, the user used Tag1, Tag4, and Tag5. If U1 has used Tag1 three times, Tag2 once, Tag3 once, Tag4 twice, and Tag5 once across all their activities, the tag-based weight for resource R3 is the sum of the relative frequencies of its tags. The share for Tag1 is 3/(3 + 1 + 1 + 2 + 1) = 0.375; for Tag4 it is 2/(3 + 1 + 1 + 2 + 1) = 0.25; and for Tag5 it is 1/(3 + 1 + 1 + 2 + 1) = 0.125. Thus, W_tag(U₁, R₃) = 0.375 + 0.25 + 0.125 = 0.75. Table 2 shows an example of tag-based weight.

Time-Based Weight

Time-based weight is calculated according to the recency of the tag’s usage. If U1’s tagging activity spans 6 May to 15 May, then the time span is 10 days. The relative time t for U1’s tagging activity on resource R3 is the ratio of the elapsed period to this total period; with t = 1, W_time(U₁,R₃) = exp(−ln2 × (1/0.5)) = 0.25. Table 3 illustrates an example of time-based weight.
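The arithmetic behind the value 0.25 can be written out step by step, assuming t = 1 (the earliest tagging day) and hl_t = 0.5 as stated in Section 3.1.2:

```latex
W_{time}(U_1, R_3) = \exp\left(-\ln 2 \times \frac{1}{0.5}\right) = \exp(-2 \ln 2) = 2^{-2} = 0.25
```

More generally, the function can be read as W_{time} = 2^{-t / hl_t}, which makes the halving behavior explicit.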

Hybrid Weight

Hybrid weight is a combination of tag-based and time-based weights. For U1’s tagging activity on resource R3, for example, the hybrid weight is a linear combination of the tag-based weight (0.75) and the time-based weight (0.25). If both weights are considered equally important (λ = 0.5), the hybrid weight is (0.75 + 0.25)/2 = 0.5. Table 4 describes an example of hybrid weight.

3.2. User Similarity Calculation

The objective of determining user similarity is to decipher the intricate relationships and affinities that exist between individual users. This study calculated three types of similarity, which are formed by tag-based weight, time-based weight, and hybrid weight. Leveraging the widely-adopted cosine similarity methodology, when we implement each of these weight vectors, the similarity measure between any two users, specifically user ‘u’ and user ‘v’, can be articulated through the subsequent equation [63]:

sim_i(u, v) = \cos(\vec{u}, \vec{v}) = \frac{\sum_{r \in R} W_i(u, r) \times W_i(v, r)}{\sqrt{\sum_{r \in R} W_i^2(u, r)} \times \sqrt{\sum_{r \in R} W_i^2(v, r)}}

(4)

Here, i ∈ {tag, time, hybrid}, and R is the set of resources. Consequently, there are three types of similarity: tag-based similarity (S_tag), time-based similarity (S_time), and hybrid similarity (S_hybrid). W_i(u,r) and W_i(v,r) are the entries of the weight vectors of users ‘u’ and ‘v’ for resource r. Using Equation (4), this research identifies the ‘k’ nearest neighbors of a specific user. A higher similarity score implies a greater degree of resemblance to the target user.
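Equation (4) and the neighbor selection step might be sketched in Python as below. The dictionary of weight vectors, the function names, and the choice to return a similarity of 0 for an all-zero vector are assumptions of this sketch rather than details given in the paper.

```python
import math


def cosine_similarity(wu, wv):
    """Eq. (4): cosine of the angle between two users' weight vectors."""
    dot = sum(a * b for a, b in zip(wu, wv))
    norm_u = math.sqrt(sum(a * a for a in wu))
    norm_v = math.sqrt(sum(b * b for b in wv))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for users with no weights
    return dot / (norm_u * norm_v)


def k_nearest_neighbors(weights, user, k):
    """Return the k users most similar to `user` as (user_id, similarity) pairs."""
    sims = [
        (other, cosine_similarity(weights[user], vec))
        for other, vec in weights.items()
        if other != user  # a user is not their own neighbor
    ]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]


# Toy weight matrix (assumed): one row per user, one column per resource
weights = {
    "U1": [0.75, 0.0, 0.25],
    "U2": [0.50, 0.0, 0.25],
    "U3": [0.0, 1.0, 0.0],
}
print(k_nearest_neighbors(weights, "U1", 1))
```

The same two functions apply to tag-based, time-based, or hybrid weights; only the matrix passed in changes.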

3.3. Resource Preference Generation

To generate a top ‘N’ resource list based on similarity, a prediction of a user’s preference for a resource should be made. For this purpose, this paper employs the “All But 1” protocol, as suggested by Breese et al. [38]. In the methodology adopted by this research, one resource related to the target user is randomly selected and excluded from the comprehensive collection of resources. The formula for preference value of a user u for resource r is as follows [27].

Score(u, r) = \frac{\sum_{v \in KNN(u)} W_i(v, r) \cdot sim_i(u, v)}{\left| \sum_{v \in KNN(u)} sim_i(u, v) \right|}

(5)

In this context, KNN(u) stands for the collection of ‘k’ closest neighbors to the user ‘u’. Meanwhile, W_i(v,r) illustrates each weight vector of the neighbor user ‘v’ related to the resource ‘r’. The term sim_i(u,v) designates the measure of similarity between users ‘u’ and ‘v’, which is computed as described in Equation (4). The preference matrix for resources by users can be visualized, where each row represents an individual user and the columns are indicative of various resources. For clarity and easy reference, Table 5 elucidates every symbol and offers a corresponding interpretation.
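The prediction in Equation (5) and the subsequent top-N selection could look roughly like this in Python. The neighbor list format, the `seen` set of already bookmarked resources, and the function names are assumptions introduced for the sketch.

```python
def predict_score(neighbors, weights, resource_index):
    """Eq. (5): similarity-weighted average of the neighbors' weights for one resource.

    `neighbors` is a list of (user_id, similarity) pairs and `weights` maps
    each user id to its weight vector (both layouts are assumed here).
    """
    numerator = sum(sim * weights[v][resource_index] for v, sim in neighbors)
    denominator = abs(sum(sim for _, sim in neighbors))
    return numerator / denominator if denominator else 0.0


def top_n(neighbors, weights, seen, n):
    """Rank the resources the target user has not bookmarked and keep the best n."""
    n_resources = len(next(iter(weights.values())))
    scores = {
        r: predict_score(neighbors, weights, r)
        for r in range(n_resources)
        if r not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy data (assumed): two neighbors of the target user and three resources
weights = {"U2": [0.2, 0.8, 0.0], "U3": [0.6, 0.0, 0.4]}
neighbors = [("U2", 0.9), ("U3", 0.3)]
print(top_n(neighbors, weights, seen={0}, n=2))  # resource indices, best first
```

Note that this sketch normalizes by the sum of similarities as in Equation (5), whereas Algorithm 1 divides by the neighborhood size k; the ranking can differ slightly between the two.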

3.4. Algorithm Code

Algorithm 1 is designed to evaluate the synergy of ‘tag’ and ‘time’ data in a CF recommendation system using the dataset. The raw user interaction logs, tags, and timestamps are initially loaded. To experiment with the effect of the mixture of tag and time data, iterations are performed with varying weights to combine the two datasets. For each user pair, cosine similarity is calculated based on this mixture, creating a similarity matrix. Then, a neighborhood scaling process is applied, determining the similarity level with varying numbers of neighbors. The top preferences for each user are sorted and stored for different neighborhood sizes. In the evaluation phase, the generated preference matrix is compared against a test dataset to measure the recall and precision of the top 10 recommendations. This approach aids in identifying the optimal mix of tag and time data and the best neighborhood size for accurate recommendations.

Algorithm 1. Algorithm code.

% Load the interaction log and the tag, time, and test matrices
rawlog = load('MarTraLog.txt');
rawtag = load('MarTraTag.txt');
rawtime = load('MarTraTime.txt');
Test01 = load('MarTraTest.txt');

id = size(rawtag, 1);   % number of users (assumed: one row per user)
parameter = [3 6 9 12 15 18 21 24 27 30];   % neighborhood sizes k

% Iteration Experiment Start

for hi = 1:11

    % Linear mix: the share of the time weight grows from 0 to 1 in steps of 0.1
    rawmix = rawtag*(1 - 0.1*(hi-1)) + rawtime*0.1*(hi-1);

    % Cosine similarity between every pair of users
    for i = 1:id
        for j = 1:id
            simmix(i,j) = rawmix(i,:)*rawmix(j,:)'/(norm(rawmix(i,:))*norm(rawmix(j,:)));
        end
    end

    % Neighbor scale

    for j = 1:length(parameter)

        k = parameter(j);

        for i = 1:id
            [m, n] = sort(simmix(i,:), 'descend');
            index = n(2:k + 1);   % skip position 1, which is the user itself
            prefer_mat(i,:) = simmix(i,index)*rawmix(index,:)/k;
        end

        Result{j} = prefer_mat;   % store once all users are processed

    end

    % Evaluation Part

    for j = 1:length(parameter)

        prefer01 = Result{j};

        for i = 1:size(prefer01,1)
            [pm, pn] = sort(prefer01(i,:), 'descend');
            index = pn(1:10);   % top-10 recommendations
            Recall01(i,1) = length(find(Test01(i,index) == 1))/length(find(Test01(i,:) ~= 0));
            Precision01(i,1) = length(find(Test01(i,index) == 1))/10;
        end

        Recall_001(:,j + 10*(hi-1)) = Recall01;
        Precision_001(:,j + 10*(hi-1)) = Precision01;

    end
end

4. Experiment

In this section, we introduce the dataset and experimental procedure of a real virtual online community where our proposed model was applied. Additionally, we present performance evaluation metrics and display the results of the analysis in which we applied our model to the actual experimental data.

4.1. Dataset and Experimental Procedure

This research utilized two real-world datasets, one from Margarin [64] and one from Delicious [65]. Margarin is a popular social bookmarking platform in Korea, whereas Delicious is widely used in the United States. Social bookmarking platforms like these allow users to bookmark URLs they find interesting, valuable, or worth sharing, and to attach tags that categorize and describe those bookmarks. For this study, the Margarin data were gathered by web crawling, whereas the Delicious dataset was obtained from the source referenced in [66].

To provide a more detailed overview, the Margarin dataset encompassed a total of 15,765 distinct resources, accompanied by 11,065 unique tags. Throughout the observation period, users bookmarked links 18,850 times. In contrast, the Delicious dataset was more extensive, featuring 43,028 individual resources and 9383 separate tags. Users on Delicious showed more prolific bookmarking behavior, with a grand total of 104,687 bookmarking actions. The subsequent sections will delve more deeply into a descriptive breakdown of the data sourced from both platforms. Table 6 contains the information on the datasets.

During the data assessment phase, weights were derived from the frequency of the tags that users employed and the timestamps of their tagging actions. For the Margarin platform, the research constructed a user-resource weight matrix with approximately 4.7 million cells. Similarly, the Delicious dataset resulted in a matrix containing approximately 12.9 million cells. Similarity was computed for every possible pair of users. The selection of neighborhood size was pivotal for the efficacy of CF. To this end, the study evaluated neighborhood sizes ranging from 3 to 30 in increments of 3, leading to 10 distinct calculations. Additionally, the λ value was adjusted in intervals of 0.1 from 0 to 1, so the experiment was iterated 11 times. Multiplying the number of matrix cells by these 10 neighborhood sizes and 11 λ values, the Margarin dataset necessitated 520,245,000 operations, whereas the Delicious dataset required 1,419,924,000 operations.
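The operation counts above can be checked with a short calculation. The sketch below assumes that one "operation" corresponds to one matrix cell per combination of neighborhood size and λ value. The figure of 300 users per dataset is not stated in this section; it is an inference, since it is the value for which resources × users reproduces both reported totals.

```python
NEIGHBORHOOD_SIZES = list(range(3, 31, 3))  # 3, 6, ..., 30 (10 values)
LAMBDA_STEPS = 11                           # 0.0, 0.1, ..., 1.0

# Inferred, not stated in the text: number of users in each weight matrix.
ASSUMED_USERS = 300


def total_operations(num_resources, num_users=ASSUMED_USERS):
    """Matrix cells times the number of (neighborhood size, lambda) settings."""
    cells = num_resources * num_users
    return cells * len(NEIGHBORHOOD_SIZES) * LAMBDA_STEPS


margarin_ops = total_operations(15_765)   # 4,729,500 cells
delicious_ops = total_operations(43_028)  # 12,908,400 cells
```

Under these assumptions the cell counts round to the reported 4.7 million and 12.9 million, and the totals match the reported operation counts exactly.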

To gauge the effectiveness of the proposed algorithm, it was compared against the basic CF methodology. The latter determines recommendations based on user bookmarking activities, employing a binary-coded user–resource matrix and the cosine distance metric, as cited in [27,33].
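As a rough illustration of the baseline, the sketch below computes cosine similarity between binary bookmark vectors. The vectors are derived from Table 1 by marking a 1 wherever a user tagged a resource; the function name and the choice to return 0 for an all-zero vector are assumptions of this example, not details taken from [27,33].

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Binary user-resource rows over R1..R5, derived from the bookmarks in Table 1.
binary = {
    "U1": [1, 0, 1, 1, 0],
    "U2": [0, 1, 0, 1, 0],
    "U3": [1, 0, 1, 1, 1],
}

sim_u1_u3 = cosine_similarity(binary["U1"], binary["U3"])  # 3 shared bookmarks
sim_u1_u2 = cosine_similarity(binary["U1"], binary["U2"])  # 1 shared bookmark
```

Because the binary matrix records only whether a bookmark exists, U1 and U3 look highly similar here regardless of which tags they used or when they bookmarked. That is the information the tag-based and time-based weights are meant to add.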

For the experimental phase, each dataset was split in two: the training set comprised the initial 80% of entries, while the remaining 20% formed the testing set. As noted above, the Margarin dataset contained 15,765 resources and 11,065 tags, and the Delicious dataset contained 43,028 resources and 9383 tags. A subsequent section elaborates on the performance assessment. The top 10 resources were extracted from the training dataset.
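A minimal sketch of such a split is shown below. It assumes the entries are already in the order in which the split should be applied (for example, chronological); the text does not specify the ordering, and the function name and sample data are hypothetical.

```python
def split_train_test(entries, train_fraction=0.8):
    """Return the first train_fraction of entries as training data and the rest as test data."""
    cut = int(len(entries) * train_fraction)
    return entries[:cut], entries[cut:]


# Hypothetical bookmark log: (user, resource) pairs in their original order.
bookmarks = [("u1", "r%d" % i) for i in range(10)]
train, test = split_train_test(bookmarks)
```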

4.2. Evaluation Metrics

In this research, the effectiveness of the introduced model was gauged using three primary metrics: precision, recall, and F-measure [6,67,68].

Precision measures the proportion of accurately recommended items to the total items in the recommendation list. Its value can be influenced by the length of this list. A higher value of precision indicates more accurate recommendations. The formula for calculating precision, denoted by Equation (6), is:

\text{Precision} = \frac{\text{size of hit set}}{\text{size of recommended set}} = \frac{|\text{test} \cap \text{top } N|}{N}

(6)

Recall, on the other hand, captures the fraction of items that have been rightly recommended out of all the relevant items. In the context of this study, relevant items refer to the resources that users saved in the test dataset. The value of recall can also vary with the size of the recommendation list, and a higher recall value signifies better accuracy. It is articulated using Equation (7):

\text{Recall} = \frac{\text{size of hit set}}{\text{size of relevant set}} = \frac{|\text{test} \cap \text{top } N|}{|\text{test}|}

(7)

Lastly, the F-measure provides a consolidated metric derived from both precision and recall. It is the harmonic mean of the two, ensuring a balance between them. An optimal model will have both high precision and recall, leading to a high F-measure value. This is represented by Equation (8).

F = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}

(8)
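The three metrics in Equations (6)–(8) can be computed for a single user as in the following sketch. The function name, the handling of empty inputs, and the sample lists are assumptions made for illustration.

```python
def evaluate_top_n(recommended, relevant):
    """Return (precision, recall, F-measure) for one user's top-N list."""
    hits = len(set(recommended) & set(relevant))  # size of the hit set
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * recall * precision / (recall + precision)
    return precision, recall, f_measure


# Hypothetical example: a top-10 list and four resources saved in the test set.
recommended = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10"]
relevant = ["r2", "r5", "r11", "r12"]
precision, recall, f_measure = evaluate_top_n(recommended, relevant)
```

Here two of the ten recommendations are hits, giving a precision of 0.2 and a recall of 0.5. The F-measure of about 0.286 sits closer to the lower of the two, as expected of a harmonic mean.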

4.3. Results

This section describes the detailed experimental results of our study, performed on two large real-world datasets, namely Margarin and Delicious. The first part of the experiment compared tag-based weight, time-based weight, and hybrid weight methods with basic CF according to the size of the neighborhood. In the second part, the statistical significance of the recommendation improvement was verified.

4.3.1. Results of Margarin

The following experiments investigate the effect of the neighborhood size and of the parameter λ, which determines the proportions of tag and time information used to construct the hybrid weight. The neighborhood size, denoted by ‘k’, is the number of nearest users considered when making a recommendation. We varied k from 3 to 30 in increments of 3. The plots show that the choice of k had a considerable impact on recommendation quality: as k increases, performance improves, although the gains diminish beyond a certain threshold. Different values of k might suit different datasets or user scenarios, so identifying the optimal value is important for recommendation accuracy and relevance. The optimal λ value was found by examining the highest performance of the hybrid weight across all three metrics, with λ adjusted in steps of 0.1 from 0 to 1. For the Margarin dataset, λ = 0.6 was optimal across all three evaluation criteria. The hybrid weight outperformed weights relying on a single element, namely the time-based weight (λ = 0) and the tag-based weight (λ = 1).
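To make the role of λ concrete, the sketch below blends the example tag-based and time-based weights from Tables 2 and 3. The linear form λ·W_tag + (1 − λ)·W_time is an assumption inferred from the description above (λ = 1 gives the tag-based weight, λ = 0 the time-based weight); with λ = 0.5 it reproduces the example hybrid weights in Table 4. The function and variable names are hypothetical.

```python
import math


def hybrid_weight(w_tag, w_time, lam):
    """Linear blend of one user's weights: lam = 1 keeps only tags, lam = 0 only time."""
    return [lam * tg + (1.0 - lam) * tm for tg, tm in zip(w_tag, w_time)]


# Rows for users U1..U3 over resources R1..R5, copied from Tables 2, 3, and 4.
W_TAG = [[0.5, 0, 0.75, 0.75, 0], [0, 0.67, 0, 1.0, 0], [0.67, 0, 0.17, 0.17, 0.67]]
W_TIME = [[1.0, 0, 0.25, 0.5, 0], [0, 1.0, 0, 0.25, 0], [1.0, 0, 0.5, 0.25, 0.25]]
TABLE_4 = [[0.75, 0, 0.5, 0.625, 0], [0, 0.835, 0, 0.625, 0], [0.835, 0, 0.335, 0.21, 0.46]]

hybrid = [hybrid_weight(tag_row, time_row, 0.5) for tag_row, time_row in zip(W_TAG, W_TIME)]

# Compare with Table 4, allowing for floating-point rounding.
matches_table_4 = all(
    math.isclose(value, expected, abs_tol=1e-9)
    for row, expected_row in zip(hybrid, TABLE_4)
    for value, expected in zip(row, expected_row)
)
```

Sweeping `lam` over 0.0, 0.1, ..., 1.0 and repeating the similarity and evaluation steps for each value would mirror the 11 iterations described in the experimental procedure.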

Figure 1 displays the precision values compared to the basic CF across various neighborhood sizes. Hybrid weight and time-based weight showed higher values than basic CF overall. Contrary to expectations, tag-based weight resulted in lower values than basic CF. This could be because existing information was unnecessarily processed when calculating the tag weight in data with a high sparseness level. On average, the hybrid weight was 0.77% higher than CF, and the time weight was 0.49% higher than basic CF.

Figure 2 illustrates the recall scores in relation to various neighborhood sizes when compared to the standard CF. Hybrid weight and time weight were greater than basic CF overall. The performance of tag weight and basic CF differed according to the size of the neighborhood and showed similar values overall. Tag weight showed lower values than basic CF. On average, the hybrid weight was larger than basic CF by 13.27%, and the time weight was larger than basic CF by 11.81%.

Figure 3 showcases a comparison of the F-measure results from the experiments. Hybrid weight and time weight showed better performances than basic CF. Tag-based weight and basic CF showed no difference until k was 15. However, basic CF showed better performance than tag-based weight above a k value of 15. On average, hybrid weight was greater than basic CF by 5.63%, and time-based weight was greater than basic CF by 5.14%.

4.3.2. Results of Delicious

The optimal λ value was again identified by examining the performance of the hybrid weight on the three metrics. In the Delicious dataset, the best λ values were found to be 0.0, 0.8, and 0.5 for precision, recall, and F-measure, respectively. As with Margarin, the hybrid weight performed better than the weights containing only one factor.

Figure 4 presents a comparison of the precision values with the basic CF across different neighborhood sizes. Hybrid weight and time-based weight showed higher values than basic CF overall. Tag-based weight matched or exceeded basic CF, except at neighborhood sizes of 3, 6, and 15. On average, hybrid weight was higher than basic CF by 0.16%, and time-based weight was higher than basic CF by 0.27%.

Figure 5 illustrates the recall results in relation to the basic CF across a range of neighborhood sizes. Hybrid weight, tag-based weight, and time-based weight were greater than basic CF overall. On average, hybrid weight was larger than basic CF by 0.27%, and time weight was larger than basic CF by 11.81%.

Figure 6 presents a comparative analysis of the experimental outcomes based on the F-measure. Hybrid weight and time weight showed better performance than basic CF. Tag-based weight and basic CF showed no difference until k was 15. However, basic CF showed a better performance than tag-based weight above a k of 15. On average, hybrid weight was greater than basic CF by 5.63%, and time-based weight was greater than basic CF by 5.14%.

Diving deeper into the performance variations, several aspects stood out. Firstly, both datasets showed that the hybrid weight, which harmoniously combines tag and time information, consistently outperformed the basic CF. This underlines the added value of integrating supplementary data sources into the recommendation process. The diminishing returns of performance improvement as ‘k’ increased hint at the existence of an optimal neighborhood size beyond which additional neighbors do not contribute significantly to the recommendation quality. This could be attributed to the noise introduced by less relevant neighbors. The surprising underperformance of the tag-based weight, especially in the Margarin dataset, prompts further examination. The high sparsity levels of the dataset may mean that the tag information, when used in isolation, does not offer sufficient robustness for generating accurate recommendations. Tags might represent niche interests or may not have been used consistently across users, leading to weaker recommendations. The Delicious dataset further ratified the robustness of the hybrid approach, with the λ values indicating that a mix of tag and time weights was essential for optimal results in most cases. However, the distinct λ values for each metric underscore that different components might dominate under varying evaluation criteria.

5. Conclusions and Further Research

CTS, a popular social network service, is emerging as a crucial tool for categorizing vast web resources for easy searching and sharing. This research focuses on creating and testing a new CF approach to improve recommendation quality. By proposing a CF model that simultaneously considers both tag and time information for weight generation and similarity calculations, the study demonstrates that the hybrid method substantially enhances personalized recommendations compared to models that account for either time or tag alone. Across two distinct datasets, Margarin and Delicious, our evaluation found that the hybrid and time-based weight strategies consistently outperformed the basic CF in terms of precision, recall, and F-measure. Specifically, in the Margarin dataset, the hybrid weight strategy led to average improvements of 0.77% in precision, 13.27% in recall, and 5.63% in F-measure, while the time-based weight showed respective increases of 0.49%, 11.81%, and 5.14%. For the Delicious dataset, the hybrid weight strategy led to improvements averaging 0.16% in precision, 0.27% in recall, and 5.63% in F-measure, and the time-based weight demonstrated increases of 0.27%, 11.81%, and 5.14%. The tag-based weight’s performance, however, was inconsistent, excelling in certain scenarios and faltering in others. These results underscore the significance of appropriate weight selection tailored to specific datasets. Moreover, the time weight function proved effective at improving CF performance. The relative contribution of tag and time differed depending on the dataset. However, the optimized λ was greater than 0.5 in both datasets. When the hybrid weight was configured with a higher share of tags than of time, the recommendation performance improved. Recommendation performance improved as the size of the neighborhood increased, but plateaued beyond a certain size. From a cost–benefit point of view, it therefore seems efficient to perform big data analysis with a neighborhood of 30 or fewer and to make recommendations to users on that basis.

The evolution of online content, as corroborated by the literature, showcases the need for advanced information filtering and recommendation techniques, an area which our research critically addresses. CTS, as a collaborative tool, has become invaluable in categorizing online resources, with the literature underscoring its adaptability from basic functionalities to more advanced uses in contemporary online communities, such as SNS and e-commerce [29,30]. Further, tags’ semantic importance has been pivotal in information extraction, leading to their incorporation in several recommendation systems [19,33]. In line with the recent trends and findings from researchers such as Bateman et al. [34] and Klašnja-Milićević et al. [20], our study also converges on the theme of integrating tag information while also echoing Jo [35]’s methodological approach, which considers user preferences and similarities. Our research thereby forms a synthesis of the most prominent concepts and methodologies found in the literature, channeling them into a cohesive, updated model.

In our study, the synergy of tag and time empowered the model to intricately decipher evolving user predilections, highlighting not merely their choices, but the temporality of those choices, a feature pivotal in domains like e-commerce, media platforms, and social networks. Contrary to conventional models, which are more static, our system thrives on adaptability, harnessing real-time data, which ensures its resilience and aptness in the face of swift digital transformations. This innovative fusion yields a richer insight into user behaviors over traditional bookmark-centric models. Empirical analysis of both the Margarin and Delicious datasets reinforced this, with our hybrid model consistently eclipsing the performance of rudimentary CF. Moreover, the model adeptly sidesteps challenges inherent to solely tag-based or time-based systems, balancing the dynamism of tags with the contextuality of time, thereby underscoring its robustness and efficacy.

The study’s limitations and future directions encompass several avenues. First, the research solely relied on the Margarin and Delicious datasets. There are many other prominent social tagging platforms, including Facebook, Twitter, Instagram, and CiteULike. Hence, it is paramount for future endeavors to consider a wider array of datasets in order to augment the generalizability of the findings. Second, the study did not juxtapose the proposed algorithm with other existing methodologies. The primary objective was understanding the depth of insights that tag and time data can offer over and above basic bookmarking details. Comparing the proposed methodology with other recommendation algorithms will be a pertinent step in future research. Another dimension for further investigation is the differential impact of tag and time information on preference versus similarity. Exploring varied combinations of recommendation systems by independently measuring preference and similarity can open new doors. Moreover, future research should also delve into analyzing the effects and behaviors segmented by age group and gender. Such a demographic-based analysis may offer more nuanced insights into user interactions and preferences.

Author Contributions

Conceptualization, H.J., J.-h.H. and J.Y.C.; methodology, H.J., J.-h.H. and J.Y.C.; formal analysis, H.J. and J.-h.H.; data curation, J.Y.C.; writing—original draft, H.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2022S1A5A2A03052219).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study are available from the corresponding authors upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Brown, A.; Reade, J.J. The wisdom of amateur crowds: Evidence from an online community of sports tipsters. Eur. J. Oper. Res. 2019, 272, 1073–1081.
2. Choi, J.Y.; Rosen, J.; Maini, S.; Pierce, M.E.; Fox, G.C. Collective collaborative tagging system. In Proceedings of the Grid Computing Environment Workshop, Austin, TX, USA, 16 November 2008; pp. 1–7.
3. Golder, S.A.; Huberman, B.A. Usage patterns of collaborative tagging system. J. Inf. Sci. 2006, 32, 198–208.
4. Xu, Y.; Yin, D.; Zhou, D. Investigating users’ tagging behavior in online academic community based on growth model: Difference between active and inactive users. Inf. Syst. Front. 2019, 21, 761–772.
5. Wang, Y.; Deng, J.; Gao, J.; Zhang, P. A hybrid user similarity model for collaborative filtering. Inf. Sci. 2017, 418–419, 102–118.
6. Herlocker, J.L.; Konstan, J.A.; Terveen, L.G.; Riedl, J.T. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. 2004, 22, 5–53.
7. Lops, P.; Jannach, D.; Musto, C.; Bogers, T.; Koolen, M. Trends in content-based recommendation. User Model. User-Adapt. Interact. 2019, 29, 239–249.
8. Shokeen, J.; Rana, C. A study on features of social recommender systems. Artif. Intell. Rev. 2020, 53, 965–988.
9. Cui, Z.; Xu, X.; Xue, F.; Cai, X.; Cao, Y.; Zhang, W.; Chen, J. Personalized Recommendation System Based on Collaborative Filtering for IoT Scenarios. IEEE Trans. Serv. Comput. 2020, 13, 685–695.
10. Qassimi, S.; Abdelwahed, E.H. The role of collaborative tagging and ontologies in emerging semantic of web resources. Computing 2019, 101, 1489–1511.
11. Mathes, A. Folksonomies: Cooperative Classification and Communication through Shared Metadata; University of Illinois Urbana: Urbana, IL, USA, 2004.
12. Sen, A. Metadata management: Past, present and future. Decis. Support Syst. 2004, 37, 151–173.
13. Halpin, H.; Robu, V.; Shepherd, H. The complex dynamics of collaborative tagging. In Proceedings of the 16th International Conference on World Wide Web, Banff, AB, Canada, 8–12 May 2007; pp. 211–220.
14. Dichev, C.; Xu, J.; Dicheva, D.; Zhang, J. A Study on Community Formation in Collaborative Tagging Systems. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Washington, DC, USA, 9–12 December 2008.
15. Li, X.; Guo, L.; Zhao, Y. Tag-based Social Interest discovery. In Proceedings of the Social Networks & Web 2.0 Conference, Geneva, Switzerland; Beijing, China, 21–25 April 2008; pp. 675–684.
16. Xu, Z.; Fu, Y.; Mao, J.; Su, D. Towards the Semantic Web: Collaborative Tag Suggestions. In Proceedings of the Collaborative Web Tagging Workshop, Austin, TX, USA, 4 November 2006.
17. Chuang, S.L.; Chein, L.F. Enriching web taxonomies through subject categorization of query terms from search engine logs. Decis. Support Syst. 2003, 35, 113–127.
18. Jacob, E. Classification and categorization: A difference that makes a difference. Libr. Trends 2004, 52, 515–540.
19. Klašnja-Milićević, A.; Ivanović, M.; Vesin, B.; Budimac, Z. Enhancing e-learning systems with personalized recommendation based on collaborative tagging techniques. Appl. Intell. 2018, 48, 1519–1535.
20. Klašnja-Milićević, A.; Vesin, B.; Ivanović, M. Social tagging strategy for enhancing e-learning experience. Comput. Educ. 2018, 118, 166–181.
21. Balakrishnan, B. Motivating engineering students learning via monitoring in personalized learning environment with tagging system. Comput. Appl. Eng. Educ. 2018, 26, 700–710.
22. Beldjoudi, S.; Seridi, H.; Karabadji, N.E.I. Recommendation in collaborative e-learning by using linked open data and ant colony optimization. In Proceedings of the International Conference on Intelligent Tutoring Systems, Montreal, QC, Canada, 11–15 June 2018; pp. 23–32.
23. Morrison, P.J. Tagging and searching: Search retrieval effectiveness of folksonomies on the World Wide Web. Inf. Process. Manag. 2008, 44, 1562–1579.
24. Wu, P.; Zhang, Z.-K. Enhancing personalized recommendations on weighted social tagging networks. Phys. Procedia 2010, 3, 1877–1885.
25. Zhang, C.; Yang, M.; Lv, J.; Yang, W. An improved hybrid collaborative filtering algorithm based on tags and time factor. Big Data Min. Anal. 2018, 1, 128–136.
26. Brooks, C.H.; Montanez, N. Improved annotation of blogosphere via auto-tagging and hierarchical clustering. In Proceedings of the 15th international Conference on World Wide Web, Scotland, UK, 23–26 May 2006.
27. Zheng, N.; Li, Q. A recommender system based on tag and time information for social tagging systems. Expert Syst. Appl. 2011, 38, 4575–4587.
28. Shang, M.-S.; Zhang, Z.-K.; Zhou, T.; Zhang, Y.-C. Collaborative filtering with diffusion-based similarity on tripartite graphs. Phys. A Stat. Mech. Its Appl. 2010, 389, 1259–1264.
29. Banda, L.; Singh, K.; Abdel-Basset, M.; Thong, P.H.; Huynh, H.X.; Taniar, D. Recommender Systems Using Collaborative Tagging. Int. J. Data Warehous. Min. 2020, 16, 183–200.
30. Wang, C.-D.; Deng, Z.-H.; Lai, J.-H.; Philip, S.Y. Serendipitous recommendation in e-commerce using innovator-based collaborative filtering. IEEE Trans. Cybern. 2018, 49, 2678–2692.
31. Kim, H.N.; Ji, A.T.; Ha, I.; Jo, G.S. Collaborative filtering based on collaborative tagging for enhancing the quality of recommendation. Electron. Commer. Res. Appl. 2010, 9, 73–83.
32. Shang, M.-S.; Lu, L.; Zhang, Y.-C.; Zhou, T. Empirical analysis of web-based user-object bipartite networks. Europhys. Lett. 2010, 90, 48006.
33. Najafabadi, M.K.; Mohamed, A.; Onn, C.W. An impact of time and item influencer in collaborative filtering recommendations using graph-based model. Inf. Process. Manag. 2019, 56, 526–540.
34. Bateman, S.; Brooks, C.; McCalla, G.; Brusilovsky, P. Applying Collaborative Tagging to E-learning. In Proceedings of the 16th International World Wide Web Conference, Banff, AB, Canada, 8 May 2007.
35. Jo, H. A Recommendation System Based on Big Data: Separation of Preference and Similarity; Springer: Cham, Switzerland, 2023; pp. 390–398.
36. Shardanand, U.; Maes, P. Social information filtering: Algorithms for automating “word of mouth”. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New York, NY, USA, 7–11 May 1995; pp. 210–217.
37. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Analysis of recommendation algorithms for E-commerce. In Proceedings of the Second ACM Conference on Electronic Commerce, Minneapolis, MN, USA, 17–20 October 2000; pp. 158–167.
38. Breese, J.S.; Heckerman, D.; Kadie, C. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, Vancouver, BC, Canada, 19–22 July 1998.
39. Chen, A. ContextAware Collaborative Filtering System: Predicting the User’s Preferences in Ubiquitous Computing. In Proceedings of the ACM 2005, Los Angeles, CA, USA, 29–31 July 2005.
40. Adomavicius, G.; Sankaranarayanan, R.; Sen, S.; Tuzhilin, A. Incorporating contextual information in recommender systems using a multidimensional approach. ACM Trans. Inf. Syst. 2005, 23, 103–145.
41. Palmisano, C.; Tuzhilin, A.; Gorgoglione, M. Using context to improve predictive modeling of customers in personalization applications. IEEE Trans. Knowl. Data Eng. 2008, 20, 1535–1549.
42. Wu, L.; He, X.; Wang, X.; Zhang, K.; Wang, M. A survey on accuracy-oriented neural recommendation: From collaborative filtering to information-rich recommendation. arXiv 2022, arXiv:2104.13030.
43. Srifi, M.; Oussous, A.; Ait Lahcen, A.; Mouline, S. Recommender systems based on collaborative filtering using review texts—A survey. Information 2020, 11, 317.
44. Wang, X.; He, X.; Wang, M.; Feng, F.; Chua, T.-S. Neural graph collaborative filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 165–174.
45. Rendle, S.; Krichene, W.; Zhang, L.; Anderson, J. Neural collaborative filtering vs. matrix factorization revisited. In Proceedings of the 14th ACM Conference on Recommender Systems, Online, 22–26 September 2020; pp. 240–248.
46. Duan, R.; Jiang, C.; Jain, H.K. Combining review-based collaborative filtering and matrix factorization: A solution to rating’s sparsity problem. Decis. Support Syst. 2022, 156, 113748.
47. Lin, Z.; Tian, C.; Hou, Y.; Zhao, W.X. Improving graph collaborative filtering with neighborhood-enriched contrastive learning. In Proceedings of the ACM Web Conference 2022, New York, NY, USA, 27 April 2022; pp. 2320–2329.
48. Xia, L.; Huang, C.; Xu, Y.; Zhao, J.; Yin, D.; Huang, J. Hypergraph contrastive collaborative filtering. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 70–79.
49. Fkih, F. Similarity measures for Collaborative Filtering-based Recommender Systems: Review and experimental comparison. J. King Saud. Univ.-Comput. Inf. Sci. 2022, 34, 7645–7669.
50. Zarzour, H.; Maazouzi, F.; Soltani, M.; Chemam, C. An improved collaborative filtering recommendation algorithm for big data. In Proceedings of the IFIP International Conference on Computational Intelligence and Its Applications, Oran, Algeria, 8–10 May 2018; pp. 660–668.
51. Shen, J.; Zhou, T.; Chen, L. Collaborative filtering-based recommendation system for big data. Int. J. Comput. Sci. Eng. 2020, 21, 219–225.
52. Sahoo, A.K.; Pradhan, C.; Barik, R.K.; Dubey, H. DeepReco: Deep learning based health recommender system using collaborative filtering. Computation 2019, 7, 25.
53. Najafabadi, M.K.; Mahrin, M.N.r. A systematic literature review on the state of research and practice of collaborative filtering technique and implicit feedback. Artif. Intell. Rev. 2016, 45, 167–201.
54. Nilashi, M.; Ahani, A.; Esfahani, M.D.; Yadegaridehkordi, E.; Samad, S.; Ibrahim, O.; Sharef, N.M.; Akbari, E. Preference learning for eco-friendly hotels recommendation: A multi-criteria collaborative filtering approach. J. Clean. Prod. 2019, 215, 767–783.
55. Wang, W.; Tang, T.; Xia, F.; Gong, Z.; Chen, Z.; Liu, H. Collaborative filtering with network representation learning for citation recommendation. IEEE Trans. Big Data 2020, 8, 1233–1246.
56. Boyack, K.; Klavans, R.; Börner, K. Mapping the backbone of science. Scientometrics 2005, 64, 351–374.
57. Nehete, S.P.; Devane, S.R. Improving Performance of Collaborative Filtering. In ICT for Competitive Strategies; CRC Press: Boca Raton, FL, USA, 2020; pp. 423–432.
58. Moghadam, P.H.; Heidari, V.; Moeini, A.; Kamandi, A. An exponential similarity measure for collaborative filtering. SN Appl. Sci. 2019, 1, 1172.
59. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, 1–5 May 2001; pp. 285–295.
60. Lathia, N.; Hailes, S.; Capra, L. kNN CF: A temporal social network. In Proceedings of the 2008 ACM Conference on Recommender Systems, Lausanne, Switzerland, 23–25 October 2008.
61. Ding, Y.; Li, X. Time weight collaborative filtering. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management, Bremen, Germany, 31 October–5 November 2005; pp. 485–492.
62. Aggarwal, C.C.; Han, J.; Wang, J.; Yu, P.S. A framework for projected clustering of high dimensional data streams. In Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto, ON, Canada, 31 August–3 September 2004; pp. 852–863.
63. Salton, G.; Wong, A.; Yang, C.S. A vector space model for automatic indexing. Commun. ACM 1975, 18, 613–620.
64. Mar.gar.in. Margarin Is a Social Bookmark. Available online: http://mar.gar.in/ (accessed on 15 September 2023).
65. DeliciousAI. Delicious. Available online: https://www.delicious.com/ (accessed on 15 September 2023).
66. DAI-Lab. Distributed Artificial Intelligence Laboratory. Available online: http://www.dai-labor.de (accessed on 8 December 2021).
67. Najafabadi, M.K.; Mahrin, M.N.; Chuprat, S.; Sarkan, H.M. Improving the accuracy of collaborative filtering recommendations using clustering and association rules mining on implicit data. Comput. Hum. Behav. 2017, 67, 113–128.
68. Tewari, A.S.; Barman, A.G. Sequencing of items in personalized recommendations using multiple recommendation techniques. Expert Syst. Appl. 2018, 97, 70–82.

Figure 1. Precision comparisons for different neighborhood sizes using the Margarin dataset.

Figure 2. Recall comparisons for different neighborhood sizes using the Margarin dataset.

Figure 3. F-measure comparisons for different neighborhood sizes using the Margarin dataset.

Figure 4. Precision comparisons for different neighborhood sizes using the Delicious dataset.

Figure 5. Recall comparisons for different neighborhood sizes using the Delicious dataset.

Figure 6. F-measure comparisons for different neighborhood sizes using the Delicious dataset.

Table 1. User-resource tagging information.

	R1	R2	R3	R4	R5
U1	Tag1, Tag2 (May 15)	–	Tag1, Tag3, Tag4 (May 6)	Tag1, Tag4, Tag5 (May 11)	–
U2	–	Tag4 (May 15)	–	Tag4, Tag6 (May 1)	–
U3	Tag1, Tag7 (May 15)	–	Tag8 (May 11)	Tag4 (May 6)	Tag1, Tag7 (May 6)

Table 2. Example of tag-based weight.

Tag	R1	R2	R3	R4	R5
U1	0.5	0	0.75	0.75	0
U2	0	0.67	0	1.0	0
U3	0.67	0	0.17	0.17	0.67
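
The entries of Table 2 can be reproduced from the tags in Table 1 with a simple frequency rule, sketched below. The rule is inferred from the numbers in the two tables and is an assumption, not a quotation of the paper's definition of W_tag(u,r): for each user, count how often every tag is used across all of that user's bookmarks, then divide the summed counts of the tags attached to a resource by the user's total number of tag assignments. The names `bookmarks` and `tag_weight` are illustrative only.

```python
from collections import Counter

# Table 1 as {user: {resource: [tags]}}. Dates are left out because
# they do not enter the tag-based weight.
bookmarks = {
    "U1": {"R1": ["Tag1", "Tag2"],
           "R3": ["Tag1", "Tag3", "Tag4"],
           "R4": ["Tag1", "Tag4", "Tag5"]},
    "U2": {"R2": ["Tag4"],
           "R4": ["Tag4", "Tag6"]},
    "U3": {"R1": ["Tag1", "Tag7"],
           "R3": ["Tag8"],
           "R4": ["Tag4"],
           "R5": ["Tag1", "Tag7"]},
}


def tag_weight(user, resource):
    """Assumed rule: share of the user's tag assignments that use
    a tag attached to this resource (0 if the resource is untagged)."""
    tagged = bookmarks[user]
    # How often the user applied each tag, over all resources.
    counts = Counter(tag for tags in tagged.values() for tag in tags)
    total = sum(counts.values())
    return sum(counts[tag] for tag in tagged.get(resource, [])) / total


for user in bookmarks:
    row = [round(tag_weight(user, f"R{i}"), 2) for i in range(1, 6)]
    print(user, row)
```

For U3, for example, Tag1 and Tag7 are each used twice out of six tag assignments, so R1 and R5 both receive (2 + 2)/6 ≈ 0.67, while R3 and R4 carry a single once-used tag and receive 1/6 ≈ 0.17, as in Table 2.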

Table 3. Example of time-based weight.

Time	R1	R2	R3	R4	R5
U1	1.0	0	0.25	0.5	0
U2	0	1.0	0	0.25	0
U3	1.0	0	0.5	0.25	0.25

Table 4. Example of hybrid weight.

Hybrid	R1	R2	R3	R4	R5
U1	0.75	0	0.5	0.625	0
U2	0	0.835	0	0.625	0
U3	0.835	0	0.335	0.21	0.46
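
The hybrid weights in Table 4 are consistent with a linear merge of Tables 2 and 3. The following is a minimal sketch, assuming the combination takes the form λ·W_tag + (1 − λ)·W_time with λ = 0.5; both the placement of λ and its value are assumptions chosen because they reproduce the table, and the names `w_tag`, `w_time`, and `hybrid_weight` are illustrative.

```python
LAMBDA = 0.5  # assumed balance parameter; reproduces Table 4

# Rows of Table 2 (tag-based) and Table 3 (time-based), columns R1..R5.
w_tag = {
    "U1": [0.5, 0.0, 0.75, 0.75, 0.0],
    "U2": [0.0, 0.67, 0.0, 1.0, 0.0],
    "U3": [0.67, 0.0, 0.17, 0.17, 0.67],
}
w_time = {
    "U1": [1.0, 0.0, 0.25, 0.5, 0.0],
    "U2": [0.0, 1.0, 0.0, 0.25, 0.0],
    "U3": [1.0, 0.0, 0.5, 0.25, 0.25],
}


def hybrid_weight(tag, time, lam=LAMBDA):
    """Linear merge of one tag-based and one time-based weight."""
    return lam * tag + (1 - lam) * time


w_hybrid = {
    user: [round(hybrid_weight(a, b), 3)
           for a, b in zip(w_tag[user], w_time[user])]
    for user in w_tag
}

for user, row in w_hybrid.items():
    print(user, row)
```

With λ = 0.5 the two sources count equally, so each hybrid entry is the mean of the corresponding tag and time entries; for instance, U3 on R5 gives (0.67 + 0.25)/2 = 0.46.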

Table 5. Symbol and meaning.

Symbol	Meaning
u	User
v	Neighbor User
r	Resource
R	Set of Resources
i	A Certain User i
W_tag(u,r)	Weight from Tags by User ‘u’ for Resource ‘r’
W_time(u,r)	Temporal Weight of User ‘u’ towards Resource ‘r’
W_hybrid(u,r)	Combined Tag-Time Weight for User ‘u’ and Resource ‘r’
W_i(u,r)	Composite Weight Profile of User ‘u’ for Resource ‘r’
W_i(v,r)	Each Weight Vector of Neighbor User ‘v’ to Resource ‘r’
λ	Parameter for Balancing Tag-Based and Time-Based Weights
sim_i(u,v)	Similarity Score Between User ‘u’ and User ‘v’
S_tag, S_time, S_hybrid	Tag-Based, Time-Based, and Hybrid Similarities, respectively
KNN(u)	Set of ‘k’ Nearest Neighbors of User ‘u’
Score(u,r)	Predicted Preference Score of User ‘u’ for Resource ‘r’
t	Relative Time Point Value
t_i	Tagging Day
t_l	Last Tagging Day
t_f	First Tagging Day
hl_t	Half-Life for Each User
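
To show how sim(u,v), KNN(u), and Score(u,r) from Table 5 fit together, the sketch below runs a small user-based CF step on the hybrid weights of Table 4. It is an illustration under stated assumptions, not the paper's exact procedure: cosine similarity is assumed as the similarity measure, and Score(u,r) is assumed to be the similarity-weighted sum of the neighbors' weights for r. All function names are hypothetical.

```python
import math

resources = ["R1", "R2", "R3", "R4", "R5"]

# Hybrid weights from Table 4, one row per user.
w_hybrid = {
    "U1": [0.75, 0.0, 0.5, 0.625, 0.0],
    "U2": [0.0, 0.835, 0.0, 0.625, 0.0],
    "U3": [0.835, 0.0, 0.335, 0.21, 0.46],
}


def cosine(a, b):
    """Cosine similarity of two weight vectors (assumed sim measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(user, k):
    """The k users most similar to `user`, most similar first."""
    others = [v for v in w_hybrid if v != user]
    others.sort(key=lambda v: cosine(w_hybrid[user], w_hybrid[v]), reverse=True)
    return others[:k]


def score(user, resource, k=2):
    """Assumed scoring rule: similarity-weighted sum over the k neighbors."""
    idx = resources.index(resource)
    return sum(cosine(w_hybrid[user], w_hybrid[v]) * w_hybrid[v][idx]
               for v in knn(user, k))


def recommend(user, k=2):
    """Rank the resources the user has not tagged by predicted score."""
    unseen = [r for r, w in zip(resources, w_hybrid[user]) if w == 0.0]
    return sorted(unseen, key=lambda r: score(user, r, k), reverse=True)


print(knn("U1", 2))
print(recommend("U1"))
```

In this toy example U3 is more similar to U1 than U2 is, so R5 (tagged by U3) ranks ahead of R2 (tagged only by U2) for U1. Varying `k` corresponds to the neighborhood-size adjustment described in the abstract.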

Table 6. The description of datasets.

Data	Number of Resources	Number of Tags	Number of Bookmarks
Margarin	15,765	11,065	18,850
Delicious	43,028	9383	104,687


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

