Article

CDF-LS: Contrastive Network for Emphasizing Feature Differences with Fusing Long- and Short-Term Interest Features

1 School of Computer and Software Engineering, Xihua University, Chengdu 610039, China
2 Lab of Security Insurance of Cyberspace, Chengdu 610039, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2023, 13(13), 7627; https://doi.org/10.3390/app13137627
Submission received: 17 May 2023 / Revised: 20 June 2023 / Accepted: 26 June 2023 / Published: 28 June 2023

Abstract:
Modelling both long- and short-term user interests from historical data is crucial for generating accurate recommendations. However, unifying these metrics across multiple application domains can be challenging, and existing approaches often rely on complex, intertwined models which can be difficult to interpret. To address this issue, we propose a lightweight, plug-and-play interest enhancement module that fuses interest vectors from two independent models. After analyzing the dataset, we identify deviations in the recommendation performance of long- and short-term interest models. To compensate for these differences, we use feature enhancement and loss correction during training. In the fusion process, we explicitly split long-term interest features with longer duration into multiple local features. We then use a shared attention mechanism to fuse multiple local features with short-term interest features to obtain interaction features. To correct for bias between models, we introduce a comparison learning task that monitors the similarity between local features, short-term features, and interaction features. This adaptively reduces the distance between similar features. Our proposed module combines and compares multiple independent long-term and short-term interest models on multiple domain datasets. As a result, it not only accelerates the convergence of the models but also achieves outstanding performance in challenging recommendation scenarios.

1. Introduction

Recommendation systems play a critical role in accurately recommending items or content that match users’ preferences in various fields, such as news [1], e-commerce [2,3], video [4], and online advertising [2]. Traditional recommendation methods, such as collaborative filtering [5], KNN [6], and matrix factorization [7], use user-item interaction information, including clicks, follows, ratings, and purchase history, to find similar users or items for recommendation. However, these methods rely on static information and struggle to capture users’ dynamic interests. Although matrix factorization was later extended to mine users’ latent interests, its recommendation performance remains limited.
In recent years, deep learning has been widely applied in fields such as anomaly detection [8,9] and data augmentation [9,10,11]. In recommendation systems, methods that consider users’ long- and short-term interests [1,2,3,12,13,14] share a core principle: modelling user interests from the order of interacted items over a period of time. Long- and short-term interests can be trained separately according to the length of the sequence data and then combined to produce recommendations that balance personalization and diversity.
Long-term interests: extracting stable interests from long sequential data has always been a research focus. A common solution is to learn from user behavior sequences that are as long as possible and to store user interest features offline. DIN [2] argues that each user’s attention to the target item should differ and proposes an attention-based model that uses the target item and the historical sequence to compute attention scores and update the sequence representation. MIMN [15] decouples the user interest memory storage unit from the recommendation module; since long-term user interests do not change over a short period, offline storage can retain longer user sequences. SIM [16] proposes a method for quickly retrieving user behavior memory sequences over the user’s entire life cycle and extends the retrievable sequence length to tens of thousands of items. SURGE [17] integrates different types of preferences in long-term user behavior into clusters in a graph. Although longer sequence data can yield more stable user interests, such models also suffer from difficult updating, slow training convergence, and high data requirements.
Short-term interests: deep learning models based on time-series analysis have been found to be effective for modeling short-term user interests [3,12,14,18,19]. Several notable models, including GRU4REC [14] and DIEN [3], have integrated recurrent neural networks and attention mechanisms to improve recommendation accuracy. More recent models such as SDM [12] and CGNN-MHSA-AR [20] have further enhanced these approaches by incorporating multi-dimensional information. However, short-term interest models have limitations in dealing with noisy and incomplete data, which can lead to bias in recommendation results when coordinated with long-term interest models.
LS-term interests: typically, recommendation systems rely on either long-term or short-term interests to generate recommendations. However, considering only one type of interest harms both platform revenue and user experience [21]. Recent methods [1,12,18,21,22,23,24] address this problem by dividing the model into two parts that separately model the user’s long-term and short-term interests; the final recommendation is then based on a fusion of the two. The advantage of hybrid recommendation lies in weighing the strengths, weaknesses, and applicable scenarios of different recommendation algorithms, and then choosing the most suitable ratio for fusing their outputs based on real-time data. Current feature combination methods fall into three categories: summation [1], concatenation [24,25], and weighted fusion [12,26]. However, these methods, including the latest practical industrial approaches, mainly consider the overall relationship between independent features; when applied to long- and short-term interest models, they ignore the relationship between the aspect-level interests contained in the long-term interest features and the short-term interest. Therefore, we explicitly partition the long-term interest features and use similarity and contrast loss to compensate for the shortcomings of previous methods. In practical applications, additional side information is often introduced to enhance user and item features and reduce the bias of fused features [27,28,29], but the enhancement of interest features themselves has received less attention. To address this gap, we design a novel fusion enhancement module, and our experimental results demonstrate the effectiveness of this method.
In the following text, we will answer the following three main questions:
Q1.
Why train the long and short interest models separately?
Q2.
How does the integration module correct for interest bias?
Q3.
What is the versatility of our approach?
In summary, the starting point of this paper is to connect independent long- and short-term interest models and correct the deviations between them to improve model performance. The main contributions of this paper are summarized as follows:
  • A plug-and-play user long and short interest fusion module is designed, which can effectively and quickly fuse long and short interests using the shared attention mechanism, thus improving the model accuracy;
  • The sources of interest bias are analyzed experimentally, and an improved ternary contrast loss function is introduced to accelerate the convergence of the model by using the bias between features as the index of the loss function;
  • The effectiveness and generality of our proposed method are demonstrated by combining several different long- and short-term interest models and experimenting on datasets from several domains, using data of different sequence lengths as input.

2. Related Work

2.1. Entanglement Training

The long- and short-term user interest model is a crucial component of recommender systems. It involves two stages: entangling long- and short-term interests for joint training and decoupling them to enhance interpretability and reduce confusion. Various models have been proposed for both stages, each with distinctive characteristics. For the entangling stage, SASrec [30], LUTUR [1], RCNN [18], and DIEN [3] are some of the popular models that employ complex neural network architectures to model the joint distribution of long- and short-term interests. SASrec uses the Markov chain assumption to pass sequence data through an attention network layer and a feed-forward network layer to predict the action probability of the last click. LUTUR uses user ID to capture long-term interests and long-term data to initialize short-term interest models. RCNN models long-term interest preferences using RNN and short-term interests using CNN on the hidden state. DIEN adds an improved GRU network layer with better temporal sensitivity to the lower layer of the attention mechanism network.

2.2. Disentanglement Training

To address the problem of poor model interpretability and confusion of captured user interests, researchers have proposed decoupling methods to separate long- and short-term user interests. PLASTIC [19], GNewsRec [24], SDM [12], MA-GNN [31], and CLSR [21] are some of the models that have employed different approaches for this stage. PLASTIC uses a combination of multi-factorization machine and RNN to capture long- and short-term interests separately and uses reinforcement learning to dynamically train fusion weight coefficients. GNewsRec constructs a relationship graph based on user-item interaction data to mine long-term interests and trains short-term interest using LSTM and attention mechanism. SDM trains long-term interests using item labels and user attributes and uses LSTM and multi-headed attention mechanism to obtain short-term interests. MA-GNN uses GNN to model short-term interests and a multi-headed attention mechanism to model long-term interests. CLSR proposes to use self-supervised signals to separate long- and short-term interests to reduce the biased expression of user interests.

2.3. Feature Fusion

Feature fusion refers to combining information from different feature sources into a unified feature vector to improve the performance and robustness of a model across tasks. It includes feature-level fusion, modality-level fusion, and feature correction. For example, in image recognition tasks, features at different scales and from different color channels can be extracted from the original image and then fused into a new feature vector. Kaiming He et al. [32] proposed a deep residual structure to fuse features of different levels across scales; Nitish Srivastava et al. [11] proposed a deep Boltzmann machine that trains on visual, text, and speech data from different modalities for cross-modal learning and feature fusion; and Albert Gordo et al. [33] proposed a convolutional neural network based image retrieval method that integrates feature correction mechanisms such as local voting and pooling into the network structure to improve retrieval robustness and accuracy. Feature enhancement, which refers to the reconstruction or expansion of existing features, such as data augmentation [10], transformation of features from different sources [11], gradient selection [9], and feature reconstruction [34], is an important complement to feature fusion. We borrow the feature fusion ideas from the above works and propose to use an attention mechanism and a global average residual structure to fuse long- and short-term interest features; the effectiveness of our method is demonstrated in the feature fusion comparison experiments.

3. Our Approach

The recommendation system operates by forecasting a user’s future actions based on their past behavior. We denote the set of all users by $U$ and the set of all items by $I$. The user interaction sequence has a fixed length of $N$, denoted by $X^u = \{x_1^u, x_2^u, \ldots, x_N^u\}$, which represents the sequence of items interacted with by user $u$, ordered in time. The time series model used in this study learns both the user’s long-term and short-term interests from the first $N-1$ items of the interaction sequence, together with the user’s ID data, to predict the probability of clicking on the $N$th item. To incorporate the latest sequence information, we draw inspiration from [23] and extract the most recent consecutive segment of the sequence, $\tilde{N}$, as input to the short-term interest model. Embedding techniques have been widely used in the data processing phase of recommender systems. In our study, we apply embedding to represent discrete user and item data, where each user is represented by $u$ and each item by $i$. This process can be expressed as Equation (1).
$p_u^U, q_i^X = \mathrm{Embed}(U), \mathrm{Embed}(X)$
$p_u^U$ is the embedding vector of user $u$, and $q_i^X$ is the embedding vector of item $i$.
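As a minimal illustration of this embedding step, the following PyTorch sketch maps discrete user IDs and item sequences to dense vectors; the class name, embedding dimension, and tensor shapes are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

class IDEmbedding(nn.Module):
    """Looks up dense vectors for discrete user and item IDs, as in Equation (1)."""

    def __init__(self, num_users: int, num_items: int, dim: int = 64):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)  # p_u^U
        self.item_emb = nn.Embedding(num_items, dim)  # q_i^X

    def forward(self, user_ids: torch.Tensor, item_seq: torch.Tensor):
        # user_ids: (batch,); item_seq: (batch, N-1) history of interacted item IDs
        p_u = self.user_emb(user_ids)   # (batch, dim)
        q_i = self.item_emb(item_seq)   # (batch, N-1, dim)
        return p_u, q_i
```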

3.1. LS Interest Modeling

The behavioral motivation of users is often complex. Even users with stable long-term interests may occasionally click on popular items unrelated to their interests. As a result, short-term user interests are considered unstable and contain a significant amount of noise data. Zheng et al. [21] have emphasized that accurately predicting a user’s long-term interests can increase click rates, while accurate short-term interest prediction is critical for platform profitability and user experience. In sequence modeling, RNN structures are frequently employed because they take into account the time factor of sequence data. Although a user’s interests may differ significantly in a short period of time, they are still linked, and the RNN structure captures the weight of the relationship between items before and after.
Table 1 displays the prediction results of the DIN model, trained on Amazon Electronics data, for different types of users with varying sequence lengths. The findings indicate that when users have few click records, the model recommends more diverse item types and more popular items to explore their interests, resulting in fluctuating hit rates. As the number of user interactions increases, the proportion of recommended item types matching the user’s click history increases and the proportion of popular items decreases, indicating that more data allow stable user interests to be extracted. For users with diverse interests, the long-sequence model has minimal impact on the diversity of recommended items, while users with narrower interests experience the opposite effect, with longer sequences resulting in greater bias. A long time span between item interactions can be fatal to a short-term sequence model, leading it to recommend popular items to compensate for the deficiency. The long-term interest model, by contrast, has minimal effect on short-term interest prediction and should be used judiciously to avoid impairing the prediction accuracy of the model.
In summary, we can answer the first question (Q1): long-sequence models are more suitable for users with diverse interests, while short-sequence models are better for users with narrower interests. Long-sequence models prioritize diversity over accuracy, while short-sequence models prioritize accuracy over diversity. To overcome these challenges, we propose decoupling the long- and short-term interest models, redesigning the feature enhancement module, and introducing a loss function that accounts for differences in interests.

3.2. Fusion Module

The fusion module workflow is detailed in Figure 1. First, the user interest feature from the long-term interest model is divided into four equal parts, inspired by Feng et al. [35], who observed that user behavior consists of short sessions with varying interest points. After segmentation, the multiple local features are cross-fused with the short-term interest feature through a shared attention mechanism, which captures the correlation between the short-term feature and each local feature and thereby accounts for the differences between long-term and short-term interests.
$a_i = \dfrac{\exp\big(f(p_s^u, p_l^{u,i})\big)}{\sum_{j=1}^{n}\exp\big(f(p_s^u, p_l^{u,j})\big)}$
$\mathrm{att}(p_l^{u,i}, p_s^u) = a_i \, p_l^{u,i}$
$p_l^{u,i}$ refers to the $i$th part of the long-term interest feature, which is divided equally into $n$ parts; $f$ is the fusion enhancement function; $a_i$ is the attention score between the short-term feature and the $i$th local feature; and $\mathrm{att}$ is the result of fusing an individual local feature with the short-term feature.
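To make the splitting and shared attention concrete, here is a hedged PyTorch sketch of this step. It assumes the scoring function f is a small MLP over the concatenated short-term and local features and that the long-term feature dimension is divisible by the number of parts; neither assumption comes from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedLocalAttention(nn.Module):
    """Splits the long-term interest vector into n local parts and fuses each part
    with the short-term interest vector using one shared scoring network f."""

    def __init__(self, dim: int, n_parts: int = 4):
        super().__init__()
        assert dim % n_parts == 0
        self.n_parts, self.part_dim = n_parts, dim // n_parts
        # shared f: scores the compatibility of a (short-term, local) pair
        self.f = nn.Sequential(nn.Linear(dim + self.part_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, p_l: torch.Tensor, p_s: torch.Tensor):
        # p_l: (batch, dim) long-term feature; p_s: (batch, dim) short-term feature
        locals_ = p_l.view(p_l.size(0), self.n_parts, self.part_dim)  # local parts p_l^{u,i}
        scores = torch.cat(
            [self.f(torch.cat([p_s, locals_[:, i]], dim=-1)) for i in range(self.n_parts)],
            dim=-1,
        )                                 # (batch, n) raw scores f(p_s^u, p_l^{u,i})
        a = F.softmax(scores, dim=-1)     # attention scores a_i
        att = a.unsqueeze(-1) * locals_   # att(p_l^{u,i}, p_s^u) = a_i * p_l^{u,i}
        return att, a                     # interaction features and their attention weights
```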
The long- and short-term interest models incorporate attention mechanisms that calculate weights between historical items and the target item, directly integrating global information while ignoring aspect-level interest. The attention module in the fusion module, in contrast, uses shared parameters to calculate weights between each local feature and the short-term feature, which avoids an excessive number of parameters. While this approach can effectively integrate local features with the short-term feature and recover part of the global long-term interest, it still lacks the ability to fully integrate global information. To address this limitation, we use an MLP and a sigmoid to calculate the weight of each interaction feature and then concatenate all interaction features to form the user interest feature.
Multiple output feature vectors contain the evolution direction of user interest, and effectively integrating them is a crucial problem. Inspired by the channel attention mechanism in SE-Net for images, we propose a lightweight fusion unit that treats each feature as a channel. This unit dynamically adjusts the weights using the attention mechanism. Unlike the SDM model, our fusion unit is computationally efficient and adheres to the plug-and-play principle. The fusion process can be expressed as follows:
$p_{out} = o^u + o^u \cdot \sigma\big(f(\mathrm{maxpool}(p^u) + \mathrm{avgpool}(p^u))\big)$
$\mathrm{maxpool}$ and $\mathrm{avgpool}$ denote maximum pooling and average pooling, respectively; $f$ is a multilayer perceptron (fully connected layers) followed by the sigmoid function $\sigma$; and $o^u$ is the set of interaction features output by the attention mechanism. $p^u$ is obtained by splicing the outputs of the multi-head attention mechanism.
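A rough sketch of this SE-style fusion unit is given below, treating each interaction feature as a channel; the pooling here is applied over the interaction features themselves and the MLP width is an assumption, so this is an approximation of the design rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ChannelFusionUnit(nn.Module):
    """Re-weights the n interaction features like SE-Net channels and adds a residual."""

    def __init__(self, n_channels: int = 4, reduction: int = 2):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(n_channels, n_channels // reduction),
            nn.ReLU(),
            nn.Linear(n_channels // reduction, n_channels),
        )

    def forward(self, o_u: torch.Tensor) -> torch.Tensor:
        # o_u: (batch, n_channels, d) interaction features from the shared attention step
        max_p = o_u.max(dim=-1).values                  # max pooling per channel
        avg_p = o_u.mean(dim=-1)                        # average pooling per channel
        gate = torch.sigmoid(self.f(max_p + avg_p))     # channel weights via MLP + sigmoid
        p_out = o_u + o_u * gate.unsqueeze(-1)          # residual re-weighting
        return p_out.flatten(1)                         # concatenate into the user interest vector
```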
Based on the fusion module’s design, we can address the second question (Q2): although $p_l^u$ and $p_s^u$ are explicitly separated, their disentanglement cannot be fully guaranteed, and there is no corresponding label to supervise the learning of their differences. To address this limitation, we propose a new contrastive learning task that dynamically adjusts the similarity between the interaction features and the original features. The appropriate ratio of long-term to short-term interest differs across user types; for example, in the cold-start case the local features should be more similar to the short-term features, while for regular users the opposite holds. The goal of this task is to pull similar features closer together while pushing features with large differences further apart.

3.3. Dynamic Decouple Loss

The fusion module incorporates an attention mechanism to integrate local features and short-term features, and a binary loss function constrains the difference between the output and the input. However, the model may converge slowly and unstably due to the large number of parameters and the diversity of user types. To overcome these issues, we use the attention scores obtained from the attention mechanism to represent the contribution of each local feature to the output interaction feature, and we introduce a ternary (triplet) contrast loss that uses the differences in feature similarity as weights summed into the final loss value. The final loss function is formulated as follows:
$\mathrm{Loss} = L_{BCE} + \sum_{i=1}^{n} a_i \, L_{tri,i}$
The loss function used in this study is a combination of the binary cross-entropy loss $L_{BCE}$ and the triplet loss $L_{tri}$, where $a_i$ is the attention score. The traditional triplet loss function is defined as follows:
$L = \sum_{i=1}^{n} \max\big(d(o^{u,i}, p_l^{u,i}) - d(o^{u,i}, p_s^u) + \mathrm{margin},\ 0\big)$
$d(o^{u,i}, p_l^{u,i})$ computes the Euclidean distance between the interaction feature and the local feature, $d(o^{u,i}, p_s^u)$ computes the Euclidean distance between the interaction feature and the short-term feature, and margin is a user-defined constant. However, under this default setting the local features dominate, which may reduce the weight of the short-term features and lead to biased results. This shortcoming stems from the traditional triplet loss, which computes the difference between the anchor sample and the positive and negative samples in a fixed, mandatory way. To account for varying user preferences in the prediction results, we incorporate constraints into the loss function and introduce dynamic parameters through the attention scores. The formula is as follows:
$\mathrm{sim}_{min}, \mathrm{sim}_{max} = \mathrm{sorted}\big(s(o^{u,i}, p_l^{u,i}),\ s(o^{u,i}, p_s^u)\big)$
$L_{tri} = \max\Big(\dfrac{1}{n}\sum_{i=1}^{n} a_i\,(1 - \mathrm{sim}_{max}) + \mathrm{sim}_{min} + \mathrm{margin},\ 0\Big)$
For each triplet of features, we sort the similarities between the interaction feature and the other two features. The similarity $s$ is computed as the reciprocal of the Euclidean distance between the features, and the pair with the highest similarity is considered dominant. $n$ is the number of interaction features and $a_i$ is the attention weight of each interaction feature. The terms $1 - \mathrm{sim}_{max}$ and $\mathrm{sim}_{min}$ gradually increase the distance between dissimilar features during iterative learning, which regulates the ratio of long-term to short-term interest contributions to the final result.
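A hedged PyTorch sketch of the combined objective is shown below. It takes the similarity s as the reciprocal of the Euclidean distance, as described above, averages the sim_min term over the local parts, and uses an assumed margin of 0.5; these choices are illustrative and may differ from the authors' implementation.

```python
import torch
import torch.nn.functional as F

def dynamic_triplet_loss(o_u, p_l_parts, p_s, a, margin: float = 0.5):
    """o_u, p_l_parts: (batch, n, d) interaction and local features;
    p_s: (batch, d) short-term feature; a: (batch, n) attention scores."""
    # similarity = 1 / Euclidean distance (epsilon avoids division by zero)
    sim_local = 1.0 / (torch.norm(o_u - p_l_parts, dim=-1) + 1e-8)
    sim_short = 1.0 / (torch.norm(o_u - p_s.unsqueeze(1), dim=-1) + 1e-8)
    sim_max = torch.maximum(sim_local, sim_short)   # dominant (more similar) pair
    sim_min = torch.minimum(sim_local, sim_short)
    # pull the dominant pair closer and push the dissimilar pair apart, weighted by a_i
    l_tri = torch.clamp(
        (a * (1.0 - sim_max)).mean(dim=-1) + sim_min.mean(dim=-1) + margin, min=0.0
    )
    return l_tri.mean()

def total_loss(logits, labels, o_u, p_l_parts, p_s, a):
    """Binary cross entropy on the click prediction plus the attention-weighted triplet term."""
    l_bce = F.binary_cross_entropy_with_logits(logits, labels.float())
    return l_bce + dynamic_triplet_loss(o_u, p_l_parts, p_s, a)
```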
To summarize, we propose an enhanced triplet loss function that incorporates an attention score to regulate the influence of local interests, a similarity term to measure the difference between local and short-term interests, and an indexing mechanism to incorporate these factors into the overall loss function. In our comparative experiments, we demonstrate that this loss function efficiently accelerates model convergence.

4. Experiments

In this section, we present a comprehensive overview of our experimental process and results. First, we describe the multiple field datasets used in our experiments and the user interest model employed for comparison. Next, we explore and analyze the performance of our fusion module by addressing three main research questions:
  • RQ1: How do our modules perform in practice?
  • RQ2: What is the individual contribution of each component in our model?
  • RQ3: Can our model effectively handle the complexity of sequence data with varying lengths across different scenarios?
We provide detailed answers to these questions and present our experimental findings in a structured and rigorous manner.

4.1. Datasets and Experimental Setup

To simulate various recommendation scenarios, we selected four publicly available datasets from different domains. Table 2 provides detailed information about each dataset. For instance, the Amazon and Douban datasets have opposite user interest breadth, the Taobao dataset has longer user click sequences and richer data, and the Yelp dataset has fewer user clicks, which effectively simulates the cold-start situation. To measure model performance, three metrics are selected: Accuracy (ACC), Area Under the Curve (AUC), and F1-score (F1). For all three metrics, higher values indicate better performance.

4.2. Competitors

In the experiments, we compared long-term interest models, short-term interest models, and hybrid training models, including DIN [2], DIEN [3], NARM [13], PACA [36], RCNN [18], SLiRec [37], FMLP [38], CLSR [21], GRU4REC [14], LUTUR [1], and CASER [39]. Among them, LUTUR, SLiRec, and CLSR are hybrid models that share our model architecture.

4.3. Experimental Metrics

The experimental metrics are Accuracy, AUC, and F1. The CTR task is a binary classification task; accuracy represents the proportion of correct model predictions among all predictions and is calculated as:
$\mathrm{Accuracy} = \dfrac{TP + TN}{N}$
$TP$ and $TN$ denote the number of positive samples predicted correctly and the number of negative samples predicted correctly, respectively, and $N$ represents the total number of samples.
AUC is a measure of the ranking performance of a recommender system. The ROC curve is a curve with FPR (False Positive Rate) as the horizontal axis and TPR (True Positive Rate) as the vertical axis, and AUC is the area under the ROC curve. The AUC is calculated as follows:
$AUC = \dfrac{\sum I(p_{positive}, p_{negative})}{P \times N}$
$I(p_{positive}, p_{negative}) = \begin{cases} 1, & p_{positive} > p_{negative} \\ 0.5, & p_{positive} = p_{negative} \\ 0, & p_{positive} < p_{negative} \end{cases}$
$P$ is the number of positive samples, $N$ is the number of negative samples, $p_{positive}$ is the prediction score of a positive sample, and $p_{negative}$ is the prediction score of a negative sample. The F1 score is a composite of precision and recall and can be calculated by the following formula:
$F1 = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
Precision denotes the proportion of predicted positive samples that are truly positive, and recall denotes the proportion of truly positive samples that are correctly predicted. The F1 score combines precision and recall, while the AUC combines TPR and FPR.
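For concreteness, the three metrics can be computed with scikit-learn as in the snippet below; this is only an illustration of the definitions above and uses an assumed 0.5 decision threshold, not the authors' evaluation code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate(scores: np.ndarray, labels: np.ndarray, threshold: float = 0.5) -> dict:
    """scores: predicted click probabilities; labels: 0/1 ground truth."""
    preds = (scores >= threshold).astype(int)
    return {
        "ACC": accuracy_score(labels, preds),  # (TP + TN) / N
        "AUC": roc_auc_score(labels, scores),  # area under the ROC curve
        "F1": f1_score(labels, preds),         # harmonic mean of precision and recall
    }
```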

4.4. Overall Performance Comparison (RQ1)

Table 4 details the overall performance of all models on the four datasets, from which the following four observations can be made:
  • The overall performance of the short-term interest models is better. From the overall results, the short-term interest models outperform the long-term interest models because they capture the sequential information of recent user interactions well. The results of DIN, PACA, and RCNN show that long-term interest features are still informative, and if more effective methods can be used to fit them, the potential improvement in model effectiveness is significant.
  • The long-term interest model has an advantage in two situations: a large variety of products and a long time span of user clicks. Although the short-term interest models are generally better, they do not necessarily outperform the long-term models when the variety of items is large or the click time span is long. The RCNN and CASER models both use CNN networks, with CASER slightly less effective than RCNN, and there is a large gap between them and FMLP, which indicates that users’ earlier click data help the model capture long-term interest, but the short-term interest weight is generally larger than the long-term interest weight. The cold-start problem is difficult for both types of model: the best results on the Yelp dataset are lower than on the remaining three datasets, and there is a large gap between the long-term and short-term models, which verifies that fitting the sequence length and effectively extracting the sequence data are the keys to improving performance.
  • Joint modeling of long- and short-term interests is a generally effective approach. Joint modeling somewhat alleviates the poor performance of independent models under cold start, large click time spans, and many click categories, but it is not always effective. NARM, LUTUR, and SLIREC are trained by entangling long- and short-term interests with each other, which increases model redundancy, and the ACC and F1 performance of SLIREC is inadequate. In contrast, CLSR decouples the calculation of long- and short-term interests and fuses them with adaptive weights, which reduces the training burden and makes up for the shortcomings of SLIREC on Taobao.
  • Contrast learning and feature enhancement can effectively improve model performance. Our model differs only slightly from the best comparison model, CLSR, on the data-rich Taobao dataset, but improves AUC by almost 0.01 on Douban, which has a much smaller variety of products. For the cold-start case, our model leads CLSR in all metrics and achieves a 1.3% improvement on the long-time-span Taobao dataset. The results in Table 3 also show that CDF-LS achieves a better balance in terms of computational cost. These results validate the effectiveness of the contrast loss and feature enhancement.
Figure 2 depicts the training loss curves of the four hybrid models on all datasets. The zigzag shape of the curves is due to mild overfitting. CDF-LS can decrease rapidly in the early stage of training by exploiting the difference between features as an index, and it remains applicable in difficult recommendation scenarios. This verifies that introducing a contrast loss between long- and short-term features is effective.

4.5. Results of Ablation Experiments (RQ2)

4.5.1. Contrast Loss

Contrast learning facilitates model fitting and interpretability by learning the similarity between long- and short-term features, combined with dynamic weight assignment. We conducted ablation experiments on the contrast loss: we compared the performance of LUTUR and SLIREC with and without the contrast loss and replaced the contrast loss function of CLSR. The effect of dynamic weight assignment was also compared. Table 5 shows all the comparison results in detail.
As can be seen from Table 5, adding the contrast loss effectively improves the performance of the LS-term models and is applicable to difficult recommendation scenarios, and dynamic weight assignment effectively allocates the weights of long- and short-term features. We also replaced the contrast loss in the CLSR model; the resulting model outperformed ours by 0.35% on the Amazon dataset and by 0.13% in the cold-start case. This indicates that the decoupling framework in CLSR is applicable to most domains and can effectively improve model performance, and that how to perform such self-supervised decoupling effectively is a direction for future improvement.

4.5.2. Feature Fusion

To show that our module is suitable for most long- and short-term interest models and is easy to integrate and debug, we combined four pairs of long- and short-term interest models and conducted exploratory experiments on the more difficult prediction datasets. Table 6 shows the comparison results in detail. For DIN and CASER, the worst-performing pair of individual models, the performance improvement is obvious, especially for DIN, whose AUC on Yelp improves by 15.4%. This is attributed to the highly weighted short-term features and the effective fusion method. The best-performing combination, RCNN and FMLP, approaches CLSR on Amazon and even surpasses CLSR by 0.02% on Yelp. This strongly validates that there is still room for progress in user interest extraction, that enhancing long- and short-term interest features can exploit the strengths of the respective models, and that our module is effective for feature fusion.
Based on the findings presented in Table 1 and Table 3, we can address the third question on universality (Q3) raised in the introduction. Our results indicate that CDF-LS can be effectively applied across multiple datasets and models in various fields. This is achieved by dynamically adjusting the contribution ratio of short- and long-term interest features based on the attention weight scores and by leveraging the differences between features during backward optimization. The clear theoretical foundation of our approach allows it to be easily extended to other domains, and the faster iterations reduce time and cost. The wide applicability and training efficiency of our approach together support its universality.

4.6. Robustness Test Experimental Results (RQ3)

To demonstrate the versatility of our designed module for various models and scenarios, we conducted experiments on two challenging recommendation datasets using different models with varying sequence lengths for training the long and short interest models. The fusion module was utilized for pairing the models. Table 7 shows that the combination of the longest sequence model and the shortest sequence model outperforms the baseline model. However, when using similar models, the results were poorer. This is mainly due to the lack of compensating measures in similar models, which prevents the optimization of contrast loss based on differences between the two models. This can even result in interference, as evidenced by smaller results for 20 and 30 sequence lengths compared to 40 and 50 sequence lengths. Our tabular results demonstrate that our module significantly improves model performance in cases where there are large differences between long and short interest models. However, it is less effective for similar models, highlighting the limitations of contrast learning.

5. Conclusions

In this paper, we presented an experimental analysis of the differences in recommendation outcomes between long- and short-term user interest models. To address these differences, we proposed a novel interest fusion module that assimilates both long- and short-term user interests. The module is designed as a plug-and-play component that employs a shared attention mechanism to fuse the long- and short-term interest features, and it incorporates a comparison task that assesses the similarity between the two interest representations and the fused representation. Finally, we evaluated the effectiveness of the proposed module on multiple datasets.

Author Contributions

Conceptualization, K.L. and W.W.; methodology, K.L. and W.W.; software, W.W.; validation, K.L. and W.W.; formal analysis, R.W.; investigation, X.C.; resources, L.Z.; data curation, X.Y.; writing—original draft preparation, W.W.; writing—review and editing, K.L. and X.L.; visualization, R.W., X.C. and X.L.; supervision, L.Z. and X.Y.; project administration, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by National Natural Science Foundation of China (No. 62202390), Science and Technology Fund of Sichuan Province (No. 2022NSFSC0556), and the Opening Project of Lab of Security Insurance of Cyberspace, Sichuan Province.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. An, M.; Wu, F.; Wu, C.; Zhang, K.; Liu, Z.; Xie, X. Neural news recommendation with long-and short-term user representations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 336–345. [Google Scholar]
  2. Zhou, G.; Zhu, X.; Song, C.; Fan, Y.; Zhu, H.; Ma, X.; Yan, Y.; Jin, J.; Li, H.; Gai, K. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1059–1068. [Google Scholar]
  3. Zhou, G.; Mou, N.; Fan, Y.; Pi, Q.; Bian, W.; Zhou, C.; Zhu, X.; Gai, K. Deep interest evolution network for click-through rate prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5941–5948. [Google Scholar]
  4. Covington, P.; Adams, J.; Sargin, E. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 191–198. [Google Scholar]
  5. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182. [Google Scholar]
  6. Xu, G.; Wu, Z.; Zhang, Y.; Cao, J. Social networking meets recommender systems: Survey. Int. J. Soc. Netw. Min. 2015, 2, 64–100. [Google Scholar] [CrossRef]
  7. Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for recommender systems. Computer 2009, 42, 30–37. [Google Scholar] [CrossRef]
  8. Chen, P.; Liu, H.; Xin, R.; Carval, T.; Zhao, J.; Xia, Y.; Zhao, Z. Effectively detecting operational anomalies in large-scale IoT data infrastructures by using a gan-based predictive model. Comput. J. 2022, 65, 2909–2925. [Google Scholar] [CrossRef]
  9. Liu, M.; Deng, J.; Yang, M.; Cheng, X.; Liu, N.; Liu, M.; Wang, X. Cost Ensemble with Gradient Selecting for GANs. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI, Vienna, Austria, 23–29 July 2022; pp. 1194–1200. [Google Scholar] [CrossRef]
  10. Xie, T.; Cheng, X.; Wang, X.; Liu, M.; Deng, J.; Zhou, T.; Liu, M. Cut-thumbnail: A novel data augmentation for convolutional neural network. In Proceedings of the 29th ACM International Conference on Multimedia, Chengdu, China, 20–24 October 2021; pp. 1627–1635. [Google Scholar]
  11. Li, N.; Liu, Y.; Wu, Y.; Liu, S.; Zhao, S.; Liu, M. Robutrans: A robust transformer-based text-to-speech model. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 8228–8235. [Google Scholar]
  12. Lv, F.; Jin, T.; Yu, C.; Sun, F.; Lin, Q.; Yang, K.; Ng, W. SDM: Sequential deep matching model for online large-scale recommender system. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 2635–2643. [Google Scholar]
  13. Li, J.; Ren, P.; Chen, Z.; Ren, Z.; Lian, T.; Ma, J. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 1419–1428. [Google Scholar]
  14. Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-based recommendations with recurrent neural networks. arXiv 2015, arXiv:1511.06939. [Google Scholar]
  15. Pi, Q.; Bian, W.; Zhou, G.; Zhu, X.; Gai, K. Practice on long sequential user behavior modeling for click-through rate prediction. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2671–2679. [Google Scholar]
  16. Pi, Q.; Zhou, G.; Zhang, Y.; Wang, Z.; Ren, L.; Fan, Y.; Zhu, X.; Gai, K. Search-based user interest modeling with lifelong sequential behavior data for click-through rate prediction. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual, 19–23 October 2020; pp. 2685–2692. [Google Scholar]
  17. Chang, J.; Gao, C.; Zheng, Y.; Hui, Y.; Niu, Y.; Song, Y.; Jin, D.; Li, Y. Sequential recommendation with graph neural networks. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; pp. 378–387. [Google Scholar]
  18. Xu, C.; Zhao, P.; Liu, Y.; Xu, J.; Sheng, V.S.S.; Cui, Z.; Zhou, X.; Xiong, H. Recurrent convolutional neural network for sequential recommendation. In Proceedings of the The World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 3398–3404. [Google Scholar]
  19. Zhao, W.; Wang, B.; Ye, J.; Gao, Y.; Yang, M.; Chen, X. PLASTIC: Prioritize Long and Short-term Information in Top-n Recommendation using Adversarial Training. IJCAI 2018, 3676–3682. [Google Scholar] [CrossRef] [Green Version]
  20. Song, Y.; Xin, R.; Chen, P.; Zhang, R.; Chen, J.; Zhao, Z. Identifying performance anomalies in fluctuating cloud environments: A robust correlative-GNN-based explainable approach. Future Gener. Comput. Syst. 2023, 145, 77–86. [Google Scholar] [CrossRef]
  21. Zheng, Y.; Gao, C.; Li, X.; He, X.; Li, Y.; Jin, D. Disentangling user interest and conformity for recommendation with causal embedding. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 2980–2991. [Google Scholar]
  22. Dong, D.; Zheng, X.; Zhang, R.; Wang, Y. Recurrent Collaborative Filtering for Unifying General and Sequential Recommender. IJCAI 2018, 3350–3356. [Google Scholar] [CrossRef] [Green Version]
  23. Bai, T.; Du, P.; Zhao, W.X.; Wen, J.R.; Nie, J.Y. A long-short demands-aware model for next-item recommendation. arXiv 2019, arXiv:1903.00066. [Google Scholar]
  24. Hu, L.; Li, C.; Shi, C.; Yang, C.; Shao, C. Graph neural news recommendation with long-term and short-term interest modeling. Inf. Process. Manag. 2020, 57, 102142. [Google Scholar] [CrossRef] [Green Version]
  25. Ma, M.; Wang, G.; Fan, T. Improved DeepFM Recommendation Algorithm Incorporating Deep Feature Extraction. Appl. Sci. 2022, 12, 1992. [Google Scholar] [CrossRef]
  26. Shao, J.; Qin, J.; Zeng, W.; Zheng, J. Multipointer Coattention Recommendation with Gated Neural Fusion between ID Embedding and Reviews. Appl. Sci. 2022, 12, 594. [Google Scholar] [CrossRef]
  27. Ho, T.L.; Le, A.C.; Vu, D.H. Multiview Fusion Using Transformer Model for Recommender Systems: Integrating the Utility Matrix and Textual Sources. Appl. Sci. 2023, 13, 6324. [Google Scholar] [CrossRef]
  28. Zuo, Y.; Liu, S.; Zhou, Y.; Liu, H. TRAL: A Tag-Aware Recommendation Algorithm Based on Attention Learning. Appl. Sci. 2023, 13, 814. [Google Scholar] [CrossRef]
  29. Liang, N.; Zheng, H.T.; Chen, J.Y.; Sangaiah, A.K.; Zhao, C.Z. TRSDL: Tag-Aware Recommender System Based on Deep Learning–Intelligent Computing Systems. Appl. Sci. 2018, 8, 799. [Google Scholar] [CrossRef] [Green Version]
  30. Kang, W.C.; McAuley, J. Self-attentive sequential recommendation. In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018; pp. 197–206. [Google Scholar]
  31. Ma, C.; Ma, L.; Zhang, Y.; Sun, J.; Liu, X.; Coates, M. Memory augmented graph neural networks for sequential recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 5045–5052. [Google Scholar]
  32. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  33. Srivastava, N.; Salakhutdinov, R.R. Multimodal learning with deep boltzmann machines. Adv. Neural Inf. Process. Syst. 2012, 25, 2222–2230. [Google Scholar]
  34. Volpi, R.; Morerio, P.; Savarese, S.; Murino, V. Adversarial feature augmentation for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5495–5504. [Google Scholar]
  35. Feng, Y.; Lv, F.; Shen, W.; Wang, M.; Sun, F.; Zhu, Y.; Yang, K. Deep session interest network for click-through rate prediction. arXiv 2019, arXiv:1905.06482. [Google Scholar]
  36. Zhang, J.; Ma, C.; Mu, X.; Zhao, P.; Zhong, C.; Ruhan, A. Recurrent convolutional neural network for session-based recommendation. Neurocomputing 2021, 437, 157–167. [Google Scholar] [CrossRef]
  37. Yu, Z.; Lian, J.; Mahmoody, A.; Liu, G.; Xie, X. Adaptive User Modeling with Long and Short-Term Preferences for Personalized Recommendation. IJCAI 2019, 7, 4213–4219. [Google Scholar]
  38. Zhou, K.; Yu, H.; Zhao, W.X.; Wen, J.R. Filter-enhanced MLP is all you need for sequential recommendation. In Proceedings of the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 2388–2399. [Google Scholar]
  39. Tang, J.; Wang, K. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, Los Angeles, CA, USA, 5–9 February 2018; pp. 565–573. [Google Scholar]
Figure 1. The fusion module is the main component of our model; it requires both long-term and short-term user interest features as input, and the user interest models can be replaced according to actual needs. GRU is a variant of the recurrent neural network with fewer parameters, which alleviates the gradient problem in long-term memory and back-propagation. Attention denotes the attention mechanism. Multi split divides the longer long-term interest feature into multiple parts.
Figure 2. Training loss curves of LUTUR, SLIREC, CLSR, and CDF-LS on the four datasets.
Table 1. Prediction results of DIN models with different sequence lengths for different user groups. “Class” represents the number of recommended categories, “popu” represents the proportion of popular items, and “cross” represents the proportion of recommended results whose categories overlap with the historical item categories.
DIN ModelUser HistoriesSequencesCategoriesTime Span
LengthThread≤5≥50≤2≥20≤1≥3300
class31342674139
10popu0.0160.0050.0000.0080.0100.005
cross0.0270.3330.3840.1760.0150.034
class678752414764
20popu0.0100.0080.0040.0160.0320.012
cross0.0270.2760.0140.2080.0400.120
class145165164144201193
40popu0.0140.0050.0070.0100.0030.005
cross0.0640.2020.0140.0960.0260.082
class1899820289141133
50popu0.0180.0010.0080.0100.0090.012
cross0.0250.3300.0110.1730.0280.085
Table 2. The number of users and items in the four datasets. Average click sequence indicates the average number of items clicked by users; average click categories indicates the average number of item categories clicked by users; average click time span indicates the average time span of users’ clicks.
Datasets | Users | Items | Average Click Sequence | Average Click Categories | Average Click Time Span
Amazon | 19,240 | 63,001 | 8.780 | 8.780 | 1.050
Taobao | 104,693 | 1,592,919 | 102.170 | 24.385 | 0.009
Douban | 52,539 | 140,502 | 6.321 | 2.726 | 0.418
Yelp | 1,542,656 | 209,393 | 3.956 | 3.025 | 0.324
Table 3. Comparison of computational cost of LS-term model on Taobao and Yelp datasets. Params represents the number of parameters of the model, FLOPs represents the floating point computation of the model, and the unit G = 1,000,000 for the above two metrics. Throughput represents the throughput of the model, and the unit Samples/s represents the maximum number of samples processed per second.
Model | Taobao Params (G) | Taobao FLOPs (G) | Taobao Throughput (Samples/s) | Yelp Params (G) | Yelp FLOPs (G) | Yelp Throughput (Samples/s)
LUTUR | 29.02 | 1.28 | 250,246.81 | 52.79 | 0.64 | 248,976.63
SLIREC | 28.98 | 2.37 | 64,913.81 | 52.75 | 2.54 | 70,198.13
CLSR | 29.0 | 1.62 | 98,586.71 | 52.77 | 1.62 | 102,487.96
CDF-LS | 31.57 | 1.20 | 79,161.06 | 54.84 | 1.21 | 84,611.67
Table 4. The red font indicates the best results, and the LS-term models all use the RCNN and FMLP user vector weights for the best results.
Category: Long-Term | Short-Term | LS-Term
Dataset | Metric | DIN | PACA | NARM | RCNN | CASER | GRU4REC | DIEN | FMLP | LUTUR | SLIREC | CLSR | CDF-LS
Amazon | ACC | 0.7148 | 0.7057 | 0.7364 | 0.7698 | 0.7665 | 0.7747 | 0.7805 | 0.7881 | 0.7924 | 0.8002 | 0.8046 | 0.8014
Amazon | AUC | 0.8095 | 0.8154 | 0.8340 | 0.8465 | 0.8415 | 0.8574 | 0.8636 | 0.8716 | 0.8786 | 0.8773 | 0.8857 | 0.8824
Amazon | F1 | 0.7167 | 0.7096 | 0.7310 | 0.7668 | 0.7633 | 0.7789 | 0.7774 | 0.7830 | 0.7911 | 0.7973 | 0.8077 | 0.8096
Taobao | ACC | 0.6895 | 0.7033 | 0.7021 | 0.7174 | 0.7122 | 0.7189 | 0.7296 | 0.7374 | 0.7561 | 0.7543 | 0.7607 | 0.7724
Taobao | AUC | 0.7624 | 0.7761 | 0.7723 | 0.8084 | 0.7096 | 0.8087 | 0.8390 | 0.8389 | 0.8391 | 0.8318 | 0.8388 | 0.8392
Taobao | F1 | 0.6941 | 0.7097 | 0.7029 | 0.7218 | 0.7145 | 0.7187 | 0.7279 | 0.7448 | 0.7719 | 0.7693 | 0.7691 | 0.7710
Douban | ACC | 0.8549 | 0.8440 | 0.8699 | 0.8740 | 0.8710 | 0.8811 | 0.8951 | 0.8941 | 0.9066 | 0.9018 | 0.9132 | 0.9134
Douban | AUC | 0.8974 | 0.8838 | 0.9174 | 0.9281 | 0.9204 | 0.9168 | 0.9286 | 0.9378 | 0.9454 | 0.9463 | 0.9481 | 0.9567
Douban | F1 | 0.8577 | 0.8369 | 0.8787 | 0.8817 | 0.8763 | 0.8837 | 0.8938 | 0.8869 | 0.9010 | 0.9086 | 0.9047 | 0.9079
Yelp | ACC | 0.6566 | 0.6610 | 0.6834 | 0.7093 | 0.7307 | 0.7399 | 0.7524 | 0.7580 | 0.7630 | 0.7719 | 0.7818 | 0.7907
Yelp | AUC | 0.7035 | 0.7271 | 0.7319 | 0.7504 | 0.7809 | 0.7807 | 0.8036 | 0.8073 | 0.8098 | 0.8122 | 0.8164 | 0.8204
Yelp | F1 | 0.6589 | 0.6683 | 0.6803 | 0.7017 | 0.7393 | 0.7271 | 0.7513 | 0.7515 | 0.7641 | 0.7783 | 0.7730 | 0.7943
Table 5. Contrast stands for using the comparison loss function, Weights stands for using dynamic weights to update the length of interest, and CLSR itself has dynamic weights, so only the new loss function is compared. The gray numbers indicate the increase relative to the original model results.
Model | Contrast | Weights | Amazon AUC | Amazon F1 | Yelp AUC | Yelp F1
LUTUR | ✓ | | 0.8804 (+0.0018) | 0.8013 (+0.0102) | 0.8149 (+0.0051) | 0.7736 (+0.0095)
LUTUR | ✓ | ✓ | 0.8820 (+0.0034) | 0.8049 (+0.0136) | 0.8157 (+0.0059) | 0.7746 (+0.0105)
SLIREC | ✓ | | 0.8853 (+0.0080) | 0.8051 (+0.0078) | 0.8166 (+0.0044) | 0.7804 (+0.0021)
SLIREC | ✓ | ✓ | 0.8861 (+0.0088) | 0.8064 (+0.0091) | 0.8171 (+0.0049) | 0.7847 (+0.0063)
CLSR | ✓ | – | 0.8859 (+0.0002) | 0.8101 (+0.0024) | 0.8217 (+0.0053) | 0.7881 (+0.0151)
Table 6. Four combinatorial models comparing the simple splicing method with our fusion method. The gray numbers represent the relative increase compared to the results of the original model. The maximum increase reached 0.0314.
Model | Fusion | Amazon AUC | Amazon F1 | Yelp AUC | Yelp F1
DIN + CASER | | 0.8640 (+0.0225) | 0.7811 (+0.0178) | 0.7988 (+0.0179) | 0.7439 (+0.0046)
DIN + CASER | ✓ | 0.8715 (+0.0300) | 0.7866 (+0.0233) | 0.8123 (+0.0314) | 0.7501 (+0.0118)
NARM + DIEN | | 0.8695 (+0.0059) | 0.7854 (+0.0008) | 0.8101 (+0.0065) | 0.7613 (+0.0100)
NARM + DIEN | ✓ | 0.8763 (+0.0127) | 0.7878 (+0.0104) | 0.8160 (+0.0124) | 0.7729 (+0.0216)
PACA + GRU4REC | | 0.8641 (+0.0067) | 0.7811 (+0.0022) | 0.8010 (+0.0203) | 0.7684 (+0.0413)
PACA + GRU4REC | ✓ | 0.8720 (+0.0146) | 0.7869 (+0.0080) | 0.8103 (+0.0296) | 0.7700 (+0.0429)
RCNN + FMLP | | 0.8740 (+0.0024) | 0.7846 (+0.0016) | 0.8074 (+0.0001) | 0.7517 (+0.0002)
RCNN + FMLP | ✓ | 0.8809 (+0.0093) | 0.7903 (+0.0073) | 0.8166 (+0.0093) | 0.7624 (+0.0109)
Table 7. Underline represents the baseline model for the combination, bolded represents the best result. The longer sequences in each row are used to train the long-term interest model, and the shorter sequences are used to train the short-term interest model.
Model | Sequences (long + short) | Amazon AUC | Amazon F1 | Yelp AUC | Yelp F1
DIN + CASER | 30 + 20 | 0.8427 | 0.7696 | 0.7909 | 0.7415
DIN + CASER | 40 + 20 | 0.8715 | 0.7866 | 0.8123 | 0.7501
DIN + CASER | 50 + 20 | 0.8753 | 0.7917 | 0.8149 | 0.7553
DIN + CASER | 40 + 30 | 0.8631 | 0.7898 | 0.8077 | 0.7430
DIN + CASER | 50 + 30 | 0.8649 | 0.7910 | 0.8083 | 0.7431
DIN + CASER | 50 + 40 | 0.8548 | 0.7704 | 0.7905 | 0.7257
FMLP + RCNN | 30 + 20 | 0.8701 | 0.7792 | 0.8075 | 0.7547
FMLP + RCNN | 40 + 20 | 0.8809 | 0.7903 | 0.8166 | 0.7724
FMLP + RCNN | 50 + 20 | 0.8847 | 0.7953 | 0.8189 | 0.7764
FMLP + RCNN | 40 + 30 | 0.8703 | 0.7765 | 0.8086 | 0.7561
FMLP + RCNN | 50 + 30 | 0.8761 | 0.7846 | 0.8137 | 0.7704
FMLP + RCNN | 50 + 40 | 0.8602 | 0.7739 | 0.8060 | 0.7433