Article

A Large-Scale Reviews-Driven Multi-Criteria Product Ranking Approach Based on User Credibility and Division Mechanism

School of Frontier Crossover Studies, Hunan University of Technology and Business, Changsha 410205, China
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(13), 2952; https://doi.org/10.3390/math11132952
Submission received: 15 May 2023 / Revised: 20 June 2023 / Accepted: 27 June 2023 / Published: 1 July 2023
(This article belongs to the Special Issue Multi-criteria Decision Making and Data Mining, 2nd Edition)

Abstract

Massive online reviews provide consumers with the convenience of obtaining product information, but it is still worth exploring how to provide consumers with useful and reliable product rankings. The existing ranking methods do not fully mine user information, rating, and text comment information to obtain scientific and reasonable information aggregation methods. Therefore, this study constructs a user credibility model and proposes a large-scale user information aggregation method to obtain a new product ranking method. First, in order to obtain the aggregate weight of large-scale users, this paper proposes a consistency modeling method of text comments and star ratings by mining the associated information of user comments, including user interaction information and user personalized characteristics information, combined with sentiment analysis technology, and then constructs a user credibility model. Second, a double-layer group division mechanism considering user regions and comment time is designed to develop the large-scale group ratings aggregation approach. Third, based on the user credibility model and the large-scale ratings aggregation approach, a product ranking method is developed. Finally, the feasibility and effectiveness of the proposed method are verified through a case study for automobile ranking and a comparative analysis is furnished. The analysis results of the application case of automobile ranking show that there is a significant difference between the ranking results obtained by the ratings aggregation method based on the arithmetic mean and the ranking results obtained by this method. The method in this study comprehensively considers user credibility and group division, which can be reflected in user aggregation weights and the group aggregation process, and can also obtain more scientific and reasonable decision results.

1. Introduction

With the increasing popularity and development of the Internet, e-commerce and review platforms have emerged. These platforms allow consumers to post online reviews about their experiences and opinions on various product criteria, including quality and functionality. Online reviews offer valuable information to consumers who lack expert knowledge of the products they wish to purchase and help inform their purchase decisions. However, the abundance of online review data and the fake reviews posted by malicious users make it challenging for consumers to determine the authenticity of reviews and make informed purchase decisions. Thus, it is crucial to identify credible reviews from the vast amount of review data available. This study focuses on the credibility of online product reviews.
To date, numerous experts and scholars have conducted extensive research on the credibility of online product reviews and have applied their findings to evaluate information credibility across different review platforms and social networks [1,2]. These studies develop credibility models for online reviews to validate their reliability. Researchers examine the factors that influence the credibility of reviews and investigate the distribution patterns of ratings. For instance, Verma et al. proposed a credibility model for online reviews by analyzing the influencing factors of content, communicator, context, and consumer [3]. They also explored credibility variables associated with these factors and established a causal relationship between the variables and the credibility of online reviews by exploring 22 propositions. Banerjee et al. proposed a theoretical model of reviewer credibility based on dimensions such as positivity, engagement, experience, reputation, competence, and social connections [4]. They employed robust regression to determine the significance of these factors. Furthermore, Sun et al. developed a reputation rating method based on user rating bias and rating characteristics [5]. They discovered that the ratings provided by reliable users exhibit a peak distribution, whereas those provided by malicious users are substantially biased. Scholars have also studied the credibility of reviews from the perspectives of review text, ratings, and user information. Xiang et al. analyzed the reliability of review data in terms of the semantic features, sentiment, and ratings of review text [6]. Meel et al. put forward a holistic view of how information is weaponized to fulfil malicious motives and to forcefully bias user perception of a person, event, or firm [7]. In addition, review text and ratings are key factors in the study of the credibility of reviews.
For instance, Hazarika pointed out that there is an inconsistency between the review text and the rating in product reviews [8]. Almansour et al. proposed to build a system by fusing review text, ratings, and sources [9]. Lo et al. studied the credibility of reviews from the consistency of review text and ratings [10]. However, these studies have rarely introduced the latest text analysis tools to analyze the sentiment of texts. Additionally, traditional statistical and mathematical approaches are not sufficiently precise and efficient in reviewing texts. In this study, we employ the latest text analysis tool to develop a model based on sentiment analysis to assess the usefulness of comments.
Sentiment analysis (SA) technology is an important means of obtaining emotional tendencies in large-scale comments. SA is the process of gathering and analyzing people’s opinions, thoughts, and impressions regarding various topics, products, subjects, and services [11]. SA involves analyzing text or speech using computer techniques to determine the sentiment or emotional state within the text [12,13]. The latest text analysis models, which are pretrained on large-scale text data, can be fine-tuned to address sentiment analysis tasks [14]. Yang et al. proposed a new SA model, SLCABG [15], and the related experimental results show that the model can effectively improve the performance of text SA. SA is also used to analyze large-scale review sets. For instance, Haque et al. applied a supervised learning method to a large-scale Amazon dataset to classify its polarity and achieved satisfactory accuracy [16]. Guo et al. identified the key dimensions of customer service voiced by hotel visitors using a data mining approach, latent Dirichlet allocation (LDA) [17]; the related dataset included 266,544 online reviews for 25,670 hotels located in 16 countries. Thus, this research employs the latest text analysis model, BERT, to conduct SA on comment text and quantifies the sentiment of the comment text into five levels corresponding to user ratings. This study constructs a sentiment score acquisition method for text comments based on manually annotated sentiment training libraries combined with the BERT model.
Currently, there are numerous Internet-based review platforms exclusively for automobile brands, which offer rich and standardized data, providing information resources for useful research. Therefore, this research focuses on automobiles as the research subject and develops a model of review usefulness. Research on ranking decisions for automotive products is divided into two primary areas of investigation, namely rating-driven ranking decisions and text review-driven ranking decisions. In the research on rating-based ranking decisions, distributed linguistic term sets remain the core representation tool for transforming rating information. PROMETHEE-II and TODIM are extended to the linguistic term set environment to propose product ranking approaches [18,19]. In the research on ranking decisions based on text reviews, by combining the sentiment ratings and star ratings based on the output of the DUTIR sentiment dictionary, scholars developed a PageRank algorithm product ranking technique based on a directed graph model [20]. Additionally, scholars have addressed the accuracy problem of sentiment intensity recognition by developing a ranking decision approach based on ideal solutions and introducing two interval type fuzzy sets [21]. Scholars used sentiment analysis techniques to output five types of sentiment ranking by considering the advantages of probabilistic linguistic term sets in characterizing sentiment tendencies and their distribution forms. They combined these sentiment rankings with TODIM and evidence theory to construct a related product ranking decision approach [22]. The above studies primarily aggregate group wisdom knowledge in large-scale online reviews from a statistical viewpoint, using fuzzy sets, linguistic term sets, and other representational methods. 
However, these studies have not fully exploited current advanced text analysis techniques to analyze review texts, nor have they considered word-of-mouth credibility, the aggregation weights of heterogeneous individuals, the inconsistency between text reviews and star ratings, or reviewer information disclosure. Thus, the current identification of false reviews is not precise enough. Moreover, current research does not consider how large-scale ratings from group users are aggregated, so the aggregated results are easily affected by large-scale fake reviews. To address this issue, this study proposes the construction of a user credibility model and applies the text analysis model to analyze the user credibility weight of each review during the process of aggregating the wisdom knowledge of group online reviews. A group user score aggregation method is also built to calculate the comprehensive score of automobile brands. Based on this analysis, an automobile ranking decision method is developed.
In summary, this study examines how to weaken the influence of fake reviews and extract real and credible reviews for product ranking, and proposes a user credibility model based on the consistency of review sentiment orientations and ratings to solve the problem of difficult automobile ranking decisions. Compared with existing approaches, this approach examines the credibility of reviews in terms of online review text content, performs sentiment analysis on the review text, uses the high accuracy of the text analysis model, quantifies the sentiment intensity of each review text, and further analyzes the user disclosure information to compute user credibility weights. The contributions of this approach can be summarized as follows.
(1)
A user weight model based on user disclosure information is constructed, which includes authentication information, interaction information, and driving information. Then, the sentiment analysis techniques and expert knowledge are used to measure the degree of consistency between ratings and text comments, and a comprehensive user weight calculation model is developed.
(2)
The large-scale group ratings aggregation approach based on user region and comment time division is proposed, and a product ranking method is developed.

2. Problem Description and Data Description

Many consumers encounter difficulties in selecting the appropriate automobile for themselves because of their lack of professional experience and knowledge about automobiles. To address this issue, this research proposes a review credibility model based on user disclosure information and consistency to examine the aggregation of automobile reviews, and compute and rank the overall rating of each automobile brand. The notation defined below is employed to denote the aggregation and variables for this problem.
$X = \{x_1, x_2, \ldots, x_n\}$: $n$ represents the number of alternative target automobiles, and $x_i$ represents the $i$-th target automobile, $i = 1, 2, \ldots, n$.
$A = \{a_1, a_2, \ldots, a_8\}$: The data analysis demonstrates that there are eight automobile criteria, and $a_j$ is the $j$-th criterion of the automobile, $j \in [1, 8]$, corresponding to {space, power, control, electricity/fuel consumption, comfort, exterior, interior, value for money}.
$U = \{u_i^1, u_i^2, \ldots, u_i^{K_i}\}$: $u_i^k$ represents the $k$-th user who commented on the target automobile $x_i$, and $K_i$ represents the number of users who commented on the target automobile $x_i$.
$T = \{t_{ij}^1, t_{ij}^2, \ldots, t_{ij}^{K_i}\}$: $t_{ij}^k$ represents the text of the comment made by user $u_i^k$ on criterion $a_j$ of target automobile $x_i$. In this study, a user can make only one comment on an automobile in the dataset. Therefore, the number of comments is equal to the number of users making comments, $k = 1, 2, \ldots, K_i$.
$S = \{s_{ij}^1, s_{ij}^2, \ldots, s_{ij}^{K_i}\}$: $s_{ij}^k$ represents the star rating given by user $u_i^k$ to criterion $a_j$ of the target automobile $x_i$, $k = 1, 2, \ldots, K_i$.
Finally, the automobile’s overall rating is determined from the above dataset by computing the mapping function $G = F(X, A, T, S)$, where $G$ represents the composite rating and $F$ represents the mapping function.

3. A User Credibility Model Based on Consistency and User Disclosure Information

In this section, a user credibility model based on consistency and user disclosure information is developed to compute user credibility weights, as shown in Figure 1.
The target automobiles $X = \{x_1, x_2, \ldots, x_n\}$ are filtered according to constraints such as price, budget, and model, and the relevant comments, which comprise comment text, ratings, and user disclosure information, are crawled from the Autohome.com forum using Python crawlers. Python and SQL tools are employed to eliminate invalid comment data, and the data are normalized and stored in the format shown in Table 1.

3.1. User Weights Based on the Consistency of Ratings and Text Reviews

  • Step 1. Building automobile text review sentiment dataset
(1) Method construction
Step 1.1 Obtaining text review training set
The automobile reviews are crawled from the Autohome.com forum using Python crawlers, and preprocessing is performed to remove garbled characters, records with missing information, and other data that do not meet the specifications. The reviews are then divided, according to the types of automobiles currently on the market, into new energy automobiles and gasoline automobiles, and the review data are screened according to the score distribution so that the scores are evenly distributed. Finally, the data matrix $\bar{M} = [D_h]_{H_0 \times 1}$ is obtained.
Step 1.2 Obtaining text review training set with expert sentiment values
The obtained automobile review sample library is uploaded to a purpose-built automobile review labeling system to ensure that the data labeling conforms to the specification. To obtain accurate data, $L$ experts are hired to annotate the comment text, and each review text is annotated once by each of the $L$ experts. The experts formulate the rating rules through discussion and then mark the emotional strength of the comment text according to these rules, as illustrated in Figure 2. Finally, the data matrix $M = [D_{hl}]_{H_0 \times L}$ is obtained.
Step 1.3 Aggregating of expert sentiment values for text review training set
The variance of the annotated sentiment scores of each comment is calculated, and a threshold is set to filter the data so as to maintain the stability of the comment data. Then, the average of the expert sentiment values, $SV_h$, for text review $D_h$ is calculated as the emotional strength of the label.
$$SV_h = \frac{1}{L} \sum_{l=1}^{L} D_{hl}$$
(2) Method execution
Step 1.1 Obtaining text review training set
Using crawlers to obtain more than 4000 new energy automobile reviews and more than 4000 gasoline automobile reviews in the forum, a total of more than 8000 automobile review data were obtained. Review data including missing information and garbled characters were removed through preprocessing, and finally 7361 automobile review data were obtained.
$$\bar{M}_0 = [D_h]_{7361 \times 1}$$
Step 1.2 Obtaining text review training set with expert sentiment values
Eight experts were hired to discuss and formulate the sentiment rating rules for automobile review texts; they then logged in to the annotation system to annotate the review texts obtained above. Finally, the matrix $M_0$ of the text sentiment annotations of the eight experts was obtained.
$$M_0 = [D_{hl}]_{7361 \times 8}$$
Step 1.3 Aggregating of expert sentiment values for text review training set
First, the variance of the sentiment ratings given by the eight experts to each comment was calculated, and the labeled data with a variance greater than 2 were removed. Then, the sentiment annotation matrix of the remaining automobile review texts was aggregated. This yielded 6563 annotated automobile reviews, which form the training dataset for the sentiment analysis model. Each review covers eight automobile attributes with one comment text per attribute, giving 52,504 texts in total. The distribution of sentiment intensity is shown in Table 2.
$$SV_h = \frac{1}{8} \sum_{l=1}^{8} D_{hl}, \qquad SM = [SV_h]_{6563 \times 1}$$
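The filtering and averaging in Step 1.3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `aggregate_expert_labels` and the sample labels are hypothetical, and the use of the population variance is an assumption, since the text does not say whether the population or the sample variance was used.

```python
from statistics import mean, pvariance


def aggregate_expert_labels(labels, max_variance=2.0):
    """Average expert labels per review, dropping reviews with unstable labels.

    labels: one list of expert scores per review (row h of the matrix D_hl).
    Returns {review index h: SV_h} for reviews whose variance is <= max_variance.
    Assumption: population variance (pvariance) is used for the filter.
    """
    aggregated = {}
    for h, expert_scores in enumerate(labels):
        if pvariance(expert_scores) > max_variance:
            continue  # experts disagree too much; discard this review
        aggregated[h] = mean(expert_scores)
    return aggregated


# Hypothetical labels from eight experts for three reviews.
labels = [
    [4, 4, 5, 4, 4, 5, 4, 4],  # low variance, kept
    [1, 5, 1, 5, 1, 5, 1, 5],  # variance 4 > 2, removed
    [3, 3, 3, 3, 3, 3, 3, 3],  # zero variance, kept
]
sv = aggregate_expert_labels(labels)
```

Here the second review is dropped because its labels alternate between 1 and 5, while the other two keep their mean label as the training target.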
  • Step 2. Training text review sentiment values based on BERT sentiment analysis model
Step 2.1 Building the automobile review sentiment analysis model based on BERT
In this study, the deep learning framework PyTorch was used to build the automobile review sentiment analysis model. The main process is shown in Figure 3.
The process was as follows. First, the automobile review text was preprocessed, including stop-word removal and stemming. Then, the topics in the comment text were extracted, and an efficient deep learning model was selected according to its performance on short texts. This study used the BERT model to build the sentiment analysis model.
The overall framework of the model is a stack of multiple transformer encoder layers; the structure of a single layer is illustrated in Figure 4. $E_1, E_2, \ldots, E_N$ are the word embeddings produced by the embedding process, $Trm$ represents a transformer encoding layer, and $T_1, T_2, \ldots, T_N$ represent the word encodings after the multilayer process. The affective intensity prediction process includes embedding, multi-head self-attention, feedforward, and layer normalization.
At the same time, the model parameters are preliminarily set according to the length and data volume of the automobile review text data. The parameter settings of the model are shown in Table 3.
Step 2.2 Training text review sentiment values
In this study, the training dataset prepared above was employed to train the BERT model in the PyTorch framework, yielding a sentiment analysis model covering the eight automobile features. Figure 5 shows the process of sentiment analysis model training. Table 4 shows the accuracy of the model.
  • Step 3. Predicting text review sentiment values
The trained sentiment analysis model was employed to quantitatively predict the sentiment intensity of all comments. The model predicts an emotional intensity from the input text using the above module, as illustrated in Figure 6. The [CLS] token at the top marks the beginning of the text; it has no meaning of its own, but its output carries the information of the entire sentence. The outputs at all other positions are biased toward the tokens at those positions, so the output at the first position is used. This output is then processed by the linear classifier module to predict a label.
The sentiment intensity $sp_{ij}^k$ of the review text written by user $u_i^k$ on indicator $a_j$ ($j \in [1, 8]$) of automobile $x_i$ is obtained by examining the review text $t_{ij}^k$ with the sentiment analysis model. This yields the sentiment intensity prediction matrix $S_p = \{sp_{ij}^1, sp_{ij}^2, \ldots, sp_{ij}^{K_i}\}$ for all reviews of automobile $x_i$. The granularity of $sp_{ij}^k$ is the same as that of the user star rating $s_{ij}^k$, which takes values in $\{1, 2, 3, 4, 5\}$.
  • Step 4. User weights based on the consistency of ratings and text reviews
Generally, fake reviews are characterized by inconsistencies between the star rating $s_{ij}^k$ and the sentiment intensity $sp_{ij}^k$ of their review texts. Thus, this section proposes an approach to compute the consistency weight of the two factors. The higher the weight, the higher the degree of consistency and the more credible the review. The computation is as follows:
$$r_{ij}^k = \frac{1}{4}\left(4 - \left|s_{ij}^k - sp_{ij}^k\right|\right)$$
The consistency weight matrix is obtained for online reviews of automobiles, $R = [r_{ij}^k]$, $r_{ij}^k \geq 0$.
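As a small worked illustration of the consistency weight, the sketch below evaluates $r = (4 - |s - sp|)/4$ for a few rating pairs. The function name `consistency_weight` and the sample pairs are hypothetical; only the formula and the 1 to 5 rating range come from the text.

```python
def consistency_weight(star_rating, predicted_sentiment):
    """Consistency weight r = (4 - |s - sp|) / 4 for ratings on the 1..5 scale."""
    for value in (star_rating, predicted_sentiment):
        if not 1 <= value <= 5:
            raise ValueError("ratings must lie in 1..5")
    return (4 - abs(star_rating - predicted_sentiment)) / 4


# Hypothetical (star rating, predicted sentiment) pairs.
pairs = [(5, 5), (5, 3), (1, 5)]
weights = [consistency_weight(s, sp) for s, sp in pairs]
```

Because both values lie in 1 to 5, the gap is at most 4, so the weight ranges from 0 (a 5-star rating on a text predicted as 1, or the reverse) to 1 (full agreement).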

3.2. User Weights Based on Disclosure Information

User disclosures include whether users are authenticated, their interaction index (number of replies, number of likes, and number of views), as illustrated in Table 5, and their daily travel rate, expressed as $I = \{I_1, I_2, I_3\}$. User disclosures provide a side view of the credibility of reviews.
  • Step 1. Certification indicators
Users who post word-of-mouth entries on the platform can be categorized as certified and non-certified owners. Certified owners are users who have purchased the automobile. The platform’s automobile owner certification requires uploading personal information such as the certified automobile model and driving license, and this information is audited by the platform. In contrast, uncertified owners may be users who have not purchased the automobile in question. Reviews from certified owners are therefore more credible. The weights $w_{iI_1}^k$ are computed as follows:
$$w_{iI_1}^k = \begin{cases} 1, & \text{Certified} \\ 0.5, & \text{Non-certified} \end{cases}$$
  • Step 2. Travel rate indicator
The usage weight is determined by combining the daily driving rate and the mileage. Research suggests that many indicators of an automobile can be assessed only after sufficient mileage. Thus, the higher the usage of the automobile, the deeper the user’s experience of its performance and the more credible the reviews they publish. The usage rate of an automobile can be computed from its daily driving rate and the mileage driven. Based on statistics, the daily driving rate of an automobile is around 15–78 km/d, and this study divides this interval accordingly to compute the daily driving rate weights, as shown in Table 6. Furthermore, statistics on the mileage reported in word-of-mouth entries on automobile forums indicate that mileage is concentrated at 0–1000 km, and the higher the mileage, the fewer word-of-mouth entries are published. The mileage intervals are divided accordingly, as illustrated in Table 7.
The usage weights $w_{iI_2}^k$ are computed as a weighted sum of the daily driving rate weight $q_{1i}^k$ and the mileage weight $q_{2i}^k$, where $\mu$ represents the automobile usage weight computation parameter:
$$w_{iI_2}^k = \mu q_{1i}^k + (1 - \mu) q_{2i}^k$$
  • Step 3. Interactive indicators
In this study, word-of-mouth entries posted on automobile forums are viewed, liked, and replied to by other users, and the ratio of the sum of these three counts to the length of time since posting is called the interaction index. A higher interaction index indicates that the review is more widely recognized and is therefore considered more credible. The interaction index is mapped to a weight using the following intervals:
$$w_{iI_3}^k = \begin{cases} 1.0, & I_3 \in [15199, +\infty) \\ 0.8, & I_3 \in [5424, 15199) \\ 0.6, & I_3 \in [1769, 5424) \\ 0.4, & I_3 \in [674, 1769) \\ 0.2, & I_3 \in [0, 674) \end{cases}$$
  • Step 4. The weight of the $k$-th user $u_i^k$ for automobile $x_i$ is given as follows based on the above three indicators:
    $$f_i^k = \frac{1}{3}\left(w_{iI_1}^k + w_{iI_2}^k + w_{iI_3}^k\right)$$
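The four steps of this subsection can be combined into a short sketch. It is illustrative only: the function names and the example inputs are hypothetical, the driving-rate and mileage weights `q1` and `q2` are passed in directly because Tables 6 and 7 are not reproduced here, and treating each interaction band as closed at its lower bound is an assumption. The default `mu=0.3` is the value reported for the usage weight in the application section.

```python
# Lower bound of each interaction-index band and its weight, highest band first.
# Assumption: each band includes its lower bound.
INTERACTION_BANDS = [(15199, 1.0), (5424, 0.8), (1769, 0.6), (674, 0.4), (0, 0.2)]


def certification_weight(certified):
    """Certified owners get 1, non-certified owners get 0.5."""
    return 1.0 if certified else 0.5


def usage_weight(q1, q2, mu=0.3):
    """Weighted sum of the driving-rate weight q1 and the mileage weight q2."""
    return mu * q1 + (1 - mu) * q2


def interaction_weight(interaction_index):
    """Map the interaction index I_3 to its band weight."""
    for lower_bound, weight in INTERACTION_BANDS:
        if interaction_index >= lower_bound:
            return weight
    raise ValueError("interaction index must be non-negative")


def disclosure_weight(certified, q1, q2, interaction_index, mu=0.3):
    """Mean of the certification, usage, and interaction weights (f in the text)."""
    total = (
        certification_weight(certified)
        + usage_weight(q1, q2, mu)
        + interaction_weight(interaction_index)
    )
    return total / 3


# Hypothetical user: certified, q1 = 0.8, q2 = 0.6, interaction index 6000.
f = disclosure_weight(True, 0.8, 0.6, 6000)
```

For this hypothetical user the three components are 1.0, 0.66, and 0.8, so the disclosure weight is their mean, about 0.82.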

3.3. User Comprehensive Weight Calculation

Fusing the consistency weight of online reviews with the user disclosure weight yields a user credibility weight $c_{ij}^k$ for each review. The formula is as follows, where $\mu$ is the parameter for computing the credibility weight of online reviews:
$$c_{ij}^k = \mu r_{ij}^k + (1 - \mu) f_{ij}^k$$
where $f_{ij}^k = f_i^k$ for $j = 1, 2, \ldots, 8$.
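The fusion step can be illustrated as follows. The sketch shows how a single disclosure weight for a user is reused across all eight criteria while the consistency weight varies by criterion. The function name and input values are hypothetical; the default `mu=0.7` is the value reported in the application section.

```python
def credibility_weights(consistency, disclosure, mu=0.7):
    """Credibility weight per criterion: c_j = mu * r_j + (1 - mu) * f.

    consistency: the consistency weights r_j of one review, one per criterion.
    disclosure: the single disclosure weight f of the reviewing user.
    """
    if not 0 <= mu <= 1:
        raise ValueError("mu must lie in [0, 1]")
    return [mu * r + (1 - mu) * disclosure for r in consistency]


# Hypothetical review: eight consistency weights and a disclosure weight of 0.8.
r = [1.0, 0.75, 0.5, 1.0, 0.25, 0.75, 1.0, 0.0]
c = credibility_weights(r, 0.8)
```

With these inputs, a fully consistent criterion receives a weight of about 0.94, while a fully inconsistent one still receives about 0.24 from the disclosure term alone.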

4. Large-Scale Ratings Aggregation Based on Group User Division

This section proposes a multi-criteria ratings aggregation method for group users. It addresses the problem that user reputation weights are diluted in large groups of users, and thereby weakens the impact of false reviews on the overall rating. The approach first divides users into multiple sets based on their purchase location. Then, each of these sets is further divided based on purchase time. User ratings are then computed using user reputation weights. Finally, the ratings of all user sets are aggregated to compute the overall rating.
  • Step 1. Group division method
Step 1.1 Group division method based on user geography
As the users of the platform are automobile owners from different regions of the country, their experience of and needs for the automobile may vary. Therefore, the users $u_i^k$ who purchased each automobile $x_i$ are divided into eight collections based on geography. The geographical divisions of China are set up as
$$D = \{D_d \mid d = 0, 1, \ldots, 7\} = \{\text{No region}, \text{Northeast}, \text{North}, \text{Central}, \text{East}, \text{South}, \text{Southwest}, \text{Northwest}\},$$
The eight sets of users by geography are represented as
$$\bigcup_{d=0}^{7} u_i^{D_d}, \quad u_i^{D_d} = \{u_{id}^k \mid k = 1, 2, \ldots, |D_d|\}, \quad d = 0, 1, \ldots, 7.$$
Step 1.2 Group division method based on the time of user comments
Time is a crucial factor that should not be overlooked: reviews are time sensitive, and their reference value differs from period to period. Additionally, the automobiles themselves are being updated, and services and prices are constantly changing. Thus, reviews must also be studied by time period. This study further divides each geographical collection by time, using one year as the step size and $n$, the difference in years between the earliest and latest reviews, as the number of periods. The time collection is $T = \{T_t \mid t = 1, 2, \ldots, n\}$. The set of users divided by geography and time can be expressed as $\bigcup_{d=0}^{7} \bigcup_{t=1}^{n} u_i^{D_d T_t}$, $u_i^{D_d T_t} = \{u_{idt}^k \mid k = 1, 2, \ldots, |D_d T_t|\}$, $d = 0, 1, \ldots, 7$, $t = 1, 2, \ldots, n$, where $|D_d T_t|$ represents the number of users who commented during time period $t$ in region $d$. The group division structure is shown in Figure 7.
Each user is assigned a weight based on the set division of users, and the final credibility weight for each comment is $u_{ijdt}^k$, $k = 1, 2, \ldots, |D_d T_t|$, which denotes the credibility weight of the $k$-th user in year $T_t$ and geographical region $D_d$ for indicator $a_j$ ($j \in [1, 8]$) of automobile $x_i$. This is then normalized to give $c_{ijdt}^k$.
$$c_{ijdt}^k = \frac{u_{ijdt}^k}{\sum_{k=1}^{|D_d T_t|} u_{ijdt}^k}$$
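The double-layer division and the normalization can be sketched as a grouping by (region, year) followed by a per-group rescaling. The record field names `region`, `year`, and `weight` and the sample values are hypothetical, and the sketch assumes every group has a positive total weight.

```python
from collections import defaultdict


def normalize_within_groups(reviews):
    """Normalize credibility weights so they sum to 1 inside each (region, year) group.

    reviews: list of dicts with the hypothetical keys "region", "year", "weight".
    Returns the normalized weights in the same order as the input.
    Assumption: each group's total weight is positive.
    """
    totals = defaultdict(float)
    for review in reviews:
        totals[(review["region"], review["year"])] += review["weight"]
    return [
        review["weight"] / totals[(review["region"], review["year"])]
        for review in reviews
    ]


# Hypothetical reviews of one criterion of one automobile.
reviews = [
    {"region": "East", "year": 2021, "weight": 0.9},
    {"region": "East", "year": 2021, "weight": 0.3},
    {"region": "North", "year": 2022, "weight": 0.5},
]
normalized = normalize_within_groups(reviews)
```

The two East/2021 reviews share one group and receive weights of about 0.75 and 0.25, while the single North/2022 review receives the full weight of its own group.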
  • Step 2. Calculating the overall user rating
To further mitigate the impact of false reviews, the original star rating $s_{ij}^k$ of each online review is combined with the predicted sentiment intensity $sp_{ij}^k$ of its text $t_{ij}^k$ as a weighted average with parameter $\lambda$, giving a new rating $\bar{S}_{ij}^k$ for each review on each criterion of the automobile.
$$\bar{S}_{ij}^k = \lambda s_{ij}^k + (1 - \lambda) sp_{ij}^k$$
where $\bar{S}_{ij}^k$ denotes the rating given by user $u_i^k$ to indicator $a_j$ of automobile $x_i$.
  • Step 3. Aggregating group user ratings
After the aforementioned group segmentation, the rating corresponding to user $u_{idt}^k$ is $\bar{S}_{ijdt}^k$, $k = 1, 2, \ldots, |D_d T_t|$. The explanation of the related parameters is shown in Table 8.
Finally, the rating is multiplied by the credibility weight and summed to obtain the final rating $\bar{S}_{ij}$.
$$\bar{S}_{ij} = \frac{1}{8} \sum_{d=0}^{7} \frac{1}{n} \sum_{t=1}^{n} \sum_{k=1}^{|D_d T_t|} c_{ijdt}^k \times \bar{S}_{ijdt}^k$$
where $k = 1, 2, \ldots, |D_d T_t|$, and $\bar{S}_{ij}$ represents the combined rating given by all users to indicator $a_j$ of automobile $x_i$.
  • Step 4. Aggregating multi-criteria ratings
Finally, the eight indicators are aggregated to obtain the composite rating $\bar{S}_i$ for automobile $x_i$, where $w_j$ is the weight of criterion $a_j$.
$$\bar{S}_i = \sum_{j=1}^{8} w_j \bar{S}_{ij}$$
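Steps 2 to 4 can be sketched end to end for a single automobile. The code follows the formulas literally, so a (region, period) group with no reviews contributes zero to the average. All function names, the tuple layout, and the sample data are hypothetical; the default `lam=0.5` is an assumption corresponding to a plain average of star rating and predicted sentiment, and the example uses two regions and one period only to keep the arithmetic short.

```python
from collections import defaultdict


def criterion_rating(reviews, n_regions=8, n_periods=1, lam=0.5):
    """Aggregate one criterion over (region, period) groups.

    reviews: tuples (region, period, credibility_weight, star, predicted).
    Weights are normalized inside each group, the weighted group ratings are
    summed, and the sum is divided by n_regions * n_periods as in the formula.
    """
    group_weight = defaultdict(float)
    group_score = defaultdict(float)
    for region, period, weight, star, predicted in reviews:
        combined = lam * star + (1 - lam) * predicted  # rating from Step 2
        group_weight[(region, period)] += weight
        group_score[(region, period)] += weight * combined
    total = sum(group_score[g] / group_weight[g] for g in group_score)
    return total / (n_regions * n_periods)


def composite_rating(criterion_ratings, criterion_weights):
    """Weighted sum of the per-criterion ratings (Step 4)."""
    if len(criterion_ratings) != len(criterion_weights):
        raise ValueError("one weight per criterion is required")
    return sum(w * s for w, s in zip(criterion_weights, criterion_ratings))


# Hypothetical data: two regions, one period, three reviews of one criterion.
reviews = [(0, 1, 0.9, 5, 4), (0, 1, 0.3, 3, 3), (1, 1, 0.5, 4, 4)]
rating = criterion_rating(reviews, n_regions=2, n_periods=1)
overall = composite_rating([rating, 3.9375], [0.5, 0.5])
```

In this example the first group's weighted rating is 4.125 and the second group's is 4.0, so the criterion rating is their mean, 4.0625.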

5. Product Ranking Methods

  • Step 1. Collect the data and structure it to obtain the comment dataset of the automobile.
  • Step 2. Obtain the user weights.
    • Step 2.1 Calculate the user weights based on information disclosure.
    • Step 2.2 Calculate the user weights based on the consistency of ratings and text reviews.
    • Step 2.3 Obtain the user comprehensive weight.
  • Step 3. Aggregate large-scale ratings.
    • Step 3.1 Group division based on user geography and comment time.
    • Step 3.2 Calculate the overall user rating based on ratings and emotional analysis value of text comments.
    • Step 3.3 Aggregate group user ratings.
    • Step 3.4 Aggregate multi-criteria ratings.
  • Step 4. The ranking results for the alternative target automobiles $x_i$ ($i = 1, 2, \ldots, n$) are obtained based on the final overall ratings $\bar{S}_i$ ($i = 1, 2, \ldots, n$):
    $$x_{\sigma(i)} \succeq x_{\sigma(i+1)}, \quad i = 1, 2, \ldots, n - 1$$
    where $\bar{S}_{\sigma(i)} \geq \bar{S}_{\sigma(i+1)}$, $i = 1, 2, \ldots, n - 1$.
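The final ranking step amounts to sorting the alternatives by their overall ratings in descending order. A minimal sketch, with hypothetical identifiers and ratings that are not taken from the case study:

```python
def rank_products(overall_ratings):
    """Return product identifiers ordered from highest to lowest overall rating."""
    return sorted(overall_ratings, key=lambda name: overall_ratings[name], reverse=True)


# Hypothetical overall ratings for three alternatives.
ratings = {"x1": 4.12, "x2": 4.35, "x3": 3.98}
ranking = rank_products(ratings)
```

Python's `sorted` is stable, so alternatives with equal ratings keep their input order.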

6. Application of the Method

The popularity of e-commerce has resulted in the development of numerous review platforms. Because automobiles are high-value commodities with a huge market, specialized automobile review platforms, such as Autohome, have emerged. These platforms offer reviews of almost all automobiles and provide comprehensive information, making them one of the most crucial sources of information for consumers. However, many consumers are plagued by false reviews because of their lack of automobile-related knowledge, making it challenging for them to make an informed choice. Thus, this section applies the user credibility model based on consistency and user disclosure information to assist consumers in making informed purchasing decisions.
  • Step 1. Determining product sets, criteria sets, and data acquisition
As shown in Table 9, six alternative target automobiles were selected based on consumers’ budgets and models. All criteria of each automobile brand were analyzed, with the computation process detailed below.
A Python crawler was written to crawl the review data of the corresponding target automobiles on Autohome as of 30 December 2022. Because each automobile brand was released at a different time, the number of reviews differs between brands. The crawled data were structured using a Python program. Table 9 shows the review data obtained after removing invalid review data, such as garbled characters and null values.
  • Step 2. Use a user credibility model based on consistency and user disclosures to obtain user credibility weights.
    • Step 2.1 User weights based on the consistency of ratings and text reviews
      • Step 2.1.1 Sentiment analysis based on text comments
The trained sentiment analysis model was employed to predict the sentiment intensity of the preprocessed online review text for each automobile. A new rating, denoted $sp_{ij}^k$, was obtained for each of the eight automobile features of each online review, as illustrated in Table 10.
    • Step 2.1.2 Consistency weights based on ratings and text
After obtaining the predicted sentiment intensity $sp_{ij}^k$ of all the review texts, a consistency analysis was conducted with their corresponding original star ratings to determine the consistency weight $r_{ij}^k$. Table 11 shows part of the consistency weight data $r_{ij}^k$ for the Dongfeng Nissan-Xuan Yi ($x_2$).
Step 2.2 Calculation of weights based on user information disclosure
The characteristic information weights of all review publishers for each automobile were computed. The authentication weight ($I_1$), usage rate weight ($I_2$), and interaction index weight ($I_3$) were computed using the approach proposed in the previous section. The parameter for computing the automobile usage rate weight was set to $\mu = 0.3$, and aggregation was used to obtain the characteristic information weight $f_i^k$ for each review publisher, as illustrated in Table 12 for part of the data of the Dongfeng Nissan-Xuan Yi.
    • Step 2.3 Combined user weighting calculation
Combining the user feature information weights and the review consistency weights, with $\mu = 0.7$, yields a credibility weight $c_{ijk}$ for each review on every criterion of each automobile brand, as illustrated in Table 13.
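A minimal sketch of this combination is given below, assuming a convex combination in which $\mu$ multiplies the consistency weight. That form is an assumption: the combining formula is defined in an earlier section. It is at least compatible with Table 13, where the tabulated weights differ by 0.7 whenever the consistency weight in Table 11 differs by one, but the sketch is not claimed to reproduce Table 13 exactly.

```python
def combined_weight(consistency: float, feature: float, mu: float = 0.7) -> float:
    """Assumed form: mu * consistency weight + (1 - mu) * feature weight."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return mu * consistency + (1.0 - mu) * feature


# Illustrative inputs: consistency weights 4 and 3 with a feature weight of 1.4.
high = combined_weight(4, 1.4)
low = combined_weight(3, 1.4)
step = high - low  # effect of one unit of consistency weight
```

With $\mu = 0.7$ most of the credibility weight comes from the agreement between rating and text, and the disclosed user information acts as a smaller correction.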
  • Step 3. Aggregation of group user ratings
    • Step 3.1 Group division
      • Step 3.1.1 Grouping based on user geography
The reviews were divided into eight collections according to their purchase area. A Python location function was used to perform the regional division, and the number of reviews in each collection was determined, as illustrated in Table 14.
      • Step 3.1.2 Group segmentation based on user purchase time periods
In addition to the regional division, each regional set was further divided by purchase time, based on the observed distribution of users' purchase times, as illustrated in Table 15.
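The double-layer division can be sketched as a grouping keyed by (region, time period). The three periods follow Table 15. The province-to-region lookup and the record fields `province` and `purchase_year` are hypothetical, since the location function used by the authors is not shown.

```python
from collections import defaultdict

# Hypothetical province-to-region lookup covering one province per region.
REGION_OF = {
    "Heilongjiang": "Northeast", "Beijing": "North", "Hunan": "Central",
    "Shanghai": "East", "Guangdong": "South", "Sichuan": "Southwest",
    "Shaanxi": "Northwest",
}

# Purchase-time periods taken from the column headers of Table 15.
PERIODS = [(2016, 2018), (2019, 2019), (2020, 2022)]


def region_of(province):
    """Map a province to its region; unknown or missing values go to 'No region'."""
    return REGION_OF.get(province, "No region")


def period_of(year):
    """Return the label of the period that contains the purchase year."""
    for start, end in PERIODS:
        if start <= year <= end:
            return str(start) if start == end else f"{start}-{end}"
    raise ValueError(f"year {year} is outside the configured periods")


def divide_reviews(reviews):
    """Group reviews first by region, then by purchase-time period."""
    groups = defaultdict(list)
    for review in reviews:
        key = (region_of(review.get("province")), period_of(review["purchase_year"]))
        groups[key].append(review)
    return dict(groups)


reviews = [
    {"province": "Heilongjiang", "purchase_year": 2017},
    {"province": "Heilongjiang", "purchase_year": 2021},
    {"province": "Hunan", "purchase_year": 2019},
    {"province": None, "purchase_year": 2020},
]
groups = divide_reviews(reviews)
sizes = {key: len(members) for key, members in groups.items()}
```

Counting the members of each key, as in `sizes`, gives the kind of region-by-period breakdown reported in Table 15.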
      • Step 3.1.3 Combined user weighting normalization
Table 16 shows the results of normalizing the combined weights of users after dividing each set.
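Normalization within a set can be sketched as dividing each combined weight by the sum of the weights in that set, so that the weights of one set add up to one. This sum-to-one form is an assumption, although it is consistent with Table 16, whose $a_1$ column adds up to roughly one. The raw weights in the example are the $a_1$ values of Table 13, treated as a single set purely for illustration.

```python
def normalize(weights):
    """Scale the weights of one set so that they sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("the weights of a set must have a positive sum")
    return [w / total for w in weights]


# a_1 column of Table 13, used here as if the five users formed one set.
raw_weights = [3.19, 2.79, 2.34, 3.07, 2.46]
normalized = normalize(raw_weights)

# a_1 column of Table 16 (eleven users of one set), as published.
TABLE_16_A1 = [0.082, 0.080, 0.086, 0.094, 0.062, 0.112,
               0.093, 0.112, 0.097, 0.089, 0.091]
table_16_total = sum(TABLE_16_A1)
```

The published column totals about 0.998, the small gap from one being explained by rounding to three decimals.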
      • Step 3.1.4 Calculation of the overall user rating
(1) Rating and text emotional intensity combined
The average of the raw rating of each comment and the sentiment intensity obtained from the text sentiment quantification was computed to obtain the rating $\bar{S}p_{ijk}$ for each comment. Table 17 presents the average ratings for some of the comments.
(2) Overall user rating calculation
The composite rating of each review was computed from its rating and its user credibility weight. Table 18 illustrates the composite ratings computed for one set of group users.
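The two sub-steps can be sketched as a blend followed by a weighted sum. The blend uses a ratio `lam` between the raw rating and the sentiment score, with `lam = 0.5` giving the plain average described above. The weighted sum over a group is an assumed form of the composite rating, and all numbers in the example are made up for illustration.

```python
def blended_rating(star: float, sentiment: float, lam: float = 0.5) -> float:
    """Blend the raw star rating and the predicted sentiment intensity."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * star + (1.0 - lam) * sentiment


def group_rating(blended, weights) -> float:
    """Credibility-weighted rating of one group (assumed form).

    The weights are expected to be normalized within the group.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one within the group")
    return sum(w * s for w, s in zip(weights, blended))


# Hypothetical group of three reviews on one criterion.
stars = [5, 4, 3]
sentiments = [4, 4, 5]
weights = [0.5, 0.3, 0.2]

blended = [blended_rating(s, p) for s, p in zip(stars, sentiments)]
rating = group_rating(blended, weights)
```

Here the first review dominates the group rating because it carries half of the normalized credibility weight.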
      • Step 3.1.5 Aggregation of group user ratings
(1) Group aggregation by user purchase time
The ratings of the users in each purchase-time set were aggregated to compute an overall rating, as shown in Table 19.
(2) Group aggregation by user geography
A composite rating was computed by aggregating the sets of users within each purchase region, as illustrated in Table 20. Each serial number represents a geographical group.
(3) Aggregation of multi-geographical ratings
The multi-criteria composite ratings were obtained by aggregating the ratings of all regional user groups. Table 21 illustrates the multi-criteria composite ratings for the six automobile brands.
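The published tables allow a small check of the two aggregation layers. The sketch below assumes that each layer is an unweighted mean. That is an observation drawn from the numbers, not a statement of the aggregation formulas from the earlier sections: averaging the three Northeast rows of Table 19 gives the row with serial number 7 in Table 20 to within rounding, and averaging the eight rows of Table 20 gives the $x_2$ column of Table 21.

```python
from statistics import fmean

# Table 19: Northeast ratings of x_2 for 2016-2018, 2019 and 2020-2022 (a_1..a_8).
TABLE_19_NORTHEAST = [
    [4.49, 3.46, 3.65, 3.91, 4.36, 4.06, 3.93, 4.46],
    [4.45, 3.89, 3.96, 4.16, 4.50, 4.62, 4.26, 4.34],
    [4.68, 4.09, 4.49, 4.47, 4.66, 4.74, 4.57, 4.65],
]

# Table 20: regional ratings of x_2, serial numbers 0..7 (a_1..a_8).
TABLE_20 = [
    [4.38, 4.09, 4.14, 4.38, 4.68, 4.63, 4.03, 4.49],
    [4.45, 3.99, 4.12, 4.22, 4.53, 4.57, 4.22, 4.49],
    [4.54, 4.02, 4.12, 4.26, 4.51, 4.61, 4.34, 4.51],
    [4.50, 3.98, 4.15, 4.27, 4.52, 4.54, 4.30, 4.48],
    [4.47, 3.92, 4.16, 4.24, 4.52, 4.53, 4.25, 4.40],
    [4.49, 3.76, 4.07, 4.18, 4.48, 4.48, 4.22, 4.40],
    [4.45, 3.78, 4.17, 4.28, 4.44, 4.51, 4.16, 4.40],
    [4.54, 3.81, 4.03, 4.18, 4.50, 4.47, 4.25, 4.48],
]

# Table 21: x_2 column (a_1..a_8).
TABLE_21_X2 = [4.48, 3.92, 4.12, 4.25, 4.52, 4.54, 4.22, 4.46]


def column_means(rows):
    """Unweighted mean of each criterion across the given groups."""
    return [fmean(column) for column in zip(*rows)]


northeast = column_means(TABLE_19_NORTHEAST)  # first layer: over time periods
overall_x2 = column_means(TABLE_20)           # second layer: over regions

# Largest deviation from the published values in each layer.
gap_layer_1 = max(abs(a - b) for a, b in zip(northeast, TABLE_20[7]))
gap_layer_2 = max(abs(a - b) for a, b in zip(overall_x2, TABLE_21_X2))
```

Both gaps stay below 0.01, which is what two-decimal rounding of the published values would produce.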
    • Step 3.2 Aggregation of multi-criteria ratings
A multi-criteria rating aggregation approach was implemented, with consumers setting all weights to $w_j = 0.125$, $j = 1, 2, \ldots, 8$, to obtain the overall ratings of the automobile brands. Table 22 presents the comprehensive scores obtained when $\lambda$ in Formula (12) takes different values, together with the corresponding rankings of the six automobile brands. Figure 8 shows the changing trend. Figure 6 demonstrates the pattern of the composite ratings obtained from the three different methods.
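With equal weights, the multi-criteria aggregation reduces to a weighted sum over the eight criteria. The sketch below applies $w_j = 0.125$ to the values of Table 21; the resulting scores agree with the $\lambda = 0.5$ column of Table 22 to within rounding and give the same ranking. The function names are illustrative.

```python
BRANDS = ["x1", "x2", "x3", "x4", "x5", "x6"]

# Table 21: composite rating of each brand (x1..x6) on each criterion.
TABLE_21 = {
    "a1": [4.68, 4.48, 4.65, 4.43, 3.86, 3.65],
    "a2": [4.26, 3.92, 4.42, 4.22, 4.08, 4.30],
    "a3": [4.25, 4.12, 4.42, 4.32, 4.54, 4.40],
    "a4": [4.00, 4.25, 3.89, 4.14, 4.16, 3.92],
    "a5": [4.54, 4.52, 4.09, 3.51, 4.08, 3.87],
    "a6": [4.68, 4.54, 4.66, 4.67, 4.72, 4.61],
    "a7": [4.03, 4.22, 3.99, 3.73, 3.92, 4.21],
    "a8": [4.37, 4.46, 4.28, 4.12, 4.42, 3.96],
}

# Equal criterion weights, as set in the case study.
WEIGHTS = {criterion: 0.125 for criterion in TABLE_21}


def overall_scores(table, weights):
    """Weighted sum of the criterion ratings for every brand."""
    return {
        brand: sum(weights[c] * table[c][i] for c in table)
        for i, brand in enumerate(BRANDS)
    }


def rank(scores):
    """Brands ordered from the highest to the lowest overall score."""
    return sorted(scores, key=scores.get, reverse=True)


scores = overall_scores(TABLE_21, WEIGHTS)
ranking = rank(scores)
```

The ranking obtained this way is $x_1 > x_2 > x_3 > x_5 > x_4 > x_6$.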
  • Step 4. Obtain the ranking results
The composite ratings of each automobile brand were obtained and ranked based on the aforementioned composite rating computation, and the findings are presented in Table 23.
The ranking order of the automobiles changed once the overall rating was computed from a consistency analysis of the ratings against the review text, fused with the text feature information, and it differed from the ranking given by the original ratings. Under the original overall ratings, the six automobiles were ranked $x_4 = x_6 > x_5 > x_2 > x_3 > x_1$. Under the overall ratings computed with the above approach, which combines the ratings with the sentiment analysis results, the ranking was $x_1 > x_2 > x_3 > x_5 > x_4 > x_6$. The change in rating and brand ranking demonstrates that fake reviews affect the overall rating and ranking of an automobile brand. Users can employ this analysis to choose an automobile brand that suits their needs simply by adjusting the weight $W$ of each automobile criterion. For instance, if they prefer an automobile with comfortable space, ToucanL is the recommended choice; if they prioritize low fuel consumption, Dongfeng Nissan-Xuan Yi is a suitable choice.
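Adjusting the criterion weights can be sketched as follows. The ratings are those of Table 21; the weighting scheme, which gives one favoured criterion a share `emphasis` and splits the rest equally, is a hypothetical way for a user to express a preference and is not taken from the paper.

```python
CRITERIA = ["space", "power", "control", "consumption",
            "comfort", "exterior", "interior", "value"]

# Table 21 transposed: ratings of each brand on a_1..a_8.
RATINGS = {
    "x1": [4.68, 4.26, 4.25, 4.00, 4.54, 4.68, 4.03, 4.37],
    "x2": [4.48, 3.92, 4.12, 4.25, 4.52, 4.54, 4.22, 4.46],
    "x3": [4.65, 4.42, 4.42, 3.89, 4.09, 4.66, 3.99, 4.28],
    "x4": [4.43, 4.22, 4.32, 4.14, 3.51, 4.67, 3.73, 4.12],
    "x5": [3.86, 4.08, 4.54, 4.16, 4.08, 4.72, 3.92, 4.42],
    "x6": [3.65, 4.30, 4.40, 3.92, 3.87, 4.61, 4.21, 3.96],
}


def preference_weights(favoured, emphasis=0.5):
    """Give one criterion the share `emphasis`; split the rest equally."""
    rest = (1.0 - emphasis) / (len(CRITERIA) - 1)
    return [emphasis if name == favoured else rest for name in CRITERIA]


def best_brand(weights):
    """Brand with the highest weighted sum of criterion ratings."""
    return max(RATINGS, key=lambda b: sum(w * v for w, v in zip(weights, RATINGS[b])))


fuel_first = best_brand(preference_weights("consumption"))
```

With half of the weight placed on fuel consumption, the top brand is $x_2$ (Dongfeng Nissan-Xuan Yi), in line with the recommendation for users who prioritize low fuel consumption.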
This study employs a text sentiment analysis approach as the foundation for examining the consistency between ratings and review text while fusing review text features, which helps verify the authenticity of each review. The experimental findings demonstrate that the proposed approach effectively mitigates the impact of false reviews, allowing consumers to obtain comprehensive ratings of automobile brands that are free of the influence of false reviews, as well as criteria-specific ratings for each brand. Consumers can therefore personalize the selection of automobile brands based on their needs.

7. Conclusions

Because user feature information and content feature information also reflect the credibility of reviews, this study calculates the weight of the user feature information and content feature information of each review. To address the inconsistency between online review texts and the corresponding star ratings, this study uses a deep learning model to analyze the sentiment of online review texts, predicts the sentiment intensity rating of each text, and compares it with the star rating given by the user to obtain the consistency weight. By combining objective and subjective factors, the credibility weight of each review can be calculated more accurately. In recalculating the composite ratings, this study split the reviews of each automobile into multiple sets by the location and time of purchase and calculated the composite rating within each set, taking full account of the impact of the location and time of purchase on the credibility of the reviews. Compared with similar existing studies, the research process, methods, and results of this paper are more interpretable and enlightening. In particular, in terms of user credibility, the proposed approach closely explores the personal characteristics disclosed by users. However, this study still has limitations: product quality complaint data are not considered, so the information is not comprehensive enough, and missing values in user reviews are not handled. This research focuses on automobiles, and further work is needed to extend the approach to other fields. Future research will consider users' personalized preferences in the ratings aggregation process and integrate product quality complaint data provided by users into product rankings.

Author Contributions

Methodology, X.Y. and Y.Y.; Software, X.Y.; Validation, W.C.; Investigation, Y.Y.; Writing—original draft, W.C.; Visualization, Y.Y.; Supervision, Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used to support the findings of the study are available from the corresponding author upon request. The author’s email address is yangshijiazu@my.swjtu.edu.cn.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Esposito, C.; Galli, A.; Moscato, V.; Sperlí, G. Multi-criteria assessment of user trust in Social Reviewing Systems with subjective logic fusion. Inf. Fusion 2022, 77, 1–18. [Google Scholar] [CrossRef]
  2. Moran, G.; Muzellec, L. eWOM credibility on social networking sites: A framework. J. Mark. Commun. 2017, 23, 149–161. [Google Scholar] [CrossRef] [Green Version]
  3. Verma, D.; Dewani, P.P. eWOM credibility: A comprehensive framework and literature review. Online Inf. Rev. 2021, 45, 481–500. [Google Scholar] [CrossRef]
  4. Banerjee, S.; Bhattacharyya, S.; Bose, I. Whose online reviews to trust? Understanding reviewer trustworthiness and its impact on business. Decis. Support Syst. 2017, 96, 17–26. [Google Scholar] [CrossRef]
  5. Sun, H.-L.; Liang, K.-P.; Liao, H.; Chen, D.B. Evaluating user reputation of online rating systems by rating statistical patterns. Knowl.-Based Syst. 2021, 219, 106895. [Google Scholar] [CrossRef]
  6. Xiang, Z.; Du, Q.; Ma, Y.; Fan, W. A comparative analysis of major online review platforms: Implications for social media analytics in hospitality and tourism. Tour. Manag. 2017, 58, 51–65. [Google Scholar] [CrossRef]
  7. Meel, P.; Vishwakarma, D.K. Fake news, rumor, information pollution in social media and web: A contemporary survey of state-of-the-arts, challenges and opportunities. Expert Syst. Appl. 2020, 153, 112986. [Google Scholar] [CrossRef]
  8. Hazarika, B.; Chen, K.; Razi, M. Are numeric ratings true representations of reviews? A study of inconsistency between reviews and ratings. Int. J. Bus. Inf. Syst. 2021, 38, 85–106. [Google Scholar] [CrossRef]
  9. Almansour, A.; Alotaibi, R.; Alharbi, H. Text-rating review discrepancy (TRRD): An integrative review and implications for research. Future Bus. J. 2022, 8, 3. [Google Scholar] [CrossRef]
  10. Lo, A.S.; Yao, S.S. What makes hotel online reviews credible? An investigation of the roles of reviewer expertise, review rating consistency and review valence. Int. J. Contemp. Hosp. Manag. 2019, 31, 41–60. [Google Scholar] [CrossRef]
  11. Wankhade, M.; Rao, A.C.S.; Kulkarni, C. A survey on sentiment analysis methods, applications, and challenges. Artif. Intell. Rev. 2022, 55, 5731–5780. [Google Scholar] [CrossRef]
  12. Yadav, A.; Vishwakarma, D.K. Sentiment analysis using deep learning architectures: A review. Artif. Intell. Rev. 2020, 53, 4335–4385. [Google Scholar] [CrossRef]
  13. Zhang, L.; Wang, S.; Liu, B. Deep learning for sentiment analysis: A survey. WIREs Data Min. Knowl. Discov. 2018, 8, e1253. [Google Scholar] [CrossRef] [Green Version]
  14. Xu, H.; Liu, B.; Shu, L.; Yu, P.S. BERT post-training for review reading comprehension and aspect-based sentiment analysis. arXiv 2019, arXiv:1904.02232. [Google Scholar]
  15. Yang, L.; Li, Y.; Wang, J.; Sherratt, R.S. Sentiment analysis for E-commerce product reviews in Chinese based on sentiment lexicon and deep learning. IEEE Access 2020, 8, 23522–23530. [Google Scholar] [CrossRef]
  16. Haque, T.U.; Saber, N.N.; Shah, F.M. Sentiment analysis on large scale Amazon product reviews. In Proceedings of the 2018 IEEE International Conference on Innovative Research and Development (ICIRD), Bangkok, Thailand, 11–12 May 2018; pp. 1–6. [Google Scholar]
  17. Guo, Y.; Barnes, S.J.; Jia, Q. Mining meaning from online ratings and reviews: Tourist satisfaction analysis using latent dirichlet allocation. Tour. Manag. 2017, 59, 467–483. [Google Scholar] [CrossRef] [Green Version]
  18. Fan, Z.P.; Xi, Y.; Liu, Y. Supporting consumer’s purchase decision: A method for ranking products based on online multi-criteria product ratings. Soft Comput. 2018, 22, 5247–5261. [Google Scholar] [CrossRef]
  19. Wu, Q.; Liu, X.; Qin, J.; Wang, W.; Zhou, L. A linguistic distribution behavioral multi-criteria group decision making model integrating extended generalized TODIM and quantum decision theory. Appl. Soft Comput. 2021, 98, 106757. [Google Scholar] [CrossRef]
  20. Guo, C.; Du, Z.; Kou, X. Products Ranking Through Aspect-Based Sentiment Analysis of Online Heterogeneous Reviews. J. Syst. Sci. Syst. Eng. 2018, 27, 542–558. [Google Scholar] [CrossRef]
  21. Bi, J.W.; Liu, Y.; Fan, Z.P. Representing sentiment analysis results of online reviews using interval type-2 fuzzy numbers and its application to product ranking. Inf. Sci. 2019, 504, 293–307. [Google Scholar] [CrossRef]
  22. Liu, P.; Teng, F. Probabilistic linguistic TODIM method for selecting products through online product reviews. Inf. Sci. 2019, 485, 441–455. [Google Scholar] [CrossRef]
Figure 1. User credibility weights calculation flow chart.
Figure 2. Automobile review data expert labeling flow diagram.
Figure 3. Sentiment analysis model construction flow diagram.
Figure 4. BERT sentiment analysis model structure diagram.
Figure 5. Flow chart of automobile review sentiment analysis model training.
Figure 6. Sentiment intensity prediction.
Figure 7. Group user division flowchart.
Figure 8. Ranking results for all alternatives.
Table 1. Comment text and rating scale.
| Automotive Indicators | Space ($a_1$) | Power ($a_2$) | Control ($a_3$) | Consumption ($a_4$) | Comfort ($a_5$) | Exterior ($a_6$) | Interior ($a_7$) | Vfm ($a_8$) |
|---|---|---|---|---|---|---|---|---|
| Comment text | $t_{i1k}$ | $t_{i2k}$ | $t_{i3k}$ | $t_{i4k}$ | $t_{i5k}$ | $t_{i6k}$ | $t_{i7k}$ | $t_{i8k}$ |
Table 2. Distribution of sentiment analysis model training dataset.
| Sentiment Rating | Dataset |
|---|---|
| 1 | 9888 |
| 2 | 9200 |
| 3 | 10,952 |
| 4 | 12,216 |
| 5 | 10,248 |
Table 3. BERT model parameters.
| Parameters | Name | Value |
|---|---|---|
| Max_length | Maximum text length | 512 |
| Epoch | Number of training epochs | 5 |
| Batch_size | Number of samples per gradient-descent batch | 16 |
Table 4. Sentiment analysis model accuracy.
| Automobile Properties | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ |
|---|---|---|---|---|---|---|---|---|
| Model accuracy | 89% | 81% | 80% | 76% | 76% | 87% | 77% | 85% |
Table 5. User disclosure information form.
| Automotive Series | User Certified ($I_1$) | Interaction Index ($I_2$): Views | Interaction Index ($I_2$): Points | Interaction Index ($I_2$): Replies | Usage Rate ($I_3$): Daily Travel Rate | Usage Rate ($I_3$): Mileage |
|---|---|---|---|---|---|---|
| $x_i$, $e_{ik}$ | Yes/No | $p_1$ | $p_2$ | $p_3$ | $q_1$ | $q_2$ |
Table 6. Daily travel rate weights rule.
| Daily Travel Rate (km/d) | $[78, +\infty)$ | $[57, 78)$ | $[36, 57)$ | $[15, 36)$ | $[0, 15)$ |
|---|---|---|---|---|---|
| $q_1$ | 1.0 | 0.8 | 0.6 | 0.4 | 0.2 |
Table 7. Mileage weights rule.
| Mileage (km) | $[8000, +\infty)$ | $[5000, 8000)$ | $[3000, 5000)$ | $[1000, 3000)$ | $[0, 1000)$ |
|---|---|---|---|---|---|
| $q_2$ | 1.0 | 0.8 | 0.6 | 0.4 | 0.2 |
Table 8. Explanation of the formula $\bar{S}_{ijdtk}$.
| Parameter | Meaning |
|---|---|
| $d$ | The d-th purchase region |
| $t$ | The t-th time period |
| $k$ | The k-th user in the user set of the t-th time period in the d-th region |
| $i$ | The i-th alternative automobile |
| $j$ | The j-th feature of the automobile |
| $\bar{S}$ | Composite score of text sentiment strength and raw rating |
Table 9. Alternative target automobile brands.
| Automobile Model | Number of Reviews | Overall Rating | Review Time | Price (in RMB) |
|---|---|---|---|---|
| GAC Toyota-Hanlanda ($x_1$) | 3762 | 4.51 | 2006–2022 | 26.88–34.88 |
| Dongfeng Nissan-Xuan Yi ($x_2$) | 2689 | 4.53 | 2016–2022 | 9.98–17.49 |
| SAIC Volkswagen-ToucanL ($x_3$) | 3329 | 4.52 | 2018–2022 | 19.90–28.38 |
| Dongfeng Honda-XR-V ($x_4$) | 2392 | 4.86 | 2015–2022 | 13.29–15.29 |
| SAIC-Volkswagen-Polo ($x_5$) | 3647 | 4.71 | 2014–2022 | 9.09–12.49 |
| FAW-Volkswagen-Golf ($x_6$) | 3756 | 4.86 | 2014–2022 | 12.98–22.98 |
Table 10. Predicted emotional intensity data table for Dongfeng Nissan-Xuan Yi ($x_2$).
| No. | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ |
|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 4 | 3 | 4 | 4 | 5 | 3 | 4 |
| 1 | 5 | 3 | 3 | 5 | 2 | 4 | 2 | 5 |
| 2 | 4 | 4 | 5 | 5 | 5 | 5 | 5 | 4 |
| 3 | 4 | 4 | 4 | 3 | 5 | 5 | 4 | 4 |
| 4 | 5 | 4 | 3 | 3 | 5 | 5 | 5 | 5 |
Table 11. Consistency weights of ratings and texts for Dongfeng Nissan-Xuan Yi ($x_2$).
| No. | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ |
|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 3 | 3 | 4 | 4 | 3 | 4 | 4 |
| 1 | 3 | 3 | 4 | 3 | 4 | 4 | 4 | 3 |
| 2 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 1 |
| 3 | 4 | 2 | 4 | 4 | 4 | 4 | 3 | 3 |
| 4 | 3 | 4 | 4 | 1 | 4 | 4 | 4 | 4 |
Table 12. Information on indicators for calculating the weighting of the Nissan-Xuan Yi user feature.
| No. | Certified | Purchase Time | Comment Time | Mileage, $p_1$, $p_2$, $p_3$ (digits run together) | $I_1$ | $I_2$ | $I_3$ | $f_{ik}$ |
|---|---|---|---|---|---|---|---|---|
| 0 | NO | 2017-08 | 2017-08 | 1500203500 | 0.5 | 0.6 | 0.3 | 1.4 |
| 1 | YES | 2016-10 | 2017-08 | 8788191900 | 1.0 | 0.6 | 0.6 | 2.2 |
| 2 | NO | 2017-08 | 2017-08 | 99867301 | 0.5 | 0.4 | 0.2 | 1.1 |
| 3 | NO | 2017-08 | 2017-09 | 760252900 | 0.5 | 0.6 | 0.2 | 1.3 |
| 4 | NO | 2018-01 | 2018-01 | 33134,66416695 | 0.5 | 1.0 | 0.2 | 1.7 |
Table 13. Combined weighting of users of Dongfeng Nissan-Xuan Yi ($x_2$).
| No. | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ |
|---|---|---|---|---|---|---|---|---|
| 0 | 3.19 | 2.49 | 2.49 | 3.19 | 3.19 | 2.49 | 3.19 | 3.19 |
| 1 | 2.79 | 2.79 | 3.49 | 2.79 | 3.49 | 3.49 | 3.49 | 2.79 |
| 2 | 2.34 | 2.34 | 2.34 | 2.34 | 2.34 | 2.34 | 3.04 | 0.94 |
| 3 | 3.07 | 1.67 | 3.07 | 3.07 | 3.07 | 3.07 | 2.37 | 2.37 |
| 4 | 2.46 | 3.16 | 3.16 | 1.06 | 3.16 | 3.16 | 3.16 | 3.16 |
Table 14. Dongfeng Nissan-Xuan Yi ($x_2$) regional breakdown set.
| Area | Number of Reviews | Area | Number of Reviews |
|---|---|---|---|
| No region | 91 | East | 836 |
| Northeast | 210 | South | 408 |
| North | 240 | Southwest | 240 |
| Central | 438 | Northwest | 226 |
Table 15. Dongfeng Nissan-Xuan Yi ($x_2$) regional-time group segmentation set.
| Area | 2016–2018 | 2019 | 2020–2022 |
|---|---|---|---|
| No region | 1 | 39 | 51 |
| Northeast | 25 | 44 | 141 |
| North | 32 | 138 | 70 |
| Central | 43 | 172 | 223 |
| East | 96 | 386 | 354 |
| South | 70 | 231 | 107 |
| Southwest | 16 | 106 | 118 |
| Northwest | 13 | 48 | 165 |
Table 16. Set internal confidence weights normalized.
| No. | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.082 | 0.097 | 0.055 | 0.082 | 0.101 | 0.075 | 0.071 | 0.078 |
| 2 | 0.080 | 0.095 | 0.075 | 0.056 | 0.099 | 0.073 | 0.089 | 0.099 |
| 3 | 0.086 | 0.100 | 0.058 | 0.110 | 0.082 | 0.079 | 0.094 | 0.105 |
| 4 | 0.094 | 0.108 | 0.088 | 0.094 | 0.113 | 0.108 | 0.101 | 0.113 |
| 5 | 0.062 | 0.078 | 0.102 | 0.085 | 0.081 | 0.100 | 0.073 | 0.059 |
| 6 | 0.112 | 0.081 | 0.105 | 0.088 | 0.084 | 0.102 | 0.096 | 0.107 |
| 7 | 0.093 | 0.085 | 0.087 | 0.070 | 0.089 | 0.085 | 0.100 | 0.089 |
| 8 | 0.112 | 0.081 | 0.105 | 0.112 | 0.107 | 0.102 | 0.076 | 0.107 |
| 9 | 0.097 | 0.111 | 0.113 | 0.074 | 0.116 | 0.089 | 0.104 | 0.093 |
| 10 | 0.089 | 0.103 | 0.105 | 0.113 | 0.085 | 0.103 | 0.097 | 0.085 |
| 11 | 0.091 | 0.062 | 0.107 | 0.115 | 0.042 | 0.083 | 0.099 | 0.065 |
Table 17. Mean ratings of Dongfeng Nissan-Xuan Yi ($x_2$) ratings and textual sentiment intensity.
| No. | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ |
|---|---|---|---|---|---|---|---|---|
| 0 | 5.0 | 2.5 | 4.5 | 5.0 | 4.0 | 4.5 | 3.0 | 4.0 |
| 1 | 3.5 | 3.5 | 3.0 | 3.5 | 4.0 | 4.0 | 4.0 | 4.5 |
| 2 | 3.5 | 3.5 | 4.5 | 4.5 | 4.5 | 3.5 | 4.0 | 3.5 |
| 3 | 5.0 | 3.0 | 4.0 | 3.0 | 5.0 | 5.0 | 2.5 | 4.5 |
| 4 | 3.5 | 4.0 | 3.0 | 3.5 | 5.0 | 3.0 | 3.0 | 4.0 |
Table 18. Overall rating of Dongfeng Nissan-Xuan Yi ($x_2$).
| No. | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.068 | 0.102 | 0.098 | 0.047 | 0.095 | 0.041 | 0.104 | 0.096 |
| 1 | 0.097 | 0.039 | 0.052 | 0.109 | 0.036 | 0.041 | 0.055 | 0.051 |
| 2 | 0.077 | 0.080 | 0.077 | 0.045 | 0.093 | 0.095 | 0.041 | 0.065 |
| 3 | 0.075 | 0.050 | 0.058 | 0.054 | 0.102 | 0.048 | 0.080 | 0.057 |
| 4 | 0.099 | 0.083 | 0.069 | 0.111 | 0.096 | 0.098 | 0.105 | 0.097 |
Table 19. Rating matrix for aggregation of Dongfeng Nissan-Xuan Yi ($x_2$) by the time interval in the Northeast.
| Time Interval | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ |
|---|---|---|---|---|---|---|---|---|
| 2016–2018 | 4.49 | 3.46 | 3.65 | 3.91 | 4.36 | 4.06 | 3.93 | 4.46 |
| 2019 | 4.45 | 3.89 | 3.96 | 4.16 | 4.50 | 4.62 | 4.26 | 4.34 |
| 2020–2022 | 4.68 | 4.09 | 4.49 | 4.47 | 4.66 | 4.74 | 4.57 | 4.65 |
Table 20. Dongfeng Nissan-Xuan Yi ($x_2$) geographical aggregation rating matrix.
| No. | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ |
|---|---|---|---|---|---|---|---|---|
| 0 | 4.38 | 4.09 | 4.14 | 4.38 | 4.68 | 4.63 | 4.03 | 4.49 |
| 1 | 4.45 | 3.99 | 4.12 | 4.22 | 4.53 | 4.57 | 4.22 | 4.49 |
| 2 | 4.54 | 4.02 | 4.12 | 4.26 | 4.51 | 4.61 | 4.34 | 4.51 |
| 3 | 4.50 | 3.98 | 4.15 | 4.27 | 4.52 | 4.54 | 4.30 | 4.48 |
| 4 | 4.47 | 3.92 | 4.16 | 4.24 | 4.52 | 4.53 | 4.25 | 4.40 |
| 5 | 4.49 | 3.76 | 4.07 | 4.18 | 4.48 | 4.48 | 4.22 | 4.40 |
| 6 | 4.45 | 3.78 | 4.17 | 4.28 | 4.44 | 4.51 | 4.16 | 4.40 |
| 7 | 4.54 | 3.81 | 4.03 | 4.18 | 4.50 | 4.47 | 4.25 | 4.48 |
Table 21. Multi-criteria automobile composite rating table.
| Automobile Criteria | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ |
|---|---|---|---|---|---|---|
| $a_1$ | 4.68 | 4.48 | 4.65 | 4.43 | 3.86 | 3.65 |
| $a_2$ | 4.26 | 3.92 | 4.42 | 4.22 | 4.08 | 4.30 |
| $a_3$ | 4.25 | 4.12 | 4.42 | 4.32 | 4.54 | 4.40 |
| $a_4$ | 4.00 | 4.25 | 3.89 | 4.14 | 4.16 | 3.92 |
| $a_5$ | 4.54 | 4.52 | 4.09 | 3.51 | 4.08 | 3.87 |
| $a_6$ | 4.68 | 4.54 | 4.66 | 4.67 | 4.72 | 4.61 |
| $a_7$ | 4.03 | 4.22 | 3.99 | 3.73 | 3.92 | 4.21 |
| $a_8$ | 4.37 | 4.46 | 4.28 | 4.12 | 4.42 | 3.96 |
Table 22. Overall ratings for alternatives with different ratios ($\lambda$) of ratings to text sentiment scores.
| Automobile Brand | Original ($l_1$) | Ranking | $\lambda = 0$ ($l_2$) | Ranking | $\lambda = 0.5$ ($l_3$) | Ranking | $\lambda = 0.2$ ($l_4$) | Ranking | $\lambda = 0.8$ ($l_5$) | Ranking |
|---|---|---|---|---|---|---|---|---|---|---|
| $x_1$ | 4.51 | 6 | 4.15 | 1 | 4.35 | 1 | 4.23 | 1 | 4.48 | 1 |
| $x_2$ | 4.53 | 4 | 4.14 | 2 | 4.31 | 2 | 4.21 | 2 | 4.42 | 2 |
| $x_3$ | 4.52 | 5 | 4.13 | 3 | 4.30 | 3 | 4.20 | 3 | 4.41 | 3 |
| $x_4$ | 4.86 | 1 | 3.93 | 5 | 4.14 | 5 | 4.02 | 5 | 4.27 | 5 |
| $x_5$ | 4.71 | 3 | 4.02 | 4 | 4.22 | 4 | 4.11 | 4 | 4.34 | 4 |
| $x_6$ | 4.86 | 1 | 3.91 | 6 | 4.11 | 6 | 3.99 | 6 | 4.25 | 6 |
Table 23. Overall scoring ranking table.
| Case | Ranking |
|---|---|
| Case 1 | $x_4 = x_6 > x_5 > x_2 > x_3 > x_1$ |
| Case 2 | $x_1 > x_2 > x_3 > x_5 > x_4 > x_6$ |
| Case 3 | $x_1 > x_2 > x_3 > x_5 > x_4 > x_6$ |
| Case 4 | $x_1 > x_2 > x_3 > x_5 > x_4 > x_6$ |
| Case 5 | $x_1 > x_2 > x_3 > x_5 > x_4 > x_6$ |

Share and Cite

MDPI and ACS Style

Cao, W.; Yang, X.; Yang, Y. A Large-Scale Reviews-Driven Multi-Criteria Product Ranking Approach Based on User Credibility and Division Mechanism. Mathematics 2023, 11, 2952. https://doi.org/10.3390/math11132952
