1. Introduction
Online shopping and the Internet have transformed traditional offline businesses into web-based ones that operate through platforms such as social media, search engines, and e-commerce sites like Amazon, Shopify, eBay, Etsy, and WooCommerce. Customers can effortlessly place orders and receive products at home. Online shopping has many advantages, e.g., fast access to many items, a wider selection, and detailed product information. Its main problem is that customers cannot check the actual quality of a product before buying it. To cope with this issue, business owners have created review or feedback systems through which previous customers share their experience. Feedback is collected in different ways, e.g., through surveys on the social web, live chat, incentives offered for customer responses, or a feedback system built into the website. The focus of this study is to identify product quality from customers’ attitudes using advanced machine learning methods. Attitude is a relatively durable organization of beliefs, thoughts, and behavioral tendencies toward socially significant objects, groups, events, or symbols [1].

The appraisal framework examines the attitudes that people express in written text. Its attitude system has three main subdomains: affect, judgment, and appreciation. Affect is emotion-based; judgment deals with the evaluation of persons; and appreciation deals with opinions about physical things, processes, etc. Appreciation is further divided into three main categories: reaction, composition, and valuation. Reaction comprises impact and quality. Impact concerns opinions about the attractiveness or unattractiveness of things, while quality concerns liking or disliking things, e.g., beautiful, elegant, or hideous. Composition is divided into complexity and balance. Complexity concerns opinions about how complicated or simple things are, and balance concerns whether things are perceived as balanced or unbalanced. Valuation, as its name implies, covers opinions about concepts such as the innovativeness, uselessness, cost, or shoddiness of things [2].

Quality refers to the features of a product that satisfy customers’ needs. Quality assessment enables stakeholders to raise the standard of their products, make them more marketable, compete more effectively, gain market share, and generate sales revenue. Higher quality also enables businesses to reduce error rates, rework, field failures, warranty costs, customer dissatisfaction, and other costs, and it shortens the time needed to launch new products [3]. This study uses the appraisal framework to build a product-quality lexicon and then employs BERT word embeddings and a BiLSTM model to detect product quality from the customer attitudes expressed in product reviews.
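The attitude system described above is a category tree, and the quality subcategory used in this study sits three levels below the root. As an illustration only, the sketch below encodes the categories named in the text as a nested Python dictionary and finds the path to a subcategory by depth-first search; the names `ATTITUDE` and `path_to` are hypothetical and not part of the study.

```python
# Hypothetical encoding of the appraisal framework's attitude system,
# using only the categories named in the text (leaves map to empty dicts).
ATTITUDE = {
    "affect": {},
    "judgment": {},
    "appreciation": {
        "reaction": {"impact": {}, "quality": {}},
        "composition": {"complexity": {}, "balance": {}},
        "valuation": {},
    },
}


def path_to(target, tree=ATTITUDE, prefix=("attitude",)):
    """Depth-first search: return the category names from the root down to
    `target`, or None if `target` is not in the tree."""
    for name, children in tree.items():
        current = prefix + (name,)
        if name == target:
            return list(current)
        found = path_to(target, children, current)
        if found is not None:
            return found
    return None


print(path_to("quality"))  # ['attitude', 'appreciation', 'reaction', 'quality']
```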
The main contributions of this study are the following:
The development of annotation guidelines for a lexicon dictionary to identify product quality based on an appraisal framework.
To the best of our knowledge, the proposed model is the first to identify product quality from customers’ attitudes based on an appraisal framework, using a lexicon approach with N-grams combined with pre-trained BERT word embeddings and a BiLSTM model.
This paper is organized as follows:
Section 2 concerns related work.
Section 3 consists of the methodology.
Section 4 is about the experiment.
Section 5 comprises the results and discussion, while
Section 6 contains the conclusions and future work.
2. Related Work
Electronic word-of-mouth refers to customers informally sharing information on the internet about particular features of products and services, or about their vendors [4]. In their reviews, customers state their knowledge, approval, and opinions of goods or services on different web-based platforms, e.g., social networks and blogs [5]. Online reviews express opinions and provide useful information for evaluating potential purchases [6]. One goal of natural language processing is the automatic processing of text to assess people’s attitudes, feelings, etc. Opinion mining involves extracting the sentiments expressed in a text and the topics it discusses. Using online reviews, the author in [7] employed text mining techniques to predict consumers’ positive and negative attitudes toward a hotel and extracted the features most associated with each attitude. These features significantly helped marketers plan keyword selection in their marketing policies. According to the author in [8], business development is based mainly on recommendations, which are treated as the sole predictors.
In the age of e-commerce, online consumer reviews have become very helpful because they influence future purchases. E-commerce sites frequently offer product review ranking services to help customers make decisions and to address concerns about the quality of goods sold online. With the rapid emergence of social networks and the corresponding growth in data generation, sentiment analysis is no longer a static activity, and the methodologies that have emerged can be broadly divided into the lexicon-based approach [9] and the machine learning approach [10]. The authors in [11] investigated how consumers select products online using a web-based product recommendation system. They found that consumers who consulted the system’s recommendations selected recommended products twice as often as consumers who did not.
The web holds a wealth of consumer reviews, and these reviews often include recommendations. Textual reviews provide information about which products are money-spinners and reveal a set of assessment variables related to the presence or absence of recommendations. Given the vast amount of information available on the internet, data mining techniques are used to extract hidden information about customer behavior [12,13]. A product’s quality is determined by the number of its features that satisfy customers’ needs and by how those features are adapted to meet customer expectations. The authors in [14] presented machine learning techniques for extracting product parameters that are scattered, in unstructured form, across various sources in order to improve product quality. In [15], the authors proposed using text mining to assess product reviews while taking the validity of each review into account, in order to develop a trustworthy, evidence-based strategy for online product evaluation. The authors in [16] used sentiment analysis of consumer reviews to predict product ratings. In [17], the author identified different aspects of a product: internal features, external features, industry standards, reliability, lifetime, services, customer response, exterior finish, and past performance. Udeh et al. [18] used a survey and tested their hypotheses with multiple regression analysis to examine the influence of product quality on Pay TV customer satisfaction. They concluded that customer satisfaction with Pay TV is positively and significantly correlated with reception quality, content quality, and customer service.
The author in [19] used Twitter data to assess consumers’ sentiments toward well-known brands. In [20,21], the authors extracted product features and visualized market structure using text mining techniques. Text mining techniques were also used by the authors in [22,23] to supplement numerical data when forecasting product sales. The approach in [24] evaluated product quality by collecting web-based customer comments about shopping and then applying text classification techniques together with a fuzzy comprehensive evaluation method; this evaluation helped consumers make better decisions when purchasing suitable products. The author in [25] proposed an innovative framework that helps designers use real online reviews in product design, and evaluated it on the online reviews of a sample product in terms of precision, comparison, and rationality. In [26], the authors sought to extract critical information from product reviews in order to improve product features, customer service, and knowledge; they used text mining algorithms to examine customer reviews of a product, identify frequently recurring issues, and trace how those issues developed over time. Cruz [27] investigated the relationship between product quality and customer satisfaction. Martinet et al. [28] explored social media content using an appraisal framework.
Social media is a significant source of information. Analyzing its content with traditional data mining tools is very laborious because the content is large, noisy, unstructured, and heterogeneous. Conventional data mining tools are slow, costly depending on the data size, and biased as well [29]. Mining social media content involves identifying people’s opinions, attitudes, and emotions, and it revolves around two main concepts: sentiment analysis and opinion mining. The term “sentiment analysis” was first introduced in [30], and the term “opinion mining” was first used in [31]. However, there is considerable disagreement about the boundary between these two fields. The authors in [32] regard opinion mining as the use of text mining techniques to discover interesting and intuitive correlations among authors’ opinions, whereas sentiment analysis, also known as sentiment classification, concerns categorizing a text, or part of a text, by computing the strength of the opinion it expresses, the factual information it contains, and its orientation [33].
Lexicon-based strategies use a vocabulary of words labeled with their sentiment valences [34]. These techniques break a text down into words whose sentiment orientations are then summed or otherwise combined to classify the text. Although this method is straightforward, it relies heavily on manual tagging [35]. Baharudin et al. [36] proposed that sentence structure and context play an important role in the classification and orientation of sentiment. In their work, each word in a sentence was given a sentiment score from the SentiWordNet lexicon, and the cumulative sum of these individual scores determined the sentence’s overall classification. Although the method is intriguing, one of its drawbacks is that words with the same orientation but opposing meanings may receive incorrect lexicon labels in machine learning models.
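The summing strategy described above can be sketched in a few lines. This is a minimal illustration, assuming whitespace-free alphabetic tokens and a toy lexicon whose scores are invented (they are not SentiWordNet values); `classify` is a hypothetical helper, not the method of [36].

```python
import re

# Toy lexicon; the scores are invented for illustration, NOT SentiWordNet values.
LEXICON = {"good": 0.6, "great": 0.8, "poor": -0.7, "broken": -0.9, "cheap": -0.2}


def classify(sentence, lexicon=LEXICON):
    """Sum the lexicon score of every token; unknown words score 0.
    Return (label, total)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    total = sum(lexicon.get(tok, 0.0) for tok in tokens)
    if total > 0:
        return "positive", total
    if total < 0:
        return "negative", total
    return "neutral", total


print(classify("Great product but poor packaging")[0])  # positive
print(classify("It arrived broken")[0])                 # negative
```

Because the scores are simply added, `classify("Not good")` also returns "positive": the plain sum has no notion of negation or context, which is one reason context-aware representations are discussed next.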
Word N-grams are employed in text classification to produce word co-occurrence patterns and feature vectors for machine learning classifiers [37]. Jain et al. [38] used bigrams and trigrams in text representation to extract features from text. Their results were encouraging and demonstrated that N-grams can represent text effectively. They also proposed an extensive, cognitive computing-inspired big data analytics framework for sentiment analysis and classification.
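A word N-gram is a run of N consecutive tokens. The sketch below, which assumes simple lowercase whitespace tokenization, generates them and counts them with `collections.Counter`, giving the kind of sparse co-occurrence vector a classifier can consume; `word_ngrams` is a hypothetical helper name.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return all contiguous word N-grams of `text` (lowercased, whitespace tokens).
    An empty list is returned when the text has fewer than n tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


review = "The battery life is excellent and the battery charges fast"
bigrams = word_ngrams(review, 2)
print(bigrams[:3])                      # ['the battery', 'battery life', 'life is']
print(Counter(bigrams)["the battery"])  # 2
```

A text of T tokens yields T - N + 1 N-grams, so the 10-token review above produces 9 bigrams and 8 trigrams.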
Vector representations based on word embeddings have recently gained importance in natural language processing [39]. Mudinas et al. [40] generated word sentiment labels by combining a lexicon-based approach with a support vector machine classifier. Rezaeinia et al. [41] assigned lexicon vectors to the words in a text using a variety of lexicons, an approach called Lexicon2Vec, and combined these vectors with Word2Vec and PoS2Vec to create a hybrid vector representation. Research on word-embedding features gained momentum in 2013 with the work of Mikolov et al. [42]. Word2Vec [43], GloVe [44], and FastText [45] are the three primary algorithms used to turn words into vectors. The Bidirectional Encoder Representations from Transformers (BERT) model has attracted considerable interest because of its bidirectional attention mechanism [46]. In [47], the authors evaluated Word2Vec, GloVe, FastText, and BERT, and their study showed the significance of BERT in sentiment analysis. Unlike other word embedding models, BERT reads words in both directions, which improves the contextual representation of the target text. Because BERT processes the target text in its entirety rather than word by word, BERT embedding-based models outperform other models and achieve impressive performance in sentiment analysis tasks [47,48]. Word embeddings are superior to the traditional bag-of-words representation because they capture synonymy and produce vectors of lower dimensionality [41,49]. Garg et al. [50] showed that Word2Vec embeddings outperformed alternative word embedding techniques.

Currently, researchers use pre-trained word embedding vectors for sentiment analysis because they are more precise and are compatible with deep neural networks [51]. However, pre-trained word embeddings overlook the sentiment orientation of words and their semantics, which reduces the accuracy of sentiment classification. Chen [52] examined the performance of a convolutional neural network that took pre-trained Word2Vec vectors as input, under various hyper-parameter settings. For aspect-level sentiment analysis, the authors in [53] used pre-trained GloVe vectors as inputs to an attention-based LSTM model. The authors in [54] improved pre-trained Word2Vec models for cross-domain classification by adding domain information to the vectors. The authors in [55] classified Konkani texts using pre-trained FastText word embeddings and neural networks. In [56], the authors compared various baseline models, such as GloVe and Word2Vec, and proposed the LeBERT model, which combines a sentiment lexicon and BERT via word N-grams and classifies sentiment with a convolutional network.

The related research shows that most existing work on product review systems relies on traditional sentiment analysis and opinion mining. The literature also indicates that BERT is a superior word embedding model and plays a significant part in advanced sentiment analysis. Extending conventional sentiment analysis and the work already done, this study proposes the QLeBERT model to predict product quality from consumers’ attitudes in product reviews, based on the quality subcategory of the attitude system in the appraisal framework. QLeBERT combines a lexicon approach built on the quality subcategory of the appraisal framework with a BERT model via word N-grams, and then uses a BiLSTM model to predict product quality.
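The claim that embeddings handle synonyms better than bag-of-words vectors can be illustrated with cosine similarity. In the sketch below, the vocabulary and the three-dimensional vectors are invented for illustration (real embeddings are learned, not hand-set); `one_hot` and `cosine` are hypothetical helper names.

```python
import math

VOCAB = ["good", "great", "bad", "battery", "screen"]

# Hand-made 3-dimensional vectors, invented for illustration only; learned
# embeddings in this study have 100 to 768 dimensions.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.85, 0.2, 0.05],
    "bad": [-0.8, 0.1, 0.0],
}


def one_hot(word, vocab=VOCAB):
    """Bag-of-words style vector: one dimension per vocabulary entry."""
    return [1.0 if w == word else 0.0 for w in vocab]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


print(cosine(one_hot("good"), one_hot("great")))                  # 0.0
print(round(cosine(EMBEDDINGS["good"], EMBEDDINGS["great"]), 2))  # 0.99
print(round(cosine(EMBEDDINGS["good"], EMBEDDINGS["bad"]), 2))    # -0.97
```

Distinct one-hot vectors are always orthogonal, so "good" and "great" look unrelated, and their length grows with the vocabulary; dense embeddings keep a fixed size and can place synonyms close together.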
5. Results and Discussion
This section presents the experimental results. We tested the proposed model on benchmark Amazon customer review datasets. We used the 100-dimensional GloVe word embedding model pre-trained on English Wikipedia and the English Gigaword Fifth Edition, and the 100-dimensional Word2Vec word embedding model pre-trained on Google News. To reduce the storage requirement, we reduced the BERT embedding dimension from 768 to 128. We used TensorFlow to implement, train, and evaluate the models, and a smaller subset of reviews was used for validation. We first examined the effect of the lexicon-based approach on the input data and vectors, as shown in Table 2.
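As a rough illustration of the storage saving, assume each embedding component is stored as a 32-bit (4-byte) float; the storage format is not stated in the text. The per-token cost then drops by a factor of six:

```latex
\frac{768 \times 4\,\text{B}}{128 \times 4\,\text{B}} = \frac{3072\,\text{B}}{512\,\text{B}} = 6
```

For a hypothetical 200-token review, this is 614,400 bytes versus 102,400 bytes of embeddings.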
Table 2 shows that applying the lexicon-based method to extract a portion of the input text considerably reduced the overall text size, thereby decreasing the model’s computational time. We then carried out an experiment with BiLSTM to assess how well the QLeBERT model predicts product quality.
A Keras layer handles the preprocessing and defines the embedding shape for the BERT model. Because of limited computational resources, a small BERT model was used to initialize the word embeddings, and the word embedding dimension was set to 128. The baseline GloVe and Word2Vec embeddings, with 100 dimensions each, were used with their default settings. The Keras layer acts as the input layer and holds the input vectors generated by the word embedding models.
We first conducted an experiment with the BiLSTM model to investigate the impact of the N-gram size on QLeBERT, using datasets of customer reviews of Amazon products.
Table 3 displays the experimental outcomes for N = 1, 2, and all words.
For N = 1, the lexicon-based approach selects only one word to predict product quality based on the quality subcategory of the attitude system in the appraisal framework. A single word cannot represent an entire customer review, so the result was poor. N = 2 gave the best outcome, with an F1-macro score of 0.91. We evaluated the model for N up to 4. When the entire review text is used, the lexicon-based approach is not applied, so the model reduces to the plain BERT model.
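The exact selection rule belongs to the methodology and is not restated in this section. The sketch below assumes one plausible rule purely for illustration: keep only the word N-grams that contain at least one quality-lexicon term. The lexicon entries and the name `select_quality_ngrams` are hypothetical; the point is to show why N = 1 keeps only isolated lexicon words while N = 2 keeps some context.

```python
import re

# Hypothetical quality lexicon; the study's actual lexicon is not reproduced here.
QUALITY_LEXICON = {"beautiful", "elegant", "hideous", "sturdy", "flimsy"}


def select_quality_ngrams(review, n, lexicon=QUALITY_LEXICON):
    """Assumed rule: keep the word N-grams containing at least one lexicon term."""
    tokens = re.findall(r"[a-z']+", review.lower())
    ngrams = [tokens[i:i + n] for i in range(len(tokens) - n + 1)]
    return [" ".join(g) for g in ngrams if any(t in lexicon for t in g)]


review = "The case feels sturdy but the hinge is flimsy"
print(select_quality_ngrams(review, 1))  # ['sturdy', 'flimsy']
print(select_quality_ngrams(review, 2))  # ['feels sturdy', 'sturdy but', 'is flimsy']
```

Under this assumed rule the filtered input is much shorter than the 9-token review, which is consistent with the reduced text size and computation time reported for the lexicon-based step.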
Table 4 shows the performance of GloVe word embeddings with the BiLSTM model across N-gram sizes. GloVe with BiLSTM achieved its best result for N = 3, with an F1-macro score of 0.78.
Table 5 shows the performance of Word2Vec embeddings with the BiLSTM model across N-gram sizes. Word2Vec with BiLSTM achieved its best result for N = 2, with an F1-macro score of 0.80.
5.1. Performance Measurement of QLeBERT Model in Comparison to Different Models
An experiment was conducted on the Amazon Customers’ Reviews Dataset to validate the performance of QLeBERT with bigrams (N = 2) against other pre-trained word embedding models, namely GloVe, Word2Vec, and BERT, each combined with the BiLSTM model. The experiment was carried out with and without the lexicon-based approach, as shown in Table 6.
The proposed QLeBERT outperformed the other models, both with and without the lexicon-based approach, in predicting product quality from the quality subcategory of the attitude system in the appraisal framework. F1-macro is a suitable performance metric when classes are imbalanced, because it averages the per-class F1 scores so that minority classes carry the same weight as majority classes.
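To make the point about imbalance concrete, the sketch below computes macro-averaged F1 from scratch, using the same averaging as scikit-learn’s `f1_score(..., average="macro")`. The labels "good" and "poor" and the 8:2 class split are hypothetical; a classifier that always predicts the majority class scores 0.8 accuracy but only about 0.444 macro F1.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred.
    A class with no true positives gets F1 = 0."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical imbalanced labels: 8 "good" reviews, 2 "poor" reviews.
y_true = ["good"] * 8 + ["poor"] * 2
y_pred = ["good"] * 10  # a classifier that always predicts the majority class
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                            # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.444
```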
5.2. Comparison of Proposed QLeBERT with Baseline Model
Finally, we compared the proposed QLeBERT model with the baseline model reported by the authors in [56], who used a sentiment lexicon and BERT via word N-grams to classify sentiments with a convolutional network. In this study, we used a lexicon approach based on the quality subcategory of the appraisal framework and BERT via word N-grams to classify product quality with a BiLSTM. When tested on the Amazon product reviews dataset, the proposed QLeBERT achieved the highest F1-macro score, 0.91, compared with the baseline approach, in predicting product quality from customers’ attitudes using an appraisal framework.