A Cross-Domain Generative Data Augmentation Framework for Aspect-Based Sentiment Analysis

Xue, Jiawei; Li, Yanhong; Li, Zixuan; Cui, Yue; Zhang, Shaoqiang; Wang, Shuqin

doi:10.3390/electronics12132949

by Jiawei Xue, Yanhong Li, Zixuan Li, Yue Cui, Shaoqiang Zhang and Shuqin Wang *

College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China

* Author to whom correspondence should be addressed.

Electronics 2023, 12(13), 2949; https://doi.org/10.3390/electronics12132949

Submission received: 22 May 2023 / Revised: 27 June 2023 / Accepted: 28 June 2023 / Published: 4 July 2023

(This article belongs to the Special Issue Trends and Prospects in Hybrid Methods for Natural Language Processing)

Abstract

Aspect-based sentiment analysis (ABSA) is a crucial fine-grained sentiment analysis task that aims to determine the sentiment polarity expressed toward a specific aspect term. Recent research has advanced prediction accuracy by pre-training models on ABSA tasks. However, due to the lack of fine-grained data, those models cannot be trained effectively. In this paper, we propose the cross-domain generative data augmentation framework (CDGDA), which utilizes a generation model to produce in-domain, fine-grained sentences by learning from similar out-of-domain coarse-grained datasets. To generate fine-grained sentences, we guide the generation model using two prompt methods: aspect replacement and aspect–sentiment pair replacement. We also refine the quality of the generated sentences with an entropy minimization filter. Experimental results on three public datasets show that our framework outperforms most baseline methods and other data augmentation methods, thereby demonstrating its efficacy.

Keywords: ABSA; data augmentation; cross-domain; generation model

1. Introduction

Aspect-based sentiment analysis (ABSA) is a fine-grained task that analyzes the sentiment polarity (negative, neutral, or positive) associated with particular aspects or features of a product or service [1]. As shown in Figure 1, consider the sentence “The food is very fresh, except for the noisy environment.”, where the sentiment regarding the “food” aspect term is positive, while that of the “environment” is negative.

Coarse-grained sentiment analysis posits that a sentence expresses a single viewpoint, or one sentiment [2]. As shown in Figure 2, consider the statement, “Worthless. They turn off the WiFi on Sundays! Now I have a drink and can’t get any work done.” The entire sentence exhibits a single sentiment polarity, which is negative.

Recently, large pre-trained language models (PLMs) such as BERT [3] have been successfully deployed in ABSA to enhance accuracy by leveraging the comprehensive knowledge acquired during the pre-training process. Typically, these works feed sequences into PLMs to capture contextual relationships between tokens, and the resulting representations are then fed into a well-designed neural network model such as a multilayer perceptron (MLP) [4]. Recent works have explored generation models such as BART [5], T5 [6], and GPT-3 [7] to boost their capabilities to solve complex problems. To handle multiple subtasks with varying inputs and outputs, Yan et al. [8] proposed a unified framework. Zhang et al. [9] employed the T5 model to generate a standard paradigm for extracting four sentiment elements: aspect category, aspect term, opinion term, and sentiment polarity.

Despite considerable performance improvements, ABSA still faces challenges due to the scarcity of data. To alleviate this challenge, Mao et al. [10] created a tree structure to produce negative samples that improve the robustness. Similarly, Chen et al. [11] developed a masked language model that improves data augmentation via back translation.

In this paper, we introduce a cross-domain generative data augmentation framework (CDGDA) to mitigate the data shortage in ABSA for English. CDGDA generates a suitable dataset for ABSA by leveraging a coarse-grained sentiment analysis dataset with adequate data. The main contributions of this paper can be summarized as follows:

We propose a CDGDA framework which generates new fine-grained in-domain data by using a generation model to perform cross-domain transfer from numerous coarse-grained sentiment analysis datasets out-of-domain.
We incorporate an entropy minimization filter into the CDGDA to improve the output quality by reducing duplicate data.
We evaluated the CDGDA on three public datasets, and the experiments demonstrate that our framework achieves superior results compared with other baseline methods.

The rest of this paper is organized as follows. Section 2 reviews some existing work related to our work. Section 3 is a methodological section that discusses task formulation, sentence generation methods, and model architectures. Section 4 presents the dataset, experimental results, and analysis, and demonstrates our approach through examples. Finally, Section 5 discusses the conclusions and provides ideas for future work.

2. Related Work

In this section, we review the development of language models, ABSA task definitions, classifications, recent advances, and some work on domain adaptation and data augmentation.

2.1. Language Models

Long short-term memory networks (LSTMs) [12] are widely employed in ABSA due to their ability to comprehend sentence meanings while retaining awareness of word order, maintaining consistency in both the active and passive voice [13]. LSTMs perform computation along the symbol positions of the input and output sequences, an inherently sequential process that cannot be parallelized. This lack of parallelization is particularly critical for longer sequences, as it limits batch processing capability. While some attempts have been made to improve computational efficiency and model performance [14], the inherent limitations of sequential computation remain.

To mitigate sequence length and performance issues, Vaswani et al. [15] proposed the transformer, which is based on the attention mechanism. The transformer avoids recurrence by using multi-head self-attention instead of recurrent layers in the encoder–decoder architecture, and it relies entirely on attention mechanisms to derive global dependencies between inputs and outputs. This approach enables a significant increase in parallelization.

However, the number of model parameters increases rapidly, necessitating sufficient data to train the model efficiently and prevent overfitting. To address this issue, pre-training the model on a large unlabeled text corpus is conducted to develop effective representations that can be used in other tasks [16,17]. The PLMs can be broadly classified into four categories [7], which are as follows:

Left-to-Right: GPT, GPT-2, GPT-3 [18]
Masked: BERT [3], RoBERTa [19]
Prefix: UniLM1 [20], UniLM2 [21]
Encoder–Decoder: BART [5], T5 [6], MASS [22]

There have been extensive explorations of the use of PLMs in text classification tasks, and most of these works employ the “fixed-prompt LM Tuning” approach for assessing the effectiveness of pre-training [23].

2.2. Aspect-Based Sentiment Analysis

In general, ABSA aims to identify aspect-level sentiment elements, including aspect terms, aspect categories, opinion terms, and sentiment polarities [9]. For instance, consider the sentence “The keys are smooth”. In this case, the four elements—“keys”, “keyboard”, “smooth”, and “positive”—are identified. While “keys” and “smooth” are explicitly expressed in the sentence, “keyboard” and “positive” belong to predefined categories and sentiment sets.

ABSA tasks can be categorized into two groups: single tasks and compound tasks [24]. Single tasks focus on identifying each element separately. For instance, the aspect term extraction task [25] aims to extract all mentioned aspect terms in a given text, while the aspect sentiment classification task [26] is to predict the sentiment polarity of a particular aspect in a sentence. In contrast, the compound tasks require extracting multiple elements while identifying correspondences and dependencies between them. For example, the aspect–opinion pair extraction (AOPE) task [27] requires extracting the aspect and its corresponding opinion term in a composite form; the aspect sentiment triplet extraction (ASTE) task [28] aims to extract the aspect term, its associated opinion term, and the sentiment polarity. Recently, Cai et al. [29] introduced the aspect–category–opinion–sentiment (ACOS) quadruple extraction, a new task that predicts all the four elements at once.

2.3. Domain Adaptation

Domain adaptation is a technique that involves adapting a model from one domain (e.g., Amazon electronics) to another (e.g., laptop). ABSA domain adaptation methods can be classified into two categories: (1) Extracting aspectual sentiment elements using artificial syntactic rules: Ding et al. [30] first use grammar rules to generate auxiliary labels and then LSTMs are employed to learn a suitable hidden layer for predicting auxiliary labels in both the source and target domains. Chen et al. [31] have built two types of connections that can assist with cross-domain transfer. Firstly, they consider the syntactic roles involved in a word as the pivot feature to connect target domain words with similar features. Secondly, they build semantic connections by using grammar-enhanced similarity measures to connect highly similar words between the two domains, thus linking them across domains. (2) Using PLMs: Ben et al. [32] employ self-generated domain-related features (DRFs) from various source domains to span their shared semantic space. These DRFs reflect the similarities and differences between source domains, as well as domain-specific knowledge. Given a new example from an unknown domain, the model first generates a sequence belonging to one of the source domains as a prompt for emotion prediction. For instance, for an input instance from the aviation domain, “The food was cold and the seats were uncomfortable”, the model utilizes DRFs to generate the prompt “restaurant-food, chair” projected onto a shared semantic space, which are, respectively, related to the restaurant and home decor domains. Then, with the assistance of the prompt and source domain knowledge, the model outputs a negative emotion prediction in the final classification.

2.4. Data Augmentation

Data augmentation aims to increase the amount of data by modifying existing data [33]. Data augmentation can be classified into three categories [34]: (1) Replacing words: This approach involves replacing some words in a source sentence using synonyms from a dictionary or similar words in a word vector [35]. (2) Translation-based augmentation: This method involves translating the source language into multiple target languages, and then back-translating them into the source language. However, the performance of these approaches is limited by the quality of the translation [36]. (3) Data augmentation using PLMs: For instance, Wang et al. [37] used generation models to produce synthetic sentences from aspect and polarity channels, which improved the robustness of the data. Li et al. [38] proposed an augmentation framework based on generation models for generating cross-domain data for unsupervised domain adaptation.

3. Methodology

In this section, we initially present the formulations for ABSA and data augmentation tasks. Subsequently, we provide an in-depth discussion of the proposed sentence generation methods and the model architecture.

3.1. Task Formulation

For the ABSA task, we are given a group of combinations $\Omega = \{\{s_{i}, a_{i}\}\}_{i=1}^{n}$, where $n$ is the number of all aspect words in the dataset. Note that there may be more than one aspect term in a sentence; for example, in a sentence that has $h$ aspect words, $a_{1}, \dots, a_{h}$ are different, but $s_{1}, \dots, s_{h}$ have the same value. The target output is a series of predictions $P = \{p_{1}, p_{2}, \dots, p_{n}\}$, where $s$, $a$, and $p \in \{\text{negative}, \text{neutral}, \text{positive}\}$ represent the sentence, aspect term, and sentiment polarity, respectively.

For the data augmentation task, the input is a sentence $y_{j} \in D_{out}$, $j = 1, 2, \dots, |D_{out}|$, from the out-of-domain dataset, and the output is a set of sentences $X_{j}$ generated by the generation model (e.g., T5 [6]).
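The formulation above can be made concrete with a small data structure. The following is a minimal sketch, not the authors' code: the class name `AbsaExample` and the helper `expand` are hypothetical, and the example sentence is the one used in the Introduction. It shows that a sentence with several aspect terms yields one pair per aspect, with the sentence repeated.

```python
from dataclasses import dataclass
from typing import Dict, List

POLARITIES = ("negative", "neutral", "positive")


@dataclass(frozen=True)
class AbsaExample:
    """One (s_i, a_i, p_i) combination; the name is illustrative only."""
    sentence: str   # s_i
    aspect: str     # a_i
    polarity: str   # p_i


def expand(sentence: str, labelled_aspects: Dict[str, str]) -> List[AbsaExample]:
    """Create one example per aspect term; the sentence is shared by all of them."""
    examples = []
    for aspect, polarity in labelled_aspects.items():
        if polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {polarity}")
        examples.append(AbsaExample(sentence, aspect, polarity))
    return examples


examples = expand(
    "The food is very fresh, except for the noisy environment.",
    {"food": "positive", "environment": "negative"},
)
```

Here `examples` contains two entries that share the same sentence but differ in aspect term and polarity, matching the case of $h = 2$ aspect words.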

3.2. Sentence Generation

To generate sentences, we first need to fine-tune the generation model to fit the ABSA task. As shown in Figure 3, we randomly extract a set $\{s, a, p\}$ from the training set of the in-domain dataset $D_{in}$, and randomly select a sentence $y$ from $D_{out}$. Then, we train the model to generate sentence $x$ with $[y, a]$ or $[y, a, p]$ as conditions, where the former is called the aspect replacement $R_{a}$ and the latter is called the aspect–sentiment pair replacement $R_{ap}$. Since ABSA is a low-resource task, adjusting the entire large-scale generation model with scarce samples can lead to overfitting issues [39]. Therefore, parameter-efficient strategies need to be employed to avoid such issues.

After fine-tuning, the parameters are frozen to prepare for the task of generating sentences. As shown in Figure 4, we prepare a set of aspect terms $A$, whose items are extracted from the training set of $D_{in}$, and a set of sentiment polarities $P = \{\text{negative}, \text{neutral}, \text{positive}\}$. We then generate sentences using $R_{a}$ and $R_{ap}$ introduced earlier, with $[y, a]$ or $[y, a, p]$ as conditions. Each method is used twice, resulting in a set $X$ consisting of a total of four sentences. Unlike during fine-tuning, the aspect term $a$ and sentiment polarity $p$ are randomly selected from the sets $A$ and $P$, respectively.
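To illustrate the two conditioning schemes, the sketch below builds the input text for $R_{a}$ and $R_{ap}$. The text does not specify how $[y, a]$ and $[y, a, p]$ are serialized for the generation model, so the field names and the separator used here are assumptions made purely for illustration, as is the function name `build_condition`.

```python
from typing import Optional


def build_condition(y: str, aspect: str, polarity: Optional[str] = None) -> str:
    """Serialize [y, a] (aspect replacement) or [y, a, p] (aspect-sentiment pair
    replacement) into one input string. The 'field: value | ...' layout is a
    hypothetical choice, not the format used in the paper."""
    parts = [f"sentence: {y}", f"aspect: {aspect}"]
    if polarity is not None:
        parts.append(f"polarity: {polarity}")
    return " | ".join(parts)


y = "Worthless. They turn off the WiFi on Sundays!"
condition_ra = build_condition(y, "service")               # R_a: [y, a]
condition_rap = build_condition(y, "service", "negative")  # R_ap: [y, a, p]
```

The only difference between the two conditions is the extra polarity field, which is what lets $R_{ap}$ steer the sentiment of the generated sentence in addition to its aspect term.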

3.3. Model Architecture

T5 [6] is selected as the backbone to generate the augmented data, and BERT [3] is used as the prediction model. Since T5 has already been pre-trained on a large corpus, we fine-tune it through supervised learning by using out-of-domain sentences as input and in-domain sentences as output. As shown in Figure 3, we train the model by minimizing the loss to adapt it to this generation method; we then fix the model parameters and use T5 to generate data-augmented sentences. Afterwards, as shown in Figure 4, we use the fixed-parameter generation model (Fixed GM) to generate sentences and filter them through the entropy filter. In summary, the framework completes the task in three parts. First, it adapts the generation model to the generation method through fine-tuning. Then, it generates augmented sentences through the fixed-parameter generation model. Finally, the aspect sentiment polarity is predicted through the prediction model.

Fine-tuning. In this stage, we randomly select a combination $T = \{s_{i}, a_{i}, p_{i}\}$ from the training set of $D_{in}$, and then randomly select a sentence $y_{j}$ from $D_{out}$. Afterwards, we use the cross-entropy (CE) to evaluate the loss between the generated sentence and the corresponding sentence from $D_{in}$.

More specifically, for $R_{a}$, we want to find the parameters $\theta_{a}$ that minimize the CE by:

$$L_{a} = -\frac{1}{N} \sum_{i=1}^{N} \log \Gamma(s_{i} \mid \theta_{a}([y_{i}, a_{i}])) \tag{1}$$

where $\Gamma$ is the generation model, and $N$ is the number of aspect terms in the training set of $D_{in}$. For $R_{ap}$, we want to find the parameters $\theta_{ap}$ that minimize the CE by:

$$L_{ap} = -\frac{1}{N} \sum_{i=1}^{N} \log \Gamma(s_{i} \mid \theta_{ap}([y_{i}, a_{i}, p_{i}])) \tag{2}$$

Then, the fine-tuning stage objective is as follows:

$$L_{Aug} = L_{a} + L_{ap} \tag{3}$$

After this stage, we obtain the parameters $\Theta$ of $\Gamma$.
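As a numerical illustration of Equations (1)–(3), the sketch below computes the two averaged negative log-likelihoods and their sum. It assumes, as is standard for sequence-to-sequence models but not stated explicitly in the text, that the probability of a target sentence factorizes into per-token probabilities; the probability values themselves are made up.

```python
import math
from typing import List


def sequence_log_prob(token_probs: List[float]) -> float:
    """log of the sentence probability, assuming it is a product of token probabilities."""
    return sum(math.log(p) for p in token_probs)


def mean_nll(batch: List[List[float]]) -> float:
    """-(1/N) * sum_i log prob(s_i | condition_i), as in Equations (1) and (2)."""
    return -sum(sequence_log_prob(tp) for tp in batch) / len(batch)


# Hypothetical per-token probabilities of the target sentences under each condition.
probs_a = [[0.5, 0.5], [0.25, 1.0]]    # conditioned on [y, a]
probs_ap = [[1.0, 0.5], [0.5, 0.5]]    # conditioned on [y, a, p]

loss_a = mean_nll(probs_a)      # L_a
loss_ap = mean_nll(probs_ap)    # L_ap
loss_aug = loss_a + loss_ap     # L_Aug, Equation (3)
```

With these toy values, $L_{a} = \log 4$ and $L_{ap} = 1.5 \log 2$, so $L_{Aug} = 3.5 \log 2 \approx 2.43$.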

Data Augmentation. For each sentence $y_{k}$ from $D_{out}$, we use $\Gamma$ with the frozen parameter $\Theta$ to generate new sentences $x_{k}, x_{k}^{'} \in X_{k}$, $k = 1, 2, \dots, |D_{out}|$, in two ways:

$$x_{k} = \Gamma([y_{k}, a_{i}], \Theta) \tag{4}$$

$$x_{k}^{'} = \Gamma([y_{k}, a_{i}, p_{j}], \Theta) \tag{5}$$

where $a_{i}$, $i = 1, 2, \dots, |A|$, and $p_{j}$, $j = 1, 2, 3$, are chosen randomly from $A$ and $P$, respectively. After that, we obtain the set of sentences $X$.

To improve the quality of the generated sentences, low-quality sentences are filtered by an entropy minimization:

$$H(x_{i}) = -E_{x_{i}}[\log P(x_{i})] = -P(x_{i}) \log P(x_{i}) \tag{6}$$

$$P(x_{i}) = \mathrm{Softmax}(hW + b) \tag{7}$$

$$h = F(x_{i}) \tag{8}$$

where $x_{i} \in X$, $i = 1, 2, \dots, |X|$, $h$, $W \in \mathbb{R}^{d \times 3}$, $b$, and $F$ are the generated sentence, representation of hidden layers, model weight, bias of layers, and classification model (e.g., BERT), respectively.

Afterward, the sentences are sorted in descending order by their obtained scores, and then the hyperparameter t is used to determine how many sentences to use for the next prediction stage.
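The filtering step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the logits stand in for $hW + b$ from a classifier such as BERT, the entropy is summed over the three polarity classes, and sentences with lower entropy (more confident predictions) are assumed to be the ones kept, in line with the entropy minimization objective. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the three polarity logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a class distribution (natural logarithm)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def keep_most_confident(
    candidates: Sequence[Tuple[str, Sequence[float]]], t: int
) -> List[str]:
    """Rank generated sentences by prediction entropy and keep the t lowest.

    Each candidate is (sentence, logits); the logits are assumed to come
    from the classification model F followed by the linear layer."""
    scored = [(entropy(softmax(logits)), text) for text, logits in candidates]
    scored.sort(key=lambda pair: pair[0])
    return [text for _, text in scored[:t]]


kept = keep_most_confident(
    [("ambiguous", [0.0, 0.0, 0.0]), ("clear", [5.0, 0.0, 0.0]), ("mild", [1.0, 0.0, 0.0])],
    t=2,
)
```

A sentence whose predicted distribution is uniform has the maximum entropy $\log 3$ and is therefore the first to be discarded when $t$ is smaller than the number of candidates.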

Prediction Stage. During the process of training the prediction model, the target is to minimize the negative log-likelihood (NLL) through:

$$L_{p} = -\frac{1}{N} \sum_{i=1}^{N} \log F^{(z)}(s_{i}) \tag{9}$$

$$L_{p}^{'} = -\frac{1}{M} \sum_{k=1}^{M} \log F^{(z)}(x_{k}) \tag{10}$$

where $F^{(z)}(*)$ is the predicted probability of the sample $*$ on class $z$ using the prediction model. Note that $z \in \{1, 2, 3\}$, denoting negative, neutral, and positive, respectively, and $M$ is the number of sentences in $X$.

The final training loss is the sum of two parts:

$$L = L_{p} + \alpha L_{p}^{'} \tag{11}$$

where $\alpha$ is a hyperparameter.
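The way the two loss terms are combined can be shown with a few lines of code. This is a minimal sketch with made-up probabilities: each list holds the probability that the prediction model assigns to the gold class $z$ of a sample, once for the original sentences and once for the generated ones. The function names are illustrative.

```python
import math
from typing import Sequence


def nll(gold_class_probs: Sequence[float]) -> float:
    """Average negative log-likelihood of the gold class, as in Equations (9) and (10)."""
    return -sum(math.log(p) for p in gold_class_probs) / len(gold_class_probs)


def total_loss(
    original_probs: Sequence[float],
    generated_probs: Sequence[float],
    alpha: float = 1.0,
) -> float:
    """L = L_p + alpha * L_p', as in Equation (11)."""
    return nll(original_probs) + alpha * nll(generated_probs)


loss = total_loss([0.5, 0.5], [0.25], alpha=1.0)
```

Setting `alpha` to 0 recovers training on the original sentences only, while larger values give the generated sentences more weight.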

The algorithm is shown in Algorithm 1.

Algorithm 1 Sentence generation

Require: $D_{in}$: in-domain dataset; $D_{out}$: out-of-domain dataset; $T = \{\{s_{i}, a_{i}, p_{i}\}\}_{i=1}^{n} \in$ training set of $D_{in}$, where $n$ is the number of aspect terms; $y_{i} \in D_{out}$, $i = 1 \dots m$, where $m$ is the number of sentences; $\Gamma$: generation model; $\theta, \Theta$: model parameters

Ensure: $X$: Sentences of data augmentation

for $i \leftarrow 1$ to $n$ do ▹ Step 1: Fine-tuning
    $g \leftarrow loss(\Gamma([y_{i}, a_{i}], \theta_{i}), s_{i})$
    $g^{'} \leftarrow loss(\Gamma([y_{i}, a_{i}, p_{i}], \theta_{i}), s_{i})$
    $\theta_{i+1} \leftarrow Train(g, g^{'}, \theta_{i})$
end for
$\Theta \leftarrow \theta_{n+1}$
for $i \leftarrow 1$ to $m$ do ▹ Step 2: Generation
    $X_{i}^{'} \leftarrow \emptyset$
    for $j \leftarrow 1$ to 2 do
        randomly select $a$ and $p$ from sets $A$ and $P$, respectively
        $x_{j} \leftarrow \Gamma([y_{i}, a], \Theta)$
        $x_{j}^{'} \leftarrow \Gamma([y_{i}, a, p], \Theta)$
        $X_{i}^{'} \leftarrow X_{i}^{'} \cup \{x_{j}, x_{j}^{'}\}$
    end for
    $X_{i} \leftarrow Filter(X_{i}^{'})$
end for
return $X$
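Step 2 of Algorithm 1 can be sketched in Python as follows. The generation model and the filter are passed in as callables, so the toy `toy_generate` and `keep_first` functions below are stand-ins for the fine-tuned T5 model and the entropy filter, not their actual implementations; all names are illustrative.

```python
import random
from typing import Callable, List, Optional, Sequence

POLARITIES = ("negative", "neutral", "positive")

# generate(y, aspect, polarity) returns one generated sentence;
# polarity=None corresponds to aspect replacement R_a.
Generator = Callable[[str, str, Optional[str]], str]
Filter = Callable[[List[str]], List[str]]


def augment(
    out_domain: Sequence[str],
    aspects: Sequence[str],
    generate: Generator,
    keep: Filter,
    rng: random.Random,
    rounds: int = 2,
) -> List[List[str]]:
    """For each out-of-domain sentence, build 2 * rounds candidates and filter them."""
    augmented = []
    for y in out_domain:
        candidates = []
        for _ in range(rounds):
            a = rng.choice(aspects)
            p = rng.choice(POLARITIES)
            candidates.append(generate(y, a, None))  # x_j  (R_a)
            candidates.append(generate(y, a, p))     # x_j' (R_ap)
        augmented.append(keep(candidates))
    return augmented


def toy_generate(y: str, aspect: str, polarity: Optional[str]) -> str:
    """Placeholder generator that only tags the input; a real model would rewrite it."""
    tag = polarity if polarity is not None else "unspecified"
    return f"[{aspect}/{tag}] {y}"


def keep_first(candidates: List[str]) -> List[str]:
    """Placeholder filter corresponding to t = 1."""
    return candidates[:1]


result = augment(
    ["Great place.", "Too loud."], ["food", "service"],
    toy_generate, keep_first, random.Random(0),
)
```

With `rounds=2`, each out-of-domain sentence yields four candidates before filtering, which matches the four sentences per input described in Section 3.2.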

4. Experiment

In this section, we introduce the in-domain and out-of-domain datasets, the experimental setting and the study of hyper-parameters, the baseline methods, the experimental results and analysis, the ablation study, and the case study.

4.1. Datasets

We conduct experiments on three public fine-grained sentiment analysis datasets: Restaurant and Laptop, from the SemEval 2014 ABSA task (https://alt.qcri.org/semeval2014/task4, accessed on 20 February 2023) [1], and Twitter (http://goo.gl/5Enpu7, accessed on 20 February 2023) [40].

We selected coarse-grained datasets similar to the fine-grained data for our data augmentation. Specifically, we used Amazon Electronics (https://jmcauley.ucsd.edu/data/amazon, accessed on 20 February 2023) [41], Yelp (https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset, accessed on 20 February 2023), and Twitter17 (https://alt.qcri.org/semeval2017/task4, accessed on 20 February 2023) [42]. It is important to note that the coarse-grained datasets Amazon Electronics and Yelp are labeled with sentiment polarity for scores ranging from 1 to 5. The sentiment polarity of the Twitter17 dataset is divided into three categories: negative, neutral, and positive, and this aligns with the ABSA task, so no special handling is required.

The descriptions of the datasets used are shown in Table 1 and Table 2. For fine-grained datasets, the number of sentiment labels exceeds the number of sentences because a sentence may contain two or more sentiments. For coarse-grained datasets, each sentence has only one sentiment. To ensure sample uniformity, for datasets with sentiment scores ranging from 1 to 5, the same number of samples is used for each score, and for datasets with three sentiment categories, the same number of samples is used for each category. Some examples from each dataset are shown in Table 3.
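The uniform sampling described above can be sketched as follows. This is an illustrative helper, not the authors' preprocessing script: the function name `balanced_sample` and the `(text, label)` record layout are assumptions. It draws the same number of sentences for every label, so that, for example, 5000 samples from a five-score dataset contain 1000 per score.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple

Record = Tuple[str, Hashable]  # (sentence, coarse-grained label)


def balanced_sample(records: Sequence[Record], total: int, rng: random.Random) -> List[Record]:
    """Draw `total` records with the same number of samples for every label."""
    by_label = defaultdict(list)
    for text, label in records:
        by_label[label].append(text)
    per_label, remainder = divmod(total, len(by_label))
    if remainder:
        raise ValueError("total must be divisible by the number of labels")
    sample = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < per_label:
            raise ValueError(f"not enough samples for label {label!r}")
        sample.extend((text, label) for text in rng.sample(pool, per_label))
    return sample


# Toy corpus with scores 1 to 5 and four reviews per score.
corpus = [(f"review {score}-{i}", score) for score in range(1, 6) for i in range(4)]
subset = balanced_sample(corpus, total=10, rng=random.Random(0))
```

The divisibility check explains why sample sizes such as 5000 suit a five-score dataset and 6000 suit a three-category one.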

Our data augmentation method generates new sentences by learning from out-of-domain coarse-grained datasets that are similar to the in-domain fine-grained datasets. The Yelp dataset comes from a website where the users give their reviews of restaurants, and the sentences are similar to the fine-grained Restaurant dataset. The Amazon Electronics dataset comes from the reviews of electronic products on an online shopping website, and the sentences are similar to the Laptop dataset. The Twitter17 dataset was proposed in SemEval-2017 Task 4 with the goal of determining whether a tweet is negative, neutral, or positive, and the sentences are similar to the Twitter dataset. In summary, the sentences generated on Yelp, Amazon, and Twitter17 will be added to the training sets of Restaurant, Laptop, and Twitter, respectively.

4.2. Experimental Setting

During data augmentation, we use LoRA [43] for parameter-efficient transfer learning, and T5-base (https://huggingface.co/t5-base, accessed on 20 February 2023) is utilized as the generation model, trained for 100 epochs with the batch size set to 16. The hyperparameter t determines how many generated sentences participate in training. It is set to 1 for the Restaurant and Twitter datasets and 2 for the Laptop dataset. A study of t can be found in Section 4.5.
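For readers unfamiliar with LoRA, the sketch below shows the low-rank update it applies to a frozen weight matrix: the layer output becomes $Wx + \frac{\alpha}{r} BAx$, where only the small matrices $A$ and $B$ are trained. The matrices, rank, and scaling value are toy numbers chosen for illustration (the LoRA scaling factor here is unrelated to the loss weight $\alpha$ in Equation (11)); the rank and scaling used in the experiments are not reported in the text.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """Frozen projection W x plus the scaled low-rank update (alpha / r) * B (A x).

    W is d x k and stays frozen; A is r x k and B is d x r and are the only
    trainable parameters."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen pre-trained weight (toy identity)
A = [[1.0, 1.0]]               # r = 1, k = 2
B = [[1.0], [2.0]]             # d = 2, r = 1
x = [1.0, 2.0]

adapted = lora_forward(W, A, B, x, alpha=2.0, r=1)
untouched = lora_forward(W, A, [[0.0], [0.0]], x, alpha=2.0, r=1)
```

When $B$ is all zeros, which is how LoRA initializes it, the adapted layer reproduces the frozen layer exactly, so fine-tuning starts from the pre-trained behavior while updating far fewer parameters than full fine-tuning.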

In the prediction stage, we use BERT-base (https://huggingface.co/bert-base-uncased, accessed on 20 February 2023) as the prediction model, the Adam optimizer [44] for optimization, and LoRA for parameter-efficient transfer learning. The final prediction model is trained for 15 epochs with bert_dropouts set to 0.3, the batch size set to 16, and $\alpha$ set to 1. We use accuracy (Acc) and macro-F1 (F1) to evaluate the proposed method. The model is run on a Tesla V100 32G GPU, and it takes about six hours to train the model.
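For reference, the two evaluation metrics can be computed as in the following sketch. It uses the usual definitions (macro-F1 as the unweighted mean of per-class F1 scores); the label strings and the toy predictions are illustrative.

```python
from typing import List, Sequence

LABELS = ("negative", "neutral", "positive")


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted polarity matches the gold polarity."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[str], labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores: List[float] = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


gold = ["negative", "neutral", "positive", "positive"]
pred = ["negative", "positive", "positive", "positive"]
acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
```

In this toy case the accuracy is 0.75 while the macro-F1 is only 0.6, because the missed neutral class contributes an F1 of zero. Macro-F1 is therefore the more informative metric when the polarity classes are imbalanced.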

4.3. Compared Methods

RAM [45] uses multiple attention mechanisms to capture features that are far apart and is combined with RNN for concurrency capability.

MGAN [46] proposes a fine-grained attention mechanism that can capture the word-level interactions between aspect and context.

TNET [47] uses CNN layers to extract important features and preserves the original contextual information from the LSTM layer.

CDT [48] converts ABSA into a sentence pair classification task by constructing auxiliary sentences based on syntactic structure and fine-tunes BERT afterwards.

RGAT [49] proposes a relational graph attention network to add syntactic information in the tree structure for sentiment prediction.

DualGCN [50] proposes a dual-graph convolutional network model that considers both the complementarity and semantic relevance of syntactic structures.

EDA [36] uses four methods to augment data: random deletion, random insertion, synonym replacement, and random swapping.

Back Translation (BT) [51] augments data by translating monolingual training data into multiple target languages and then back-translating them to the source language.

Conditional BERT (CBERT) [52] converts BERT to conditional BERT and augments sentence data by contextual random word substitution.

C$^{3}$DA [37] generates synthetic sentences through two channels (an aspect-augmented channel and a polarity-augmented channel) conditioned on a given aspect and polarity. The robustness of the ABSA model is then improved by performing contrastive learning on these generated data.

4.4. Main Results

Table 4 displays the experimental results, which indicate that our framework achieves better F1 scores than the best baseline on the Restaurant, Laptop, and Twitter datasets by 1.7, 0.54, and 0.85, respectively. The baseline results are retrieved from Li et al. [50] and Wang et al. [37]. Our framework outperforms the previous baseline model as well as other data augmentation methods, indicating that learning out-of-domain knowledge through a cross-domain method can significantly improve model performance.

Notably, we observe that the improvement is not significant for the Twitter dataset. We speculate that this is because the Twitter dataset is not domain-specific, making it challenging to train the generated sentences effectively. Our findings indicate that both graph neural-network-based approaches outperform all attention- and syntax-based approaches. This result suggests that graph neural networks can leverage syntactic knowledge, establish word-to-word dependencies, and avoid noise introduced by the attention mechanism.

Although the models that combine BERT with syntactic information and graph neural networks show similar results, the basic BERT method outperforms the attention-based and syntax-based methods. These results show that performance can be significantly improved by using PLMs. Our framework outperforms other data augmentation methods because it leverages a cross-domain approach to enrich its data content, resulting in higher-quality generated sentences. More details can be found in the case study in Section 4.7.

From the results, it is evident that previous data augmentation methods do not significantly enhance performance, as they mainly focus on word replacement or back translation. These approaches have a notable disadvantage: they merely modify existing sentences and cannot effectively introduce new knowledge. In contrast, our framework employs a cross-domain approach, which facilitates learning knowledge from outside the domain and improves the quality of the data.

4.5. Effects of Hyperparameter and Data Volume

To investigate how the number of generated sentences used for training affects prediction performance, we conduct experiments on three datasets. As shown in Figure 5, for Restaurant and Twitter, the best performance is achieved when $t = 1$, while for Laptop, the best performance is achieved when $t = 2$. When $t$ is set to 3 or 4, the performance becomes worse, indicating that the entropy filter can effectively filter out low-quality sentences.

To investigate the impact of the amount of out-of-domain data on the prediction performance, we conducted experiments using 5000, 6000, 9000, and 15,000 samples. Since Yelp and Amazon Electronics have five sentiment categories (scored from 1 to 5), and Twitter17 has three sentiment categories (negative, neutral, positive), we use 5000 for the former and 6000 for the latter to maintain sample balance. As shown in Figure 6, the model achieves the best performance with 5000 samples from the Yelp dataset, and with 15,000 samples from the Amazon Electronics and Twitter17 datasets.

4.6. Ablation Study

We conducted an ablation study to analyze the effectiveness of the key components in our approach. The results are shown in Table 5, where w/o represents some removed components. It can be seen that removing any of the framework components significantly degrades performance. Hence, our framework can generate high-quality sentences that enhance model capabilities.

The results show that the removal of $R_{ap}$ has the most significant impact on Restaurant and Twitter, with F1 scores dropping by 1.89 and 2.14, respectively. Meanwhile, on Laptop, the removal of $R_{a}$ brings the greatest impact, with the F1 score decreasing by 1.87. This may be because Laptop has more samples that implicitly express sentiment [53], which makes it difficult for the model to identify their sentiment polarity. As a result, the performance is poorer because the sentences generated by $R_{ap}$ are of lower quality compared to those generated by $R_{a}$, which only replaces aspect words.

When the module that filters the generated sentences is removed, the framework still achieves fair performance: the F1 scores decrease by 1.66, 0.21, and 1.84 on the three datasets, respectively. This indicates that the quality of the generated sentences is relatively high, while the filter module further improves performance.

Finally, we also compare our method with a framework that removes both $R_{a}$ and $R_{ap}$, meaning that no data augmentation is used, and predictions are only made by the prediction model. The results show that the F1 score is the lowest among the several ablation experiments, which indicates that each module has a positive effect on improving the model’s performance.

4.7. Case Study

Table 6 displays two sample cases showcasing the ability of our framework. The aspect terms are highlighted in red and polarities marked in blue. From the table, we can observe how our framework leverages the knowledge learned from out-of-domain for in-domain data augmentation. The aspect terms are randomly selected from the in-domain training set, and the model is able to learn expressions from out-of-domain sentences. In the first example, our framework learns the opinion words “pricey” and “worth it”, while in the second example, it learns the phrase “fast” and illustrates how quickly the battery consumption occurs during usage. These words or phrases are not present in the in-domain training dataset, indicating that the generated sentences leverage the rich out-of-domain data to generate new diverse in-domain sentences. Consequently, our approach enhances the predictive ability by enriching the data more than before.

Table 7 shows some error examples, most of which are caused by generating low-quality sentences that lack emotional descriptions, such as “The lobster teriyaki and naan”, as well as generating sentences with ambiguous polarity, leading to prediction errors, such as “A new version of the Windows 7 family is now available”. This is because our data augmentation is generated by a generation model rather than a simple word replacement pattern, so the model may generate low-quality sentences. We use an entropy filter to filter out such sentences.

5. Discussion and Conclusions

Due to the lack of data in ABSA, it is difficult to improve the performance of various models. Therefore, we increase the amount of data through data augmentation methods. To improve the quality of the augmented data, we use an entropy filter on the generated data. In detail, we propose a cross-domain generative data augmentation framework (CDGDA) to relieve the issue of scarce data in ABSA task datasets. Our framework leverages a powerful generation model to learn expressions from similar out-of-domain datasets, and uses aspect term replacement and aspect–sentiment pair replacement to generate new fine-grained sentences with improved quality. The results show that our framework performs well on all three datasets, outperforming both previous baselines and other data augmentation methods.

In future work, we plan to adapt our framework to other ABSA tasks that predict multiple elements. In addition, we can try to apply it to language tasks with a lack of data. This approach has the potential to improve performance in various domains, helping to address data scarcity issues across different natural language processing tasks.

Author Contributions

Conceptualization, methodology, and writing—original draft, J.X. and S.W.; data curation, Y.L.; formal analysis, Z.L.; funding acquisition: Y.C. and S.Z.; software, J.X., Y.L. and Z.L.; supervision, S.W.; validation, Y.L. and Z.L.; writing—review and editing, J.X., S.W., Y.L., Z.L., Y.C. and S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Foundation of China (Grant No. 62201385), the Natural Science Fund of Tianjin (Grant No. 19JCZDJC35100), and the Tianjin Science and Technology Plan Project (Technology Innovation Guidance Special Fund) (Grant No. 22YDTPJC00610).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Pontiki, M.; Galanis, D.; Pavlopoulos, J.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, 23–24 August 2014; Association for Computational Linguistics: Dublin, Ireland, 2014; pp. 27–35.
2. Zhao, J.; Liu, K.; Xu, L. Book Review: Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Comput. Linguist. 2016, 42, 595–598.
3. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 4171–4186.
4. Chen, Y.; Zhang, Z.; Zhou, G.; Sun, X.; Chen, K. Span-based dual-decoder framework for aspect sentiment triplet extraction. Neurocomputing 2022, 492, 211–221.
5. Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 7871–7880.
6. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 5485–5551.
7. Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Comput. Surv. 2023, 55, 1–35.
8. Yan, H.; Dai, J.; Ji, T.; Qiu, X.; Zhang, Z. A Unified Generative Framework for Aspect-based Sentiment Analysis. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Virtual Event, 1–6 August 2021; pp. 2416–2429.
9. Zhang, W.; Deng, Y.; Li, X.; Yuan, Y.; Bing, L.; Lam, W. Aspect Sentiment Quad Prediction as Paraphrase Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 7–11 November 2021; pp. 9209–9219.
10. Mao, Y.; Shen, Y.; Yang, J.; Zhu, X.; Cai, L. Seq2Path: Generating Sentiment Tuples as Paths of a Tree. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, 22–27 May 2022; Association for Computational Linguistics: Dublin, Ireland, 2022; pp. 2215–2225.
11. Chen, D.Z.; Faulkner, A.; Badyal, S. Unsupervised Data Augmentation for Aspect Based Sentiment Analysis. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; International Committee on Computational Linguistics: Gyeongju, Republic of Korea, 2022; pp. 6746–6751.
12. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
13. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Volume 2, Montreal, QC, Canada, 8–13 December 2014; MIT Press: Cambridge, MA, USA, 2014; pp. 3104–3112.
14. Kuchaiev, O.; Ginsburg, B. Factorization tricks for LSTM networks. arXiv 2018, arXiv:1703.10722.
15. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
16. Zhang, F.; Zhang, M.; Liu, S.; Sun, Y.; Duan, N. Enhancing RDF Verbalization with Descriptive and Relational Knowledge. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2023, 6, 1–18.
17. Mengge, X.; Yu, B.; Zhang, Z.; Liu, T.; Zhang, Y.; Wang, B. Coarse-to-Fine Pre-training for Named Entity Recognition. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 19–20 November 2020; pp. 6345–6354.
18. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
19. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692.
20. Dong, L.; Yang, N.; Wang, W.; Wei, F.; Liu, X.; Wang, Y.; Gao, J.; Zhou, M.; Hon, H.W. Unified language model pre-training for natural language understanding and generation. arXiv 2019, arXiv:1905.03197.
21. Bao, H.; Dong, L.; Wei, F.; Wang, W.; Yang, N.; Liu, X.; Wang, Y.; Piao, S.; Gao, J.; Zhou, M.; et al. UNILMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training. arXiv 2020, arXiv:2002.12804.
22. Song, K.; Tan, X.; Qin, T.; Lu, J.; Liu, T.Y. MASS: Masked sequence to sequence pre-training for language generation. arXiv 2019, arXiv:1905.02450.
23. Gao, T.; Fisch, A.; Chen, D. Making Pre-trained Language Models Better Few-shot Learners. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1 August 2021; pp. 3816–3830.
24. Zhang, W.; Li, X.; Deng, Y.; Bing, L.; Lam, W. A Survey on Aspect-Based Sentiment Analysis: Tasks, Methods, and Challenges. IEEE Trans. Knowl. Data Eng. 2022, 1–20.
25. Liu, P.; Joty, S.; Meng, H. Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; Association for Computational Linguistics: Lisbon, Portugal, 2015; pp. 1433–1443.
26. Jiang, L.; Yu, M.; Zhou, M.; Liu, X.; Zhao, T. Target-dependent Twitter Sentiment Classification. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; Association for Computational Linguistics: Portland, OR, USA, 2011; pp. 151–160.
27. Chen, S.; Liu, J.; Wang, Y.; Zhang, W.; Chi, Z. Synchronous Double-channel Recurrent Network for Aspect-Opinion Pair Extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 6515–6524.
28. Peng, H.; Xu, L.; Bing, L.; Huang, F.; Lu, W.; Si, L. Knowing What, How and Why: A Near Complete Solution for Aspect-Based Sentiment Analysis. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 8600–8607.
29. Cai, H.; Xia, R.; Yu, J. Aspect-Category-Opinion-Sentiment Quadruple Extraction with Implicit Aspects and Opinions. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Virtual Event, 1–6 August 2021; pp. 340–350.
30. Ding, Y.; Yu, J.; Jiang, J. Recurrent Neural Networks with Auxiliary Labels for Cross-Domain Opinion Target Extraction. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Association for the Advancement of Artificial Intelligence: San Francisco, CA, USA, 2017; Volume 31.
31. Chen, Z.; Qian, T. Bridge-Based Active Domain Adaptation for Aspect Term Extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Virtual Event, 1–6 August 2021; pp. 317–327.
32. Ben-David, E.; Oved, N.; Reichart, R. PADA: Example-based Prompt Learning for on-the-fly Adaptation to Unseen Domains. Trans. Assoc. Comput. Linguist. 2022, 10, 414–433.
33. Fadaee, M.; Bisazza, A.; Monz, C. Data Augmentation for Low-Resource Neural Machine Translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; Association for Computational Linguistics: Vancouver, BC, Canada, 2017; pp. 567–573.
34. Li, G.; Wang, H.; Ding, Y.; Zhou, K.; Yan, X. Data augmentation for aspect-based sentiment analysis. Int. J. Mach. Learn. Cybern. 2023, 14, 125–133.
35. Coulombe, C. Text data augmentation made simple by leveraging NLP cloud APIs. arXiv 2018, arXiv:1812.04718.
36. Wei, J.; Zou, K. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Association for Computational Linguistics: Hong Kong, China, 2019; pp. 6382–6388.
37. Wang, B.; Ding, L.; Zhong, Q.; Li, X.; Tao, D. A Contrastive Cross-Channel Data Augmentation Framework for Aspect-Based Sentiment Analysis. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; International Committee on Computational Linguistics: Gyeongju, Republic of Korea, 2022; pp. 6691–6704.
38. Li, J.; Yu, J.; Xia, R. Generative Cross-Domain Data Augmentation for Aspect and Opinion Co-Extraction. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; Association for Computational Linguistics: Seattle, WA, USA, 2022; pp. 4219–4229.
39. Wang, Y.; Xu, C.; Sun, Q.; Hu, H.; Tao, C.; Geng, X.; Jiang, D. PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; Association for Computational Linguistics: Dublin, Ireland, 2022; pp. 4242–4255.
40. Dong, L.; Wei, F.; Tan, C.; Tang, D.; Zhou, M.; Xu, K. Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA, 22–27 June 2014; Association for Computational Linguistics: Baltimore, MD, USA, 2014; pp. 49–54.
41. He, R.; McAuley, J. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In Proceedings of the 25th International Conference on World Wide Web, Montreal, QC, Canada, 11–15 April 2016; pp. 507–517.
42. Rosenthal, S.; Farra, N.; Nakov, P. SemEval-2017 Task 4: Sentiment Analysis in Twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Vancouver, BC, Canada, 3–4 August 2017; Association for Computational Linguistics: Vancouver, BC, Canada, 2017; pp. 502–518.
43. Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-rank adaptation of large language models. arXiv 2021, arXiv:2106.09685.
44. Loshchilov, I.; Hutter, F. Fixing weight decay regularization in Adam. arXiv 2017, arXiv:1711.05101.
45. Chen, P.; Sun, Z.; Bing, L.; Yang, W. Recurrent Attention Network on Memory for Aspect Sentiment Analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; Association for Computational Linguistics: Copenhagen, Denmark, 2017; pp. 452–461.
46. Fan, F.; Feng, Y.; Zhao, D. Multi-grained Attention Network for Aspect-Level Sentiment Classification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; Association for Computational Linguistics: Brussels, Belgium, 2018; pp. 3433–3442.
47. Li, X.; Bing, L.; Lam, W.; Shi, B. Transformation Networks for Target-Oriented Sentiment Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, VIC, Australia, 15–20 July 2018; Association for Computational Linguistics: Melbourne, VIC, Australia, 2018; pp. 946–956.
48. Sun, K.; Zhang, R.; Mensah, S.; Mao, Y.; Liu, X. Aspect-Level Sentiment Analysis Via Convolution over Dependency Tree. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Association for Computational Linguistics: Hong Kong, China, 2019; pp. 5679–5688.
49. Wang, K.; Shen, W.; Yang, Y.; Quan, X.; Wang, R. Relational Graph Attention Network for Aspect-based Sentiment Analysis. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3229–3238.
50. Li, R.; Chen, H.; Feng, F.; Ma, Z.; Wang, X.; Hovy, E. Dual Graph Convolutional Networks for Aspect-based Sentiment Analysis. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Virtual Event, 1–6 August 2021; pp. 6319–6329.
51. Sennrich, R.; Haddow, B.; Birch, A. Improving Neural Machine Translation Models with Monolingual Data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Association for Computational Linguistics: Berlin, Germany, 2016; pp. 86–96.
52. Wu, X.; Lv, S.; Zang, L.; Han, J.; Hu, S. Conditional BERT contextual augmentation. In Proceedings of the Computational Science–ICCS 2019: 19th International Conference, Faro, Portugal, 12–14 June 2019; Proceedings, Part IV 19. Springer: Berlin/Heidelberg, Germany, 2019; pp. 84–95.
53. Li, Z.; Zou, Y.; Zhang, C.; Zhang, Q.; Wei, Z. Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Virtual Event/Punta Cana, Dominican Republic, 7–11 November 2021; pp. 246–256.

Figure 1. An example of the ABSA task. The aspect terms are marked in red and sentiment polarities are marked in blue.

Figure 2. An example of the coarse-grained sentiment analysis task.

Figure 3. Generation model fine-tuning training. $a$, $p$, $R_{a}$, and $R_{a p}$ are the aspect term, sentiment polarity, aspect replacement, and aspect–sentiment pair replacement, respectively.

Figure 4. Sentence generation and sentiment prediction. $A$ and $P$ are the set of aspect terms and the set of sentiment polarities, respectively. Fixed GM is the generation model with fixed parameters.

Figure 5. Acc and F1 scores for different values of the hyperparameter t.

Figure 6. Acc and F1 scores for different orders of magnitude of out-of-domain datasets after data augmentation.

Table 1. The experimental datasets with the sentiment polarity labels. Note that (o) refers to coarse-grained out-of-domain datasets.

Dataset	Split	# Sentences	# Positive	# Neutral	# Negative
Restaurant	Train	1980	2164	637	807
Restaurant	Test	599	727	196	196
Laptop	Train	1454	976	455	851
Laptop	Test	409	337	167	128
Twitter	Train	6051	1507	3016	1528
Twitter	Test	677	172	336	169
Twitter17(o)	Aug	15,000	5000	5000	5000

Table 2. The rest of the coarse-grained out-of-domain datasets, whose sentiments are labeled as ratings from 1 to 5. Note that (o) refers to coarse-grained out-of-domain datasets.

Dataset	Split	# Sentences	Score 1	Score 2	Score 3	Score 4	Score 5
Yelp(o)	Aug	5000	1000	1000	1000	1000	1000
AMZElec ¹ (o)	Aug	15,000	3000	3000	3000	3000	3000

¹ Amazon Electronics.

Table 3. Some examples from each dataset.

Dataset	Examples
Restaurant	(1) Not only was the food outstanding, but the little ‘perks’ were great. #food #Positive #perks #Positive (2) Nevertheless the food itself is pretty good. #food #Positive
Laptop	(1) The software that comes with this machine is greatly welcomed compared to what Windows comes with. #software #Positive #Windows #Negative (2) They also use two totally different operating systems. #operating systems #Neutral
Twitter	(1) We are having an awesome Day, No problem we love madonna too. #madonna #Positive (2) harry potter the last of all will be amazon! #harry potter #Positive
Yelp	(1) a tad overpriced for the quality of the food. service just ok. probably wont be returning. #2.0 (2) Cool place to chill out during the afternoon thunder storms. The frozen Irish coffee was amazing #5.0
Amazon Electronics	(1) pros love it. It allows to attach and reattach the anti static cablecons none that I can think of #5.0 (2) Also, the tech support is not very good. Read the forums I would just buy another product, and I am ready to junk my unit. #1.0
Twitter17	(1) We may not win Academy Awards but we will be the freaking Peoples Choice dammit #0 (2) Im actually excited to record AW on Ps4 tomorrow, it been a while #2

Table 4. Overall performance of different methods on the three datasets. We use accuracy (Acc) and macro-F1 (F1) to measure the performance of various methods. LSTM, Syn., Att., and Aug. mean the methods using the LSTM model, syntactic structure, attention mechanism, and data augmentation, respectively. The underlined values are the best performing of the baseline models. The bolded part is the best result.

Category	Method	Restaurant Acc	Restaurant F1	Laptop Acc	Laptop F1	Twitter Acc	Twitter F1
LSTM	TNET [47]	80.69	71.27	76.54	71.75	74.90	73.60
Syn.	CDT [48]	82.30	74.02	77.19	72.99	74.66	73.66
Syn.	RGAT [49]	83.30	76.08	77.42	73.76	75.57	73.82
Syn.	DualGCN [50]	84.27	78.08	78.48	74.74	75.92	74.29
Att.	RAM [45]	80.23	70.80	74.49	71.35	69.36	67.30
Att.	MGAN [46]	81.25	71.94	75.39	72.47	72.54	70.81
Att.	BERT-base [3]	86.31	80.22	79.66	76.11	76.50	75.23
Syn. and Att.	RGAT + BERT [49]	86.60	81.35	78.21	74.07	76.15	74.88
Syn. and Att.	DualGCN + BERT [50]	87.13	81.16	81.80	78.10	77.40	76.02
Aug.	CBERT [52]	86.27	80.00	79.83	76.12	76.44	75.36
Aug.	BT + BERT [51]	86.47	79.63	79.59	75.79	76.26	75.16
Aug. and Gen.	C$^{3}$DA [37]	86.93	81.23	80.61	77.11	77.55	76.53
Aug. and Gen.	CDGDA (ours)	87.67	83.05	81.65	78.64	78.14	77.38
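
For readers less familiar with the two metrics reported in Table 4, the following minimal sketch computes accuracy and macro-F1 from scratch using their standard definitions. The label names and the toy predictions are illustrative assumptions only, and the evaluation code actually used for the table may differ.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> float:
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy gold labels and predictions (hypothetical, for illustration only).
labels = ["positive", "neutral", "negative"]
y_true = ["positive", "positive", "neutral", "negative", "negative", "negative"]
y_pred = ["positive", "neutral", "neutral", "negative", "negative", "positive"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels)
print(f"Acc = {acc:.4f}, macro-F1 = {f1:.4f}")
```

Because macro-F1 weights each polarity class equally, it is more sensitive than accuracy to errors on the smaller classes, such as the neutral class in the Restaurant and Laptop datasets.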

Table 5. Ablation study of the CDGDA.

Model	Restaurant Acc	Restaurant F1	Laptop Acc	Laptop F1	Twitter Acc	Twitter F1
CDGDA	87.67	83.05	81.65	78.64	78.14	77.38
w/o $R_{a}$	86.68	81.85	80.54	76.77	77.55	76.69
w/o $R_{a p}$	86.51	81.16	80.22	77.10	76.66	75.24
w/o Filter	86.42	81.39	80.85	78.43	76.37	75.54
w/o $R_{a}$ and $R_{a p}$	86.31	80.22	79.66	76.11	76.50	75.23
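
To make the ablation results easier to read, the short sketch below computes how many F1 points each variant loses relative to the full CDGDA model. The F1 values are copied from Table 5; the dictionary layout and variable names are illustrative choices only.

```python
# Macro-F1 scores copied from Table 5.
full = {"Restaurant": 83.05, "Laptop": 78.64, "Twitter": 77.38}
ablations = {
    "w/o R_a": {"Restaurant": 81.85, "Laptop": 76.77, "Twitter": 76.69},
    "w/o R_ap": {"Restaurant": 81.16, "Laptop": 77.10, "Twitter": 75.24},
    "w/o Filter": {"Restaurant": 81.39, "Laptop": 78.43, "Twitter": 75.54},
    "w/o R_a and R_ap": {"Restaurant": 80.22, "Laptop": 76.11, "Twitter": 75.23},
}

# F1 drop (in points) of each ablated variant relative to the full model.
drops = {
    name: {dataset: round(full[dataset] - scores[dataset], 2) for dataset in full}
    for name, scores in ablations.items()
}

for name, per_dataset in drops.items():
    print(name, per_dataset)
```

Every variant scores below the full model on all three datasets, and removing both replacement prompts gives the largest F1 drop on each dataset.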

Table 6. Case study. The aspect terms and polarities are noted in red and blue, respectively.

Source	This sushi tasted too salty. #Negative
out-of-domain	Great food. They have a huge eclectic selection, which is a major plus these days. It is pricey but securely worth it.
CDGDA	Food tastes bad and pricey. #Negative The bacos and samosas are plentiful and taste good. #Positive The cream cheeses are ordinary. #Neutral Fish tastes good, but a little pricey, but the takeout menu is worth it. #Positive
Source	I love the size of the keyboard. #Positive
out-of-domain	Decent when the screens on its brightest, but not good when dim. Bubble soft card is useless. The battery runs out fast.
CDGDA	The volume control is very easy to use. #Positive The computer’s assistance is useless. #Negative The battery is rated at 6.5 h but the battery life is 4.5 h. #Negative The machine loads very fast. #Positive

Table 7. Case study. Some error examples.

Datasets	CDGDA
Restaurant	The lobster teriyaki and naan. #Positive Coffee was good, but not great. #Positive
Laptop	The firewire connection is the only way to connect. #Positive i want to install software, #Positive
Twitter	larry king’s life is a ninja, #Negative a new version of the windows 7 family is now available. #Negative

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
