Article

Development of a Multilingual Model for Machine Sentiment Analysis in the Serbian Language

Department of Computer Science and Information Technology, School of Electrical Engineering, University of Belgrade, Bulevar kralja Aleksandra 73, 11000 Belgrade, Serbia
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(18), 3236; https://doi.org/10.3390/math10183236
Submission received: 18 July 2022 / Revised: 26 August 2022 / Accepted: 3 September 2022 / Published: 6 September 2022

Abstract

In this research, a method of developing a machine model for sentiment processing in the Serbian language is presented. The Serbian language, unlike English and other popular languages, belongs to the group of languages with limited resources. Three different data sets were used as a data source: a balanced set of music album reviews, a balanced set of movie reviews, and a balanced set of music album reviews in English—MARD—which was translated into Serbian. The evaluation included applying developed models with three standard algorithms for classification problems (naive Bayes, logistic regression, and support vector machine) and applying a hybrid model, which produced the best results. The models were trained on each of the three data sets, while a set of music reviews originally written in Serbian was used for testing the model. By comparing the results of the developed model, the possibility of expanding the data set for the development of the machine model was also evaluated.

1. Introduction

One of the important subfields of machine learning is natural language processing (NLP). It includes the development of software systems that are able to automatically analyze and understand natural human languages. The largest amount of research in the field of natural language processing has been done for the English language, which is the most widely spoken language in the world.
Sentiment analysis, as a segment in natural language processing, deals with the development of models that are able to determine the subject’s attitude on a given topic based on the content of the text [1]. The most common example is surveying users’ opinions on forums, news portals, online stores, social networks, etc. [2,3,4,5,6].
This research aims to develop a machine model for sentiment analysis in the Serbian language. The creation of a model for the Serbian language is largely conditioned by the availability of resources, and for this reason, the first part of the research dealt exclusively with the collection of data for machine processing. During the development of the model, some available data sets in the Serbian language were used, as well as data sets in the English language, which were translated into the Serbian language using the Google Translate API.
In today’s NLP research, English is still the dominant language [7]. This is due to the widespread use of English in international communication and to the many new NLP solutions that have been developed for the English-speaking area. In addition, NLP resources for the English language are widely available to the public, which increases the amount of research on that language [8]. In the digital sphere, it is estimated that 60% of the content on the Internet is written in English.
With the development of NLP models for deep learning, the number of data sets of the most popular languages is increasing because such models have to work with large amounts of training data. On the other hand, collecting a large amount of data and data sets for less widely used languages is very difficult. Recently, the interest of researchers in the computer processing of other languages, such as Chinese, Japanese, German, or Arabic, has been increasing [9,10]. This trend can be seen in [11], which shows the most-represented natural languages in NLP research over the last 20 years: at the top are English, Mandarin Chinese, Japanese, and German, followed by Arabic, French, Spanish, Italian, and Czech. Currently, it is estimated that there are approximately 7100 languages on the planet, of which 4000 have a script. Still, more than 6500 languages are not present in digital form, i.e., it is impossible to find data sets in those languages on the Internet [12].
A precise definition of low-resource languages does not exist in NLP research [7,13,14,15]. In this study, the Serbian language will be considered as a language with low resources because it meets the following criteria: (a) the Serbian language is moderately widespread in digital form; (b) there are a small number of available preprocessing tools and resources in Serbian, and it is not possible to obtain large amounts of annotated data in Serbian for the creation of NLP models; and (c) the number of researchers involved in the development of NLP tools for the Serbian language is minimal and financial resources are limited. Furthermore, applications of a multilingual or cross-lingual approach are rare in the Serbian language, as is the development of resources for analysis. This was also the main motivation of the authors, who intend to use the developed models to show which approach works best in the sentiment analysis of short texts.
The second section provides an overview of the application of multilingual models based on traditional machine learning algorithms and modern approaches. The third section describes the methods used for collecting and preparing the data for machine processing. It explains some of the most important algorithms used for sentiment analysis, as well as the methodologies applied for extracting the useful attributes from the text. In the fourth section, the results of developing the model for the Serbian language are presented. The discussion is presented in the fifth section. Finally, the last section describes the key outcomes of the research and provides suggestions for improving the sentiment analysis model for the Serbian language.

2. Related Work

Sentiment analysis (or opinion mining) represents the problem of the automatic detection and processing of attitudes, assessments, and opinions expressed by people towards certain entities, persons, events, issues, topics, and their properties [16,17]. The main data sources for sentiment analysis are blogs and entertainment sites with user reviews, e-commerce sites such as Amazon, eBay, etc. with user reviews, social media content generated on Twitter and Facebook, and data from communication mediums such as SMS, WhatsApp, Viber, etc.
The basic categories of sentiment analysis techniques are lexicon-based and machine learning techniques. In the first group, opinions are identified based on manual or automatic processing techniques, such as dictionary-based or corpus-based methods. Most machine learning techniques consider sentiment analysis as a supervised learning problem, but modern methods have also explored semi-supervised approaches. Sentiment analysis is usually treated as a classification problem [18]. This problem includes narrower problems such as polarity detection, where the goal is the binary division of texts into positive and negative. It also includes subjectivity detection, where the goal is to distinguish objective texts from subjective ones. Sarcasm detection is also represented in the research as a sentiment analysis problem [19]. Due to the enormous commercial need for the automatic detection and processing of people’s attitudes, sentiment analysis is one of the NLP problems with the most practical applications.
A multilingual model is a single model that can handle multiple languages simultaneously. The biggest problem with sentiment analysis research is the lack of resources for low-resource languages, such as Serbian [20]. In a study on the needs of a developed machine model, reviews in Serbian and English were used. For the efficient sentiment analysis of multilingual content and the identification of positive and negative comments, the authors used traditional techniques such as a naive Bayes classifier, SVM classifier, simple neural networks, convolutional neural networks, and recurrent neural networks—the most famous of which is long short-term memory (LSTM) [21,22]. Most of today’s researchers use powerful models such as mBERT (multilingual bidirectional encoder representations from transformers), which supports 104 languages [23,24,25]. In some research, an LSTM model was used instead of the BERT model to capture the sentiment of multilingual comments [26,27]. For example, the research by Žitnik et al. [28] applied a customized BERT adapter to a newly annotated data set of Slovene news articles. Other authors consider an Electra approach to be computationally more efficient than a BERT model, and the authors of [29] developed a transformer model that was pre-trained on 8 billion tokens of crawled text from web domains with South Slavic languages.
Mozetič et al. [30] conducted multilingual sentiment analysis for 13 languages, including Serbian, Bosnian, Croatian, Slovenian, Bulgarian, Slovak, etc., based on Twitter data and compared the performance of the most famous classifier models. They concluded that the size and quality of the data sets impacts the performance more than the model selection does. The rules of negation in the Serbian language and their influence on polarity in Twitter data were investigated by Ljajić and Marovac [31]. They used a lexicon-based approach and machine learning methods. Batanović dealt with the sentiment analysis and semantic similarity of short texts in the Serbian language [32].
Some authors have used IMDb user movie reviews and translated them into Serbian, obtaining outstanding results [33]. Other authors have translated tweets written in various European languages into English to examine the results of the Eurovision song contest [34]. The authors of [35] presented the process of developing a sentiment analysis framework for the Serbian language. Stankovic et al. presented a study on the sentiment analysis of Serbian novels from the period 1840–1920. Their comparison shows that models trained on labeled data sets of movie reviews cannot be used successfully for the sentiment analysis of sentences in old novels [36].
Table 1 shows a review of analyzed research papers in the form of covered languages, applied techniques, and analyzed data sets. The analysis includes data sets in the Serbian language or another low-resource language and different types of sentiment analysis applications in those languages. It can be noticed that research works and open data sets for sentiment analysis in the Serbian language are very rare.

3. Materials and Methods

The most common problem when analyzing sentiment in Serbian and other low-resource languages is the irregular distribution of positive, neutral, and negative examples within one categorized data set. The first example of a balanced data set in the Serbian language is SerbMR, which is based on movie reviews. Based on this data set, a data set was created for sentiment analysis in the Serbian language, containing reviews of music albums and songs.

3.1. Data Sources and Challenges

Since there is no single database with a sufficiently large number of reviews of music albums in the Serbian language, various internet portals were used as data sources. A greater number of sources brings greater diversity in review writing style and review content, evaluation method, etc. All of this complicates the process of formatting the output data in a unique way, and in some cases, it requires manual data processing.
The collection of reviews was done with the help of a developed intelligent agent in the Python programming language using the BeautifulSoup library. For each portal, links with review texts were first collected, and then the text of the review with the grade was separately extracted from the web portal. During the data collection process, there were several key challenges that were classified into several categories:
  • absence of negative reviews
  • absence of grades with the text of the review itself
  • unfavorable web structure of the portal (album reviews were not separated into different categories on the portal and web pages with texts containing reviews could not be automatically filtered from other articles)
  • adverse web structure of the web portal with review, including:
    - the grade is not separated from the rest of the text (often in the middle of the text)
    - textual content within the element, with a rating
    - template content at the beginning and/or end of the text
    - unnecessary content with the review itself (for example, JavaScript code)
  • different scales and assessment methods
Sources that did not have ratings with the text of the review or that did not have negative reviews in the set were excluded from consideration. Additional programmatic and manual text filtering solved the problems arising from a site’s unfavorable structure and pages. Since most portals used a ten-class rating scale where 1 was the lowest rating and 10 was the highest, the other scales were mapped to the ten-class scale. The final data set with the distribution of grades is shown in Table 2.
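The mapping of other scales to the ten-class scale can be sketched as follows. The text does not state how the mapping was done, so the linear rescaling below, the function name `to_ten_point`, and the rounding rule are all assumptions made for illustration only.

```python
def to_ten_point(grade, scale_min, scale_max):
    """Linearly rescale a grade from [scale_min, scale_max] to the 1-10 scale.

    Assumption: a linear mapping followed by Python's built-in rounding;
    the article does not specify the actual mapping that was used.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    fraction = (grade - scale_min) / (scale_max - scale_min)
    return round(1 + 9 * fraction)


# Hypothetical examples: a five-star portal and a percentage-style portal.
five_star_top = to_ten_point(5, 1, 5)
percentage_score = to_ten_point(80, 0, 100)
```

A portal that already uses the ten-class scale passes through unchanged, since `to_ten_point(g, 1, 10)` returns `g` for every integer grade.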
Table 3 shows the statistics of the collected reviews from the 13 selected web portals. A balanced data set is a set in which the examples of each class are evenly represented. In this data set of music reviews, three classes were formed: negative reviews (grades from 1 to 4), neutral reviews (grades 5 and 6), and positive reviews (grades from 7 to 10). As the number of negative reviews among the collected data was smaller than the number of positive and neutral reviews, for each negative review, the best-matching positive and neutral reviews were sought using a modification of the algorithm shown in [37]. When pairs were formed, the following characteristics of each review were considered:
  • Review grading—negative reviews were paired with positive ones according to the principle of inverse grades, e.g., 1 by 10, 2 by 9, etc., and the subset of neutral reviews consisted of an equal number of reviews rated as 5 and 6.
  • Review length—the difference in the word counts between the pairs should be minimal.
  • Review source—different portals had different review writing styles and different criteria, and they covered different music genres. When pairs were found, preference was given to the reviews from the same portal.
The algorithm used consists of the following steps:
  • Finding all potential pairs: for each negative review, a list of possible pairs is found from the positive and neutral set, respecting the above criteria. In case there is no such pair, the first criterion to be relaxed is the source of the review, followed by the difference in the length of the reviews. The criterion of the review score is never relaxed. The criteria are relaxed cyclically until a compatible review is found.
  • Sorting of negative reviews: the negative reviews are sorted in ascending order of their number of potential pairs, to maximize the number of pairs found in one iteration.
  • Matching of reviews: when there are several positive candidates, the one with the smallest difference in length is chosen, which also reduces the total difference in length between the positive and negative reviews. Neutral reviews are selected in a similar way, except that the rule of equal representation of reviews with a rating of 5 and 6 is respected as much as possible.
The steps are repeated cyclically until a positive and neutral pair is found for each negative review. An overview of the characteristics of the data set obtained by the used algorithm is given in Table 4 (column A).
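The core of the matching step can be sketched as below. This is a simplified illustration, not the modified algorithm of [37]: it covers only the choice of a positive partner for one negative review, and the dictionary keys (`grade`, `source`, `words`) and function names are hypothetical. Sorting by number of candidates, cyclic relaxation of the length criterion, and the selection of neutral reviews are omitted.

```python
def sentiment_class(grade):
    """Map a 1-10 grade to the three classes described in the text."""
    if grade <= 4:
        return "negative"
    if grade <= 6:
        return "neutral"
    return "positive"


def pick_positive(negative, candidates):
    """Pick a positive partner for one negative review (simplified sketch)."""
    # The grade criterion is never relaxed: 1 pairs with 10, 2 with 9, etc.
    pool = [c for c in candidates if c["grade"] == 11 - negative["grade"]]
    # Prefer the same portal; the source criterion is the first to be relaxed.
    same_source = [c for c in pool if c["source"] == negative["source"]]
    pool = same_source or pool
    if not pool:
        return None
    # Among the remaining candidates, minimize the word-count difference.
    return min(pool, key=lambda c: abs(c["words"] - negative["words"]))


# Hypothetical reviews, for illustration only.
negative_review = {"grade": 2, "source": "portal_a", "words": 100}
positive_reviews = [
    {"grade": 9, "source": "portal_b", "words": 101},
    {"grade": 9, "source": "portal_a", "words": 150},
    {"grade": 10, "source": "portal_a", "words": 100},
]
partner = pick_positive(negative_review, positive_reviews)
```

In this example the review from the same portal is chosen even though its length differs more, which reflects the order in which the criteria are relaxed.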
In addition to the described data set of reviews, two more data sets were used in the research because the Serbian language is a language with limited resources. For the development of the model to be successful, it was necessary that the data be related, and so movie reviews written in Serbian and the MARD data set of music album reviews, originally written in English, were used [38]. The second data set was translated into Serbian using the Google Translate API, and it was used in the model.
In this way, the possibility of expanding the data set for machine text processing was examined. One option was the use of data in the same language but from a different domain, while the other was the use of data originally written in another language and translated into Serbian. An overview of the characteristics of the additional two data sets can be found in Table 4 (columns B and C).

3.2. Sentiment Analysis

Text sentiment analysis is a subgroup within the text classification process. Based on the content of the text, it is necessary to determine the feelings and the opinions of the author of the text according to the topic described in the text. Examples are hotel reviews, movie reviews, comments on social media, and comments in newspaper articles. Sentiment analysis, as a part of natural language processing, solves two problems: the classification of subjectivity and the classification of polarity. It is necessary to separate the subjective from the objective, as well as the positive from the negative, when expressing an attitude about an entity.
Different natural language processing methods and algorithms are used to determine the sentiment of text. These methods can be divided into manual, automatic, and hybrid. One of the main difficulties in language analysis is the complexity of linguistic expressions, along with morphological forms, irony, metaphors, negation, ambiguity of sentences, etc. A diagram of the machine learning classifier is given in Figure 1.
In manual methods, the sentiment is determined based on some simple rules—the sentiment dictionary. However, manual methods are very naive and unreliable because they do not consider how words are connected in sentences. With automatic methods, determining the sentiment of a text is presented as a classification problem—the input data is observed as a vector of values, and during model training, a function is found that maps that vector to the appropriate class. In the testing process, i.e., the prediction, an attribute vector is created from the input text, which is then passed to the machine model to determine the class based on the selected mapping function.

3.3. Text Attribute Vector

The first step in the text sentiment analysis process is to transform the text into an attribute vector. The most frequently used attributes are the presence and frequency of words, the sentiment of words and phrases, and negation.
In the bag-of-words model, text is observed as the unordered set of words contained within it. Each word represents one attribute in the classification model, the order and relationship of the words are ignored, and the value of the attribute is either the number of repetitions of the given word in the text or a binary value (0 or 1) representing the absence or presence of that given word in the text. In addition, sequences of several consecutive words—a bag of n-grams—can be considered as an attribute.
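A minimal sketch of the bag-of-words and bag-of-n-grams attributes is given below. Whitespace tokenization and lowercasing are simplifying assumptions, and the function names are illustrative rather than taken from the research.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all sequences of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=1, binary=False):
    """Build an attribute dictionary from unigrams up to max_n-grams."""
    tokens = text.lower().split()  # assumption: naive whitespace tokenizer
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    if binary:
        # Presence/absence attributes: 1 for every term that occurs.
        return {term: 1 for term in counts}
    return dict(counts)


counts = bag_of_ngrams("good good album")
bigrams_too = bag_of_ngrams("good good album", max_n=2)
presence = bag_of_ngrams("good good album", binary=True)
```

Terms that do not occur in the text are simply absent from the dictionary, which corresponds to the value 0 in the attribute vector.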
Weighting implies a methodology for determining the importance of a word in a document or set of documents. Types of weighting include term frequency weighting, inverse document frequency weighting, and term frequency-inverse document frequency weighting.
With term frequency (TF) weighting, the relevance of document d for a specific query increases with the higher frequency of occurrence of the word t from the query in the document—not linearly, but logarithmically—as follows:
$$TF = \begin{cases} 1 + \log_{10} \mathrm{Count}(t,d), & \mathrm{Count}(t,d) > 0 \\ 0, & \mathrm{Count}(t,d) = 0 \end{cases}$$
For the inverse document frequency (IDF) technique, words that appear in all documents are less important than words that appear in a small number of documents. In this case, words that rarely occur are given more weight, as follows:
$$IDF = \log_{10} \frac{N}{df_t},$$
where N is the total number of documents in the set and $df_t$ is the number of documents in which the word t occurs.
TF-IDF (term frequency-inverse document frequency) is a statistical measure that evaluates how relevant a word is to a document in a collection of documents, which is determined as follows:
$$TF\text{-}IDF = \left(1 + \log_{10} \mathrm{Count}(t,d)\right) \cdot \log_{10} \frac{N}{df_t}$$
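The three weighting formulas can be checked numerically with a short sketch. Documents are assumed to be lists of tokens, and the function names and the toy corpus are illustrative only.

```python
import math


def tf(count):
    """Logarithmic term frequency: 1 + log10(count), or 0 if the term is absent."""
    return 1 + math.log10(count) if count > 0 else 0.0


def idf(n_docs, doc_freq):
    """Inverse document frequency: log10(N / df_t)."""
    return math.log10(n_docs / doc_freq)


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF weight of a term in one document of a tokenized corpus."""
    count = doc_tokens.count(term)
    doc_freq = sum(1 for doc in corpus if term in doc)
    if count == 0 or doc_freq == 0:
        return 0.0
    return tf(count) * idf(len(corpus), doc_freq)


# Hypothetical toy corpus of three tokenized reviews.
corpus = [["great", "album"], ["bad", "album"], ["great", "great", "song"]]
weight_song = tf_idf("song", corpus[2], corpus)    # rare term, higher weight
weight_album = tf_idf("album", corpus[0], corpus)  # common term, lower weight
```

A term that occurs in every document gets an IDF of zero, so its TF-IDF weight vanishes regardless of how often it appears.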
The Serbian language is a morphologically rich language, and so one word can have many different forms. A word can change by case, gender, number (singular or plural), and verb tense. As computer systems cannot by themselves distinguish fundamentally different words from morphological variants of the same word, it is necessary to reduce the various forms to the basic word form. Morphological changes can be classified into two groups:
  • Inflectional morphology—different forms of one word (e.g., book, books, etc.)
  • Derivational morphology—derivation of new words from the basic one:
    • derivation by adding a suffix (e.g., logic => logical)
    • derivation by adding a prefix (e.g., pure => impure)
    • derivation by combining several words (compounds) (e.g., snowball, grandmother, upstream, etc.)
The two basic methods of morphological normalization are: word stemming and word lemmatization. Stemming is a methodology similar to word rooting, but without the knowledge of linguistics. Stemmers cut off the ends of words but do not recognize the concept of suffixes, and the cutting is instead implemented based on a list of rules (maps or regular expressions).
Lemmatization is a more complex procedure than stemming. It is most often implemented in the form of a separate machine model with the help of morphological dictionaries, which map different forms of words into lemmas. Lemmatization also depends on the context of the text.
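Rule-based stemming of the kind described above can be illustrated with a toy sketch. The two English suffix rules below are hypothetical and chosen only to mirror the examples in the text (book/books, logic/logical); they are not the rules of the Serbian or Croatian stemmers used in this research.

```python
import re

# Hypothetical rule list: (pattern for the word ending, replacement).
RULES = [
    (re.compile(r"ical$"), "ic"),
    (re.compile(r"s$"), ""),
]


def stem(word, min_length=3):
    """Apply the first rule that matches and leaves a long enough stem."""
    for pattern, replacement in RULES:
        candidate = pattern.sub(replacement, word)
        if candidate != word and len(candidate) >= min_length:
            return candidate
    return word


stems = [stem(w) for w in ["book", "books", "logic", "logical", "is"]]
```

The minimum stem length is a common safeguard against over-stemming short words; its value here is an arbitrary assumption.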

3.4. Supervised Classification Algorithms

In this research, three standard algorithms were used for the classification problems: naive Bayes (NB), logistic regression (LR), and support vector machine (SVM).

3.4.1. Naive Bayes

Naive Bayes is one of the most widespread and successful algorithms for text classification. The algorithm is based on Bayes’ probability theorem:
$$P(y \mid x) = \frac{P(y)\,P(x \mid y)}{P(x)}$$
where y is the class, x is the input data, $P(y \mid x)$ is the posterior probability, i.e., the probability of y given that x has occurred, $P(x \mid y)$ is the likelihood function, and P(x) and P(y) are the probabilities that x and y will occur.
The classification decision is made based on the maximum value of the posterior probability. Based on the input data, we calculate the probability for each of the possible classes and choose the maximum value.
The prefix “naive” comes from the assumption that the attributes are conditionally independent of each other and that all attributes are equally important. Although the assumptions about attribute independence are often not correct, in practice, this classifier offers good results.
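The decision rule can be sketched for tokenized text as follows. The multinomial word model and the add-one (Laplace) smoothing controlled by `alpha` are assumptions of this sketch, and the function names are illustrative. Because P(x) is the same for every class, it is dropped from the comparison, and logarithms are used to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods per class."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(labels)),
            "log_like": {
                w: math.log((word_counts[label][w] + alpha) / total)
                for w in vocab
            },
        }
    return model


def predict_nb(model, tokens):
    """Return the class with the maximum (log) posterior score."""
    def score(label):
        params = model[label]
        # Words outside the training vocabulary are ignored.
        return params["log_prior"] + sum(
            params["log_like"][w] for w in tokens if w in params["log_like"]
        )
    return max(model, key=score)


# Hypothetical toy training set.
train_docs = [["great", "album"], ["great", "song"],
              ["boring", "album"], ["bad", "song"]]
train_labels = ["positive", "positive", "negative", "negative"]
nb_model = train_nb(train_docs, train_labels)
predicted = predict_nb(nb_model, ["great", "album"])
```

Summing the per-word log likelihoods is where the conditional independence assumption enters: each word contributes to the score independently of the others.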

3.4.2. Logistic Regression

Logistic regression is the use of a linear regression model for a classification problem. The logistic regression model belongs to the class of probabilistic classifiers. The logistic regression hypothesis is:
$$h(x) = \frac{1}{1 + e^{-(\omega_0 + \omega_1 x_1 + \dots + \omega_n x_n)}}$$
where n is the number of features used in the model. The new data is added to the class that is more probable for it. For h(x) > 0.5, the data is classified in the class y = 1, and for h(x) < 0.5, it is in the class y = 0.
By introducing the fictitious feature x0 = 1, the hypothesis is transformed into the following form:
$$h(x) = \frac{e^{W^T X}}{e^{W^T X} + 1}$$
where W represents the vector of all weight parameters, X represents the vector of all attribute values, and $W^T X$ is their scalar product. Then, for the class separation hyperplane h(x) = 0.5, the following applies:
$$e^{W^T X} = 1$$
$$W^T X = \sum_{i=0}^{n} \omega_i x_i = 0$$
When training a model, the optimal values of the model parameters are determined so that h(x) correctly determines the class y for the input parameters x. The loss function L(h(x), y) defines the measure of the deviation of the hypothesis value from the exact value, on a single piece of data. The error function is the average of the loss function values on all data from the observed set:
$$J(\omega) = \frac{1}{m} \sum_{i=1}^{m} L\left(h(x_i), y_i\right)$$
In linear regression, the error function uses the mean squared deviation of h(x) from y; however, due to the nature of the logistic classifier and the need for the error and loss functions to be convex, it is not a suitable loss function for logistic regression. The cross-entropy loss function meets the requirements for a logistic regression loss function, as follows:
$$L\left(h(x), y\right) = -y \ln h(x) - (1 - y) \ln\left(1 - h(x)\right)$$
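The hypothesis, the cross-entropy loss, and the error function can be put together in a small sketch. Plain batch gradient descent is used here as an assumed training procedure, and the learning rate, number of epochs, and toy data are arbitrary choices for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """h(x) for a feature vector x whose first entry is the fictitious x0 = 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def cross_entropy(weights, X, y):
    """Error function J: mean cross-entropy loss over the data set."""
    eps = 1e-12  # keeps the logarithms finite
    total = 0.0
    for xi, yi in zip(X, y):
        h = min(max(predict_proba(weights, xi), eps), 1 - eps)
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    return total / len(X)


def fit(X, y, lr=0.3, epochs=1000):
    """Batch gradient descent on the cross-entropy error function."""
    weights = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            err = predict_proba(weights, xi) - yi
            for j, xij in enumerate(xi):
                grad[j] += err * xij / len(X)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical one-feature data set; the first column is x0 = 1.
X_train = [[1, 0], [1, 1], [1, 3], [1, 4]]
y_train = [0, 0, 1, 1]
w_fit = fit(X_train, y_train)
loss_before = cross_entropy([0.0, 0.0], X_train, y_train)
loss_after = cross_entropy(w_fit, X_train, y_train)
```

With all weights at zero, h(x) = 0.5 for every example, so the initial error equals ln 2; training should drive it below that value.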

3.4.3. Support Vector Machine

Similar to logistic regression, with SVM, it is necessary to find a hyperplane that separates data belonging to different classes. Unlike LR, SVM has only a classification decision as an output, and it represents a non-probabilistic classifier.
If we look at the example of a binary classifier, and if the data can be linearly separated, this means that it is possible to construct two parallel hyperplanes that separate the data of different classes. The area of space between the classes is called the margin, and its value should be maximal so that the classification error of the new data is minimal. If n is the number of features used in the model, then the hypothesis is of the form:
$$h(x) = \omega_0 + \omega_1 x_1 + \omega_2 x_2 + \dots + \omega_n x_n = W^T X + \omega_0,$$
and the equation of the separating hyperplane is:
$$h(x) = W^T X + \omega_0 = 0.$$
The factors W and $\omega_0$ can be scaled arbitrarily without changing the hyperplane, but the convention is to choose a value such that the following applies to the support vectors $X^{(sv)}$:
$$y^{(sv)} \left( W^T X^{(sv)} + \omega_0 \right) = 1.$$
SVM has very good performance in a wide range of problems and a much lower tendency to overfit than other methods. Unfortunately, the output is not of the probabilistic type, and depending on the number of features and the amount of data, it can be much slower than other models.
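The scaling convention for the support vectors can be illustrated with a short sketch. It does not train an SVM: it takes a hyperplane that already separates the data, rescales it so that the closest points satisfy y(W^T X + ω0) = 1, and then computes the margin width 2/||W||, which is the standard result for this canonical form. The function names and toy data are illustrative assumptions.

```python
import math


def functional_margins(W, w0, X, y):
    """y_i * (W^T X_i + w0) for every example; labels are +1 or -1."""
    return [
        yi * (sum(w * x for w, x in zip(W, xi)) + w0)
        for xi, yi in zip(X, y)
    ]


def canonical_form(W, w0, X, y):
    """Rescale (W, w0) so that the closest examples have functional margin 1."""
    smallest = min(functional_margins(W, w0, X, y))
    if smallest <= 0:
        raise ValueError("the hyperplane does not separate the data")
    return [w / smallest for w in W], w0 / smallest


def margin_width(W):
    """Distance between the two parallel margin hyperplanes: 2 / ||W||."""
    return 2.0 / math.sqrt(sum(w * w for w in W))


# Hypothetical one-dimensional example: points at 1 and 3, hyperplane at x = 2.
X_points = [[1.0], [3.0]]
y_labels = [-1, 1]
W_canon, w0_canon = canonical_form([2.0], -4.0, X_points, y_labels)
width = margin_width(W_canon)
```

Rescaling changes neither the hyperplane nor the predicted classes, only the numerical value of the functional margin.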

3.5. Multi-Class Classification

A naive Bayesian model is directly applicable to multiclass classification because nothing is assumed about the number of output values in the development of the model. Logistic regression and the support vector method can be applied to multiclass classification by combining the results of a number of binary classifiers in one of the following ways:
  • One-vs-All (OvA) or One-vs-Rest (OvR) approach
  • One-vs-One (OvO) approach
With the OvA approach, k binary classifiers for k classes are constructed. Each of the classifiers receives one class and treats all other classes together as another class. The problem with this principle is the imbalance of the number of examples in individual classifiers, as the number of examples in the second class is far greater than the number of examples in the first. The new data is classified into the class whose binary classifier produces the highest probability of the data belonging to the observed class.
For the OvO approach, $k(k-1)/2$ binary classifiers are constructed, one for each pair of classes. The number of classifiers is much larger than in the OvA approach, but the training data set for each binary classifier is smaller. The new data is classified into the class selected by the most binary classifiers.
Multinomial logistic regression is a natural extension of logistic regression to work with multiple classes. The probability of belonging to a class is obtained using the function:
$$P(y = t \mid x) = \frac{e^{\sum_{i=0}^{n} \omega_{it} x_i}}{\sum_{j=1}^{k} e^{\sum_{i=0}^{n} \omega_{ij} x_i}} = \frac{e^{W_t^T x}}{\sum_{j=1}^{k} e^{W_j^T x}},$$
where k is the number of classes, n is the number of features, $\omega_{it}$ is the weight parameter of the ith feature for the class t, and x is the feature value vector.
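The multinomial probabilities and the OvO classifier count can be sketched as below. Subtracting the maximum score before exponentiation is a common numerical-stability step that leaves the probabilities unchanged; the function names and the weights are illustrative assumptions.

```python
import math


def softmax_probabilities(weights_per_class, x):
    """P(y = t | x) for every class t, given one weight vector per class."""
    scores = [
        sum(w * xi for w, xi in zip(weights, x))
        for weights in weights_per_class
    ]
    max_score = max(scores)  # shift for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ovo_classifier_count(k):
    """Number of binary classifiers in the One-vs-One approach."""
    return k * (k - 1) // 2


# Hypothetical weights for three classes and two features (x0 = 1 included).
class_weights = [[0.0, 1.0], [0.0, 0.0], [0.0, -1.0]]
probabilities = softmax_probabilities(class_weights, [1.0, 2.0])
```

For the three sentiment classes used in this research, the OvO approach needs three binary classifiers, the same number as OvA; the two approaches diverge only for larger k.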

3.6. Assessment of Classifier Quality

When evaluating the classifier, a new data set that was not used for learning is used. Then, a pre-known class from the data set is compared with the class determined by the classifier. To compare and evaluate the performance of the classifiers, the following evaluation functions are used: accuracy, precision, recall, and f-measure.
In order to define functions for model evaluation, it is necessary to first explain the confusion matrix on the example of a binary classifier with the classes 0 and 1. Then, for each data record we want to classify, we distinguish four states: true positive (TP), false positive (FP), true negative (TN), and false negative (FN).
The accuracy of the classifier represents the percentage of successfully classified data. The precision of the classifier represents the percentage of the data classified as positive that are truly positive. The recall of a classifier complements precision: of the data that are actually positive, it is the percentage that is selected as positive. The F1 measure combines precision and recall as their harmonic mean.
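The four evaluation functions follow directly from the confusion matrix counts, as in this minimal sketch for a binary classifier with classes 0 and 1. Returning 0.0 when a denominator is zero is an assumed convention, and the labels are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from TP, FP, TN, and FN counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: TP = 2, FN = 1, FP = 1, TN = 2.
metrics = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```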
The technique of n-fold cross-validation involves dividing the data set into n parts and then running n iterations, in each of which n − 1 parts are used for training the model and the remaining part is used for validation. Finally, the average of all obtained values is taken as the result of the evaluation function.
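The splitting and averaging can be sketched as follows. The folds here are contiguous and unshuffled, which is a simplifying assumption (no stratification by class), and `evaluate` is a hypothetical callback that trains on one index list and returns a score for the other.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (training, validation) index lists for each of the n_folds parts."""
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        training = [i for i in range(n_samples) if i < start or i >= stop]
        yield training, validation
        start = stop


def cross_validate(evaluate, n_samples, n_folds=5):
    """Average the evaluation function over all folds."""
    scores = [
        evaluate(training, validation)
        for training, validation in k_fold_indices(n_samples, n_folds)
    ]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(10, 5))
```

Each example is used for validation exactly once and for training in the remaining iterations.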

4. Results

The models were trained on each of the three described sets, while only the data sets of music reviews, written in Serbian, were used for testing. The parameters that were tested and adjusted in the development of the model can be divided into several groups:
  • the model and input values of the machine learning model
  • number of attributes and stop words
  • number of n-grams
  • value and attribute type
In this research, the existing stemmers for the Serbian language (Milošević [39]) and for the Croatian language (Ljubešić and Pandžić [40,41]) were used because they belong to a group of similar Slavic languages. The model evaluation diagram is shown in Figure 2. The optimal algorithm and parameters, as well as text attributes, were found by examining different combinations with the help of the Pipeline and GridSearchCV classes from the scikit-learn library. Model accuracy was used as the function for model evaluation and comparison, together with the cross-validation technique with n = 5.

4.1. Results of the Three-Class Classification

First, three sentiment classes in the data set were considered. The results obtained when using a data set of music reviews for both training and testing the model are shown in Table 5 (Results (A)). Using the same parameter values, the model was trained on a set of movie reviews (Results (B) in Table 5), as well as music reviews translated from English (Results (C) in Table 5). The model testing was done with a set of music reviews originally written in Serbian.
In the case of the three-class classification, approximately similar results were obtained when the model was trained on a set of movie reviews and tested on a set of music reviews. The reason for this is that both data sets had reviews from the same portals, and so the review writing style and vocabulary were similar, even though they were different domains. In addition, expanding the data set improved the quality and precision of the developed model. When using the translated data set, the results were lower than the results obtained using only music reviews.

4.2. Binary Classification Results

For the binary classification, only the positive and negative reviews from the input data set were considered. The results obtained during the development of the model are shown in Table 6 (Results (A)). In this case, the same models were also trained on a set of movie reviews, as well as on music reviews translated from English, and they were tested on the original Serbian reviews (Results (B) and (C) in Table 6).
With the binary classification, as with the three-class classification, the results were very similar when training on a set of movie reviews. The results of using translated reviews were better in the three-class classification than they were in the binary classification.

4.3. Hybrid Models

The last step in the model evaluation was the implementation and testing of a hybrid model: a naive Bayes–support vector machine (NB–SVM) hybrid [42]. This model combines a linear model with the Bayesian model by scaling the word-frequency attributes with the NB log-count ratio vector of the positive and negative classes. The main model is a linear classifier:
$$y^{(i)} = \operatorname{sign}\left(W^{T} x^{(i)} + b\right)$$
Here, $f^{(i)}$ is the vector of attributes of input text $i$ and $y^{(i)} \in \{-1, 1\}$ is its output value, $V$ is the set of attributes, and $f_j^{(i)}$ is the number of occurrences of the attribute $V_j$ in input text $i$.
The counting vectors of the positive and negative class are defined as follows, where $\alpha$ is a smoothing parameter:
$$p = \alpha + \sum_{i:\, y^{(i)} = 1} f^{(i)}$$
$$q = \alpha + \sum_{i:\, y^{(i)} = -1} f^{(i)}$$
The positive to negative class log-count ratio vector is defined as:
$$r = \log\left(\frac{p / \lVert p \rVert_1}{q / \lVert q \rVert_1}\right)$$
In order to combine the above equations, an element-wise multiplication of the vector of attributes ($f$) and the ratio vector of the NB counts of the positive and negative classes ($r$) is performed:
$$\bar{f}^{(k)} = r \circ f^{(k)}$$
The resulting vector is used as the input for a standard SVM classifier.
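The feature transformation defined above can be illustrated with a short, dependency-free sketch. The function names, the toy count vectors, and the default smoothing value `alpha=1.0` are assumptions made for this example; the downstream SVM itself is not shown.

```python
import math

def log_count_ratio(features, labels, alpha=1.0):
    """Return r = log((p / ||p||_1) / (q / ||q||_1)) for labels in {-1, 1}."""
    n_attributes = len(features[0])
    p = [alpha] * n_attributes  # smoothed counts for the positive class
    q = [alpha] * n_attributes  # smoothed counts for the negative class
    for f, y in zip(features, labels):
        target = p if y == 1 else q
        for j, count in enumerate(f):
            target[j] += count
    p_norm, q_norm = sum(p), sum(q)  # L1 norms (all entries are positive)
    return [math.log((pj / p_norm) / (qj / q_norm)) for pj, qj in zip(p, q)]

def scale_features(f, r):
    """Element-wise product of the ratio vector r and one attribute vector f."""
    return [rj * fj for rj, fj in zip(r, f)]

# Hypothetical occurrence counts of three attributes in four texts.
features = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 1, 0]]
labels = [1, 1, -1, -1]

r = log_count_ratio(features, labels)
scaled = [scale_features(f, r) for f in features]  # input for a linear SVM
```

In this toy case the first attribute occurs only in positive texts, so its ratio is positive, while the second attribute occurs mostly in negative texts and receives a negative ratio.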
The evaluation results of the described model are shown in Table 7. The model was trained on different data sets, while testing was always done on the data set of music reviews.
The research tested the use of logistic regression models instead of SVM. The results obtained by combining the naive Bayes and logistic regression are shown in Table 8.

5. Discussion

In the previous sections, we described the application of the traditional machine learning techniques to the problem of multilingual sentiment analysis in NLP. We used the Serbian data set in our experimental set-up.
In addition to standard algorithms, such as LR, SVM, and MNB, the hybrid algorithms NB–SVM and NB–LR were considered, primarily for the problem of binary classification. As suggested in [42], the L2 loss function and L2 regularization were used in LR, SVM, and NB–SVM. Five-fold nested stratified cross-validation was used to optimize the hyperparameter C, which is used in the LR, SVM, and NB–SVM algorithms, as well as the hyperparameter β in the NB–SVM algorithm. All other model hyperparameters were set to their default values. During classification, all text was normalized to lowercase letters.
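The role of the hyperparameter β can be illustrated with a small sketch. It assumes the interpolation used for NB–SVM in [42], as we understand it, in which the trained weight vector is blended with its mean magnitude; the function name and the example weights are hypothetical.

```python
def interpolate_weights(w, beta):
    """Blend weights w with their mean magnitude: (1 - beta) * w_mean + beta * w."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    w_mean = sum(abs(wj) for wj in w) / len(w)  # ||w||_1 / |V|
    return [(1.0 - beta) * w_mean + beta * wj for wj in w]

# Hypothetical SVM weights for four attributes.
w = [2.0, -1.0, 0.0, 1.0]
blended = interpolate_weights(w, beta=0.25)
unchanged = interpolate_weights(w, beta=1.0)  # beta = 1 keeps the SVM weights
```

Under this assumption, smaller values of β pull all weights toward a common value, so the classifier relies more on the NB ratio features, while β = 1 leaves the SVM weights as trained.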
Two different stemmers were used in the three-class classification (Table 5). In Table 6, we can see that in the case of binary classification, the Ljubešić and Pandžić stemmer was the optimal solution for morphological normalization in all models.
Overall, the results shown in Table 5, Table 6, Table 7 and Table 8 indicate that the hybrid approach, which combines the naive Bayesian model with a linear classifier, offers good results on average, but it still does not provide significant improvements compared to the other models. An improvement of about 2 percentage points can be seen with the binary classifiers, where there is a clear separation between positive and negative sentiment. In the three-class classifiers, the neutral class is not clearly separated from the other two. Rather, it represents a combination of positive and negative sentiments in the review, and so the hybrid approach using the ratio vectors of the NB class counts does not contribute to the quality of the model. Correction of typographical errors, normalization of emoticons and character repetitions, and morphological normalization are useful for all sentiment analysis problems when applied with features obtained by the bag-of-words principle.
Sentiment annotation was performed, and data sets in the Serbian language were created from 13 different Serbian-language sources (web portals) and from MARD, originally written in English [38]. Two approaches were thus covered: content originally written in Serbian and content machine-translated from English to Serbian using Google Translate. In this way, the collected data can help other researchers to improve the machine translation process into the Serbian language. Furthermore, compared with the research papers discussed in Section 2, the accuracy obtained in this research is in the range of other results of multilingual models for the Serbian language.

6. Conclusions

This research aimed to collect data in a low-resource language and develop a model for sentiment analysis in the Serbian language. In the most extensive and state-of-the-art research in multilingual sentiment analysis, languages with limited resources, such as Serbian, are not covered, or they are only covered to a small extent [20,21].
In addition to the movie reviews collected in [37], a data set of music reviews (originally written in the Serbian language) is now available as another set for sentiment analysis. As the size and scope of the available data increase, so do the opportunities for developing new models for sentiment analysis in the Serbian language. Likewise, the research showed that a set of movie and music reviews can be used together and that the models developed in this case offered good results. The assumption is that one of the reasons for the good results is the fact that part of the data from both data sets was collected from the same or similar portals.
We attempted to overcome the unavailability of resources in the Serbian language by using an English data set that was translated into Serbian with the Google Translate API. Other researchers have also used Google Translate or the Bing translator to work on translated data for multilingual or cross-lingual sentiment analysis [43,44,45,46,47,48,49]. However, we did not achieve good results with this approach: the model performed much worse than when working with reviews originally written in Serbian, likely because of the different vocabulary and style of writing reviews, as well as the quality of the translated text.
The results of this research represent a breakthrough in developing machine processing in the Serbian language. Furthermore, the creation of available annotated data sets with reviews will facilitate the further development of the sentiment analysis of short texts in the Serbian language. The main contributions of this research are the creation of a representative and sufficiently large database with movie and music reviews in the Serbian language from various sources available on the Internet, the application of the most significant algorithms in supervised text classification, and the development of different models that were trained on a set of collected data, after which the evaluation was carried out.
Using additional and/or more advanced techniques for extracting attributes from text, the proposed models can be further improved. In this paper, negation was not processed, and the filtering of stop words was done automatically within the existing library implementation of the algorithm for creating attribute vectors. A more detailed analysis of the vocabulary in the data sets could create a better set of stop words, followed by testing them on the given models. These are also the main limitations of the study. In the continuation of the research, the authors will also replace the traditional machine learning methods with a CNN or LSTM in order to obtain even better precision with more modern models, while still requiring minimal execution time.

Author Contributions

Conceptualization, D.D. and D.Z.; methodology, B.N.; related work, D.D.; software, D.Z. and D.D.; validation, D.Z.; writing—original draft preparation, D.D. and B.N.; visualization, D.D.; supervision, D.D. and B.N.; project administration, B.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science Fund of the Republic of Serbia, grant no. 6526093, AI–AVANTES (http://fondzanauku.gov.rs/).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
API: Application programming interface
IDF: Inverse document frequency
LR: Logistic regression
MARD: Multimodal album reviews data set
ML: Machine learning
MNB: Multinomial naïve Bayes
NB: Naïve Bayes
NLP: Natural language processing
OvA: One-vs-All
OvO: One-vs-One
SVM: Support vector machine
TF: Term frequency

References

  1. Pang, B.; Lee, L.; Vaithyanathan, S. Thumbs Up? Sentiment Classification using Machine Learning Techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), EMNLP, Philadelphia, PA, USA, 6–7 July 2002.
  2. Abbasi, A.; Chen, H.; Salem, A. Sentiment analysis in multiple languages: Feature selection for opinion classification in web forums. ACM Trans. Inf. Syst. 2008, 26, 1–34.
  3. Das, S.R.; Chen, M.Y. Yahoo! for Amazon: Sentiment extraction from small talk on the Web. Manag. Sci. 2007, 53, 1375–1388.
  4. Neethu, M.S.; Rajasree, R. Sentiment analysis in Twitter using machine learning techniques. In Proceedings of the 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), Tiruchengode, India, 4–6 July 2013.
  5. Bouazizi, M.; Ohtsuki, T. Sentiment analysis: From binary to multi-class classification: A pattern-based approach for multi-class sentiment analysis in Twitter. In Proceedings of the IEEE International Conference on Communications (ICC), Kuala Lumpur, Malaysia, 22–27 May 2016.
  6. Čutura, G.; Knežević, B.; Drašković, D. Public opinion about Novak Djokovic through the eyes of Twitter. In Proceedings of the 12th International Conference on Information Society and Technology, Kopaonik, Serbia, 13–16 March 2022; pp. 81–85.
  7. Benjamin, M. Hard Numbers: Language Exclusion in Computational Linguistics and Natural Language Processing. In Proceedings of the LREC 2018 Workshop “CCURL2018–Sustaining Knowledge Diversity in the Digital Age”, Miyazaki, Japan, 7–12 May 2018; pp. 26–32.
  8. El-Haj, M.; Kruschwitz, U.; Fox, C. Creating language resources for under-resourced languages: Methodologies, and experiments with Arabic. Lang. Resour. Eval. 2015, 49, 549–580.
  9. Maxwell, M.; Hughes, B. Frontiers in linguistic annotation for lower-density languages. In Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora. Association for Computational Linguistics, Sydney, NSW, Australia, 22 July 2006; pp. 29–37.
  10. Streiter, O.; Scannell, K.P.; Stuflesser, M. Implementing NLP projects for noncentral languages: Instructions for funding bodies, strategies for developers. Mach. Transl. 2006, 20, 267–289.
  11. Towards Data Science. Available online: http://towardsdatascience.com/major-trends-in-nlp-a-review-of-20-years-of-acl-research-56f5520d473 (accessed on 15 May 2022).
  12. Kornai, A. Digital Language Death. PLoS ONE 2013, 8, e77056.
  13. Berment, V. Several directions for minority languages computerization. In Proceedings of the 19th International Conference on Computational Linguistics: Project Notes (COLING 2002). Association for Computational Linguistics, Taipei, Taiwan, 26–30 August 2002.
  14. King, B.P. Practical Natural Language Processing for Low-Resource Languages; University of Michigan: Ann Arbor, MI, USA, 2015.
  15. Duong, L.T. Natural Language Processing for Resource-Poor Languages. Ph.D. Thesis, University of Melbourne, Melbourne, VIC, Australia, 2017.
  16. Pang, B.; Lee, L. Opinion Mining and Sentiment Analysis. Found. Trends Inf. Retr. 2008, 2, 1–135.
  17. Liu, B.; Zhang, L. A Survey of Opinion Mining and Sentiment Analysis. In Mining Text Data; Aggarwal, C.C., Zhai, C., Eds.; Springer: Boston, MA, USA, 2012; pp. 415–463.
  18. Paulino, J.; Almirol, L.; Favila, J.; Aquino, K.; De La Cruz, A.; Roxas, R. Multilingual Sentiment Analysis on Short Text Document Using Semi-Supervised Machine Learning. In Proceedings of the 5th International Conference on E-Society, E-Education and E-Technology, Virtual Format, 21–23 August 2021; pp. 164–170.
  19. Nankani, H.; Dutta, H.; Shrivastava, H.; Rama Krishna, P.V.N.S.; Mahata, D.; Shah, R.R. Multilingual Sentiment Analysis. In Deep Learning-Based Approaches for Sentiment Analysis; Part of the Algorithms for Intelligent Systems Book Series; Agarwal, B., Nayak, R., Mittal, N., Patnaik, S., Eds.; Springer: Singapore, 2020.
  20. Dashtipour, K.; Poria, S.; Hussain, A.; Cambria, E.; Hawalah, A.Y.; Gelbukh, A.; Zhou, Q. Multilingual Sentiment Analysis: State of the Art and Independent Comparison of Techniques. Cogn. Comput. 2016, 8, 757–771.
  21. Sagnika, S.; Pattanaik, A.; Mishra, B.S.P.; Meher, S. A Review on Multi-Lingual Sentiment Analysis by Machine Learning Methods. J. Eng. Sci. Technol. Rev. 2020, 13, 154–166.
  22. Bera, A.; Ghose, M.K.; Pal, D.K. Sentiment Analysis of Multilingual Tweets Based on Natural Language Processing (NLP). Int. J. Syst. Dyn. Appl. 2021, 10, 1–12.
  23. Xu, H.; Van Durme, B.; Murray, K. BERT, mBERT or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Online, 7–11 November 2021.
  24. Khan, L.; Amjad, A.; Ashraf, N.; Chang, H.-T. Multi-class sentiment analysis of urdu text using multilingual BERT. Sci. Rep. 2022, 12, 5436.
  25. Pota, M.; Ventura, M.; Fujita, H.; Esposito, M. Multilingual evaluation of pre-processing for BERT-based sentiment analysis of tweets. Expert Syst. Appl. 2021, 181, 115119.
  26. Agüero-Torales, M.; Salas, J.; López-Herrera, A. Deep learning and multilingual sentiment analysis on social media data: An overview. Appl. Soft Comput. 2021, 107, 107373.
  27. Kanfoud, M.R.; Bouramoul, A. SentiCode: A new paradigm for one-time training and global prediction in multilingual sentiment analysis. J. Intell. Inf. Syst. 2022; Online ahead of print.
  28. Žitnik, S.; Blagus, N.; Bajec, M. Target-level sentiment analysis for news articles. Knowl.-Based Syst. 2022, 249.
  29. Ljubešić, N.; Lauc, D. BERTić-The transformer language model for Bosnian, Croatian, Montenegrin and Serbian. arXiv 2021, arXiv:2104.09243.
  30. Mozetič, I.; Grčar, M.; Smailović, J. Multilingual Twitter Sentiment Classification: The Role of Human Annotators. PLoS ONE 2016, 11, e0155036.
  31. Ljajić, A.; Marovac, U. Improving Sentiment Analysis for Twitter Data by Handling Negation Rules in the Serbian Language. Comput. Sci. Inf. Syst. 2018, 16, 289–311.
  32. Batanović, V. Semantic Similarity and Sentiment Analysis of Short Texts in Serbian. In Proceedings of the 29th Telecommunications Forum (TELFOR), Virtual Event, 11 December 2021.
  33. Lohar, P.; Popovic, M.; Way, A. Building English-to-Serbian Machine Translation System for IMDb Movie Reviews. In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, Florence, Italy, 2 August 2019; pp. 105–113.
  34. Kumpulainen, I.; Praks, E.; Korhonen, T.; Ni, A.; Rissanen, V.; Vankka, J. Predicting Eurovision Song Contest Results Using Sentiment Analysis. In Artificial Intelligence and Natural Language; Filchenkov, A., Kauttonen, J., Pivovarova, L., Eds.; Springer International Publishing: Cham, Switzerland, 2020; Volume 1292.
  35. Mladenović, M.; Mitrović, J.; Krstev, C.; Vitas, D. Hybrid sentiment analysis framework for a morphologically rich language. J. Intell. Inf. Syst. 2016, 46, 599–620.
  36. Stankovic, R.; Kosprdic, M.; Ikonic-Nesic, M.; Radovic, T. Sentiment Analysis of Sentences from Serbian ELTeC corpus. In Proceedings of the SALLD-2 Workshop at Language Resources and Evaluation Conference (LREC), Marseille, France, 24 June 2022; pp. 31–38.
  37. Batanovic, V.; Nikolic, B.; Milosavljevic, M. Reliable Baselines for Sentiment Analysis in Resource-Limited Languages: The Serbian Movie Review Dataset. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), LREC, Portorož, Slovenia, 23–28 May 2016.
  38. Oramas, S.; Espinosa-Anke, L.; Lawlor, A.; Serra, X.; Saggion, H. Exploring Customer Reviews for Music Genre Classification and Evolutionary studies. In Proceedings of the 17th International Society for Music Information Retrieval Conference, New York, NY, USA, 7–11 August 2016.
  39. Milošević, N. Stemmer for Serbian language. arXiv 2012, arXiv:1209.4471.
  40. Ljubešić, N.; Boras, D.; Kubelka, D. Retrieving Information in Croatian: Building a Simple and Efficient Rule-Based Stemmer. In Proceedings of the 1st International Conference The Future of Information Sciences—INFuture: “Digital Information and Heritage”, Zagreb, Croatia, 7–9 November 2007.
  41. Ljubešić, N.; Klubička, F.; Agić, Ž.; Jazbec, I.-P. New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA), Portorož, Slovenia, 23–28 May 2016; pp. 4264–4270.
  42. Wang, S.; Manning, C.D. Baselines and Bigrams: Simple, Good Sentiment and Topic Classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012), Jeju Island, Korea, 8–14 July 2012; pp. 90–94.
  43. Hogenboom, A.; Heerschop, B.; Frasincar, F.; Kaymak, U.; de Jong, F. Multi-lingual support for lexicon-based sentiment analysis guided by semantics. Decis. Support Syst. 2014, 62, 43–53.
  44. Lin, Z.; Jin, X.; Xu, X.; Wang, Y.; Tan, S.; Cheng, X. Make it possible: Multilingual sentiment analysis without much prior knowledge. In Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), IEEE Computer Society, Warsaw, Poland, 11–14 August 2014; Volume 2, pp. 79–86.
  45. Hajmohammadi, M.S.; Ibrahim, R.; Selamat, A.; Fujita, H. Combination of active learning and self-training for crosslingual sentiment classification with density analysis of unlabelled samples. Inf. Sci. 2015, 317, 67–77.
  46. Becker, K.; Moreira, V.P.; dos Santos, A.G. Multilingual emotion classification using supervised learning: Comparative experiments. Inf. Processing Manag. 2017, 53, 684–704.
  47. Chen, Z.; Shen, S.; Hu, Z.; Lu, X.; Mei, Q.; Liu, X. Ermes: Emoji-Powered Representation Learning for Cross-Lingual Sentiment Classification. arXiv 2018, arXiv:1806.02557.
  48. Balahur, A.; Turchi, M. Comparative experiments using supervised learning and machine translation for multilingual sentiment analysis. Comput. Speech Lang. 2014, 28, 56–75.
  49. Bhargava, R.; Sharma, Y. MSATS: Multilingual sentiment analysis via text summarization. In Proceedings of the 7th International Conference on Cloud Computing, Data Science & Engineering-Confluence, IEEE, Noida, India, 12–13 January 2017; pp. 71–76.
Figure 1. Diagram of a machine classifier.
Figure 2. Diagram of model evaluation.
Table 1. Related research papers overview.
| Author, Year, Language | Data Set | Major Contribution | Techniques |
| --- | --- | --- | --- |
| Mozetič et al., 2016, 13 languages, including Slovenian, Serbian, Albanian, Bulgarian, etc. [30] | Twitter data | evaluation of data sets using different classifiers and comparative analysis for multiple languages | NB, different types of SVM |
| Mladenović et al., 2016, Serbian [35] | movie reviews, news set | building a sentiment analysis framework for Serbian | Maximum Entropy |
| Ljajić and Marovac, 2018, Serbian [31] | Twitter data | examining how the treatment of negation impacts the sentiment of tweets | NB, LR, SVM, J48-DTree |
| Lohar et al., 2019, English => Serbian [33] | large movie review data set (Maas, 2011) | building a machine translation system for user-generated content | Moses MT toolkit, OpenNMT |
| Batanović, 2021, Serbian [32] | movie reviews, book reviews | evaluation and determination of the optimal configurations using several different kinds of machine-learning models on a range of sentiment classification tasks | MNB, CNB, LR, SVM, NB-SVM |
| Stanković et al., 2022, Serbian [36] | SrpELTeC1 (multilingual corpus of novels) | development and application of sentiment lexicon, (sentence) data set labeling, and training of the models for sentiment analysis | LR, NB, DTree, RF, SVM, k-NN |
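Table 2. Summary overview of the collected reviews (number of reviews per grade, 1–10).
| Web Portal | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Sum |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2kokice.com (accessed on 12 September 2021) | 4 | 0 | 2 | 3 | 0 | 10 | 8 | 26 | 12 | 11 | 76 |
| balkanrock.com (accessed on 13 September 2021) | 8 | 8 | 17 | 36 | 16 | 95 | 137 | 109 | 60 | 33 | 519 |
| popboks.com (accessed on 10 September 2021) | 8 | 13 | 38 | 98 | 205 | 350 | 467 | 295 | 66 | 14 | 1554 |
| serbian-metal.org (accessed on 16 September 2021) | 0 | 0 | 0 | 4 | 4 | 18 | 52 | 108 | 53 | 1 | 240 |
| hardwiredmagazine.com (accessed on 18 September 2021) | 0 | 1 | 8 | 2 | 41 | 20 | 18 | 71 | 21 | 7 | 189 |
| nocturno.com (accessed on 13 September 2021) | 3 | 0 | 0 | 3 | 1 | 27 | 42 | 120 | 45 | 40 | 281 |
| hellycherry.com (accessed on 15 September 2021) | 4 | 2 | 2 | 0 | 1 | 2 | 8 | 5 | 2 | 7 | 33 |
| mnsblog.weebly.com (accessed on 19 September 2021) | 3 | 0 | 1 | 2 | 0 | 2 | 4 | 3 | 2 | 3 | 20 |
| tegla.rs (accessed on 13 September 2021) | 165 | 35 | 18 | 44 | 60 | 204 | 270 | 63 | 5 | 34 | 898 |
| plejer.net (accessed on 15 September 2021) | 0 | 2 | 0 | 5 | 2 | 16 | 7 | 17 | 10 | 11 | 70 |
| balkanmetalpromotion (accessed on 16 September 2021) | 0 | 0 | 1 | 1 | 0 | 2 | 2 | 7 | 4 | 0 | 17 |
| petar-kostic.blogspot (accessed on 12 September 2021) | 0 | 2 | 0 | 10 | 0 | 11 | 0 | 32 | 0 | 2 | 57 |
| mislitemojomglavom (accessed on 15 September 2021) | 9 | 5 | 11 | 20 | 13 | 30 | 41 | 123 | 82 | 29 | 363 |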
Table 3. Statistical presentation of the collected reviews.
| Web Portal | Genre | Grade Scale | Number of Reviews | Number of Positives | Number of Neutrals | Number of Negatives | Average Review Length (Words) | Shortest Review (Words) | Longest Review (Words) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2kokice.com | pop | 1–10 | 76 | 75% | 13.2% | 12.8% | 248 | 38 | 612 |
| balkanrock.com | rock, metal, punk | 1–10 | 519 | 65.3% | 21.4% | 13.3% | 256 | 47 | 2072 |
| popboks.com | rock, pop | 1–10 | 1554 | 54.3% | 35.7% | 10% | 555 | 73 | 1915 |
| serbian-metal.org | metal, rock | 1–100 | 240 | 89% | 9% | 2% | 448 | 68 | 1192 |
| hardwiredmagazine.com | rock | 1–5 | 189 | 62% | 32% | 6% | 517 | 59 | 1263 |
| nocturno.com | rock | 1–10 | 281 | 88% | 10% | 2% | 566 | 69 | 1242 |
| hellycherry.com | rock | 1–5 | 33 | 67% | 9% | 24% | 427 | 45 | 1031 |
| mnsblog.weebly.com | pop | 1–10 | 20 | 60% | 10% | 30% | 464 | 36 | 1646 |
| tegla.rs | different | 0–5 | 898 | 70% | 8% | 22% | 60 | 1 | 309 |
| plejer.net | rock | 1–5 | 70 | 64% | 26% | 10% | 500 | 58 | 892 |
| balkanmetalpromotion | rock, metal | 1–100 | 17 | 76% | 12% | 12% | 590 | 367 | 1038 |
| petar-kostic.blogspot | rock | 1–5 | 57 | 60% | 19% | 21% | 779 | 399 | 1472 |
| mislitemojomglavom | different | 1–10 | 363 | 77% | 11% | 12% | 601 | 44 | 1638 |
Table 4. Statistical representation of the music reviews in a balanced set (A), movie reviews (B), and translated reviews (C).
| | (A) | (B) | (C) |
| --- | --- | --- | --- |
| Total number of reviews | 1830 | 2523 | 51,234 |
| Number of reviews per class | 610 | 841 | 17,078 |
| Longest positive review | 2025 words | 1813 words | 2129 words |
| Longest neutral review | 1552 words | 1621 words | 3125 words |
| Longest negative review | 1664 words | 1835 words | 1845 words |
| Shortest positive review | 8 words | 21 words | 1 word |
| Shortest neutral review | 6 words | 73 words | 2 words |
| Shortest negative review | 1 word | 21 words | 1 word |
| Average positive review | 489 words | 472 words | 112 words |
| Average neutral review | 344 words | 468 words | 132 words |
| Average negative review | 344 words | 467 words | 101 words |
Table 5. Results of the three-class classification.
| | MNB | LR | SVM |
| --- | --- | --- | --- |
| Attribute type | bag of words | bag of words | TF-IDF |
| Attribute value | binary | binary | binary |
| Stemmer | Milošević | Milošević | Ljubešić/Pandžić |
| Number of n-gram | 2 | 2 | 1 |
| Max frequency n-gram | 0.7 | 0.7 | 1 |
| Min frequency n-gram | 1 | 1 | 1 |
| Number of attributes | 20,000 | 20,000 | 20,000 |
| Small letters only | yes | yes | yes |
| Results (A) | 0.58 | 0.60 | 0.59 |
| Results (B) | 0.55 | 0.58 | 0.51 |
| Results (C) | 0.46 | 0.50 | 0.50 |
Table 6. Results of binary classifiers.
| | MNB | LR | SVM |
| --- | --- | --- | --- |
| Attribute type | bag of words | bag of words | bag of words |
| Attribute value | binary | binary | binary |
| Stemmer | Ljubešić/Pandžić | Ljubešić/Pandžić | Ljubešić/Pandžić |
| Number of n-gram | 1 | 1 | 3 |
| Max frequency n-gram | 1 | 0.7 | 1 |
| Min frequency n-gram | 1 | 1 | 1 |
| Number of attributes | 5000 | 5000 | max |
| Small letters only | yes | yes | yes |
| Results (A) | 0.77 | 0.75 | 0.77 |
| Results (B) | 0.72 | 0.61 | 0.73 |
| Results (C) | 0.62 | 0.45 | 0.60 |
Table 7. NB–SVM hybrid model results.
| | Three Classes | Binary |
| --- | --- | --- |
| Results (A) | 0.57 | 0.78 |
| Results (B) | 0.54 | 0.70 |
| Results (C) | 0.49 | 0.62 |
Table 8. NB–LR hybrid model results.
| | Three Classes | Binary |
| --- | --- | --- |
| Results (A) | 0.58 | 0.79 |
| Results (B) | 0.51 | 0.74 |
| Results (C) | 0.42 | 0.61 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Draskovic, D.; Zecevic, D.; Nikolic, B. Development of a Multilingual Model for Machine Sentiment Analysis in the Serbian Language. Mathematics 2022, 10, 3236. https://doi.org/10.3390/math10183236
