Article

Sentiment Analysis on Online Videos by Time-Sync Comments

1 School of Software Engineering, Tongji University, Shanghai 201804, China
2 School of Electronic and Information Engineering, Tongji University, Shanghai 201804, China
3 SILC Business School, Shanghai University, Shanghai 201800, China
4 SHU-SUCG Research Centre for Building Industrialization, Shanghai University, Shanghai 200072, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Entropy 2023, 25(7), 1016; https://doi.org/10.3390/e25071016
Submission received: 25 May 2023 / Revised: 28 June 2023 / Accepted: 29 June 2023 / Published: 2 July 2023
(This article belongs to the Special Issue Information-Theoretic Methods in Data Analytics)

Abstract
Video highlights, composed of interesting or meaningful shots such as funny moments, are welcomed by audiences. However, highlight shots are currently edited manually by video editors, which is inconvenient and extremely time-consuming, so a way to help video editors locate highlights more efficiently is essential. Since interesting or meaningful highlights usually imply strong sentiments, a sentiment analysis model is proposed to automatically recognize the sentiments of video highlights through time-sync comments. As the comments are synchronized with video playback time, the model detects sentiment information in the time series of user comments. Moreover, a sentimental intensity calculation method is designed to compute the sentiments of shots quantitatively. The experiments show that our approach improves the F1 score by 12.8% and the overlapped number by 8.0% compared with the best existing method for extracting highlight sentiments and obtaining sentimental intensities, which helps video editors edit video highlights efficiently.

1. Introduction

With the boom of online video websites, more and more people watch videos online. These websites not only make watching videos convenient but also let people comment on them. However, since a huge number of videos is uploaded to the websites every day, it is hard to watch every minute of them. In this circumstance, audiences may prefer to watch video highlights, which are composed of excellent video fragments, instead of entire videos.
Video highlights are a crucial aspect of video content, as they provide audiences with a condensed version of the most interesting and meaningful parts of a video. However, manually editing these highlights is time-consuming and labor-intensive, making a more efficient way to locate them essential. In recent years, sentiment analysis has emerged as a promising approach for automatically recognizing the sentiments of video highlights using time-sync comments.
Time-sync comments (TSCs) are messages that users send while watching a video to express their thoughts and feelings about what they are seeing. These comments appear on the screen at the moment they are made and reflect the users’ mood during that particular segment of the video. By analyzing the time-sync comments, we can gain insights into the emotions of the viewers and even predict the emotional trajectory of the video. In this paper, we mainly conduct experiments on Chinese time-sync comments. These comments are often used to express various emotions and moods, ranging from happiness and excitement to sadness and frustration. For example, viewers may leave comments like “OMG” or “lol” to express their amusement or laughter, while comments such as “so sad” or “heartbreaking” can indicate a feeling of sadness or sympathy.
By analyzing the sentiment of time-sync comments, we can detect sentiment information in the time series of comments and use this information to extract the most interesting or meaningful parts of the video. Furthermore, we can quantify the sentimental intensity of these shots using a sentimental intensity calculation method.
In this paper, we propose a TSC-based sentiment analysis model to extract highlights from videos and calculate their sentiment intensity. The main contributions include: (1) a sentiment fragment detection model using TSC data is proposed to detect video fragments with strong sentiment, (2) a highlight extraction strategy is designed to find video highlights, and (3) a sentiment intensity calculation method for video fragments is constructed to compute the sentiments of video fragments quantitatively.
The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 defines two problems of sentiment analysis on online videos. Two sentiment analysis strategies using TSC are proposed in Section 4 and Section 5. Section 6 evaluates the performance of the model using a TSC dataset. We conclude our work in Section 7.

2. Related Work

2.1. Time-Sync Comments

Time-sync comments, first introduced in academia in [1], are widely used on video websites such as Acfun, Bilibili, and YouKu, which are among the most popular video websites in China. One TSC is composed of a comment and a time stamp. It is a comment by a viewer that shows the viewer’s opinion on a video shot.
The time stamp is synchronized to the shot’s playback time in the video [2]. TSCs have been used for video classification tasks [3], and current researchers use them to extract video highlights [4,5,6]. Moreover, recent approaches have begun to apply TSCs to the emotional analysis of videos [7,8]. Bonifazi et al. [9] take into account the similarity between patterns and put forth a content semantic network, called CS-Net, to handle reviews; to measure the similarity between two networks, they calculate the similarity of structural features across the networks. As the TSCs of a video indicate the audience’s opinions on its shots, text analysis of TSCs can extract details for every single shot of a video. Moreover, the extraction results reflect not only explicit information but also implicit information.

2.2. Video Highlight Extraction

Video highlight extraction is mainly carried out manually by editors of online video websites. In order to extract highlights, those editors have to watch whole videos first. Then, they select video fragments that are interesting and may be welcomed by audiences. Lastly, the fragments are re-edited and re-organized as video highlights. As such work is inefficient, a method that extracts interesting video fragments automatically is necessary. Recently, some researchers have begun to use TSCs for video highlight extraction. One work uses “global + local” sentiment analysis to find highlights [5]. Another uses lag-calibration and a combination of topic and emotion concentration in an unsupervised way to detect highlights [6]. In a video, fragments that are welcomed by audiences always indicate one or more sentiments strongly. Therefore, sentiment detection for video fragments is the key step in extracting welcomed video fragments.

2.3. Sentiment Analysis

Many researchers have focused on detecting sentiment using image-based approaches. A number of researchers track the human face [10,11,12,13] or human pose [14,15,16,17,18], while others extract semantic features of sentiment from images [19,20,21,22,23,24]. However, compared with text-based approaches, image-based approaches consume more time and computational resources but achieve lower accuracy [25]. Additionally, labels extracted by image-based approaches can only reflect explicit sentiments [26]. By contrast, both explicit and implicit sentiments can be detected by analyzing audience comments with text-based approaches.
Given these advantages, many efforts have been directed to text-based analysis [27,28,29,30,31,32,33,34,35,36]. Nevertheless, current approaches either assign sentiment tags to whole videos instead of single shots [37] or treat video shots as independent objects [38], whereas a video segment constitutes a group of shots that may be related to preceding and following shots. Bonifazi et al. [39] propose a general framework capable of analyzing the range of sentiment associated with any topic on any social network.
In conclusion, while researchers have primarily focused on tasks such as video classification and video clip recommendation using TSCs, they have often overlooked the potential of using TSCs for video highlight extraction and for calculating the sentimental intensity of those highlights. Therefore, we propose a four-step strategy for extracting sentiment highlights in videos, which identifies and groups together adjacent video fragments that share similar sentiment. Moreover, we introduce a strategy for quantitatively measuring the sentimental intensity of a highlight, taking into account not only the types of sentiment implied but also the strength of the sentiment within each type. By employing these strategies, we aim to enhance the understanding and representation of content with various sentiments within videos.

3. Problem Definition

3.1. Illustration of Time-Sync Comments

A time-sync comment is composed of a text-based comment and a time stamp. The comment is usually a sentence of fewer than 20 words. Sometimes it is a text symbol representing an emotion, such as OMG standing for surprise, LOL meaning happiness, or 233333, which TSC users habitually use to express laughter. The time stamp records the playback time of a video shot and is synchronized to the comments on the shot.
Figure 1 shows an example of two shots and their TSCs in the video Forrest Gump. In the figure, Is she Jenny?! and She is beautiful are two TSCs on the shot whose playback time is 13:43, and He was shot and It’s so affecting are another two TSCs that are synchronized to the time stamp 54:13.
The sentiment features of a video shot are indicated by its TSCs. For example, She is beautiful reflects that the sentiment of the current shot is closer to LIKE than to HATE. In addition, It’s so affecting means that the fragment around playback time 54:13 carries a positive sentiment rather than a negative one.

3.2. Formal Definition

Let v be a video. Let $T_{start}$ and $T_{end}$ be the start time and finish time of v, respectively. Let $T_v$ be the length of v. We have $T_v = T_{end} - T_{start}$.
Let $F_v = \{f_{v,1}, f_{v,2}, \ldots, f_{v,N_F}\}$ be a set of fragments in v, where $f_{v,i}$ ($1 \le i \le N_F$) is the i-th fragment and $N_F$ is the number of fragments. We use $T_{start,i}$ and $T_{end,i}$ to represent the start time and finish time of $f_{v,i}$. We define that, for any $f_{v,i} \in F_v$, the length of $f_{v,i}$ is $T_f = T_{end,i} - T_{start,i}$. For any $f_{v,i}, f_{v,i+1} \in F_v$, there is an interval $I$ ($I < T_f$) between the start time of $f_{v,i}$ and that of $f_{v,i+1}$; that is, $I = T_{start,i+1} - T_{start,i}$. This means every two adjacent fragments have a $(T_f - I)$-length overlap. Thus, $T_v = I \times (N_F - 1) + T_f$ and, obviously, $N_F = \frac{T_v - T_f + I}{I}$. Usually, $T_f$ is far less than $T_v$, and $I$ is less than $T_f$. Therefore, the number of fragments in v is approximately $T_v / I$.
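As a concrete illustration, the short sketch below derives fragment boundaries from these definitions. The function name is ours, and the default values $T_f = 30$ s and $I = 20$ s are the ones used later in Section 6:

```python
def fragment_bounds(video_len_s: int, frag_len_s: int = 30, interval_s: int = 20) -> list[tuple[int, int]]:
    """Return (start, end) times of overlapping fragments of a video.

    Adjacent fragments start interval_s seconds apart and overlap by
    frag_len_s - interval_s seconds, so N_F = (T_v - T_f + I) / I.
    """
    n_f = (video_len_s - frag_len_s + interval_s) // interval_s
    return [(i * interval_s, i * interval_s + frag_len_s) for i in range(n_f)]

# A 100 s video yields N_F = (100 - 30 + 20) / 20 = 4 fragments, close to
# the T_v / I approximation (5):
print(fragment_bounds(100))  # [(0, 30), (20, 50), (40, 70), (60, 90)]
```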
Suppose $T_f$ is small enough that one fragment cannot display a complete highlight; that is, a fragment is only a part of a highlight. In other words, a highlight consists of more than one continuous fragment when $T_f$ is small.
Let $H_v = \{h_{v,1}, h_{v,2}, \ldots, h_{v,N_H}\}$ be a set of highlights in v, where $h_{v,i}$ ($1 \le i \le N_H$) is the i-th highlight and $N_H$ is the number of highlights. For any $h_{v,i} \in H_v$, $h_{v,i} = \bigcup_{j=s}^{t} \{f_{v,j}\}$, where $f_{v,j}$ ($s \le j \le t$) is the j-th fragment in v.
Suppose there are k types of sentiments. Let $S = \{s_1, s_2, \ldots, s_k\}$ be the set of sentiments. The sentiment intensity of a highlight $h_{v,i}$ is defined as $E_{d,h_{v,i}} = (e_1, e_2, \ldots, e_k)$. It is a vector that shows the intensity distribution over the k types of sentiments for the highlight $h_{v,i}$. Each $e_j$ ($1 \le j \le k$) is the intensity value of sentiment type $s_j$ in $h_{v,i}$.
Let $B_v$ be the set of TSCs in v, and $B_{f_{v,i}}$ be the set of TSCs in $f_{v,i}$. Any TSC $b \in B_v$ is described as a tuple $(w_b, t_b, u_b)$, where $w_b$ is b’s comment, which is a set of words or text symbols, $t_b$ is b’s time stamp, and $u_b$ is the user ID of the audience member who sends b. Let $N_U$ be the total number of audience members who send comments on v. Let $T_{sync}(w)$ be the time stamp that is synchronized to a comment w, and $user(w)$ be the user who sends w. For the tuple $(w_b, t_b, u_b)$, $T_{sync}(w_b) = t_b$ and $user(w_b) = u_b$.
The notations defined are listed in Table 1.
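To make the notation concrete, here is a minimal sketch of the TSC tuple $(w_b, t_b, u_b)$ and the per-fragment TSC set $B_{f_{v,i}}$; the class and field names are our own illustration, not part of the paper:

```python
from dataclasses import dataclass

@dataclass
class TSC:
    """One time-sync comment b = (w_b, t_b, u_b)."""
    words: list[str]   # w_b: the comment as a set of words or text symbols
    t_sync: float      # t_b: playback time stamp in seconds
    user_id: str       # u_b: ID of the commenting audience member

def tscs_in_fragment(tscs: list[TSC], t_start: float, t_end: float) -> list[TSC]:
    """B_{f_{v,i}}: all TSCs whose time stamp falls inside fragment f_{v,i}."""
    return [b for b in tscs if t_start <= b.t_sync <= t_end]
```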

3.3. Problem Statement

Under the formal description, the problems of sentiment highlight extraction and sentiment intensity calculation are defined. The two problems are described as follows.
(1)
Problem of Sentiment Highlight Extraction:
Given v and $B_v$, for any $1 < i < N_F$, find $l_i$ and $r_i$ that satisfy all of the following constraints:
a. $1 \le l_i < i$ and $1 < r_i \le N_F - i$;
b. for any $i - l_i \le k \le i + r_i - 1$, $f_{v,k}$ and $f_{v,k+1}$ have similar sentiment;
c. $f_{v,i-l_i-1}$ and $f_{v,i-l_i}$ do not have similar sentiment;
d. $f_{v,i+r_i}$ and $f_{v,i+r_i+1}$ do not have similar sentiment.
(2)
Problem of Sentiment Intensity Calculation:
Given $H_v$, $B_v$, and S, for any $1 \le i \le N_H$, find a vector $(e_1, e_2, \ldots, e_k)$ that shows the intensity distribution over $(s_1, s_2, \ldots, s_k)$ for $h_{v,i}$, where $e_j$ ($1 \le j \le k$) is the intensity value of $s_j$ and $s_j \in S$.
As fragments in the same highlight reflect similar sentiment, the problem of highlight extraction is how to gather fragments with similar sentiment together. If the problem is solved, we obtain a set of highlights $H_v = \{\bigcup_{j=i-l_i}^{i+r_i} \{f_{v,j}\}\}_{1 \le i \le N_F}$; that is, $H_v$ is the set whose elements are $\bigcup_{j=i-l_i}^{i+r_i} \{f_{v,j}\}$ for every i ($1 \le i \le N_F$). After removing redundancy, $H_v$ can be organized in the format $\{h_{v,1}, h_{v,2}, \ldots, h_{v,N_H}\}$. After obtaining the set of highlights $H_v$, the sentiment intensity of each highlight in $H_v$ can be computed by solving the problem of highlight sentiment intensity calculation using the TSC set.

4. Sentiment Highlight Extraction

A strategy of sentiment highlight extraction is used to extract highlights from a video by gathering adjacent video fragments with similar sentiment. It is composed of four steps: (1) TSC vectors of all fragments are constructed, (2) similarity matrices of all fragments are generated to measure similarities among user comments, (3) the feature similarity of each fragment is calculated, and (4) the highlight score of each fragment is calculated. The processes of the strategy are shown in Figure 2, and the details of the four steps are described in the four subsections of Section 4.

4.1. Construct TSC Vectors

We construct a TSC vector, $C(i)$, for fragment $f_{v,i}$ ($1 \le i \le N_F$). It is organized as $C(i) = (w_{b,1}^{(i)}, w_{b,2}^{(i)}, \ldots, w_{b,N_U}^{(i)})$. Each element $w_{b,j}^{(i)}$ ($1 \le j \le N_U$) is the set of comments on $f_{v,i}$ made by user $u_{b_j}$: $w_{b,j}^{(i)} = \{w_b \mid T_{start,i} \le T_{sync}(w_b) \le T_{end,i},\ user(w_b) = u_{b_j}\}$, where $T_{start,i}$ and $T_{end,i}$ are the start time and finish time of fragment $f_{v,i}$, respectively.
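A sketch of this step, reusing the hypothetical TSC class from the Section 3.2 sketch; giving silent users an empty word set so that every fragment vector has the same $N_U$-length layout is our own assumption:

```python
from collections import defaultdict

def build_tsc_vector(fragment_tscs: list[TSC], user_ids: list[str]) -> list[list[str]]:
    """C(i): one entry per user, holding that user's words on fragment f_{v,i}."""
    words_by_user: dict[str, list[str]] = defaultdict(list)
    for b in fragment_tscs:
        words_by_user[b.user_id].extend(b.words)
    # Users who did not comment on this fragment contribute an empty set.
    return [words_by_user.get(u, []) for u in user_ids]
```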

4.2. Generate Similarity Matrices

A similarity matrix is generated for each fragment. It reflects the similarities of comments from different users on the same fragment. A similarity matrix $M_{f_{v,i}}$ has size $N_U \times N_U$. Let $m_{j,k}^{(i)}$ be the element at the j-th row and k-th column of $M_{f_{v,i}}$. Then $m_{j,k}^{(i)} = f_s(w_{b,j}^{(i)}, w_{b,k}^{(i)})$, where $f_s(\cdot,\cdot)$ is a similarity measure such as cosine similarity, and $w_{b,j}^{(i)}$ and $w_{b,k}^{(i)}$ are the sets of comments on $f_{v,i}$ made by users $u_{b_j}$ and $u_{b_k}$, respectively.
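A minimal sketch of this step, using cosine similarity over simple bag-of-words vectors; the paper's method can equally plug in LDA or BERT embeddings (Section 6.2), and the bag-of-words representation here is our simplifying assumption:

```python
import numpy as np

def similarity_matrix(tsc_vector: list[list[str]], vocab: dict[str, int]) -> np.ndarray:
    """M_{f_{v,i}}: N_U x N_U matrix of pairwise user-comment similarities."""
    emb = np.zeros((len(tsc_vector), len(vocab)))
    for j, words in enumerate(tsc_vector):
        for w in words:
            if w in vocab:
                emb[j, vocab[w]] += 1.0       # bag-of-words counts per user
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                   # avoid division by zero for silent users
    unit = emb / norms
    return unit @ unit.T                      # m_{j,k} = cos(w_{b,j}, w_{b,k})
```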

4.3. Calculate Feature Similarity

After obtaining the similarity matrix $M_{f_{v,i}}$ for fragment $f_{v,i}$, we easily obtain $M_{f_{v,i}}$’s largest real eigenvalue and its corresponding eigenvector, $p_i$. The Perron–Frobenius theorem ensures that the components of $p_i$ are positive. The values in $p_i$ are regarded as features of the “sentiment” implied by the audience’s comments on fragment $f_{v,i}$.
Since $p_i$ represents the features of $f_{v,i}$, we calculate the mean feature value of the nearest m fragments before $f_{v,i}$. The mean value, $p_{i,mean}$, is calculated by Equation (1):

$$p_{i,mean} = \begin{cases} \left( \sum_{j=1}^{i-1} p_j \right) / (i-1), & i \le m \\ \left( \sum_{j=i-m}^{i-1} p_j \right) / m, & i > m \end{cases} \quad (1)$$
The feature similarity of fragment $f_{v,i}$, denoted $S_{f_{v,i}}$, is the similarity of $p_i$ and $p_{i,mean}$, calculated with the cosine function: $S_{f_{v,i}} = \cos(p_i, p_{i,mean})$.
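A sketch of this step under stated assumptions: the sign flip (numerical eigensolvers may return the Perron vector negated) and the fallback for the very first fragment are our own choices, not specified in the paper:

```python
import numpy as np

def fragment_feature(sim_matrix: np.ndarray) -> np.ndarray:
    """p_i: eigenvector for the largest real eigenvalue of M_{f_{v,i}}."""
    eigvals, eigvecs = np.linalg.eig(sim_matrix)
    p = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    # np.linalg.eig may return the Perron vector negated; flip it so its
    # components are positive, as the Perron-Frobenius theorem promises.
    return p if p.sum() >= 0 else -p

def feature_similarity(features: list[np.ndarray], i: int, m: int) -> float:
    """S_{f_{v,i}} = cos(p_i, p_{i,mean}), with p_{i,mean} from Equation (1)."""
    if i == 0:
        return 1.0  # first fragment has no predecessors; this fallback is our assumption
    window = features[max(0, i - m):i]   # the nearest m (or i-1) fragments before f_{v,i}
    p_mean = np.mean(window, axis=0)
    p_i = features[i]
    return float(p_i @ p_mean / (np.linalg.norm(p_i) * np.linalg.norm(p_mean)))
```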

4.4. Find Video Highlights

Firstly, the highlight scores of all fragments are calculated in order to decide which fragments are put together in the same highlight. $R_{f_{v,n}}$, the highlight score of fragment $f_{v,n}$, is calculated by Equation (2), where $D_{f_{v,n}}$ is the TSC density of $f_{v,n}$, defined as the number of TSCs commented on $f_{v,n}$:

$$R_{f_{v,n}} = S_{f_{v,n}} \times \log(1 + D_{f_{v,n}}) \quad (2)$$
The larger a fragment’s TSC density, the stronger the sentiment it manifests. This is because people prefer to express their opinions when they find a fragment interesting or meaningful, which increases the number of TSCs.
Next, fragments with high highlight scores are selected as single highlights. A fragment’s highlight score indicates the possibility that the fragment is a highlight: the higher a fragment’s highlight score, the higher the probability that the fragment may become a highlight.
A highlight threshold, $\delta$, is set for single highlight detection. If $R_{f_{v,i}}$, the highlight score of fragment $f_{v,i}$, is larger than $\delta$, then $f_{v,i}$ is selected as a single highlight.
After that, relevant single highlights are merged into one highlight. Any two fragments $f_{v,i}$ and $f_{v,j}$ are merged into one highlight if (a) $R_{f_{v,i}} > \delta$, (b) $R_{f_{v,j}} > \delta$, (c) $|i - j| = 1$, and (d) $|R_{f_{v,i}} - R_{f_{v,j}}| < \theta$, where $\delta$ is the highlight threshold and $\theta$ is a link threshold for deciding whether two fragments have strong sentiment relevance.
Under this strategy, a fragment is merged with its neighboring fragment if the two fragments are relevant in sentiment and both are single highlights. Moreover, three or more adjacent fragments can be merged into one highlight.
Lastly, a highlight set is obtained by putting all the highlights together. We obtain a highlight set, $H_v$, composed of the different highlights in video v. A highlight $h_{v,i}$ in $H_v$ is called a sentiment highlight of $H_v$.
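A sketch of the scoring and merging steps under our reading of the merge rule (fragments stay in one highlight only while consecutive single-highlight scores remain within $\theta$ of each other); the experiments in Section 6 use $\delta = 0.2$ and $\theta = 0.1$:

```python
import math

def highlight_score(feature_sim: float, tsc_density: int) -> float:
    """R_{f_{v,n}} = S_{f_{v,n}} * log(1 + D_{f_{v,n}})  -- Equation (2)."""
    return feature_sim * math.log(1 + tsc_density)

def find_highlights(scores: list[float], delta: float, theta: float) -> list[tuple[int, int]]:
    """Merge adjacent single highlights into fragment index ranges (s, t)."""
    highlights: list[tuple[int, int]] = []
    run = None  # (first, last) fragment indices of the highlight being built
    for i, r in enumerate(scores):
        if r > delta and run is not None and abs(r - scores[i - 1]) < theta:
            run = (run[0], i)            # linked: extend the current highlight
        elif r > delta:
            if run is not None:
                highlights.append(run)   # link broken: close the previous one
            run = (i, i)                 # fragment i starts a new single highlight
        else:
            if run is not None:
                highlights.append(run)
            run = None                   # fragment i is not a single highlight
    if run is not None:
        highlights.append(run)
    return highlights
```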

5. Sentiment Intensity Calculation

A sentiment intensity calculation strategy is used to quantitatively measure the strength of sentiment of a highlight. It reflects not only which sentiment types the highlight implies but also how strong the highlight’s sentiment is in each type. In this paper, we analyze the sentiment intensity of Chinese TSCs because Chinese is the most common language in TSCs. For TSCs in other languages, the sentiment intensity can still be calculated in the same way, using the grammar rules of those languages and conventional sentiment analysis methods such as Bidirectional Encoder Representations from Transformers (BERT).

5.1. Word Groups Division for TSCs

Using the strategy of sentiment highlight extraction, a set of highlights, $H_v$, is extracted from video v. A highlight $h_{v,i} \in H_v$ is composed of one or more adjacent fragments; that is, $h_{v,i} = \bigcup_{j=s_i}^{s_i+N_i-1} \{f_{v,j}\}$, where $N_i$ is the number of fragments in $h_{v,i}$, $s_i$ is the index of the first fragment in $h_{v,i}$, and $s_i + N_i - 1$ is the index of the last fragment in $h_{v,i}$.
Let $CMT_{h_{v,i}}$ be the set of TSC comments made on fragments of $h_{v,i}$. Thus, $CMT_{h_{v,i}} = \{w_b \mid T_{start,s_i} \le T_{sync}(w_b) \le T_{end,s_i+N_i-1}\}$, where $T_{start,s_i}$ and $T_{end,s_i+N_i-1}$ are the start time of $f_{v,s_i}$ and the finish time of $f_{v,s_i+N_i-1}$, respectively.
Linguistic analysis shows that the sentiments implied in a sentence are affected by certain special words in it. In the case of TSCs, there are three categories of special words: emotional words, adverbs, and negative words. An emotional word expresses some kind of sentiment and its intensity. An adverb strengthens or weakens the sentiment intensity of a comment. A negative word changes the meaning of a comment completely. For example, both I am a little bit happy and I am very happy express the sentiment HAPPY, but the sentiment of the second sentence is much stronger, because very is an adverb whose weight is much greater than that of a little bit. As another example, I am happy shows the sentiment HAPPY, while I am not happy describes the opposite sentiment, i.e., probably SAD.
Emotional words in $CMT_{h_{v,i}}$ can be selected according to a dictionary of emotional words, and their sentiment intensities can also be obtained from the dictionary. For an emotional word $d_j$, its sentiment intensity $E_{d,d_j} = (e_1, e_2, \ldots, e_k)$ is a distribution of sentiment strengths over the k types of sentiments, where $e_j$ ($1 \le j \le k$) is the strength of $d_j$ on the j-th sentiment type.
Most words in TSCs are covered by the dictionary, but some new terms are not. For emotional terms that exist in $CMT_{h_{v,i}}$ but are not found in the dictionary, we extend the dictionary by assigning a sentiment type and a sentiment intensity value. There are two available approaches to extend the sentiment dictionary. One uses a dictionary of synonyms: a new term is replaced by a synonymous term from the existing sentiment dictionary and thus receives a similar sentiment intensity. The other uses the original sentiment dictionary as a foundation and calculates the semantic similarity between new terms and the terms already in the dictionary, which allows the dictionary to be extended based on semantic associations between terms. As dictionary extension is beyond the scope of this paper, the approaches are not detailed here.
Just as the sentiment intensity $E_{d,d_j}$ of an emotional word $d_j$ can be obtained from the dictionary of emotional words, the weight $W_D$ of an adverb D can be obtained from a dictionary of adverbs. Similarly, negative words in $CMT_{h_{v,i}}$ can easily be found using a dictionary of negative words.
Suppose there are $N_{D,i}$ emotional words in $CMT_{h_{v,i}}$; the words of the comments in $CMT_{h_{v,i}}$ are then organized into $N_{D,i}$ groups $\{G_1, G_2, \ldots, G_{N_{D,i}}\}$. Each emotional word, together with its related adverbs and negative words, is put into the same group. Thus, every group contains exactly one emotional word and may include one or more adverbs and negative words. Figure 3 shows groups of TSC words.
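A minimal sketch of the grouping step. The left-to-right rule that attaches adverbs and negative words to the next emotional word is our assumption for illustration; the paper does not specify the attachment rule:

```python
def split_word_groups(words: list[str],
                      emotion_dict: dict[str, list[float]],
                      adverb_dict: dict[str, float],
                      negation_words: set[str]) -> list[dict]:
    """Split a comment's words into groups, one emotional word per group G_j."""
    groups: list[dict] = []
    pending: list[str] = []                       # modifiers awaiting an emotional word
    for w in words:
        if w in adverb_dict or w in negation_words:
            pending.append(w)
        elif w in emotion_dict:
            groups.append({"emotion": w, "modifiers": pending})
            pending = []
        # other words carry no sentiment and are ignored
    return groups
```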

5.2. Sentiment Intensity Calculation for Highlights

According to the definition in Section 3, $E_{d,h_{v,i}} = (e_1, e_2, \ldots, e_k)$ is the sentiment intensity of highlight $h_{v,i}$, where k is the number of sentiment types and $e_j$ ($1 \le j \le k$) is the intensity value of the j-th sentiment type in $h_{v,i}$.
The sentiment intensity of $G_j$ ($1 \le j \le N_{D,i}$) is affected by the adverbs and negative words in $G_j$, and is calculated in four situations:
(a) There is neither an adverb nor a negative word in $G_j$. The sentiment intensity of $G_j$ is the same as that of the emotional word $d_j$:

$$E_{d,G_j} = E_{d,d_j} \quad (3)$$

where $E_{d,d_j}$ is the sentiment intensity of emotional word $d_j$.
(b) There is no adverb, but there are $N_n$ ($N_n \ge 1$) negative words in $G_j$. A negative word reverses the effect of an emotional word; in Chinese grammar, an even number of negative words indicates a stronger positive meaning, while an odd number indicates a stronger negative meaning. Therefore, according to the number of negative words, the sentiment intensity of $G_j$ is calculated as

$$E_{d,G_j} = (-1)^{N_n} \times E_{d,d_j} \quad (4)$$

(c) There is no negative word, but there is one adverb in $G_j$. The sentiment intensity of $G_j$ is calculated as

$$E_{d,G_j} = W_D \times E_{d,d_j} \quad (5)$$

where $W_D$ is the weight of adverb D.
(d) There are both an adverb and $N_n$ ($N_n \ge 1$) negative words in $G_j$. As the comments in $CMT_{h_{v,i}}$ are written in Chinese, according to Chinese linguistic features, a word group $G_j$ with more than one adverb is considered ungrammatical, so we assume $G_j$ contains at most one adverb. Meanwhile, an adverb written before a negative word affects the sentiment intensity of a word group differently than one written after it. If the adverb appears before all negative words in $G_j$, the sentiment intensity of $G_j$ is calculated as

$$E_{d,G_j} = (-1)^{N_n} \times W_D \times E_{d,d_j} \quad (6)$$

If there are $N_{n1}$ ($1 \le N_{n1} \le N_n$) negative words before D and $N_{n2}$ ($N_{n2} = N_n - N_{n1}$) negative words after D, the sentiment intensity of $G_j$ is calculated as

$$E_{d,G_j} = (-1)^{N_{n1}+1} \times W \times W_D \times (-1)^{N_{n2}} \times E_{d,d_j} \quad (7)$$

where W is a parameter that weakens the sentiment intensity.
From the processes above, we obtain the sentiment intensity of each word group $G_j$ in $CMT_{h_{v,i}}$. Then, we use the sentiment intensities of all word groups to generate the sentiment intensity of a video highlight. The sentiment intensity of highlight $h_{v,i}$ is calculated as

$$E_{d,h_{v,i}} = \frac{\sum_{j=1}^{N_{D,i}} E_{d,G_j}}{(T_{end,s_i+N_i-1} - T_{start,s_i})/I} \quad (8)$$

where $E_{d,G_j}$ is the sentiment intensity of the j-th word group in $CMT_{h_{v,i}}$, $N_{D,i}$ is the number of word groups in $CMT_{h_{v,i}}$, $T_{start,s_i}$ and $T_{end,s_i+N_i-1}$ are the start point and end point of video highlight $h_{v,i}$, respectively, and I is the interval between $T_{start,s_i}$ and $T_{start,s_i+1}$.
The sentiment intensity $E_{d,h_{v,i}}$ is thus the average total sentiment intensity of the highlight $h_{v,i}$ per unit time.
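A sketch that strings rules (a)-(d) and Equation (8) together, reusing the hypothetical word-group structure from the Section 5.1 sketch. The value of the weakening parameter W and the k = 7 default (the seven sentiment dimensions of Section 6.1) are illustrative assumptions:

```python
import numpy as np

def group_intensity(group: dict,
                    emotion_dict: dict[str, np.ndarray],
                    adverb_dict: dict[str, float],
                    negation_words: set[str],
                    w_weaken: float = 0.5) -> np.ndarray:
    """E_{d,G_j} under rules (a)-(d); w_weaken stands for W, and 0.5 is a placeholder."""
    e = emotion_dict[group["emotion"]]
    mods = group["modifiers"]
    negs = [w for w in mods if w in negation_words]
    advs = [w for w in mods if w in adverb_dict]
    if not advs:                                   # rules (a) and (b)
        return ((-1) ** len(negs)) * e
    w_d = adverb_dict[advs[0]]                     # at most one adverb, per rule (d)
    adv_pos = mods.index(advs[0])
    n1 = sum(1 for w in mods[:adv_pos] if w in negation_words)
    n2 = len(negs) - n1
    if n1 == 0:                                    # rule (c) and the first case of (d)
        return ((-1) ** n2) * w_d * e
    return ((-1) ** (n1 + 1)) * w_weaken * w_d * ((-1) ** n2) * e   # Equation (7)

def highlight_intensity(groups: list[dict], t_start: float, t_end: float,
                        interval: float, k: int = 7, **dicts) -> np.ndarray:
    """E_{d,h_{v,i}} per Equation (8): total group intensity per unit time."""
    total = sum((group_intensity(g, **dicts) for g in groups), np.zeros(k))
    return total / ((t_end - t_start) / interval)
```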

6. Evaluation

6.1. Experiment Setup

A TSC dataset including approximately 16 million TSCs is used to evaluate the performance of our proposed work. The TSCs were collected from 4841 online videos, which contain movies, animation, TV series, and variety shows.
The emotional words ontology (http://ir.dlut.edu.cn/info/1013/1142.htm (accessed on 1 May 2023)) provided by the Dalian University of Technology is used to build our sentiment dictionary. In the dictionary, each word is associated with a sentiment intensity, a 7-dimensional vector. Each dimension represents one of seven kinds of sentiment: happy, good, angry, sad, afraid, hate, and shock.
We randomly selected 34 movies on the Bilibili website, including action movies, comedy movies, fantasy movies, horror movies, etc. The TSCs of movies including Spider-Man: Homecoming, Harry Potter and the Philosopher’s Stone, Green Book, Charlie Chaplin, The Shawshank Redemption, Secret Superstar, etc. were chosen from the dataset for our experiments. In the experiments, the fragment length $T_f$ is set to 30 s and the fragment interval I is set to 20 s. Different movies have different numbers of time-sync comments, so we randomly selected 5000 time-sync comments for each movie. We combined the movie categories on the IMDb website with the sentiment analysis of all the movies’ time-sync comments to classify the selected movies. The basic information of the movies is shown in Table 2.
Each movie contains some highlights. All of the baseline highlights were manually selected by movie audiences: we obtained edited highlight videos from the IMDb and Bilibili websites and matched them with the original movies to obtain the highlight times. The baseline highlights of some movies in the dataset are listed in Table 3, which gives the movie name, highlight number, and highlight playback time for one movie chosen from each category.
In the experiments, we used two metrics to measure the performance of the sentiment highlight extraction strategy:
(1) the sentiment highlight F1 score, calculated as

$$F1\ Score = \frac{2 \times precision \times recall}{precision + recall}$$

(2) the overlapped number count, which is the number of overlapping fragments between the highlights extracted by our approach and the baseline highlights.

6.2. Evaluation of Sentiment Highlights Extraction

In the experiments, the highlight threshold $\delta \in [0, 1)$ and the link threshold $\theta \in [0, 1)$ are two adjustable parameters. A number of experiments showed that $\theta$ has little effect on sentiment highlight extraction, so $\theta$ is set to 0.1. In order to obtain the optimal value of $\delta$, we calculated the average F1 score and overlapped number count under different $\delta$. We used Latent Dirichlet Allocation (LDA) and BERT, respectively, to construct the TSC vectors in our method. The main parameters of the LDA model are: the number of topic sampling iterations $\eta = 100$ and the number of hidden topics K = 100. The main parameters of the BERT model are: 12 hidden layers, a hidden size of 768, and 12 attention heads. We compared our method with three methods: (1) randomly selected fragments, (2) Multi-Topic Emotion Recognition (MTER) [5], and (3) the method proposed by Ping [6]. We also compared variants of our method using different ways of constructing TSC vectors, as well as our method without the highlight-finding step of Section 4.4. The overlapped number count is the sum of the overlapped numbers of these methods.
Figure 4 shows the experimental results of the sentiment highlight extraction strategy. As the figure shows, our proposed strategy has the highest average F1 score and highest overlapped number count when $\delta = 0.2$; in other words, our model achieves its best extraction performance at $\delta = 0.2$. Therefore, we set $\delta = 0.2$ and $\theta = 0.1$ in the following experiments.
Table 4 shows the sentiment highlight F1 scores of the sentiment highlight extraction methods; the optimal value of each row is shown in bold. The results show that, across movie categories, our method performs better on comedies and dramas, because the highlights of these movies are more concentrated, while the results are lower for action, horror, and thriller movies. On one hand, the sentiment type in comedy movies is relatively simple and uniform: audiences share the same feeling when watching happy clips and agree on their interpretation, so happy clips are easily extracted as highlights, which makes the F1 score of the comedy genre higher than that of other genres. On the other hand, other genres contain a greater variety of scenes, such as fights in action movies and jump scares in horror and thriller movies; their sentiment types are varied and complex, so different audiences interpret even the same scenes differently. Therefore, movies of those genres achieve lower F1 scores than comedy movies.
The experimental results also show that our method with BERT has a higher F1 score than the other methods on action–adventure, comedy, fantasy, crime, and drama movies, which indicates that our method generalizes well across movie categories. However, on horror and thriller movies, the results of our method are slightly worse than those of the method proposed by Ping [6]. We speculate that this is because people are tense when watching horror and thriller movies, so their comments have a larger latency. Meanwhile, the F1 scores of our method are all greater than 0.5, while the method proposed by Ping performs poorly on some movies, such as Slumdog Millionaire; our method is therefore more stable. In summary, our proposed method outperforms MTER [5] and Ping [6], achieving higher overall accuracy and F1 scores, and it performs well on different categories of movies, showing a certain universality.
As can be seen from Table 5, the average overlapped number of our method with BERT is better than that of the other methods. We randomly selected one movie of each type from Table 2 and display its overlapped number in Figure 5, which demonstrates that the overlapped number of our method with BERT is higher than that of the other methods for most movies.
The results also show that different types of movies yield different results. Specifically, we found that the movie genre affects the emotional response of audiences, which in turn affects the use of emotional words in the TSCs. For instance, the overlapped numbers of these methods are high on Charlie Chaplin and Secret Superstar, while they are low on Pacific Rim. Charlie Chaplin is a comedy with a relaxed and cheerful emotional tone, which increases the probability of audiences using straightforward emotional words such as “2333”, “funny”, and “interesting”. Similarly, in Secret Superstar, a movie with a profound conceptual theme, the plot twists elicit strong emotional responses from audiences, leading them to express straightforward emotional words more frequently.
In contrast, for action movies such as Pacific Rim, audiences tend to pay more attention to fight scenes and special effects than to the emotional content of the movie, resulting in a lower probability of using similar, straightforward emotional words. As a result, we observed a higher overlap in the emotional words used by audiences for Charlie Chaplin and Secret Superstar, and a lower overlap for Pacific Rim.
To investigate the influence of various similarity measures on the experimental results, we conducted experiments using different similarity measures in conjunction with BERT. The employed measures encompassed the Euclidean distance, Pearson correlation coefficient, Manhattan distance, Minkowski distance, and cosine similarity. The experimental results, shown in Table 6, indicate that cosine similarity yields the highest average F1 score and average overlapped number. Based on these results, we selected cosine similarity as the preferred measure for our method.

6.3. Evaluation of Sentiment Intensity Calculation

We randomly selected one movie of each type from Table 2 to show the experimental results of sentiment intensity. The normalized sentiment intensity results are listed in Table 7 (for each movie, three highlights are listed).
Table 7 shows representative sentiment highlight information. For different categories of movies, the distribution of sentiment over highlighted clips differs, and these emotional distributions match our impressions of the movies. For instance, Charlie Chaplin is a comedy whose fundamental emotional tone is relaxed, so the good dimension is much higher than the other dimensions. Furthermore, the sentiment intensity of Secret Superstar is distributed much more evenly across dimensions instead of concentrating on a single one. This is also in line with our expectations: Secret Superstar is a movie with varied sentiments, meaning its sentiment is complicated, and audiences may hold quite different views on the same sentiment highlight.
To evaluate the performance of the sentiment intensity calculation strategy, we invited three experts who are professionals in movie appreciation to label the sentiment intensity of each sentiment highlight.
After comparing the sentiment highlights and intensities with their corresponding movie shots and original TSC data, we found that our sentiment intensities describe the sentiment information of the sentiment highlights very well.

7. Conclusions and Future Work

In this paper, a time-sync-comment-based sentiment analysis model is proposed that extracts sentiment highlights from videos and measures their sentiment intensity using TSCs. A four-step approach to extracting video highlights and a strategy for calculating sentiment intensity are proposed, enabling the quantitative assessment of sentiment within video highlights. The experimental results not only show that our approach improves the F1 score by 12.8% and the overlapped number by 8.0% compared with the best existing method in highlight extraction, but also indicate sentiment distributions in line with the corresponding movie scenes. Moreover, the proposed approach can be widely used for TSCs in various languages: the strategies of sentiment highlight extraction and sentiment intensity calculation proposed in this paper focus on Chinese TSCs, but they can work on other languages by substituting those languages’ grammar rules and sentiment analysis methods.
In the future, prior knowledge will be considered in the highlight extraction strategy in order to improve performance on genres such as action, horror, and thriller movies. The sentiment dictionary will also be continuously extended to improve the performance of sentiment intensity calculation.

Author Contributions

Conceptualization, J.L., Z.L. and X.M.; methodology, J.L. and Z.L.; experiment, Z.L., X.M. and G.Y.; validation, J.L. and Z.L.; formal analysis, Q.Z.; data curation, Z.L. and Q.Z.; writing, J.L., Z.L., C.Z. and G.Y.; supervision, C.Z.; project administration, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is partially supported by the National Key Research and Development Program of China (Grant No. 2021YFC3340601), the Science and Technology Program of Shanghai, China (Grant Nos. 20ZR1460500, 22511104300, 21511101503), the Natural Science Foundation of Shanghai, China (Grant No. 21ZR1423800), the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100), and the Fundamental Research Funds for the Central Universities.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The dataset and parameter configuration used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wu, B.; Zhong, E.; Tan, B.; Horner, A.; Yang, Q. Crowdsourced Time-Sync Video Tagging Using Temporal and Personalized Topic Modeling. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; Association for Computing Machinery: New York, NY, USA, 2014. KDD ’14. pp. 721–730. [Google Scholar] [CrossRef]
  2. Liao, Z.; Xian, Y.; Yang, X.; Zhao, Q.; Zhang, C.; Li, J. TSCSet: A Crowdsourced Time-Sync Comment Dataset for Exploration of User Experience Improvement. In Proceedings of the 23rd International Conference on Intelligent User Interfaces, Tokyo, Japan, 7–11 March 2018; Association for Computing Machinery: New York, NY, USA, 2018. IUI ’18. pp. 641–652. [Google Scholar] [CrossRef]
  3. Hu, Z.; Cui, J.; Wang, W.H.; Lu, F.; Wang, B. Video Content Classification Using Time-Sync Comments and Titles. In Proceedings of the 2022 7th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), Chengdu, China, 22–24 April 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 252–258. [Google Scholar]
  4. Ping, Q.; Chen, C. Video highlights detection and summarization with lag-calibration based on concept-emotion mapping of crowd-sourced time-sync comments. arXiv 2017, arXiv:1708.02210. [Google Scholar]
  5. Pan, Z.; Li, X.; Cui, L.; Zhang, Z. Video clip recommendation model by sentiment analysis of time-sync comments. Multimed. Tools Appl. 2020, 79, 33449–33466. [Google Scholar] [CrossRef]
  6. Ping, Q. Video recommendation using crowdsourced time-sync comments. In Proceedings of the 12th ACM Conference on Recommender Systems, Vancouver, BC, USA, 2 October 2018; pp. 568–572. [Google Scholar]
  7. Pan, J.; Wang, S.; Fang, L. Representation Learning through Multimodal Attention and Time-Sync Comments for Affective Video Content Analysis. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 42–50. [Google Scholar]
  8. Cao, W.; Zhang, K.; Wu, H.; Xu, T.; Chen, E.; Lv, G.; He, M. Video emotion analysis enhanced by recognizing emotion in video comments. Int. J. Data Sci. Anal. 2022, 14, 175–189. [Google Scholar] [CrossRef]
  9. Bonifazi, G.; Cauteruccio, F.; Corradini, E.; Marchetti, M.; Terracina, G.; Ursino, D.; Virgili, L. Representation, detection and usage of the content semantics of comments in a social platform. J. Inf. Sci. 2022, 01655515221087663. [Google Scholar] [CrossRef]
  10. Harrando, I.; Reboud, A.; Lisena, P.; Troncy, R.; Laaksonen, J.; Virkkunen, A.; Kurimo, M. Using Fan-Made Content, Subtitles and Face Recognition for Character-Centric Video Summarization. In Proceedings of the International Workshop on Video Retrieval Evaluation, Gaithersburg, MD, USA, 17–19 November 2020. [Google Scholar]
  11. Rochan, M.; Ye, L.; Wang, Y. Video summarization using fully convolutional sequence networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 347–363. [Google Scholar]
  12. Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 4690–4699. [Google Scholar]
  13. Liu, W.; Wen, Y.; Yu, Z.; Li, M.; Raj, B.; Song, L. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 212–220. [Google Scholar]
  14. Bao, J.; Ye, M. Head pose estimation based on robust convolutional neural network. Cybern. Inf. Technol. 2016, 16, 133–145. [Google Scholar] [CrossRef] [Green Version]
  15. Patacchiola, M.; Cangelosi, A. Head pose estimation in the wild using Convolutional Neural Networks and adaptive gradient methods. Pattern Recognit. 2017, 71, 132–143. [Google Scholar] [CrossRef] [Green Version]
  16. Xiao, B.; Wu, H.; Wei, Y. Simple baselines for human pose estimation and tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 466–481. [Google Scholar]
  17. Kocabas, M.; Karagoz, S.; Akbas, E. MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  18. Li, J.; Wang, C.; Zhu, H.; Mao, Y.; Fang, H.S.; Lu, C. Crowdpose: Efficient crowded scenes pose estimation and a new benchmark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10863–10872. [Google Scholar]
  19. Zhang, Y.; Gao, J.; Yang, X.; Liu, C.; Li, Y.; Xu, C. Find Objects and Focus on Highlights: Mining Object Semantics for Video Highlight Detection via Graph Neural Networks. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12902–12909. [Google Scholar]
  20. Dai, B.; Fidler, S.; Urtasun, R.; Lin, D. Towards Diverse and Natural Image Descriptions via a Conditional GAN. In Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29 October 2017; IEEE Computer Society: Washington, DC, USA, 2017; pp. 2989–2998. [Google Scholar] [CrossRef] [Green Version]
  21. Li, N.; Chen, Z. Image Captioning with Visual-Semantic LSTM. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; AAAI Press: Washington, DC, USA, 2018. IJCAI’18. pp. 793–799. [Google Scholar]
  22. Vadicamo, L.; Carrara, F.; Cimino, A.; Cresci, S.; Dell’Orletta, F.; Falchi, F.; Tesconi, M. Cross-media learning for image sentiment analysis in the wild. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 308–317. [Google Scholar]
  23. Xu, J.; Li, Z.; Huang, F.; Li, C.; Philip, S.Y. Social Image Sentiment Analysis by Exploiting Multimodal Content and Heterogeneous Relations. IEEE Trans. Ind. Inform. 2020, 17, 2974–2982. [Google Scholar] [CrossRef]
  24. Zhang, K.; Zhu, Y.; Zhang, W.; Zhu, Y. Cross-modal image sentiment analysis via deep correlation of textual semantic. Knowl.-Based Syst. 2021, 216, 106803. [Google Scholar] [CrossRef]
  25. Yadav, V.; Ragot, N. Text extraction in document images: Highlight on using corner points. In Proceedings of the 2016 12th IAPR Workshop on Document Analysis Systems (DAS), Santorini, Greece, 11–14 April 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 281–286. [Google Scholar]
  26. Song, K.; Yao, T.; Ling, Q.; Mei, T. Boosting image sentiment analysis with visual attention. Neurocomputing 2018, 312, 218–228. [Google Scholar] [CrossRef]
  27. Zheng, L.; Wang, H.; Gao, S. Sentimental feature selection for sentiment analysis of Chinese online reviews. Int. J. Mach. Learn. Cybern. 2018, 9, 75–84. [Google Scholar] [CrossRef]
  28. Qiu, X.; Sun, T.; Xu, Y.; Shao, Y.; Dai, N.; Huang, X. Pre-trained models for natural language processing: A survey. Sci. China Technol. Sci. 2020, 63, 1872–1897. [Google Scholar] [CrossRef]
  29. Xue, W.; Li, T. Aspect Based Sentiment Analysis with Gated Convolutional Networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Association for Computational Linguistics: Melbourne, Australia, 2018; pp. 2514–2523. [Google Scholar] [CrossRef] [Green Version]
  30. Zhang, M.; Zhang, Y.; Vo, D.T. Gated neural networks for targeted sentiment analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30. [Google Scholar]
  31. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 4171–4186. [Google Scholar] [CrossRef]
  32. Moholkar, K.; Rathod, K.; Rathod, K.; Tomar, M.; Rai, S. Sentiment Classification Using Recurrent Neural Network. In Proceedings of the Intelligent Communication Technologies and Virtual Mobile Networks, Tirunelveli, India, 14–15 February 2019; Springer: Cham, Switzerland, 2019; pp. 487–493. [Google Scholar]
  33. Elfaik, H.; Nfaoui, E.H. Deep Bidirectional LSTM Network Learning-Based Sentiment Analysis for Arabic Text. J. Intell. Syst. 2021, 30, 395–412. [Google Scholar] [CrossRef]
  34. Chen, J.; Yu, J.; Zhao, S.; Zhang, Y. User’s Review Habits Enhanced Hierarchical Neural Network for Document-Level Sentiment Classification. Neural Process. Lett. 2021, 53, 2095–2111. [Google Scholar] [CrossRef]
  35. Chakravarthi, B.R.; Priyadharshini, R.; Muralidaran, V.; Suryawanshi, S.; Jose, N.; Sherly, E.; McCrae, J.P. Overview of the track on sentiment analysis for dravidian languages in code-mixed text. In Proceedings of the Forum for Information Retrieval Evaluation, Hyderabad, India, 16–20 December 2020; pp. 21–24. [Google Scholar]
  36. Al-Smadi, M.; Talafha, B.; Al-Ayyoub, M.; Jararweh, Y. Using long short-term memory deep neural networks for aspect-based sentiment analysis of Arabic reviews. Int. J. Mach. Learn. Cybern. 2019, 10, 2163–2175. [Google Scholar] [CrossRef]
  37. Marstawi, A.; Sharef, N.M.; Aris, T.N.M.; Mustapha, A. Ontology-based aspect extraction for an improved sentiment analysis in summarization of product reviews. In Proceedings of the 8th International Conference on Computer Modeling and Simulation, Canberra Australia, 20–23 January 2017; pp. 100–104. [Google Scholar]
  38. Abdi, A.; Shamsuddin, S.M.; Hasan, S.; Piran, J. Automatic sentiment-oriented summarization of multi-documents using soft computing. Soft Comput. 2019, 23, 10551–10568. [Google Scholar] [CrossRef]
  39. Bonifazi, G.; Cauteruccio, F.; Corradini, E.; Marchetti, M.; Sciarretta, L.; Ursino, D.; Virgili, L. A Space-Time Framework for Sentiment Scope Analysis in Social Media. Big Data Cogn. Comput. 2022, 6, 130. [Google Scholar] [CrossRef]
Figure 1. An Example of Time-Sync Comments in the Video Forrest Gump.
Figure 2. Processes of sentimental highlight extraction.
Figure 3. Groups of TSC words.
Figure 4. Average F1 score and overlapped number count.
Figure 5. Sentiment highlight overlapped number.
Table 1. Notation list.

Symbol | Description
v | TSC-commented video
$T_{start}$ | Start time of video v
$T_{end}$ | Finish time of video v
$T_v$ | Video length (time duration)
$F_v$ | Set of fragments in video v
$N_F$ | Number of fragments in video v
$f_{v,i}$ | i-th fragment in video v
$T_{start,i}$ | Start time of fragment $f_{v,i}$
$T_{end,i}$ | Finish time of fragment $f_{v,i}$
$T_f$ | Length of a fragment (time span)
I | Interval between $T_{start,i}$ and $T_{start,i+1}$
$H_v$ | Set of highlights in video v
$N_H$ | Number of highlights in video v
$h_{v,i}$ | i-th highlight in video v
S | Set of k types of sentiments
$s_i$ | i-th type in sentiment set S
$E_{d,h_{v,i}}$ | Sentiment intensity of highlight $h_{v,i}$
$e_j$ | Intensity value of sentiment type $s_j$
$B_v$ | Set of TSCs in video v
$N_B$ | Number of TSCs in video v
$B_{f_{v,i}}$ | Set of TSCs in fragment $f_{v,i}$
b | One TSC in TSC set $B_v$
$w_b$ | Comment of TSC b
$t_b$ | Time stamp of TSC b
$u_b$ | User who sends TSC b
$N_U$ | Number of users who send TSCs on video v
Table 2. Basic information of movies.

Movie Name | Movie Length | Movie Type
Spider-Man: Homecoming | 133 min 32 s | Action and Adventure
White Snake | 98 min 42 s | Action and Adventure
Inception | 148 min 8 s | Action and Adventure
Jurassic World Dominion | 147 min 12 s | Action and Adventure
Pacific Rim | 131 min 17 s | Action and Adventure
Transformers | 143 min 23 s | Action and Adventure
Ready Player One | 139 min 57 s | Action and Adventure
World War Z | 123 min 3 s | Action and Adventure
Green Book | 130 min 11 s | Comedy
Charlie Chaplin | 144 min 30 s | Comedy
Let the Bullets Fly | 126 min 38 s | Comedy
Johnny English | 87 min 25 s | Comedy
Modern Times | 86 min 43 s | Comedy
The Croods: A New Age | 95 min 20 s | Comedy
La La Land | 128 min 2 s | Comedy
The Truman Show | 102 min 57 s | Comedy
Harry Potter and the Philosopher’s Stone | 158 min 50 s | Fantasy
Fantastic Beasts and Where to Find Them | 132 min 52 s | Fantasy
Kong | 118 min 32 s | Fantasy
Triangle | 98 min 59 s | Fantasy
The Shawshank Redemption | 142 min 29 s | Crime
Catch Me If You Can | 140 min 44 s | Crime
Slumdog Millionaire | 120 min 38 s | Crime
Who Am I—Kein System ist sicher | 101 min 47 s | Crime
Escape Room: Tournament of Champions | 88 min 5 s | Horror
The Meg | 114 min 38 s | Horror
Blood Diamond | 143 min 21 s | Thriller
Shutter Island | 138 min 4 s | Thriller
Secret Superstar | 149 min 47 s | Music
Heidi | 96 min 24 s | Family
Duo Guan | 134 min 45 s | Sport
Saving Private Ryan | 169 min 26 s | War
Source Code | 93 min 18 s | Action
Dangal | 139 min 57 s | Action
Table 3. Movies’ baseline highlights (highlight number: playback time).

Pacific Rim: 1: 18:48–19:11; 2: 20:20–21:00; 3: 26:50–27:57; 4: 52:09–52:56; 5: 78:02–79:00; 6: 79:26–80:51; 7: 82:03–82:39; 8: 89:01–90:12; 9: 94:30–95:13; 10: 96:02–96:53; 11: 101:05–101:36; 12: 113:49–114:15; 13: 118:28–118:51; 14: 121:09–122:40.
Charlie Chaplin: 1: 1:00–1:55; 2: 4:07–4:34; 3: 7:08–7:33; 4: 9:01–9:39; 5: 20:26–21:17; 6: 23:41–24:10; 7: 24:48–25:30; 8: 30:06–30:54; 9: 37:50–38:37; 10: 39:05–39:58; 11: 42:03–42:32; 12: 45:23–45:57; 13: 54:28–55:17; 14: 55:48–56:39; 15: 57:29–58:20; 16: 94:49–95:33; 17: 111:25–111:57; 18: 118:01–118:50; 19: 130:26–131:17.
Harry Potter and the Philosopher’s Stone: 1: 0:00–1:30; 2: 12:24–13:11; 3: 13:45–14:58; 4: 21:00–22:11; 5: 23:47–24:11; 6: 26:25–27:00; 7: 36:46–37:40; 8: 40:50–42:34; 9: 48:27–48:57; 10: 52:08–52:51; 11: 53:21–53:59; 12: 56:41–57:20; 13: 66:01–66:39; 14: 70:41–71:12; 15: 77:41–78:32; 16: 108:44–109:19; 17: 147:23–148:18; 18: 150:05–150:52.
Catch Me If You Can: 1: 1:25–1:54; 2: 2:26–3:10; 3: 20:29–20:55; 4: 21:07–21:58; 5: 24:06–24:35; 6: 25:29–25:52; 7: 26:20–26:54; 8: 40:46–41:59; 9: 55:45–56:10; 10: 58:25–58:55; 11: 59:50–60:20; 12: 61:23–62:16; 13: 75:49–76:12; 14: 84:43–85:30; 15: 107:48–108:17; 16: 126:09–126:59; 17: 127:28–128:20; 18: 128:44–129:20; 19: 134:24–135:50.
Blood Diamond: 1: 6:40–7:20; 2: 24:45–25:10; 3: 49:23–49:59; 4: 55:46–56:11; 5: 60:23–61:30; 6: 68:20–69:00; 7: 72:10–73:16; 8: 80:43–81:34; 9: 91:27–92:19; 10: 96:47–97:19; 11: 108:04–108:57; 12: 109:29–109:50; 13: 110:45–111:10; 14: 115:44–116:18; 15: 128:10–129:12; 16: 131:42–132:16; 17: 132:47–133:11; 18: 134:05–135:35.
Secret Superstar: 1: 53:45–54:35; 2: 60:08–61:12; 3: 66:41–67:12; 4: 67:24–67:59; 5: 72:21–72:53; 6: 79:30–79:50; 7: 81:26–81:51; 8: 93:28–93:54; 9: 96:00–97:10; 10: 97:40–98:14; 11: 102:41–103:35; 12: 110:23–111:33; 13: 132:05–132:50; 14: 134:21–135:18; 15: 138:30–139:12; 16: 139:42–140:36; 17: 144:28–144:59; 18: 145:29–145:53; 19: 146:01–146:32.
Table 4. Sentiment highlight F1 score.

Movie Name | Random | MTER | PING | Our Method (without Find Highlights) | Our Method (with LDA) | Our Method (with BERT)
Spider-Man: Homecoming | 0.100 | 0.200 | 0.597 | 0.167 | 0.364 | 0.615
White Snake | 0.300 | 0.091 | 0.824 | 0.267 | 0.824 | 0.828
Inception | 0.083 | 0.267 | 0.650 | 0.200 | 0.400 | 0.588
Jurassic World Dominion | 0.133 | 0.062 | 0.520 | 0.356 | 0.571 | 0.636
Pacific Rim | 0.071 | 0.467 | 0.579 | 0.110 | 0.707 | 0.710
Transformers | 0.409 | 0.472 | 0.609 | 0.312 | 0.733 | 0.661
Ready Player One | 0.214 | 0.366 | 0.741 | 0.268 | 0.600 | 0.606
World War Z | 0.200 | 0.375 | 0.686 | 0.320 | 0.730 | 0.733
Green Book | 0.200 | 0.091 | 0.632 | 0.267 | 0.571 | 0.591
Charlie Chaplin | 0.126 | 0.150 | 0.742 | 0.253 | 0.813 | 0.831
Let the Bullets Fly | 0.214 | 0.067 | 0.649 | 0.245 | 0.586 | 0.545
Johnny English | 0.231 | 0.315 | 0.429 | 0.154 | 0.497 | 0.770
Modern Times | 0.200 | 0.462 | 0.655 | 0.286 | 0.656 | 0.750
The Croods: A New Age | 0.167 | 0.100 | 0.500 | 0.370 | 0.686 | 0.717
La La Land | 0.250 | 0.154 | 0.642 | 0.111 | 0.737 | 0.800
The Truman Show | 0.296 | 0.402 | 0.623 | 0.320 | 0.709 | 0.714
Harry Potter and the Philosopher’s Stone | 0.105 | 0.121 | 0.699 | 0.150 | 0.733 | 0.774
Fantastic Beasts and Where to Find Them | 0.190 | 0.211 | 0.606 | 0.074 | 0.705 | 0.638
Kong | 0.389 | 0.392 | 0.759 | 0.303 | 0.800 | 0.875
Triangle | 0.100 | 0.125 | 0.500 | 0.286 | 0.533 | 0.625
The Shawshank Redemption | 0.167 | 0.286 | 0.636 | 0.222 | 0.500 | 0.515
Catch Me If You Can | 0.158 | 0.271 | 0.525 | 0.268 | 0.703 | 0.606
Slumdog Millionaire | 0.143 | 0.333 | 0.299 | 0.165 | 0.707 | 0.652
Who Am I—Kein System ist sicher | 0.083 | 0.267 | 0.612 | 0.200 | 0.573 | 0.575
Escape Room: Tournament of Champions | 0.100 | 0.091 | 0.816 | 0.267 | 0.750 | 0.773
The Meg | 0.154 | 0.214 | 0.422 | 0.185 | 0.700 | 0.742
Blood Diamond | 0.056 | 0.211 | 0.747 | 0.222 | 0.654 | 0.701
Shutter Island | 0.171 | 0.267 | 0.691 | 0.390 | 0.600 | 0.610
Secret Superstar | 0.158 | 0.375 | 0.620 | 0.180 | 0.861 | 0.796
Heidi | 0.250 | 0.378 | 0.677 | 0.214 | 0.636 | 0.653
Duo Guan | 0.133 | 0.343 | 0.456 | 0.170 | 0.549 | 0.596
Saving Private Ryan | 0.247 | 0.211 | 0.693 | 0.267 | 0.759 | 0.800
Source Code | 0.143 | 0.200 | 0.692 | 0.190 | 0.807 | 0.923
Dangal | 0.176 | 0.167 | 0.472 | 0.299 | 0.626 | 0.769
Average | 0.180 | 0.250 | 0.618 | 0.237 | 0.658 | 0.697
Table 5. Average overlapped number.

Metric | Random | MTER | PING | Our Method (without Find Highlights) | Our Method (with LDA) | Our Method (with BERT)
Average overlapped number | 2.23 | 4.10 | 7.71 | 2.91 | 8.20 | 8.32
Table 6. F1 score and overlapped number with different similarity measures.

Measure | Average F1 Score | Average Overlapped Number
Euclidean Distance | 0.660 | 7.66
Pearson Correlation Coefficient | 0.685 | 8.21
Manhattan Distance | 0.619 | 6.94
Minkowski Distance | 0.669 | 7.73
Cosine Similarity | 0.697 | 8.32
Table 7. Representative sentiment highlights. Intensity values are 7-dimensional vectors over the sentiment dimensions listed in Section 6.1 (happy, good, angry, sad, afraid, hate, shock); the intensity figures and film plot stills of the original table are omitted here.

Pacific Rim: 26:50–27:57 → (0.03, 0.62, 0.0, 0.0, 0.02, 0.34, 0.0); 96:02–96:53 → (0.07, 0.24, 0.0, 0.12, 0.05, 0.45, 0.06); 121:09–122:40 → (0.10, 0.61, 0.0, 0.0, 0.09, 0.20, 0.0)
Charlie Chaplin: 37:50–38:37 → (0.05, 0.93, 0.0, 0.0, 0.02, 0.0, 0.0); 45:23–45:57 → (0.04, 0.80, 0.0, 0.12, 0.0, 0.04, 0.0); 94:49–95:33 → (0.08, 0.63, 0.0, 0.11, 0.0, 0.18, 0.0)
Harry Potter and the Philosopher’s Stone: 40:50–42:34 → (0.10, 0.64, 0.0, 0.10, 0.11, 0.05, 0.0); 77:41–78:32 → (0.07, 0.39, 0.0, 0.05, 0.04, 0.46, 0.0); 147:23–148:18 → (0.28, 0.35, 0.0, 0.07, 0.0, 0.30, 0.0)
Catch Me If You Can: 40:46–41:59 → (0.0, 0.61, 0.0, 0.07, 0.07, 0.25, 0.0); 61:23–62:16 → (0.30, 0.36, 0.0, 0.0, 0.0, 0.34, 0.0); 126:09–126:59 → (0.29, 0.44, 0.0, 0.0, 0.11, 0.15, 0.0)
Blood Diamond: 60:23–61:30 → (0.12, 0.46, 0.0, 0.06, 0.04, 0.32, 0.0); 72:10–73:16 → (0.15, 0.51, 0.0, 0.0, 0.09, 0.19, 0.06); 108:04–108:57 → (0.09, 0.35, 0.0, 0.06, 0.0, 0.49, 0.0)
Secret Superstar: 96:00–97:10 → (0.20, 0.34, 0.0, 0.14, 0.04, 0.20, 0.07); 110:23–111:33 → (0.03, 0.25, 0.0, 0.16, 0.04, 0.53, 0.0); 134:21–135:18 → (0.26, 0.42, 0.0, 0.03, 0.06, 0.21, 0.02)
