Peer-Review Record

Performance Predictions of Sci-Fi Films via Machine Learning

Appl. Sci. 2023, 13(7), 4312; https://doi.org/10.3390/app13074312

by Amjed Al Fahoum and Tahani A. Ghobon

Reviewer 1:

K Meenakshi

Reviewer 2:

Reviewer 3:

Reviewer 4:

Reviewer 5:

Submission received: 31 December 2022 / Revised: 13 February 2023 / Accepted: 23 March 2023 / Published: 29 March 2023

(This article belongs to the Special Issue Advanced Computing and Neural Networks Applied in Learning Systems)

Round 1

Reviewer 1 Report

The presentation and organization of the paper are good. Please make the following changes:

1. What is the novelty of your work?

2. The methodology should be explained more clearly.

3. The comparison of existing systems with the proposed work should be explained better.

Author Response

The authors appreciate the time and effort the reviewer has dedicated to providing valuable feedback on our manuscript. We are grateful to the reviewer for his/her insightful comments. We have incorporated changes to reflect the suggestions provided by the reviewer. We have tracked the changes in the revised manuscript. Here is a point-by-point response to the reviewers' comments and concerns.

1. What is the novelty of your work?

"This study contributes two primary outcomes by developing a viable method for predicting a film's early success. First, this work demonstrates how diverse types of freely accessible data, including structured data, network data, and unstructured data, may be collected, merged, and analyzed to train machine-learning algorithms. During the design and development of information system artifacts, these data-driven methodologies can assist firms with decision-making by providing insightful predictions and recommendations. It is the most precise recommendation among the algorithms and actions that may be taken to obtain optimal results from these data and algorithms. Second, this research offers numerous innovative approaches for predicting the early success of movies. It includes the film's plot, release date, producers, and directors, and introduces a feature scoring approach that reduces the complexity of the optimization algorithm of the ML method. These elements demonstrated that each component substantially impacted the system's performance and explained why movies are so popular. In contrast to previous ML algorithms, this paper provides a modified KNN-ML approach as an alternative approach to open space risk reduction that employs k-nearest neighbor approaches to discover discriminative characteristics of the feature set. This study aims to make a machine learning algorithm that improves on what we already know and gives a reasonable estimate of the success rate based on what we know, how we act, and how well the algorithm works. We will also show that our method has predictive value by showing how it can be used to suggest a group of agents that will bring in the most money. This study shows how predictive and prescriptive data analytics could help the science fiction movie industry in the future. However, it may be possible to create a model that can anticipate how a movie will do by analyzing box office returns and critic scores. This research uses fourteen machine learning algorithms to make predictions about a film's box office performance. These algorithms and their performance are evaluated and contrasted."

2. The methodology should be explained more clearly.

"The methodology section is completely revised and updated to close the gaps and make it readable, simple, and fluent. Contributions at each level are clarified. The features and their numerical representations are included, and a new implementation flowchart is included. All changes are marked in a different color."

3. The comparison of existing systems with the proposed work should be explained better.

"The comparison part is updated to reflect the superiority of the algorithm over newly published works. Further, future directions and algorithm changes are also included. More references were included and analyzed. The conclusions were also updated to reflect these outcomes."

Reviewer 2 Report

The topic of the paper is interesting, but the method used in the paper is out of date. The paper used simple machine learning tools in MATLAB to predict film success.

Author Response

The authors thank the editor and reviewers for their prompt response and constructive remarks. The authors appreciate the time and effort the reviewer has dedicated to providing valuable feedback on our manuscript. We are grateful to the reviewer for the insightful comments. We have incorporated changes to reflect the suggestions provided by the reviewer. We have tracked the changes in the revised manuscript. Here is a point-by-point response to the reviewers' comments:

More recent references were included.
Introduction and its references were updated and corrected.
The methodology is revised and further clarified. A new section for feature extraction and numerical representation is added.
Comparisons were expanded, and future directions for the algorithm and the feature extraction were added.
Conclusions are fully revised and updated.

Reviewer 3 Report

In this paper, multiple MATLAB-implemented machine-learning algorithms are investigated to classify and predict the financial success of movies. The proposed measure has room for improvement before the manuscript can be accepted.

1. The Abstract section is too long. Please shorten it and include an introduction of the proposed method, the advantage of the method, and the main results.

2. The Introduction should be clearly presented to highlight the main ideas and motivation behind the proposed research. Please include and clearly state the research question and motivation of the proposed study in the Introduction. The authors should cover the research gap.

3. The related works section needs to be completed, and the authors should review more studies and discuss recent papers.

4. It would be good to have an overview of the proposed framework (preferably a system diagram or flowchart) to give readers a quick understanding of the framework.

5. In the experiment section, it would be good to have more information about how the experiments were conducted. What tools/software were used?

6. The conclusion is weak and should be rewritten, and the authors should provide a more comprehensive analysis of the obtained results.

7. Figure captions need to be expanded to make them self-explanatory.

8. The authors should analyze how to set the parameters of the proposed methods in the framework. Do they have the "optimal" choice?

9. The following papers on the same topic should be cited and discussed:

1. Robust graph regularization nonnegative matrix factorization for link prediction in attributed networks (2022)

2. Graph Regularized Nonnegative Matrix Factorization for Community Detection in Attributed Networks (2021)

Author Response

The authors thank the editor and reviewers for their prompt responses and constructive remarks. The authors appreciate the time and effort the reviewer has dedicated to providing valuable feedback on our manuscript. We are grateful to the reviewer for the insightful comments. We have incorporated changes to reflect the suggestions provided by the reviewer. We have tracked the changes in the revised manuscript. Here is a point-by-point response to the reviewers' comments:

1. The Abstract section is too long. Please shorten it and include an introduction of the proposed method, the advantage of the method, and the main results.

"The Abstract is modified as indicated above; it includes the problem statement, the methods conducted, and the advantages of these methods, with some results and concluding remarks."

2. The Introduction should be clearly presented to highlight the main ideas and motivation behind the proposed research. Please include and clearly state the research question and motivation of the proposed study in the Introduction. The authors should cover the research gap.

The following changes address this comment:

More recent references were included.
The Introduction and its references were updated and corrected.
The novelty of the paper is further elaborated on and revealed.
The paper's description is modified to ensure fluency and logical connectivity among the flow of ideas.

3. The related works section needs to be completed, and the authors should review more studies and discuss recent papers.

4. It would be good to have an overview of the proposed framework (preferably a system diagram or flowchart) to give readers a quick understanding of the framework.

"Nine recent references were included in the literature review and integrated into the manuscript without disrupting the meaning or logical order of the text.
The methodology section is fully revised to fill the gaps and ensure the algorithm is entirely understandable and representative, as proposed by the problem statement. Only a priori data are used to achieve the results of this study. A new flowchart describing the implementation of the algorithm is included. A new section for feature extraction and numerical representation is added."

5. In the experiment section, it would be good to have more information about how the experiments were conducted. What tools/software were used?

A numerical representation of the features is included.
The feature vector after processing and cleaning is determined.
The clean feature set is divided into training and testing sets.
A flowchart of the algorithm is added.
A description of the flow of the experimental results is included.

6. The conclusion is weak and should be rewritten, and the authors should provide a more comprehensive analysis of the obtained results.

The conclusion section is rewritten to reflect the changes made to the article and to convey the quality of its results.

7. Figure captions need to be expanded to make them self-explanatory.

Figure captions are revised.

8. The authors should analyze how to set the parameters of the proposed methods in the framework. Do they have the "optimal" choice?

"The feature selection process and the way the features are obtained are now explained and justified."

9. The following papers on the same topic should be cited and discussed:

"Nine new references were added, and recent regularization papers were discussed.
Comparisons were expanded, and future directions for the algorithm and the feature extraction were added.
The conclusions are fully revised and updated."

Reviewer 4 Report

This study used machine learning algorithms to predict the performance (or level of popularity) of Sci-Fi movies. The topic is interesting and has the potential to be extended to movie/TV program rating. However, the data processing and model training procedures were not solid enough and need improvement.

1. There is no clear description of the datasets used for training and testing the machine learning models. A section for data description should be added, elaborating the data size, features, dimensionality, and training/testing data split.

2. To leverage machine learning, features (or factors describing a Sci-Fi movie) are crucial to the prediction accuracy and effectiveness. However, this paper does not sufficiently clarify these factors, or how and why they can reflect the movie performance.

3. The timeliness of IMDb data is questionable. IMDb data are historical records about the movies' performance after they are released. In that sense, the data already reflect the movie performance, and machine learning is unnecessary. No prediction is needed anymore. In that sense, this study is not practical or useful. To make things work, I recommend that the authors look at the factors of movies during their production stage (before public release) and use them to train machine learning models.

Author Response

1. There is no clear description of the datasets used for training and testing the machine learning models. A section for data description should be added, elaborating the data size, features, dimensionality, and training/testing data split.

"The methodology section is revised to discuss the dataset, feature selection, feature calculation, feature training set, and testing set, followed by an implementation flowchart."

2. To leverage machine learning, features (or factors describing a Sci-Fi movie) are crucial to the prediction accuracy and effectiveness. However, this paper does not sufficiently clarify these factors, or how and why they can reflect the movie performance.

"The feature set was explained and further justified."

3. The timeliness of IMDb data is questionable. IMDb data are historical records about the movies' performance after they are released. In that sense, the data already reflect the movie performance, and machine learning is unnecessary. No prediction is needed anymore. In that sense, this study is not practical or useful. To make things work, I recommend that the authors look at the factors of movies during their production stage (before public release) and use them to train machine learning models.

It appears that the previous version of the article was too brief on this point. Only a priori data are used to achieve the results of this study. A new flowchart describing the implementation of the algorithm is included. A new section for feature extraction and numerical representation is added.
A numerical representation of the features is included.
The feature vector after processing and cleaning is determined.
The clean feature set is divided into training and testing sets.
A flowchart of the algorithm is added.
A description of the flow of the experimental results is included. Only features that can be extracted from IMDb and social networks before the movie's release are employed in this study. Future analysis will further exploit features and other optimization algorithms. Discussions and conclusions were updated to ensure this information and its related explanation are clarified throughout the abstract, introduction, and methodology.

Reviewer 5 Report

1. What is the optimization process or methods used in the research?

2. What is the novelty of this work? Explain in detail based on the gaps found in the literature.

3. The feature selection technique is not specified for data reduction.

4. When comparing data, it is much better to tell the story with graphs than with tables (Table 7). It is hard to grasp the information in tabular form.

5. What are the lessons for the readers or researchers of this study? What new information can be extracted from the knowledge gained? How is this study useful? State this clearly in the conclusions.

6. Fix grammar and structure of sentences.

7. Why do you think the Fine, Weighted, and Medium KNN types outperform other machine learning algorithms? What properties do they possess that explain their advantages?

8. Is there a data scaling method used to mitigate the possibility of under- or overfitting?

Author Response

What is the optimization process or methods used in the research?

“Fourteen machine learning algorithms were exploited in this study to optimize the prediction of the movie rating based on historical data and information about the film before its production.”

What is the novelty of this work? Explain in detail based on the gaps found in the literature.

“This study contributes two primary outcomes by developing a viable method for predicting a film's early success. First, this work demonstrates how diverse types of freely accessible data, including structured data, network data, and unstructured data, may be collected, merged, and analyzed to train machine-learning algorithms. During the design and development of information system artifacts, these data-driven methodologies can assist firms with decision-making by providing insightful predictions and recommendations. It is the most precise recommendation among the algorithms and actions that may be taken to obtain optimal results from these data and algorithms. Second, this research offers numerous innovative approaches for predicting the early success of movies. It includes the film's plot, release date, producers, and directors, and introduces a feature scoring approach that reduces the complexity of the optimization algorithm of the ML method. These elements demonstrated that each component substantially impacted the system's performance and explained why movies are so popular. In contrast to previous ML algorithms, this paper provides a modified KNN-ML approach as an alternative approach to open space risk reduction that employs k-nearest neighbor approaches to discover discriminative characteristics of the feature set. This study aims to make a machine learning algorithm that improves on what we already know and gives a reasonable estimate of the success rate based on what we know, how we act, and how well the algorithm works. We will also show that our method has predictive value by showing how it can be used to suggest a group of agents that will bring in the most money. This study shows how predictive and prescriptive data analytics could help the science fiction movie industry in the future. However, it may be possible to create a model that can anticipate how a movie will do by analyzing box office returns and critic scores. This research uses fourteen machine learning algorithms to make predictions about a film's box office performance. These algorithms and their performance are evaluated and contrasted.”

The feature selection technique is not specified for data reduction

2.3 Feature Selection and Valuation

To identify the factors that may influence the movie's success, a priori data from IMDB were analyzed to identify those features that strongly affect the movie's rating. The pairwise correlations among the IMDB rating, run time in minutes, year, number of votes, and box office are shown in Table 4.

Table 4: Correlation matrix

	IMDB_Rating	Runtime_mins	Year	Num_Votes	Box_Office
IMDB_Rating	1.000000	0.392518	-0.201970	0.555694	0.283635
Runtime_mins	0.392518	1.000000	0.066925	0.430708	0.451454
Year	-0.201970	0.066925	1.000000	0.030860	0.113318
Num_Votes	0.555694	0.430708	0.030860	1.000000	0.639275
Box_Office	0.283635	0.451454	0.113318	0.639275	1.000000
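
As an illustration of how a matrix like Table 4 can be produced, the following is a minimal sketch of a pairwise Pearson correlation computation. It is not the authors' MATLAB code; the function names and the toy values are hypothetical and are not the study's data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Map each pair of column names to its Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy values for illustration only (not the study's dataset).
toy = {
    "IMDB_Rating": [6.1, 7.4, 8.0, 5.5],
    "Runtime_mins": [95, 120, 148, 88],
    "Num_Votes": [12000, 250000, 900000, 8000],
}
matrix = correlation_matrix(toy)
```

A correlation matrix built this way is symmetric with ones on the diagonal, which is a quick consistency check for any reported table.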

The table suggests that these features help determine the level of success: run time, number of votes, and box office are positively correlated with one another and with the IMDB rating, with the strongest correlation between number of votes and box office (0.64). The release year, however, is negatively correlated with the rating (-0.20). The following data were also used to generate the feature vector for the ML algorithm: movie name, release year, release month, movie stars, directors, production studio, budget, IMDb Meta Score (Metacritic), IMDb rating, IMDb user vote, genres, and top actors' social media followers (Instagram, Twitter, and Facebook). The analysis revealed a strong relationship between the factors described and the regression results. These factors play a significant role in movie success. By using these factors, it may be feasible to estimate the success rate of an upcoming movie. Every feature selected from the data set was given a numerical value between 0 and 10 to mimic the overall rating scale of the movie. The following table shows the features and the corresponding formulas that provide numerical values between 0 and 10.

Table 5. Feature numerical representation

Feature	Numerical Formula
Actor	10 * (average rating of the actor's movies over five years / maximum rating of the actors during the five years)
Director	10 * (average rating of the director's movies / maximum rating of the directors)
Month	Average rating of the movies released in that month
Year	10 - 0.2 * number of years
Run time	10 * probability of run time
Production studio	10 * (sum of ratings of the studio / total sum of ratings of the top six studios)
Critic review of the movie story	Number of positive critiques in the first 1000 story reviews / (number of positive and negative critiques in the 1000 story reviews)
Genres	10 * (sum of ratings for the Sci-Fi genre / total ratings for all genres)
Movie stars	Average rating of the top five actors in the movie
Top actors' social media followers	10 * (number of followers of the top five actors in the movie / maximum number of followers for five actors)
Expected budget	Regression equation between IMDB ratings and budget
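
A few of the formulas in Table 5 can be sketched as small functions. This is only an illustration under stated assumptions: the function names are hypothetical, "number of years" is taken here to mean years elapsed (the table does not define it), and the critic formula is implemented exactly as written, which yields a fraction between 0 and 1 rather than a value on the 0 to 10 scale.

```python
def ratio_score(value, maximum):
    """10 * value / maximum, the pattern used for the actor, director, and follower rows."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return 10.0 * value / maximum

def year_score(number_of_years):
    """10 - 0.2 * number of years (assumed here to be years elapsed)."""
    return 10.0 - 0.2 * number_of_years

def studio_score(studio_rating_sum, top_six_rating_sum):
    """10 * (sum of the studio's ratings / total sum of ratings of the top six studios)."""
    return ratio_score(studio_rating_sum, top_six_rating_sum)

def critic_score(positive, negative):
    """Share of positive critiques among positive and negative ones, as written in Table 5."""
    total = positive + negative
    if total == 0:
        raise ValueError("no critiques to score")
    return positive / total

# Hypothetical inputs for illustration only.
example_vector = [
    ratio_score(7.0, 8.0),    # actor
    year_score(5),            # year
    studio_score(30.0, 120.0),  # production studio
    critic_score(600, 400),   # critic review of the story
]
```

Keeping every feature on a common scale means that no single feature dominates a distance-based classifier purely because of its units.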

2.4 Dataset Preparation

Figure 6 depicts the implementation structure of the proposed algorithm. The first step is data collection because predictions are based on historical information. Two popular and complementary sources, IMDb and Box Office, were selected to assess the algorithm's efficacy. IMDb provides more detailed story summaries, while Box Office provides complete information on movie earnings and budgets. In other words, the two data sources can be combined to obtain data about several films.
Regarding data collection techniques, the two sources are distinct. IMDb offers movie data via an application programming interface (API). The public can only access Box Office's data through its website.
The second step involves cleaning, transforming, consolidating, and storing data from both sources in a database. During this step, acquired data are formatted consistently, and duplicates are eliminated from the database. In film titles, only alphabetic and numeric characters are retained. This standardization ensures that extraneous characters do not impede the matching of titles between the two data sources. The feature selection procedure ensures that representative weighting factors are introduced using only historical data, as shown in Table 5. The third step entails the creation of features that will ultimately be used to train a predictive model with the collected data. The feature set is partitioned into a training set and a testing set, with 80 percent of the data used to train the selected machine learning algorithm and the remainder held out for testing. Once the feature vector has been determined, it is fed into the machine learning algorithm, and a predictive model is trained using a complete and robust set of features. Employing machine learning techniques, the optimal prediction model and its parameters are identified based on accuracy, precision, and recall. The results are finally compiled and compared.
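
The cleaning, matching, and partitioning steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout (dicts with a "title" key), the function names, the lowercasing of titles, and the random seed are all assumptions introduced here.

```python
import random
import re

def normalize_title(title):
    """Keep only letters and digits so titles from two sources can be matched.
    Lowercasing is an extra assumption, not stated in the text."""
    return re.sub(r"[^A-Za-z0-9]", "", title).lower()

def merge_sources(imdb_rows, box_office_rows):
    """Join two lists of dicts on the normalized title and drop duplicate titles."""
    earnings = {}
    for row in box_office_rows:
        earnings.setdefault(normalize_title(row["title"]), row)
    merged, seen = [], set()
    for row in imdb_rows:
        key = normalize_title(row["title"])
        if key in seen or key not in earnings:
            continue  # skip duplicates and titles missing from the second source
        seen.add(key)
        merged.append({**earnings[key], **row})
    return merged

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle reproducibly, then keep the first 80 percent for training."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records for illustration only.
imdb = [
    {"title": "Example Film: Part 2", "rating": 7.0},
    {"title": "Example Film Part 2!", "rating": 7.0},  # duplicate after normalization
]
box_office = [{"title": "Example Film - Part 2", "gross": 100}]
merged = merge_sources(imdb, box_office)
train, test = train_test_split(range(10))
```

Splitting after deduplication avoids the same film appearing in both the training and the testing set.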

Figure 6: Implementation Model
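
The three selection criteria named in Section 2.4 (accuracy, precision, and recall) can be computed from predicted and true labels as in the sketch below. The label names and the choice of which class counts as "positive" are hypothetical; the text does not specify how the authors averaged these metrics across classes.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return (accuracy, precision, recall), treating one label as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    tp = fp = fn = correct = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            correct += 1
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # missed a true positive
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical labels for illustration only.
truth = ["hit", "hit", "flop", "flop", "hit"]
guess = ["hit", "flop", "flop", "hit", "hit"]
accuracy, precision, recall = classification_metrics(truth, guess, positive="hit")
```

Reporting precision and recall alongside accuracy matters when the rating classes are imbalanced, since accuracy alone can look high for a model that ignores the rarer class.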

When comparing data, it is much better to tell the story with graphs than with tables (Table 7). It is hard to grasp the information in tabular form.

“Thank you for this comment. A new figure is included for the table. The table is kept for numerical precision.”

What are the lessons for the readers or researchers of this study? What new information can be extracted from the knowledge gained? How is this study useful? State clearly in the conclusions.
" The discussion section is further expanded to include comparisons, results, and future directions in this area of research. This indeed necessitates the modification of the abstract and conclusion to be in harmony with all the embedded changes in the manuscript"

Lessons learned

Fix grammar and structure of sentences.

The paper has been fully revised to correct grammar mistakes and typos.

Why do you think the Fine, Weighted, and Medium KNN types outperform other machine learning algorithms? What properties do they possess that explain their advantages?

“The optimization of the KNN algorithm and the fact that the algorithm is non-parametric, in addition to the feature selection, extraction, and calculations, all contributed to this improvement.”
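
To make the non-parametric point concrete, the sketch below shows a generic k-nearest-neighbor classifier with optional distance weighting: it fits no model parameters and simply votes among the closest training points. The values of k, the inverse-squared-distance weighting, and the toy data are illustrative assumptions, not the settings of the Fine, Medium, or Weighted KNN variants used in the study.

```python
import math
from collections import defaultdict

def knn_predict(train_x, train_y, query, k=3, weighted=True):
    """Classify `query` by a vote among its k nearest training points.
    With weighted=True, closer neighbors count more (inverse squared distance)."""
    neighbors = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = defaultdict(float)
    for distance, label in neighbors[:k]:
        # The small constant avoids division by zero for exact matches.
        votes[label] += 1.0 / (distance ** 2 + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)

# Hypothetical feature vectors on a 0-10 scale, for illustration only.
train_x = [(1, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
train_y = ["low", "low", "high", "high", "high"]

plain = knn_predict(train_x, train_y, (2, 2), k=5, weighted=False)
weighted = knn_predict(train_x, train_y, (2, 2), k=5, weighted=True)
```

With k equal to the whole training set, the unweighted vote follows the majority class, while the weighted vote follows the nearby points, which shows why the choice of k and weighting can change results noticeably.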

Is there a data scaling method used to mitigate the possibility of under- or overfitting?

“Scaling is included as part of the data preparation to mitigate under- or overfitting.”
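
One common way to scale features is min-max scaling, sketched below with the bounds learned from the training set only and then reused for the test set. This is an assumption for illustration; the response does not say which scaling method the authors used. The 0 to 10 target range mirrors the feature scale described in the manuscript.

```python
def fit_min_max(rows):
    """Learn per-column (min, max) bounds from the training rows only."""
    return [(min(column), max(column)) for column in zip(*rows)]

def apply_min_max(rows, bounds, upper=10.0):
    """Rescale each column to [0, upper] using previously learned bounds.
    Constant columns map to 0.0; unseen test values may fall outside the range."""
    return [
        [
            upper * (value - low) / (high - low) if high > low else 0.0
            for value, (low, high) in zip(row, bounds)
        ]
        for row in rows
    ]

# Hypothetical (run time in minutes, budget) rows for illustration only.
train_rows = [[90, 1_000_000], [150, 5_000_000], [120, 3_000_000]]
test_rows = [[180, 2_000_000]]

bounds = fit_min_max(train_rows)
train_scaled = apply_min_max(train_rows, bounds)
test_scaled = apply_min_max(test_rows, bounds)
```

Fitting the bounds on the training set alone keeps information from the test set out of the model, and putting features on a shared scale prevents large-valued columns such as budget from dominating distance calculations in KNN.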

Round 2

Reviewer 3 Report

The authors worked hard to improve their paper. I recommend the paper for acceptance.

Reviewer 5 Report

I accept the revisions. No more major suggestions.
