Open Access Article

Peer-Review Record

Large-Scale Music Genre Analysis and Classification Using Machine Learning with Apache Spark

Electronics 2022, 11(16), 2567; https://doi.org/10.3390/electronics11162567

by Mousumi Chaudhury *, Amin Karami and Mustansar Ali Ghazanfar

Reviewer 1: Agata Giełczyk

Reviewer 2: Jian Wu

Reviewer 3: Thirunavukarasu Ramkumar

Reviewer 4: Süleyman Eken

Submission received: 22 June 2022 / Revised: 12 August 2022 / Accepted: 14 August 2022 / Published: 17 August 2022

(This article belongs to the Special Issue Big Data Technologies: Explorations and Analytics)

Round 1

Reviewer 1 Report

The article describes music genre classification. The authors clearly show that this issue has become important recently. However, the soundness and scientific impact of this paper are very limited.

The authors used a public dataset and performed a bit of data analysis. Then, they implemented four well-known algorithms and compared the results. Unfortunately, doing hyper-parameter tuning is not science yet.

The graphical side of the article also needs to be strongly improved. Figure 25 (the classification report for ten music genres) is not needed at all. Figure 26 (comparative analysis with different classifiers) is not a comparative analysis; it is a bar chart presenting accuracy, which again is not useful at all. Figure 22 is again not informative; the text just states: 'After the process of hyperparameter tuning, the classifier has achieved 90 percent accuracy for the music genre classification.' Do we need Figures 12-21 to get the main idea of the article? In my opinion, definitely not. Figures 1 and 2 present the same thing. Figure 27 and Table 5 present the same thing.

The equation has to be presented as an equation (lines 536-539).

In line 621 the reference [22] is visible - is it a reference to article 22 (Kumar et al.) or to Fig. 22?

The methodology is also not clear. We can see the results for Accuracy, F1, Precision and Recall (e.g. in Table 5). Are these values given for all classes (as an average?) or for a single class (which one?)

All in all, the presented paper cannot be accepted to the journal with IF=2.657.

Author Response

Response to Reviewer 1 Comments

Point 1: The article describes music genre classification. The authors clearly show that this issue has become important recently. However, the soundness and scientific impact of this paper are very limited.

Response 1: The answer to this comment is at page 25 (lines 818-824).

The present research addresses three research questions:

How can audio data be analysed and classified into groups of similar audio?
What is the best possible way to achieve the highest classification accuracy?
Which technology can be used to reduce the duration of data processing without computational cost? (added at page 25, lines 823-824)

I have added the following paragraphs at pages 25 and 26 (lines 849-869):

Random Forest combines many decision trees to reduce the risk of overfitting and the effect of noise in the dataset. It is a supervised learning algorithm that uses the ensemble learning method. An ensemble learning method uses independent classifiers. These independent classifiers either use different algorithms on the same training samples or use the same algorithm trained on different subsets of the training sample. Random Forest follows the Bagging procedure to generate the classification result. In the Bagging procedure, the dataset is randomly sampled into different subsamples (training samples) to train the same algorithm in parallel. After that, the individual predictions of those classifiers are combined to generate a final prediction. The combining is done either by voting or by averaging the individual predictions. Therefore, the decision trees in a Random Forest are trained in parallel on randomly selected features. The randomly selected features are generated with a replacement procedure from the original training samples. These features help the Random Forest model to reduce correlations between the feature attributes. The votes from each decision tree are collected and aggregated to a single class following the Bagging procedure. The classification result is the class that receives the most votes, which answers the second research question.
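The Bagging procedure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' Spark implementation: the weak learner `train_stump` is a hypothetical one-feature threshold rule standing in for a decision tree, the function names are invented for this sketch, and the per-tree random feature selection of a full Random Forest is left out.

```python
import random
from collections import Counter


def majority(labels, default=None):
    """Return the most common label, or `default` for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: one bagging subsample."""
    return [rng.choice(rows) for _ in rows]


def train_stump(sample):
    """Hypothetical weak learner standing in for a decision tree.

    Splits one numeric feature at the sample mean and predicts the
    majority label on each side of the split.
    """
    threshold = sum(x for x, _ in sample) / len(sample)
    overall = majority([y for _, y in sample])
    left = majority([y for x, y in sample if x <= threshold], overall)
    right = majority([y for x, y in sample if x > threshold], overall)
    return lambda x: left if x <= threshold else right


def fit_bagged(rows, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(rows, rng)) for _ in range(n_models)]


def bagged_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    return majority([model(x) for model in models])
```

Each model sees a different resampled view of the data, so an individual model may be wrong on a given input; the vote in `bagged_predict` is what smooths those errors out.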

Apache Spark, an in-memory, distributed, open-source cluster computing framework, is used to reduce the processing time of machine learning predictions without added computational cost, which answers the third research question. This paper contains appropriate data and methodologies for solving a classification problem in the domain of music. This can be considered the soundness and scientific impact of this paper.

Point 2: The authors used a public dataset and performed a bit of data analysis. Then, they implemented four well-known algorithms and compared the results. Unfortunately, doing hyper-parameter tuning is not science yet.

Response 2: Hyperparameter tuning is important because hyperparameters directly control the behaviour of the training algorithm and have a significant impact on the performance of the model being trained. A good set of hyperparameters can increase the performance of the model. This is highlighted on page 19 (lines 606-638).

Point 3: The graphical side of the article also needs to be strongly improved. Figure 25 (the classification report for ten music genres) is not needed at all. Figure 26 (comparative analysis with different classifiers) is not a comparative analysis; it is a bar chart presenting accuracy, which again is not useful at all. Figure 22 is again not informative; the text just states: 'After the process of hyperparameter tuning, the classifier has achieved 90 percent accuracy for the music genre classification.' Do we need Figures 12-21 to get the main idea of the article? In my opinion, definitely not. Figures 1 and 2 present the same thing. Figure 27 and Table 5 present the same thing.

Response 3: Figure 25 is changed to Table 6 to represent the classification report (page 22, line 680). Figure 26 (comparative analysis with different classifiers) is removed (page 22, line 697). Figure 27 is replaced by Table 7 (page 22). Figure 22 is replaced by Table 3, which lists the initial hyperparameters (page 19). Table 4 (page 20) presents the tuned hyperparameters. Figures 12 to 21 are removed following the advice (page 19). Figure 2 (page 9) and Table 7 (page 22) are kept.

Point 4: The equation has to be presented as an equation (lines 536-539).

Response 4: The equation (lines 555-556) is now typeset as an equation (page 18) using a LaTeX command.

Point 5: In line 621 the reference [22] is visible - is it a reference to article 22 (Kumar et al.) or to the Fig. 22?

Response 5: The reference points to Table 4 (page 20, line 638).

Point 6: The methodology is also not clear. We can see the results for Accuracy, F1, Precision and Recall (e.g., in Table 5). Are these values given for all classes (as an average?) or for a single class (which one?)

Response 6: These values are average scores over all ten classes: Blues, Classical, Country, Disco, Hip hop, Jazz, Metal, Pop, Reggae and Rock (Table 6, page 22).
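To make the averaging in Response 6 concrete, the sketch below computes per-class precision, recall and F1 and then averages them across classes. It assumes an unweighted (macro) average; the response does not say whether the reported scores are weighted by class size, so that choice, like the function names, is an assumption of this illustration.

```python
def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} in a one-vs-rest fashion."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, f1) over all classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))
```

A support-weighted average would instead multiply each class's scores by its share of the true labels before summing; on a balanced dataset the two coincide.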

Author Response File: Author Response.pdf

Reviewer 2 Report

This article proposes a new combination of Apache Spark and machine learning algorithms to deal with large-scale music genre analysis. The topic is interesting and innovative, but I have the following questions:

1. The introduction is not focused on the research questions; it should be, and should then state your innovation. The abstract is too long.

2. What is the advantage of combining Apache Spark with machine learning algorithms compared with other methods? The authors should show an example or some proof.

3. There are existing classification methods based on machine learning; please refer to the following works:

Unsupervised anomaly detection based method of risk evaluation for road traffic accident. Applied Intelligence, DOI:10.1007/s10489-022-03501-8

A group consensus-based travel destination evaluation method with online reviews. Applied Intelligence. 52:1306-1324.

4. The authors should compare their work with other work and then show its advantage. Numerical experiments were carried out for comparison.

Author Response

Response to Reviewer 2 Comments

Point 1: The introduction is not focused on the research questions; it should be, and should then state your innovation. The abstract is too long.

Response 1: This comment is already addressed in the paper.

The introduction chapter of this paper focuses on background information that puts the research in context. This chapter also points out the value of the research and specifies research aims and objectives.

The contributions of this paper are enumerated below (page 3, lines 99-108):

Statistical features of music from a single large dataset, GTZAN, are analysed and visualised.
An ensemble learning classifier, Random Forest, is developed and implemented to classify the types of music (music genres).
An in-memory distributed computing framework, Apache Spark, is used to process the data in parallel to reduce the duration of machine learning predictions without added computational cost.
Multiple machine learning classifiers supported by Apache Spark, such as Naïve Bayes, Decision Tree, and Logistic Regression, are also implemented to compare their performance with that of Random Forest for the classification of music genres.

Point 2: What is the advantage of combining Apache Spark with machine learning algorithms compared with other methods? The authors should show an example or some proof.

Response 2:

The following paragraph is added at page 6 (lines 263-266) to address this comment.

Training a machine learning model with a deep learning approach such as a convolutional neural network, or with a traditional neural network, increases not only training time but also computational cost. The combination of Apache Spark and machine learning algorithms has been used to study several real-life problems, such as banking data analysis and performance prediction.

Point 3: There are existing classification methods based on machine learning; please refer to the following works:

Unsupervised anomaly detection based method of risk evaluation for road traffic accident. Applied Intelligence, DOI:10.1007/s10489-022-03501-8

A group consensus-based travel destination evaluation method with online reviews. Applied Intelligence. 52:1306-1324.

Response 3: The following works are cited at page 1, line 26 (introduction chapter) and added to the references.

Unsupervised anomaly detection based method of risk evaluation for road traffic accident. Applied Intelligence, DOI:10.1007/s10489-022-03501-8

A group consensus-based travel destination evaluation method with online reviews. Applied Intelligence. 52:1306-1324.

Point 4: The authors should compare their work with other work and then show its advantage. Numerical experiments were carried out for comparison.

Response 4: A paragraph citing the following paper is added at page 26, lines 899-904 (Analysis and discussions section).

Histograms of Daubechies wavelet coefficients (Daubechies Wavelet Coefficient Histograms) of the music signal are computed in a comparative study to capture the local and global information of the music signal. This new feature improves the accuracy of music genre classification. In this paper, by contrast, the correlation coefficient and descriptive statistics of the statistical features are computed to measure the relations between the different features and the target variable.

Li, T., Ogihara, M. and Li, Q., 2003, July. A comparative study on content-based music genre classification. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 282-289).
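As a rough illustration of the correlation analysis mentioned in Response 4, the sketch below computes a Pearson correlation coefficient between one feature column and a numerically encoded target. The response does not state which correlation coefficient the paper uses or how the genre labels are encoded, so both the choice of Pearson and the function name `pearson` are assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred products and squares (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

A value near +1 or -1 indicates a strong linear relation between the feature and the encoded target, while a value near 0 indicates little linear relation.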

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors are asked to elaborate on the Big Data environment setup they used, especially the number of clusters used for Spark framework execution, the performance efficiency relative to the number of clusters, etc.

I also strongly suggest including the following work in the literature section, which is based on a Spark-based architecture / ensemble learning.

Xavier, L.D., Thirunavukarasu, R. A distributed tree-based ensemble learning approach for efficient structure prediction of protein. International Journal of Intelligent Engineering and Systems, 2017, 10(3), pp. 226–234

Author Response

Response to Reviewer 3 Comments

Point 1: The authors are asked to elaborate on the Big Data environment setup they used, especially the number of clusters used for Spark framework execution, the performance efficiency relative to the number of clusters, etc.

Response 1: In this paper, a single machine running the Spark engine is used. The hardware environment is a laptop powered by an Intel(R) Core(TM) i5-1035G1 Central Processing Unit with 8 GB of Random Access Memory. The system runs a Windows 10 personal 64-bit operating system (page 18, lines 582-586).

Point 2: I also strongly suggest including the following work in the literature section, which is based on a Spark-based architecture / ensemble learning.

Response 2: This paper is cited at page 4, lines 191-195.

A spectacular approach is adopted by Xavier and Thirunavukarasu (2017). The authors implemented a distributed ensemble learning approach to recognize protein secondary structures. The investigation revealed that the ensemble approach classifies protein secondary structure more efficiently in a distributed environment such as Apache Spark.

Author Response File: Author Response.pdf

Reviewer 4 Report

The abstract needs to be rewritten; please also address the following:

(i) State the key results/findings of the proposed work. (ii) State the percentage improvement achieved by the proposed work compared with existing works.

Some figures, such as Figs. 1, 23, and 25, need to be revised. They should be in table form instead of screenshots of code output.
Please give a framework/flowchart of the proposed system.
Please also give a scalability analysis with different input sizes and different numbers of computing nodes.
Please cite the following papers:

Playlist Generation via Vector Representation of Songs, 2016

An exploratory teaching program in big data analysis for undergraduate students, 2020

Musicbert: Symbolic music understanding with large-scale pre-training, 2021

Music Genre Classification using Transfer Learning on log-based MEL Spectrogram, 2021

It is pretty hard to say that this is novel or state-of-the-art work, as plenty of work has already been carried out in this domain and the authors did not compare their work with any state-of-the-art work.

Author Response

Response to Reviewer 4 Comments

Point 1: The abstract needs to be rewritten; please also address the following.

(i) State the key results/findings of the proposed work. (ii) State the percentage improvement achieved by the proposed work compared with existing works.

Response 1: These paragraphs are added at page 1, lines 12-16 and 18-22.

(i) State the key results/findings of the proposed work.

Apache Spark is used in this paper to reduce the computation time for machine learning predictions without added computational cost, as it focuses on parallel computation. The present work also demonstrates that combining Apache Spark with machine learning algorithms reduces the scalability problem of computing machine learning predictions. (Page 1)

(ii) State the percentage improvement achieved by the proposed work compared with existing works.

The experimental outcome shows that the developed Random Forest classifier can establish a high level of performance accuracy, especially for the mislabelled, distorted GTZAN dataset. This classifier has outperformed the other machine learning classifiers supported by Apache Spark in the present work. The Random Forest classifier achieves 90 percent accuracy for music genre classification, compared with other work in the same domain. (Page 1)

Point 2: Some figures, such as Figs. 1, 23, and 25, need to be revised. They should be in table form instead of screenshots of code output.

Response 2: Fig. 1 is replaced by Fig. 2 (page 9). Fig. 25 is replaced by Table 6 (page 22), and Table 5 replaces Fig. 23 (page 21).

Point 3: Please give a framework/flowchart of the proposed system.

Response 3: A flowchart is added to denote the flow of the investigation process under the methodology section. (Page 7, line 298)

Point 4: Please also give a scalability analysis with different input sizes and different numbers of computing nodes.

Response 4: In this paper, a single machine running the Spark engine is used. The hardware environment is a laptop powered by an Intel(R) Core(TM) i5-1035G1 Central Processing Unit with 8 GB of Random Access Memory. The system runs a Windows 10 personal 64-bit operating system (page 18, lines 582-586). In addition, the scalability analysis for the Random Forest classifier is highlighted in this paper (pages 19 and 20, lines 606-638, Table 3 and Table 4).

I investigated the performance accuracy of the developed Random Forest classifier with several combinations of inputs and different numbers of computing nodes (pages 19 and 20, lines 606-638, Table 3 and Table 4). I tried combinations of the numTrees and maxDepth hyperparameters in the ranges 20 to 200 and 10 to 23, respectively. I also experimented with the maxBins and impurity hyperparameters, using values from 32 to 132 and Gini or entropy, respectively. Across these combinations, the accuracy of the developed Random Forest classifier ranged from 62% to 90%. The input combinations that give 90% accuracy are displayed in Table 4 (page 20).
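The search over hyperparameter combinations described in Response 4 can be sketched as a plain grid enumeration. The parameter names numTrees, maxDepth, maxBins and impurity and the range endpoints come from the response; the intermediate grid values, the helper names and the `evaluate` callback are assumptions of this sketch, since the response does not list the exact combinations that were tried or how each one was scored.

```python
from itertools import product

# Endpoints follow the response; intermediate values are illustrative guesses.
GRID = {
    "numTrees": [20, 50, 100, 200],
    "maxDepth": [10, 15, 20, 23],
    "maxBins": [32, 64, 132],
    "impurity": ["gini", "entropy"],
}


def expand_grid(grid):
    """Return every combination of grid values as a list of dicts."""
    names = list(grid)
    return [
        dict(zip(names, values))
        for values in product(*(grid[name] for name in names))
    ]


def best_config(grid, evaluate):
    """Return the configuration with the highest score.

    `evaluate` is a hypothetical callback that trains a classifier with
    the given configuration and returns its validation accuracy.
    """
    return max(expand_grid(grid), key=evaluate)
```

With the values above the grid has 4 x 4 x 3 x 2 = 96 configurations, which is why exhaustive search of this kind gets expensive quickly as parameters or values are added.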

Point 5: Please cite the following papers:

Playlist Generation via Vector Representation of Songs, 2016

An exploratory teaching program in big data analysis for undergraduate students, 2020

Musicbert: Symbolic music understanding with large-scale pre-training, 2021

Music Genre Classification using Transfer Learning on log-based MEL Spectrogram, 2021

Response 5: The relevant literature from these four papers is summarised and added to the literature review section (page 4, lines 195-201).

Point 6: It is pretty hard to say that this is novel or state-of-the-art work, as plenty of work has already been carried out in this domain and the authors did not compare their work with any state-of-the-art work.

Response 6: The research paper by Karunakaran and Nagamanoj (2018) is cited to compare their work with the present work (page 26, lines 904-908).

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Thank you for the response.

In my opinion your contribution is very limited and thus, should not be accepted for publication.

The manuscript still contains numerous technical and stylistic errors: for example, Table 8 is too wide, Figures 12 and 13 present the same thing, Eq. 4 uses the convolution operator (*) rather than multiplication, a citation is missing in line 309, and there are many others.

Author Response

Response to Reviewer 1 (Round 2) Comments

Point 1: The manuscript still contains numerous technical and stylistic errors: for example, Table 8 is too wide, Figures 12 and 13 present the same thing, Eq. 4 uses the convolution operator (*) rather than multiplication, a citation is missing in line 309, and there are many others.

Response 1: The width of Table 8 is adjusted (page 27). Figure 12 is removed and Figure 13 (renamed Figure 12) is retained (page 21). The convolution operator (*) is replaced by multiplication (×) (page 18, Equation 4). The citation in line 309 is removed because the figure it referred to was removed during the review process (page 8, line 312).

All paragraphs, citations, tables, and figures are reviewed to avoid any technical and stylistic errors throughout the paper.

Author Response File: Author Response.pdf
