Peer-Review Record

Open Data Based Urban For-Profit Music Venues Spatial Layout Pattern Discovery

Sustainability 2021, 13(11), 6226; https://doi.org/10.3390/su13116226
by Xueqi Wang and Zhichong Zou *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 5 March 2021 / Revised: 29 April 2021 / Accepted: 25 May 2021 / Published: 1 June 2021
(This article belongs to the Special Issue Big Data in a Sustainable Smart City)

Round 1

Reviewer 1 Report

The paper deals with the distribution of music venues and the intensity of cultural events in Chinese cities.

Different open data sources are used; in particular, the websites of two ticketing agencies are scraped for data on music venues and their associated events.

In addition, the methodology considers the spatial distribution of the music venues and defines various indices in order to rank the cities. I find this type of index especially interesting, since it can be used to assess the level of smartness of a particular city.

The paper is well written and understandable. The level of English is good and the paper is pleasant to read.

I have a number of remarks:

  • I miss a technical architecture picture in the document.
    • What are the main components?
    • Do you have an Open Data harvester?
    • Which Open Data APIs do you use?
    • How is your software scraping the websites of the ticket agencies?
      • What methods do you use?
      • How do you ensure the quality of the scraping and the extracted data?
  • Do you have a database where you store the data in between?
  • How do you make sure that the scraped data has sufficient quality for your evaluations?
  • With regard to the statistical methods for the analysis:
    • You are using different hierarchical clustering approaches and statistical tests. Please elaborate further on why you decided to use those.
    • Please also put your selected methods in relation to other possible approaches for classification (e.g. Neural Networks, Support Vector Machines ...).
    • Could you also provide a summary of the data sample that you are analysing? It becomes clear that you obtained a lot of data, but it would be good to have a summary (number of data points, features of the data points ...).

 

 

Author Response

We greatly appreciate your detailed and constructive comments. A major revision has been completed. The main revisions are in the Title, Abstract, Introduction, Materials and Methods, Results, and Appendix. A more detailed description of data processing (section 3.1) and method selection (section 3.3, section 3.5.2) has been added. K-means is newly applied to cross-validate the city-level music activity ranking, and the ranking has been confirmed (section 4.1). Quadrat analysis is newly added to demonstrate the spatial aggregation pattern of music venues in the target cities (section 4.5.2). The research significance is discussed in more depth (section 1), and the analytical framework has been adjusted (section 3).

 

Modifications have been made accordingly; all comments have been addressed, as described below.

Reply to Reviewer 1:

  1. I miss a technical architecture picture in the document.

(1) What are the main components?

(2) Do you have an Open Data harvester?

(3) Which Open Data APIs do you use?

(4) How is your software scraping the websites of the ticket agencies?

(5) What methods do you use?

(6) How do you ensure the quality of the scraping and the extracted data?

(7) Do you have a database where you store the data in between?

(8) How do you make sure that the scraped data has sufficient quality for your evaluations?

Thank you for your suggestion. In the revised version, the technical architecture content has been added in section 3, line 172 as follows: “However, website data are used for city-level music activities ranking, and both data capturing and data preprocessing work are more complicated.… ”. Figure 2 shows the main components and details of the technique flowchart.

A description of the software, methods, strategies, storage, and cleaning used for data scraping has been added in section 3.1, line 195 as follows: “Figure 3 provides the diagram of data capturing work…”.

A justification of the reliability of both websites has been added in section 3.1, line 184 as follows: “Both websites are the most popular ticketing platforms for music activities…”.

An estimate of data coverage has been added in section 3.1, line 223 as follows: “An official report on 2015 Beijing music activities is published… ”.

 

  2. With regard to the statistical methods for the analysis:

(1) You are using different hierarchical clustering approaches and statistical tests. Please elaborate further on why you decided to use those. Please also put your selected methods in relation to other possible approaches for classification (e.g. Neural Networks, Support Vector Machines ...).

Thank you for your suggestion. In the revised version, an explanation of why these approaches were selected, together with a comparison with other approaches, has been added in section 3.3, line 267 and section 3.4, line 319 as follows:

“Machine learning algorithms are suitable to rank cities… ” and “Comparing to Spearman’s Rank Correlation, Kendall Rank Correlation has a smaller gross error sensitivity and a smaller asymptotic variance… ”, respectively.
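
For readers unfamiliar with the two coefficients named in the quoted passage, the following is a minimal sketch of how each is computed. It is not the authors' implementation: the function names are hypothetical, and the code assumes the two input sequences have equal length and contain no tied values.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall rank correlation (tau-a); assumes no ties in x or y."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1      # the pair is ordered the same way in x and y
        elif s < 0:
            discordant += 1      # the pair is ordered in opposite ways
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties shortcut formula."""

    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            result[i] = rank
        return result

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Kendall's coefficient counts pairwise order agreements, whereas Spearman's works with squared rank differences, so a single badly misplaced observation changes the latter more strongly.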

Moreover, a description of the K-means cross-validation of the ranking results has been added in section 3.3, line 295 and section 4.1, line 434 as follows:

“K-means algorithm is applied to perform cross-validation of the urban division result obtained by hierarchical clustering… ” and “On the other hand, the K-means result has a high similarity with the city rankings described above… ”, respectively.
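
The response does not state how the similarity between the hierarchical clustering result and the K-means result was measured. One common way to compare two partitions of the same cities is the Rand index, sketched below as an assumption rather than as the authors' procedure; the function name and the label lists are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two items
    in the same group, or both put them in different groups. Group
    identifiers themselves do not need to match between partitions.
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total
```

A value of 1 means the two methods produce identical groupings up to relabelling; lower values indicate that more pairs of cities are grouped differently.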

 

(2) Could you also provide a summary of the data sample that you are analysing? It becomes clear that you obtained a lot of data but it would be good to have a summary like (number of data points, features of the data points ...)

Thank you for your suggestion. In the revised version, a summary of the data has been added in section 3.1, line 216 as follows: “101 cities have performance information according to Appendix A, Table A1… ”.

A detailed summary of data features has been added in line 712, Appendix A. Table A1 provides the general information of the data. Table A2 provides the quantity information of cities for clustering-based ranking. Tables A3 and A4 provide data samples from the two websites.

Alternatively, please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

In this paper, the authors attempt to classify Chinese cities into groups based on the number of music venues, the frequency of musical events and their ticket prices. They also measured ANNI and Moran's I for the locations of music venues in each city and investigated the relationship between the degree of spatial autocorrelation and the variables used for the classification of cities.

While this could be a potentially valuable contribution to the field of urban planning and development, the manuscript is not ready for publication in its current form for the following reasons:

1) It is not clear to me what implications the classification result of the cities can have. The variables used in this paper (i.e., the number of music venues, the frequency of musical events and their ticket prices) seem to be strongly related to the market sizes of the cities: there would probably be more music venues in cities with a large population and a higher income level. Costly musical performances would be held in cities where the local residents can afford the ticket prices.

If this is the case, the classification result presented in this work may just reflect the general market sizes of the cities, not the one specific for the music industry. Since the focus of this work lies on (for-profit) music venues, more careful analysis is required to support the argument of the authors.

2) The use of the term "Internet-accessible data" is confusing. In this work, the authors seem to use the term as a synonym for "open data". These two are, however, not the same: not all open data are accessible through the Internet, and many commercial data are distributed through the Internet. In addition to the clarification of the term, the authors should also justify whether the two websites can really provide representative data for the cities. What proportions of music venues and performances are covered in those websites? How did you estimate such figures?

3) The "spatial aggregation pattern discovery" is based on a combined use of two global measures, ANNI and Moran's I. This kind of approach is likely to oversimplify the spatial pattern of music venues in different cities. A more in-depth evaluation is required.

4) The manuscript is difficult to read and understand in many places, mainly due to the language. Extensive editing is required before resubmission.

Author Response

We greatly appreciate your detailed and constructive comments. A major revision has been completed. The main revisions are in the Title, Abstract, Introduction, Materials and Methods, Results, and Appendix. A more detailed description of data processing (section 3.1) and method selection (section 3.3, section 3.5.2) has been added. K-means is newly applied to cross-validate the city-level music activity ranking, and the ranking has been confirmed (section 4.1). Quadrat analysis is newly added to demonstrate the spatial aggregation pattern of music venues in the target cities (section 4.5.2). The research significance is discussed in more depth (section 1), and the analytical framework has been adjusted (section 3).

 

Modifications have been made accordingly; all comments have been addressed, as described below.

Reply to Reviewer:

  1. It is not clear to me what implications the classification result of the cities can make. The variables used in this paper (i.e., the number of music venues, the frequency of musical events and their ticket prices) seem to be strongly related to the market sizes of the cities: there would be probably more music venues in cities with large population size and a higher income level. Costly musical performances would be held in cities where the local residents can afford the ticket prices. If this is the case, the classification result presented in this work may just reflect the general market sizes of the cities, not the one specific for the music industry. Since the focus of this work lies on (for-profit) music venues, more careful analysis is required to support the argument of the authors.

Thank you for your suggestions. China is a developing country, and as its industries, including tertiary industries, have developed, imbalances have appeared nationwide. The music industry is a tertiary industry. As a result, the imbalance between supply and demand in the music industry is prominent in some cities. This imbalance is reflected in the number, size, and spatial distribution of cultural facilities, including music venues. In addition, people can choose among different forms of cultural spending, and live music is attractive to some of them. Therefore, the demand for cultural consumption in China is highly diverse. Smart city decisions should take into account the diversity of both the current configuration and future demand.

Cities are clustered according to their records of commercial music performances and ranked into four groups. The ranking difference in the vitality of music performance is reflected in differences in the scale and spatial aggregation pattern of music venues.

In the revised version, the implication of clustering has been added in section 1, line 75 as follows: “A profound understanding of the current status of the music industry and the entity space for it is necessary… ”.

A more detailed result analysis has been added in section 4.1, line 440 as follows: “The clustering result reveals the disparity in the demand for commercial music performance… ”.

 

  2. (1) The use of the term "Internet-accessible data" is confusing. In this work, the authors seem to use the term as a synonym for "open data". These two are, however, not the same: not all open data are accessible through the Internet, and many commercial data are distributed through the Internet.

Thank you for your suggestion. In the revised version, the term “Internet-accessible data” has been replaced with “open data” throughout the manuscript.

 

(2) In addition to the clarification of the term, the authors should also justify whether the two websites can really provide representative data for the cities. What proportions of music venues and performances are covered in those websites? How did you estimate such figures?

Thank you for your suggestion. In the revised version, a justification of the reliability of both websites has been added in section 3.1, line 184 as follows: “Both websites are the most popular ticketing platforms for music activities…”.

An estimate of data coverage has been added in section 3.1, line 223 as follows: “An official report on 2015 Beijing music activities is published… ”.

 

  3. The "spatial aggregation pattern discovery" is based on a combined use of two global measures, ANNI and Moran's I. This kind of approach is likely to oversimplify the spatial pattern of music venues in different cities. A more in-depth evaluation is required.

Thank you for your suggestion. In the revised version, a density-based approach, quadrat analysis, has been added alongside the distance-based approach (ANNI and Moran’s I) to enrich the study results. A description of the new method has been added in section 3.5.2, line 362 as follows: “Quadrat analysis is a kind of variance analysis… ”

A more in-depth evaluation has been added in section 4.5.2, line 539 as follows: “However, VMR value by quadrat analysis indicates that music venues show a significant aggregation tendency… ”
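
To make the VMR (variance-to-mean ratio) mentioned in the quoted passage concrete, the sketch below counts points in a regular grid of quadrats and divides the sample variance of the counts by their mean. A VMR well above 1 suggests clustering, a value near 1 is consistent with a random (Poisson) pattern, and a value below 1 suggests a regular pattern. This is only an illustration under stated assumptions, not the authors' implementation: the function name, the rectangular study area, and the grid dimensions are all hypothetical, and all points are assumed to lie within the given bounds.

```python
from collections import Counter


def variance_mean_ratio(points, bounds, nx, ny):
    """VMR of quadrat counts for points inside a rectangular study area.

    points: iterable of (x, y) coordinates lying within bounds
    bounds: (xmin, ymin, xmax, ymax) of the study area
    nx, ny: number of quadrats along x and y (nx * ny must be at least 2)
    """
    xmin, ymin, xmax, ymax = bounds
    counts = Counter()
    for x, y in points:
        # Clamp so that points on the upper boundary fall in the last cell.
        col = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        row = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        counts[(row, col)] += 1
    # Include empty quadrats, which contribute a count of zero.
    cell_counts = [counts.get((r, c), 0) for r in range(ny) for c in range(nx)]
    m = nx * ny
    mean = sum(cell_counts) / m
    variance = sum((k - mean) ** 2 for k in cell_counts) / (m - 1)
    return variance / mean
```

Because the result depends on the quadrat size, it is usually worth checking whether the conclusion holds for more than one grid resolution.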

 

  4. The manuscript is difficult to read and understand in many places, mainly due to the language. Extensive editing is required before resubmission.

Thank you for your suggestion. Extensive editing has been done in the revised version, including the unification of terms, the correction of grammatical mistakes, and the rewording of expressions.

 

Alternatively, please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I think that this paper has improved a lot. In my opinion, it can be accepted in this form.

Reviewer 2 Report

The authors seem to have addressed all my comments successfully. I believe that the paper is now ready for publication in this journal.
