Article
Peer-Review Record

An Automated Method for Extracting and Analyzing Railway Infrastructure Cost Data

Buildings 2023, 13(10), 2405; https://doi.org/10.3390/buildings13102405
by Daniel Adanza Dopazo *, Lamine Mahdjoubi and Bill Gething
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 16 August 2023 / Revised: 6 September 2023 / Accepted: 20 September 2023 / Published: 22 September 2023
(This article belongs to the Special Issue Data Analytics Applications for Architecture and Construction)

Round 1

Reviewer 1 Report

The authors present a novel method composed of three sequential processes: data extraction, where the documentation is analyzed and the existing information is parsed to fit a common standard; data merging, where the different types of information are combined; and data analytics, where the main factors that drive infrastructure costs are identified. The method applies data mining and machine learning to extract and analyze the existing information from 23 different CAF (Cost Analysis Form) input files, which contain a wide variety of information and structures.
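
As a rough illustration of the three sequential processes summarized above, a minimal sketch follows. It is not the authors' implementation: the field names, the sample records, and the helper functions `extract`, `merge`, and `analyze` are all hypothetical and are not taken from the manuscript.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records standing in for rows parsed from
# differently structured CAF files (names and values are invented).
RAW_FILES = {
    "project_a": [{"Asset": "Civils (Structures)", "Cost (GBP)": "1,200.50"}],
    "project_b": [
        {"asset_type": "civils (structures)", "cost": 800.0},
        {"asset_type": "Permanent Way", "cost": 450.0},
    ],
}


def extract(record):
    """Data extraction: parse one raw record into a common standard."""
    asset = record.get("Asset") or record.get("asset_type")
    cost = record.get("Cost (GBP)", record.get("cost"))
    if isinstance(cost, str):
        # Drop thousands separators before converting to a number.
        cost = float(cost.replace(",", ""))
    return {"asset": asset.strip().lower(), "cost": float(cost)}


def merge(raw_files):
    """Data merging: combine the standardized rows of every project."""
    rows = []
    for project, records in raw_files.items():
        for record in records:
            row = extract(record)
            row["project"] = project
            rows.append(row)
    return rows


def analyze(rows):
    """Data analytics: average cost per asset class across projects."""
    by_asset = defaultdict(list)
    for row in rows:
        by_asset[row["asset"]].append(row["cost"])
    return {asset: mean(costs) for asset, costs in by_asset.items()}


merged = merge(RAW_FILES)
summary = analyze(merged)
```

Here `summary` maps each standardized asset class to its mean cost. Mapping every input layout onto one common record shape in `extract` is what lets the later steps ignore the differences between source files.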

What is new here is that this method was used to extract and analyze rail cost infrastructure data.

In my opinion, this manuscript has merit and should be published.

Some typos

analyze -->analyzing
and for having-->and having
costs -->cost
allowing for costs-->allowing for cost
 on this field --> in this field
judgement-->judgment
 which a big variety -->  which a wide variety
demonstrating-->demonstrate
 found however--> found, however
related paper-->related papers

been classified in three categories-->been classified into three categories

that demonstrate-->that demonstrates
for comparing projects, perform costs-->to compare projects, performing cost
 for the future projects--> for future projects
worth to highlight that to our knowledge there is-->worth highlighting that, to our knowledge, there is
There have been however -->There have been, however
2005) where -->2005), where
that in this case, they seek for the -->that, in this case, they seek the
Alternatively, in (Miller & Meggers, 2017)-->Alternatively, (Miller & Meggers, 2017)
tools that when -->tools that, when
can extract for this paper-->can extract from this paper
data, they can-->data, can
provoked for the manual -->provoked by the manual

Author Response

On behalf of all the authors, I wanted to express our gratitude for your time and for your helpful review. Correctness in the English language is a necessary requirement for any research paper, and we believe that the paper reads better after correcting all the suggested typos.

Reviewer 2 Report

The authors highlight the challenges of standardizing railway infrastructure cost data and propose a solution using data mining and machine learning. The novelty of combining data mining, statistics, and machine learning to extract and analyze railway infrastructure cost data is interesting. The use of data from 23 real historical projects from the client Network Rail adds practicality and real-world relevance to the research, making the results more applicable. The paper provides a clear outline of the methodological approach, breaking it down into three sequential processes: data extraction, data merging, and data analytics. The paper demonstrates how the suggested approach increases efficiency in data gathering and analysis, emphasizing the practical benefits of automated data extraction and analysis. The integration of data extraction, merging, and analytics into a single framework shows a holistic understanding of the challenges involved in standardized data analysis.

 

The authors are highly encouraged to consider the following suggestions in order to further improve the scientific depth and the quality of presentation of the manuscript. The paper lacks technical details about the specific data mining, machine learning (which are discussed heavily in the paper), and statistical techniques used for extraction and analysis, making it difficult for readers to understand the depth of the approach. Some pointers to similar research works, e.g., Cerquitelli, Tania, et al. "Machine Learning Empowered Computer Networks." Computer Networks (2023): 109807, are needed to enable non-expert readers in the field to follow the rest of the analysis. The paper lacks information on the metrics used to evaluate the success of the proposed approach in terms of efficiency, reliability, and comparison capabilities. While the paper mentions that "real projects demand big amounts of information," it does not discuss potential challenges or limitations in handling complex and diverse datasets.

While discussing related methods, the paper could provide a deeper explanation of why the scope of certain methods might have limitations or why they might not directly apply to railway costs.

 

A major revision is needed to address the aforementioned comments.

Minor editing of English language required

Author Response

On behalf of all the authors, I wanted to express our gratitude for your time and for your helpful suggestions, which greatly improved the quality of the paper.

Stated issue:

The paper lacks technical details about the specific data mining, machine learning (which are discussed heavily in the paper), and statistical techniques used for extraction and analysis

Response:

Thank you for pointing it out. The section "3.4) The method step by step" has been rewritten to explain in more detail the configuration of the machine learning algorithms and the details of the data mining techniques that have been implemented to build the solution.

The reference "Machine Learning Empowered Computer Networks," Computer Networks (2023), has been included in the paper and has been cited appropriately.

 

Another section, "2.4) The contribution of the presented method," has been included to highlight the main contributions of the paper and its novelty, and to demonstrate the limitations in scope of certain methods that do not directly apply to railway infrastructure costs.

 

 

Reviewer 3 Report

The literature review of the paper should be rewritten. It is a collection of results without deep reflection. For example, the writing pattern is like xx studied ww and yy studied zz. The novelty and contribution should be clarified. It is not clear what is known and what is new.

 

For the asset, a generic classification attribute which is slightly like the old Tier 1 attribute in the previous sections. The range of values that this attribute can take are the following: Buildings and property, civils (drainage - resilience), civils (drainage - earthworks), civils (drainage - track), civils (earthworks), civils (structures), electric power and plant, permanent way, railway control systems, telecommunications, train power systems. This should be explained clearly.

 

It is mentioned that the suggested method can be divided into three main steps that happen on a sequential basis: Firstly, all the different CAF files are analyzed and a data extraction process is being carried out, secondly, some merging processes and data parsing are being executed to finally provide an analysis and perform some tests to prove the benefits of data mining in the current scenario. The novelty is not clear. This approach has been studied in the work of Nadeem's group.

 

 

The process of data gathering and reclassification into a common data format is useful not only for having a better understanding of the data but also for estimating future project costs based on the already registered ones. As a proof of that, three different machine learning algorithms have been implemented: Linear regression, Lasso regression and random forest. A thorough statistical analysis is needed with more data and comparison.

 

 

The language needs to be improved. I recommend a professional editing service. The quality is clearly below the publication standard. 

Author Response

On behalf of all the authors, I wanted to express our gratitude for your time and for your helpful suggestions, which contribute greatly to improving the quality of the paper. The following stated issues have been dealt with:

 

Stated Issue:

The literature review of the paper should be rewritten. It is a collection of results without deep reflection. For example, the writing pattern is like xx studied ww and yy studied zz. The novelty and contribution should be clarified. It is not clear what is known and what is new. 

Response:

The state of the art has been revised to leave more room for a critical assessment of the studies instead of merely mentioning the main purpose of each.

Stated issue:

For the asset, a generic classification attribute which is slightly like the old Tier 1 attribute in the previous sections. The range of values that this attribute can take are the following: Buildings and property, civils (drainage - resilience), civils (drainage - earthworks), civils (drainage - track), civils (earthworks), civils (structures), electric power and plant, permanent way, railway control systems, telecommunications, train power systems. This should be explained clearly.

Response:

Thanks for the feedback. The description of the asset has been rewritten to be more specific.

Stated issue:

It is mentioned that the suggested method can be divided into three main steps that happen on a sequential basis: Firstly, all the different CAF files are analyzed and a data extraction process is being carried out, secondly, some merging processes and data parsing are being executed to finally provide an analysis and perform some tests to prove the benefits of data mining in the current scenario. The novelty is not clear. This approach has been studied in the work of Nadeem's group.

Response:

A new section 2.4 has been included to analyse the strengths and the novelty of the approach in contrast with the reviewed literature.

Stated issue:

The process of data gathering and reclassification into a common data format is useful not only for having a better understanding of the data but also for estimating future project costs based on the already registered ones. As a proof of that, three different machine learning algorithms have been implemented: Linear regression, Lasso regression and random forest. A thorough statistical analysis is needed with more data and comparison.

Response:

As requested, Section 3.4, "The method step by step," has been rewritten to specify the configuration of all the machine learning algorithms, the validation method of the results, and a brief description of the implemented data mining techniques.

Round 2

Reviewer 2 Report

The authors have addressed in detail the reviewers' comments.

Reviewer 3 Report

The paper can be accepted. 

Language is ok. 
