Article
Peer-Review Record

A Century of French Railways: The Value of Remote Sensing and VGI in the Fusion of Historical Data

ISPRS Int. J. Geo-Inf. 2021, 10(3), 154; https://doi.org/10.3390/ijgi10030154
by Robert Jeansoulin
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 27 December 2020 / Revised: 2 March 2021 / Accepted: 6 March 2021 / Published: 10 March 2021

Round 1

Reviewer 1 Report

The paper essentially describes the work of the authors to represent the French railway network at its climax (1920) by using remote sensing and Volunteer Geographic Information. The aim of the paper is to present a procedure elaborated by the authors to rebuild the network from available public data.

Overall Comments

The authors’ objective is of great interest, and to my knowledge this is one of the first works to use VGI with such a historical perspective. However, the manuscript suffers from major problems that might undermine the interest of most readers.

First, the manuscript is ill-structured and does not allow the reader to easily grasp its content at a given level of abstraction (i.e. either skimming or digging into the text). The introduction does not clearly present the challenges the author should have expected in doing such work (i.e. from the literature review), while the conclusion does. The Materials and Methods section is confusing since it mixes, throughout the text, data source descriptions, geojson coding, data structures and comments about the problems encountered. Although the Results section presents some interesting maps, it mostly consists of procedural validation schemas and coding. Finally, with the exception of the last few paragraphs, the discussion section is insubstantial.

Second, the whole procedure is built to fill a data structure that meets authors’ “conceptual schema”. However, the schema rather looks like a physical implementation. If the authors had used a higher abstraction level diagram (e.g. Entity-Relationship) to express the links between the different objects they manipulate, this would have helped the reader to understand why they need/manage their features the way they do (e.g. origin, terminus, connection, bridge or even “communes”). From my perspective, doing so might even have brought them to change their implementation model, or directly used nodes and edges as entities (i.e. graph model). For this reason, the reader will question the relevance of the required information throughout the reading of the manuscript unless it is clearly justified.

Third, there is an ambiguity about the actual purpose of the paper. On the one hand, the authors present the procedure as being of general interest (e.g. introduction and discussion), on the other hand, their “conceptual model” and the procedure developed is closely linked to the French rail network. This ambiguity has to be resolved. From my point of view, if generalized (e.g. conceptual modelling, naming convention) the procedure would be of a general interest for the scientific community. If limited to the French territory, as it seems to be, it could still be interesting for some readers but the authors should state the scope of the procedure much more clearly (e.g. title changed).

I encourage the authors to clarify their objectives, to review their modelling in order to generalize its application (i.e. at least for the European Community) and to restructure their text to make it easier for the lay readers to grasp the interest of their approach.

Comments are provided in an annotated version of the manuscript uploaded with this report. Major comments are in red, minor in yellow, highlights in green.

Comments for author File: Comments.pdf

Author Response

Please see attachments.

Author Response File: Author Response.docx

Reviewer 2 Report

Dear colleagues, 

thank you for your contribution on "past railway networks: the value of Remote Sensing and Volunteer Geographic Information in Computer-assisted geocoding". It is beyond question that the development of transport infrastructure may indicate societal changes/developments. For deriving the impact on the greenhouse gas emissions, further mechanisms need to be introduced. This statement in line 50 is missing its relevance for this contribution. Maybe you should explain it more.

The contribution is quite challenging to read. After the introduction, which does not clearly state, in which use case and why past railway stations are needed, a very detailed section on database schemas follows. 
Starting from line 129, you explain the database schema and its attributes in detail. There is no reference to the requirements of harmonization for schemas of transport networks in Europe (INSPIRE), which make this schema discussion obsolete. 

Code examples throughout the paper should be moved to an appendix, which will help readability. 

Although the paper announces several times that it builds and makes use of graphs, the explanation of the schema and target schema - including the listed JOIN operations - supports the impression that a relational model is used for the data integration. Could you please explain the method more clearly?

For the problem of data integration of different datasets with different qualities, especially for a lot of data (big data), you should consider schemaless graph database approaches. An evaluation and list of different approaches is also missing. It is unclear if the chosen approach is state-of-the-art. Could you please explain why you have chosen your procedure?

The contribution is full of corrected and double sentences. It looks like the change tracking has been embedded into the PDF. 

Author Response

Thank you for the time you spent writing a thorough review of this manuscript. Before the point by point answer (hereafter), I think that your comments about the conceptual schema and the suggestion to consider INSPIRE standard, are real improvements to the previous version. Time has been short to fully take this into account, but section 2 has been largely modified.
Another important modification is a (hopefully) better focus on the main goal of the manuscript: a method for incrementally and cumulatively building a dataset of all the French gares, all geocoded by software or computer-assisted visual means.

Concerning the VGI, I would need more time investigating deeper how graph databases can improve the methodology: there is some answer below.

Note: title has been modified after a reviewer suggestion.

Here are the point-by-point answers to your comments.

<<< For deriving the impact on the greenhouse gas emissions, further mechanisms need to be introduced. This statement in line 50 is missing its relevance for this contribution. Maybe you should explain it more.
>>>
I have rephrased that part of the introduction. Indeed, a model for estimating the CO2/ton/km for a given status of the railway network of France would need more than the total distances between every gare. I would love to be able to do so (reverse science fiction!), but I would need help through interdisciplinary collaboration. Only macro-level estimation can be done with the national amount of transported goods by transport mode, seldom between a few big cities and Paris.

<<< The contribution is quite challenging to read. After the introduction, which does not clearly state, in which use case and why past railway stations are needed, ...
>>>
I have rewritten most of the introduction and section 2, with that objective of better clarity.

<<< Starting from line 129, you explain the database schema and its attributes in detail. There is no reference to the requirements of harmonization for schemas of transport networks in Europe (INSPIRE), which make this schema discussion obsolete.
>>>
This part has been rewritten and shortened, in order to minimize its importance because the relational approach is discarded in the sequel.
The INSPIRE reference has been added, and linked (somewhat) to the minimum information required by the target goal. It helps to make a decision on how the implementation should handle forks and 'interconnection' gares (eg: standard gauge gare next to a metric gauge gare). Thanks for the suggestion.

<<< Code examples throughout the paper should be moved to an appendix
>>>
done.

<<< Although the paper announces several times that it builds and makes use of graphs, the explanation of the schema and target schema - including the listed JOIN operations - supports the impression that a relational model is used for the data integration.
>>>
You are right: I was using the term graph in a misleading way. What I meant was to collect data from which a directed graph (digraph) can be built, even a weighted digraph (with km). The implementation of the dataset, with gares sorted by rail line number and km order, should allow that weighted digraph to be built, but no code or demonstration has been produced so far. So far, only the link between gares and communes of France can be made, showing how entire territories are now deprived of railway access (rail desert). I may try to display results in a week or two.
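
As an illustration only, here is a minimal sketch of how a weighted digraph could be derived from gares sorted by rail line number and km order. It is not the author's code: the record fields (`name`, `ligne`, `km`), the sample values and the function name `buildWeightedDigraph` are all assumptions made for this example.

```javascript
// Hypothetical gare records: field names and values are assumptions for illustration.
const sampleGares = [
  { name: "C", ligne: 1, km: 30 },
  { name: "A", ligne: 1, km: 0 },
  { name: "B", ligne: 1, km: 12.5 },
  { name: "D", ligne: 2, km: 8 },
  { name: "B", ligne: 2, km: 0 },
];

// Sort by line number, then by km, and link each gare to the next one on the
// same line. The edge weight is the km difference between the two gares.
function buildWeightedDigraph(records) {
  const sorted = [...records].sort((a, b) => a.ligne - b.ligne || a.km - b.km);
  const edges = [];
  for (let i = 1; i < sorted.length; i++) {
    const prev = sorted[i - 1];
    const curr = sorted[i];
    if (prev.ligne === curr.ligne) {
      edges.push({ from: prev.name, to: curr.name, ligne: curr.ligne, km: curr.km - prev.km });
    }
  }
  return edges;
}

const railEdges = buildWeightedDigraph(sampleGares);
console.log(railEdges);
```

In this sketch a gare that appears on two lines (here "B") naturally becomes a connection between them, because both lines produce an edge that starts or ends at the same name.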

<<< For the problem of data integration of different datasets with different qualities, especially for a lot of data (big data), you should consider schemaless graph database approaches. An evaluation and list of different approaches is also missing. It is unclear if the chosen approach is state-of-the-art. Could you please explain why you have chosen your procedure?
>>>
Your suggestion is legitimate, but I am not able to demonstrate, within a short time, that the approach is state-of-the-art. A possible way of doing so is added in the discussion section.
The three main tasks (after the fusion step) are:
1- to find all the gares of the lignes recorded in the public datasets, which is rather easy: almost all the answers are on Wikipedia (§3.1)
2- to find "forgotten" rail lines, through appropriate queries mostly to VGI identified sources, or to the full Internet. If a URL for a ligne is found, then to search for a list of the gares in the text of the URL.
In general, it appears as a simple list of toponyms (§3.2). Therefore, I have added comments in the discussion about the necessity to work schemaless and structureless: it would deserve to be investigated.
3- geocoding (direct or assisted)
The main quality issue in tasks 1 and 2 above is the incompleteness of the information.

<<<
The contribution is full of corrected and double sentences. It looks like the change tracking has been embedded into the PDF.
>>>
Yes, I probably forgot to turn off the Word Revision option, which was not visible on my screen, but was added in green in the PDF. I am really sorry for this oversight.
Once again, I really appreciate your review and thank you for that.

Reviewer 3 Report

The presented article is valuable, the topic of the article is interesting and up-to-date. Nevertheless, the description of the results is too extensive, which makes it difficult to understand the entire article. The results should be presented in a general manner without such details. The article should be significantly shortened, and the purpose of the article and its motivation clearly stated.

Author Response

The new version has been extensively rewritten, in particular the description of the objectives of the paper. It has been difficult to shorten the manuscript as much as you may have expected, because the variety of sources requires a variety of specific investigations and demonstrations. However, the number of pages is now 45 (without the annex), instead of 60. I am even considering dropping that annex entirely.

Note: title has been modified after a reviewer suggestion.

Thank you for your review that contributed to this better (hopefully) version.

Round 2

Reviewer 1 Report

The paper essentially describes the work of the authors to represent the French railway network at its climax (1920) by using remote sensing and Volunteer Geographic Information.

Overall Comments

The authors’ approach is of great interest, and to my knowledge this is one of the first works to use VGI with such a historical perspective. However, despite a definite improvement of the manuscript, it is still attempting to meet two objectives without success.

The authors’ first objective is to present their innovative approach that uses the web (e.g. files, images, wikis) to recreate a historical view of a phenomenon (in this case the French railway network at its peak). I was expecting this to be done through a manuscript that synthesizes both the methods used and the results obtained.

The authors’ second objective is to detail, step by step, the procedure they followed as well as the results they obtained at each of these steps. The authors also wish to document their successful (and unsuccessful) attempts to solve the problems encountered, as well as the programs they built and used.

I would have liked the second objective to be met by presenting the detailed step-by-step procedure as supplementary material with the paper. This way, the readers could have had an overview of this innovative approach without the burden of getting through all the details required to meet the second objective. In order to offer an alternative to the authors, I tried to disentangle these two objectives from the text, but without success. I therefore strongly encourage the authors to think about the best way to meet their objectives within one paper.

Comments are provided in an annotated version of the manuscript uploaded with this report. Minor comments are in yellow, highlights in green, text components that could potentially be moved in supplementary material are in blue (up to page 16).

Comments for author File: Comments.pdf

Author Response

The answers are in the response (below) and in the PDF of your annotations (most of which are answered). The disentangling has not been done, but many of the examples (second-objective type) have been moved to a second Annex.

Response to Reviewer 1 Comments

..., despite a definite improvement of the manuscript, it is still attempting to meet two objectives without success.

The authors’ first objective is to present their innovative approach that uses the web (e.g. files, images, wikis) to recreate a historical view of a phenomenon (in this case the French railway network at its peak). I was expecting this to be done through a manuscript that synthesizes both the methods used and the results obtained.

Response1: Where I was wrong from the beginning is that this was rather my second objective: to present the approach (fusion + using the web) as the result of a step-by-step experiment (see Response 2). I have an unfinished second manuscript that I intended to submit to the journal “Data”, which is probably closer to the (synthesis + results) version that you were expecting. It would mean a thorough modification. What I propose instead with the new version of the manuscript is a lighter text (about 7 pages less), with more emphasis on the methods. One entire subsection is introduced by a warning to the reader that she can skip it on first reading, and Annex B has been created to cover two particular points: the toponymy-pattern and the fork-pattern, with all related examples.

The authors’ second objective is to detail, step by step, the procedure they followed as well as the results they obtained at each of these steps. The authors also wish to document their successful (and unsuccessful) attempts to solve the problems encountered, as well as the programs they built and used.

I would have liked the second objective to be met by presenting the detailed step-by-step procedure as supplementary material with the paper. This way, the readers could have had an overview of this innovative approach without the burden of getting through all the details required to meet the second objective.

Response2: as mentioned above, the “burden” has been shortened, simplified (*) and preceded by a warning, plus Annex B. This solution is far from optimum, but avoids a drastic rewrite.

(*): geojson examples have been simplified to something more readable. Some minor procedures are removed or simply mentioned and not detailed.

In order to offer an alternative to the authors, I tried to disentangle these two objectives from the text, but without success. I therefore strongly encourage the authors to think about the best way to meet their objectives within one paper.

Response3: I warmly thank the reviewer for the considerable effort made in trying to help produce a better version of this paper. I know that the answer proposed in this new version is only half way (?) towards what he/she is encouraging us to do. The new manuscript attempts to mention the second objective (without detailing it), within a more focused presentation of the first objective.

Comments are provided in an annotated version of the manuscript uploaded with this report. Minor comments are in yellow, highlights in green, text components that could potentially be moved in supplementary material are in blue (up to page 16).

Response4: I have answered your annotations, the corresponding pdf is attached.

Additional modification (answering another reviewer comment): One technical, but rather important, modification has been made. The initial UML-class diagram has been updated, adding a new class “node”, in relationship with gare-is_a-node (card 0..1 – 1..n), and with “ligne”: node-belongs_to-ligne (card 2..n - 1). This removes the weird notion of “multi-featured gare”: now a gare is a non-empty list of nodes (at least 1). It also helps in presenting the fork-pattern, which was a nightmare for us, until we understood that it was the cause of so many “errors”.
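
To make the cardinalities above concrete, the following is a minimal sketch, not the manuscript's implementation, of how they might be checked on plain JavaScript objects. The field names (`id`, `ligne`, `name`, `nodes`), the sample data and the function `checkCardinalities` are hypothetical and chosen only for illustration.

```javascript
// Hypothetical data: each node carries exactly one ligne (node-belongs_to-ligne, card 1).
const sampleNodes = [
  { id: "n1", ligne: 1 },
  { id: "n2", ligne: 1 },
  { id: "n3", ligne: 2 },
  { id: "n4", ligne: 2 },
];
// A gare is a non-empty list of node ids; "Junction" groups nodes of two lignes.
const garesWithNodes = [
  { name: "Alpha", nodes: ["n1"] },
  { name: "Junction", nodes: ["n2", "n3"] },
  { name: "Omega", nodes: ["n4"] },
];

// Returns a list of human-readable violations (empty when all constraints hold).
function checkCardinalities(gareList, nodeList) {
  const errors = [];
  const knownIds = new Set(nodeList.map((n) => n.id));
  const owner = new Map();
  for (const gare of gareList) {
    const ids = gare.nodes || [];
    // gare side of gare-is_a-node: at least one node (1..n)
    if (ids.length === 0) errors.push(`gare ${gare.name}: empty node list`);
    for (const id of ids) {
      if (!knownIds.has(id)) errors.push(`gare ${gare.name}: unknown node ${id}`);
      // node side of gare-is_a-node: at most one gare (0..1)
      if (owner.has(id)) errors.push(`node ${id}: shared by ${owner.get(id)} and ${gare.name}`);
      owner.set(id, gare.name);
    }
  }
  // ligne side of node-belongs_to-ligne: at least two nodes (2..n)
  const perLigne = new Map();
  for (const n of nodeList) perLigne.set(n.ligne, (perLigne.get(n.ligne) || 0) + 1);
  for (const [ligne, count] of perLigne) {
    if (count < 2) errors.push(`ligne ${ligne}: fewer than 2 nodes`);
  }
  return errors;
}

const schemaErrors = checkCardinalities(garesWithNodes, sampleNodes);
console.log(schemaErrors.length === 0 ? "schema ok" : schemaErrors);
```

Nodes that belong to no gare (for instance a fork or a border point) are accepted by this check, which matches the 0..1 side of the gare-is_a-node relationship.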

Author Response File: Author Response.docx

Reviewer 2 Report

All comments of the review have been answered and the manuscript has been modified accordingly. There are still flaws in the method and in its placement in the international discussion (state-of-the-art status), but the contribution is worth publishing.

Author Response

I have made an important modification to Fig. 1 (UML) by adding a fourth class, the "node"; a "gare" is now a non-empty list of nodes. In the text, the term feature has been replaced by node (except when talking of a geojson feature). Actually, in the implementation, gare, node, fork and border point are all geojson features, but treated differently. The conceptual schema is now independent of that implementation bias, and also more compliant with INSPIRE. This was probably one of the flaws you are mentioning. Thank you for your comments. Annex B also gathers some notes about the "fork-pattern", which is related to the way a graph representation could be implemented more efficiently.

Reviewer 3 Report

The authors addressed successfully all my comments and the quality of the article has been significantly improved. The shortening of the text made it clearer. The author's additions to the introduction and results also improved the article.

Thank you for taking under consideration the comments and suggestions included in the first review report.

I suggest accepting the manuscript in the current form.

Author Response

Thank you for your encouraging comments. The manuscript has been modified following the other reviewers' comments, although only a minor revision was expected. A new Annex B has helped to shorten the main text by 6 more pages, making it (hopefully) more readable.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

The paper presents an approach for reconstructing the railway network in France using several data sources. The whole idea of the paper is good; however, the method is not clear to me:
1. The paper has faced some traditional problems in database integration. Please revise them in order to compare and discuss your approach.

2. Please provide related work, and compare your method with the available ones.

3. Rewrite section 2 and 3: revise what is the original datasets, what is the main algorithm, and specific limitations for the approach (each one in a specific section). Include the respective journal format for listing algorithms. List examples for the data (in appropriate section), and errors/challenges.

4. Provide all details of implementation, testing, and comparison (including hardware, software implementation, time, database, algorithm complexity analysis, examples, etc.). Reorganize the information in the paper within proper sections.
5. The experiments are not convincing and further details should be provided.

6. Avoid expressions such as “It can't be more crystal clear”, “It is time to investigate more sources of information”. Please be brief and concise, and add appropriate references for terms.

7. Review references.

Author Response

Thank you for your review, with which I do agree on most points.
First, I should say that the editorial board asked me to make that "major" revision in only two weeks, and I had to negotiate an extra third week, because I went on vacation at the end of August. I do not understand that two-week policy, and I couldn't work as I would have liked to, in particular concerning references.

1. Some of the traditional problems in database integration that we faced are either at a rather simple level (format conversion) or at a very difficult one, such as matching station-like buildings to spot the correct location of a station in a target town on old aerial photographs. In the middle stands the problem of toponymy changes and variants, on which I spent weeks for a very disappointing final result (see §2.3). My attempt to answer your remark is to start the Discussion section with a categorization of the integration issues, which I found in a medical sector paper and adapted to the railway reconstruction.

2. I spent most of the week rewriting section 2, in particular the target baseline and constraints, to better focus on the real contributions: Wikipedia parsing, parsing of plain-text lists of stations for geojson building, and coordinates extraction or approximation. In section 3, I tried to better illustrate the generic procedure where the specific software takes place. And I tried to improve the final quality control (§3.3) based on the target constraints.

4. I should confess that I am unable to correctly answer your requests about implementation, testing, complexity ... I would need at least one more month for that. This is implemented in JavaScript and run in node.js, though my first intent was to include this online in the browser. Due to long execution times as soon as several lines are processed, possibly involving the 35000 communes of France, I decided to work in batch mode and to record the intermediate results for a limited number of lines, concentrated in the same Region. I didn't keep track of each processing time, and I was working on this from time to time. Summarizing that kind of data would take me a lot of time now. If you consider that aspect of my paper as needing to be reworked, the only solution is to ask for another "major" revision.
However, what I did for this (4) is to reorganize the information in the paper, moving some paragraphs to more appropriate places.

5. The experiments. My most important contribution, in my opinion, is the help brought by the two node.js routines in building lists of stations belonging to the same line, with most of the "target" data already filled in, except for the coordinates. I should add to that the automatic check of approximated coordinates, which would greatly accelerate the delivery of a usable network, but with poorer quality. That was not my intent, as I preferred to check the coordinates visually, station by station. But in the perspective of that paper, it could be an interesting shift, postponing the precision for later, and having a dataset with more than 10000 "gares", useful at a 1:25000 scale, instead of having a partial network which is correct at a 1:25000 scale, or even better.
This is what I reached for the regions Bretagne, and Provence. You can check it on this website:
http://bigbugdata.com/bigbugdata/sncf/sncfV1.html
note: it works with recent versions of Chrome, Edge and Firefox, but not with Safari (JavaScript optional chaining is supposed to be supported only in the most recent version, July 2020).
I am still reluctant to mention that site in the paper, because it is not yet in my laboratory web page (I have to arrange that shortly).

6. Done. You're right, this was "oral" style, not suitable for a paper.

7. I modified several references, added a few about crowdsourcing and quality, removed some about environment and health.

Reviewer 2 Report

This paper presents authors work on reconstructing the railway network in France.

In my opinion, the main contribution of this paper is in the area of geographic information fusion and crowdsourcing open data.

However, this is not apparent from the paper. The abstract is very general and does not reflect the paper's content and main contribution.

The Introduction section focuses only on the importance of the digital reconstruction of past networks. There are no references related to information fusion, data quality or crowdsourcing. There is no comparison with similar approaches.

The rest of the paper focuses on specific problems and ways to solve them – using open public data, Wikipedia, geo-coding of ancient maps…, information integration and information fusion. In my opinion, this part of the paper describes the key contributions of this research (which, however, differ from the rest of the paper).

Author Response

Thank you for your remarks about the much too general introduction, without connection with the real contribution of the paper. This wasn't appropriate for a journal paper.

R: In my opinion, the main contribution of this paper is in the area of geographic information fusion and crowdsourcing open data.
A:
I do agree with you about one part of your sentence: in my own opinion, the main contribution is about developing tools for "integrating" open data. The difference with "fusion" is that it is a simple union rather than a real fusion, because the added information is - most of the time - totally independent from previous information, therefore generating no conflict, except where lines meet in the network.
I added that constraint in the new version: a new line can meet an existing one, either at an existing station (connection), or at some point of the line, and we must add what I name a "pseudo-gare", with all the attributes of a regular gare.
In general, I don't have the polyline coordinates of the lines (or I didn't try to gather that information), but a further work could be to check if the information exists and make a specific routine as a new specific tool.
Therefore, I removed the term fusion from the key-words and from some parts of §2.3. Fusion is much more complex.
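
The meeting constraint described above (a new line joins an existing one either at an existing station or at a "pseudo-gare" created for that purpose) could be sketched as follows. This is an illustration under stated assumptions, not the routine used for the paper: the field names, the `pseudo` flag, the function `resolveMeetingPoint` and the 0.1 km tolerance are all hypothetical.

```javascript
// Decide whether a meeting point at (ligne, km) is an existing gare (connection)
// or requires a new pseudo-gare. The tolerance value is an arbitrary assumption.
function resolveMeetingPoint(gareList, ligne, km, toleranceKm = 0.1) {
  const existing = gareList.find(
    (g) => g.ligne === ligne && Math.abs(g.km - km) <= toleranceKm
  );
  if (existing) {
    // Connection at an existing station: nothing is added.
    return { gare: existing, created: false, gares: gareList };
  }
  // Otherwise create a pseudo-gare with the same attributes as a regular gare,
  // and return a new list kept in (ligne, km) order. The input is not mutated.
  const pseudoGare = { name: `pseudo-gare L${ligne} km ${km}`, ligne, km, pseudo: true };
  const updated = [...gareList, pseudoGare].sort((a, b) => a.ligne - b.ligne || a.km - b.km);
  return { gare: pseudoGare, created: true, gares: updated };
}

// Hypothetical existing line with two gares.
const lineOne = [
  { name: "A", ligne: 1, km: 0 },
  { name: "B", ligne: 1, km: 12.5 },
];
const atStation = resolveMeetingPoint(lineOne, 1, 12.5); // meets at gare "B"
const midLine = resolveMeetingPoint(lineOne, 1, 7);      // needs a pseudo-gare
console.log(atStation.created, midLine.created);
```

Returning a new list rather than modifying the input keeps each integration step independent, which fits the "simple union" view of the data described in this answer.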

R: Section Introduction focuses only to importance of the digital reconstruction of past networks. There are no references related to information fusion, data quality or crowdsourcing. There is no comparison with similar approaches.
A:
I totally rewrote the Introduction, and I tried to refocus the abstract.
Concerning references, what I found most relevant for my work was about toponym disambiguation (added in §2.3), and about general integration of heterogeneous sources, at a system level (added in Discussion), and I added references about data quality. About fusion, I decided to drop that mention and the references.
About crowdsourcing and quality, new references are used in the Introduction and in some §.

About similar approaches, I had little idea where to find literature that would also be relevant for the problem of collecting crowdsourced railway data. I'm probably not knocking on the right doors. I asked people at SNCF-Reseau, who are not really aware of the use made of their public dataset; I asked contributors on GitHub who developed specific routines for a variety of Wikipedia Infoboxes, and I even collaborated with one on the "Bahnstrecke" template. Concerning toponymy disambiguation, which I considered, at first, as a bottleneck, I spent and wasted time on that, only to conclude that the gain was almost null. I spent time in the hope that collecting many miniatures of old stations from old aerial photographs could be the starting point for a collaboration in deep learning for past rail stations, only to conclude that, as interesting as it could probably be, it wouldn't accelerate the delivery of the dataset. However, I am sure that there is an opportunity there.

R: The rest of the paper focuses on specific problems and ways to solve them – using open public data, Wikipedia, geo-coding of ancient maps…, information integration and information fusion. In my opinion, this part of the paper describes the key contributions of this research (which, however, differ from the rest of the paper).
A:
I do fully agree with that sentence: therefore, I spent the short time of this revision rewriting the core parts on specific problems (hopefully for the better) and removing parts that differ from them.

The main difficulty was time. The editorial board sent me the reviews the very day I was leaving home for vacation in a rental without Internet connection (yes, it exists in France: Corrèze). I wasn't aware of the two-week time limit for the revision. I asked for more time, but they gave me a single extra week, which is a real challenge for a major revision.
