Peer-Review Record

Empowering Short Answer Grading: Integrating Transformer-Based Embeddings and BI-LSTM Network

Big Data Cogn. Comput. 2023, 7(3), 122; https://doi.org/10.3390/bdcc7030122
by Wael H. Gomaa 1,2, Abdelrahman E. Nagib 2, Mostafa M. Saeed 2, Abdulmohsen Algarni 3 and Emad Nabil 4,5,*
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 24 April 2023 / Revised: 8 June 2023 / Accepted: 15 June 2023 / Published: 21 June 2023
(This article belongs to the Special Issue Artificial Intelligence and Natural Language Processing)

Round 1

Reviewer 1 Report

This paper proposed a system for automatic short answer grading based on pre-trained Transformer models and a Bi-LSTM network. The paper presented promising results and is well-written. I am happy to recommend this work for publication, although I have a few comments that I hope the authors can address.

At the beginning of section 2, the authors described the North Texas dataset that they worked with. However, they didn't discuss how the answers were graded by humans, i.e. the ground truth, and they didn't explain what the "model answer" is (a term mentioned several times throughout the paper). These points are confusing, and I hope the authors can clarify them.

In the next paragraph, the authors reviewed an existing work (Reference [2]) that reached a correlation score of 0.95 using document embedding techniques, but only 0.15 when using raw text. I'm curious why the authors listed the lower score of 0.15 in Table 1; it makes much more sense to me to reference the result from the vector embedding work.

In line 130, there seems to be a misspelling of the Jiang-Conrath metric, which is spelled correctly in the next paragraph.

In the first paragraph of section 3.2, it may be better to replace "is having" with "with".

In section 3.4, the authors described the last stage of their system as a "regression task" but didn't provide further details. I hope the authors can elaborate on this step.
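
For context, a final regression stage in a system like this typically maps a fixed-size answer representation to a continuous score and is trained against the human grades with a loss such as MSE. Below is a minimal sketch of such a head; the dimensions and the 0-5 grade scale are illustrative assumptions, not the authors' reported configuration:

```python
import torch
import torch.nn as nn

class GradeRegressor(nn.Module):
    """Hypothetical regression head: maps a fixed-size answer
    representation to a single continuous grade."""
    def __init__(self, in_dim: int = 256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(in_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # one scalar: the predicted grade
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.head(features).squeeze(-1)

# Training minimizes mean squared error against the human grades:
model = GradeRegressor()
features = torch.randn(8, 256)       # a batch of answer representations
human_grades = torch.rand(8) * 5.0   # e.g. grades on a 0-5 scale
loss = nn.MSELoss()(model(features), human_grades)
```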

In section 4.2, the authors stated that they fixed a few hyperparameters for all experiments, including the batch size and learning rate. These seem like strange choices to fix, as the learning rate is often the first hyperparameter to tune during training. I'm curious what results they would get if they varied the learning rate.
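
As an illustration of the ablation being suggested here, one could hold everything else fixed and sweep only the learning rate, comparing final validation error. The toy pipeline below (synthetic data, a linear model) merely stands in for the authors' actual training loop:

```python
import torch
import torch.nn as nn

def train_and_evaluate(lr: float, steps: int = 200) -> float:
    """Train a toy regressor on synthetic data; returns final MSE.
    A stand-in for the paper's real training pipeline."""
    torch.manual_seed(0)
    x = torch.randn(128, 16)
    y = x.sum(dim=1)  # synthetic target
    model = nn.Linear(16, 1)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x).squeeze(-1), y)
        loss.backward()
        opt.step()
    return loss.item()

# Vary only the learning rate; everything else stays fixed.
results = {lr: train_and_evaluate(lr) for lr in [1e-4, 1e-3, 1e-2, 1e-1]}
best = min(results, key=results.get)
print(f"best lr: {best:g}, final MSE: {results[best]:.4f}")
```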

Also in this section, the authors mentioned that the performance was worse without the (Bi-)LSTM network, but didn't report the results. It would be great to include this result in Table 2 as a baseline.

In addition, it would be nice to bold the best result in Tables 2 and 3.

I appreciate the authors' hard work and want to congratulate them on creating a nice paper!

There are a few typos (e.g. sting -> string) that could use a proofread.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Review: Empowering Short Answer Grading: Integrating Transformer-Based Embeddings and BI-LSTM Network

This paper presents the use of a T5 model for automatic short answer grading, evaluated on the Texas dataset.

Some of the literature review is dated. The nearly half-page description of one paper that exploits traditional NLP (lemmatisation, etc.) is excessive, while other works are covered only briefly, and none are mentioned that use BERT- or T5-like models. I doubt that no prior work has done this; such works are simply not listed here. A refined literature search would improve the paper.

What about the other evaluated models that were mentioned, such as RoBERTa? It would be relevant to present those results and discuss why they did not perform well.

The discussions of Dropout, ReLU, etc. are too long; their purposes are widely known, and a brief description with a citation would be enough. Also, there are other activation functions to consider.

Figure 4 is not a good one; use a different representation of the network layers.

Is the MSE computed over a text evaluation or just over the grades? Do the answers have some kind of associated text that needs to be in place, or just a simple grade? A brief introduction to the dataset would make this clear.

The model itself is not clearly explained: T5 with an LSTM, but how are the two plugged together? There is no diagram to show this. And what is the output, a simple grade number?
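
To illustrate the kind of wiring this comment asks to see spelled out: one plausible arrangement feeds T5 token embeddings into a Bi-LSTM whose pooled output is regressed to a single grade and trained with MSE against the human scores, which would also answer the MSE question above. This is a hypothetical sketch; the checkpoint (t5-small), layer sizes, input format, and mean pooling are assumptions, not the authors' reported design:

```python
import torch
import torch.nn as nn
from transformers import T5EncoderModel, T5Tokenizer

class T5BiLSTMGrader(nn.Module):
    """Hypothetical pipeline: T5 encoder -> Bi-LSTM -> scalar grade."""
    def __init__(self):
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained("t5-small")  # 512-dim states
        self.bilstm = nn.LSTM(512, 128, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 128, 1)  # output: a single grade number

    def forward(self, input_ids, attention_mask):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.bilstm(states)
        pooled = lstm_out.mean(dim=1)          # assumed pooling choice
        return self.head(pooled).squeeze(-1)   # predicted grade

tokenizer = T5Tokenizer.from_pretrained("t5-small")
batch = tokenizer(
    ["model answer: plants produce oxygen. student answer: plants make oxygen."],
    return_tensors="pt", padding=True)
grader = T5BiLSTMGrader()
grade = grader(batch["input_ids"], batch["attention_mask"])
loss = nn.MSELoss()(grade, torch.tensor([4.0]))  # MSE on grades only
```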

Also, are there other works with results that could be compared? If such works exist, it is important to list them, namely their performance.

English is ok

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

This article integrates T5 and BLSTM to grade student responses based on a predefined model answer for each question. T5, which stands for "Text-to-Text Transfer Transformer," is a neural-network-based natural language processing model that can be used for various tasks, including machine translation, question answering, and summarization. T5 is highly effective in these tasks and has set new accuracy records due to its "pre-training and fine-tuning" approach.

The authors propose integrating T5 with BLSTM, a variant of the recurrent neural network architecture known as Long Short-Term Memory (LSTM), which allows for processing data sequences in both forward and backward directions. This bidirectional processing capability is particularly useful in natural language processing tasks such as speech recognition, machine translation, part-of-speech tagging, and named entity recognition, where contextual information is critical for understanding the meaning of words and phrases.
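
To make the bidirectional point concrete: a Bi-LSTM runs one LSTM left-to-right and a second one right-to-left over the same sequence, concatenating the two hidden states at each position so every token sees both past and future context. A minimal sketch with assumed, illustrative dimensions:

```python
import torch
import torch.nn as nn

# Bidirectional LSTM: forward and backward passes over the sequence,
# with their hidden states concatenated at every time step.
bilstm = nn.LSTM(input_size=512, hidden_size=128,
                 batch_first=True, bidirectional=True)

embeddings = torch.randn(4, 20, 512)  # (batch, seq_len, embedding_dim)
outputs, _ = bilstm(embeddings)
print(outputs.shape)  # torch.Size([4, 20, 256]) -- 2 x 128, both directions
```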

The article is well presented, although Figures 2 and 3 may be unnecessary, and the label on Figure 1 should be changed to "PREDICTION." The authors could also benefit from testing their proposed model on another dataset such as SciEntsBank, a collection of science exam questions designed for evaluating machine comprehension and question-answering systems in the scientific domain. The SciEntsBank dataset is publicly available and can be downloaded from the Allen Institute for AI website.

Furthermore, it would be useful for the authors to discuss the social implications of their results. What is the purpose of this work? Are short evaluations like this necessary in the future?

Overall, this is well-done work, and integrating T5 and BLSTM can improve performance on natural language processing tasks.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The work presented continues to be subpar relative to what is commonly done in the literature.

The proposed model is not clear regarding the use of the Bi-LSTM and why it is more efficient than simply using transformers instead.

Minor

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 3

Reviewer 2 Report

The authors have addressed my comments.

Minor edits

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
