Article
Peer-Review Record

KD-Net: Continuous-Keystroke-Dynamics-Based Human Identification from RGB-D Image Sequences

Sensors 2023, 23(20), 8370; https://doi.org/10.3390/s23208370
by Xinxin Dai 1, Ran Zhao 1, Pengpeng Hu 2,* and Adrian Munteanu 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 28 August 2023 / Revised: 21 September 2023 / Accepted: 8 October 2023 / Published: 10 October 2023
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)

Round 1

Reviewer 1 Report

The authors must improve the description and clarity of the methods; they are very confusing.

Author Response

We would like to thank you for the positive evaluation of our work, as well as for the detailed comments and suggestions that helped improve our paper. Please refer to the attached responses to your comments.

Author Response File: Author Response.pdf

Reviewer 2 Report

[Strengths]

  The authors explored a novel visual modality of keystroke dynamics for human identification from RGB-D image sequences captured by a single RGB-D sensor.

  The authors created a novel dataset dubbed KD-MultiModal (RGB/depth images of keystroke dynamics) that contains 243.2K frames of RGB images and depth images, obtained by recording videos of hand typing via an Azure Kinect. The authors will make the dataset public when the paper is published to facilitate related studies.

  The authors trained a deep neural network to verify the proposed modality of keystroke biometrics.

 

[Weaknesses]

  Existing studies mainly depend on key press or release events, focusing on the velocities of pressing and releasing different keys, the typing styles of specific keys or key combinations, or the pressure exerted when pressing a key. The aforementioned work is based on restricted scenario assumptions, i.e., scenarios in which no camera is available for authentication. If a camera is available, why not simply perform face recognition? What is the application value of identity recognition based on hand information?

  The paper only compares its work to PointLSTM. It would be more helpful to compare it against more recent research.

  Learning spatiotemporal features is a crucial component of this model. However, there is a lack of an introduction to existing spatiotemporal feature learning methods. Please provide some methods for references, such as [1], [2], [3].

 

[1] Two-stream convolutional networks for action recognition in videos

[2] Towards omni-supervised face alignment for large scale unlabeled videos

[3] STM: Spatiotemporal and motion encoding for action recognition

Concerns about the writing: please take a close look at the entire manuscript. For example, (1) ‘ROI of keboard and hand’ in Figure 3.

Author Response

We would like to thank you for the positive evaluation of our work, as well as for the detailed comments and suggestions that helped improve our paper. Please refer to the attached responses to your comments.

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper discusses a visual modality of keystroke biometrics for human identification using a single RGB-D sensor. The authors create a dataset containing 243.2K frames of RGB images and depth images, obtained by recording videos of hand typing with a single RGB-D sensor. Deep neural networks are adopted to learn features from different data representations, including RGB-KD-Net, D-KD-Net, and RGBD-KD-Net. The results show that RGB-D data are more accurate in distinguishing individuals than RGB images or point clouds. This work is interesting for using keystroke biometrics for human identification. There are two questions about the work.

1. For ROI extraction, an RGB-D sensor is used to capture the images; why not use the depth information to obtain the keyboard and hand areas directly?

2. For spatiotemporal sequences, how many frames are most suitable for the identification network?

 

Author Response

We would like to thank you for the positive evaluation of our work, as well as for the detailed comments and suggestions that helped improve our paper. Please refer to the attached responses to your comments.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The author addressed all my concerns.
