Computer Vision and Machine Learning in Human-Computer Interaction

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (30 June 2021) | Viewed by 28004

Special Issue Editor


Prof. Dr. Włodzimierz Kasprzak
Guest Editor
Warsaw University of Technology, Faculty of Electronics and Information Technology, Institute of Control and Computation Engineering, Nowowiejska 15/19, 00-665 Warsaw, Poland
Interests: computational techniques in pattern recognition, artificial intelligence, and machine learning, and their application to image and speech analysis; robot vision; biometric techniques

Special Issue Information

Dear Colleagues,

The rapid development of imaging sensor technology, among other factors, has driven the recent improvement and technological readiness of various human-computer interaction (HCI) systems, especially those taking the form of human-machine interfaces and human assistance systems. HCI techniques have already found numerous application fields, such as car-driver assistance systems, service and social robots, medical and healthcare systems, sport training assistance, and special communication modes for disabled and elderly people. The price, size, and power requirements of image sensors and digital cameras are steadily falling, presenting new opportunities for machine learning techniques applied in computer vision systems. The miniaturisation of vision sensors and the improved design of high-resolution and high-speed RGB-D cameras significantly stimulate the collection of huge volumes of digital image data. Computer vision algorithms benefit greatly from this process since, alongside classic signal processing and pattern recognition techniques, machine learning techniques can now be realistically applied, leading to new, robust solutions to human-centred image analysis tasks.

In this Special Issue, we are particularly interested in system architectures and computational techniques for human-computer interaction that benefit from modern vision sensors and cameras. From the methodological point of view, the focus is on combining classical pattern recognition and deep learning techniques to create new computational paradigms for typical tasks in visual human-machine interaction, such as human pose detection, dynamic gesture recognition, hand and body sign recognition, eye attention tracking, and face emotion recognition. On the practical side, we are looking for hardware and software components, prototypes, and demonstrators of smart human-computer interaction systems in various application fields. Topics of interest include, but are not limited to, the following:

  • Human-machine interfaces;
  • Human assistance;
  • Imaging sensors;
  • RGB-D cameras;
  • Image data collection and annotation;
  • Human pose detection;
  • Human gesture recognition;
  • Eye tracking;
  • Face emotion recognition;
  • Sign and body language recognition;
  • Vision-based human-computer interaction (VHCI);
  • Signal processing and pattern recognition in VHCI;
  • Deep learning techniques in VHCI;
  • Computational paradigms and system architectures for smart VHCI;
  • Hardware and software of smart VHCI;
  • Prototypes and demonstrators of smart VHCI;
  • Applications of smart VHCI.

Prof. Dr. Włodzimierz Kasprzak
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. The Special Issue runs on a continuous submission model: authors may submit their papers at any time. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (6 papers)


Research

15 pages, 3477 KiB  
Article
Context Aware Video Caption Generation with Consecutive Differentiable Neural Computer
by Jonghong Kim, Inchul Choi and Minho Lee
Electronics 2020, 9(7), 1162; https://doi.org/10.3390/electronics9071162 - 17 Jul 2020
Cited by 8 | Viewed by 3382
Abstract
Recent video captioning models aim at describing all events in a long video. However, their event descriptions do not fully exploit the contextual information included in a video because they lack the ability to remember information changes over time. To address this problem, we propose a novel context-aware video captioning model that generates natural language descriptions based on improved video context understanding. We introduce an external memory, the differentiable neural computer (DNC), to improve video context understanding. The DNC naturally learns to use its internal memory for context understanding and also provides the contents of its memory as an output for additional connections. By sequentially connecting DNC-based caption models (DNC-augmented LSTMs) through this memory information, our consecutively connected DNC architecture can understand the context in a video without explicitly searching for event-wise correlations. Our consecutive DNC is sequentially trained with its language model (LSTM) for each video clip to generate context-aware captions of superior quality. In experiments, we demonstrate that our model provides more natural and coherent captions that reflect previous contextual information. Our model also shows superior quantitative performance on video captioning in terms of BLEU (BLEU@4 4.37), METEOR (9.57), and CIDEr-D (28.08).
(This article belongs to the Special Issue Computer Vision and Machine Learning in Human-Computer Interaction)
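The consecutive-memory idea described above (handing the memory state of one clip's caption model to the next) can be illustrated with a much-simplified sketch. The snippet below replaces the DNC with a plain LSTM state passed from clip to clip; all module names, dimensions, and dummy inputs are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: carrying recurrent memory across consecutive video clips,
# a simplified stand-in for the paper's DNC-augmented caption models.
import torch
import torch.nn as nn

class ClipCaptioner(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, vocab_size=1000):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, clip_feats, prev_state=None):
        # clip_feats: (batch, frames, feat_dim); prev_state carries context
        # from earlier clips, analogous to handing over memory contents.
        out, state = self.encoder(clip_feats, prev_state)
        logits = self.decoder(out)               # per-step word logits
        return logits, state

model = ClipCaptioner()
video_clips = [torch.randn(1, 16, 512) for _ in range(3)]  # dummy clip features

state = None
for clip in video_clips:
    logits, state = model(clip, state)           # context flows between clips
    print(logits.shape)                          # (1, 16, 1000)
```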

12 pages, 2579 KiB  
Article
Woven Fabric Pattern Recognition and Classification Based on Deep Convolutional Neural Networks
by Muhammad Ather Iqbal Hussain, Babar Khan, Zhijie Wang and Shenyi Ding
Electronics 2020, 9(6), 1048; https://doi.org/10.3390/electronics9061048 - 24 Jun 2020
Cited by 46 | Viewed by 9066
Abstract
The weave pattern (texture) of woven fabric is considered an important factor in the design and production of high-quality fabric. Traditionally, the recognition of woven fabric has faced many challenges because it relies on manual visual inspection. Moreover, approaches based on early machine learning algorithms depend directly on handcrafted features, which are time-consuming and error-prone to produce. Hence, an automated system is needed for the classification of woven fabric to improve productivity. In this paper, we propose a deep learning model based on a data augmentation and transfer learning approach for the classification and recognition of woven fabrics. The model uses a residual network (ResNet), in which the fabric texture features are extracted and classified automatically in an end-to-end fashion. We evaluated the results of our model using metrics such as accuracy, balanced accuracy, and F1-score. The experimental results show that the proposed model is robust and achieves state-of-the-art accuracy even when the physical properties of the fabric are changed. We compared our results with other baseline approaches and a pretrained VGGNet deep learning model, which showed that the proposed method achieves higher accuracy when rotational orientations of the fabric and proper lighting effects are considered.
(This article belongs to the Special Issue Computer Vision and Machine Learning in Human-Computer Interaction)
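The transfer-learning recipe described above (a pretrained ResNet backbone with a replaced classification head, trained on augmented fabric images) could be set up roughly as in the sketch below; the class count, augmentation choices, and freezing strategy are assumptions for illustration, not the authors' exact configuration.

```python
# Sketch: ResNet transfer learning with data augmentation for fabric patterns.
import torch.nn as nn
from torchvision import models, transforms

# Augmentations approximating rotational and lighting variation of fabric samples.
train_tf = transforms.Compose([
    transforms.RandomRotation(30),
    transforms.ColorJitter(brightness=0.3, contrast=0.3),
    transforms.RandomHorizontalFlip(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

num_classes = 3  # e.g. plain, twill, satin weaves (assumed)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new classification head

# Optionally freeze the backbone and fine-tune only the new head at first.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
```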

36 pages, 9153 KiB  
Article
Utilisation of Embodied Agents in the Design of Smart Human–Computer Interfaces—A Case Study in Cyberspace Event Visualisation Control
by Wojciech Szynkiewicz, Włodzimierz Kasprzak, Cezary Zieliński, Wojciech Dudek, Maciej Stefańczyk, Artur Wilkowski and Maksym Figat
Electronics 2020, 9(6), 976; https://doi.org/10.3390/electronics9060976 - 11 Jun 2020
Cited by 3 | Viewed by 2655
Abstract
The goal of the research reported here was to investigate whether a design methodology utilising embodied agents can be applied to produce a multi-modal human–computer interface for cyberspace event visualisation control. This methodology requires that the designed system structure be defined in terms of cooperating agents having well-defined internal components exhibiting specified behaviours. System activities are defined in terms of finite state machines and behaviours parameterised by transition functions. In the investigated case, the multi-modal interface is a component of the Operational Centre, which is a part of the National Cybersecurity Platform. Embodied agents have been successfully used in the design of robotic systems. However, robots operate in physical environments, while cyberspace event visualisation involves cyberspace; thus, the applied design methodology required a different definition of the environment. It had to encompass the physical environment in which the operator acts and the computer screen where the results of those actions are presented. Smart human–computer interaction (HCI) is a time-aware, dynamic process in which two parties communicate via different modalities, e.g., voice, gesture, and eye movement. The use of computer vision and machine intelligence techniques is essential when the human is carrying out an exhausting and concentration-demanding activity. The main role of this interface is to support security analysts and operators controlling the visualisation of cyberspace events, such as incidents or cyber attacks, especially when manipulating graphical information. Visualisation control modalities include visual gesture- and voice-based commands.
(This article belongs to the Special Issue Computer Vision and Machine Learning in Human-Computer Interaction)
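The methodology summarised above specifies agent behaviours as finite state machines parameterised by transition functions. A minimal sketch of that pattern is shown below; the states and commands (for a hypothetical gesture- and voice-controlled visualisation agent) are invented for illustration and do not come from the paper.

```python
# Sketch: an agent behaviour as a finite state machine whose transitions are
# parameterised by a function of the perceived input (gesture or voice command).
# States and commands are hypothetical, for illustration only.

def transition(state, command):
    """Transition function: maps (current state, input) to the next state."""
    table = {
        ("idle", "wake_gesture"): "listening",
        ("listening", "zoom_voice"): "zooming",
        ("listening", "pan_gesture"): "panning",
        ("zooming", "stop"): "idle",
        ("panning", "stop"): "idle",
    }
    return table.get((state, command), state)  # stay in place on unknown input

state = "idle"
for cmd in ["wake_gesture", "zoom_voice", "stop"]:
    state = transition(state, cmd)
    print(cmd, "->", state)
```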

16 pages, 2474 KiB  
Article
Deep Neural Network Based Ambient Airflow Control through Spatial Learning
by Sunghak Kim, InChul Choi, Dohyeong Kim and Minho Lee
Electronics 2020, 9(4), 591; https://doi.org/10.3390/electronics9040591 - 31 Mar 2020
Cited by 5 | Viewed by 3330
Abstract
As global energy regulations are strengthened, improving the energy efficiency of electronic appliances while maintaining their performance is becoming more important. In air conditioning especially, energy efficiency can be maximized by adaptively controlling the airflow based on detected human locations; however, several limitations, such as the detection area, the installation environment, sensor quantity, and real-time performance, which arise from the constraints of the embedded system, make this a challenging problem. In this study, using a low-resolution, cost-effective vision sensor, the environmental information of living spaces and the real-time locations of humans are learned through a deep learning algorithm to identify the living area within the entire indoor space. Based on this information, we improve the performance and energy efficiency of the air conditioner by smartly controlling the airflow over the identified living area. In experiments, our deep-learning-based spatial classification algorithm shows an error of less than ±5°. In addition, the target temperature can be reached 19.8% faster, and power consumption can be reduced by up to 20.5% by the time the target temperature is achieved.
(This article belongs to the Special Issue Computer Vision and Machine Learning in Human-Computer Interaction)
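One way to picture the spatial-learning component (classifying which angular sector of the room is occupied from a low-resolution camera frame) is a small CNN such as the sketch below; the input resolution, sector count, and layer sizes are assumptions, not the published architecture.

```python
# Sketch: a compact CNN that maps a low-resolution indoor image to one of
# several angular sectors (an assumed discretisation of the airflow direction).
import torch
import torch.nn as nn

class SectorNet(nn.Module):
    def __init__(self, num_sectors=36):          # e.g. 10-degree sectors (assumed)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_sectors)

    def forward(self, x):                         # x: (batch, 1, 64, 64) grayscale
        x = self.features(x)
        return self.classifier(x.flatten(1))

net = SectorNet()
print(net(torch.randn(2, 1, 64, 64)).shape)       # (2, 36) sector logits
```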

12 pages, 3717 KiB  
Article
A Multi-Feature Representation of Skeleton Sequences for Human Interaction Recognition
by Xiaohang Wang and Hongmin Deng
Electronics 2020, 9(1), 187; https://doi.org/10.3390/electronics9010187 - 19 Jan 2020
Cited by 7 | Viewed by 3171
Abstract
Inspired by the promising performance achieved by recurrent neural networks (RNNs) and convolutional neural networks (CNNs) in skeleton-based action recognition, this paper presents a deep network structure that combines a CNN for classification with an RNN that provides an attention mechanism for human interaction recognition. Specifically, the attention module in this structure is utilized to give different levels of attention to different frames via learned weights, and the CNN is employed to extract the high-level spatial and temporal information of the skeleton data. These two modules seamlessly form a single network architecture. In addition, to eliminate the impact of different locations and orientations, a coordinate transformation is conducted from the original coordinate system to a human-centric coordinate system. Furthermore, three different features are extracted from the skeleton data as the inputs of three subnetworks, respectively. Eventually, these subnetworks, fed with different features, are fused into an integrated network. The experimental results show the validity of the proposed approach on two widely used human interaction datasets.
(This article belongs to the Special Issue Computer Vision and Machine Learning in Human-Computer Interaction)
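The human-centric coordinate transformation mentioned in the abstract (removing a subject's absolute location and orientation before feature extraction) can be sketched roughly as below; the joint layout and the choice of the hip joints as the reference axis are assumptions for illustration, not the paper's exact definition.

```python
# Sketch: transform raw skeleton joints into a human-centric coordinate frame
# by translating to the body centre and rotating so the hip axis aligns with x.
import numpy as np

def to_human_centric(joints, left_hip=0, right_hip=1):
    """joints: (num_joints, 3) array of 3D joint positions (assumed layout)."""
    centre = (joints[left_hip] + joints[right_hip]) / 2.0
    centred = joints - centre                        # remove absolute location

    hip_axis = centred[right_hip] - centred[left_hip]
    angle = np.arctan2(hip_axis[1], hip_axis[0])     # rotation about the z-axis
    c, s = np.cos(-angle), np.sin(-angle)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return centred @ rot_z.T                         # remove body orientation

skeleton = np.random.rand(15, 3)                     # dummy 15-joint skeleton
print(to_human_centric(skeleton).shape)              # (15, 3)
```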

15 pages, 5841 KiB  
Article
Fusion of 2D CNN and 3D DenseNet for Dynamic Gesture Recognition
by Erhu Zhang, Botao Xue, Fangzhou Cao, Jinghong Duan, Guangfeng Lin and Yifei Lei
Electronics 2019, 8(12), 1511; https://doi.org/10.3390/electronics8121511 - 09 Dec 2019
Cited by 29 | Viewed by 4690
Abstract
Gesture recognition has been applied in many fields, as it is a natural human–computer communication method. However, the recognition of dynamic gestures is still a challenging topic because of complex disturbance and motion information. In this paper, we propose an effective dynamic gesture recognition method that fuses the prediction results of a two-dimensional (2D) motion-representation convolutional neural network (CNN) model and a three-dimensional (3D) dense convolutional network (DenseNet) model. Firstly, to obtain a compact and discriminative representation of gesture motion, the motion history image (MHI) and a pseudo-coloring technique were employed to integrate the spatiotemporal motion sequences into a single frame image, which was then fed into a 2D CNN model for gesture classification. Next, the proposed 3D DenseNet model was used to extract spatiotemporal features directly from red, green, blue (RGB) gesture videos. Finally, the prediction results of the proposed 2D and 3D deep models were blended together to boost recognition performance. The experimental results on two public datasets demonstrate the effectiveness of our proposed method.
(This article belongs to the Special Issue Computer Vision and Machine Learning in Human-Computer Interaction)
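Two of the building blocks named above lend themselves to a compact sketch: computing a motion history image from a frame sequence and blending the class probabilities of the two branches. The threshold, decay rate, frame sizes, and fusion weight below are assumptions rather than the paper's settings.

```python
# Sketch: motion history image (MHI) from a grayscale frame sequence, plus a
# simple weighted late fusion of two models' class probabilities.
import numpy as np

def motion_history_image(frames, threshold=30, decay=1.0 / 16):
    """frames: list of (H, W) uint8 grayscale frames (assumed preprocessing)."""
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    for prev, curr in zip(frames, frames[1:]):
        motion = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > threshold
        mhi = np.where(motion, 1.0, np.clip(mhi - decay, 0.0, 1.0))
    return mhi  # recent motion stays bright, older motion fades out

def late_fusion(p_2d, p_3d, alpha=0.5):
    """Blend class probabilities from the 2D-CNN and 3D-DenseNet branches."""
    return alpha * p_2d + (1.0 - alpha) * p_3d

frames = [np.random.randint(0, 255, (64, 64), dtype=np.uint8) for _ in range(8)]
print(motion_history_image(frames).shape)   # (64, 64)
```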