
Document-Image Related Visual Sensors and Machine Learning Techniques

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (30 April 2021) | Viewed by 36741

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editors


Guest Editor
Institute for Smart Systems Technologies, University of Klagenfurt, A-9020 Klagenfurt, Austria
Interests: intelligent transport systems; telecommunications; neuro-computing; machine learning and pattern recognition; nonlinear dynamics

Guest Editor
Alpen-Adria-Universität Klagenfurt, Department of Applied Informatics, Klagenfurt, Austria
Interests: machine learning; pattern recognition; image processing; data mining; video understanding; cognitive modeling and recognition

Guest Editor
Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria
Interests: machine learning; cognitive neuroscience; applied mathematics; machine vision

Guest Editor
Associate Professor, Department of Smart System Technologies, University of Klagenfurt, 9020 Klagenfurt, Austria
Interests: analog computing; dynamical systems; neuro-computing with applications in systems simulation and ultra-fast differential equations solving; nonlinear oscillatory theory with applications; traffic modeling and simulation; traffic telematics

Special Issue Information

Dear Colleagues,

Digitizing paper-based documents to cut costs and to reduce the well-known negative environmental impact of excessive paper use in offices has led to an increased focus on the systematic electronic capture of documents through scanners, mobile phone cameras, etc.

Generally, the quality of the captured document-images is far from good, due to a series of challenges related to the performance of the visual sensors and, for camera-based captures, to the difficult external environmental conditions encountered during the sensing (image-capturing) process. Such document-images are often hard to read, have low contrast, and are corrupted by artifacts such as noise, blur, shadows, and spotlights, to name a few.

To ensure an acceptable quality of the final document-images, such that they can be reliably digitized and used in various high-level applications based on digital documents, the sensing process must be made far more robust than the raw capture produced by a purely physical visual sensor. Thus, the physical sensors must be virtually augmented by a series of additional pre- and/or post-processing functional blocks, which mostly involve, amongst others, advanced machine learning techniques.

Submissions presenting innovative and robust approaches are invited to address a series of core issues of relevance for this Special Issue:

  • Visual sensors related issues w.r.t. document capture or digitization:
    • Modeling and calibration of visual sensors w.r.t. various distortions
    • Camera calibration concepts for robustly handling defocused images
    • Identification, classification and characterization of visual sensor-related sources of document-image deterioration and distortion
    • Sharpness quality prediction for mobile-captured document images
    • Variational models for document-image binarization
    • Fuzzy models for blur estimation on document images
    • Adaptive binarization of degraded and/or distorted document images
    • Rectification and mosaicking of camera-captured document images
    • Sensor systems for low light document capture and binarization with multiple flash images
  • Quality assessment of the performance of visual sensors for document capture:
    • Document-image analysis
    • Subjective and objective assessment of the quality of document-images w.r.t. distortions such as blur, noise, contrast, shadow, spot light, etc.
    • Neurocomputing applications in image quality detection
  • Post-processing of document-images (captured either by scanners or by mobile phone cameras):
    • Image quality analysis and enhancement:
      • Document-image degradation models
      • Restoration of deteriorated document-images
      • Quality enhancement of distorted (w.r.t. blur, noise, contrast, shadow, spot light, etc.) document images
      • Dataset creation for the quality assessment of camera-captured images
      • Perspective rectification of camera-captured document images
    • Document image classification and character recognition:
      • Automated and robust classification of document-images under difficult realistic conditions (i.e., deteriorated or distorted images)
      • Deep learning applications in robust automated image classification
      • Impact of image distortion on the readability of QR codes
      • Robust optical character recognition for distorted and/or deteriorated document-images

Prof. Dr. Kyandoghere Kyamakya
Dr. Fadi Al-Machot
Dr. Ahmad Haj Mosa
Dr. Jean Chamberlain Chedjou
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (10 papers)


Editorial


4 pages, 162 KiB  
Editorial
Document-Image Related Visual Sensors and Machine Learning Techniques
by Kyandoghere Kyamakya, Ahmad Haj Mosa, Fadi Al Machot and Jean Chamberlain Chedjou
Sensors 2021, 21(17), 5849; https://doi.org/10.3390/s21175849 - 30 Aug 2021
Cited by 2 | Viewed by 2212
Abstract
Document imaging/scanning approaches are essential techniques for digitalizing documents in various real-world contexts, e.g., libraries, office communication, management of workflows, and electronic archiving [...]
(This article belongs to the Special Issue Document-Image Related Visual Sensors and Machine Learning Techniques)

Research


15 pages, 6142 KiB  
Article
Text Detection Using Multi-Stage Region Proposal Network Sensitive to Text Scale
by Yoshito Nagaoka, Tomo Miyazaki, Yoshihiro Sugaya and Shinichiro Omachi
Sensors 2021, 21(4), 1232; https://doi.org/10.3390/s21041232 - 09 Feb 2021
Cited by 10 | Viewed by 2739
Abstract
Recently, interest has surged in intelligent sensors that rely on text detection. However, detecting small text remains challenging. To solve this problem, we propose a novel text-detection CNN (convolutional neural network) architecture that is sensitive to text scale. We extract multi-resolution feature maps in multi-stage convolution layers, which are employed to avoid losing information and to maintain the feature size. In addition, we designed the CNN with the receptive field size in mind to generate the proposal stages. The experimental results show the importance of the receptive field size.

18 pages, 5019 KiB  
Article
Pearson Correlation-Based Feature Selection for Document Classification Using Balanced Training
by Inzamam Mashood Nasir, Muhammad Attique Khan, Mussarat Yasmin, Jamal Hussain Shah, Marcin Gabryel, Rafał Scherer and Robertas Damaševičius
Sensors 2020, 20(23), 6793; https://doi.org/10.3390/s20236793 - 27 Nov 2020
Cited by 62 | Viewed by 6046
Abstract
Documents are stored in digital form across many organizations. Printing this amount of data and filing it in folders instead of storing it digitally runs counter to practical, economical, and ecological considerations. An efficient way of retrieving data from digitally stored documents is also required. This article presents a real-time supervised learning technique for document classification based on a deep convolutional neural network (DCNN), which aims to reduce the impact of adverse document-image artifacts such as signatures, marks, logos, and handwritten notes. The major steps of the proposed technique include data augmentation, feature extraction using pre-trained neural network models, feature fusion, and feature selection. We propose a novel data augmentation technique, which normalizes the imbalanced dataset using the secondary dataset RVL-CDIP. The DCNN features are extracted using the VGG19 and AlexNet networks. The extracted features are fused, and the fused feature vector is optimized by applying a Pearson correlation coefficient-based technique that selects the optimized features while removing the redundant ones. The proposed technique is tested on the Tobacco3482 dataset, on which it achieves a classification accuracy of 93.1% using a cubic support vector machine classifier, proving its validity.
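The redundancy-removal step this abstract describes can be illustrated with a short sketch: keep a feature only if its Pearson correlation with every already-selected feature stays below a threshold. The function names and the 0.9 threshold below are illustrative assumptions, not details taken from the paper.

```python
# Sketch of Pearson-correlation-based redundancy filtering. Features are
# assumed to be equal-length lists of floats; names/threshold are hypothetical.
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(features, threshold=0.9):
    """Greedily keep a feature only if it is not highly correlated
    with any feature already selected."""
    selected = []
    for i, f in enumerate(features):
        if all(abs(pearson(f, features[j])) < threshold for j in selected):
            selected.append(i)
    return selected
```

For example, a feature that is an exact multiple of an earlier one (correlation 1.0) would be dropped, while a weakly correlated feature would survive.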

16 pages, 8988 KiB  
Article
A Visual Sensing Concept for Robustly Classifying House Types through a Convolutional Neural Network Architecture Involving a Multi-Channel Features Extraction
by Vahid Tavakkoli, Kabeh Mohsenzadegan and Kyandoghere Kyamakya
Sensors 2020, 20(19), 5672; https://doi.org/10.3390/s20195672 - 05 Oct 2020
Cited by 5 | Viewed by 2673
Abstract
The core objective of this paper is to develop and validate a comprehensive visual sensing concept for robustly classifying house types. Previous studies have shown that this classification task is not simple, and most classifier models in the related literature have achieved relatively low performance. In search of a suitable model, several similar classification models based on convolutional neural networks were explored. We found that extracting better and more complex features results in a significant improvement in accuracy. Therefore, a new model taking this finding into account was developed, tested, and validated. The developed model is benchmarked against selected state-of-the-art classification models relevant to the house-classification task. The results of this comprehensive benchmarking clearly demonstrate and validate the effectiveness and superiority of the deep-learning model developed here. Overall, our model reaches classification performance figures (accuracy, precision, etc.) that are at least 8% higher (which is extremely significant in ranges above 90%) than those of the previous state-of-the-art methods included in the benchmarking.

21 pages, 10853 KiB  
Article
A Real-World Approach on the Problem of Chart Recognition Using Classification, Detection and Perspective Correction
by Tiago Araújo, Paulo Chagas, João Alves, Carlos Santos, Beatriz Sousa Santos and Bianchi Serique Meiguins
Sensors 2020, 20(16), 4370; https://doi.org/10.3390/s20164370 - 05 Aug 2020
Cited by 9 | Viewed by 4989
Abstract
Data charts are widely used in our daily lives and are present in regular media such as newspapers, magazines, web pages, and books. In general, a well-constructed data chart leads to an intuitive understanding of its underlying data; conversely, a chart with poor design choices may need to be redesigned. However, in most cases, these charts are available only as static images, which means that the original data are not accessible. Automatic methods can therefore be applied to extract the underlying data from chart images to enable such changes. The task of recognizing charts and extracting data from them is complex, largely due to the variety of chart types and their visual characteristics. Other features of real-world images that complicate this task are photo distortion, noise, alignment issues, etc. Two computer vision techniques that can assist this task, and that have been little explored in this context, are perspective detection and correction. These methods transform a distorted and noisy chart into a clean chart whose type is identified, ready for data extraction or other uses. This paper proposes a classification, detection, and perspective-correction pipeline that is suitable for real-world usage, considering the data used to train a state-of-the-art model for chart extraction from real-world photographs. The results showed that, with slight changes, chart recognition methods are now ready for real-world charts when both time and accuracy are taken into consideration.

14 pages, 1927 KiB  
Article
An Algorithm Based on Text Position Correction and Encoder-Decoder Network for Text Recognition in the Scene Image of Visual Sensors
by Zhiwei Huang, Jinzhao Lin, Hongzhi Yang, Huiqian Wang, Tong Bai, Qinghui Liu and Yu Pang
Sensors 2020, 20(10), 2942; https://doi.org/10.3390/s20102942 - 22 May 2020
Cited by 7 | Viewed by 2432
Abstract
Text recognition in natural scene images has always been a hot topic in the field of document-image related visual sensors. The previous literature mostly addressed the recognition of horizontal text, but text in natural scenes is often slanted and irregular, leaving many problems unsolved. For this reason, we propose a scene text recognition algorithm based on a text position correction (TPC) module and an encoder-decoder network (EDN) module. First, slanted text is corrected to horizontal text by the TPC module; then, the content of the horizontal text is accurately recognized by the EDN module. Experiments on standard datasets show that the algorithm can recognize many kinds of irregular text and achieves better results. Ablation studies show that the two proposed network modules enhance the accuracy of irregular scene text recognition.
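As a rough geometric analogue of the position-correction idea, one can estimate the slant of a detected text box from its baseline and rotate the box back to horizontal. The paper's TPC module is a learned network; the coordinate-level sketch below is only an assumed simplification with hypothetical names.

```python
# Toy "position correction": rotate a slanted text box back to horizontal.
# The first two corners are assumed to lie on the text baseline.
from math import atan2, cos, sin

def deskew(corners):
    """Rotate all corners about the first one so the baseline becomes horizontal."""
    (x0, y0), (x1, y1) = corners[0], corners[1]
    angle = atan2(y1 - y0, x1 - x0)      # slant of the baseline
    c, s = cos(-angle), sin(-angle)      # rotate by the opposite angle
    return [((x - x0) * c - (y - y0) * s,
             (x - x0) * s + (y - y0) * c) for x, y in corners]
```

A baseline running from (0, 0) to (4, 3), for instance, is mapped onto the horizontal axis with its length (5) preserved.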

23 pages, 13949 KiB  
Article
Robust Combined Binarization Method of Non-Uniformly Illuminated Document Images for Alphanumerical Character Recognition
by Hubert Michalak and Krzysztof Okarma
Sensors 2020, 20(10), 2914; https://doi.org/10.3390/s20102914 - 21 May 2020
Cited by 12 | Viewed by 3323
Abstract
Image binarization is one of the key operations decreasing the amount of information used in further analysis of image data, significantly influencing the final results. Although in some applications, where well illuminated images may be easily captured, ensuring a high contrast, even a simple global thresholding may be sufficient, there are some more challenging solutions, e.g., based on the analysis of natural images or assuming the presence of some quality degradations, such as in historical document images. Considering the variety of image binarization methods, as well as their different applications and types of images, one cannot expect a single universal thresholding method that would be the best solution for all images. Nevertheless, since one of the most common operations preceded by the binarization is the Optical Character Recognition (OCR), which may also be applied for non-uniformly illuminated images captured by camera sensors mounted in mobile phones, the development of even better binarization methods in view of the maximization of the OCR accuracy is still expected. Therefore, in this paper, the idea of the use of robust combined measures is presented, making it possible to bring together the advantages of various methods, including some recently proposed approaches based on entropy filtering and a multi-layered stack of regions. The experimental results, obtained for a dataset of 176 non-uniformly illuminated document images, referred to as the WEZUT OCR Dataset, confirm the validity and usefulness of the proposed approach, leading to a significant increase of the recognition accuracy.
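For reference, the simple global thresholding the abstract contrasts against can be sketched with a minimal Otsu-style threshold, which picks the cut that maximizes between-class variance. This is the generic textbook method, not the authors' combined approach, and the helper names are illustrative.

```python
# Minimal global Otsu threshold on a flat list of 0..255 pixel values.
def otsu_threshold(pixels):
    """Return the threshold maximizing between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    w_bg = sum_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]                      # background pixel count
        if w_bg == 0 or w_bg == total:
            continue
        sum_bg += t * hist[t]
        m_bg = sum_bg / w_bg                 # background mean
        m_fg = (total_sum - sum_bg) / (total - w_bg)  # foreground mean
        var = w_bg * (total - w_bg) * (m_bg - m_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(pixels, t):
    return [0 if p <= t else 255 for p in pixels]
```

On a clearly bimodal image this works well; it is precisely on non-uniformly illuminated documents, where no single global cut separates ink from paper, that combined and locally adaptive methods such as the one proposed here are needed.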

14 pages, 5268 KiB  
Article
A New Filtering System for Using a Consumer Depth Camera at Close Range
by Yuanxing Dai, Yanming Fu, Baichun Li, Xuewei Zhang, Tianbiao Yu and Wanshan Wang
Sensors 2019, 19(16), 3460; https://doi.org/10.3390/s19163460 - 08 Aug 2019
Cited by 4 | Viewed by 2478
Abstract
Using consumer depth cameras at close range yields a higher surface resolution of the object, but it also introduces more severe noise. This form of noise tends to be located at, or on the edge of, the true surface over a large area, which is an obstacle for real-time applications that cannot rely on point-cloud post-processing. To fill this gap, after analyzing the noise regions by position and shape, we propose a composite filtering system for using consumer depth cameras at close range. The system consists of three main modules that eliminate different types of noise areas. Taking the human-hand depth image as an example, the proposed filtering system can eliminate most of the noise areas. None of the algorithms in the system are based on window smoothing, and all are accelerated on the GPU. A large number of comparative experiments with Kinect v2 and SR300 show that the system achieves good results with very high real-time performance, and it can serve as a pre-processing step for real-time human-computer interaction, real-time 3D reconstruction, and further filtering.

11 pages, 27313 KiB  
Communication
Converting a Common Low-Cost Document Scanner into a Multispectral Scanner
by Zohaib Khan, Faisal Shafait and Ajmal Mian
Sensors 2019, 19(14), 3199; https://doi.org/10.3390/s19143199 - 20 Jul 2019
Cited by 3 | Viewed by 5169
Abstract
Forged documents and counterfeit currency can be better detected with multispectral imaging in multiple color channels instead of the usual red, green, and blue. However, multispectral cameras/scanners are expensive. We propose the construction of a low-cost scanner designed to capture multispectral images of documents. A standard sheet-feed scanner was modified by disconnecting its internal light source and connecting an external multispectral light source comprising narrow-band light-emitting diodes (LEDs). A document was scanned by successively illuminating the scanner's light guide with different LEDs and capturing a scan under each. The system costs less than a hundred dollars and is portable. It can potentially be used for the verification of questioned documents, checks, receipts, and bank notes.

15 pages, 996 KiB  
Article
The Optimally Designed Variational Autoencoder Networks for Clustering and Recovery of Incomplete Multimedia Data
by Xiulan Yu, Hongyu Li, Zufan Zhang and Chenquan Gan
Sensors 2019, 19(4), 809; https://doi.org/10.3390/s19040809 - 16 Feb 2019
Cited by 7 | Viewed by 3385
Abstract
Clustering analysis of massive data in wireless multimedia sensor networks (WMSN) has become a hot topic. However, most data clustering algorithms have difficulty capturing latent nonlinear correlations among data features, resulting in low clustering accuracy. In addition, it is difficult to extract features from missing or corrupted data, and incomplete data are widespread in practice. In this paper, optimally designed variational autoencoder networks are proposed for extracting features of incomplete data, and a high-order fuzzy c-means algorithm (HOFCM) is used to improve the clustering performance on incomplete data. Specifically, the feature extraction model is improved by using a variational autoencoder to learn the features of incomplete data. To capture nonlinear correlations in different heterogeneous data patterns, a tensor-based fuzzy c-means algorithm is used to cluster the low-dimensional features, with the tensor distance as the distance measure so as to capture unknown correlations of the data as much as possible. Finally, once the clustering results are obtained, the missing data can be restored using the low-dimensional features. Experiments on real datasets show that the proposed algorithm not only effectively improves the clustering performance on incomplete data but can also fill in missing features and achieve better data reconstruction results.
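The fuzzy c-means step can be illustrated by its classic membership update, here on scalar features with a plain Euclidean distance; the paper's high-order, tensor-distance variant (HOFCM) is not reproduced, so treat this as a minimal sketch under those simplifying assumptions.

```python
# One fuzzy c-means membership update on scalar features.
# u[i][k] weighs how strongly point i belongs to cluster k; rows sum to 1.
def fcm_memberships(points, centers, m=2.0):
    U = []
    for x in points:
        d = [abs(x - c) for c in centers]
        if 0.0 in d:
            # a point sitting exactly on a center belongs to it fully
            U.append([1.0 if dist == 0.0 else 0.0 for dist in d])
            continue
        exp = 2.0 / (m - 1.0)  # fuzzifier exponent
        U.append([1.0 / sum((d[k] / d[j]) ** exp for j in range(len(centers)))
                  for k in range(len(centers))])
    return U
```

A full FCM run alternates this update with recomputing the centers as membership-weighted means; the tensor-distance variant replaces the scalar distance with one defined over higher-order feature structures.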
