Image and Video Processing for Blind and Visually Impaired

A special issue of Journal of Imaging (ISSN 2313-433X). This special issue belongs to the section "Image and Video Processing".

Deadline for manuscript submissions: 1 June 2024 | Viewed by 7108

Special Issue Editors


Prof. Dr. Zhigang Zhu
Guest Editor
The City College and The Graduate Center, The City University of New York, New York, NY 10016, USA
Interests: computer vision; multimodal perception; assistive technology; human-computer interaction; machine learning; augmented reality; virtual reality

Prof. Dr. John-Ross Rizzo
Guest Editor
1. Department of Rehabilitation Medicine and Department of Neurology, NYU Langone Health, New York, NY 10016, USA
2. Department of Mechanical and Aerospace Engineering and Department of Biomedical Engineering, NYU Tandon School of Engineering, New York, NY 11201, USA
Interests: assistive technology; wearable computing; 3D computer modeling; virtual and augmented reality; mobile vision and navigation

Prof. Dr. Hao Tang
Guest Editor
Borough of Manhattan Community College and The Graduate Center, The City University of New York, New York, NY 10007, USA
Interests: 3D computer modeling; virtual and augmented reality; mobile vision and navigation; assistive technology

Special Issue Information

Dear Colleagues,

Over 2.2 billion people worldwide live with vision loss. Vision plays a primary role in the efficient capture and integration of sensory information from the surrounding environment and is critically involved in the complex process that runs from sensory transduction to higher-level cortical interpretation. It enables the localization and recognition of spatial layouts and objects; the comprehension of three-dimensional relationships among objects and spatial geometry, including the egocentric perspective (one’s own location relative to landmarks); and, on a meta-level, spatial cognition. Virtually all aspects of life are affected by the loss of visual input. More broadly, visual impairment leads to difficulties in performing activities of daily living, affects safe mobility, decreases social participation, and results in diminished independence and quality of life. Besides the immediate limitations caused by sensory loss, physical and environmental infrastructure (e.g., lack of accessibility) and social factors (e.g., discrimination and a lack of educational resources) amplify visual impairment-related limitations and restrictions.

For this Special Issue, we call for original contributions that lead to innovative methods and applications, in particular those using image and video processing, which can be used to promote independence and community living among people of all ages with low vision and blindness. The topics include, but are not limited to:

  1. Increased access to graphical information, signage, travel information, or devices and appliances with digital displays and control panels through AI-based image and video processing.
  2. Improved non-visual or enhanced-visual orientation and mobility guidance in both indoor and outdoor environments by using portable and/or mobile image and video processing.
  3. Increased participation of people who are blind or have low vision in science, technology, engineering, arts, mathematics, and medicine (STEAM2) education and careers through the use of augmented reality techniques with image and video processing.

Prof. Dr. Zhigang Zhu
Prof. Dr. John-Ross Rizzo
Prof. Dr. Hao Tang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • assistive navigation
  • augmented/mixed reality
  • information access
  • orientation and mobility
  • object identification
  • sensory substitution
  • signage reading
  • social interaction
  • visual question answering

Published Papers (4 papers)

Research

15 pages, 3624 KiB  
Article
A Multi-Modal Foundation Model to Assist People with Blindness and Low Vision in Environmental Interaction
by Yu Hao, Fan Yang, Hao Huang, Shuaihang Yuan, Sundeep Rangan, John-Ross Rizzo, Yao Wang and Yi Fang
J. Imaging 2024, 10(5), 103; https://doi.org/10.3390/jimaging10050103 - 26 Apr 2024
Viewed by 384
Abstract
People with blindness and low vision (pBLV) encounter substantial challenges when it comes to comprehensive scene recognition and precise object identification in unfamiliar environments. Additionally, due to vision loss, pBLV have difficulty accessing and identifying potential tripping hazards independently. Previous assistive technologies for the visually impaired often struggle in real-world scenarios due to the need for constant training and lack of robustness, which limits their effectiveness, especially in dynamic and unfamiliar environments, where accurate and efficient perception is crucial. Therefore, we frame our research question in this paper as: How can we assist pBLV in recognizing scenes, identifying objects, and detecting potential tripping hazards in unfamiliar environments, where existing assistive technologies often falter due to their lack of robustness? We hypothesize that by leveraging large pretrained foundation models and prompt engineering, we can create a system that effectively addresses the challenges faced by pBLV in unfamiliar environments. Motivated by the prevalence of large pretrained foundation models, particularly in assistive robotics applications, owing to the accurate perception and robust contextual understanding that extensive pretraining affords in real-world scenarios, we present a pioneering approach that leverages foundation models to enhance visual perception for pBLV, offering detailed and comprehensive descriptions of the surrounding environment and providing warnings about potential risks. Specifically, our method begins by leveraging a large image-tagging model (i.e., the Recognize Anything Model (RAM)) to identify all common objects present in the captured images. The recognition results and user query are then integrated into a prompt, tailored specifically for pBLV, using prompt engineering. By combining the prompt and input image, a vision-language foundation model (i.e., InstructBLIP) generates detailed and comprehensive descriptions of the environment and identifies potential risks by analyzing environmental objects and scenic landmarks relevant to the prompt. We evaluate our approach through experiments conducted on both indoor and outdoor datasets. Our results demonstrate that our method can recognize objects accurately and provide insightful descriptions and analysis of the environment for pBLV.
(This article belongs to the Special Issue Image and Video Processing for Blind and Visually Impaired)
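
For readers who want to prototype the described flow, a minimal sketch is given below: image tagging, pBLV-tailored prompt construction, and vision-language generation. The InstructBLIP usage follows the Hugging Face transformers API; the tag_image helper, the prompt wording, and the file name are hypothetical stand-ins, since the paper's own code and exact prompts are not reproduced here.

```python
# Sketch of the tagging -> prompt engineering -> InstructBLIP flow described in the
# abstract. tag_image() is a hypothetical stand-in for Recognize Anything Model (RAM)
# inference; the InstructBLIP usage follows the Hugging Face transformers API.
from typing import List

import torch
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration


def tag_image(image: Image.Image) -> List[str]:
    """Placeholder for RAM inference: return common object tags found in the image."""
    return ["sidewalk", "bicycle", "trash can", "curb"]  # example output only


def build_pblv_prompt(tags: List[str], user_query: str) -> str:
    """Combine recognized tags and the user's question into a pBLV-tailored prompt."""
    return (
        "You are assisting a person with blindness or low vision. "
        f"Objects detected in the scene: {', '.join(tags)}. "
        f"User question: {user_query} "
        "Describe the surroundings in detail and warn about potential tripping hazards."
    )


device = "cuda" if torch.cuda.is_available() else "cpu"
processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")
model = InstructBlipForConditionalGeneration.from_pretrained(
    "Salesforce/instructblip-vicuna-7b"
).to(device)

image = Image.open("street_scene.jpg").convert("RGB")  # hypothetical input image
prompt = build_pblv_prompt(tag_image(image), "Is it safe to walk forward?")

inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```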

16 pages, 9769 KiB  
Article
Pedestrian-Accessible Infrastructure Inventory: Enabling and Assessing Zero-Shot Segmentation on Multi-Mode Geospatial Data for All Pedestrian Types
by Jiahao Xia, Gavin Gong, Jiawei Liu, Zhigang Zhu and Hao Tang
J. Imaging 2024, 10(3), 52; https://doi.org/10.3390/jimaging10030052 - 21 Feb 2024
Viewed by 1098
Abstract
In this paper, a Segment Anything Model (SAM)-based pedestrian infrastructure segmentation workflow is designed and optimized, which is capable of efficiently processing multi-sourced geospatial data, including LiDAR data and satellite imagery data. We used an expanded definition of pedestrian infrastructure inventory, which goes beyond the traditional transportation elements to include street furniture objects that are important for accessibility but are often omitted from the traditional definition. Our contributions lie in producing the necessary knowledge to answer the following three questions. First, how can mobile LiDAR technology be leveraged to produce a comprehensive pedestrian-accessible infrastructure inventory? Second, which data representation can facilitate zero-shot segmentation of infrastructure objects with SAM? Third, how well does the SAM-based method perform at segmenting pedestrian infrastructure objects? Our proposed method is designed to efficiently create a pedestrian-accessible infrastructure inventory through the zero-shot segmentation of multi-sourced geospatial datasets. Through addressing the three research questions, we show how the multi-mode data should be prepared, which data representation works best for which asset features, and how SAM performs on these data representations. Our findings indicate that street-view images generated from mobile LiDAR point-cloud data, when paired with satellite imagery data, can work efficiently with SAM to create a scalable pedestrian infrastructure inventory approach with immediate benefits to GIS professionals, city managers, transportation owners, and walkers, especially those with travel-limiting disabilities, such as individuals who are blind, have low vision, or experience mobility disabilities.
(This article belongs to the Special Issue Image and Video Processing for Blind and Visually Impaired)
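
A minimal sketch of zero-shot segmentation with SAM, of the kind this workflow relies on, is shown below. It uses Meta's segment_anything package; the checkpoint path, input file, and area threshold are assumptions, and the paper's LiDAR-to-street-view rendering and multi-mode data preparation steps are not reproduced here.

```python
# Zero-shot segmentation with the Segment Anything Model (SAM) on a street-view image
# (e.g., one rendered from mobile LiDAR point clouds) or a satellite tile.
import cv2
import torch
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a pretrained SAM checkpoint (ViT-H weights, downloaded separately).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to(device)

# Automatic mask generation: no prompts and no fine-tuning, i.e., zero-shot segmentation.
mask_generator = SamAutomaticMaskGenerator(sam, points_per_side=32, min_mask_region_area=500)

image = cv2.cvtColor(cv2.imread("street_view_from_lidar.png"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts with 'segmentation', 'area', 'bbox', ...

# Keep larger regions as candidate infrastructure objects (sidewalks, street furniture, etc.).
candidates = [m for m in masks if m["area"] > 2000]
print(f"{len(candidates)} candidate infrastructure segments found")
```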

19 pages, 586 KiB  
Article
Images, Words, and Imagination: Accessible Descriptions to Support Blind and Low Vision Art Exploration and Engagement
by Stacy A. Doore, David Istrati, Chenchang Xu, Yixuan Qiu, Anais Sarrazin and Nicholas A. Giudice
J. Imaging 2024, 10(1), 26; https://doi.org/10.3390/jimaging10010026 - 18 Jan 2024
Viewed by 1472
Abstract
The lack of accessible information conveyed by descriptions of art images presents significant barriers for people with blindness and low vision (BLV) to engage with visual artwork. Most museums are not able to easily provide accessible image descriptions for BLV visitors to build a mental representation of artwork due to the vastness of collections, limitations of curator training, and current measures for what constitutes effective automated captions. This paper reports on the results of two studies investigating the types of information that should be included to provide high-quality accessible artwork descriptions, based on input from BLV description evaluators. We report on (1) a qualitative study asking BLV participants for their preferences for layered description characteristics and (2) an evaluation of several current models for image captioning as applied to an artwork image dataset. We then provide recommendations for researchers working on accessible image captioning and museum engagement applications through a focus on spatial information access strategies.
(This article belongs to the Special Issue Image and Video Processing for Blind and Visually Impaired)
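
As an illustrative baseline only, the sketch below generates an automated one-sentence caption for an artwork image with a general-purpose captioning model (BLIP via Hugging Face transformers); this is the kind of output that can be compared against richer, layered descriptions preferred by BLV evaluators. The model choice, file name, and settings are assumptions, not the paper's evaluation protocol.

```python
# Baseline automated captioning of an artwork image with BLIP (Hugging Face transformers).
# The paper's specific captioning models, artwork dataset, and evaluation are not reproduced.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("artwork.jpg").convert("RGB")  # hypothetical artwork image
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
caption = processor.decode(output_ids[0], skip_special_tokens=True)

# One-sentence captions like this typically omit the spatial layout, style, and contextual
# layers that BLV participants asked for in accessible artwork descriptions.
print(caption)
```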

19 pages, 18837 KiB  
Article
Detecting Deceptive Dark-Pattern Web Advertisements for Blind Screen-Reader Users
by Satwik Ram Kodandaram, Mohan Sunkara, Sampath Jayarathna and Vikas Ashok
J. Imaging 2023, 9(11), 239; https://doi.org/10.3390/jimaging9110239 - 6 Nov 2023
Viewed by 3402
Abstract
Advertisements have become commonplace on modern websites. While ads are typically designed for visual consumption, it is unclear how they affect blind users who interact with the ads using a screen reader. Existing research studies on non-visual web interaction predominantly focus on general web browsing; the specific impact of extraneous ad content on blind users’ experience remains largely unexplored. To fill this gap, we conducted an interview study with 18 blind participants; we found that blind users are often deceived by ads that contextually blend in with the surrounding web page content. While ad blockers can address this problem via a blanket filtering operation, many websites are increasingly denying access if an ad blocker is active. Moreover, ad blockers often do not filter out internal ads injected by the websites themselves. Therefore, we devised an algorithm to automatically identify contextually deceptive ads on a web page. Specifically, we built a detection model that leverages a multi-modal combination of handcrafted and automatically extracted features to determine if a particular ad is contextually deceptive. Evaluations of the model on a representative test dataset and ‘in-the-wild’ random websites yielded F1 scores of 0.86 and 0.88, respectively.
(This article belongs to the Special Issue Image and Video Processing for Blind and Visually Impaired)
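
A schematic sketch of this kind of detection model is shown below: handcrafted cues concatenated with automatically extracted text features, fed to a binary classifier, and scored with F1. The toy data, feature names, and classifier choice are illustrative assumptions; the paper's actual feature set, model, and test data are not reproduced.

```python
# Schematic only: combine handcrafted features with automatically extracted text features
# and train a binary "contextually deceptive ad" classifier, reporting F1.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Hypothetical toy data: ad text plus a few handcrafted cues, labeled deceptive (1) or not (0).
ad_texts = [
    "You have 1 new message - click to open",
    "Shop the summer sale today",
    "Your download is ready, continue here",
    "Subscribe to our weekly newsletter",
]
# e.g., [mimics_ui_element, style_similarity_to_page, has_urgency_words] -- illustrative names.
handcrafted = np.array([
    [1, 0.92, 1],
    [0, 0.31, 0],
    [1, 0.88, 1],
    [0, 0.40, 0],
])
labels = np.array([1, 0, 1, 0])

# Automatically extracted text features (TF-IDF as a simple stand-in for learned embeddings).
text_features = TfidfVectorizer().fit_transform(ad_texts).toarray()
features = np.hstack([handcrafted, text_features])

clf = LogisticRegression(max_iter=1000).fit(features, labels)
# The paper reports F1 on a held-out test set and in-the-wild websites; here the metric is
# shown on the toy training data purely to illustrate the scoring step.
print("F1:", f1_score(labels, clf.predict(features)))
```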
