Image and Video Processing for Blind and Visually Impaired
A special issue of Journal of Imaging (ISSN 2313-433X). This special issue belongs to the section "Image and Video Processing".
Deadline for manuscript submissions: 1 June 2024
Special Issue Editors
Interests: computer vision; multimodal perception; assistive technology; human-computer interaction; machine learning; augmented reality; virtual reality
Department of Mechanical and Aerospace Engineering and Department of Biomedical Engineering, NYU Tandon School of Engineering, New York, NY 11201, USA
Interests: assistive technology; wearable computing; 3D computer modelling; virtual and augmented reality; mobile vision and navigation
Special Issue Information
Dear Colleagues,
Over 2.2 billion people worldwide live with vision loss. Vision plays a primary role in the efficient capture and integration of sensory information from the surrounding environment. It is critically involved in the complex chain from sensory transduction to higher-level cortical interpretation, enabling the localization and recognition of spatial layouts and objects; the comprehension of three-dimensional relationships among objects and of spatial geometry, including the egocentric perspective, i.e., one's own location relative to landmarks; and, on a meta-level, spatial cognition. Virtually all aspects of life are affected by the loss of visual input. More broadly, visual impairment leads to difficulties in performing activities of daily living, compromises safe mobility, decreases social participation, and diminishes independence and quality of life. Besides the immediate limitations caused by sensory loss, physical and environmental infrastructure (e.g., lack of accessibility) and social factors (e.g., discrimination and a lack of educational resources) amplify visual impairment-related limitations and restrictions.
For this Special Issue, we invite original contributions presenting innovative methods and applications, in particular those using image and video processing, that promote independence and community living among people of all ages with low vision and blindness. Topics include, but are not limited to, the following:
- Increased access to graphical information, signage, travel information, or devices and appliances with digital displays and control panels through AI-based image and video processing.
- Improved non-visual or enhanced-visual orientation and mobility guidance in both indoor and outdoor environments by using portable and/or mobile image and video processing.
- Increased participation of people who are blind or have low vision in science, technology, engineering, arts, mathematics, and medicine (STEAM2) education and careers through the use of augmented reality techniques with image and video processing.
Prof. Dr. Zhigang Zhu
Prof. Dr. John Ross Rizzo
Prof. Dr. Hao Tang
Guest Editors
Manuscript Submission Information
Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.
Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.
Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.
Keywords
- assistive navigation
- augmented/mixed reality
- information access
- orientation and mobility
- object identification
- sensory substitution
- signage reading
- social interaction
- visual question answering
Planned Papers
The list below includes only planned manuscripts; some of them have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.
Title: A Multi-Modal Foundation Model to Assist People with Blindness and Low Vision in Environmental Interaction
Authors: Yu Hao; Fan Yang; Hao Huang; Shuaihang Yuan; Sundeep Rangan; John-Ross Rizzo; Yao Wang; Yi Fang
Affiliation: New York University, New York University Abu Dhabi
Abstract: People with blindness and low vision (pBLV) encounter substantial challenges in comprehensive scene recognition and precise object identification in unfamiliar environments. Moreover, due to vision loss, pBLV have difficulty independently detecting and identifying potential tripping hazards. Previous assistive technologies for people with visual impairments often struggle in real-world scenarios because they require constant retraining and lack robustness, which limits their effectiveness, especially in dynamic and unfamiliar environments where accurate and efficient perception is crucial. To address these challenges, in this paper we present a pioneering approach that leverages a foundation model to assist pBLV in environmental interaction, offering detailed and comprehensive descriptions of the surrounding environment and providing warnings about potential risks. A pretrained foundation model is particularly suited to assistive robotics applications because its extensive pretraining allows for better contextual understanding and more accurate perception in real-world scenarios. Specifically, our method first uses a large image-tagging model (Recognize Anything, RAM) to identify the common objects present in the captured images. The recognition results and the user's query are then integrated, using prompt engineering, into a prompt tailored specifically for pBLV. Given the prompt and the input image, a vision-language foundation model (InstructBLIP) generates a detailed and comprehensive description of the environment and identifies potential risks by analyzing the environmental objects and scenic landmarks relevant to the prompt. We evaluate our approach through experiments on both indoor and outdoor datasets. Our results demonstrate that our method recognizes objects accurately and provides insightful descriptions and analyses of the environment for pBLV.
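For readers who want to experiment with a tag-then-prompt pipeline of the kind the abstract outlines, the following is a minimal Python sketch, not the authors' implementation. It loads InstructBLIP through Hugging Face Transformers; the RAM tagging step is stubbed with hypothetical output, and the checkpoint name and prompt wording are illustrative assumptions.

```python
# Minimal sketch of a tag-then-describe pipeline like the one summarized
# above; NOT the authors' implementation. The RAM tagging step is stubbed
# with hypothetical output, and the prompt wording is an illustrative guess.
import torch
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
CHECKPOINT = "Salesforce/instructblip-vicuna-7b"  # one public InstructBLIP checkpoint

processor = InstructBlipProcessor.from_pretrained(CHECKPOINT)
model = InstructBlipForConditionalGeneration.from_pretrained(CHECKPOINT).to(DEVICE)


def tag_objects(image: Image.Image) -> list[str]:
    """Placeholder for the image-tagging step (Recognize Anything, RAM).
    A real implementation would run RAM on the image and return its tags."""
    return ["sidewalk", "bicycle", "curb", "pedestrian"]  # hypothetical tags


def describe_for_pblv(image: Image.Image, user_query: str) -> str:
    """Fold the recognized objects and the user's query into a single prompt,
    then let InstructBLIP describe the scene and flag potential hazards."""
    tags = tag_objects(image)
    prompt = (
        f"Objects detected in the image: {', '.join(tags)}. {user_query} "
        "Describe the scene for a blind or low-vision user and point out "
        "any tripping or collision hazards."
    )
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(DEVICE)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()


if __name__ == "__main__":
    scene = Image.open("street_scene.jpg").convert("RGB")  # example input image
    print(describe_for_pblv(scene, "Is it safe to walk straight ahead?"))
```

In this sketch the tagger's output is injected as plain text into the prompt, which keeps the two models decoupled: any open-set tagger could replace the stub without changing the vision-language stage.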