1. Introduction
Amyotrophic lateral sclerosis (ALS), also known as Lou Gehrig’s disease, is a progressive disease in which the gradual death of the motor neurons controlling the skeletal muscles leads to paralysis throughout the body, making independent daily life impossible. As the motor neurons in the brain and spinal cord are damaged, movement of the jaw, lips, and tongue becomes progressively slower, eventually reaching a point where communication is impossible [1,2,3]. For such patients, not being able to communicate can be more painful than not being able to move around. The frustration of being unable to say what one wants, express one’s needs, or speak at all can lead to a fear of being unable to call for help even during an emergency. Thus, immediate communication with patients with ALS is one of the most necessary and challenging tasks in daily life [4,5,6].
Efforts to communicate with patients suffering from ALS, whose muscles slowly stiffen, have focused on eye movements, the only remaining means of voluntary movement, and various studies have been conducted on this basis. Chambayil et al. proposed a brain–computer interface (BCI) system that detected eyeblink signals using electroencephalography (EEG) and selected blocks and letters on a virtual keyboard [7]. Participants typed letters by blinking to sequentially select one of the 26 letters of the alphabet organized into three main blocks and sub-blocks. The user experience was enhanced by dividing the selected block into sub-blocks for further selection; however, the average processing time of more than one minute to output a single letter made communication slow. Similarly, Rusanu et al. proposed a communication scheme using EEG [8]. It was designed to speed up communication by having users blink to select one of 32 predetermined emoticons rather than letters, but it was limited to expressing only those predetermined emoticons. Bandara and Nanayakkara proposed a BCI system based on electromyography (EMG) and an electrooculogram (EOG) for partially paralyzed patients who could still move some muscles and their eyes, using a combination of eyeblink and lip movement signals to control electronic devices [9]. They used EMG and EOG rather than EEG to improve the accuracy of blink detection by reducing noise. However, this method had some disadvantages: it was limited to controlling electronic devices such as lights and televisions through mouse control rather than communicating with the patients, making it difficult for patients to express specific opinions. Furthermore, interpreting and identifying the real-time EMG and EOG signals took a long time.
These methods share a common problem in implementing user interface technology based on biometric signals: slow processing speed and poor accuracy due to the sensitivity of biosignals to external noise. They also require multiple electrodes to be attached to the scalp or muscles at all times, causing great inconvenience and exposing immunocompromised patients to the possibility of invasive infections [10,11]. Hence, instead of using biosignal-based technologies, researchers have been investigating the use of an eye mouse—based on simple eye movements—that uses cameras to identify patient intentions. Špakov and Majaranta suggested a scrollable QWERTY virtual keyboard system using eyeball tracking [12]. The space occupied by the existing three-row virtual keyboard was reduced by incorporating a scroll function. As a result, the speed of word input improved, and words could be input with simple gaze processing. However, its eyeball position detection could be sensitive even to small head movements, and in low-light environments, the eyeball itself can be difficult to detect, making eye-tracking impossible. Attiah and Khairullah proposed a method of typing words via a virtual keyboard by blinking when the letter the patient wanted to select appeared on the screen [13]. While this made it easy for a patient to type words with a simple blink of an eye, it also had a long output time because the patient had to wait for the desired letter to appear. Sushmitha et al. proposed an eye-blinking scheme based on Morse code, where each code was a combination of short and long blinks [14]. However, with a different Morse code defined for each letter, the combinations of short and long blinks were complex, which could lead to less accurate word entry and longer processing times.
Although various studies have thus been conducted to help paralyzed patients communicate, biosignal-based eye-blinking communication methods suffer from low accuracy due to noise, invasiveness for the patient, and the cost of the equipment required to receive and interpret the signals. Existing camera-based methods also have the drawbacks of low accuracy and slow processing speed.
Therefore, this paper proposes an eye mouse that uses simplified Morse code (iMouse-sMc) to improve the processing speed of a camera-based letter entry system that allows ALS patients to communicate with high accuracy without having to wear separate equipment. An eyeblink detector was chosen as a switch to convey a patient’s intentions, and only seven simplified Morse codes were used to represent the letters of the alphabet (roughly a quarter of the original 26 Morse codes) to increase the word processing speed and accuracy. We assigned the letters equally to the four quadrants of the monitor and let the user select the area containing the desired letter via quadrant navigation. Quadrant navigation is a selection process on a virtual keyboard in which the mouse cursor moves clockwise every second across the quadrants, with only seven simplified eyeblink combinations representing the corresponding letters in the selected area. In addition, to enable eyeblink detection in low-light environments, an image enhancement technique (histogram equalization) was introduced to make the system usable for day and night communication with patients [15,16]. To confirm the performance of the proposed iMouse-sMc, comparative experiments with existing similar models were conducted. The results showed that the average accuracy for short and long words increased by up to 28.33% and 48.05%, respectively, and the processing time was reduced by up to 20 s and 83 s, respectively. Furthermore, we compared the performance of the iMouse-sMc in low-light environments and found that even when the image brightness was reduced to 30%, the proposed model showed a high detection rate of 79.96%, while the existing similar models were unable to detect eye movements. The main contributions of this research can be summarized as follows:
By determining the level of ambient brightness and introducing histogram equalization in low-light environments where eye detection is difficult, the image was preprocessed to detect a user’s blinks reliably during the day or at night.
International Morse code represents the 26 letters of the alphabet with 26 combinations of long and short signals; in conjunction with quadrant navigation, we proposed a simplified Morse code that can output all letters using only seven combinations.
The communication accuracy was improved by applying the SymSpell algorithm, which calculates candidate corrections for typed words and corrects typos when they occur.
The proposed eye mouse-based simplified Morse code system is described step by step in Section 2, and the experimental process and results are presented in Section 3. Finally, in Section 4, we present our conclusions and future research directions.
2. Methodology
Efforts to communicate with patients with ALS have largely focused on movement of the eyes, as this is the only motor system available. Among these approaches, communicating with patients through camera-based letter selection is easy to implement, but it can require substantial time depending on the letter selection method, and identification accuracy is poor in the case of long sentences [17,18]. In addition, it is difficult to communicate with patients in low-light environments due to the difficulty in detecting the eye area. Thus, we have proposed an eye mouse that uses simplified Morse code (iMouse-sMc) to solve these problems and achieve faster and more accurate communication; it includes the two main processes shown in Figure 1.
The first step in implementing the eye mouse is eye detection. However, in order to detect an eye, the face has to be detected first. After that, the facial landmark detector extracts the feature points of the face, connects the outline feature points of the eye, and estimates the eye position to complete eye detection. During this step, whether brightness adjustments are necessary is determined from the image pixels to enable eye detection in environments with insufficient luminance. If necessary, the image contrast is maximized through image enhancement techniques [19,20,21]. The second step involves eyeblink detection, using the size and position of the pupil, the position and angle of the eyelids, etc. within the detected eye area to determine whether the eye has blinked for a short or long period of time; the corresponding letter is inferred by matching the duration pattern of the eyeblinks with a predefined, simplified Morse code combination. Finally, we introduced the typo-correcting SymSpell algorithm, which efficiently suggests and applies the most likely correct spelling of a selected word in case it is unintentionally misspelled [22].
2.1. Region of Interest Detection
Detecting the eyes, the region of interest (ROI) in the image, is the most basic and important step in communicating with a patient. In low-light environments, however, it becomes very difficult to detect the ROI, so preprocessing is conducted to improve image contrast before eye detection.
2.1.1. Image Enhancement
Under low-light conditions at night or in cloudy weather, face detection becomes more difficult, which is a major factor in weakening the performance of an eye mouse based on eyeblinks. To solve this problem, we introduced contrast limited adaptive histogram equalization (CLAHE), which reduces the noise generated during image enhancement by setting an upper bound on the bin frequencies within fixed-size blocks to avoid excessive smoothing of the histogram, thereby enhancing the contrast of the image more efficiently and naturally [23]. The CLAHE process first divides the image into small non-overlapping blocks of size $M \times N$. If the brightness of the pixel at position $(x, y)$ within each partitioned block is $I(x, y)$, then the histogram of the block is computed as follows:

$$h(i) = \sum_{x=1}^{M} \sum_{y=1}^{N} \delta(I(x, y), i), \qquad i = 0, 1, \ldots, L - 1,$$

where $\delta(I(x, y), i)$ is a Kronecker delta function that returns 1 if $I(x, y) = i$ and 0 for any other case. The histograms calculated within each block are then transformed into a cumulative distribution function, $c(i)$, which maps the histograms evenly so that if the brightness value within a block is $i$, the mapped value $T(i)$ can be summarized as follows:

$$T(i) = (L - 1)\, c(i) = \frac{L - 1}{MN} \sum_{j=0}^{i} h(j),$$

where $L$ is the maximum value of the brightness and $MN$ is the total number of pixels in a block. If the uniform mapping results in values that are too bright or too dark, then the values are modified by adjusting the cumulative distribution function of the corresponding block. This process improves the local contrast of the image and results in a sharper, more detailed image with better overall contrast.
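To make the block-wise equalization concrete, the following minimal sketch (plain Python with hypothetical names, not the paper’s implementation; a production system would typically use OpenCV’s `cv2.createCLAHE`) equalizes a single block of pixel values, including the histogram clipping that gives CLAHE its contrast limit. The bilinear interpolation between neighboring blocks that full CLAHE performs is omitted here.

```python
def equalize_block(block, num_levels=256, clip_limit=None):
    """Histogram-equalize the pixels of one block.

    Sketch of the per-block step of CLAHE: build the histogram h(i),
    optionally clip it (the 'contrast limited' part), then map each
    pixel through the scaled cumulative distribution function.
    """
    hist = [0] * num_levels
    for v in block:
        hist[v] += 1  # h(i) = sum of Kronecker deltas over the block
    if clip_limit is not None:
        # Clip each bin at the limit and spread the excess uniformly
        # (the integer remainder is simply dropped in this sketch).
        excess = sum(max(0, h - clip_limit) for h in hist)
        hist = [min(h, clip_limit) + excess // num_levels for h in hist]
    # Cumulative distribution, scaled so the output spans [0, L - 1].
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    scale = (num_levels - 1) / cdf[-1]
    return [round(cdf[v] * scale) for v in block]
```

With a uniform 4-level block such as `[0, 0, 1, 1, 2, 2, 3, 3]`, the mapping stretches the values across the full output range while preserving their order.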
2.1.2. Eye Detection
For eye detection, we first estimated the coordinates of 68 feature points on the face using a pre-trained facial landmark detector from the dlib library. The landmark detector generates feature vectors based on histograms of oriented gradients by calculating the gradient orientation histogram at each pixel location in the image. A support vector machine classifier applied to the generated vectors detects 68 facial landmarks that describe the eyes, nose, ears, and mouth, as well as the facial shape. As a result, by numbering each facial feature point, as shown in Figure 2a, and applying the detector to the face image, it detects landmarks that surround the feature parts of the face, including the eyes and nose, as shown in Figure 2b. In the figure, the left eye corresponds to feature points 37 through 42 and the right eye corresponds to feature points 43 through 48. To detect blinking, the corresponding feature points are extracted, but the extracted region is made large enough to cover the area surrounding the landmarks—approximately 1.2 times the area of the landmarks—to minimize the error in detecting eyeblinks. Figure 2c,d shows the results of extracting the eyes when the eyes are closed and when they are open, respectively; each eye is marked with six feature points ($p_1, p_2, \ldots, p_6$).
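As an illustration of the eye-region extraction described above, the sketch below (a hypothetical helper, not the paper’s exact code) takes a list of 68 dlib-style landmarks and returns a bounding box scaled to about 1.2 times the landmark extent. Note that dlib indexes the 68 points from 0, so the left eye (points 37–42 in Figure 2a) is indices 36–41 and the right eye (points 43–48) is indices 42–47.

```python
def eye_bounding_box(landmarks, eye="left", scale=1.2):
    """Padded bounding box (x_min, y_min, x_max, y_max) around one eye.

    `landmarks` is a sequence of 68 (x, y) tuples in dlib's 0-indexed
    order; the box is grown by `scale` about its center to reduce
    blink-detection errors at the eye boundary.
    """
    idx = range(36, 42) if eye == "left" else range(42, 48)
    xs = [landmarks[i][0] for i in idx]
    ys = [landmarks[i][1] for i in idx]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    half_w = (max(xs) - min(xs)) / 2 * scale
    half_h = (max(ys) - min(ys)) / 2 * scale
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```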
2.2. Eyeblink Detection
As the second step in implementing the eye mouse, eyeblinks were detected based on the extracted eye feature points. This step takes advantage of the fact that each time an eye blinks, the distance between the upper and lower eyelids becomes smaller, and when the eye opens again, the distance becomes larger. We used the eye aspect ratio (EAR) as an indicator of whether a blink was occurring. The EAR is defined as set out below by measuring the horizontal and vertical lengths of the eye [24]:

$$\mathrm{EAR} = \frac{\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert}{2\, \lVert p_1 - p_4 \rVert},$$

where $p_1, \ldots, p_6$ represent the six feature points around the eye and $\lVert \cdot \rVert$ represents the Euclidean distance between two points. If the EAR values of both the right and left eyes consistently fall below a certain threshold (typically 0.2–0.3) over a certain period of time, it is considered that a blink has occurred. In addition, to determine whether an eyeblink was short or long for implementing the Morse code, we used $F_c$, the number of frames during which the eye remained closed, as an indicator. If $F_c$ was less than a predefined threshold ($F_{th}$), it was considered a short blink; otherwise, it was considered a long blink. The flowchart of this eyeblink detection process is shown in Figure 3. To avoid detecting blinks when only one eye was closed, we compared the EARs of the left and right eyes, $\mathrm{EAR}_{L}$ and $\mathrm{EAR}_{R}$, respectively, and took both into account when recognizing eyeblinks.
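The EAR computation and the short/long decision can be sketched as follows (the threshold values used here are illustrative assumptions, not the paper’s tuned parameters):

```python
import math

def ear(pts):
    """Eye aspect ratio from the six eye landmarks p1..p6:
    EAR = (||p2 - p6|| + ||p3 - p5||) / (2 * ||p1 - p4||).
    Open eyes give larger values; a closed eye drops below ~0.2.
    """
    p1, p2, p3, p4, p5, p6 = pts
    return (math.dist(p2, p6) + math.dist(p3, p5)) / (2.0 * math.dist(p1, p4))

def classify_blink(closed_frames, frame_threshold=12):
    """Label a blink as short or long from the number of consecutive
    frames in which both eyes stayed below the EAR threshold."""
    return "short" if closed_frames < frame_threshold else "long"
```

For instance, at 30 fps a `frame_threshold` of 12 treats any closure shorter than 0.4 s as a short blink.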
2.3. Simplified Morse Code
Once the eye mouse had been implemented through the eye detection and eyeblink detection processes, it communicated with patients by expressing their intentions in text based on the proposed simplified Morse code. As shown in Figure 4, on a monitor divided into quadrants for quadrant navigation, the circled red mouse cursor began at the top left and automatically moved clockwise to the next quadrant every second. Each quadrant was assigned seven alphabetic or control characters—A to G, H to N, O to U, and V to Z plus Space and Delete—and the cursor continued moving until the patient selected the appropriate quadrant using two short blinks.
Once the user had selected the quadrant containing the desired letter, the letter was entered with the predefined simplified Morse code keyer via eyeblinks. Here, the Morse code consisted of seven combinations of short and long blinks, and the blink combinations were designed, as shown in Figure 5, to minimize the occurrence of blink recognition errors. In the figure, ● refers to a short blink and ▬ to a long blink. This minimized the number of Morse codes the user had to learn, making learning easier and enabling higher accuracy and faster processing. After outputting the desired letter, the user entered four short blinks to return to the initial screen to select letters contained in the other quadrants. When the user had finished entering the desired letters, the user moved to the typo correction process with one long blink and two short blinks.
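The interplay between quadrant selection and the blink keyer can be sketched as follows. The seven blink patterns below are hypothetical placeholders standing in for the actual combinations defined in Figure 5 (here `'.'` stands for a short blink ● and `'-'` for a long blink ▬):

```python
# Hypothetical stand-in for the seven blink patterns of Figure 5.
SIMPLIFIED_CODE = [".", "-", "..", ".-", "-.", "--", "..."]

# Seven characters assigned to each of the four quadrants
# (clockwise from the top left, as in Figure 4).
QUADRANTS = [
    list("ABCDEFG"),
    list("HIJKLMN"),
    list("OPQRSTU"),
    list("VWXYZ") + ["SPACE", "DELETE"],
]

def decode(quadrant, blink_pattern):
    """Map a selected quadrant (0-3) and a blink pattern to the
    corresponding character on the virtual keyboard."""
    return QUADRANTS[quadrant][SIMPLIFIED_CODE.index(blink_pattern)]
```

With this placeholder table, selecting the first quadrant and then blinking short-short would output “C”, while the seventh pattern in the last quadrant would trigger Delete.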
2.4. Typo Correction
After all the desired letters had been output on the screen, the SymSpell algorithm was used to check for typos and correct them if necessary. SymSpell is an efficient open-source library widely used for natural language processing tasks such as spell-checking and text autocompletion. The algorithm uses the Damerau–Levenshtein distance metric and N-grams for optimal performance; it is particularly effective at handling large amounts of data, and its fast search speed makes it applicable to real-time searches. The SymSpell algorithm first builds a lookup table, or lexicon, from a large number of documents or datasets so that if a typo occurs in an input word, it can find the corresponding word. It also stores the frequency of each word for use in typo correction. Then, using the built dictionary, the input words are split into N-grams, and a Trie data structure that indicates which word each N-gram belongs to is created. Based on this, typo correction is performed on the input word: first, words that contain all N-grams of the input word are searched to generate a list of candidate words, and then the candidates are scored using frequency and the Damerau–Levenshtein distance, with the highest-scoring word returned as the typo correction result.
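To illustrate the scoring step, the sketch below implements only the core idea—candidate scoring by Damerau–Levenshtein distance with corpus frequency as a tie-breaker—in plain Python over a toy lexicon. The real SymSpell library gains its speed from precomputed deletion variants and the N-gram/Trie index described above, which this brute-force version deliberately omits.

```python
def dl_distance(a, b):
    """Damerau-Levenshtein distance (optimal string alignment variant):
    insertions, deletions, substitutions, and adjacent transpositions."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

def correct(word, lexicon, max_distance=2):
    """Return the lexicon word closest to `word`; ties are broken by
    corpus frequency (higher wins). The word is left unchanged when no
    candidate lies within `max_distance` edits."""
    best = min(((dl_distance(word, w), -freq, w) for w, freq in lexicon.items()),
               default=None)
    if best is not None and best[0] <= max_distance:
        return best[2]
    return word
```

For example, with a toy lexicon `{"water": 500, "later": 300, "hello": 800}`, the typo “watre” is one transposition away from “water” and is corrected accordingly.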
4. Conclusions
ALS, also known as Lou Gehrig’s disease, is caused by damage to the motor neurons in the brain and spinal cord, which gradually leads to the loss of movement and, eventually, to the inability to communicate. To communicate with patients with quadriplegia, research has been conducted on methods of letter selection using brainwave- or camera-based eyeblink detection. However, brainwave-based techniques are sensitive to noise, which can lead to inaccurate intention identification, and camera-based letter selection techniques have the disadvantage of requiring long processing times due to their use of Morse code or sequential selection methods, which can lead to low accuracy. To overcome these problems and make patient communication faster and easier, we proposed the iMouse-sMc, a simplified Morse code-based decision-making system utilizing an eye mouse. To improve the processing speed of letter selection, we suggested a fusion of the simplified Morse code, which was reduced to roughly one-quarter of the existing 26 Morse codes, with quadrant navigation, and we also applied an image enhancement technique based on histogram equalization to improve the detection performance of the eye mouse and enable communication with patients even in low-light environments. To evaluate the performance of the proposed iMouse-sMc, we conducted a comparison experiment with two state-of-the-art models using various words. The results showed that the communication time was reduced by up to 83 s and the intention recognition accuracy was improved by up to 48.05%, confirming the excellence of the presented model. Additionally, while the existing similar models were unable to detect eyes in low-light environments, the eye recognition rate of the proposed eye mouse with its image contrast enhancement scheme was maintained at 79.96%, validating its ability to enable universal communication with patients during the day and at night.
We have shown that the proposed system is faster and more accurate than other systems for communicating with patients, and that it can also be used in low-light environments. Nevertheless, patients can suffer from fatigue because typing words by blinking takes a considerable amount of time. The system also has the limitation that a patient’s face must be in the center of the camera’s view for eyeblinks to be detected well. Therefore, we will continue to conduct further research to reduce the input time required to type words by blinking and to enable the detection of patients’ blinks from various angles.