# Optical Recognition of Handwritten Logic Formulas Using Neural Networks


## Abstract


## 1. Introduction

- Use of NNs for HCR in a new problem domain (handwritten logical formulas), where special characters are used in addition to English letters and digits.
- Introduction of an image preprocessing methodology for dataset creation, particularly its "Line detection" subprocess.
- A thorough study of how changes in the values of the training hyperparameters affect the performance of the NN.
- Development of a software tool that implements the introduced methodology and allows re-creation of the dataset and re-configuration of the NN model.

## 2. Related Work

## 3. Methodology

- Image preprocessing,
- Neural network design and training.

#### 3.1. Image Preprocessing

#### 3.1.1. Conversion to Grayscale

#### 3.1.2. Detection of the Paper Area

#### 3.1.3. Thresholding (Binarization)

#### 3.1.4. Detection of Text Area

#### 3.1.5. Character Detection (Contours Extraction)

#### 3.1.6. Line Detection

- Compute the average height and width of the characters.
- Create an image containing a filled rectangle for every character, decreasing a rectangle's height by 2/3 if it exceeds the average height.
- Divide the image into vertical stripes (partitions) of width equal to 3 times the average width.
- Calculate the horizontal histogram of every stripe and draw rectangles over each line's bounding area.
- Concatenate all stripes and find all formed contours (as candidate lines).
- Search for small contours with height less than 2/3 of the average height and width less than half of the whole image width, and merge each with the closest contour.
- Merge all contour pairs whose vertical distance is less than the average height (the case of lines that have large gaps inside).
- Order the contour lines by vertical position and assign every character to the line containing its center.
- Calculate the convex hull of each line from its characters and draw the overlay (for user display).
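The core of these steps can be illustrated in simplified form. The actual tool operates on OpenCV contours stripe by stripe; the pure-Python sketch below (the function name and the box representation are our own assumptions) keeps only the central idea of a horizontal projection histogram followed by center-based character assignment:

```python
# Illustrative sketch of the line-detection steps above, in plain Python.
# Characters arrive as bounding boxes (x, y, w, h) in image coordinates
# (y grows downward); a real implementation would also work per stripe,
# merge small contours, and compute convex hulls as described.

def detect_lines(boxes, image_height):
    """Group character boxes into text lines via a horizontal projection."""
    avg_h = sum(b[3] for b in boxes) / len(boxes)

    # Count how many (possibly shrunk) character rectangles cover each row.
    hist = [0] * image_height
    for x, y, w, h in boxes:
        if h > avg_h:
            h = int(h / 3)  # "decrease height by 2/3" for tall characters
        for row in range(y, min(y + h, image_height)):
            hist[row] += 1

    # Contiguous occupied row ranges become candidate line bands.
    bands, start = [], None
    for row, count in enumerate(hist):
        if count and start is None:
            start = row
        elif not count and start is not None:
            bands.append((start, row))
            start = None
    if start is not None:
        bands.append((start, image_height))

    # Assign every character to the band containing its vertical center.
    lines = [[] for _ in bands]
    for box in boxes:
        cy = box[1] + box[3] / 2
        for i, (top, bottom) in enumerate(bands):
            if top <= cy < bottom:
                lines[i].append(box)
                break
    return bands, lines
```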

#### 3.1.7. Ordering and Normalizing Characters

- ‘i’ and ‘j’: check whether the lower point of the smaller contour lies above the upper point of the larger one and, at the same time, whether the slope between them is greater than 1.
- ‘=’: check whether both aspect ratios are greater than 2 and whether the center of one contour lies within the other’s horizontal extent.
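These merging heuristics can be expressed as simple predicates over bounding boxes. The sketch below is illustrative only (function names and the image-coordinate convention, with y growing downward, are our assumptions, not the tool's actual code):

```python
def merge_as_equals(b1, b2):
    """'=' heuristic: both strokes are wide (aspect ratio w/h > 2) and
    the center of one lies within the other's horizontal extent."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = b1, b2
    if w1 / h1 <= 2 or w2 / h2 <= 2:
        return False
    c1, c2 = x1 + w1 / 2, x2 + w2 / 2
    return x1 <= c2 <= x1 + w1 or x2 <= c1 <= x2 + w2

def merge_as_dotted(small, big):
    """'i'/'j' heuristic: the small contour's bottom edge lies above the
    big contour's top edge, and the slope between centers exceeds 1."""
    sx, sy, sw, sh = small
    bx, by, bw, bh = big
    if sy + sh > by:                       # dot must sit above the body
        return False
    dx = abs((sx + sw / 2) - (bx + bw / 2))
    dy = abs((sy + sh / 2) - (by + bh / 2))
    return dy > dx                         # |slope| > 1
```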

#### 3.2. Neural Network Design and Training

#### 3.2.1. Labeling and Dataset Creation

#### 3.2.2. Neural Network Configuration

- Number of inputs,
- Number of outputs,
- Number of hidden layers,
- Number of neurons at each hidden layer,
- Activation function of neurons at each layer.

#### 3.2.3. Hyperparameters Setting

- Learning rate,
- Batch size,
- Maximum number of epochs for training,
- Momentum (only for the backpropagation algorithm).
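For illustration, these settings could be gathered in a configuration object. The structure below is our own sketch, not the tool's actual code; the default values are taken from the experiments reported later (learning rate and batch size from the final SGD network, momentum from the backpropagation runs), and the epoch limit is an assumption based on the reported 50–100 range:

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    """Hypothetical container for the hyperparameters of Section 3.2.3."""
    learning_rate: float = 0.001   # initial rate of the final SGD network
    batch_size: int = 512          # best-performing batch size reported
    max_epochs: int = 100          # assumed; the paper reports 50-100
    momentum: float = 0.99         # used only by backpropagation
```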

#### 3.2.4. Neural Network Training

- Initialization of weights,
- Selection of training algorithm,
- Testing and re-training.

- Backpropagation: The basic technique, which computes the network error, propagates it backward, and updates the weights using the gradient and magnitude of the error function in every epoch. Training can be incremental or batch.
- Resilient Propagation (RPROP): An improvement of backpropagation proposed by Riedmiller et al. [27], where each weight change is computed independently from the sign of the gradient only, avoiding the need to define a learning rate and momentum. It works well only with batch training.
- Stochastic Gradient Descent (SGD): It takes into account only a random portion (batch) of the trainset in each epoch and updates each weight independently using the Adam update rule proposed by Kingma and Ba [7]. This algorithm is also called mini-batch gradient descent.
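As a single-weight sketch of how the three update rules differ (the actual trainers in the Encog library operate on whole weight matrices and include further refinements, e.g. RPROP's weight backtracking; the default constants follow the paper's reported values and common choices):

```python
def backprop_step(w, grad, prev_dw, lr=0.0005, momentum=0.99):
    """Classic backpropagation with momentum."""
    dw = -lr * grad + momentum * prev_dw
    return w + dw, dw

def rprop_step(w, grad, prev_grad, step, inc=1.2, dec=0.5):
    """Simplified RPROP: only the sign of the gradient is used; each
    weight keeps its own step size, grown while the sign is consistent
    and shrunk when it flips."""
    if grad * prev_grad > 0:
        step *= inc
    elif grad * prev_grad < 0:
        step *= dec
    sign = (grad > 0) - (grad < 0)
    return w - sign * step, step

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam update rule [7], as used by the SGD trainer."""
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (v_hat ** 0.5 + eps), m, v
```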

## 4. Experimental Study

- Overall accuracy of network for testset,
- Accuracy per character and ability to distinguish look-alike characters by examining the Confusion Matrix of the testset,
- Impact of 28 × 28 versus 35 × 35 normalization on final accuracy,
- Generalizing ability of network (low variance between trainset–testset accuracy),
- Training time.
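The first two criteria can be computed directly from a confusion matrix. A minimal sketch (our own helper, not code from the paper's tool):

```python
def evaluate(predictions, labels, num_classes):
    """Return overall accuracy and a confusion matrix
    (rows: true class, columns: predicted class)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(predictions, labels):
        cm[t][p] += 1
    correct = sum(cm[i][i] for i in range(num_classes))
    return correct / len(labels), cm
```

Per-character accuracy is then the diagonal entry of a row divided by the row sum, and large off-diagonal entries expose look-alike character pairs.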

#### 4.1. Formulation of Datasets

#### 4.2. Training with Shuffled–Unshuffled Trainset

#### 4.3. Training with Different Layer Sizes

of 10^{−4} and different activation functions, such as sigmoid, Elliott [28], and the rectified linear unit (ReLU). The performances of the most indicative networks are presented in Table 1.

#### 4.4. Training with Different Activation Functions

#### 4.5. Training with Dropout Technique and Adaptive Learning Rate

#### 4.6. Training with Different Batch Sizes

#### 4.7. Training with Normalization Sizes 28 × 28 and 35 × 35

## 5. Application Development

#### 5.1. Basic Functionalities

#### 5.2. Additional Functionalities

## 6. Discussion of Results

#### 6.1. Effect of Hidden Layers

- Increasing the size of a hidden layer can lead to better accuracy up to a certain point. Of course, increasing the size also increases network complexity, meaning the network can approximate more complex functions, but beyond a point this leads to memorizing the trainset without improvement on the testset (overfitting).
- Adding a second hidden layer increases network complexity even more, with the trainset reaching 100% accuracy faster under some algorithms. Testset accuracy may improve slightly, but generally this results in greater trainset–testset variance. Thus, the network learns the trainset perfectly but generalizes poorly to new datasets.

#### 6.2. Effect of Activation Functions

- Sigmoid and Elliott behaved quite similarly under the backpropagation algorithm, in contrast to ReLU, which showed unpredictable and very low accuracy. Between sigmoid and Elliott, we believe Elliott is better for two reasons: its plot has a smoother curve, which leads to better classification detail, and it needs less computational power, and thus lower training times.
- ReLU performed much better with SGD, achieving the best accuracy overall. We assume that its poor performance under backpropagation was caused by the exploding gradient problem, which may have occurred because of ReLU’s steep linear part, leading to very large weight values and accumulation of errors.
- The tanh function proved inappropriate for our network because its output ranges from −1 to 1, while we encoded outputs from 0 to 1.
- We believe that the Softmax function was the best choice for the output layer, as it is ideal for activating only one output neuron, based on the probabilities (one-hot encoding).
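For reference, the activation functions discussed above can be written compactly. The Elliott variant shown is the (0,1)-ranged form (as in Encog's ActivationElliott), which matches the 0-to-1 output encoding mentioned above; it approximates the sigmoid shape without calling exp():

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def elliott(x):
    # (0,1)-ranged Elliott function: sigmoid-like, but cheaper to compute.
    return (x / 2.0) / (1.0 + abs(x)) + 0.5

def relu(x):
    return max(0.0, x)

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]
```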

#### 6.3. Effect of Dropout and Learning Rate

#### 6.4. Effect of Batch Size

#### 6.5. Effect of Normalization (Resizing-Thinning)

#### 6.6. Choice of Training Algorithm

- RPROP had good training times, but it produced the worst accuracy, with unstable and sometimes unexplained behavior. For example, we noticed a significant drop in accuracy after the 70th epoch, while the testset accuracy remained higher than the trainset accuracy throughout training. Although Encog’s developer suggests using RPROP over backpropagation, as it is more sophisticated, we concluded that it is inappropriate for our kind of problem. The highest accuracy achieved with this algorithm was 80.9%.
- Backpropagation produced quite good results with ideal parameters: Elliott activation function, adaptive learning rate with an initial value of 0.0005, and a momentum of 0.99. However, its largest problem was long training times (an average of 2–4 h per experiment) with 50–100 max epochs. In all of the presented training graphs, we see high accuracy from the first epoch (greater than 60%), followed by slow improvements. This happened because of the initial learning rates and the proper activation functions used (Logistic and Elliott). The highest accuracy we achieved was 89.1%, with two hidden layers of 400 neurons.
- The SGD algorithm achieved the best accuracies and, using the ReLU activation function and a proper batch size, managed to pass the 90% accuracy limit. Furthermore, it had the shortest training times, smaller trainset–testset variance, and slightly better confusion matrices than backpropagation. We believe that its success is due to its stochastic nature and the Adam optimization method implemented in the Encog library. The highest accuracy achieved was 90.13%, which is why we consider it, by default, the best choice for the final trained network in our application.

#### 6.7. The Final Network of Application

- Character Normalization: 28 × 28,
- Input Layer: 784 inputs,
- Hidden layers: one layer of 400 neurons with ReLU activation function,
- Output layer: 67 neurons with Softmax activation function,
- Training algorithm: stochastic gradient descent with Adam optimization,
- Initial learning rate: 0.001,
- Batch size: 512.
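A forward pass through this topology can be sketched in pure Python. The application itself builds the network with the Encog library; the function below, its weight layout, and its names are illustrative assumptions, showing only the 784 → 400 (ReLU) → 67 (Softmax) structure:

```python
import math

def forward(pixels, w1, b1, w2, b2):
    """One forward pass: ReLU hidden layer, then softmax output.
    w1: one weight row per hidden neuron; w2: one row per output neuron."""
    hidden = [max(0.0, sum(p * w for p, w in zip(pixels, row)) + b)
              for row, b in zip(w1, b1)]          # ReLU hidden layer
    logits = [sum(h * w for h, w in zip(hidden, row)) + b
              for row, b in zip(w2, b2)]
    m = max(logits)                                # stabilized softmax
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]                   # class probabilities
```

In the final network, `pixels` would hold the 784 values of a normalized 28 × 28 character, `w1` would have 400 rows, and `w2` 67 rows, one per recognized character class.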

## 7. Comparisons

## 8. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

1. Rajalakshmi, M.; Saranya, P.; Shanmugavadivu, P. Pattern Recognition-Recognition of Handwritten Document Using Convolutional Neural Networks. In Proceedings of the 2019 IEEE International Conference on Intelligent Techniques in Control, Optimization and Signal Processing (INCOS), Tamilnadu, India, 11–13 April 2019; pp. 1–7.
2. Gao, X.; Deng, H.; Ma, M.; Guan, Q.; Sun, Q.; Si, W.; Zhong, X. Removing light interference to improve character recognition rate by using single-pixel imaging. Opt. Lasers Eng. 2021, 140, 106517.
3. Jiao, S.; Feng, J.; Gao, Y.; Lei, T.; Yuan, X. Visual cryptography in single-pixel imaging. Opt. Express 2020, 28, 7301–7313.
4. Vinjit, B.M.; Bhojak, M.K.; Kumar, S.; Chalak, G. A Review on Handwritten Character Recognition Methods and Techniques. In Proceedings of the 2020 International Conference on Communication and Signal Processing (ICCSP), Melmaruvathur, India, 28–30 July 2020; pp. 1224–1228.
5. Ahlawat, S.; Choudhary, A.; Nayyar, A.; Singh, S.; Yoon, B. Improved Handwritten Digit Recognition Using Convolutional Neural Networks (CNN). Sensors 2020, 20, 3344.
6. El-Sawy, A.; Loey, M.; El-Bakry, H. Arabic Handwritten Characters Recognition using Convolutional Neural Network. WSEAS Trans. Comput. Res. 2017, 5, 11–19.
7. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
8. Perwej, Y.; Chaturvedi, A. Neural Networks for Handwritten English Alphabet Recognition. Int. J. Comput. Appl. 2011, 20, 2449–2824.
9. Kader, F.; Kaushik, D. Neural Network-Based English Alphanumeric Character Recognition. Int. J. Comput. Sci. Eng. Appl. (IJCSEA) 2012, 2, 1–10.
10. Choudhary, A.; Rishi, R.; Ahlawat, S. Off-Line Handwritten Character Recognition using Features Extracted from Binarization Technique. AASRI Procedia 2013, 4, 306–312.
11. Katiyar, G.; Mehfuz, S. MLPNN based handwritten character recognition using combined feature extraction. In Proceedings of the International Conference on Computing, Communication & Automation, Greater Noida, India, 15–16 May 2015; pp. 1155–1159.
12. Afroge, S.; Ahmed, B.; Mahmud, F. Optical character recognition using back propagation neural network. In Proceedings of the 2016 2nd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE), Rajshahi, Bangladesh, 8–10 December 2016; pp. 1–4.
13. Attigeri, S. Neural network based handwritten character recognition system. Int. J. Eng. Comput. Sci. 2018, 7, 23761–23768.
14. Chen, A.W. Handwriting Recognition and Prediction Using Stochastic Logistic Regression. Int. J. Inf. Res. Rev. 2018, 5, 5526–5527.
15. Jana, R.; Bhattacharyya, S.; Das, S. Handwritten Digit Recognition Using Convolutional Neural Networks. In Deep Learning: Research and Applications; Bhattacharyya, S., Snasel, V., Hassanien, A.E., Saha, S., Tripathy, B.K., Eds.; De Gruyter: Berlin, Germany; Boston, MA, USA, 2020; pp. 51–68.
16. Yousaf, A.; Khan, M.J.; Khan, M.J.; Javed, N.; Ibrahim, H.; Khursid, K. Size Invariant Handwritten Character Recognition using Single Layer Feedforward Backpropagation Neural Networks. In Proceedings of the 2019 2nd International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sindh, Pakistan, 30–31 January 2019; pp. 1–7.
17. Kosykh, N.E.; Khomonenko, A.D.; Bochkov, A.P.; Kikot, A.V. Integration of Big Data Processing Tools and Neural Networks for Image Classification. In Proceedings of the Models and Methods of Information Systems Research Workshop 2019 (MMISR 2019), St. Petersburg, Russia, 4–5 December 2019; pp. 52–58.
18. Parthiban, R.; Ezhilarasi, R.; Saravanan, D. Optical Character Recognition for English Handwritten Text Using Recurrent Neural Network. In Proceedings of the 2020 International Conference on System, Computation, Automation and Networking (ICSCAN), Puducherry, India, 27–28 March 2020; pp. 1–5.
19. Bora, M.B.; Daimary, D.; Amitab, K.; Kandar, D. Handwritten Character Recognition from Images using CNN-ECOC. Procedia Comput. Sci. 2020, 167, 2403–2409.
20. Ahlawat, S.; Choudhary, A. Hybrid CNN-SVM Classifier for Handwritten Digit Recognition. Procedia Comput. Sci. 2020, 167, 2554–2560.
21. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
22. Suzuki, S.; Abe, K. Topological structural analysis of digitized binary images by border following. Comput. Vis. Graph. Image Process. 1985, 30, 32–46.
23. Ha, J.; Haralick, R.M.; Phillips, I.T. Document page decomposition by the bounding-box project. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995.
24. Kunte, S.R.; Sudhaker, S.R.D. A simple and efficient optical character recognition system for basic symbols in printed Kannada text. Sadhana 2007, 32, 521–533.
25. Zhang, T.Y.; Suen, C.Y. A Fast Parallel Algorithm for Thinning Digital Patterns. Commun. ACM 1984, 27, 236–239.
26. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 249–256.
27. Riedmiller, M.; Braun, H. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, USA, 28 March–1 April 1993; pp. 586–591.
28. Elliott, D.L. A Better Activation Function for Artificial Neural Networks; Technical Report 93-8; Institute for Systems Research, University of Maryland: College Park, MD, USA, 1993.

**Figure 8.** Impact of shuffled–unshuffled trainset: (**a**) unshuffled trainset, backpropagation, full batch; (**b**) unshuffled trainset, backpropagation, batch = 1; (**c**) shuffled trainset, backpropagation, full batch; (**d**) shuffled trainset, backpropagation, batch = 1.

**Figure 9.** Comparison of backpropagation, RPROP, and SGD testset accuracies of Table 1.

**Figure 10.** Comparison of activation functions: (**a**) training with the logistic sigmoid function; (**b**) training with the Elliott function; (**c**) training with the ReLU function (backpropagation); (**d**) training with the ReLU function (SGD).

**Figure 11.** Comparison of fixed and adaptive learning rate training: (**a**) training with fixed learning rate 0.0005; (**b**) training with adaptive learning rate.

**Figure 12.** Training comparison with normal and thinned characters: (**a**) training with normal characters (35 × 35); (**b**) training with thinned characters (35 × 35).

Training Algorithm | Hidden Layer Size | Batch Size | Activation Function | Testset Accuracy
---|---|---|---|---
Backpropagation | 200 | 1 | Sigmoid | 82.06%
RPROP | 200 | Full | Elliott | 59.05%
SGD | 200 | 512 | ReLU | 85.1%
Backpropagation | 300 | 1 | Elliott | 83.2%
RPROP | 300 | Full | Sigmoid | 48.7%
SGD | 300 | 512 | ReLU | 90%
Backpropagation | 400 | 1 | Elliott | 82.8%
RPROP | 400 | Full | Sigmoid | 55.17%
SGD | 400 | 512 | ReLU | 90.02%
Backpropagation | 500 | 1 | Sigmoid | 76.2%
RPROP | 500 | Full | Elliott | 65.8%
SGD | 500 | 512 | ReLU | 88.98%
Backpropagation | 700 | 1 | Sigmoid | 73.1%
RPROP | 700 | Full | Sigmoid | 74.9%
SGD | 700 | 512 | ReLU | 89.7%
Backpropagation | 800 | 1 | Elliott | 88.5%
RPROP | 800 | Full | Elliott | 62.25%
SGD | 800 | 512 | ReLU | 90.03%
Backpropagation | 900 | 1 | Elliott | 87.9%
RPROP | 900 | Full | Sigmoid | 75.7%
SGD | 900 | 512 | ReLU | 90.1%

Training Algorithm | Hidden Layer 1 | Hidden Layer 2 | Activation Function | Testset Accuracy
---|---|---|---|---
Backpropagation | 300 | 300 | Elliott | 87.6%
Backpropagation | 400 | 400 | Elliott | 88.8%
SGD | 400 | 400 | Elliott | 89.4%
RPROP | 300 | 300 | Sigmoid | 62.6%

Training Algorithm | Hidden Layers | Activation Function | Trainset Accuracy | Testset Accuracy
---|---|---|---|---
Backpropagation | 400-400 | Elliott | 100% | 89%
Backprop. (Dropout: 0.5) | 400-400 | Elliott | 96.7% | 89.1%
Backpropagation | 800 | Elliott | 90% | 88.5%
Backprop. (Dropout: 0.8) | 800 | Elliott | 88.6% | 88.9%
RPROP | 500 | Elliott | 80.3% | 80.9%
RPROP (Dropout: 0.1) | 500 | Elliott | 77.6% | 77.4%

Training Algorithm | Hidden Layer Size | Batch Size | Testset Accuracy
---|---|---|---
SGD | 400 | 256 | 88.77%
SGD | 400 | 512 | 90.13%
SGD | 400 | 1024 | 89.7%
SGD | 400 | full | 88.39%

Training Algorithm | Hidden Layer Size | Batch Size | Activation Function | Trainset Accuracy | Testset Accuracy
---|---|---|---|---|---
Backprop. (28 × 28) | 800 | 1 | Elliott | 92.6% | 87.3%
Backprop. (35 × 35) | 800 | 1 | Elliott | 92.48% | 87.21%
SGD (28 × 28) | 400 | 512 | ReLU | 92.19% | 90.13%
SGD (35 × 35) | 400 | 512 | ReLU | 94.24% | 90.17%

Research Work | Dataset (Samples) | Character Type | NN Architecture | Training Algorithm | Accuracy
---|---|---|---|---|---
[8] Perwej and Chaturvedi (2011) | 650 (520 train, 130 test), size 5 × 5 | English lowercase alphabet | 2 hidden layers (25-25-25-?) | Backpropagation (?) | 82.5%
[10] Choudhary et al. (2013) | 1300, size 15 × 12 | English lowercase alphabet | 1 hidden layer (180-80-26) | Backpropagation, adaptive learning rate | 85.62%
[11] Katiyar and Mehfuz (2015) | CEDAR: 21,328 (19,145 train, 2183 test) | English alphabet | 2 hidden layers (144-100-90-6), hybrid features | Backpropagation | 93.23%
[13] Attigeri (2018) | 4840 (4400 train, 440 test), size 30 × 20 | English lowercase alphabet | 2 hidden layers of 100 neurons | Backpropagation | 90.19%
[16] Yousaf et al. (2019) | HCD: 27,142 (19,422 train, 7720 test), size 60 × 40 | English capital alphabet, digits | 1 hidden layer: 2400-240-26 (alphabet), 2400-120-10 (digits) | Backpropagation (?) | 96.98% (alphabet), 98.08% (digits)
[17] Kosykh et al. (2019) | MNIST: 70,000 (60,000 train, 10,000 test), size 28 × 28 | Digits | 1 hidden layer (784-145-10) | SGD | 95.7%
Our Approach | 24,697 (16,750 train, 7947 test), size 28 × 28 | English alphabet, digits, logic characters | 1 hidden layer (784-400-67) | SGD with Adam optimization | 90.13%

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ampelakiotis, V.; Perikos, I.; Hatzilygeroudis, I.; Tsihrintzis, G.
Optical Recognition of Handwritten Logic Formulas Using Neural Networks. *Electronics* **2021**, *10*, 2761.
https://doi.org/10.3390/electronics10222761
