# Hand–Eye Calibration Using a Tablet Computer

Department of Mechanical Engineering, Faculty of Engineering, Gifu University, 1-1 Yanagido, Gifu-shi, Gifu 501-1193, Japan
Math. Comput. Appl. 2023, 28(1), 22; https://doi.org/10.3390/mca28010022
Received: 20 January 2023 / Revised: 3 February 2023 / Accepted: 7 February 2023 / Published: 8 February 2023

## Abstract

Many approaches have been developed to solve the hand–eye calibration problem. The traditional approach involves a precise mathematical model, which has advantages and disadvantages. For example, mathematical representations can provide numerical and quantitative results to users and researchers. Thus, it is possible to explain and understand the calibration results. However, information about the end-effector, such as the position attached to the robot and its dimensions, is not considered in the calibration process. If there is no CAD model, additional calibration is required for accurate manipulation, especially for a handmade end-effector. A neural network-based method is used as the solution to this problem. By training a neural network model using data created via the attached end-effector, additional calibration can be avoided. Moreover, it is not necessary to develop a precise and complex mathematical model. However, it is difficult to provide quantitative information because a neural network is a black box. Hence, a method with both advantages is proposed in this study. A mathematical model was developed and optimized using the data created by the attached end-effector. To acquire accurate data and evaluate the calibration results, a tablet computer was utilized. The established method achieved a mean positioning error of 1.0 mm.

## 1. Introduction

Robots utilizing vision systems have been introduced at production sites to automate many assembly processes and supply industrial parts. For a robot to pick an object, which is identified using a camera, calibration is required in advance because transforming its position in the image coordinate system to the robot-based coordinate system is necessary. This is known as hand–eye calibration. Many studies have addressed this problem. One of the major approaches is to develop a precise mathematical model and use a calibration board, such as a checkerboard. This is because the feature points of a checkerboard are easy to detect from a captured image, and several classical studies have adopted it as a calibrator [1]. Using the captured images, the mathematical model is optimized.
However, this traditional approach has advantages and disadvantages. For example, mathematical models can provide numerical and quantitative results to users and researchers. Hence, they can understand and analyze why the calibration results are good. However, information about the end-effector, such as the position attached to the robot and its dimensions, is not considered in the calibration process. If there is no CAD model, additional calibration is required for accurate manipulation, especially for a handmade end-effector. One way to solve this problem is a neural network-based method [2]. By training a neural network model, a position can be transformed directly from the image coordinate system to the robot-based coordinate system. Because the training data are created using an end-effector attached to the robot hand, neither a complex mathematical model nor additional calibration for the end-effector is necessary. However, it is difficult to understand and analyze the reason for good calibration results because the neural network is a black box.
Therefore, a method with both advantages is proposed in this study. A mathematical model was developed and optimized using data created through the same procedure as the neural network-based method. The optimized mathematical model can provide numerical and quantitative data and can transform a position in the image coordinate system to the robot-based coordinate system while accounting for the offset of the attached end-effector; hence, detailed information such as a CAD model, and additional calibration to obtain the offset, are not required. The proposed method thus overcomes the disadvantages of both approaches and also offers some unique advantages. To acquire accurate data and measure the positioning error to check the calibration performance, a tablet computer was used.

## 2. Related Works

Many approaches have been proposed to solve the hand–eye calibration problem [3,4]. The most basic mathematical model is $AX = XB$ [5], where $A$ and $B$ are homogeneous transformation matrices (HTMs) that represent the relative motions of the robot and an attached camera, and $X$ is an estimated HTM that represents the relationship between the hand and the camera. Based on this model, Motai et al. proposed a method considering the distortion of a camera lens [6]. By estimating camera parameters from multiple viewpoints, active viewpoints can be generated to obtain three-dimensional (3D) models of objects. Many methods have been proposed to solve the unknown parameters of $X$. According to reference [4], two approaches appear in the relevant literature: separation and simultaneous methods. In the former, the rotation matrix of $X$ and the translation vector are solved separately [7,8,9,10,11]; in the latter, both are solved simultaneously [12,13,14,15]. In addition to $AX = XB$, another model, $AX = YB$ [16], is also used. Hand–eye calibration methods have been developed based on these mathematical models and approaches for solving the unknown parameters of $X$.
Mišeikis et al. proposed a rapid automatic calibration method using 3D cameras [17]. Even if the cameras and robots being calibrated are repositioned, this method can recalibrate rapidly. Koide et al. proposed a method based on reprojection error minimization [18]. Unlike traditional approaches, their method does not need to explicitly estimate the camera pose for each input image, and pose graph optimization is performed to deal with different camera models. Cao et al. proposed an approach using a neural network for error compensation [18]. Because no additional device is necessary for compensation, this method is low-cost.
These related studies employed a mathematical model to achieve hand–eye calibration. Hence, this approach can provide numerical and quantitative information on the calibration results to users and researchers. However, information on the end-effector, such as the position attached to the robot and its dimensions, is not considered in the calibration process. If there is no detailed information, such as a CAD model, additional calibration is required for accurate manipulation, especially for a handmade end-effector. Hua’s approach provides an effective solution [2]. By training a neural network model using the training data, which have various errors and noises in a real environment, robust transformation from the image coordinate system to the robot-based coordinate system is directly possible. Because the training data are created using the attached end-effector, additional calibration is unnecessary. In addition, the neural network model has high representative power. Therefore, the development of a precise and complex mathematical model is not required. However, this approach cannot provide quantitative information because the neural network model is a black box. Therefore, it is difficult to understand and explain the calibration results.
In this paper, a method with both advantages is proposed. A mathematical model was developed, and the data to optimize the model were created through the same procedure as in the neural network-based method. To acquire accurate data and evaluate the calibration results, a tablet computer was used.

## 3. Proposed Method

#### 3.1. Overview

The developed system is shown in Figure 1. A clear plastic box is attached to the tip flange of the robot hand. An RGB-D camera and a tablet pen holder, fabricated using a 3D printer, are attached to the box. Because an Intel SR300 camera is used, image distortion need not be considered [19]. In addition, the intrinsic parameters of the camera can be obtained easily from its software development kit (SDK). Both the pen and the camera are mounted at positions offset from the rotation center of the robot hand because the end-effector is handmade, and no CAD model of it exists. The pen therefore revolves when the hand rotates, and the calibration must account for this scenario.
For hand–eye calibration using this developed system, some preparations are necessary, similar to the study of Hua [2]. Figure 2 shows the data processing procedure performed by the proposed method. First, nine landmarks (i.e., targets for the robot hand to touch with the tablet pen) are displayed on the tablet, as shown in Figure 1a. The interior area surrounded by the landmarks is the considered workspace. Second, the tablet display is captured by the attached RGB-D camera, and the positions of all landmarks in the image coordinate system are obtained, as shown in Figure 3a. Third, the robot hand is manually operated so that the pen touches one landmark, and the hand position in the robot-based coordinate system is acquired. This data acquisition is repeated for all landmarks. During this step, the rotation angle of the sixth axis is gradually increased in steps of $22.5^\circ$, so that the hand angles at the first and ninth dots are $0^\circ$ and $180^\circ$, respectively. This is because the attached camera and pen are not aligned with the sixth axis of the robot hand, and calibrating in this scenario is necessary. Figure 3b shows an example of the acquired data. Because the attached pen is not aligned and the hand rotates, the data distribution in Figure 3b differs from that in Figure 3a. The parameters of the homogeneous transformation matrices (HTMs) are optimized by two-stage optimization. Using the optimized matrices, the positions to touch in the image coordinate system (Figure 3a) are converted to those in the robot-based coordinate system (Figure 3b). To evaluate the calibration performance, the robot hand with the attached pen touches the nine displayed landmarks after the optimized parameters are introduced into the robot, and the mean touching error is calculated.

#### 3.2. Coordinate System and Homogeneous Transformation Matrix (HTM)

The DENSO VP-6242 robot [20] used here has six degrees of freedom (DoF). The coordinate systems are shown in Figure 4 and Figure 5. $\Sigma_b$, $\Sigma_h$, $\Sigma_c$, $\Sigma_i$, and $\Sigma_t$ denote the robot base, hand, camera, image, and tablet computer coordinate systems, respectively. $^b T_h$ and $^h T_c$ are HTMs with rotation and translation parts, as expressed below:
$^b T_h = \begin{pmatrix} R_h & t_h \\ \mathbf{0}^\top & 1 \end{pmatrix}, \qquad {}^h T_c = \begin{pmatrix} R_x(\alpha_c)\, R_y(\beta_c)\, R_z(\gamma_c) & t_c \\ \mathbf{0}^\top & 1 \end{pmatrix},$
where $R_h$ is the rotation part of the hand pose, and $t_h = (x_h, y_h, z_h)^\top$ and $t_c = (x_c, y_c, z_c)^\top$ are translation vectors from $\Sigma_h$ to $\Sigma_b$ and from $\Sigma_c$ to $\Sigma_h$, respectively. $R_x(\alpha_c)$, $R_y(\beta_c)$, and $R_z(\gamma_c)$ are $3 \times 3$ rotation matrices about the x, y, and z axes, respectively.
$t_i$ represents the transformation from $\Sigma_i$ to $\Sigma_c$, which can be achieved using the pinhole camera model:
$\begin{pmatrix} u_m \\ v_m \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_m^c / z_m^c \\ y_m^c / z_m^c \\ 1 \end{pmatrix},$
where $(u_m, v_m)^\top$ represents the $m$th black dot in $\Sigma_i$. Let $f_x$ and $f_y$ be the focal lengths along the x and y axes, respectively, and let $c_x$ and $c_y$ be the coordinates of the principal point. $x_m^c$ and $y_m^c$ are the transformed positions in $\Sigma_c$, and $z_m^c$ is the distance from the camera to the $m$th black dot, measured by the RGB-D camera. Thus,
$t_i = \begin{pmatrix} x_m^c \\ y_m^c \\ z_m^c \\ 1 \end{pmatrix} = \begin{pmatrix} z_m^c (u_m - c_x)/f_x \\ z_m^c (v_m - c_y)/f_y \\ z_m^c \\ 1 \end{pmatrix}.$
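The back-projection of Equation (4) is straightforward to implement. The sketch below is a minimal illustration in Python with NumPy; the function name and parameter values are illustrative, not taken from the paper:

```python
import numpy as np

def backproject(u, v, z, fx, fy, cx, cy):
    """Invert the pinhole model: pixel (u, v) with measured depth z
    becomes the homogeneous camera-frame vector t_i = (x^c, y^c, z^c, 1)."""
    x = z * (u - cx) / fx
    y = z * (v - cy) / fy
    return np.array([x, y, z, 1.0])

# Round trip: projecting the point (0.1, 0.05, 0.5) with fx = fy = 600,
# cx = 320, cy = 240 gives pixel (440, 300); back-projection recovers it.
t_i = backproject(440.0, 300.0, 0.5, 600.0, 600.0, 320.0, 240.0)
```

The depth $z_m^c$ supplied by the RGB-D camera is what makes the inversion well-posed; without it, a pixel only determines a ray.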
Finally, the position $(x_m^{bi}, y_m^{bi}, z_m^{bi})^\top$ in $\Sigma_b$ can be obtained from the following equation:
$\begin{pmatrix} x_m^{bi} \\ y_m^{bi} \\ z_m^{bi} \\ 1 \end{pmatrix} = {}^b T_h \, {}^h T_c \, t_i.$
In the developed system, the above equation is insufficient because the tablet pen and the camera are not aligned with the rotation axis of $\Sigma_h$. Hence, the offset $(x', y', z')^\top$ should be considered, which can be calculated as follows:
$\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = R_y(-180^\circ)\, R_z(\theta_h)\, t_p,$
$\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} -x_p \cos\theta_h + y_p \sin\theta_h \\ x_p \sin\theta_h + y_p \cos\theta_h \\ -z_p \end{pmatrix}.$
As shown in Figure 4, $t_p = (x_p, y_p, z_p)^\top$ is the translation vector from $\Sigma_h$ to the tip of the tablet pen, and $\theta_h$ represents the rotation angle about the z-axis of $\Sigma_h$. By combining Equations (5) and (7), the final equation is
$\begin{pmatrix} x_m^{bi\prime} \\ y_m^{bi\prime} \\ z_m^{bi\prime} \end{pmatrix} = \begin{pmatrix} x_m^{bi} \\ y_m^{bi} \\ z_m^{bi} \end{pmatrix} - \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}.$
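Equations (6) and (7) can be checked numerically. The following Python/NumPy sketch (function names are illustrative) builds the two rotation matrices and evaluates the pen-tip offset:

```python
import numpy as np

def rot_y(deg):
    """Rotation about the y-axis by an angle in degrees."""
    c, s = np.cos(np.deg2rad(deg)), np.sin(np.deg2rad(deg))
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(deg):
    """Rotation about the z-axis by an angle in degrees."""
    c, s = np.cos(np.deg2rad(deg)), np.sin(np.deg2rad(deg))
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def pen_offset(theta_h_deg, t_p):
    """Offset (x', y', z') of the pen tip for hand angle theta_h,
    Equation (6): R_y(-180) R_z(theta_h) t_p."""
    return rot_y(-180.0) @ rot_z(theta_h_deg) @ np.asarray(t_p, dtype=float)
```

Evaluating `pen_offset(0.0, (x_p, y_p, z_p))` returns $(-x_p, y_p, -z_p)$, matching the closed form of Equation (7) at $\theta_h = 0$.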

#### 3.3. Transformation from $Σ t$ to $Σ b$

All positions of the black dots displayed on the tablet computer can be transformed from $\Sigma_t$ to $\Sigma_b$. As a first step, the position in px is converted to mm using the following equation:
$l_{\mathrm{mm}} = l_{\mathrm{px}} \times \frac{25.4}{\mathrm{PPI}} \times s,$
where $l_{\mathrm{mm}}$, $l_{\mathrm{px}}$, PPI, and $s$ are the converted result in mm, the position in pixels, the pixels per inch, and the display scale of the tablet computer, respectively. PPI and $s$ depend on the tablet computer used. The transformation from $\Sigma_t$ to $\Sigma_b$ is
$\begin{pmatrix} x_m^{bt} \\ y_m^{bt} \\ z_m^{bt} \\ 1 \end{pmatrix} = {}^b T_t \begin{pmatrix} x_m^t \\ y_m^t \\ 0 \\ 1 \end{pmatrix},$
where $(x_m^{bt}, y_m^{bt}, z_m^{bt})^\top$ is the transformed result in $\Sigma_b$ of the $m$th black dot, whose position $(x_m^t, y_m^t, 0)^\top$ has been converted from px to mm using Equation (9). $t_t = (x_t, y_t, z_t)^\top$ is the translation vector from $\Sigma_t$ to $\Sigma_b$. $\alpha_t$, $\beta_t$, and $\gamma_t$ are rotation angles about the x, y, and z axes of $\Sigma_t$, respectively.
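Equation (9) is a one-line conversion. As a sketch (the function name is illustrative, and the default PPI and scale are those of the Surface Pro 7 used later in the experiments):

```python
def px_to_mm(l_px, ppi=267.0, s=2.0):
    """Convert a tablet position from pixels to millimetres, Equation (9):
    25.4 mm per inch divided by the pixels per inch, multiplied by the
    display scale s of the tablet."""
    return l_px * 25.4 / ppi * s
```

For example, at 267 PPI and a display scale of 1, a length of 267 px corresponds to exactly 25.4 mm (one inch).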

#### 3.4. Representation by DH Method

The relationships between the coordinate systems can also be represented by the Denavit–Hartenberg (DH) method [21]. In contrast to the six-DoF HTMs mentioned above, four parameters, $a_{n-1}$, $\alpha_{n-1}$, $d_n$, and $\theta_n$, are used in this method. Let $x_n$ and $z_n$ be the x and z axes of the $n$th link, respectively. These four parameters respectively denote the length of the common normal between the $(n-1)$th and $n$th links (link length), the angle of rotation around $x_{n-1}$ from $z_{n-1}$ to $z_n$ (link twist), the distance from the intersection of $x_{n-1}$ and $z_n$ to the origin of the $n$th link's frame (link offset), and the angle of rotation around $z_n$ from $x_{n-1}$ to $x_n$ (joint angle). Using this method, Equation (5) can be rewritten as
$\begin{pmatrix} x_m^{bi} \\ y_m^{bi} \\ z_m^{bi} \\ 1 \end{pmatrix} = {}^b T_h \, {}^h T_c^{\mathrm{DH}} \, R_{xyz}(\alpha_c, \beta_c, 90^\circ + \gamma_c) \, t_i,$
$^h T_c^{\mathrm{DH}} = \begin{pmatrix} \cos(90^\circ + \theta_1^c) & -\sin(90^\circ + \theta_1^c) & 0 & a_1^c \\ \sin(90^\circ + \theta_1^c) & \cos(90^\circ + \theta_1^c) & 0 & 0 \\ 0 & 0 & 1 & d_1^c \\ 0 & 0 & 0 & 1 \end{pmatrix}.$
$^h T_c^{\mathrm{DH}}$ is an HTM in the DH method that represents the relationship between $\Sigma_h$ and $\Sigma_c$. $^b T_h$ can also be represented by the DH method; however, it can be acquired from the robot controller and is therefore treated as a known quantity. $R_{xyz}$ is a rotation matrix about each axis in 3D space. Similarly, the relationship between $\Sigma_b$ and $\Sigma_t$ can be rewritten as follows:
$\begin{pmatrix} x_m^{bt} \\ y_m^{bt} \\ z_m^{bt} \\ 1 \end{pmatrix} = {}^b T_t^{\mathrm{DH}} \, R_{xyz}(\alpha_t, \beta_t, \gamma_t) \begin{pmatrix} x_m^t \\ y_m^t \\ 0 \\ 1 \end{pmatrix},$
$^b T_t^{\mathrm{DH}} = \begin{pmatrix} \cos\theta_1^t & -\sin\theta_1^t & 0 & a_1^t \\ -\sin\theta_1^t & -\cos\theta_1^t & 0 & 0 \\ 0 & 0 & -1 & d_1^t \\ 0 & 0 & 0 & 1 \end{pmatrix}.$
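The DH-style matrix $^h T_c^{\mathrm{DH}}$ can be constructed directly. A minimal Python/NumPy sketch (the function name is illustrative, as are the sample parameter values in the test):

```python
import numpy as np

def htm_hand_to_camera_dh(theta_1c_deg, a_1c, d_1c):
    """^h T_c^DH: rotation by (90 + theta_1^c) about z, link length a_1^c
    along x, and link offset d_1^c along z."""
    t = np.deg2rad(90.0 + theta_1c_deg)
    return np.array([
        [np.cos(t), -np.sin(t), 0.0, a_1c],
        [np.sin(t),  np.cos(t), 0.0, 0.0],
        [0.0,        0.0,       1.0, d_1c],
        [0.0,        0.0,       0.0, 1.0],
    ])
```

With $\theta_1^c = 0$, the rotation block reduces to a pure $90^\circ$ rotation about z, which makes the structure of the matrix easy to verify by hand.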
In addition to this approach, many other representations of coordinate-system relationships have been reported. This study focuses on comparing the representations of the six-DoF HTM and the DH method.

#### 3.5. Parameters to Be Optimized

The known parameters and the unknown parameters to be optimized are listed in Table 1. In Equation (1), $(x_h, y_h, z_h)^\top$ are known because they can be obtained from the robot controller. In Equation (2), $(x_c, y_c, z_c)^\top$ can be measured approximately by hand; however, manual measurement has an error that affects the final positioning error of the robot hand, so these values were optimized in this study. Because the robot hand is operated by making the pen and the tablet display touch each other, the optimization of $z_c$ is unnecessary. In the same equation, $\alpha_c$, $\beta_c$, and $\gamma_c$ are optimized. In Equation (3), $f_x$, $f_y$, $c_x$, and $c_y$ are known because they can be obtained from the SDK of the RGB-D camera. $z_m^c$ is also known because the camera measures this distance. In Equation (6), $\theta_h$ is known because the user sets the angle to rotate the hand. $(x_p, y_p, z_p)^\top$ can be measured by hand but should be optimized for the same reason as above; similarly, $z_p$ can be ignored. In Equation (11), $\alpha_t$, $\beta_t$, $\gamma_t$, $x_t$, $y_t$, and $z_t$ are unknown, although $z_t$ can be ignored. Therefore, 12 parameters must be optimized for the six-DoF HTMs. For the DH method, the parameters $\theta_1^c$, $a_1^c$, $\theta_1^t$, and $a_1^t$ must be optimized; $d_1^c$ and $d_1^t$ can be ignored for the same reason as $z_c$, $z_p$, and $z_t$.

#### 3.6. Two-Stage Optimization

To optimize the 12 unknown parameters and further minimize the positioning error of the robot hand, the developed method introduces a two-stage optimization. In the first stage, the 12 parameters are optimized based on the mathematical model described in Section 3.2. Using the optimized parameters, the positions of the black dots in $Σ i$ (Figure 3a) can be converted to $Σ b$ (Figure 3b). Thus, the robot hand with the tablet pen can touch the black dots of the tablet display. To further minimize the error, affine transformation-based optimization is introduced in the second stage.

#### 3.6.1. First Optimization

Many optimization algorithms can be used. In this study, differential evolution (DE) [22] is adopted because of its ease of use. In DE, search points in a search space are referred to as individuals. Each individual includes a set of optimized parameters encoded as a vector. After the fitness of each individual is calculated using a fitness function, new individuals are generated for the next generation based on the calculated fitness, mutation, and crossover strategies. By iterating these procedures, the individuals gradually converge to an optimal solution.
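As a concrete illustration of the procedure just described, the following is a minimal DE/rand/1 with binomial crossover in Python/NumPy. It is a generic sketch of the algorithm, not the implementation used in the paper, shown here minimizing a toy fitness function:

```python
import numpy as np

def de_rand_1_bin(fitness, bounds, n_pop=20, n_gen=200, F=0.5, CR=0.9, seed=1):
    """DE/rand/1/bin: for each individual, mutate v = a + F*(b - c) from
    three distinct others, binomially cross v with the parent, and keep
    the trial vector if its fitness is no worse (greedy selection)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    pop = rng.uniform(lo, hi, size=(n_pop, dim))
    fit = np.array([fitness(x) for x in pop])
    for _ in range(n_gen):
        for i in range(n_pop):
            others = [j for j in range(n_pop) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            v = np.clip(a + F * (b - c), lo, hi)   # mutation
            mask = rng.random(dim) < CR            # binomial crossover
            mask[rng.integers(dim)] = True         # keep at least one mutant gene
            u = np.where(mask, v, pop[i])
            fu = fitness(u)
            if fu <= fit[i]:                       # greedy selection
                pop[i], fit[i] = u, fu
    best = int(np.argmin(fit))
    return pop[best], fit[best]

# Toy usage: a 2-D sphere function converges close to the origin.
best_x, best_f = de_rand_1_bin(lambda x: float(np.sum(x * x)), [(-5.0, 5.0)] * 2)
```

In the calibration setting, the encoded vector would hold the 12 unknown parameters and `fitness` would be $F^{1st}$.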
In this study, the following equations are used for the fitness function.
$F^{1st} = f_1^{1st} + f_2^{1st},$
$f_1^{1st} = \frac{1}{9} \sum_{m=0}^{8} \sqrt{(x_m^{bi} - x_m^{bt})^2 + (y_m^{bi} - y_m^{bt})^2},$
$f_2^{1st} = \frac{1}{9} \sum_{m=0}^{8} \sqrt{(x_m^{bi\prime} - x_m^{r\prime})^2 + (y_m^{bi\prime} - y_m^{r\prime})^2},$
$\begin{pmatrix} x_m^{r\prime} \\ y_m^{r\prime} \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} x_m^{r} \\ y_m^{r} \\ 0 \\ 0 \end{pmatrix} - {}^b T_t \begin{pmatrix} \Delta x_m^{r} \\ \Delta y_m^{r} \\ 0 \\ 0 \end{pmatrix},$
where $F^{1st}$ is the fitness function, which consists of $f_1^{1st}$ and $f_2^{1st}$. $f_1^{1st}$ represents the mean Euclidean distance between the nine black dots transformed from $\Sigma_i$ to $\Sigma_b$ by Equation (5) and those transformed from $\Sigma_t$ to $\Sigma_b$ by Equation (10). This term is set because the black dots transformed by the different HTMs should match each other in $\Sigma_b$ if the unknown parameters in Equations (2) and (11) are optimized correctly.
To optimize the remaining unknown parameters in Equation (6), $f_2^{1st}$ is introduced. If all unknown parameters are optimized correctly, $x_m^{bi\prime}$ and $y_m^{bi\prime}$, calculated from Equation (8), should match the hand positions in $\Sigma_b$ (Figure 3b). However, because errors occur when the hand-position data are generated, these data cannot be used as a perfect ground truth. The errors arise from the difficulty of manually operating the robot hand so that the center of each displayed landmark and the tip of the tablet pen touch perfectly (with no distance error). To minimize this error in the optimization process, Equation (20) is introduced. Let $x_m^r$ and $y_m^r$ be the recorded $m$th position of the robot hand in $\Sigma_b$, where the pen and the $m$th black dot touch each other with a small distance error. $\Delta x_m^r$ and $\Delta y_m^r$ are the distance errors between the $m$th landmark and the touched position in $\Sigma_t$. They can be obtained easily from the tablet computer in px and converted to mm using Equation (9); this conversion is necessary before applying Equation (20).
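The first-stage fitness can be written compactly. A Python/NumPy sketch (function and argument names are illustrative) of $F^{1st}$ as the sum of the two mean Euclidean distances:

```python
import numpy as np

def mean_euclidean(p, q):
    """Mean Euclidean distance between two point sets (one row per dot),
    as in f_1^1st and f_2^1st."""
    return float(np.mean(np.linalg.norm(np.asarray(p) - np.asarray(q), axis=1)))

def fitness_first_stage(p_bi, p_bt, p_bi_off, p_r_corr):
    """F^1st = f_1^1st + f_2^1st (Equation (17)): the dots transformed from
    the image and tablet frames should coincide, and the offset-corrected
    dots should coincide with the error-compensated hand positions."""
    return mean_euclidean(p_bi, p_bt) + mean_euclidean(p_bi_off, p_r_corr)
```

When all parameters are correct, both terms vanish, so DE drives $F^{1st}$ toward zero.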

#### 3.6.2. Second Optimization

The first optimization yields a good calibration result for the six-DoF HTMs, as shown in Figure 6. To further minimize the error, affine transformation matrices are optimized in the second stage to bring the two data distributions into closer agreement. For this purpose, the following fitness function is set:
$F^{2nd} = f_1^{2nd} + f_2^{2nd},$
$f_1^{2nd} = \frac{1}{9} \sum_{m=0}^{8} \sqrt{(A_1(x_m^{bi}) - x_m^{bt})^2 + (A_1(y_m^{bi}) - y_m^{bt})^2},$
$f_2^{2nd} = \frac{1}{9} \sum_{m=0}^{8} \sqrt{(A_2(x_m^{bi\prime}) - x_m^{r\prime})^2 + (A_2(y_m^{bi\prime}) - y_m^{r\prime})^2},$
$\begin{pmatrix} A_n(x_{\mathrm{tgt}}) \\ A_n(y_{\mathrm{tgt}}) \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & X_n \\ 0 & 1 & Y_n \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & X_n^{cr} \\ 0 & 1 & Y_n^{cr} \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\theta_n & -\sin\theta_n & 0 \\ \sin\theta_n & \cos\theta_n & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} S_n^x & 0 & 0 \\ 0 & S_n^y & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & -X_n^{cr} \\ 0 & 1 & -Y_n^{cr} \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_{\mathrm{tgt}} \\ y_{\mathrm{tgt}} \\ 1 \end{pmatrix},$
where $F^{2nd}$ is the fitness function, which consists of $f_1^{2nd}$ and $f_2^{2nd}$. They are almost the same as $f_1^{1st}$ and $f_2^{1st}$; the difference is that an affine transformation of a target position, $(A_n(x_{\mathrm{tgt}}), A_n(y_{\mathrm{tgt}}))^\top$, is introduced. Let $X_n$ and $Y_n$ be the amounts of translation along the x and y axes, respectively. $X_n^{cr}$ and $Y_n^{cr}$ represent the center of rotation, and $\theta_n$ is the angle of rotation; in this study, the position of the fourth black dot was used as the center of rotation. $S_n^x$ and $S_n^y$ are the scaling factors of the x and y axes, respectively.
The unknown parameters to be optimized are $X_n$, $Y_n$, $\theta_n$, $S_n^x$, and $S_n^y$. Because $n \in \{1, 2\}$ ($A_1$ and $A_2$), ten parameters must be optimized to match the two data distributions as closely as possible. Figure 7 shows examples using the optimized affine transformation matrices and the six-DoF HTMs. The error decreases in both results. In the experiments, this effectiveness was evaluated quantitatively.
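The affine map $A_n$ composes a scaling and a rotation about a chosen center with a final translation. A Python/NumPy sketch (function and parameter names are illustrative):

```python
import numpy as np

def make_affine(X, Y, X_cr, Y_cr, theta, S_x, S_y):
    """Return A_n as a function on (x, y): scale and rotate about the
    center of rotation (X_cr, Y_cr), then translate by (X, Y)."""
    def T(tx, ty):
        return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
    S = np.diag([S_x, S_y, 1.0])
    # Same left-to-right product as in the equation above.
    M = T(X, Y) @ T(X_cr, Y_cr) @ R @ S @ T(-X_cr, -Y_cr)
    def A(p):
        return (M @ np.array([p[0], p[1], 1.0]))[:2]
    return A
```

With identity parameters ($X = Y = \theta = 0$, $S_n^x = S_n^y = 1$) the map returns its input unchanged, so the second stage can only refine, never degrade, a perfect first-stage result.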

## 4. Experiment

#### 4.1. Used Robot and Devices

In the experiments, a six-axis robot (DENSO VP-6242), a tablet computer (Microsoft Surface Pro 7), a tablet pen (Surface Pen), and an RGB-D camera (Intel SR300) were used. The positional repeatability of the robot was $± 0.02$ mm [20]. The resolution of the tablet was 267 ppi.

#### 4.2. Data Creation

For the two-stage optimization, the positions of the nine ($m \in [0, 8]$) displayed black dots in $\Sigma_i$ ($(u_m, v_m)$) and the corresponding positions of the robot hand in $\Sigma_b$ ($(x_m^r, y_m^r)$) are necessary. To create both datasets, first, nine black dots, each one pixel in size, were displayed (Figure 8a). Second, the tablet display was captured by the attached RGB-D camera. Third, the captured image was binarized, as shown in Figure 8b. Because one blob of a few pixels was obtained for each dot, the averaged coordinates were used as $(u_m, v_m)$. Moreover, the corresponding depth data $z_m^c$ at $(u_m, v_m)$ were acquired from the RGB-D camera.
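The dot-extraction step can be sketched as follows in Python/NumPy; a simple threshold plus 4-connected flood fill stands in here for whatever image-processing library was actually used, and the threshold value is illustrative:

```python
import numpy as np

def dot_centroids(gray, thresh=50):
    """Binarize a grayscale capture (dark dots on a bright display) and
    return the averaged pixel coordinates (u_m, v_m) of each blob."""
    dark = gray < thresh
    labels = np.zeros(gray.shape, dtype=int)
    centroids = []
    for v0, u0 in zip(*np.nonzero(dark)):
        if labels[v0, u0]:
            continue
        # Flood-fill one blob, collecting its (u, v) pixels.
        stack, blob = [(v0, u0)], []
        labels[v0, u0] = len(centroids) + 1
        while stack:
            v, u = stack.pop()
            blob.append((u, v))
            for dv, du in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nv, nu = v + dv, u + du
                if 0 <= nv < gray.shape[0] and 0 <= nu < gray.shape[1] \
                        and dark[nv, nu] and not labels[nv, nu]:
                    labels[nv, nu] = len(centroids) + 1
                    stack.append((nv, nu))
        centroids.append(np.mean(blob, axis=0))  # averaged (u_m, v_m)
    return centroids
```

Averaging over the blob gives sub-pixel centroids even though each displayed dot is only one physical pixel.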
Subsequently, the robot hand was operated such that the tip of the pen touched the displayed dots to create data $( x m r , y m r )$ as the ground truth. At this time, the touching error $( Δ x m r , Δ y m r )$ was obtained from the tablet computer to compensate for the error, as described in Section 3.6.1. Table 2 provides the created data.

#### 4.3. Set Values for Known Parameters

Table 3 presents the set values for the known parameters. $( x h , y h , z h )$ represent the initial position of the robot hand to capture the tablet display. This position was determined by the author. $f x$, $f y$, $c x$, and $c y$ were acquired from the SDK of the RGB-D camera [19]. $θ h$ is the rotation angle of the z-axis in $Σ h$ to touch each displayed dot.

#### 4.4. Setup for DE

Table 4 provides the set values for the hyperparameters of the DE. Let $N$ and $G$ be the population and generation sizes, respectively. To avoid premature convergence, sufficiently large sizes were set. The crossover probability (CR) and the scaling factor (F) were set to 0.9 and 0.5, respectively. Binomial crossover and DE/rand/1 were adopted as the crossover and mutation strategies, respectively. Because DE performance depends on the random seed, five different random seeds were used and compared in the two-stage optimization. Table 5 and Table 6 present the search ranges of the optimized parameters for the six-DoF HTMs and the DH method.
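SciPy's built-in implementation accepts these hyperparameters directly. A sketch follows; the fitness function and search ranges below are stand-ins for the real $F^{1st}$ and the ranges of Tables 5 and 6, and the population and generation sizes are placeholders for the "sufficiently large" values:

```python
from scipy.optimize import differential_evolution

# Stand-in fitness and search ranges for illustration only.
def fitness(x):
    return float(sum(v * v for v in x))

search_ranges = [(-5.0, 5.0)] * 2

# DE/rand/1 with binomial crossover, CR = 0.9, F = 0.5, as in Table 4.
result = differential_evolution(
    fitness, bounds=search_ranges, strategy="rand1bin",
    mutation=0.5, recombination=0.9,
    popsize=50, maxiter=500, seed=1,
)
```

Repeating the run with `seed` set to each of the five values reproduces the seed comparison described above.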

## 5. Results and Consideration

#### 5.1. First-Stage Optimization

Table 7 summarizes the optimization results of the six-DoF HTMs using the five different random seeds. Signed and unsigned values appear in $\alpha_t$ and $\beta_t$ although all $F^{1st}$ values are the same; hence, this optimization problem is multimodal. Because DE can exhibit good performance on multimodal problems [22], using this algorithm is reasonable. All $f_1^{1st}$ and $f_2^{1st}$ values were 0.64 and 1.04 mm, respectively. The remaining error is attributed to the limited representational power of the developed mathematical model or to measurement error in the depth information ($z_m^c$).
Table 8 describes the optimization results of the DH method. As with the six-DoF HTMs, all $F^{1st}$ values were the same, yet the values of some parameters were not identical; thus, this optimization problem was also multimodal. The acquired values of $F^{1st}$ were larger than those of the six-DoF HTMs because the DH parameters were ill-conditioned. According to reference [21], the adjacent joint axes of a real robot and end-effector are not perfectly parallel in practice owing to manufacturing tolerances and various types of error, so the link length ($a_{n-1}$) can become extremely large. However, owing to the difficulty of precise prediction, the adjacent joint axes were assumed to be perfectly parallel in this experiment. This could cause the $F^{1st}$ values to be larger than those of the six-DoF HTMs.
Using the optimized values of seed 1 of the 6-DoF HTMs, the mean touching error was measured by making the robot hand touch all displayed dots. Table 9 presents the result. Equation (9) with $s = 2$ and $PPI = 267$ was used to convert the px to mm because Microsoft Surface Pro 7 was used. A mean touching error of 1.25 mm was achieved.

#### 5.2. Second-Stage Optimization

Using the optimized parameters of seed 1 from the first-stage optimization, the parameters of the two affine transformation matrices were optimized in the second stage. Table 10 summarizes the results for the six-DoF HTMs. Because $F^{2nd}$ decreased compared with $F^{1st}$, the second optimization contributes to reducing the error. Although different random seeds were set, all results are identical. Hence, the possibility of premature convergence is low, showing that these optimization results are reliable.
Table 11 presents the results of the DH method. As with the six-DoF HTMs, all values are the same. Because the result of the first-stage optimization was worse, the result of the second-stage optimization was also worse.
The DH method represents a relationship between reference frames using four parameters, whereas the HTM, which is often used in hand–eye calibration, uses six. Thus, the lower computational cost of the DH method is among its notable advantages. However, it involves a few disadvantages, as mentioned in reference [21]. As described above, the DH parameters exhibit ill-conditioned behavior because the link length becomes extremely large when adjacent joint axes are not perfectly parallel. Moreover, link frames must be assigned such that valid DH parameters exist, and arbitrary assignment is impossible. In contrast, six-DoF HTMs can be assigned arbitrarily and are therefore easy to use. As shown in the results of the two-stage optimization, the six-DoF HTMs achieve better results; thus, this representation is more suitable for hand–eye calibration.
Using all optimized parameters for the six-DoF HTMs, the mean touching error was measured. The hand positions to touch are calculated using the following equation:
$\begin{pmatrix} x_m^{bi\prime} \\ y_m^{bi\prime} \end{pmatrix} = \begin{pmatrix} A_2(A_1(x_m^{bi}) - x') \\ A_2(A_1(y_m^{bi}) - y') \end{pmatrix}.$
Because the robot hand always touches the tablet display, the calculation of $z_m^{bi\prime}$ is unnecessary. Table 12 presents the results. Compared with the previous result, the mean touching error decreases; thus, the affine transformation-based error minimization is effective. Because the mean touching errors decrease in all trials, this method is stable.
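Equation (25) composes the two optimized affine maps with the pen offset. A sketch in Python (names are illustrative; `A1` and `A2` are assumed to be the optimized maps acting on (x, y) pairs, applied per component as the equation's notation suggests):

```python
def touch_position(x_bi, y_bi, x_off, y_off, A1, A2):
    """Hand position to touch: apply A1 to the transformed dot position,
    subtract the pen offset (x', y') of Equation (7), then apply A2."""
    x1, y1 = A1((x_bi, y_bi))
    x2, y2 = A2((x1 - x_off, y1 - y_off))
    return x2, y2
```

The order matters: $A_1$ corrects the image-to-base transform before the offset is removed, and $A_2$ corrects the offset-compensated result against the recorded hand positions.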

## 6. Conclusions

In this study, a method that combines the advantages of the traditional hand–eye calibration approach and the neural network-based approach was proposed. Simple mathematical models based on six-DoF HTMs and the DH method were developed and optimized using data created with the attached end-effector. Therefore, the proposed method can provide numerical and quantitative results to users and researchers. Additional calibration for the end-effector can be avoided, even without a CAD model, because the data are created using the attached end-effector. Two-stage optimization was introduced to optimize the mathematical models. In the first stage, 12 parameters of the transformation matrices, which convert a position in the image coordinate system to the corresponding touch position in the robot-based coordinate system, were optimized. To further minimize the error, ten parameters of the two affine transformation matrices were optimized in the second stage. Using these optimized parameters, a mean touching error of 1.0 mm was achieved with the six-DoF HTMs. Because the proposed method can optimize the mathematical model using data generated by the attached end-effector without detailed information, such as CAD diagrams, it incorporates the advantages of both approaches, in contrast to conventional systems.
Reducing the error further will be a future research direction. Additionally, similar to existing calibration methods, the proposed method requires recalibration if a different end-effector with different dimensions is attached. As this is a tedious process, a method that utilizes the first calibration result should be developed in future research to reduce the effort required for recalibration.

## Funding

This research received no external funding.

## Conflicts of Interest

The author declares no conflict of interest.

## References

1. Lin, W.; Liang, P.; Luo, G.; Zhao, Z.; Zhang, C. Research of Online Hand-Eye Calibration Method Based on ChArUco Board. Sensors 2022, 22, 3805.
2. Hua, J.; Zeng, L. Hand-Eye Calibration Algorithm Based on an Optimized Neural Network. Actuators 2021, 10, 85.
3. Enebuse, I.; Foo, M.; Ibrahim, B.S.K.K.; Ahmed, H.; Supmak, F.; Eyobu, O.S. A Comparative Review of Hand-Eye Calibration Techniques for Vision Guided Robots. IEEE Access 2021, 9, 113143–113155.
4. Jiang, J.; Luo, X.; Luo, Q.; Qiao, L.; Li, M. An Overview of Hand-Eye Calibration. Int. J. Adv. Manuf. Technol. 2022, 119, 77–97.
5. Shiu, Y.C.; Ahmad, S. Calibration of wrist-mounted robotic sensors by solving homogeneous transform equations of the form AX = XB. IEEE Trans. Robot. Autom. 1989, 5, 16–29.
6. Motai, Y.; Kosaka, A. Hand-Eye Calibration Applied to Viewpoint Selection for Robotic Vision. IEEE Trans. Ind. Electron. 2008, 55, 3731–3741.
7. Tsai, R.Y.; Lenz, R.K. A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Trans. Robot. Autom. 1989, 5, 345–358.
8. Wang, C.C. Extrinsic calibration of a vision sensor mounted on a robot. IEEE Trans. Robot. Autom. 1992, 8, 161–175.
9. Park, F.C.; Martin, B.J. Robot sensor calibration: Solving AX = XB on the Euclidean group. IEEE Trans. Robot. Autom. 1994, 10, 717–721.
10. Ma, S.D. A self-calibration technique for active vision systems. IEEE Trans. Robot. Autom. 1996, 12, 114–120.
11. Daniilidis, K. Hand-Eye Calibration Using Dual Quaternions. Int. J. Robot. Res. 1999, 18, 286–298.
12. Horaud, R.; Dornaika, F. Hand-Eye Calibration. Int. J. Robot. Res. 1995, 14, 195–210.
13. Andreff, N.; Horaud, R.; Espiau, B. Robot Hand-Eye Calibration using Structure from Motion. Int. J. Robot. Res. 2001, 20, 228–248.
14. Zhao, Z. Hand-eye calibration using convex optimization. In Proceedings of the IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 2947–2952.
15. Heller, J.; Havlena, M.; Pajdla, T. Globally Optimal Hand-Eye Calibration Using Branch-and-Bound. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 1027–1033.
16. Zhuang, H.; Roth, Z.; Sudhakar, R. Simultaneous robot/world and tool/flange calibration by solving homogeneous transformation equations of the form AX = YB. IEEE Trans. Robot. Autom. 1994, 10, 549–554.
17. Mišeikis, J.; Glatte, K.; Elle, O.J.; Torresen, J. Automatic Calibration of a Robot Manipulator and Multi 3D Camera System. In Proceedings of the IEEE/SICE International Symposium on System Integration, Sapporo, Japan, 13–15 December 2016; pp. 735–741.
18. Koide, K.; Menegatti, E. General Hand-Eye Calibration Based on Reprojection Error Minimization. IEEE Robot. Autom. Lett. 2019, 4, 1021–1028.
19. Projection in Intel RealSense SDK 2.0. Available online: https://dev.intelrealsense.com/docs/projection-in-intel-realsense-sdk-20 (accessed on 19 January 2023).
20. DENSO ROBOT USER MANUALS. Available online: http://eidtech.dyndns-at-work.com/support/RC8_Manual/005929.html (accessed on 19 January 2023).
21. Lynch, K.M.; Park, F.C. Modern Robotics: Mechanics, Planning, and Control; Cambridge University Press: Cambridge, UK, 2017.
22. Das, S.; Suganthan, P.N. Differential Evolution: A Survey of the State-of-the-Art. IEEE Trans. Evol. Comput. 2011, 15, 4–31.
Figure 1. (a) Appearance of the developed system; (b) details of the end-effector.
Figure 2. Data processing procedure.
Figure 3. (a) Positions of landmarks in the image coordinate system; (b) positions of the hand in the robot-based coordinate system.
Figure 4. Coordinate systems. Red, green, and blue arrows represent the x, y, and z axes, respectively. The DENSO VP-6242 robot [20] is used.
Figure 5. Free body diagram of the system.
Figure 6. (a) Landmarks transferred from $\Sigma_i$ to $\Sigma_b$ and from $\Sigma_t$ to $\Sigma_b$ using the optimized parameters. (b) Output hand positions using Equation (8) with the optimized parameters and the compensated ground truth using Equation (20).
Figure 7. Example results obtained with the optimized affine transformation matrices: (a) transferred positions of black dots; (b) output hand positions using Equation (8) and the compensated ground truth using Equation (20).
Figure 8. (a) Image of the tablet display captured by the RGB-D camera; (b) binarized image for detecting the displayed black dots.
Table 1. Known and unknown (to be optimized) parameters.

| Equation Number | Known | Unknown (Six-DoF HTM) | Unknown (DH Method) |
|---|---|---|---|
| (1) | $x_h$, $y_h$, $z_h$ | | |
| (2) | | $x_c$, $y_c$, $\alpha_c$, $\beta_c$, $\gamma_c$ | |
| (3) | $f_x$, $f_y$, $c_x$, $c_y$, $z_m^c$ | | |
| (6) | $\theta_h$ | $x_p$, $y_p$ | $x_p$, $y_p$ |
| (11) | | $\alpha_t$, $\beta_t$, $\gamma_t$, $x_t$, $y_t$ | |
| (13) | | | $\alpha_c$, $\beta_c$, $\gamma_c$ |
| (14) | | | $\theta_1^c$, $a_1^c$ |
| (15) | | | $\alpha_t$, $\beta_t$, $\gamma_t$ |
| (16) | | | $\theta_1^t$, $a_1^t$ |
Table 2. Created data for two-stage optimization.

| $m$ | $(u_m, v_m)$ in $\Sigma_i$ [px] | $z_m^c$ in $\Sigma_c$ [mm] | $(x_m^r, y_m^r)$ in $\Sigma_b$ [mm] | $(\Delta x_m^r, \Delta y_m^r)$ in $\Sigma_t$ [px] |
|---|---|---|---|---|
| 0 | (119, 105) | 168.2 | (289, 1) | (−3, 1) |
| 1 | (339, 108) | 168.6 | (347, 8) | (−2, 3) |
| 2 | (558, 109) | 167.7 | (403, 14) | (−1, 5) |
| 3 | (556, 246) | 167.5 | (395, −20) | (−1, 4) |
| 4 | (338, 244) | 169.2 | (327, −17) | (−3, −1) |
| 5 | (119, 242) | 168.7 | (257, −16) | (3, 3) |
| 6 | (118, 379) | 168.6 | (249, −59) | (2, 0) |
| 7 | (337, 379) | 170.2 | (305, −67) | (−1, 3) |
| 8 | (554, 381) | 168.9 | (363, −77) | (1, 2) |
Table 3. Set values for known parameters.

| Parameter | Value |
|---|---|
| $(x_h, y_h, z_h)$ | (320, −70, 290) |
| $(f_x, f_y)$ | (617.7, 617.7) |
| $(c_x, c_y)$ | (316.5, 242.3) |
| $\theta_h$ | $22.5 \times m$ |
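Given the intrinsic parameters in Table 3 and a measured depth, a pixel detected in the image can be back-projected into camera coordinates with the standard pinhole model, as in the RealSense SDK's deprojection [19]. A minimal sketch, assuming a pinhole model without lens distortion; the function name is illustrative, and the defaults are the Table 3 values:

```python
def deproject_pixel(u, v, z, fx=617.7, fy=617.7, cx=316.5, cy=242.3):
    """Back-project pixel (u, v) with depth z [mm] into camera coordinates.

    Pinhole model without lens distortion; intrinsics default to Table 3.
    """
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z
```

For example, the first landmark in Table 2, (119, 105) at depth 168.2 mm, maps to a point roughly 54 mm left of and 37 mm above the optical axis in the camera frame.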
Table 4. Set values for DE.

| Hyperparameter | Value |
|---|---|
| $N$ | 10,000 |
| $G$ | 10,000 |
| CR | 0.9 |
| F | 0.5 |
| Crossover strategy | Binomial crossover |
| Mutation strategy | DE/rand/1 |
Table 5. Search ranges of optimized parameters for six-DoF HTMs.

| Parameter | $x_c$ | $y_c$ | $\alpha_c$ | $\beta_c$ | $\gamma_c$ | | |
|---|---|---|---|---|---|---|---|
| Search range | [−20, 20] | [20, 50] | [−30, 30] | [−30, 30] | [−30, 30] | | |
| Parameter | $x_p$ | $y_p$ | $\alpha_t$ | $\beta_t$ | $\gamma_t$ | $x_t$ | $y_t$ |
| Search range | [−20, 20] | [−40, −10] | [−30, 30] | [−30, 30] | [−30, 30] | [100, 200] | [0, 100] |
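The DE settings in Table 4 map directly onto SciPy's `differential_evolution` (DE/rand/1 with binomial crossover is `strategy="rand1bin"`, F is `mutation`, CR is `recombination`). A minimal sketch using the camera-pose search ranges from Table 5; the sphere objective is a toy stand-in for the paper's fitness function (which sums landmark positioning errors), and the population/generation counts are kept far below the paper's 10,000 so the demo runs quickly:

```python
import numpy as np
from scipy.optimize import differential_evolution

# Toy stand-in for the paper's fitness: distance to a known optimum,
# here set to the seed-1 camera-pose solution reported in Table 7.
target = np.array([-2.09, 33.99, -0.46, 0.50, 0.72])

def fitness(p):
    return np.sum((p - target) ** 2)

# Search ranges for (x_c, y_c, alpha_c, beta_c, gamma_c) as in Table 5.
bounds = [(-20, 20), (20, 50), (-30, 30), (-30, 30), (-30, 30)]

# Table 4 settings: DE/rand/1, binomial crossover, F = 0.5, CR = 0.9.
# popsize and maxiter are much smaller than the paper's N = G = 10,000.
result = differential_evolution(
    fitness, bounds, strategy="rand1bin",
    mutation=0.5, recombination=0.9,
    popsize=20, maxiter=200, tol=1e-8, seed=1,
)
```

With a smooth toy objective like this, `result.x` converges to `target`; the real objective is multimodal, which is why the paper runs DE from several random seeds.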
Table 6. Search ranges for the DH method.

| Parameter | $\theta_1^c$ | $a_1^c$ | $\alpha_c$ | $\beta_c$ | $\gamma_c$ | | |
|---|---|---|---|---|---|---|---|
| Search range | [−30, 30] | [20, 100] | [−30, 30] | [−30, 30] | [−30, 30] | | |
| Parameter | $\theta_1^t$ | $a_1^t$ | $\alpha_t$ | $\beta_t$ | $\gamma_t$ | $x_p$ | $y_p$ |
| Search range | [0, 90] | [80, 200] | [−45, 45] | [−30, 30] | [−30, 30] | [−20, 20] | [−40, −10] |
Table 7. Optimization results of the six-DoF HTMs after first-stage optimization.

| Parameter | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 |
|---|---|---|---|---|---|
| $F^{1st}$ | 1.68 | 1.68 | 1.68 | 1.68 | 1.68 |
| $f_1^{1st}$ | 0.64 | 0.64 | 0.64 | 0.64 | 0.64 |
| $f_2^{1st}$ | 1.04 | 1.04 | 1.04 | 1.04 | 1.04 |
| $x_c$ | −2.09 | −2.09 | −2.09 | −2.09 | −2.09 |
| $y_c$ | 33.99 | 33.99 | 33.99 | 33.99 | 33.99 |
| $\alpha_c$ | −0.46 | −0.46 | −0.46 | −0.46 | −0.46 |
| $\beta_c$ | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
| $\gamma_c$ | 0.72 | 0.72 | 0.72 | 0.72 | 0.72 |
| $x_p$ | 193.33 | 193.33 | 193.33 | 193.33 | 193.33 |
| $y_p$ | 31.62 | 31.62 | 31.62 | 31.62 | 31.62 |
| $\alpha_t$ | 1.51 | 1.51 | −1.51 | −1.51 | −1.51 |
| $\beta_t$ | −11.93 | −11.93 | 11.93 | 11.93 | 11.93 |
| $\gamma_t$ | −1.26 | −1.26 | −1.26 | −1.26 | −1.26 |
| $x_t$ | −0.46 | −0.46 | −0.46 | −0.46 | −0.46 |
| $y_t$ | −22.27 | −22.27 | −22.27 | −22.27 | −22.27 |
Table 8. Optimization results of the DH method after first-stage optimization.

| Parameter | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 |
|---|---|---|---|---|---|
| $F^{1st}$ | 7.10 | 7.10 | 7.10 | 7.10 | 7.10 |
| $f_1^{1st}$ | 4.86 | 4.86 | 4.86 | 4.86 | 4.86 |
| $f_2^{1st}$ | 2.24 | 2.24 | 2.24 | 2.24 | 2.24 |
| $\theta_1^c$ | 11.79 | 5.67 | 1.99 | 2.45 | 13.22 |
| $a_1^c$ | 91.02 | 91.02 | 91.02 | 91.02 | 91.02 |
| $\alpha_c$ | 10.25 | 10.25 | 10.25 | 10.25 | 10.25 |
| $\beta_c$ | −0.51 | −0.51 | −0.51 | −0.51 | −0.51 |
| $\gamma_c$ | −13.07 | −6.95 | −3.27 | −3.73 | −14.50 |
| $\theta_1^t$ | 20.18 | 2.57 | 3.34 | 6.11 | 0.45 |
| $a_1^t$ | 94.79 | 94.79 | 94.79 | 94.79 | 94.79 |
| $\alpha_t$ | 35.09 | 35.09 | 35.09 | 35.09 | −35.09 |
| $\beta_t$ | −5.33 | −5.33 | −5.33 | −5.33 | 5.33 |
| $\gamma_t$ | −26.08 | −8.48 | −9.28 | −12.01 | −6.36 |
| $x_p$ | −0.86 | −0.86 | −0.86 | −0.86 | −0.86 |
| $y_p$ | −24.49 | −24.49 | −24.49 | −24.49 | −24.49 |
Table 9. Demonstration result after first-stage optimization.

| Mean Touching Error | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 | Average |
|---|---|---|---|---|---|---|
| in px | 6.47 | 6.80 | 6.62 | 6.38 | 6.64 | 6.58 |
| in mm | 1.23 | 1.29 | 1.26 | 1.21 | 1.26 | 1.25 |
Table 10. Optimization results in second-stage optimization for the six-DoF HTMs.

| Parameter | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 |
|---|---|---|---|---|---|
| $F^{2nd}$ | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| $f_1^{2nd}$ | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 |
| $f_2^{2nd}$ | 0.82 | 0.82 | 0.82 | 0.82 | 0.82 |
| $X_1$ | −0.11 | −0.11 | −0.11 | −0.11 | −0.11 |
| $Y_1$ | 0 | 0 | 0 | 0 | 0 |
| $S_{1x}$ | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| $S_{1y}$ | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 |
| $\theta_1$ | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 |
| $X_2$ | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 |
| $Y_2$ | −0.11 | −0.11 | −0.11 | −0.11 | −0.11 |
| $S_{2x}$ | 1.02 | 1.02 | 1.02 | 1.02 | 1.02 |
| $S_{2y}$ | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
| $\theta_2$ | −0.03 | −0.03 | −0.03 | −0.03 | −0.03 |
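Each second-stage affine matrix is parameterized by a translation $(X, Y)$, scales $(S_x, S_y)$, and a rotation $\theta$. A minimal sketch of building and applying such a matrix to a 2D position; the composition order (scale, then rotate, then translate) is an assumption, and the paper's exact parameterization may differ:

```python
import numpy as np

def affine_matrix(X, Y, sx, sy, theta_deg):
    """Build a 2D homogeneous affine matrix from translation (X, Y),
    scales (sx, sy), and rotation theta [deg].

    Assumed composition: scale, then rotate, then translate.
    """
    t = np.radians(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([
        [sx * c, -sy * s, X],
        [sx * s,  sy * c, Y],
        [0.0,     0.0,    1.0],
    ])

# Refine a first-stage position with the optimized parameters, e.g. the
# seed-1 six-DoF HTM result in Table 10 (X1, Y1, S1x, S1y, theta_1).
A1 = affine_matrix(-0.11, 0.0, 1.00, 1.02, 0.03)
p = np.array([289.0, 1.0, 1.0])   # a landmark position, homogeneous [mm]
p_refined = A1 @ p
```

Because the optimized scales are near 1 and the angles near 0 for the six-DoF HTMs, the compensation is a small correction, consistent with the drop in mean touching error from 1.25 mm to 1.01 mm.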
Table 11. Optimization results after second-stage optimization for the DH method.

| Parameter | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 |
|---|---|---|---|---|---|
| $F^{2nd}$ | 4.27 | 4.27 | 4.27 | 4.27 | 4.27 |
| $f_1^{2nd}$ | 1.73 | 1.73 | 1.73 | 1.73 | 1.73 |
| $f_2^{2nd}$ | 2.54 | 2.54 | 2.54 | 2.54 | 2.54 |
| $X_1$ | −0.18 | −0.18 | −0.18 | −0.18 | −0.18 |
| $Y_1$ | −0.28 | −0.28 | −0.28 | −0.28 | −0.28 |
| $S_{1x}$ | 1.03 | 1.03 | 1.03 | 1.03 | 1.03 |
| $S_{1y}$ | 0.89 | 0.89 | 0.89 | 0.89 | 0.89 |
| $\theta_1$ | 3.77 | 3.77 | 3.77 | 3.77 | 3.77 |
| $X_2$ | 0.41 | 0.41 | 0.41 | 0.41 | 0.41 |
| $Y_2$ | 2.58 | 2.58 | 2.58 | 2.58 | 2.58 |
| $S_{2x}$ | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 |
| $S_{2y}$ | 1.14 | 1.14 | 1.14 | 1.14 | 1.14 |
| $\theta_2$ | −4.51 | −4.51 | −4.51 | −4.51 | −4.51 |
Table 12. Demonstration results after second-stage optimization.

| Mean Touching Error | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 | Average |
|---|---|---|---|---|---|---|
| in px | 5.14 | 5.37 | 5.18 | 5.39 | 5.39 | 5.30 |
| in mm | 0.98 | 1.02 | 0.99 | 1.03 | 1.03 | 1.01 |

## Share and Cite

Sato, J. Hand–Eye Calibration Using a Tablet Computer. Math. Comput. Appl. 2023, 28, 22. https://doi.org/10.3390/mca28010022