Article

A Robust Texture-Less Gray-Scale Surface Matching Method Applied to a Liquid Crystal Display TV Diffuser Plate Assembly System

Sicong Li, Feng Zhu and Qingxiao Wu
1 Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences, Shenyang 110016, China
2 Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
3 Institutes of Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016, China
4 University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(5), 2019; https://doi.org/10.3390/app14052019
Submission received: 10 January 2024 / Revised: 16 February 2024 / Accepted: 26 February 2024 / Published: 29 February 2024
(This article belongs to the Special Issue Computer Vision in Automatic Detection and Identification)

Abstract

In most liquid crystal display (LCD) backlight modules (BLMs), diffuser plates (DPs) play an essential role in blurring the backlight. A common BLM consists of multiple superimposed optical films. In a vision-based automated assembly system, to ensure sufficient accuracy, each of several cameras usually shoots a local corner of the DP, and the target pose is estimated jointly, guiding the robot to assemble the DP on the BLM. DPs are typical texture-less objects with simple shapes. Because the image background consists of superimposed multilayer optical films, the robustness of the most common detection methods is insufficient for industrial needs. To solve this problem, a texture-less surface matching method based on gray-scale images is proposed. An augmented and normalized gray-scale vector represents the texture-less gray-scale surface in a low-dimensional space, and the cosine distance is used to calculate the similarity between the template and matching vectors. Combined with shape-based matching (SBM), the proposed method achieves high robustness when detecting DPs. An image database collected from actual production lines was used in the experiments. In comparative tests with the NCC, SBM, YOLOv5s, and YOLOv5x methods, the proposed method had the best precision at all confidence thresholds. Although its recall was slightly inferior to that of SBM, its comprehensive F1-Score reached 0.826, significantly outperforming the other methods. Regarding localizing accuracy, the proposed algorithm also performed best, with an average error of 5.7 pixels. Although a single prediction takes about 0.6 s, this still meets industrial needs. These results show that the proposed method is highly robust in detecting DPs and is especially suitable for vision-based automatic assembly tasks in BLM production.

1. Introduction

Robot vision automatic assembly technology dramatically improves the production efficiency of industrial assembly lines. Electronics factories require a fast production rhythm, and low labor involvement with high automation helps manufacturers increase production efficiency. In the automatic assembly of liquid crystal display (LCD) TV modules, the assembly of diffuser plates (DPs) is a necessary step. An LCD TV generally comprises two main parts, namely the backlight module (BLM) and the LCD module (LCM). Because most LCMs cannot emit light themselves, the BLM undertakes the role of lighting the LCD. As shown in Figure 1, a common BLM mainly comprises, from bottom to top, a back plate, light source strips, a light guide plate, and a piece of DP. In the assembly process, robots placed at different stations on the assembly line must complete tasks such as placing the back plates, assembling the light source strips, placing the light guide plates, and assembling the DPs. When assembling a piece of DP on a specific BLM, the group of vacuum suction cups mounted on the end effector of the robot blindly grasps the DP placed at the fixed feed position and moves it directly below the cameras for vision-based localizing. To ensure sufficient accuracy, each of the multiple cameras shoots a local corner of the DP, and together they estimate the pose.
A DP is a flat film made of polymethyl methacrylate with some roughness on the surface, which has the effect of diffusing light. Light scatters on the surface, spreading softly and evenly, and this behavior can be approximately considered to conform to the Lambertian diffuse model. A DP can therefore be considered a typical texture-less object. When the vision system begins detecting, the observed foreground comprises a homogeneous gray-scale surface and a right-angled edge with low gradient amplitude. Because the foreground overlaps the back plate and the superimposed optical films, many classical vision detection methods show poor robustness and low localizing accuracy.
Classical texture-less object detection methods can be divided into template-matching-based and learning-based categories according to how features are obtained. Learning-based methods generally extract features automatically from a large set of training images for online use. In contrast, template-based matching methods mainly use handcrafted features to build the templates that participate in matching.
There are two main types of learning-based methods: traditional machine learning and deep learning. Hough forest [1,2], as a machine learning method, integrates Random Forest [3] and Hough voting. The image patch representations employed by this method are typically based on descriptors such as edge filters or HOG [4], which have limitations in describing texture-less surfaces. The Deformable Part Model (DPM) [5] represents an object as a configuration of multiple flexible parts. It comprises a root filter and several part models, which are processed by an SVM [6] classifier. While this method excels in handling deformed targets, it is not optimal for diffuser plates.
Deep learning methods, such as RetinaNet [7] and YOLO [8,9,10], are convolutional neural network (CNN)-based object detection frameworks. RetinaNet takes ResNet [11] as its backbone and has a Feature Pyramid Network (FPN) structure [12]; it can merge multi-scale features so that it performs well in detecting objects of different sizes. YOLO is a series of classical one-stage object detection approaches; each version focuses on lightweight design and sound engineering, making it especially useful in industrial applications. However, as a single-stage detector, its localizing accuracy is relatively poor. Compared with CNN-based methods, transformer-based methods such as Vision Transformer (ViT) [13] and Swin Transformer [14] usually have higher model complexity and slower training speed. Detection Transformer (DETR) [15,16] utilizes a CNN backbone to extract local features and an encoder–decoder module to learn global features. The main drawbacks of deep learning-based methods are that they require a large set of training images, a lengthy training time, and more computing resources, sometimes even high-performance GPUs, which are expensive in industrial applications and make the system more vulnerable in challenging physical environments.
Template-based methods have lower training demands: in most cases a single image is enough, and the time cost is low, so they are still widely used in industry. Normalized cross-correlation (NCC) [17] measures object similarity by comparing gray-value changes. It normalizes a single-image template by its standard deviation and computes the cosine distance with candidate vectors. However, this approach may mistakenly treat noise on the diffuser plate surface as a pattern, affecting detection robustness. The shape-based template matching algorithm (SBM) [18] computes the inner product of gradient direction vectors to locate objects using edge features. It can achieve high accuracy and robustness, and industry leaders such as Cognex, MVTec, and Keyence have developed several adaptations of this algorithm. However, the representations of SBM depend entirely on edge gradients: for targets such as diffuser plates, surface features are completely ignored, and false detections are likely when similar objects appear in the scene. Hinterstoisser proposed LINEMod [19,20] to detect texture-less objects by combining two representations. Gradient spreading was introduced on top of SBM for shape representation, strengthening the method's robustness against deformation. The surface representation, however, requires depth information or point cloud data, benefiting from a 3D camera, and employs normal vectors to define features. The main problem of LINEMod is that it must be equipped with a 3D camera when dealing with texture-less surfaces; with a 2D camera, it degenerates into the LINE2D [19] algorithm, using only edge features.
Inspired by LINEMod, a representation method based on gray-scale images is proposed, which can be utilized to describe texture-less surfaces. When combined with SBM, this representation can robustly detect diffuser plates.

2. Proposed Method

2.1. Overview

Our proposed method consists of two phases—the offline phase and the online phase—as shown in Figure 2.
In the offline phase, the Canny algorithm is used to extract a one-pixel-wide edge from a specific training image, and the gradient orientations are obtained to build a shape template. Within the selected Region Of Interest (ROI), a second template is built from a feature vector of the texture-less gray-scale surface.
In the online phase, Sobel gradient detection is performed on the input image to acquire the gradient orientation image, on which shape-based matching is first performed to obtain candidates. Next, texture-less gray-scale surface matching is performed on these candidates, and the comprehensive similarity is estimated under the hybrid representations. After the NMS process, the detection outputs are obtained.

2.2. Shape Similarity Measure

Although not unique in a cluttered background, the DP's corner edges are still useful for localizing. The same shape metric as in SBM is employed: for each point participating in matching, the angle between the gradient vector in the template and that of the corresponding point in the image is calculated, and its cosine gives the point similarity. Finally, the point similarities are averaged over the template to obtain a candidate score as follows:
$$F_{shape}(x) = \frac{1}{n}\sum_{i=1}^{n}\cos\left(\theta_i - \bar{\theta}_i\right)$$
where $\theta_i$ and $\bar{\theta}_i$ are the angles of the gradient vectors belonging to point i in the template and the corresponding point in the image, respectively, and n is the number of points in the template.
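For illustration, the following minimal C++ sketch computes this score from per-point gradient angles; the flat-array layout and the function name shapeSimilarity are hypothetical simplifications, not the authors' implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Shape similarity: mean cosine of the angle difference between each template
// gradient orientation and the corresponding image orientation.
// Angles are given in degrees; both vectors must have the same length n.
double shapeSimilarity(const std::vector<double>& templateAngles,
                       const std::vector<double>& imageAngles) {
    const double kDegToRad = 3.14159265358979323846 / 180.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < templateAngles.size(); ++i) {
        sum += std::cos((templateAngles[i] - imageAngles[i]) * kDegToRad);
    }
    return sum / static_cast<double>(templateAngles.size());
}
```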
In this study, some implementation changes were made. The Canny algorithm was used to extract the one-pixel edge of the template image to obtain the gradient orientation image:
$$T(u,v) = 0.25\,\big(\theta(u,v) + 360\big),$$
$$\theta = \arctan\left(g_u, g_v\right).$$
Here, $g_u$ and $g_v$ are the gradient amplitudes obtained by applying the $3 \times 3$ Sobel operators over the image $I$:
$$g_u = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * I, \qquad g_v = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * I.$$
Each point in $\theta$ holds the original gradient orientation. The orientation values are squeezed from a range of 0–360° to 0–90 with a minimum resolution of 4 degrees, enabling their storage within a single-channel 8-bit image to obtain a gradient orientation image. When building templates offline, a bias of 360° is added to the angle of each point in advance to ensure that the difference remains greater than zero when the angle of the matching point is subtracted during online matching. As shown in Figure 3, in this way the positive angle difference can be used directly as an index into a 2D array to search entries in a look-up table (LUT). A value of 255 is reserved to indicate the absence of gradient orientation at a point, i.e., that the gradient amplitude is zero.
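The quantization step can be sketched as follows, assuming the Sobel gradients are already available; atan2 is used here as the two-argument arctangent, and 255 is reserved as the "no gradient" sentinel as described above. The function name and signature are illustrative only.

```cpp
#include <cmath>
#include <cstdint>

// Quantize a gradient orientation (0-360 deg) into an 8-bit code with a
// 4-degree resolution. For template points a +360 deg bias is added first,
// so that the online difference (template code - image code) stays >= 0
// and can be used directly as a look-up-table index.
std::uint8_t quantizeOrientation(double gu, double gv, bool isTemplate) {
    if (gu == 0.0 && gv == 0.0) return 255;              // no gradient at this pixel
    const double kPi = 3.14159265358979323846;
    double theta = std::atan2(gv, gu) * 180.0 / kPi;     // (-180, 180]
    if (theta < 0.0) theta += 360.0;                     // [0, 360)
    if (isTemplate) theta += 360.0;                      // offline bias
    return static_cast<std::uint8_t>(0.25 * theta);      // 4-degree bins
}
```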
During template building, it is necessary to record the coordinate offset of each template edge point relative to the template center, the gradient angle of each point, and an LUT, which is essentially an array used to directly index the cosine of the difference angle between two matching gradient vectors. During online matching, the template center slides across the entire image, and the coordinate offsets are used to find the corresponding points for point-to-point matching. The cosine score of each point is accumulated to obtain the similarity score of the target at the current position. Finally, the maximum similarity score over the entire image determines the final detection position.
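A possible arrangement of the template data, the LUT, and the sliding-window search is sketched below; the TemplatePoint/Match structures and the exhaustive single-scale search are simplifying assumptions rather than the authors' optimized implementation.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

struct TemplatePoint { int dx, dy; std::uint8_t code; }; // offset from center + biased orientation code
struct Match { int x = 0, y = 0; double score = -1.0; };

// Precompute cos(d * 4 deg) for every non-negative code difference d.
std::vector<double> buildCosineLut() {
    const double kPi = 3.14159265358979323846;
    std::vector<double> lut(256, 0.0);
    for (int d = 0; d < 256; ++d) lut[d] = std::cos(d * 4.0 * kPi / 180.0);
    return lut;
}

// Slide the template center over the orientation image and keep the best score.
Match matchShape(const std::vector<std::uint8_t>& orientImg, int width, int height,
                 const std::vector<TemplatePoint>& tmpl) {
    const std::vector<double> lut = buildCosineLut();
    Match best;
    for (int cy = 0; cy < height; ++cy) {
        for (int cx = 0; cx < width; ++cx) {
            double sum = 0.0;
            for (const TemplatePoint& p : tmpl) {
                const int x = cx + p.dx, y = cy + p.dy;
                if (x < 0 || y < 0 || x >= width || y >= height) continue;
                const std::uint8_t code = orientImg[y * width + x];
                if (code == 255) continue;               // no gradient at this pixel
                sum += lut[p.code - code];               // biased template code keeps index >= 0
            }
            const double score = sum / static_cast<double>(tmpl.size());
            if (score > best.score) { best.x = cx; best.y = cy; best.score = score; }
        }
    }
    return best;
}
```

Because the template code carries the +360° bias and the cosine is periodic, lut[templateCode − imageCode] evaluates cos(θ_template − θ_image) directly, up to quantization error.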

2.3. Texture-Less Surface Similarity Measure

The NCC method can measure the similarity of two gray-scale images. It treats an image template as a gray-scale vector of length n; each dimension carries the gray-scale difference from the mean value, which is divided by the standard deviation to construct a normalized feature vector. The cosine distance between the two feature vectors is computed to obtain the matching score. As shown in Figure 4a, the representation used by NCC treats gray-scale changes as patterns, so it is better suited to detecting textured targets. Ideally, the gray value across a texture-less surface should be constant; in practice, it still changes to a certain extent due to varying lighting, charge-coupled device (CCD) noise, and the surface material. This change is noise that cannot be used to represent the pattern of an object, but NCC makes this mistake regardless.
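For reference, a minimal sketch of this NCC score (mean-centred, standard-deviation-normalized cosine distance) is given below; it is an illustration only, not HALCON's implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized cross-correlation between two gray-scale vectors of equal length:
// each vector is mean-centred and divided by its standard deviation, and the
// score is the cosine (inner product) of the two normalized vectors.
double nccScore(const std::vector<double>& a, const std::vector<double>& b) {
    const std::size_t n = a.size();
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < n; ++i) { meanA += a[i]; meanB += b[i]; }
    meanA /= n; meanB /= n;
    double varA = 0.0, varB = 0.0, cross = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        varA += (a[i] - meanA) * (a[i] - meanA);
        varB += (b[i] - meanB) * (b[i] - meanB);
        cross += (a[i] - meanA) * (b[i] - meanB);
    }
    const double denom = std::sqrt(varA) * std::sqrt(varB);
    return (denom > 0.0) ? cross / denom : 0.0;          // score in [-1, 1]
}
```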
A pattern with a definite texture is easy to define; however, this is challenging in the case of a texture-less one. Considering the difficulty in representing an n-dimensional texture-less gray-scale surface using a same-size vector, a method of augmenting the vector was sought to express it in a higher dimension. Under the action of an augmented entry, a ruffled surface with noise can be flattened to a plane. As shown in Figure 4b, the augmented data increase the standard deviation, so the noise appears negligible after the template vector is normalized by the new standard deviation, allowing the augmented vector to characterize the surface as a texture-less one.

2.3.1. Global Representation

Let $p = [q_1, q_2, \ldots, q_n, 0]$ be the original vector composed of the gray-scale points within the image. The desired $q_{n+1}$ must be obtained such that the inner product between the augmented vector $q = [q_1, q_2, \ldots, q_n, q_{n+1}]$ and the normal vector $n = [0, 0, \ldots, 0, 1]$ of the ideal plane exceeds $\tau_{score}$. As shown in Figure 5, the initially ruffled gray-scale surface is extended to an approximate plane under the action of $q_{n+1}$. A larger $q_{n+1}$ is not always better: the specific value must be found such that the similarity of the two vectors just meets the threshold $\tau_{score}$, because an excessive $q_{n+1}$ would flatten all textured surfaces as well.
The mean $\mu_p$ and standard deviation $\delta_p$ of p are
$$\mu_p = \frac{\sum_{i=1}^{n} q_i}{n+1}, \qquad \delta_p = \sqrt{\frac{\sum_{i=1}^{n}\left(q_i - \mu_p\right)^2 + \mu_p^2}{n+1}}.$$
The mean $\mu_q$ and standard deviation $\delta_q$ of q are
$$\mu_q = \frac{\sum_{i=1}^{n} q_i + q_{n+1}}{n+1}, \qquad \delta_q = \sqrt{\frac{\sum_{i=1}^{n+1}\left(q_i - \mu_q\right)^2}{n+1}}.$$
Let $k\,q_{n+1} = \sum_{i=1}^{n} q_i$; then,
$$\mu_q = \lambda\, q_{n+1}, \qquad \delta_q = \sqrt{\frac{\sum_{i=1}^{n+1}\left(q_i - \lambda q_{n+1}\right)^2}{n+1}}, \qquad \lambda = \frac{k+1}{n+1}.$$
There is a problem when finding $q_{n+1}$: the gradient of the cosine function is small when the angle approaches zero, so the similarity function becomes insensitive to $q_{n+1}$ as the augmented vector and the ideal normal vector change from crossing to parallel. Therefore, the solution is obtained by calculating the perpendicularity, Equation (6), between $q$ and $\bar{n} = [n_1, n_2, \ldots, n_n, 0]$, where $|\bar{n}| = 1$. Here, $\bar{n}$ represents a specific vector in the $\mathbb{R}^n$ plane.
$$T_{global}(x) = \langle \bar{n}, q \rangle = \frac{\sum_{i=1}^{n}\left(q_i - \mu_p\right)\left(q_i - \mu_q\right) + \left(0 - \mu_p\right)\left(q_{n+1} - \mu_q\right)}{\delta_p\, \delta_q\, (n+1)}.$$
As shown in Figure 6, this global representation is suitable when the gray values change substantially across the whole surface region. It is unsuitable for local changes, even dramatic ones, because a local change must be much larger before it affects the mean and variance.
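The sketch below evaluates Equation (6) literally for a gray-scale patch and an augmentation entry q_{n+1}, together with a naive search that grows q_{n+1} until the perpendicularity score drops to a chosen target. The search strategy and the exact mapping between τ_score and this target are assumptions, as the text does not spell them out.

```cpp
#include <cmath>
#include <vector>

// T_global as in Equation (6): normalized correlation between the zero-padded
// vector p = [q_1..q_n, 0] and the augmented vector q = [q_1..q_n, q_{n+1}].
double tGlobal(const std::vector<double>& gray, double qAug) {
    const double n = static_cast<double>(gray.size());
    double sum = 0.0;
    for (double v : gray) sum += v;
    const double muP = sum / (n + 1.0);
    const double muQ = (sum + qAug) / (n + 1.0);
    double varP = 0.0, varQ = 0.0, cross = 0.0;
    for (double v : gray) {
        varP += (v - muP) * (v - muP);
        varQ += (v - muQ) * (v - muQ);
        cross += (v - muP) * (v - muQ);
    }
    varP += muP * muP;                                    // last entry of p is 0
    varQ += (qAug - muQ) * (qAug - muQ);                  // last entry of q is q_{n+1}
    cross += (0.0 - muP) * (qAug - muQ);
    const double deltaP = std::sqrt(varP / (n + 1.0));
    const double deltaQ = std::sqrt(varQ / (n + 1.0));
    return cross / (deltaP * deltaQ * (n + 1.0));
}

// Offline sketch: grow q_{n+1} until the perpendicularity score falls to a
// target value related to the flatness threshold tau_score (simple linear
// search for illustration; the paper does not specify the search strategy).
double findAugmentation(const std::vector<double>& gray, double target, double step = 1.0) {
    double qAug = 0.0;
    while (tGlobal(gray, qAug) > target && qAug < 1e6) qAug += step;
    return qAug;
}
```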

2.3.2. Local Representation

Some image patches can exhibit drastic local changes that are weak globally. A local change can be an unwanted shape appearing at a certain position that has little impact on the mean and standard deviation. Therefore, a solution for these cases must be found.
For shape-based matching, if the gradient orientations of the image and the template are consistent at a certain point, that point can be assumed to be part of the expected shape. However, it is also important to discern whether a specific point belongs to some unexpected shape.
As shown in Figure 7, the gradient at point $(x, y)$ in the image is $g = [g_x, g_y]$, and the normalized augmented gradient is obtained as
$$\bar{g} = \frac{[g_x, g_y, g_z]}{\mu}, \qquad \mu = \sqrt{g_x^2 + g_y^2 + g_z^2}.$$
The inner product of $\bar{g}$ and $v = [0, 0, 1]$ shows the strength of the texture around a point. The perpendicularity between $\bar{g}$ and $\tilde{v}$ is utilized to determine $g_z$:
$$\tilde{v} = \frac{[g_x, g_y, 0]}{\gamma}, \qquad \gamma = \sqrt{g_x^2 + g_y^2}.$$
When processing an image, Sobel detection is performed first, and then, the n gradient vectors are acquired and augmented, where  g z  can be found as follows:
$$g_z^{*} = \arg\min\left\{ g_z \;\middle|\; \sum_{i=1}^{n} \frac{g_x^2 + g_y^2}{\mu\,\gamma} \leq \tau_{grad} \right\},$$
where $\tau_{grad}$ is set to 0.98.
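A minimal sketch of this local term is given below: it sums the per-point perpendicularity γ/μ for a candidate g_z and searches for the smallest g_z whose sum reaches τ_grad. The coarse linear search and the Gradient2D layout are illustrative assumptions, not the authors' implementation.

```cpp
#include <cmath>
#include <vector>

struct Gradient2D { double gx, gy; };

// Summed perpendicularity between each augmented, normalized gradient
// [gx, gy, gz] / mu and its in-plane direction [gx, gy, 0] / gamma.
double summedPerpendicularity(const std::vector<Gradient2D>& grads, double gz) {
    double sum = 0.0;
    for (const Gradient2D& g : grads) {
        const double gamma = std::sqrt(g.gx * g.gx + g.gy * g.gy);
        if (gamma == 0.0) continue;                       // no in-plane gradient
        const double mu = std::sqrt(g.gx * g.gx + g.gy * g.gy + gz * gz);
        sum += (g.gx * g.gx + g.gy * g.gy) / (mu * gamma); // equals gamma / mu
    }
    return sum;
}

// Smallest gz whose summed perpendicularity drops to tauGrad (0.98 in the
// paper); a coarse linear search is used purely for illustration.
double findGz(const std::vector<Gradient2D>& grads, double tauGrad = 0.98, double step = 1.0) {
    double gz = 0.0;
    while (summedPerpendicularity(grads, gz) > tauGrad && gz < 1e6) gz += step;
    return gz;
}
```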

2.4. Hybrid Similarity Measure

The comprehensive similarity is evaluated in the form of LINEMod, with modifications only made to the 3D surface component:
$$U(x) = \lambda\, F_{shape}(x) + (1-\lambda)\, T_{global}(x) + T_{local}^{m}(x),$$
where $T_{local}^{m}$ sums only the inner products of the top-m points, with m equal to the number of points used in the shape representation.
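Assuming the shape score, the global surface score, and the per-point local inner products are already available, the hybrid score could be combined as sketched below; λ and any normalization of the local term are not specified in the text, so they appear here as assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hybrid score: weighted shape term plus texture-less surface terms.
// T_local^m sums only the m largest local inner products, with m equal to the
// number of points used in the shape template.
double hybridScore(double fShape, double tGlobal,
                   std::vector<double> localScores,      // per-point inner products
                   std::size_t m, double lambda = 0.5) { // lambda: assumed weight
    std::sort(localScores.begin(), localScores.end(), std::greater<double>());
    double tLocalM = 0.0;
    const std::size_t top = std::min(m, localScores.size());
    for (std::size_t i = 0; i < top; ++i) tLocalM += localScores[i];
    // Weighting mirrors the hybrid formula as reconstructed above.
    return lambda * fShape + (1.0 - lambda) * tGlobal + tLocalM;
}
```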

3. System Configuration

As shown in Figure 8, the system contains an ABB-IRB6700 6-axis industrial robot (1) and a suction cup matrix (2) installed on the end of the robot to grab the diffuser plate (5). The vision system mainly consists of a SPES-CISRTLU-LW01 computer and two Basler2500-16gm cameras with ring-shaped lights (3) installed on horizontal 2D mobile servo modules (4), which can adapt to different BLM (6) sizes. The vision software (Version 3.2.2) interface is shown in Figure 9.
The assembling process consists of offline and online phases.
Offline phase: (a) Template building: one image from each camera is used to build a template of the DP corner. (b) Calibration: the two cameras observe the robot grasping the DP and performing one translation and one rotation each; through this process, the transformation from the image coordinate system to the robot coordinate system is calibrated.
Online phase (Figure 10): (a) When the BLM finishes the previous process on the assembly line and stops at the current station, the robot receives the assembly signal from the PLC and controls the servo module to carry the cameras to the pre-calibrated position. (b) The cameras first acquire images and detect the outer corners of the BLM, jointly estimate the BLM pose, and send the pose to the robot. (c) The robot blindly picks up the DP from the loading area and moves to a fixed position under the cameras; the cameras then detect the DP corners and jointly estimate the DP pose, which is sent to guide the robot in aligning the DP to the BLM. This aligning process runs several times until the deviation between the DP and the BLM is small enough, as sketched below.
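A hedged sketch of this closed-loop alignment in step (c) follows; Pose, estimateDpPose, and moveRobotBy are placeholders for the real vision and robot interfaces, and the tolerances are illustrative values only.

```cpp
#include <cmath>
#include <functional>

struct Pose { double x, y, theta; };                      // planar pose in the robot frame

// Iterative visual alignment: the cameras re-estimate the DP pose and the
// robot corrects its position until the deviation from the BLM pose is small
// enough or the iteration limit is reached.
void alignDpToBlm(const Pose& blmPose,
                  const std::function<Pose()>& estimateDpPose,
                  const std::function<void(const Pose&)>& moveRobotBy,
                  double posTol = 0.2, double angTol = 0.05, int maxIter = 5) {
    for (int i = 0; i < maxIter; ++i) {
        const Pose dp = estimateDpPose();                 // joint estimate from both cameras
        const Pose dev{blmPose.x - dp.x, blmPose.y - dp.y, blmPose.theta - dp.theta};
        if (std::hypot(dev.x, dev.y) < posTol && std::fabs(dev.theta) < angTol)
            break;                                        // deviation small enough: place the DP
        moveRobotBy(dev);                                 // correct and re-check
    }
}
```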

4. Experiment

The proposed method was implemented in C++, and the software interface was programmed using Qt 5.7 on Windows 10. The computer's CPU was an Intel Core i7. A dataset consisting of 1082 images with a resolution of 2592 × 1944 was collected from actual production lines. Details of the dataset are given in Table 1; the dataset was divided according to the size of the DP and the camera index.
As shown in Figure 11, four methods were used in the comparative test, covering both template matching and deep learning. NCC and SBM were run in the machine vision software HALCON (version 21.05) [21], which is widely used in industry. YOLOv5-v7.0 [22] was chosen as the deep learning approach because it is lightweight, easy to deploy at the edge, and suitable for industrial scenarios; both the most efficient variant (YOLOv5s) and the best-performing variant (YOLOv5x) were used. Obtaining enough training samples before mass production is challenging in actual industrial scenarios, so data augmentation was employed for YOLOv5 to ensure fairness and to achieve one-shot learning.
As shown in Figure 12, a cut-and-paste approach was used to paste objects onto ten distinct backgrounds, some from the ITODD dataset [23] and some from industrial scenarios. Translation, small rotations, and lightness changes were applied, resulting in an augmented training set of 1000 images and a validation set of 200 images.
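A simplified sketch of the cut-and-paste augmentation is given below; it applies only a random translation and lightness shift (rotation is omitted for brevity), and the GrayImage/Box structures are hypothetical, not the tooling actually used.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

struct GrayImage { int width = 0, height = 0; std::vector<std::uint8_t> pixels; };
struct Box { int x, y, w, h; };                           // pasted region, used as the training label

// Paste an object patch onto a background at a random offset with a random
// brightness shift. Assumes the background is larger than the object patch.
Box pasteObject(GrayImage& background, const GrayImage& object, std::mt19937& rng) {
    std::uniform_int_distribution<int> dx(0, background.width - object.width);
    std::uniform_int_distribution<int> dy(0, background.height - object.height);
    std::uniform_int_distribution<int> dLight(-30, 30);   // lightness change (assumed range)
    const int ox = dx(rng), oy = dy(rng), shift = dLight(rng);
    for (int y = 0; y < object.height; ++y) {
        for (int x = 0; x < object.width; ++x) {
            const int v = object.pixels[y * object.width + x] + shift;
            background.pixels[(oy + y) * background.width + (ox + x)] =
                static_cast<std::uint8_t>(std::clamp(v, 0, 255));
        }
    }
    return Box{ox, oy, object.width, object.height};
}
```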

4.1. Recall and Precision

The confidence threshold is a critical parameter that directly affects the detection result: a lower threshold increases both the probability of detecting the target and the false detection rate. In this experiment, the confidence threshold was sampled from 30% to 90% at 5% intervals, and recall and precision were counted whenever the distance between the detection result and the labeled sample was less than 30 pixels. As shown in Figure 13, when the threshold is below 50%, the recall of the proposed method is the best, although it declines slightly faster than that of SBM as the threshold increases. Regarding precision, the proposed method is significantly better than SBM and NCC and only slightly inferior to YOLOv5 in the threshold range of 30–75%.
SBM has the highest recall because it uses only edge features, which are sparser than those of the other methods. Although this dramatically improves recall, it sacrifices precision, and many false positives appeared in the test. As can be seen from Table 2, the false positive rate (FPR) of the proposed method is lower than that of SBM and NCC over most of the range. YOLOv5s and YOLOv5x have both lower FPR and lower TPR, which means they produce fewer positive outputs.
The F1-Score is a comprehensive evaluation metric that takes both recall and precision into account. The F1-Score of the proposed method is significantly better than that of the other four methods, with a relatively significant drop occurring only when the threshold exceeds 90%.
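For clarity, the F1-Score used here is the harmonic mean of precision and recall, computed per confidence threshold from the counted detections; a minimal helper is sketched below (the function name is illustrative).

```cpp
// F1-Score from per-threshold counts of true positives, false positives, and
// false negatives: the harmonic mean of precision and recall.
double f1Score(int truePositives, int falsePositives, int falseNegatives) {
    if (truePositives == 0) return 0.0;                   // precision or recall would be zero
    const double precision = truePositives /
        static_cast<double>(truePositives + falsePositives);
    const double recall = truePositives /
        static_cast<double>(truePositives + falseNegatives);
    return 2.0 * precision * recall / (precision + recall);
}
```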

4.2. Localizing Accuracy

In this study, the localizing accuracy $e_{accu}$ is defined as the pixel distance between the detected point $P_{det}$ and the labeled point $P_{lab}$:
$$e_{accu} = \min\left( \left\| P_{det} - P_{lab} \right\|,\; d_{thre} \right).$$
The distance threshold $d_{thre}$ serves as an upper limit that prevents a single outlier from excessively influencing the average positioning accuracy. This metric plays a pivotal role in determining the overall accuracy of the robot's final assembly.
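A small helper implementing this clipped error and its average over a set of matched detections is sketched below; the Point structure and function names are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Per-detection localizing error, clipped at dThre so that a single outlier
// does not dominate the average (30 pixels in the experiments).
double localizingError(const Point& detected, const Point& labeled, double dThre = 30.0) {
    const double dist = std::hypot(detected.x - labeled.x, detected.y - labeled.y);
    return std::min(dist, dThre);
}

// Average clipped error over matched detections and labels (same ordering).
double averageError(const std::vector<Point>& detections,
                    const std::vector<Point>& labels, double dThre = 30.0) {
    if (detections.empty()) return 0.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < detections.size(); ++i)
        sum += localizingError(detections[i], labels[i], dThre);
    return sum / static_cast<double>(detections.size());
}
```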
As can be seen in Table 3, with an upper limit of 30 pixels, the average pixel accuracy of the proposed algorithm is the best among all the tested algorithms. The results also show that YOLOv5s and YOLOv5x performed better than SBM and NCC. As shown in Figure 14, the YOLOv5 methods exhibit inconsistencies in the size of the detection bounding box and frequently show irregular shifts of the detection position near the ground truth, which reduces their accuracy to a certain degree.

4.3. Computation Efficiency

Computational efficiency is also a critical performance metric in industrial applications. In the diffuser plate automatic assembly task, the average time allowed for the whole process is 30 s, and vision detection must be completed within 1.5 s. As can be seen in Table 4, all the methods used in the test meet this requirement. Among them, the proposed approach is relatively time-consuming, second only to YOLOv5x. SBM and NCC have the highest efficiency because HALCON is professional industrial machine vision software that applies extensive parallel computing optimization; although their complexity is close to that of the proposed method, their time consumption is significantly smaller. For the YOLOv5 methods, a low-power NVIDIA GTX1650 graphics card was employed, and inference tests using only the CPU were also conducted. The training time of YOLOv5 is relatively long, which considerably limits its application in industrial scenarios.

5. Discussion

The proposed method offers a representation tailored to texture-less gray-scale surfaces and, combined with the SBM metric, achieves enhanced robustness in detecting diffuser plates. Nevertheless, its reliance solely on gray-scale values for surface feature extraction means it may not adequately capture intricate textures of the target, and, to a certain degree, excessive brightness variations on the surface can undermine the method's robustness. Additionally, this approach hinges on a single template image rather than learning representations from a diverse set of images; if there is a substantial discrepancy in gray-scale variance between the training template and the test images, the detection performance may still suffer to a certain extent.

6. Conclusions and Future Works

This paper provides a representation of the texture-less gray-scale surface of a diffuser plate so that a vision system can accurately detect a diffuser plate corner and assist a robot in performing the automatic assembly task for an LCD TV backlight module. The proposed method can accurately detect the corner of the diffuser plate against the cluttered background of the backlight module without inserting a customized plate between the foreground and background to block distractions or using a backlight source to enhance the foreground features, which significantly improves the efficiency of the system without adding much cost. In the experimental test, the proposed algorithm was compared to both deep learning-based methods and the traditional template matching methods commonly employed in industry, and all test images originated from practical production lines. The proposed method is significantly better than the other methods in terms of the comprehensive F1-Score and also performs best in terms of localizing accuracy. Although its inference time could be improved, it fully meets production requirements. In summary, the proposed method is especially suitable for the automatic assembly of diffuser plates for LCD TV backlight modules.
Although the proposed method was developed for diffuser plate detection, it is inherently generic. In future work, we intend to continue refining the current method and adapt it to detect the most common texture-less workpieces in industrial scenes. Additionally, we plan to integrate CAD models with physically based rendering techniques so that template building no longer requires genuine images.

Author Contributions

Conceptualization, S.L.; methodology, S.L.; software, S.L.; validation, S.L., F.Z. and Q.W.; formal analysis, S.L.; investigation, S.L. and Q.W.; resources, F.Z. and Q.W.; data curation, S.L.; writing—original draft preparation, S.L.; writing—review and editing, S.L., F.Z. and Q.W.; visualization, S.L.; supervision, F.Z. and Q.W.; funding acquisition, F.Z. and Q.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. U171320067).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Acknowledgments

We thank the anonymous reviewers for their valuable comments on our manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gall, J.; Lempitsky, V. Class-specific Hough forests for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009. [Google Scholar]
  2. Gall, J.; Yao, A.; Razavi, N.; Van Gool, L.; Lempitsky, V. Hough Forests for Object Detection, Tracking, and Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 2188–2202. [Google Scholar] [CrossRef] [PubMed]
  3. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar]
  4. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005; Volume 1, pp. 886–893. [Google Scholar]
  5. Sadeghi, M.A.; Forsyth, D. 30hz object detection with dpm v5. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 65–79. [Google Scholar]
  6. Andrews, S.; Tsochantaridis, I.; Hofmann, T. Support Vector Machines for Multiple-Instance Learning. Adv. Neural Inf. Process. Syst. 2003, 15, 561–568. [Google Scholar]
  7. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Doll, P. Focal loss for dense object detection. In Proceedings of the International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  8. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  9. Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2778–2788. [Google Scholar]
  10. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. In Proceedings of the Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
  11. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  12. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  13. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. In Proceedings of the Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  14. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  15. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229. [Google Scholar]
  16. Lv, W.; Xu, S.; Zhao, Y.; Wang, G.; Wei, J.; Cui, C.; Du, Y.; Dang, Q.; Liu, Y. Detrs beat yolos on real-time object detection. In Proceedings of the Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
  17. Han, B.; Mu, Z.; Le, X.; Jia, X.; Li, B. Fast recurrence algorithm for computing sub-Image energy using normalized cross correlation. Opt. Precis. Eng. 2018, 26, 2565–2574. [Google Scholar]
  18. Steger, C. Occlusion, clutter, and illumination invariant object recognition. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2002, 34, 345–350. [Google Scholar]
  19. Hinterstoisser, S.; Cagniart, C.; Ilic, S.; Sturm, P.; Navab, N.; Fua, P.; Lepetit, V. Gradient response maps for real-time detection of textureless objects. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 876–888. [Google Scholar] [CrossRef] [PubMed]
  20. Hinterstoisser, S.; Lepetit, V.; Ilic, S.; Holzer, S.; Bradski, G.; Konolige, K.; Navab, N. Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes. In Proceedings of the Asian Conference on Computer Vision, Daejeon, Republic of Korea, 5–9 November 2013; pp. 548–562. [Google Scholar]
  21. MVTec. HALCON. 2021. Available online: https://www.mvtec.com/halcon/ (accessed on 10 January 2023).
  22. Ultralytics. YOLOv5-v7.0. Available online: https://github.com/ultralytics/yolov5 (accessed on 10 January 2023).
  23. Drost, B.; Ulrich, M.; Bergmann, P.; Hartinger, P.; Steger, C. Introducing mvtec itodd-a dataset for 3d object recognition in industry. In Proceedings of the International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 2200–2208. [Google Scholar]
Figure 1. The composition of a BLM.
Figure 2. Flow of the proposed method. The arrows indicate the flow direction of the process. The blue and yellow blocks belong to the offline phase; the red and pink blocks belong to the online phase.
Figure 3. Shape matching based on gradient orientation image. The green boxes correspond to the template area, the red box corresponds to the matching area, the blue box represents the gradient orientation difference area. The arrows indicate the flow direction of the method process.
Figure 4. (a) NCC vector representation of texture-less surface; (b) augmented NCC vector representation of texture-less surface.
Figure 5. Augmentation functions for the gray-scale surface. The green arrows represent the original vectors, the blue arrows represent the original components, the red arrows represent the augmented components, and the yellow arrow represents the augmented vector.
Figure 6. (a) Gray-scale changes are drastic and global; (b) local variation is drastic, but it is subtle from a global point of view. The green box represents the sampling area of grayscale surface.
Figure 7. Augmentation functions on gradient vector.
Figure 8. (a) Configuration of robot vision automatic assembling system; (b) captured image of DP corner. (1) ABB-IRB6700 6-axis industrial robot, (2) suction cup matrix, (3) Basler2500-16gm cameras with ring-shaped lights, (4) horizontal 2D mobile servo modules, (5) diffuser plate, (6) BLM.
Figure 9. Vision software interface. Green and yellow colors are used by the software to help the operator identify important information, green for correct information and yellow for warning information.
Figure 10. Online phase of assembling DP on BLM. The purple color represents input and output blocks, the blue color represents the blocks of robots or other mechanical mechanisms, the orange represents the blocks of the vision system, the green color represents a fork in the process.
Figure 11. Test methods used in the experiment. The green edges in the Proposed Method column represent the shape features, and the yellow boxes represent areas of texture-less features; the green box in the NCC column represents the template area; the green edges in the SBM column represent the shape features; the red boxes in the YOLOv5s and YOLOv5x columns represent the output bounding boxes; and the red label in the last row means no object was detected.
Figure 12. Augmented images for YOLOv5s and YOLOv5x. The first row of augmented images is for training YOLOv5s, while the second row is for training YOLOv5x. The green boxes indicate the labels used for training.
Figure 13. Recall, precision, and F1-Score of the 5 test methods.
Figure 14. Cases of low accuracy of YOLOv5s. The red bounding boxes were the outputs predicted by YOLOv5s.
Table 1. Size of datasets.
DP Type              Positive Images    Negative Images
86 inch, Camera 1    253                47
86 inch, Camera 2    253                47
65 inch, Camera 1    276                24
65 inch, Camera 2    276                24
Total                1058               142
Table 2. TPR and FPR of test methods.
Method        Conf@30%           Conf@50%           Conf@70%           Conf@90%
              TPR      FPR       TPR      FPR       TPR      FPR       TPR      FPR
Proposed      1.000    0.966     0.975    0.860     0.741    0.299     0.006    0.000
SBM           0.968    0.871     0.964    0.871     0.869    0.854     0.227    0.641
NCC           0.321    0.847     0.321    0.847     0.170    0.722     0.042    0.114
YOLOv5s       0.392    0.424     0.308    0.214     0.230    0.086     0.230    0.086
YOLOv5x       0.538    0.307     0.485    0.193     0.350    0.094     0.302    0.056
Table 3. Comparison of localizing accuracy.
Conf (%)    Proposed (pxls)    SBM (pxls)    NCC (pxls)    YOLOv5s (pxls)    YOLOv5x (pxls)
30          8.69               22.26         26.74         18.61             8.64
50          8.26               22.26         26.74         17.22             7.72
70          5.01               22.26         22.56         11.66             8.25
90          0.85               19.25         9.04          11.38             4.60
Average     5.70               21.5          21.27         14.71             7.30
Table 4. Specification and efficiency of test methods.
Method      Platform              Training Images    Training Time        Prediction Time
Proposed    Core-i7               1                  1.5 s                0.6 s
SBM         Core-i7               1                  <0.1 s               <0.1 s
NCC         Core-i7               1                  <0.1 s               <0.1 s
YOLOv5s     Core-i7 + GTX1650     1200               4.5 h (50 epochs)    0.04 s (GPU)/0.35 s (CPU)
YOLOv5x     Core-i7 + GTX1650     1200               19 h (50 epochs)     0.41 s (GPU)/1.4 s (CPU)
