Figure 1.
Visual comparison of image reconstruction by GANomaly, SCADN, and our method. The reconstructions produced by GANomaly and SCADN fail to accurately reproduce the details of the original image, whereas our method restores the defective regions to a state closer to anomaly-free, yielding more realistic patterns and details.
Figure 2.
The overall architecture of the generative adversarial network based on U-ViT multi-scale mask feature fusion with pruning and merging. The network masks regions of the input image with multi-scale block masks and reconstructs the image adversarially; pruning and merging simplify the network, and anomaly scores are computed from error maps.
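The error-map scoring mentioned in the caption can be sketched as follows. This is a hedged illustration only: the squared-error map is standard, but the top-fraction pooling rule and the `top_frac` value are assumptions, not the paper's exact formulation.

```python
import numpy as np

def anomaly_score(original, reconstructed, top_frac=0.05):
    """Per-pixel squared-error map between input and reconstruction,
    reduced to an image-level score by averaging the strongest errors.
    The top-fraction pooling here is an illustrative assumption."""
    err = ((original.astype(float) - reconstructed.astype(float)) ** 2).mean(axis=-1)
    k = max(1, int(err.size * top_frac))           # number of top pixels to pool
    score = float(np.sort(err.ravel())[-k:].mean())
    return err, score

x = np.zeros((32, 32, 3))
y = x.copy()
y[10:14, 10:14] = 1.0                              # simulated defect region
error_map, score = anomaly_score(x, y)
```

A defect-free pair yields a near-zero score, while reconstruction errors concentrated in a defective region push the pooled score up.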
Figure 3.
Structure of the ViT model. ViT divides the input image into multiple 16 × 16 patches, projects each patch into a fixed-length vector, and feeds the resulting sequence to the Transformer. A special classification token is added to the input sequence, and the output corresponding to this token serves as the final category prediction.
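The patch-and-project pipeline described in the caption can be sketched in a few lines. Random projection weights and a random class token stand in for learned parameters; this is an illustrative sketch, not the paper's implementation.

```python
import numpy as np

def patchify(image, patch=16):
    """Split an image (H, W, C) into flattened patch x patch patches, as in ViT."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    rows, cols = H // patch, W // patch
    return (image.reshape(rows, patch, cols, patch, C)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(rows * cols, patch * patch * C))

def embed(patches, dim=64, rng=np.random.default_rng(0)):
    """Project each flattened patch to a fixed-length vector and prepend
    a classification token (random here; learned in a real ViT)."""
    W = rng.standard_normal((patches.shape[1], dim)) * 0.02
    tokens = patches @ W                           # (N, dim)
    cls = rng.standard_normal((1, dim)) * 0.02
    return np.vstack([cls, tokens])                # (N + 1, dim)

img = np.zeros((224, 224, 3))
seq = embed(patchify(img))
print(seq.shape)  # (197, 64): 14 * 14 patches + 1 class token
```

In a real ViT the Transformer then processes this sequence, and the output at the class-token position feeds the classification head.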
Figure 4.
U-ViT architecture with long skip connections between the encoding and decoding layers.
Figure 5.
Visualization of the multi-scale block masks.
Figure 6.
Schematic diagram of the Token Merging model.
Figure 7.
Test strategy diagram.
Figure 8.
Defect-free and defective samples in the metal dataset.
Figure 9.
ROC curves and corresponding AUC values on the five metal datasets.
Figure 10.
Test results on the five metal datasets. Def. denotes the defective image, Rec. the reconstructed image, Res. the residual image, HM the heat map, and GT the ground-truth defect label.
Figure 11.
Test results on the MVTec AD dataset. Def. denotes the defective image, Rec. the reconstructed image, Res. the residual image, HM the heat map, and GT the ground-truth defect label.
Table 1.
Comparison of different methods regarding the AUC on five metal datasets. (1: GANomaly, 2: MAE, 3: ConvMAE, 4: SCADN, 5: DRAEM, 6: PatchCore, 7: Student-Teacher, 8: Reverse Distillation, 9: CFA, 10: PaDiM, 11: CS-Flow, 12: FastFlow). Bold numbers represent the optimal results.
CAT | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Ours |
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Kole | 0.752 | 0.768 | 0.618 | 0.809 | 0.813 | 1.000 | 0.833 | 0.726 | 0.644 | 0.985 | 0.816 | 0.697 | 0.988 |
Rail | 0.581 | 0.901 | 0.582 | 0.764 | 0.966 | 1.000 | 0.986 | 0.843 | 0.998 | 1.000 | 0.984 | 0.548 | 1.000 |
Steel | 0.953 | 0.850 | 0.983 | 0.907 | 0.941 | 0.998 | 0.893 | 0.719 | 0.928 | 0.987 | 0.999 | 0.593 | 0.996 |
Metal | 0.855 | 0.791 | 0.879 | 0.993 | 0.856 | 0.906 | 0.976 | 0.468 | 0.924 | 0.975 | 0.971 | 0.960 | 0.993 |
Shim | 0.869 | 0.745 | 0.748 | 0.957 | 0.892 | 0.935 | 0.857 | 0.606 | 0.988 | 0.999 | 0.981 | 0.996 | 1.000 |
Avg | 0.802 | 0.811 | 0.762 | 0.886 | 0.894 | 0.968 | 0.909 | 0.672 | 0.896 | 0.989 | 0.950 | 0.759 | 0.995 |
Table 2.
Comparison of different methods regarding the F1-Score on five metal datasets. (1: GANomaly, 2: MAE, 3: ConvMAE, 4: SCADN, 5: DRAEM, 6: PatchCore, 7: Student-Teacher, 8: Reverse Distillation, 9: CFA, 10: PaDiM, 11: CS-Flow, 12: FastFlow). Bold numbers represent the optimal results.
CAT | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Ours |
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Kole | 0.717 | 0.752 | 0.722 | 0.904 | 0.783 | 1.000 | 0.834 | 0.834 | 0.967 | 0.954 | 0.788 | 0.736 | 0.942 |
Rail | 0.955 | 0.959 | 0.955 | 0.965 | 0.977 | 1.000 | 0.988 | 0.966 | 0.979 | 1.000 | 0.977 | 0.941 | 1.000 |
Steel | 0.972 | 0.947 | 0.947 | 0.967 | 0.989 | 0.981 | 0.978 | 0.838 | 0.942 | 0.982 | 0.998 | 0.947 | 0.998 |
Metal | 0.981 | 0.982 | 0.982 | 0.994 | 0.983 | 0.958 | 0.992 | 0.981 | 0.875 | 0.985 | 0.991 | 0.980 | 0.997 |
Shim | 0.765 | 0.683 | 0.667 | 0.889 | 0.876 | 0.972 | 0.786 | 0.669 | 0.444 | 0.992 | 0.905 | 0.984 | 0.995 |
Avg | 0.878 | 0.865 | 0.855 | 0.944 | 0.922 | 0.982 | 0.916 | 0.858 | 0.841 | 0.983 | 0.932 | 0.918 | 0.986 |
Table 3.
Comparison of different methods regarding the AUC on the MVTec AD dataset. Bold numbers represent the optimal results.
CAT | GANomaly | Skip-GANomaly | MAE | ConvMAE | VT-ADL | SCADN | Ours |
---|---|---|---|---|---|---|---
bottle | 0.791 | 0.811 | 0.790 | 0.884 | 0.949 | 0.957 | 0.981 |
cable | 0.775 | 0.837 | 0.679 | 0.599 | 0.776 | 0.856 | 0.843 |
capsule | 0.774 | 0.768 | 0.629 | 0.663 | 0.672 | 0.765 | 0.877 |
carpet | 0.822 | 0.899 | 0.640 | 0.215 | 0.370 | 0.504 | 0.781 |
grid | 0.871 | 0.966 | 0.916 | 0.958 | 0.871 | 0.983 | 1.000 |
hazelnut | 0.775 | 0.791 | 0.830 | 0.641 | 0.897 | 0.833 | 0.911 |
leather | 0.804 | 0.786 | 0.685 | 0.364 | 0.728 | 0.659 | 0.985 |
metal-nut | 0.572 | 0.731 | 0.761 | 0.857 | 0.726 | 0.624 | 0.824 |
pill | 0.747 | 0.689 | 0.675 | 0.703 | 0.705 | 0.814 | 0.919 |
screw | 1.000 | 0.998 | 0.257 | 1.000 | 0.900 | 0.800 | 1.000 |
tile | 0.723 | 0.733 | 0.720 | 0.766 | 0.796 | 0.792 | 0.971 |
toothbrush | 0.704 | 0.742 | 0.740 | 0.522 | 0.901 | 0.981 | 1.000 |
transistor | 0.833 | 0.785 | 0.683 | 0.896 | 0.796 | 0.863 | 0.895 |
wood | 0.921 | 0.937 | 0.653 | 0.904 | 0.781 | 0.968 | 0.961 |
zipper | 0.744 | 0.657 | 0.725 | 0.710 | 0.808 | 0.846 | 0.971 |
Avg | 0.790 | 0.809 | 0.692 | 0.712 | 0.807 | 0.818 | 0.921 |
Table 4.
Comparison of different methods regarding the F1-score on the MVTec AD dataset. Bold numbers represent the optimal results.
CAT | GANomaly | Skip-GANomaly | MAE | ConvMAE | VT-ADL | SCADN | Ours |
---|---|---|---|---|---|---|---
bottle | 0.863 | 0.863 | 0.863 | 0.863 | 0.932 | 0.913 | 0.955 |
cable | 0.777 | 0.789 | 0.765 | 0.763 | 0.749 | 0.814 | 0.843 |
capsule | 0.904 | 0.905 | 0.905 | 0.924 | 0.871 | 0.911 | 0.928 |
carpet | 0.864 | 0.864 | 0.864 | 0.864 | 0.834 | 0.865 | 0.865 |
grid | 0.859 | 0.844 | 0.885 | 0.844 | 0.916 | 0.974 | 0.992 |
hazelnut | 0.782 | 0.805 | 0.778 | 0.778 | 0.857 | 0.822 | 0.873 |
leather | 0.852 | 0.853 | 0.852 | 0.852 | 0.877 | 0.852 | 0.913 |
metal-nut | 0.894 | 0.901 | 0.895 | 0.894 | 0.902 | 0.894 | 0.915 |
pill | 0.915 | 0.919 | 0.915 | 0.915 | 0.898 | 0.927 | 0.950 |
screw | 0.873 | 0.958 | 0.853 | 0.992 | 0.911 | 0.921 | 1.000 |
tile | 0.836 | 0.836 | 0.836 | 0.841 | 0.849 | 0.841 | 0.864 |
toothbrush | 0.831 | 0.833 | 0.833 | 0.833 | 0.891 | 0.895 | 0.896 |
transistor | 0.593 | 0.697 | 0.624 | 0.756 | 0.801 | 0.719 | 0.822 |
wood | 0.863 | 0.918 | 0.899 | 0.894 | 0.893 | 0.922 | 0.937 |
zipper | 0.881 | 0.882 | 0.888 | 0.885 | 0.903 | 0.909 | 0.919 |
Avg | 0.839 | 0.858 | 0.844 | 0.860 | 0.872 | 0.879 | 0.911 |
Table 5.
Comparison of AUC and F1-score with different values of R on the metal datasets. Bold numbers represent the optimal results.
CAT | R = 0 (AUC) | R = 0 (F1-Score) | R = 1 (AUC) | R = 1 (F1-Score) | R = 2 (AUC) | R = 2 (F1-Score) | R = 4 (AUC) | R = 4 (F1-Score) | R = 8 (AUC) | R = 8 (F1-Score) |
---|---|---|---|---|---|---|---|---|---|---
Kole | 0.981 | 0.932 | 0.979 | 0.922 | 0.984 | 0.934 | 0.965 | 0.919 | 0.969 | 0.919 |
Rail | 0.993 | 0.989 | 0.993 | 0.989 | 0.993 | 0.989 | 0.993 | 0.981 | 0.993 | 0.969 |
Steel | 0.994 | 0.998 | 0.994 | 0.998 | 0.994 | 0.998 | 0.985 | 0.978 | 0.965 | 0.978 |
Metal | 0.991 | 0.995 | 0.989 | 0.994 | 0.991 | 0.995 | 0.992 | 0.995 | 0.991 | 0.995 |
Shim | 1.000 | 0.991 | 1.000 | 0.991 | 1.000 | 0.992 | 0.999 | 0.971 | 0.999 | 0.971 |
Avg | 0.992 | 0.981 | 0.991 | 0.979 | 0.992 | 0.982 | 0.987 | 0.969 | 0.983 | 0.966 |
Table 6.
Comparison of the AUC and F1-score with different weight coefficients on the metal datasets. Bold numbers represent the optimal results.
CAT | = 1, = 1, = 1 (AUC) | = 1, = 1, = 1 (F1-Score) | = 1, = 1, = 10 (AUC) | = 1, = 1, = 10 (F1-Score) | = 1, = 1, = 20 (AUC) | = 1, = 1, = 20 (F1-Score) | = 1, = 1, = 30 (AUC) | = 1, = 1, = 30 (F1-Score) |
---|---|---|---|---|---|---|---|---
Kole | 0.961 | 0.913 | 0.969 | 0.919 | 0.988 | 0.942 | 0.988 | 0.942 |
Rail | 0.995 | 0.987 | 0.994 | 0.985 | 1.000 | 1.000 | 0.999 | 0.992 |
Steel | 0.955 | 0.979 | 0.948 | 0.972 | 0.996 | 0.998 | 0.996 | 0.998 |
Metal | 0.937 | 0.985 | 0.951 | 0.992 | 0.993 | 0.997 | 0.935 | 0.991 |
Shim | 0.999 | 0.988 | 1.000 | 0.985 | 1.000 | 0.995 | 0.999 | 0.975 |
Avg | 0.969 | 0.970 | 0.972 | 0.971 | 0.995 | 0.986 | 0.983 | 0.980 |
Table 7.
Comparison of the AUC and F1-score under different mask settings on the metal datasets. Bold numbers represent the optimal results.
CAT | Mask = 0 (AUC) | Mask = 0 (F1-Score) | Mask = 1 (AUC) | Mask = 1 (F1-Score) | Mask = 2 (AUC) | Mask = 2 (F1-Score) | Mask = 0/1 (AUC) | Mask = 0/1 (F1-Score) | Mask = 0/1/2 (AUC) | Mask = 0/1/2 (F1-Score) |
---|---|---|---|---|---|---|---|---|---|---
Kole | 0.977 | 0.897 | 0.969 | 0.962 | 0.938 | 0.926 | 0.981 | 0.927 | 0.988 | 0.942 |
Rail | 0.992 | 0.985 | 1.000 | 0.996 | 1.000 | 0.988 | 0.993 | 0.981 | 1.000 | 1.000 |
Steel | 0.986 | 0.978 | 0.983 | 0.977 | 0.972 | 0.972 | 0.994 | 0.989 | 0.996 | 0.998 |
Metal | 0.990 | 0.994 | 0.965 | 0.993 | 0.950 | 0.992 | 0.989 | 0.991 | 0.993 | 0.997 |
Shim | 0.966 | 0.945 | 1.000 | 0.989 | 0.999 | 0.979 | 1.000 | 0.985 | 1.000 | 0.995 |
Avg | 0.982 | 0.960 | 0.983 | 0.983 | 0.972 | 0.971 | 0.991 | 0.975 | 0.995 | 0.986 |
Table 8.
Ablation study results of the loss function module on the metal datasets. Bold numbers represent the optimal results.
CAT | AUC | F1-Score | AUC | F1-Score | AUC | F1-Score | AUC | F1-Score |
---|---|---|---|---|---|---|---|---
Kole | 0.978 | 0.919 | 0.972 | 0.919 | 0.987 | 0.942 | 0.988 | 0.942 |
Rail | 0.985 | 0.981 | 0.995 | 0.984 | 0.991 | 0.992 | 1.000 | 1.000 |
Steel | 0.993 | 0.988 | 0.995 | 0.989 | 0.996 | 0.998 | 0.996 | 0.998 |
Metal | 0.956 | 0.991 | 0.991 | 0.995 | 0.993 | 0.997 | 0.993 | 0.997 |
Shim | 1.000 | 0.995 | 1.000 | 0.986 | 1.000 | 0.994 | 1.000 | 0.995 |
Avg | 0.982 | 0.975 | 0.991 | 0.975 | 0.993 | 0.985 | 0.995 | 0.986 |
Table 9.
Ablation study results of the consistency loss on the metal datasets. Bold numbers represent the optimal results.
CAT | AUC | F1-Score | AUC | F1-Score | AUC | F1-Score |
---|---|---|---|---|---|---
Kole | 0.986 | 0.937 | 0.977 | 0.913 | 0.988 | 0.942 |
Rail | 0.992 | 0.969 | 0.992 | 0.973 | 1.000 | 1.000 |
Steel | 0.969 | 0.978 | 0.981 | 0.989 | 0.996 | 0.998 |
Metal | 0.969 | 0.991 | 0.972 | 0.992 | 0.993 | 0.997 |
Shim | 1.000 | 0.999 | 1.000 | 0.995 | 1.000 | 0.995 |
Avg | 0.983 | 0.975 | 0.988 | 0.972 | 0.995 | 0.986 |
Table 10.
Comparison of training time and data volume with and without pruning and merging on the metal datasets.
CAT | | | | |
---|---|---|---|---
Kole | Train Time | 4 h 53 min | 4 h 22 min | 6 h 17 min |
Rail | Train Time | 4 h 25 min | 3 h 59 min | 5 h 03 min |
Steel | Train Time | 4 h 42 min | 4 h 13 min | 5 h 58 min |
Metal | Train Time | 4 h 39 min | 4 h 14 min | 6 h 01 min |
Shim | Train Time | 11 h 31 min | 10 h 27 min | 14 h 24 min |