Global Relation-Aware-Based Oil Detection Method for Water Surface of Catchment Wells in Hydropower Stations

Liu, Jiajun; Lin, Haokun; Liu, Yue; Xiong, Lei; Li, Chenjing; Zhou, Tinghu; Ma, Mike

doi:10.3390/su15086966

by Jiajun Liu ¹, Haokun Lin ¹,*, Yue Liu ¹, Lei Xiong ¹, Chenjing Li ¹, Tinghu Zhou ² and Mike Ma ²

¹ School of Electrical Engineering, Xi’an University of Technology, Xi’an 710048, China
² Ankang Hydroelectric Power Station, State Grid Shaanxi Electric Power Company Limited, Ankang 725012, China
* Author to whom correspondence should be addressed.

Sustainability 2023, 15(8), 6966; https://doi.org/10.3390/su15086966

Submission received: 17 February 2023 / Revised: 30 March 2023 / Accepted: 18 April 2023 / Published: 21 April 2023

(This article belongs to the Special Issue Recent Trends in Applications of Computer Vision in the Development of a Sustainable Environment)

Abstract:

The oil in hydropower station catchment wells is a source of water pollution which can cause the downstream river to become polluted. Timely detection of oil can effectively prevent the expansion of oil leakage and has important significance for protecting water sources. However, the poor environment and insufficient light on the water surface of catchment wells make oil pollution detection difficult, and the real-time performance is poor. To address these problems, this paper proposes a catchment well oil detection method based on the global relation-aware attention mechanism. By embedding the global relation-aware attention mechanism in the backbone network of Yolov5s, the main features of oil are highlighted and the minor information is suppressed at the spatial and channel levels, improving the detection accuracy. Additionally, to address the problem of partial loss of detail information in the dataset caused by the harsh environment of the catchment wells, such as dim light and limited area, single-scale retinex histogram equalization is used to improve the grayscale and contrast of the oil images, enhancing the details of the dataset images and suppressing the noise. The experimental results show that the accuracy of the proposed method achieves 94.1% and 89% in detecting engine oil and turbine oil pollution, respectively. Compared with the Yolov5s, Faster R-CNN, SSD, and FSSD detection algorithms, our method effectively reduces the problems of missing and false detection, and has certain reference significance for the detection of oil pollution on the water surface of catchment wells.

Keywords:

oil pollution detection; attention mechanism; single scale retinex; global relation-aware

1. Introduction

Water pollution is one of the most serious environmental problems, with significant impacts on human beings and ecosystems. Industry, agriculture, and urbanization are the main sources of water pollution. Pollutants such as oil [1,2,3], drugs [4,5], and pesticides [6,7] pose a severe threat to people’s health and life.

In industrial production, hydropower stations have been attracting attention in recent years as a clean energy source. However, the units of the hydropower stations often need lubrication and heat dissipation by hydro system lubricants such as turbine oil and engine oil to reduce wear and heat generation of units, thus prolonging the service life and operational safety of the equipment [8]. Meanwhile, under conditions such as routine maintenance or replacement of the units, failure of leaky pumps, or the use of too much oil, the oil is prone to leakage, which can cause pollution to travel to the water downstream if not dealt with in time [9]. Conventional methods of oil pollution detection are mainly observation by human eyes and chemical analysis of seepage water at fixed points, which cannot detect oil pollution effectively or in real-time in the catchment wells, and can easily let oil pollution reach the water downstream. It not only causes damage to the natural ecological environment downstream, but also affects agricultural planting and the safe water supply of residents, seriously endangering the safety of people’s lives and property.

Most current approaches for detecting oil pollution on the water surface involve either oil spill detection on the sea surface using various remote sensing monitoring methods to detect the type and area of the oil spill [10,11,12,13], or using remote sensing images to detect the oil slick [14,15]. Zhao Dong et al. [16] proposed an oil slick detection method based on multispectral remote sensing technology for sea surface oil slick detection, but the method is less effective in detecting oil slicks heavily polluted by sunlight and needs to be verified by more multispectral images. For the sake of detecting oil leakage from power transformers, Li Lu et al. [17] proposed a method for detecting and locating oil pollution on the surface of power transformers under UV irradiation, which recognizes the oil film automatically. Zhu Qiqi et al. [18] proposed an oil spill context and boundary-supervised detection network framework to detect oil spills in SAR images by fusing multi-scale features to extract oil spill regions. To solve the problem that oil slick analogues can reduce the detection accuracy of oil spills in SAR images, Tong Shengwu et al. [19] improved the detection accuracy of oil spills by introducing the self-similarity parameter to enhance the distinction between oil slicks and analogues. In terms of hard distinguishing of oil spills from look-alikes, Chen Yan et al. [20] proposed a marine oil spill detection from SAR images based on the attention U-Net Model, in which the rich polarization information of PolSAR data and sea surface wind speed information are considered. However, some of the above-mentioned studies of oil pollution detection often require multiple highly reliable and high-precision sensors such as lasers, reflectors, and light-receiving lenses, which also impose limitations on experimental operations. In addition, the deep learning-based oil spill detection methods on the sea surface have publicly available datasets. 
Compared to oil detection in catchment wells of hydropower stations, the area inside catchment wells is limited and the environment is dim. Furthermore, there is no publicly available data set about oil pollution samples in catchment wells. The methods mentioned above are not only unsuitable for oil detection inside catchment wells, but are also costly and wasteful of resources.

In recent years, the application of deep learning in power systems has gradually increased. However, research on object detection has mainly focused on transmission lines, towers, and types of equipment such as fittings and insulators. For instance, research has investigated transmission line icing [21], foreign objects, tower collapse, missing fittings [22], and insulator icing [23], as well as missing and cracked insulators. Zhu Jinguo et al. [24] proposed DFB-NN, based on a convolutional neural network, to detect foreign objects on transmission lines using a regression strategy with directed bounding boxes to accurately predict the spatial location, orientation angle, and category of foreign objects. Gao Zishu et al. [25] proposed an insulator defect detection network including a batch normalization convolutional block attention module and a feature fusion module, which achieved good performance in insulator defect detection. Regarding the difficulty of identifying the icing thickness on transmission lines, Wang Bo et al. [26] proposed a lightweight vision identification method based on discriminative-driven channel pruning, which could be applied to icing monitoring terminals with limited computing resources. Aiming at the difficult detection of transmission line bolts with diverse visual structures and small sizes, Zhao Zhenbing et al. [27] established the AVSCNet detection model using the automatic visual shape clustering method, which eliminated the structural diversity of bolt images and reduced the corrosiveness of deep convolutional networks for small targets. Yang Lei et al. [28] proposed an end-to-end deep segmentation network for solving the problem of complex background and order imbalance in automatic power transmission line detection. For rapid insulator identification and localization, a lightweight target recognition network (SSD) for edge intelligence devices was proposed by Wei Baoquan et al. [29], effectively reducing the redundant computation and running time. However, these studies are mainly focused on the detection of defects in power equipment, while there are still relatively few studies related to the detection of oil on the water surface of catchment wells in hydropower stations through machine vision.

Inspired by this, we introduce the target detection technique in deep learning to solve the problem of oil detection in catchment wells at hydropower stations. We propose a global relation-aware-based oil detection method for catchment wells in hydropower stations. Since the acquired water surface oil image samples of hydropower station catchment wells are not conducive to direct processing due to environmental and other factors, we first use a single-scale retinex histogram equalization method to enhance the sample data by reinforcing the detail information of the water surface oil. Secondly, to highlight the main features of oil pollution and suppress the secondary information, the global relation-aware attention mechanism is introduced into the Yolov5s target detection model during the training stage to enhance the model’s sensitivity to the features of oil pollution. Finally, the proposed algorithm is embedded into the catchment well monitoring system of a hydropower station to realize the real-time detection of oil pollution on the water surface. The main contributions of this paper are as follows.

(1) Considering the problems of low brightness, high noise, and few extractable feature points in the image sample data caused by the catchment well environment, a sample data enhancement method is proposed. This method enhances the edge details of oil in the sample data through single-scale retinex histogram equalization, and dynamically stretches the overall image to enhance the contrast of oil in the image.
(2) For the purpose of strengthening the feature extraction of sample data in the detection phase, we propose an improved Yolov5s method. In this method, the backbone network is enhanced by a global relation-aware attention mechanism that highlights the main features of sample data and suppresses minor information at both the spatial and channel levels. This results in improved detection accuracy and a stronger anti-interference capability against the background.
(3) A deep learning-based method for pixel-level oil detection of catchment wells in hydropower stations is proposed, which has a good economic efficiency compared to other oil detection methods.

2. Proposed Methods

2.1. Yolov5s

Yolov5s is the network with the smallest network depth and width in the Yolov5 model framework. It has the advantages of a small number of model parameters and fast detection, making it suitable for deployment in embedded terminal devices. The network structure includes input, backbone, neck, and prediction. Mosaic is used in the input for the data enhancement, which is beneficial for improving the detection accuracy of small targets. The input also includes basic processing tasks such as adaptive image scaling and anchor box calculation.

The Focus and CSPnet modules are added to the backbone. The former is used for slicing operations, as shown in Figure 1. The 3 × 4 × 4 image is divided into four parts, with the first region of each part (dark blue) starting at 0, the second region (light blue) starting at 1, and so on. All slices are concatenated together according to the channel, and then a 12 × 2 × 2 feature map is obtained. The latter uses CSP1_X with a residual structure based on densely connected convolutional networks, which effectively prevents gradient disappearance, increases the computational power of the network, and reduces the computational effort. The neck contains the CSP2_X structure in addition to the FPN+PAN structure to further enhance feature fusion using the information extracted from the backbone network. The prediction includes the bounding box loss function and non-maximum suppression (NMS). The Yolov5s specific structure is shown in Figure 2.
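The slicing step of the Focus module can be illustrated with a minimal pure-Python sketch. The function name `focus_slice` and the order in which the four slices are concatenated are assumptions made for illustration; the text above fixes only the input shape (3 × 4 × 4) and the output shape (12 × 2 × 2).

```python
def focus_slice(image):
    """Rearrange a C x H x W nested list into a 4C x H/2 x W/2 nested list.

    Every second pixel is taken in each direction, starting at the
    (row, column) offsets (0, 0), (1, 0), (0, 1) and (1, 1), and the four
    sub-images are concatenated along the channel axis.
    """
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]  # assumed concatenation order
    out = []
    for row_off, col_off in offsets:
        for channel in image:
            out.append([row[col_off::2] for row in channel[row_off::2]])
    return out


# A 3 x 4 x 4 input whose values encode (channel, row, column).
image = [[[100 * c + 10 * r + w for w in range(4)] for r in range(4)]
         for c in range(3)]
sliced = focus_slice(image)
shape = (len(sliced), len(sliced[0]), len(sliced[0][0]))
print(shape)  # (12, 2, 2)
```

Every input value appears exactly once in the output, which is why the 2-fold subsampling loses no information: spatial resolution is traded for channel depth.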

In Figure 2, a 640 × 640 × 3 image is fed through the input to the Focus structure of the backbone network to obtain a 2-fold subsampled feature map with no information loss, and is then passed through the CBL, CSP1_X, and SPP modules in turn. The CBL module includes Conv, BatchNorm, and ReLU. Sub-feature map 1 is obtained from the output of the CBL module through the residual module ResUnit and Conv, and sub-feature map 2 is obtained directly from Conv. Sub-feature maps 1 and 2 are then concatenated and passed through BN, ReLU, and CBL to obtain the output of CSP1_X. The SPP module max-pools the obtained feature maps with kernels of sizes 1 × 1, 5 × 5, 9 × 9, and 13 × 13, respectively. All pooled features are concatenated to convert feature maps of arbitrary sizes into feature vectors of fixed sizes. To enhance the detection performance of the model for objects of different sizes, the neck includes the FPN and PAN structures. The feature maps are first upsampled to obtain high-level semantic feature maps, and then subsampled to extract high-level semantic information. Three scales of 128 × 80 × 80, 256 × 40 × 40, and 512 × 20 × 20 are output. Among them, 80 × 80 represents the shallow feature map, which contains more low-level information and is suitable for detecting small targets; 20 × 20 represents the deep feature map, which contains more high-level information and is suitable for detecting large targets; the 40 × 40 feature map is in between these two scales and is used to detect medium-sized targets. The prediction stage sequentially grids the predictions on these 3 feature maps to identify the target.
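As a quick arithmetic check of the three output scales, the grid sizes follow from dividing the 640-pixel input side by the downsampling factor of each detection head. The strides 8, 16, and 32 are not stated in the text; they are assumed here because they are the values that reproduce the 80 × 80, 40 × 40, and 20 × 20 grids.

```python
def grid_sizes(input_size=640, strides=(8, 16, 32)):
    """Side length of each detection grid for a square input (assumed strides)."""
    return [input_size // s for s in strides]


sizes = grid_sizes()
total_cells = sum(s * s for s in sizes)
print(sizes)        # [80, 40, 20]
print(total_cells)  # 8400
```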

2.2. Global Relation-Aware-Based Yolov5s Algorithm

“Relation-Aware Global Attention (RGA)” was initially applied to the field of person re-identification [30], where it is used to learn the discriminative features of humans efficiently. The relationships between nodes in the feature graph provide clustering-like information that helps infer semantics and attention. Therefore, the RGA module facilitates the capture of global structural information for better attention learning. For each feature node, the pairwise relationships of that node with all nodes are modeled to compactly capture the structural information of the global scope and the local appearance information. These relationships are used as vectors to represent the information in the global structural scope. After that, the features of the node itself and these vectors are stacked compactly, and attention is learned from them by a shallow convolutional model. This approach not only considers the relationship between the feature node and its global scope, but also determines the importance of the feature node from a global perspective, as shown in Figure 3.

Figure 3 shows a feature space containing 5 feature nodes, $x_{1}, x_{2}, x_{3}, x_{4}, x_{5}$. For the feature node $x_{1}$, the correlation vector $r_{1}$ collects its correlations with all feature nodes. In the correlation vector $r_{i} = [r_{i, 1}, r_{i, 2}, \dots, r_{i, N}, r_{1, i}, r_{2, i}, \dots, r_{N, i}]$, $r_{i, j}$ denotes the correlation between the $i$-th feature node and the $j$-th feature node, and $a_{i}$ denotes the attention weight. Afterward, the global structure information of the feature node is represented by concatenating the feature $x_{1}$ and its correlation vector $r_{1}$ in a fixed order, that is, $y_{1} = [x_{1}, r_{1}]$, which is passed to a learned transformation function to infer the attention.

Based on the idea of mining pairwise correlations, the RGA module is applied to both the spatial and channel dimensions. In the spatial dimension, RGA-S calculates the affinity matrix by expanding the feature nodes along the spatial direction to represent the correlation between features. As shown in Figure 4, for a given intermediate feature tensor $x \in \mathbb{R}^{C \times H \times W}$ from the CNN layer, the $C$-dimensional feature vector of each spatial location is used as a feature node. All spatial locations form a $W \times H$ node feature map, which is raster scanned and identified in the order 1, …, N, where the correlation between feature nodes $i$ and $j$ is calculated as shown in Equation (1):

$$
\begin{cases}
r_{i, j} = f_{s}(x_{i}, x_{j}) = \theta_{s}(x_{i})^{T} \phi_{s}(x_{j}) \\
\theta_{s}(x_{i}) = \mathrm{ReLU}(W_{\theta} x_{i}), \quad W_{\theta} \in \mathbb{R}^{\frac{C}{s_{1}} \times C} \\
\phi_{s}(x_{i}) = \mathrm{ReLU}(W_{\phi} x_{i}), \quad W_{\phi} \in \mathbb{R}^{\frac{C}{s_{1}} \times C}
\end{cases}
\tag{1}
$$

where $\theta_{s}$ and $\phi_{s}$ are two embedding functions implemented by a 1 × 1 spatial convolution layer followed by batch normalization (BN) and ReLU activation, and $s_{1}$ is a predefined positive integer to control the dimensionality reduction rate.
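Equation (1) can be sketched in pure Python for a toy feature map. This is an illustration under stated assumptions, not the authors' implementation: the sizes `C`, `H`, `W`, and `s1` are arbitrary, the weight matrices are random stand-ins for learned 1 × 1 convolutions, and batch normalization is omitted.

```python
import random


def relu(vec):
    return [max(0.0, v) for v in vec]


def matvec(mat, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def embed(weight, node):
    """ReLU(W x): a 1 x 1 convolution applied to one node (BN omitted)."""
    return relu(matvec(weight, node))


def affinity_matrix(nodes, w_theta, w_phi):
    """R[i][j] = theta(x_i)^T phi(x_j) for every pair of feature nodes."""
    theta = [embed(w_theta, x) for x in nodes]
    phi = [embed(w_phi, x) for x in nodes]
    return [[sum(a * b for a, b in zip(t, p)) for p in phi] for t in theta]


random.seed(0)
C, H, W, s1 = 4, 2, 3, 2  # toy sizes chosen only for illustration
N = H * W                 # one node per spatial location, in raster order
nodes = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(N)]
w_theta = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // s1)]
w_phi = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // s1)]

R = affinity_matrix(nodes, w_theta, w_phi)
print(len(R), len(R[0]))  # 6 6
```

Because two different embeddings are used, `R[i][j]` and `R[j][i]` generally differ, which is why both directions of the pairwise relation carry information.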

The pairwise relationships of the $i$-th feature node with all nodes are stacked along the horizontal and vertical directions of the similarity matrix $R_{s}$ to obtain the relationship vector $r_{i} = [R_{s}(:, i), R_{s}(i, :)] \in \mathbb{R}^{2 \times H \times W}$, which is the relationship feature of the attention at the $i$-th spatial location. The original feature and the relationship vector are mapped onto the same feature domain by the functions $\psi(\cdot)$ and $\varphi(\cdot)$, after which they are combined to form a new relation-aware feature map, as shown in Equation (2):

$$
\begin{cases}
\tilde{y}_{i} = [\mathrm{pool}_{c}(\psi_{s}(x_{i})), \varphi_{s}(r_{i})] \\
\psi_{s}(x_{i}) = \mathrm{ReLU}(W_{\psi} x_{i}), \quad W_{\psi} \in \mathbb{R}^{\frac{C}{s_{1}} \times C} \\
\varphi_{s}(r_{i}) = \mathrm{ReLU}(W_{\varphi} r_{i}), \quad W_{\varphi} \in \mathbb{R}^{\frac{2 \times (H \times W)^{2}}{s_{1}}}
\end{cases}
\tag{2}
$$

where $\psi(\cdot)$ denotes the embedding function of the feature itself and $\varphi(\cdot)$ denotes the global relational embedding function; they are implemented by spatial 1 × 1 convolutional layers with BN and ReLU activation. $\mathrm{pool}_{c}$ denotes the global average pooling operation along the channel dimension, which further reduces the dimensionality to 1. The spatial attention weight $a_{i}$ of the $i$-th feature node is obtained by Equation (3), as follows.

$$
a_{i} = \mathrm{Sigmoid}(W_{2}\, \mathrm{ReLU}(W_{1} \tilde{y}_{i}))
\tag{3}
$$

where $W_{1}$ and $W_{2}$ are implemented by 1 × 1 convolution and BN, where $W_{1}$ reduces the channel size by the ratio of $s_{2}$ and $W_{2}$ converts the channel size to 1.
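The path from a feature node and its relationship vector to a single attention weight (Equations (2) and (3)) can be sketched as follows. All numbers and matrix sizes below are hypothetical values picked for illustration, the function name `spatial_attention_weight` is not from the paper, and BN is again omitted.

```python
import math


def relu(vec):
    return [max(0.0, v) for v in vec]


def matvec(mat, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def spatial_attention_weight(x_i, r_i, w_psi, w_varphi, w1, w2):
    """Attention weight a_i of one node, following Equations (2) and (3)."""
    psi = relu(matvec(w_psi, x_i))
    pooled = sum(psi) / len(psi)           # pool_c: average over channels
    rel = relu(matvec(w_varphi, r_i))      # embedded relationship vector
    y_tilde = [pooled] + rel               # Equation (2): concatenation
    hidden = relu(matvec(w1, y_tilde))     # W1 reduces the channel size
    return sigmoid(matvec(w2, hidden)[0])  # W2 maps to a single channel


# Hypothetical toy inputs: C = 4 channels and N = 2 nodes, so r_i has 2N = 4 entries.
x_i = [0.5, -1.0, 2.0, 0.0]
r_i = [0.2, 0.0, 1.0, 0.4]
w_psi = [[0.1, 0.2, 0.3, 0.4], [-0.5, 0.1, 0.2, 0.0]]
w_varphi = [[0.3, -0.2, 0.5, 0.1], [0.0, 0.4, -0.1, 0.2]]
w1 = [[0.6, -0.3, 0.8]]
w2 = [[1.5]]

a_i = spatial_attention_weight(x_i, r_i, w_psi, w_varphi, w1, w2)
print(0.0 < a_i < 1.0)  # True: the sigmoid keeps the weight in (0, 1)
```

The weight is then used to rescale the feature at that spatial location, so locations with a weight near 1 are kept and those near 0 are suppressed.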

Similar to the principle of spatial attention, in the channel dimension, RGA-C is used to learn the channel attention weights by taking the H × W dimensional feature map on each channel as a feature node, expanding the feature nodes in the channel direction, and finally obtaining the attention weights of the channel. The correlation between channel feature nodes $i$ and $j$ is calculated as shown in Equation (4).

$$
\begin{cases}
r_{i, j} = f_{c}(x_{i}, x_{j}) = \theta_{c}(x_{i})^{T} \phi_{c}(x_{j}) \\
\theta_{c}(x_{i}) = \mathrm{ReLU}(W_{\theta} x_{i}), \quad W_{\theta} \in \mathbb{R}^{\frac{C}{s_{1}} \times C} \\
\phi_{c}(x_{i}) = \mathrm{ReLU}(W_{\phi} x_{i}), \quad W_{\phi} \in \mathbb{R}^{\frac{C}{s_{1}} \times C}
\end{cases}
\tag{4}
$$

where $\theta_{c}$ and $\phi_{c}$ are two embedding functions shared between feature nodes, implemented by a 1 × 1 convolutional layer with BN, and later activated by ReLU. After obtaining the relationship vector $r_{i} = [R_{c}(:, i), R_{c}(i, :)] \in \mathbb{R}^{2 \times C}$ of the $i$-th channel feature node, the channel relation-aware features $\tilde{y}_{i}$ and channel attention weights $a_{i}$ are calculated by using Equations (2) and (3), as shown in Figure 5.

Yolov5s is combined with the RGA module in this paper to enhance the focus on oil pollution and improve the background immunity during detection, as well as to enhance the discriminative features and suppress irrelevant features. The RGA-S and RGA-C modules are used sequentially in the backbone part to obtain feature maps containing different learning weights in the spatial and channel dimensions before feeding them into the deep network to reduce information loss during feature extraction.

Figure 6 shows that the CSP module is replaced by the C3 module in the backbone section. At the end of the backbone, the RGA-S spatial attention module and the RGA-C channel attention module are added sequentially before the spatial pyramid pooling.

The feature map obtained after spatial attention extraction is utilized as input for the channel attention module. The channel attention first performs global average pooling on the feature map and produces a 1 × 1 × C size feature map with a global perceptual field. The correlation between C channels is learned by the Affinity Matrix. Finally, both the feature map and the Affinity Matrix are concatenated to output the global perceptual features of the channel. This process enables the extraction of important features related to oil pollution and suppression of unimportant features.

2.3. Single-Scale Retinex Histogram Equalization

Due to the strong hydration of the oil, it is difficult to distinguish the edges of the oil from the water surface with limited light intensity, which can affect the detection results. The catchment wells of the hydropower stations are arranged in the lowest part of the whole plant, which is the drainage system for the leakage water, maintenance water, and part of the production water of the power station, and its ambient light is weak and the light intensity is limited. Therefore, to improve the contrast of the input data, the image is processed in the input section with single-scale retinex (SSR) histogram equalization. This improves image clarity and enhances the detail of the oil pollution, laying the foundation for subsequent feature extraction and detection.

According to retinex theory [31,32], the color of an object is determined by the ability of the object itself to reflect red, green, and blue light, so for oil pollution, the inhomogeneity of the illumination does not affect the displayed color. In single-scale retinex theory [33], the image $S(x, y)$ captured by the camera is modeled as the product of the incident illumination $L$ and the reflection component $R$ of the object itself, as shown in Equation (5).

$$
S(x, y) = L(x, y) \times R(x, y)
\tag{5}
$$

Converting this to the logarithmic domain [34], the above equation becomes:

$$
\log S = \log (L \times R) = \log L + \log R
\tag{6}
$$

The true image of an object, that is, its reflection component $R$, is:

$$
R = \exp (\log S - \log L)
\tag{7}
$$

In the water surface oil image enhancement process of the catchment well, the incident illuminance $L$ is estimated from the original image $S$, and the reflected light component $R$ is obtained by removing the $L$ component according to Equation (7), thus eliminating the effect of uneven brightness, enhancing the brightness of the water surface oil image, and improving its overall quality. Since the incident illuminance $L$ cannot be obtained in practice, it can only be estimated approximately by convolving a Gaussian function with the image $S(x, y)$ to obtain a Gaussian-blurred image, as shown in Equation (8), where $*$ denotes convolution and $\sigma$ is the standard deviation of the Gaussian.

$$
\begin{cases}
L(x, y) = S(x, y) * G(x, y) \\
G(x, y) = \frac{1}{2 \pi \sigma^{2}} e^{- \frac{x^{2} + y^{2}}{2 \sigma^{2}}} \\
\iint G(x, y)\, dx\, dy = 1
\end{cases}
\tag{8}
$$

Substituting this estimate of $L$ into Equation (7) gives:

$$
R = \exp (\log S - \log (S * G))
\tag{9}
$$
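A minimal pure-Python sketch of Equations (8) and (9) on a single-channel image is given below. The function names, the kernel radius, the edge-replication border handling, and the offset `eps` that avoids log(0) are assumptions made for illustration and are not specified in the paper. The sketch returns log R; exponentiating it gives the R of Equation (9).

```python
import math


def gaussian_kernel(sigma, radius):
    """Sampled 2-D Gaussian, renormalized so that its entries sum to 1."""
    k = [[math.exp(-(x * x + y * y) / (2 * sigma ** 2))
          for x in range(-radius, radius + 1)]
         for y in range(-radius, radius + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def blur(img, kernel):
    """Convolve img with a symmetric kernel, replicating edge pixels."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx] * kernel[dy + r][dx + r]
            out[y][x] = acc
    return out


def single_scale_retinex(img, sigma=2.0, radius=4, eps=1.0):
    """Return log R = log S - log(S * G) for a grayscale image."""
    illum = blur(img, gaussian_kernel(sigma, radius))  # estimate of L
    return [[math.log(s + eps) - math.log(l + eps)
             for s, l in zip(row_s, row_l)]
            for row_s, row_l in zip(img, illum)]


# A dark 7 x 7 image with one bright pixel standing in for an oil highlight.
spot = [[10.0] * 7 for _ in range(7)]
spot[3][3] = 200.0
log_r = single_scale_retinex(spot)
print(log_r[3][3] > 0, log_r[3][2] < 0)  # True True
```

A uniformly lit image yields log R = 0 everywhere, since the blurred estimate equals the image itself; only local deviations from the estimated illumination survive, which is what sharpens the oil edges.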

In the catchment well environment, insufficient brightness is one of the most notable reasons for the poor quality of oil images on the water surface. The RGB color space [35] is not suitable for dealing with image luminance and requires simultaneous consideration of all three components during processing. Therefore, when processing the oil images, the image is first converted from RGB color space to HSV color space [36,37]. Only the V component is processed in HSV space, which reduces computational complexity and subtly transforms multi-channel processing into single-channel processing. After enhancing the V component using the single-scale retinex algorithm, histogram equalization is used to stretch the dynamic range of the enhanced processed image to avoid low contrast issues. Finally, the processed image is converted from the HSV color space to the RGB color space and fed into the improved network with Yolov5s for training. The results are shown in Figure 7.
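The colour-space handling and the final dynamic-range stretch can be sketched with the standard-library `colorsys` module. This is a simplified illustration: the function names are hypothetical, the image is a flat list of RGB pixels, and the retinex step on V is deliberately left out so that only the RGB to HSV conversion, the equalization of V, and the conversion back are shown.

```python
import colorsys


def equalize(values, levels=256):
    """Histogram-equalize a list of integer intensities in [0, levels - 1]."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    denom = max(len(values) - cdf_min, 1)
    return [round((cdf[v] - cdf_min) / denom * (levels - 1)) for v in values]


def enhance_v_channel(pixels):
    """pixels: list of (r, g, b) floats in [0, 1]; only V is modified."""
    hsv = [colorsys.rgb_to_hsv(*p) for p in pixels]
    v_int = [round(v * 255) for _, _, v in hsv]
    # The single-scale retinex enhancement of V would be applied here.
    v_eq = equalize(v_int)
    return [colorsys.hsv_to_rgb(h, s, v / 255)
            for (h, s, _), v in zip(hsv, v_eq)]


# Four dark pixels, such as might come from a poorly lit well.
dark = [(0.10, 0.08, 0.05), (0.12, 0.10, 0.06),
        (0.20, 0.15, 0.10), (0.30, 0.25, 0.20)]
bright = enhance_v_channel(dark)
values = [max(p) for p in bright]  # V equals the largest RGB component
print(values == sorted(values))    # True: brightness order is preserved
```

Working on V alone leaves hue and saturation untouched, so the stretch changes brightness and contrast without shifting the colour of the oil.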

Figure 7 shows images of oil pollution on the water surface of catchment wells under three different lighting conditions. The images are enhanced using the histogram equalization and single-scale retinex histogram equalization methods. The results demonstrate that although the histogram-equalized image can improve image quality and contrast to some extent, it may cause the loss of image details, resulting in an unnatural exposure. On the other hand, by comparing the red circles in Figure 7, it can be seen that the single-scale retinex histogram equalization method preserves image details, improves clarity, enhances image brightness, highlights the oil part of the water surface, and provides strong support for subsequent models to learn oil features.

3. Experimental Results and Discussion

3.1. Experimental Environments and Parameters

The experiments are implemented in Python with the PyTorch 1.12.1 deep learning framework and CUDA 10.2, and are run on an NVIDIA GTX 1080Ti GPU. In the network training phase, the input image size is set to 640 × 640, the batch size is set to 16, the number of training epochs is set to 200, and the optimizer learning rate is set to 0.2.

3.2. Experimental Data Sources

The sample data in the experiment is captured by a Hikvision camera in a hydroelectric power station in Shaanxi Province. Specifically, the captured videos of oil pollution are converted frame-by-frame into images to make a dataset to train and evaluate the model, with a total of 7220 images. The data is cleaned, oversized images are cut, low-definition images and high-similarity images are removed, and 3000 usable oil-contaminated images are obtained. The images are annotated using the MAKE SENSE online annotation tool, and the annotation files are stored in YOLO format. The oil samples are divided into two categories: “engine oil” and “turbine oil”. The dataset is divided into training, validation, and test sets in the ratio of 7:2:1. The detected oil has also been collected and rendered harmless after the completion of the experiments.
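The 7:2:1 division of the 3000 usable images can be reproduced with a short sketch. The file names, the random shuffle, and the seed are assumptions for illustration; the text does not say how images were assigned to each subset.

```python
import random


def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle items and split them into train/validation/test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


names = [f"frame_{i:04d}.jpg" for i in range(3000)]  # hypothetical file names
train, val, test = split_dataset(names)
print(len(train), len(val), len(test))  # 2100 600 300
```

Because the frames come from video, consecutive frames are highly similar, so a split by clip rather than by frame would be a stricter alternative if leakage between subsets is a concern.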

3.3. Results and Analysis

For reasonable analysis of the detection results, the precision ($P$), recall rate ($R$), average precision ($AP$), and mean average precision ($mAP$, $IoU = 0.5$) are adopted as the experimental evaluation metrics. The $P$ and $R$ metrics depend on true positive ($T_{P}$), false positive ($F_{P}$), true negative ($T_{N}$), and false negative ($F_{N}$). The specific equations are as follows.

$$
P = \frac{T_{P}}{T_{P} + F_{P}} \times 100\%
\tag{10}
$$

$$
R = \frac{T_{P}}{T_{P} + F_{N}} \times 100\%
\tag{11}
$$

In the above equations, $T_{P}$ denotes the number of correctly identified oil samples, $F_{P}$ denotes the number of normal samples misidentified as oil samples, and $F_{N}$ denotes the number of unidentified oil samples. Generally, the larger the values of $P$ and $R$, the better the performance of the algorithm.
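Equations (10) and (11) and the IoU criterion can be written out as a small sketch. The box format `(x1, y1, x2, y2)` is an assumption, and the counts in the example are illustrative numbers, not results from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall in percent, following Equations (10) and (11)."""
    p = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    r = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    return p, r


# Two 2 x 2 boxes overlapping in a 1 x 1 square: IoU = 1 / 7, below 0.5.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
p, r = precision_recall(tp=90, fp=10, fn=30)  # illustrative counts only
print(p, r)  # 90.0 75.0
```

With an IoU threshold of 0.5, the pair of boxes above would not count as a match, so the prediction would add to $F_{P}$ and the unmatched ground-truth box to $F_{N}$.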

$$
AP = \int_{0}^{1} P_{i}(R_{i})\, dR
\tag{12}
$$

$$
mAP@0.5 = \frac{\sum_{i = 1}^{N_{c}} \int_{0}^{1} P_{i}(R_{i})\, dR}{N_{c}} \times 100\%
\tag{13}
$$

In Equation (12), AP is the average precision over different recall rates, that is, the area under the precision-recall curve. In Equation (13), $@0.5$ indicates that the intersection over union ($IoU$) threshold is taken as 0.5, $N_{c}$ is the total number of categories, and $\int_{0}^{1} P_{i}(R_{i})\, dR$ is the average precision of category $i$. The overall accuracy $mAP@0.5$ is the average of the APs over the categories and can evaluate the performance of the multi-classifier.
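The integral in Equation (12) is evaluated numerically in practice. The sketch below uses all-point interpolation of the precision-recall curve, which is one common convention and an assumption here, since the paper does not state which scheme was used. The function names are illustrative.

```python
def average_precision(recalls, precisions):
    """Area under a precision-recall curve with all-point interpolation.

    recalls must be sorted in increasing order, with values in [0, 1].
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """Equation (13): mean of the per-class APs, in percent."""
    return 100.0 * sum(ap_per_class) / len(ap_per_class)


ap = average_precision([0.5, 1.0], [1.0, 0.5])
print(ap)  # 0.75

# Averaging the two per-class values reported in the text (94.1% and 89%).
print(round(mean_average_precision([0.941, 0.89]), 2))  # 91.55
```

In the first example, precision is 1.0 up to recall 0.5 and 0.5 from there to recall 1.0, so the area is 0.5 × 1.0 + 0.5 × 0.5 = 0.75.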

To verify the effectiveness of the proposed method, ablation experiments are conducted on the proposed modified method of Yolov5s, and the experimental results are shown in Table 1.

As shown in Table 1, the proposed algorithm achieved a detection accuracy of 94.1% and 89% for engine oil and turbine oil, respectively. APe represents the average detection accuracy of engine oil and APt represents the average detection accuracy of turbine oil. Compared with the original Yolov5s algorithm, the proposed algorithm has improved the accuracy by 6.4% and the recall rate by 4%, effectively reducing miss detection and false detection. Additionally, the overall average accuracy has improved by 8.4%, greatly enhancing the detection effect of the algorithm.

Comparing the ablation experiments with the original model shows that adding the RGA attention module improves the detection accuracy of Yolov5s. When the global relation-aware spatial attention module RGA-S is added alone, the global channel attention module RGA-C is added alone, or both are added sequentially (RGA-SC), the detection accuracy of engine oil is improved by 5.9%, 4.5%, and 6.6%, respectively, and the detection accuracy of turbine oil is improved by 5.1%, 7%, and 8.5%, respectively. Therefore, the model using both spatial and channel attention mechanisms is better than the model using either mechanism alone. It focuses on useful information and suppresses useless information, improving the network’s ability to express the features of oil pollution and increasing the attention to the important features of oil pollution. After the input data processed by SSR histogram equalization are fed to the network, it is easier for the network to learn the features of the oil pollution. The total average accuracy $mAP@0.5$ of the oil pollution is improved by 2.9%, which improves the model detection capability.

To clearly describe the detection results of the proposed method in this paper, the confusion matrix of the model validation is shown in Figure 8. The horizontal axis represents the true results, and the vertical axis represents the prediction. By analyzing the values in the confusion matrix for both the real and predicted results, the model in this paper generally achieves a high classification accuracy. However, there were some instances of missed detection, indicated by the false positive rates of 54% and 46% for the two types of oils, respectively. This may be caused by the difficulty of capturing oil with very small targets, suggesting that the environment of the catchment wells has a certain interference effect on the detection of oil with very small targets.

To further verify the detection performance of the proposed method in this paper, the original Yolov5s model is compared with the algorithm proposed in this paper. Six different oil pollution images are randomly selected from the test set for the experimental test, and the experimental results are divided into six groups: a, b, c, d, e, and f. The specific detection results are shown in Figure 9.

In Figure 9, the original model has significant omissions in detecting small oil pollution (highlighted in red circles in the figure), while it performs well in detecting large areas of oil pollution. The improved model detects all the oil pollution, indicating that the omission has been greatly improved. In group f, the original Yolov5s model mistakenly identifies turbine oil as engine oil, resulting in false positives (highlighted in red boxes in the figure). The improved model can correctly identify them, and the confidence of detecting engine oil has increased from 0.35 to 0.86, while the confidence of detecting turbine oil has increased from 0.43 to 0.87. The improved model has enhanced its ability to identify oil contamination, proving that the algorithm can improve the feature learning ability of engine oil and turbine oil. In addition, the detection confidence of the improved model has also been increased, indicating that the algorithm significantly enhances the detection ability of oil contamination. According to the detection results, the improved algorithm can identify oil contamination (engine oil and turbine oil) more accurately, solving the problems of false and missed detections, and has a stronger capability to capture oil contamination compared to the original Yolov5s.

For a clear illustration of how the proposed model works in feature extraction, three different types of oil pollution images are selected, namely engine oil images, turbine oil images, and oil images with a mixture of both. The process of feature extraction for these three types of oil pollution images is divided into four stages, as shown in Figure 10. The feature maps generated at each stage are visualized to facilitate the analysis of the nature of the features extracted by the neural network.

In Figure 10, after the first-stage convolution layer extracts the shallow feature information of oil pollution, the model can effectively separate the foreground and background areas of the oil pollution image. After the second-stage convolution operation, the subsampling effect on the feature map is noticeably stronger. As the number of layers increases, the feature maps of the last two stages carry more abstract global spatial information. These feature maps contain high-level semantic information, so the oil pollution target becomes abstract and is no longer recognizable to the human eye, although the network can still learn from these abstract features. The visualized results of feature extraction indicate that the global relation-aware Yolov5s model proposed in this paper can capture the color, edge, and shape information of engine oil and turbine oil in oil pollution images.
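The paper does not state how the feature maps in Figure 10 were rendered. One common approach, sketched here under that assumption, is to average a stage's C x H x W activation over its channels and rescale the result to 0..255 for display. The function name and the toy activation values are hypothetical.

```python
def feature_map_to_heatmap(feature_map):
    """Collapse a C x H x W activation (nested lists) into an H x W image in 0..255."""
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    # Channel-wise mean at every spatial position.
    mean = [
        [sum(feature_map[c][y][x] for c in range(channels)) / channels
         for x in range(width)]
        for y in range(height)
    ]
    # Min-max rescale so the weakest response maps to 0 and the strongest to 255.
    lo = min(min(row) for row in mean)
    hi = max(max(row) for row in mean)
    span = (hi - lo) or 1.0
    return [[int(round(255 * (v - lo) / span)) for v in row] for row in mean]


# Toy activation with 2 channels and a 2 x 2 spatial grid (illustrative values only).
toy_activation = [
    [[0.0, 0.2], [0.4, 1.0]],
    [[0.0, 0.6], [0.8, 1.0]],
]
heatmap = feature_map_to_heatmap(toy_activation)
```

Deeper stages have smaller spatial grids, so their heat maps would normally be upsampled to the input size before being shown next to the image.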

The ability of the original Yolov5s to extract oil pollution features is enhanced by introducing the RGA attention mechanism in the backbone network. This enables the model to focus more on previously ignored fine-grained information. To clearly describe the advantages of the proposed method in the feature extraction process of the model backbone network, the critical parts that the proposed algorithm and the original Yolov5s focus on during feature extraction are visualized, as shown in Table 2. The red area is the region with high saliency, which is also the critical part that the neural network focuses on during feature extraction. The algorithm proposed in this paper can enhance the saliency of two types of oil pollution in the images, and can capture the location of oil pollution more accurately than the original Yolov5s. This lays a foundation for the precise detection of oil pollution targets in the next stage.

As a further illustration of the effectiveness of the proposed approach, the algorithm in this paper is compared experimentally with four classical target detection algorithms: Faster R-CNN, SSD, FSSD, and the original Yolov5s. The comparison experiments use the same samples and parameters for training, and compare precision (P), recall (R), detection speed (FPS), mean average precision (mAP@0.5), and model size (weight). The experimental results are shown in Table 3.
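The indicators listed above follow the standard definitions, which the sketch below spells out. The true positive, false positive, and false negative counts and the timing figures are hypothetical; only the two per-class AP values are taken from the last row of Table 1.

```python
def precision(tp, fp):
    """Share of predicted boxes that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of ground-truth objects that are found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mean_average_precision(per_class_ap):
    """mAP is the mean of the per-class AP values (here computed at IoU 0.5)."""
    return sum(per_class_ap) / len(per_class_ap)


def frames_per_second(num_frames, elapsed_seconds):
    """Detection speed: images processed per second of inference time."""
    return num_frames / elapsed_seconds


# Hypothetical counts and timings, for illustration only.
p = precision(tp=94, fp=6)
r = recall(tp=90, fn=10)
speed = frames_per_second(num_frames=370, elapsed_seconds=2.0)

# Per-class AP of the proposed algorithm from Table 1 (engine oil, turbine oil).
map_proposed = mean_average_precision([94.1, 89.0])
```

With two classes, the mean of 94.1 and 89.0 is 91.55, which agrees with the 91.6 reported for the proposed algorithm in Table 1 after rounding.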

From the experimental results in Table 3, the algorithm proposed in this paper has the highest mAP@0.5, with improvements of 6.3, 13.7, 2.6, and 3.9 percentage points over Yolov5s, Faster R-CNN, FSSD, and SSD, respectively. Due to the integration of the global relation-aware attention module, the model size of the proposed algorithm increases slightly compared to the original Yolov5s (15.0 MB versus 14.8 MB), but remains much smaller than the model sizes of Faster R-CNN, FSSD, and SSD. Although the detection accuracies of the FSSD and SSD algorithms are also relatively high, the detection speed of the proposed algorithm is about twice theirs. Compared with the original Yolov5s, the proposed algorithm is 5 FPS slower, but still meets the real-time requirements.
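The figures quoted in this paragraph can be rechecked directly from Table 3. The short script below does that arithmetic; the dictionary layout and variable names are illustrative, while the numbers are copied from the table.

```python
# Values copied from Table 3 (FPS, mAP@0.5 in %, model weight in MB).
table3 = {
    "Yolov5s":      {"fps": 190, "map": 86.7, "weight_mb": 14.8},
    "Faster R-CNN": {"fps": 24,  "map": 79.3, "weight_mb": 547.0},
    "FSSD":         {"fps": 82,  "map": 90.4, "weight_mb": 128.8},
    "SSD":          {"fps": 90,  "map": 89.1, "weight_mb": 126.7},
}
proposed = {"fps": 185, "map": 93.0, "weight_mb": 15.0}

# mAP@0.5 gain of the proposed algorithm over each baseline, in percentage points.
map_gain = {name: round(proposed["map"] - row["map"], 1) for name, row in table3.items()}

# How many times faster the proposed algorithm is than each baseline.
speed_ratio = {name: proposed["fps"] / row["fps"] for name, row in table3.items()}

# Speed cost and size cost of adding the attention module to Yolov5s.
fps_drop = table3["Yolov5s"]["fps"] - proposed["fps"]
extra_weight_mb = round(proposed["weight_mb"] - table3["Yolov5s"]["weight_mb"], 1)

for name in table3:
    print(f"{name}: +{map_gain[name]} points mAP@0.5, {speed_ratio[name]:.2f}x speed")
```

The speed ratios against FSSD and SSD come out a little above 2, which is what "about twice" refers to.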

To describe the training process of the models clearly, the loss curves of the five algorithms’ validation processes are plotted in Figure 11. Except for Faster R-CNN, which does not converge within 200 rounds, the losses of the other algorithms gradually decrease and converge to a stable value. Among them, Yolov5s converges fastest, reaching 0.005 in about 50 rounds. The loss of the algorithm proposed in this paper decreases from around 0.03 to 0.002; its final convergence value is the smallest and its loss curve is the smoothest.

In order to comprehensively compare the detection performance of several algorithms under different oil pollution conditions, four types of oil pollution images are selected for detection, including category 1 with less oil pollution, category 2 with dense oil pollution, category 3 with turbine oil only, and category 4 with a mixture of two types of oil. The detection results of each algorithm are shown in Table 4.

In Table 4, for category 1 with less oil pollution, the detection accuracy of the proposed algorithm is 0.90 and 0.80, respectively, higher than those of the other four algorithms, while Faster R-CNN misses detections in this situation. For category 2 with dense oil pollution, Yolov5s mistakenly detects water impurities as turbine oil, and Faster R-CNN again misses detections because its extracted feature maps are single-layered and have a low resolution; the other three algorithms correctly detect the oil pollution. Several experiments conducted on category 3 and category 4 show that Faster R-CNN has poor detection performance for turbine oil, possibly because the low resolution of the feature maps prevents it from learning the features of turbine oil. In the detection results of category 4, both Yolov5s and SSD produce overlapping bounding boxes under the same non-maximum suppression threshold. Overall, the proposed method has a higher detection accuracy for water surface oil pollution in all categories than the other algorithms. The other algorithms show varying degrees of missed detection, false detection, and over-detection, while the proposed method identifies all targets correctly.
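The overlapping boxes mentioned for category 4 depend on how non-maximum suppression (NMS) treats boxes whose intersection over union (IoU) is close to the threshold. The following is a minimal sketch of standard greedy NMS, not the exact post-processing of any of the compared detectors; the boxes, scores, and thresholds are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, iou_threshold):
    """Keep the highest-scoring box, drop boxes overlapping it above the threshold, repeat."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections: two heavily overlapping boxes and one separate box.
detections = [
    ((10, 10, 50, 50), 0.90),
    ((12, 12, 52, 52), 0.80),
    ((100, 100, 140, 140), 0.70),
]

strict = greedy_nms(detections, iou_threshold=0.45)  # duplicate box is suppressed
loose = greedy_nms(detections, iou_threshold=0.90)   # duplicate box survives
```

The first two boxes have an IoU of roughly 0.82, so they collapse into one detection at a threshold of 0.45 but both remain at 0.90, which is the kind of overlap the paragraph describes.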

4. Conclusions

Aiming to address the problems of ineffective real-time detection of oil pollution in the catchment wells of hydropower stations and the high cost of existing oil pollution detection methods, this paper proposes a global relation-aware-based Yolov5s oil pollution detection method for water surfaces of catchment wells in hydropower stations, with the following main conclusions:

(1): To address the poor sample quality and limited extractable feature points caused by the environment and insufficient light on the water surface of the catchment wells, a sample data preprocessing method is proposed. The sample dataset is processed using the SSR histogram equalization method to obtain brighter images with clearer visual detail of the oil pollution on the water’s surface, which helps the deep learning model extract the target features of the oil pollution.
(2): Considering the issues of missed and false detections in Yolov5s when detecting oil pollution, a global relation-aware attention mechanism is embedded in the backbone network of Yolov5s to enhance the feature extraction of oil pollution, thus suppressing the interference of noise on detection and improving the detection accuracy of the algorithm.
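The preprocessing in conclusion (1) can be illustrated with a small grayscale sketch. It assumes the textbook form of single-scale retinex, R = log(I) - log(G * I) with a Gaussian surround G, followed by global histogram equalization. The order of the two steps, the value of sigma, the kernel radius, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_blur(img, sigma, radius):
    """Separable Gaussian blur on a 2-D list, clamping coordinates at the borders."""
    k = gaussian_kernel(sigma, radius)
    h, w = len(img), len(img[0])
    offsets = range(-radius, radius + 1)
    horiz = [[sum(k[j + radius] * img[y][max(0, min(w - 1, x + j))] for j in offsets)
              for x in range(w)] for y in range(h)]
    return [[sum(k[j + radius] * horiz[max(0, min(h - 1, y + j))][x] for j in offsets)
             for x in range(w)] for y in range(h)]


def single_scale_retinex(img, sigma=2.0, radius=4):
    """log(I + 1) - log(blur(I) + 1), rescaled to integers in 0..255."""
    smooth = gaussian_blur(img, sigma, radius)
    refl = [[math.log(p + 1.0) - math.log(s + 1.0) for p, s in zip(row, srow)]
            for row, srow in zip(img, smooth)]
    lo = min(min(row) for row in refl)
    hi = max(max(row) for row in refl)
    span = (hi - lo) or 1.0
    return [[int(round(255 * (v - lo) / span)) for v in row] for row in refl]


def equalize_histogram(img):
    """Global histogram equalization for integer pixels in 0..255."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    denom = (cdf[-1] - cdf_min) or 1
    lut = [max(0, int(round(255 * (c - cdf_min) / denom))) for c in cdf]
    return [[lut[v] for v in row] for row in img]


# Synthetic dim 8 x 8 image with gray levels between 10 and 38 (illustrative only).
dim = [[10 + 2 * ((x + y) % 16) for x in range(8)] for y in range(8)]
enhanced = equalize_histogram(single_scale_retinex(dim))
```

On this toy input the gray levels are stretched from a narrow dark band to the full 0..255 range, which is the contrast gain the conclusion refers to; real images would normally be processed with an optimized image library rather than nested lists.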

The method proposed in this paper has practical engineering significance and has been applied in a hydroelectric power plant in Shaanxi Province, China. However, the model is currently only applicable to two types of insulating oils, namely engine oil and turbine oil. Identifying other types of insulating oils would require manual labeling of a large number of additional training samples. Therefore, a semi-supervised approach for identifying other types of oils will be considered in our future work to avoid the tedious work of manual labeling and simplify the preparation process before training.

Author Contributions

Conceptualization, J.L. and H.L.; methodology, T.Z.; software, H.L.; validation, J.L., H.L. and M.M.; formal analysis, H.L.; investigation, T.Z.; resources, T.Z.; data curation, M.M. and Y.L.; writing—original draft preparation, H.L.; writing—review and editing, J.L. and C.L.; visualization, H.L. and Y.L.; supervision, T.Z. and L.X.; project administration, J.L. and L.X.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Key R & D Program of State Grid Shaanxi Electric Power Company (SGTYHT/21-JS-223).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Slicing operation of Focus.

Figure 2. Structure of Yolov5s.

Figure 3. Global relation-aware attention.

Figure 4. Spatial relations global attention.

Figure 5. Channel relations global attention.

Figure 6. Improved Yolov5s backbone, where Resunit*n means there are n Resunit modules.

Figure 7. Comparison of data enhancement.

Figure 8. The confusion matrix of the model validation.

Figure 9. Visualization of the detection results. The first row shows the detection results of the original Yolov5s, and the second row shows the detection results of the algorithm in this paper, using the same data for comparison of detection results, showing a total of 6 groups, (a–f).

Figure 10. Image feature extraction visualization.

Figure 11. Loss curves of different algorithms.

Table 1. Test results of ablation experiment.

Methods	P/%	R/%	AP (engine oil)/%	AP (turbine oil)/%	mAP@0.5/%
Yolov5s	87.1	85.7	87.3	79.1	83.2
Yolov5s + RGA-S	90.6	89.9	92.3	84.2	88.3
Yolov5s + RGA-C	91.9	87.1	91.8	86.1	89.0
Yolov5s + RGA-SC	93.6	90.3	93.9	87.6	90.8
Yolov5s + SSR Histogram	89.7	86.0	91.0	81.2	86.1
The proposed algorithm	93.5	89.7	94.1	89.0	91.6

Table 2. Visualization and comparison results of heat map for feature extraction before and after algorithm improvement. Columns A to F are six sample images; the rows show the input image, the Yolov5s heat map, and the heat map of the improved algorithm.

Table 3. Performance comparison of different algorithms.

Algorithms	P/%	R/%	FPS	mAP@0.5/%	Weight/MB
Yolov5s	87.2	84.3	190	86.7	14.8
Faster R-CNN	85.7	83.1	24	79.3	547.0
FSSD	91.3	88.0	82	90.4	128.8
SSD	88.6	77.5	90	89.1	126.7
Improved algorithms	94.0	89.9	185	93.0	15.0

Table 4. Detection results of different algorithms. Columns correspond to categories 1 to 4; the rows show the detection images for Yolov5s, Faster R-CNN, FSSD, SSD, and the improved algorithm.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
